refactor(hash): extract jsonMD5Raw helper and fix DeterministicUUID#308
refactor(hash): extract jsonMD5Raw helper and fix DeterministicUUID#308
Conversation
…o use raw bytes Extract jsonMD5Raw as an internal helper function to compute MD5 digest without hex encoding. This allows DeterministicUUID to use the raw 16-byte digest directly instead of incorrectly parsing hex-encoded strings as UUID bytes, fixing a regression where UUID bytes were ASCII codes of hex digits. Also add comprehensive test coverage for JSONMD5Hash and DeterministicUUID, including passthrough behavior for UUID values, pointers, byte arrays, and strings. Update CI workflow to use gavel for testing and linting with PR comment support. Fixes incorrect UUID generation from non-UUID seeds.
WalkthroughUpdates the test workflow to use the Gavel action for testing and reporting, refactors hash computation to separate raw MD5 digest generation from UUID derivation with UUID passthrough detection, adds comprehensive test coverage for hash functions, and makes minor configuration and formatting adjustments to Changes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Gavel summary
Totals: 358 passed · 0 failed · 11 skipped · 6.4s |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/test.yml:
- Around line 16-18: The workflow currently grants pull-requests: write while
calling flanksource/gavel@main with version: latest; pin the Gavel action and
its tool version to immutable refs (e.g., replace flanksource/gavel@main with a
specific tag or commit SHA and set version: to a fixed released semver or
digest) so the step that uses the action and the version field are both
immutable; update the action reference and the version value in the workflow
where flanksource/gavel and version: latest are declared.
In `@hash/hash_test.go`:
- Around line 27-32: The determinism test TestJSONMD5Hash_Deterministic (and the
similar test around lines 67-80) currently compares returned hashes without
checking errors, which can mask failures when both calls return an error and
zero values; update these tests to check the returned error from JSONMD5Hash and
t.Fatalf on non-nil errors before comparing a and b, i.e., fail fast if err !=
nil for either call (reference JSONMD5Hash, TestJSONMD5Hash_Deterministic and
the other affected test) so the comparison only runs when both calls succeeded.
In `@hash/hash.go`:
- Around line 167-170: The string branch currently treats any 32-char hex string
as a UUID because uuid.Parse accepts compact hex; update the passthrough in
DeterministicUUID (hash.go) to only return the parsed uuid when the parsed
UUID's canonical string equals the input (use case-insensitive comparison, e.g.,
strings.EqualFold(id.String(), v)), otherwise fall through to hashing; also add
the provided TestDeterministicUUID_CompactHexStringStillHashes unit test to
ensure compact 32-char hex strings are hashed, not passed through.
🪄 Autofix (Beta)
❌ Autofix failed (check again to retry)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6f0f8c0a-ddbc-4ec3-9cf8-f9dadb1fe99a
📒 Files selected for processing (5)
.github/workflows/test.yml.gitignorehar/pretty.gohash/hash.gohash/hash_test.go
| permissions: | ||
| contents: read | ||
| pull-requests: write # required when comment=true so the action can post/update the sticky PR comment |
There was a problem hiding this comment.
❓ Verification inconclusive
Script executed:
#!/bin/bash
set -euo pipefail
# Expect: no matches for mutable action refs or latest tool versions.
rg -n --glob '.github/workflows/*.yml' --glob '.github/workflows/*.yaml' \
'uses:\s+[^#]+@(main|master|latest)\b|version:\s*["'\'']?latest["'\'']?\b'Repository: flanksource/commons
Repository: flanksource/commons
Exit code: 2
stderr:
No files were searched, which means ripgrep probably applied a filter you didn't expect.
Running with --debug will show why files are being skipped.
Script executed:
find . -type f -name "*.yml" -o -name "*.yaml" | head -20Repository: flanksource/commons
Repository: flanksource/commons
Exit code: 0
stdout:
./.github/dependabot.yml
./.github/workflows/codeql.yml
./.github/workflows/hx-test.yml
./.github/workflows/lint.yml
./.github/workflows/release.yml
./.github/workflows/scorecard.yml
./.github/workflows/test.yml
./.golangci.yml
./certs/fixtures/file.yml
./certs/fixtures/literal.yml
./certs/fixtures/remote.yml
./http/testdata/docker-compose.yml
Script executed:
ls -la .github/workflows/ 2>/dev/null || echo "Directory not found"Repository: flanksource/commons
Repository: flanksource/commons
Exit code: 0
stdout:
total 28
drwxr-xr-x 2 jailuser jailuser 160 Apr 22 05:15 .
drwxr-xr-x 3 jailuser jailuser 80 Apr 22 05:15 ..
-rw-r--r-- 1 jailuser jailuser 2513 Apr 22 05:15 codeql.yml
-rw-r--r-- 1 jailuser jailuser 830 Apr 22 05:15 hx-test.yml
-rw-r--r-- 1 jailuser jailuser 709 Apr 22 05:15 lint.yml
-rw-r--r-- 1 jailuser jailuser 5568 Apr 22 05:15 release.yml
-rw-r--r-- 1 jailuser jailuser 3172 Apr 22 05:15 scorecard.yml
-rw-r--r-- 1 jailuser jailuser 1250 Apr 22 05:15 test.yml
Script executed:
cat -n .github/workflows/test.yml 2>/dev/null | head -50Repository: flanksource/commons
Repository: flanksource/commons
Exit code: 0
stdout:
1 on:
2 push:
3 branches:
4 - master
5 pull_request:
6 types: [opened, synchronize, reopened]
7 name: Test
8 jobs:
9 test:
10 strategy:
11 matrix:
12 platform: [ubuntu-latest]
13 go-version:
14 - 1.25.x
15 runs-on: ${{ matrix.platform }}
16 permissions:
17 contents: read
18 pull-requests: write # required when comment=true so the action can post/update the sticky PR comment
19 steps:
20 - name: Harden the runner (Audit all outbound calls)
21 uses: step-security/harden-runner@f4a75cfd619ee5ce8d5b864b0d183aff3c69b55a # v2.13.1
22 with:
23 egress-policy: audit
24
25 - name: Install Go
26 uses: buildjet/setup-go@555ce355a95ff01018ffcf8fbbd9c44654db8374 # v5.0.2
27 with:
28 go-version: ${{ matrix.go-version }}
29 - name: Checkout code
30 uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29 # v4.1.6
31
32 - uses: flanksource/gavel@main
33 with:
34 args: test --lint
35 version: latest
36 json-file: gavel-results.json
37 html-file: gavel-results.html
38 summary-file: gavel-summary.md
39 artifact-name: gavel-results
40 comment: "true"
41 comment-header: gavel
42 fail-on-error: "true"
Script executed:
cd .github/workflows && for f in *.yml; do echo "=== $f ==="; grep -n '@main\|@master\|version:.*latest' "$f" || echo "No mutable refs found"; doneRepository: flanksource/commons
Repository: flanksource/commons
Exit code: 0
stdout:
=== codeql.yml ===
No mutable refs found
=== hx-test.yml ===
No mutable refs found
=== lint.yml ===
No mutable refs found
=== release.yml ===
No mutable refs found
=== scorecard.yml ===
No mutable refs found
=== test.yml ===
32: - uses: flanksource/gavel@main
35: version: latest
Pin the Gavel action and tool version before granting write permissions.
This job grants pull-requests: write (line 18) but runs flanksource/gavel@main (line 32) with version: latest (line 35). Pin both to immutable references to prevent supply-chain risk when using a write-scoped token.
🔒 Proposed hardening
- - uses: flanksource/gavel@main
+ - uses: flanksource/gavel@<commit-sha-or-release-tag>
with:
args: test --lint
- version: latest
+ version: <pinned-version>Other actions in this workflow are pinned to commit SHAs; apply the same hardening to Gavel.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/test.yml around lines 16 - 18, The workflow currently
grants pull-requests: write while calling flanksource/gavel@main with version:
latest; pin the Gavel action and its tool version to immutable refs (e.g.,
replace flanksource/gavel@main with a specific tag or commit SHA and set
version: to a fixed released semver or digest) so the step that uses the action
and the version field are both immutable; update the action reference and the
version value in the workflow where flanksource/gavel and version: latest are
declared.
| func TestJSONMD5Hash_Deterministic(t *testing.T) { | ||
| a, _ := JSONMD5Hash(map[string]string{"k": "v"}) | ||
| b, _ := JSONMD5Hash(map[string]string{"k": "v"}) | ||
| if a != b { | ||
| t.Fatalf("expected deterministic hash, got %q vs %q", a, b) | ||
| } |
There was a problem hiding this comment.
Check errors in determinism tests before comparing zero values.
These tests can pass falsely if both calls return an error and the zero value. The other tests already fail fast on unexpected errors; do the same here.
✅ Proposed test hardening
func TestJSONMD5Hash_Deterministic(t *testing.T) {
- a, _ := JSONMD5Hash(map[string]string{"k": "v"})
- b, _ := JSONMD5Hash(map[string]string{"k": "v"})
+ a, err := JSONMD5Hash(map[string]string{"k": "v"})
+ if err != nil {
+ t.Fatalf("unexpected error: %v", err)
+ }
+ b, err := JSONMD5Hash(map[string]string{"k": "v"})
+ if err != nil {
+ t.Fatalf("unexpected error: %v", err)
+ }
if a != b {
t.Fatalf("expected deterministic hash, got %q vs %q", a, b)
}
} func TestDeterministicUUID_Deterministic(t *testing.T) {
- a, _ := DeterministicUUID("same-seed")
- b, _ := DeterministicUUID("same-seed")
+ a, err := DeterministicUUID("same-seed")
+ if err != nil {
+ t.Fatalf("unexpected error: %v", err)
+ }
+ b, err := DeterministicUUID("same-seed")
+ if err != nil {
+ t.Fatalf("unexpected error: %v", err)
+ }
if a != b {
t.Fatalf("expected deterministic UUID, got %q vs %q", a, b)
}
}
func TestDeterministicUUID_DifferentSeedsProduceDifferentUUIDs(t *testing.T) {
- a, _ := DeterministicUUID("seed-one")
- b, _ := DeterministicUUID("seed-two")
+ a, err := DeterministicUUID("seed-one")
+ if err != nil {
+ t.Fatalf("unexpected error: %v", err)
+ }
+ b, err := DeterministicUUID("seed-two")
+ if err != nil {
+ t.Fatalf("unexpected error: %v", err)
+ }
if a == b {
t.Fatalf("expected distinct UUIDs for distinct seeds, got %q twice", a)
}
}Also applies to: 67-80
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hash/hash_test.go` around lines 27 - 32, The determinism test
TestJSONMD5Hash_Deterministic (and the similar test around lines 67-80)
currently compares returned hashes without checking errors, which can mask
failures when both calls return an error and zero values; update these tests to
check the returned error from JSONMD5Hash and t.Fatalf on non-nil errors before
comparing a and b, i.e., fail fast if err != nil for either call (reference
JSONMD5Hash, TestJSONMD5Hash_Deterministic and the other affected test) so the
comparison only runs when both calls succeeded.
| case string: | ||
| if id, err := uuid.Parse(v); err == nil { | ||
| return id, true | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Expect: existing tests do not cover 32-char compact hex string behavior yet.
rg -n -C3 --glob '**/*_test.go' \
'32|compact|[[:xdigit:]]{32}|NonUUIDStringStillHashes|DeterministicUUID' hashRepository: flanksource/commons
Length of output: 6466
🏁 Script executed:
cat -n hash/hash.go | sed -n '150,180p'Repository: flanksource/commons
Length of output: 664
🏁 Script executed:
rg -n 'uuid.Parse' hash/hash.go -A 2 -B 2Repository: flanksource/commons
Length of output: 167
🏁 Script executed:
rg -n 'compact\|32.*hex\|[a-f0-9]{32}' hash/hash_test.goRepository: flanksource/commons
Length of output: 45
🏁 Script executed:
cat > /tmp/test_uuid.go << 'EOF'
package main
import (
"fmt"
"github.com/google/uuid"
)
func main() {
// Test with canonical UUID format
canonical := "550e8400-e29b-41d4-a716-446655440000"
id1, err := uuid.Parse(canonical)
fmt.Printf("Canonical UUID parse: err=%v, String()=%s\n", err, id1.String())
fmt.Printf("Canonical == input: %v\n", id1.String() == canonical)
// Test with 32-char compact hex (no hyphens)
compact := "550e8400e29b41d4a716446655440000"
id2, err := uuid.Parse(compact)
fmt.Printf("\nCompact hex parse: err=%v, String()=%s\n", err, id2.String())
fmt.Printf("Compact == input: %v\n", id2.String() == compact)
// Test with a valid MD5-like hex string
md5Like := "5d41402abc4b2a76b9719d911017c592"
id3, err := uuid.Parse(md5Like)
fmt.Printf("\nMD5-like hex parse: err=%v, String()=%s\n", err, id3.String())
fmt.Printf("MD5-like == input: %v\n", id3.String() == md5Like)
// Show that both canonical and compact parse to same UUID
fmt.Printf("\nCanonical UUID == Compact UUID: %v\n", id1 == id2)
}
EOF
cd /tmp && go run test_uuid.goRepository: flanksource/commons
Length of output: 230
🏁 Script executed:
cat > /tmp/test_uuid.go << 'EOF'
package main
import (
"fmt"
"github.com/google/uuid"
)
func main() {
// Test with canonical UUID format
canonical := "550e8400-e29b-41d4-a716-446655440000"
id1, err := uuid.Parse(canonical)
fmt.Printf("Canonical UUID parse: err=%v, String()=%s\n", err, id1.String())
fmt.Printf("Canonical == input: %v\n", id1.String() == canonical)
// Test with 32-char compact hex (no hyphens)
compact := "550e8400e29b41d4a716446655440000"
id2, err := uuid.Parse(compact)
fmt.Printf("\nCompact hex parse: err=%v, String()=%s\n", err, id2.String())
fmt.Printf("Compact == input: %v\n", id2.String() == compact)
// Test with a valid MD5-like hex string
md5Like := "5d41402abc4b2a76b9719d911017c592"
id3, err := uuid.Parse(md5Like)
fmt.Printf("\nMD5-like hex parse: err=%v, String()=%s\n", err, id3.String())
fmt.Printf("MD5-like == input: %v\n", id3.String() == md5Like)
// Show that both canonical and compact parse to same UUID
fmt.Printf("\nCanonical UUID == Compact UUID: %v\n", id1 == id2)
fmt.Printf("Canonical String() == Compact String(): %v\n", id1.String() == id2.String())
}
EOF
cd hash && go run /tmp/test_uuid.goRepository: flanksource/commons
Length of output: 516
🏁 Script executed:
cat > /tmp/test_impact.go << 'EOF'
package main
import (
"fmt"
"github.com/google/uuid"
"strings"
)
func testAsUUID(seed any, withFix bool) (uuid.UUID, bool) {
switch v := seed.(type) {
case string:
if id, err := uuid.Parse(v); err == nil {
// Current code (no fix)
if !withFix {
return id, true
}
// Proposed fix
if strings.EqualFold(id.String(), v) {
return id, true
}
}
}
return uuid.Nil, false
}
func main() {
// Test with 32-char MD5-like hex (should NOT be treated as UUID)
md5Like := "5d41402abc4b2a76b9719d911017c592"
// Current behavior (without fix)
id, ok := testAsUUID(md5Like, false)
fmt.Printf("Current code with MD5-like hex: ok=%v, id=%s\n", ok, id)
// With proposed fix
id, ok = testAsUUID(md5Like, true)
fmt.Printf("Proposed fix with MD5-like hex: ok=%v, id=%s\n", ok, id)
// Test with canonical UUID (should still pass through)
canonical := "550e8400-e29b-41d4-a716-446655440000"
// Current behavior
id, ok = testAsUUID(canonical, false)
fmt.Printf("Current code with canonical UUID: ok=%v, id=%s\n", ok, id)
// With proposed fix
id, ok = testAsUUID(canonical, true)
fmt.Printf("Proposed fix with canonical UUID: ok=%v, id=%s\n", ok, id)
// Test with compact hex UUID (should NOT pass through without hyphens)
compact := "550e8400e29b41d4a716446655440000"
// Current behavior
id, ok = testAsUUID(compact, false)
fmt.Printf("Current code with compact hex: ok=%v, id=%s\n", ok, id)
// With proposed fix
id, ok = testAsUUID(compact, true)
fmt.Printf("Proposed fix with compact hex: ok=%v, id=%s\n", ok, id)
}
EOF
cd hash && go run /tmp/test_impact.goRepository: flanksource/commons
Length of output: 552
🏁 Script executed:
# Check if there are any tests that would fail if we reject compact hex
rg -n 'compact\|32.*hex\|without.*hyphen' hash/hash_test.goRepository: flanksource/commons
Length of output: 45
🏁 Script executed:
# Check if DeterministicUUID currently has a test with MD5-like hex string
rg -n '5d41402abc4b2a76b9719d911017c592|[a-f0-9]{32}' hash/hash_test.goRepository: flanksource/commons
Length of output: 45
🏁 Script executed:
# Let's see the full context of the TestDeterministicUUID_NonUUIDStringStillHashes test
sed -n '173,184p' hash/hash_test.goRepository: flanksource/commons
Length of output: 412
Add EqualFold validation to prevent 32-character hex strings from bypassing hashing.
uuid.Parse accepts raw 32-character hex strings without hyphens, causing MD5 hashes and other 32-char hex inputs to be treated as UUID passthroughs instead of being hashed. This allows non-UUID seed data to incorrectly bypass the hashing logic.
Restrict the string-to-UUID passthrough to canonical UUID format by validating that the parsed UUID's string representation matches the input:
Proposed fix
import (
"crypto/md5"
"crypto/sha256"
"encoding/hex"
"encoding/json"
+ "strings"
"github.com/google/uuid"
) case string:
- if id, err := uuid.Parse(v); err == nil {
+ if id, err := uuid.Parse(v); err == nil && strings.EqualFold(id.String(), v) {
return id, true
}Add a test case for compact hex strings to prevent regression:
func TestDeterministicUUID_CompactHexStringStillHashes(t *testing.T) {
// Ensure 32-char hex strings (e.g., MD5 hashes) without hyphens are hashed, not treated as UUIDs
compactHex := "5d41402abc4b2a76b9719d911017c592"
got, err := DeterministicUUID(compactHex)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
data, _ := json.Marshal(compactHex)
sum := md5.Sum(data)
if !bytes.Equal(got[:], sum[:]) {
t.Fatalf("compact hex strings must hash; want %x, got %x", sum, got[:])
}
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hash/hash.go` around lines 167 - 170, The string branch currently treats any
32-char hex string as a UUID because uuid.Parse accepts compact hex; update the
passthrough in DeterministicUUID (hash.go) to only return the parsed uuid when
the parsed UUID's canonical string equals the input (use case-insensitive
comparison, e.g., strings.EqualFold(id.String(), v)), otherwise fall through to
hashing; also add the provided TestDeterministicUUID_CompactHexStringStillHashes
unit test to ensure compact 32-char hex strings are hashed, not passed through.
|
Note Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it. ⛔ CodeRabbit does not have permission to push changes to this repository. Please ensure CodeRabbit has Contents: Read and write permission. |
|
Note Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it. ⛔ CodeRabbit does not have permission to push changes to this repository. Please ensure CodeRabbit has Contents: Read and write permission. |
|
Note Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it. ⛔ CodeRabbit does not have permission to push changes to this repository. Please ensure CodeRabbit has Contents: Read and write permission. |
…o use raw bytes
Extract jsonMD5Raw as an internal helper function to compute MD5 digest without hex encoding. This allows DeterministicUUID to use the raw 16-byte digest directly instead of incorrectly parsing hex-encoded strings as UUID bytes, fixing a regression where UUID bytes were ASCII codes of hex digits.
Also add comprehensive test coverage for JSONMD5Hash and DeterministicUUID, including passthrough behavior for UUID values, pointers, byte arrays, and strings. Update CI workflow to use gavel for testing and linting with PR comment support.
Fixes incorrect UUID generation from non-UUID seeds.
Summary by CodeRabbit
Release Notes
Improvements
Chores
Tests