Add cache and parallel run to Git Count Contributors#701
log.Info(logPrefix + fmt.Sprintf("Done scanning repository %q: %d commits found", repo, result.totalCommits))

// Persist to cache (even skipped=false means success here).
if cacheDir != "" {
Suggested change:
- if cacheDir != "" {
+ if cacheDir != "" && cacheValidity > 0 {
Step 1 — buildRepoScanTasks (line 356):
if vcc.params.CacheValidity >= 0 { // 0 >= 0 → true
cacheDir = dir // cacheDir gets set to a real path
}
cacheValidity = 0 * 24 * time.Hour // = 0
Step 2 — scanRepo tries to read cache:
if cached := readRepoCache(cacheDir, repo, cacheValidity=0); ...
// Inside readRepoCache:
if maxAge <= 0 { return nil } // 0 <= 0 → returns nil immediately
So nothing is read from cache. ✓ correct so far.
Step 3 — scanRepo does a full scan, then tries to write cache:
if cacheDir != "" { // cacheDir was set in step 1 → true
writeRepoCache(...) // writes results to disk
}
So the user said "skip cache" and got a full re-scan as expected, but their disk now holds fresh cache files that will never be read (since they always pass --cache-validity 0). On the next run the same thing happens: full scan again, write again, never read.
eranturgeman left a comment
LGTM! See my comments.

feat(git): Add parallel scanning and caching to Git Count Contributors
depends on:
Summary
Introduces parallel repository scanning and a file-based cache layer to the git count-contributors command, significantly improving performance for large organizations with many repositories. Repositories are now scanned concurrently (configurable via --threads, default 10) and results are cached to disk (configurable via --cache-validity, default 3 days) to avoid redundant API calls on repeated runs.
Changes
- scanAndCollectCommitsInfo now scans repositories concurrently using gofrog/parallel.Runner. Each repo is scanned independently, and results are merged after all tasks complete.
- cache.go module persists per-repo scan results as JSON files under ~/.jfrog/contributors-cache/<hash>/. Cache entries are keyed by a SHA256 hash of (scm-type, scm-api-url, owner, months) to avoid collisions. Writes are atomic (tmp + rename).
- --cache-validity: number of days a cached result remains valid (0 = skip cache, default 3).
- --threads: number of parallel threads for scanning (default 10).
- repoScanResult and vcsServerScanResult structs to cleanly aggregate parallel results, along with mergeContributors, mergeDetailedContributors, and mergeDetailedRepos helper functions.
- GetContributorsCacheDir() in utils/paths.go.
- mockVcsClient test double.
Files changed
- cli/docs/flags.go: CacheValidity, GitThreads flags
- cli/gitcommands.go
- commands/git/contributors/cache.go
- commands/git/contributors/cache_test.go
- commands/git/contributors/countcontributors.go
- commands/git/contributors/countcontributors_test.go
- commands/git/contributors/mock_vcs_client_test.go
- utils/paths.go: GetContributorsCacheDir helper
Testing
Notes
- The cache key includes the months parameter, so changing --months automatically invalidates stale cache.
- --cache-validity 0 fully bypasses caching (no file I/O at all).
- dev branch.
- go vet ./...
- go fmt ./...