v1.0.0#15
cyclone-github
commented
May 22, 2026
- added flag "-text-match" to filter page text matches
- memory and performance optimizations for -file and -url modes
- -file mode streams wordlists from disk instead of loading entire files into RAM
- reduced RAM usage for large -sort wordlists
- default -timeout increased from 1 to 10 seconds
- progress bars, stats, and errors now write to stderr
- sanitize url fragments for dedup and extension checks
- updated default User-Agent
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 28de35e28d
func countNgramsFromStream(r io.Reader, fileSize int64, ngramMin, ngramMax int, uniqueWords map[string]bool, ngramCounts map[string]int, trackUnique bool, progress func(processed, total int)) error {
	cr := &countingReader{r: r}
	scanner := bufio.NewScanner(cr)
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
Remove hard 1MB word cap in file streaming mode
Using bufio.Scanner with scanner.Buffer(..., 1024*1024) imposes a hard 1MB maximum token size in -file mode. Any input containing a single whitespace-delimited token larger than 1MB (for example, long base64 blobs, minified assets, or machine-generated text) now fails with "bufio.Scanner: token too long" and exits. This is a regression from the previous os.ReadFile + strings.Fields path, which imposed no per-token cap, and it can unexpectedly break processing of real-world large text files.
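One possible way to keep streaming from disk without a per-token limit is to read runes through a bufio.Reader and split on whitespace manually, which roughly mirrors how strings.Fields splits. The sketch below is only an illustration and is not code from this PR: streamWords and collectWords are hypothetical names, and the real countNgramsFromStream would also need its progress and counting logic wired in.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"unicode"
)

// streamWords is a hypothetical helper (not part of the PR). It calls fn for
// every whitespace-delimited word in r. Words are accumulated in a
// strings.Builder that grows as needed, so there is no fixed per-token cap;
// memory use is bounded by the longest single word, not by the file size.
// Note: invalid UTF-8 bytes are decoded as U+FFFD by ReadRune.
func streamWords(r io.Reader, fn func(word string)) error {
	br := bufio.NewReaderSize(r, 64*1024)
	var word strings.Builder

	// flush emits the pending word, if any, and starts a new one.
	flush := func() {
		if word.Len() > 0 {
			fn(word.String())
			word.Reset()
		}
	}

	for {
		c, _, err := br.ReadRune()
		if err == io.EOF {
			flush() // emit a trailing word that has no whitespace after it
			return nil
		}
		if err != nil {
			return err
		}
		if unicode.IsSpace(c) {
			flush()
			continue
		}
		word.WriteRune(c)
	}
}

// collectWords is a small hypothetical wrapper used for demonstration only.
func collectWords(s string) ([]string, error) {
	var words []string
	err := streamWords(strings.NewReader(s), func(w string) {
		words = append(words, w)
	})
	return words, err
}

func main() {
	words, err := collectWords("alpha  beta\n\tgamma")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(words), words)
}
```

An alternative that stays closer to the existing code would be to keep bufio.Scanner and raise the maximum passed to scanner.Buffer, but that only moves the limit rather than removing it.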