Skip to content

feat(cli): add samples upload-batch from a CSV sample sheet#16

Merged
mhusbynflow merged 2 commits into
masterfrom
mhusbynflow/samples-py-comment
Jun 10, 2026
Merged

feat(cli): add samples upload-batch from a CSV sample sheet#16
mhusbynflow merged 2 commits into
masterfrom
mhusbynflow/samples-py-comment

Conversation

@mhusbynflow

Copy link
Copy Markdown
Collaborator

What changed

Implements User Story 4 (Phase 7) of the flowbio CLI — flowbio samples upload-batch --sheet x.csv --sample-type T: upload many samples from a CSV sample sheet, applying one sample type to every row.

  • flowbio/cli/_sheet.py (new) — CSV-only parsing: separates reserved columns (name, reads1, reads2, project, organism) from metadata identifiers, resolves reads paths relative to the sheet's own directory, drops empty cells, and validate_row() collects all per-row pre-flight errors up front (missing required values, reads files not on disk, names with spaces, closed-option violations, required-for-type metadata, annotation-companion rules).
  • flowbio/cli/_samples.pyupload-batch handler + _BatchResult: validate every row before any upload, then upload valid rows sequentially in sheet order. --skip-invalid skips invalid rows instead of aborting; --stop-on-error aborts on the first upload failure. The __annotation suffix is now a single shared constant used by both the batch-template producer and the sheet parser.
  • docs/cli.mdsamples upload-batch section (concepts + worked example).

Exit-code / output contract

  • 0 all uploaded · 2 pre-flight validation failure or non-CSV sheet · 1 any upload failure.
  • --json emits the {uploaded, failed, skipped, counts} document on stdout; the pre-flight abort writes a JSON error document to stderr with stdout empty.
  • Sample type is sent as-is (validated server-side), per the CLI's "server is source of truth" convention.

Verification

  • Test-first throughout. tests/unit/cli/test_sheet.py (17) + TestSamplesUploadBatch in test_samples.py (10, covering all 7 US4 scenarios incl. JSON-mode abort).
  • Full CLI + v2 samples unit suite: 166 passed.
  • Passed an adversarial review (1 round, approved): added a test pinning "no server pre-validation of --sample-type", and unified the __annotation suffix constant.

Out of scope

  • Phase 8 (Polish): consolidated docs overview, uvx install-closure check, edge-case regression tests, quickstart end-to-end run.
  • No flowbio/v2 library changes.

🤖 Generated with Claude Code

Implements User Story 4 (Phase 7) of the flowbio CLI: upload many samples
from a CSV sample sheet, applying one sample type to every row.

Adds `flowbio/cli/_sheet.py` — CSV-only parsing that separates reserved
columns from metadata identifiers, resolves reads paths relative to the
sheet's own directory, drops empty cells, and collects every per-row
pre-flight error up front (missing required values, reads files not on
disk, names with spaces, closed-option violations, required-for-type
metadata, and annotation-companion rules) so a batch never half-uploads
before surfacing a fixable mistake.

Adds the `upload-batch` handler and `_BatchResult` to `_samples.py`:
validate all rows first, then upload the valid rows sequentially in sheet
order. `--skip-invalid` skips invalid rows instead of aborting;
`--stop-on-error` aborts on the first upload failure. Exit codes follow the
contract — 0 all uploaded, 2 pre-flight failure (or non-CSV sheet), 1 any
upload failure — and `--json` emits the uploaded/failed/skipped/counts
document on stdout while the pre-flight abort writes a JSON error document
to stderr.

The `__annotation` suffix is now a single shared constant in `_sheet.py`,
used by both the batch-template producer and the sheet parser so the sheet
round-trips without silent divergence.

Sample type is sent as-is (validated server-side), consistent with the
CLI's "server is source of truth" convention.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mhusbynflow

Copy link
Copy Markdown
Collaborator Author

@claude please review this PR

@claude

claude Bot commented Jun 10, 2026

Copy link
Copy Markdown

Claude finished @mhusbynflow's task in 2m 3s —— View job


PR Review: samples upload-batch

  • Read changed files (_sheet.py, _samples.py, tests, docs)
  • Review for correctness, edge cases, and bugs
  • Check adherence to CLAUDE.md conventions (typing, comments, testing)
  • Post review feedback

Overall this is a clean, well-structured change. The separation of CSV parsing (_sheet.py) from the command handler (_samples.py) is tidy, the validate_row "collect every error up front" approach matches FR-028, the docstrings explain why rather than narrate flow, and the test coverage is thorough (all 7 US4 scenarios + the nice negative test pinning "no server pre-validation of --sample-type"). Sharing ANNOTATION_SUFFIX between the template producer and the parser is the right call.

A few things worth considering before merge.

1. UTF-8 BOM will break Excel-exported CSVs (likely real-world bug)

flowbio/cli/_sheet.py:68

with path.open(newline="") as handle:

The entire workflow is "fill in the batch-template output in a spreadsheet, export to CSV" — and the error message in parse_sheet literally says "Export your spreadsheet to CSV first." Excel's "CSV UTF-8" export (and many other tools) prepends a UTF-8 BOM. Path.open() with the default encoding does not strip it, so the first header becomes "name". That falls out of RESERVED_COLUMNS, so:

  • name is treated as a metadata column,
  • every row reports missing required value: name,
  • and a spurious name field gets forwarded to the server.

The user did everything right and gets a confusing wall of validation errors. Opening with encoding="utf-8-sig" transparently strips a BOM if present and is otherwise identical to plain UTF-8:

with path.open(newline="", encoding="utf-8-sig") as handle:

Worth a regression test writing a sheet with a leading ``. Fix this →

2. Duplicate sample names within a sheet aren't detected

validate_row validates each row independently, so two rows with the same name both pass pre-flight and both upload. Since the per-row philosophy here is otherwise "catch fixable mistakes before any upload," a duplicate name is exactly the kind of mistake a user would want surfaced up front rather than after a half-batch. If the server rejects duplicates this is just a worse error path; if it doesn't, you get silent collisions. Could be intentional ("server is source of truth") — if so, fine to leave, but worth a deliberate decision.

3. Unknown / mistyped metadata columns pass silently

A metadata column that matches no server attribute (e.g. a typo cell_typ) is never flagged by _metadata_errors and gets forwarded as a form field. A typo on a required attribute is caught indirectly (the real attribute reports "missing required metadata"), but a typo on an optional one is silently sent. This is consistent with the server-is-truth convention, so probably acceptable — flagging mainly so it's a conscious choice.

4. Minor

  • _sheet.py:161 — the closed-option condition is a long single line; splitting the if would match the wrap style used elsewhere in the file.
  • Empty sheet (header only, no rows) returns exit 0 with Uploaded 0, failed 0, skipped 0. — harmless, but a one-line advisory ("sheet contained no data rows") might save a confused user.
  • Slight UX asymmetry: the batch path rejects names-with-spaces client-side (_sheet.py:100) while samples upload defers to the server (_samples.py:87). The batch behavior is arguably better; just noting the inconsistency.

Conventions check

Typing, comment discipline (why-not-what), and test-through-public-surface all look consistent with CLAUDE.md. Nothing blocking there.

None of these are blockers except possibly #1, which I'd recommend fixing given how directly it hits the documented happy path.
· mhusbynflow/samples-py-comment

Spreadsheet tools — notably Excel's "CSV UTF-8" export — prepend a UTF-8
BOM. parse_sheet opened the file with the default encoding, so the BOM
survived and the first header parsed as "name"; that fell out of the
reserved-column set, so `name` was treated as a metadata column and every
row reported "missing required value: name" on the documented happy path
(fill in the template in a spreadsheet, export to CSV).

Open with utf-8-sig, which transparently strips a leading BOM and is
otherwise identical to UTF-8. Also wrap the closed-option validation
condition to match the file's wrap style.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mhusbynflow mhusbynflow merged commit 297d3e1 into master Jun 10, 2026
4 checks passed
@mhusbynflow mhusbynflow deleted the mhusbynflow/samples-py-comment branch June 10, 2026 18:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant