Skip to content

feat(cli): add samples batch-template command#14

Merged
mhusbynflow merged 8 commits into
masterfrom
mhusbynflow/samples-comment-review
Jun 10, 2026
Merged

feat(cli): add samples batch-template command#14
mhusbynflow merged 8 commits into
masterfrom
mhusbynflow/samples-comment-review

Conversation

@mhusbynflow

Copy link
Copy Markdown
Collaborator

What

Implements User Story 3 of the flowbio CLI: flowbio samples batch-template --sample-type T.

  • Human mode: a CSV header — reserved columns (name,reads1,reads2,project,organism), then one column per metadata attribute, each followed by a <id>__annotation companion when the attribute allows annotation. There is no sample_type column (the type is supplied via --sample-type). A required-vs-optional summary is written to stderr.
  • --json mode: a per-column descriptor list (name, kind, required, options, description) on stdout and no CSV, so an agent can build rows directly.
  • -o/--output PATH: writes the CSV template to a file.

The one library change is the additive MetadataAttribute.allow_annotation field, which mirrors the /samples/metadata payload key directly (defaults to False when absent, so it is backwards-compatible).

Verification

  • Test-first: 7 new tests (tests/unit/v2/test_samples.py, tests/unit/cli/test_samples.py); full suite 230 passed.
  • Smoke-tested live against https://app.flow.bio/api — confirmed the annotation companions resolve and strandedness derives as required for rna_seq.

Out of scope

User Story 5 (annotation-template / upload-multiplexed, tasks T020–T024) and User Story 4 (upload-batch) remain unimplemented; this PR is Phase 6 only.

🤖 Generated with Claude Code

@mhusbynflow

Copy link
Copy Markdown
Collaborator Author

@claude please review this PR

@claude

claude Bot commented Jun 9, 2026

Copy link
Copy Markdown

Claude finished @mhusbynflow's task in 2m 3s —— View job


Review: samples batch-template (PR #14)

  • Read changed files & gather context
  • Review library change (MetadataAttribute.allow_annotation)
  • Review CLI command (_samples.py)
  • Review tests
  • Check docs & spec consistency

Solid, focused PR. The allow_annotation addition is genuinely additive and backwards-compatible, the registration refactor is clean, and the test matrix covers the human/--json/--output permutations well. A few things worth addressing before merge — one substantive, the rest minor.


🟠 The --sample-type is never validated — but the docs say it is

_batch_template_command only calls client.samples.get_metadata_attributes() (_samples.py:163), and that endpoint (/samples/metadata) takes no sample type. The type is used purely to decide which columns are required:

required = (
    attribute.required or sample_type in attribute.required_for_sample_types
)

So a typo'd or unknown --sample-type produces a perfectly valid-looking template with nothing marked required-for-type, and exits 0. Meanwhile docs/cli.md states flatly:

Exit codes0 success; 2 missing --sample-type; 4 unknown sample type; ...

That 4 is unreachable here — NOT_FOUND (4) only comes from a NotFoundError on a transport call, and the one call made doesn't depend on the type. The contract spec hedges correctly (4 unknown sample type "(if surfaced by the lookup)"), but the user-facing doc drops the hedge and overstates the guarantee.

I'd either soften the docs line (e.g. note the type isn't validated until upload-batch) or, if early validation is desired, look it up against client.samples.get_types() and raise NotFoundError/CliUsageError on a miss. The silent acceptance is the kind of thing that bites an agent building a sheet for a mistyped type. Fix this →


🟡 CSV header isn't quoted; column-name collisions aren't guarded

The header is ",".join(column.name ...) (_samples.py:142). Two edge cases:

  • A metadata identifier containing a comma or quote would corrupt the header. Identifiers are presumably slugs, so low risk — but csv.writer would make this robust for free.
  • A metadata attribute whose identifier collides with a reserved name (name, reads1, …), with sample_type, or with a generated <id>__annotation would silently produce a duplicate/confusing column. The docs promise "no sample_type column", but an attribute literally named sample_type would defeat that.

Both are unlikely given how identifiers are minted server-side; worth at most a defensive note rather than a blocker.

🟡 --output is a bare str (repo convention prefers Path)

CLAUDE.md says to prefer pathlib.Path over bare str for filesystem paths. The option is registered without a type, so args.output is a str and you reconstruct Path(args.output) at use (_samples.py:144). Adding type=Path to the argument would align with the convention and drop the reconstruction.

🔵 Confirm intent: --json --output FILE writes the file and prints JSON

Given if output.json_mode or args.output is None:, the combination --json -o file.csv writes the CSV to the file and emits the JSON descriptor list to stdout. That seems reasonable (json mode always emits its document), but it's the one branch not exercised by a test — worth a one-line confirmation or a test if it's a deliberate contract.


Nits / positives

  • Nice reuse of emit_result(human_line=header, document=descriptors) to drive both modes from one call.
  • allow_annotation has both a "populated from payload" and a "defaults to false when absent" test — good backwards-compat coverage.
  • The spec/data-model allows_annotationallow_annotation rename is consistent across all the touched docs. 👍

Nothing here is a hard blocker; the docs/exit-code mismatch is the one I'd want resolved (even just by softening the prose). I didn't run the suite locally — the PR reports 230 passed and the new tests look internally consistent with the implementation.
· branch mhusbynflow/samples-comment-review

Implements User Story 3: `flowbio samples batch-template --sample-type T`
emits a sample-sheet template for a sample type — a CSV header in human
mode (reserved columns, then one column per metadata attribute, each
followed by a `<id>__annotation` companion when the attribute allows it,
with no `sample_type` column), or a per-column descriptor list under
`--json`. A required-vs-optional summary goes to stderr, and `-o/--output`
writes the CSV to a file. With `--json -o FILE`, the CSV is written to the
file and the descriptor list is emitted to stdout (pinned by a test).

Adds the additive `allow_annotation` field to `MetadataAttribute`, the
only library change the template needs. It mirrors the `/samples/metadata`
payload key directly and defaults to `False` when absent.

The sample type is not validated by this command (it only decides which
columns are required), so the docs no longer claim exit 4 for an unknown
type — that is validated at `upload-batch`.

Also adopts argparse `type=Path` for the CLI's path options (data upload
PATH, samples upload --reads1/--reads2, batch-template --output) so
handlers receive real Path objects. Existence validation stays in the
handler rather than argparse `type=`, because parse-time errors bypass the
Output renderer and would break the --json structured-error contract.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mhusbynflow mhusbynflow force-pushed the mhusbynflow/samples-comment-review branch from c13f33d to b4f466d Compare June 9, 2026 15:35
mhusbynflow and others added 2 commits June 9, 2026 16:40
Extends the path-typing convention to the annotation-template and
upload-multiplexed commands so every path option in the CLI is parsed
straight to a pathlib.Path: annotation-template --output and
upload-multiplexed --reads1/--reads2/--annotation. The handlers drop the
now-redundant Path(...) reconstruction. Behaviour-preserving — existence
validation stays in the handlers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adversarial review caught that `batch-template -o` to an unwritable path
raised an uncaught OSError out of main() as a raw traceback, while the
sibling `annotation-template` mapped the same case to exit 2. Wrap the CSV
write in the same try/except OSError -> CliUsageError guard so an
unwritable destination is a clean usage error (exit 2), and add the
matching negative-path test.

Also tighten `_TemplateColumn.kind` to a Literal of the closed set
(reserved/metadata/annotation) it echoes into the JSON descriptor.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mhusbynflow

Copy link
Copy Markdown
Collaborator Author

@claude please review this PR

@claude

claude Bot commented Jun 10, 2026

Copy link
Copy Markdown

Claude finished @mhusbynflow's task in 2m 16s —— View job


Review: samples batch-template (PR #14)

  • Read changed files & gather context
  • Review library change (MetadataAttribute.allow_annotation)
  • Review CLI command (_samples.py)
  • Review tests
  • Check docs & spec consistency

This is a re-review against the current branch. Every substantive point from the earlier review has been addressed — nice follow-through. The PR is in good shape; I'd merge it with at most one optional defensive tweak.


✅ Resolved since the last review

  • The --sample-type / exit-code mismatch (was 🟠). docs/cli.md no longer overstates the guarantee — the exit-code table drops the 4 unknown sample type line, and the prose now reads plainly: "The sample type is not checked against the server here, so an unknown type still exits 0; the type is validated when you run samples upload-batch." (docs/cli.md:259-262, _samples.py:234-237). The doc now matches the implementation. 👍
  • --output is a Path (was 🟡). Both batch-template and the US5 commands register type=Path (_samples.py:178-183), and _data.py got the same treatment — the Path(args.path) reconstruction is gone. Consistent with the CLAUDE.md convention.
  • Unwritable output path (new guard). args.output.write_text(...) is wrapped and re-raised as CliUsageError (_samples.py:317-323), with a test asserting exit 2 and no traceback leak (test_unwritable_output_path_is_usage_error).
  • --json --output FILE contract (was 🔵). Now exercised by test_json_with_output_writes_csv_file_and_emits_descriptors — confirms the file is written and the descriptor list still goes to stdout. The previously-untested branch is now pinned.

🟡 Remaining (optional, non-blocking): header still uses ",".join(...)

The CSV header is built with ",".join(column.name for column in columns) (_samples.py:316). Two theoretical edge cases survive:

  • A metadata identifier containing a comma/quote would corrupt the header (csv.writer would quote it for free).
  • An identifier colliding with a reserved name (name, reads1, …) or with a generated <id>__annotation would silently emit a duplicate column.

Both depend on server-minted identifiers being well-behaved slugs, so the real-world risk is low — this is a defensive nicety, not a blocker. I'd leave it as-is unless you want belt-and-braces robustness. Harden this →


Positives

  • allow_annotation is genuinely additive — default=False plus the dedicated "populated from payload" / "defaults to false when absent" tests give clean backwards-compat coverage (test_samples.py:204-250).
  • The allows_annotationallow_annotation rename is consistent across the contract, data-model, and code.
  • _TemplateColumn + descriptor cleanly drives both human (header) and --json (descriptors) modes from one emit_result, and the test matrix covers header order, the annotation companion, the required summary, -o, --json, --json -o, the unwritable path, and missing --sample-type.

Nothing here blocks merge. I read all nine changed files but did not run the suite locally (the pytest invocation needs approval in this environment); the PR reports 230 passed and the new tests are internally consistent with the implementation.

· branch mhusbynflow/samples-comment-review
·

@mhusbynflow mhusbynflow left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments

Comment thread docs/cli.md Outdated
Comment thread flowbio/cli/_samples.py Outdated
Comment thread flowbio/cli/_samples.py Outdated
mhusbynflow and others added 4 commits June 10, 2026 17:15
Build the reserved sample-sheet columns as _TemplateColumn instances up
front instead of (name, required, description) tuples converted on use, so
_template_columns just copies them rather than re-mapping. No behaviour
change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
An unrecognised --sample-type previously produced a valid-looking template
at exit 0, with nothing flagged required-for-type — a quiet trap when an
agent builds a sheet for a mistyped type. Validate the type against
client.samples.get_types() up front and, on a miss, fail with a usage
error (exit 2) listing the available identifiers. This shares the same
identifier space as the required-column derivation (sample_type_links), so
a type that validates is also one the derivation understands.

Docs and the batch-template contract are updated to document the
validation and drop the earlier "not checked here" wording.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the bare `str` used for sample-type identifiers with a
`SampleTypeId = NewType("SampleTypeId", str)`, per the repo convention of
naming primitive identifiers. Applied across the v2 public surface
(`SampleType.identifier`, `MetadataAttribute.required_for_sample_types`,
`upload_sample`, `get_annotation_template`, `_build_sample_fields`) and the
CLI handlers, which wrap the argparse string at the boundary as they
already do for `Token`/`BaseUrl`. Exported from `flowbio.v2`.

Named `SampleTypeId` (not `SampleType`) to avoid colliding with the
existing `SampleType` model. Behaviour-preserving — `NewType` is the
underlying `str` at runtime.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tion

The module docstring claimed --sample-type is always sent as-is and not
pre-checked by the CLI, which stopped being true once batch-template gained
up-front type validation. Note batch-template as the local-validation
exception.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread flowbio/cli/_samples.py Outdated
"""
destination = args.output
template = client.samples.get_annotation_template(args.sample_type)
template = client.samples.get_annotation_template(SampleTypeId(args.sample_type))

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of casting args.sample_type, could we not instead add SampleTypeId as the type to add_argument?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — set type=SampleTypeId on the three --sample-type arguments (upload, annotation-template, batch-template) and dropped the SampleTypeId(args.sample_type) casts from the handlers, mirroring the type=Path treatment of path options. argparse applies the type to the annotation-template 'generic' string default too, so it stays consistently typed. Behaviour-preserving; 251 tests pass.

Set type=SampleTypeId on the --sample-type arguments (upload,
annotation-template, batch-template) so argparse produces the named type
directly, mirroring the type=Path treatment of path options. Drops the
three SampleTypeId(args.sample_type) casts in the handlers. argparse
applies the type to the annotation-template "generic" default too, so it is
typed consistently. Behaviour-preserving.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mhusbynflow mhusbynflow merged commit d0c2f83 into master Jun 10, 2026
5 checks passed
@mhusbynflow mhusbynflow deleted the mhusbynflow/samples-comment-review branch June 10, 2026 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant