
feat: add move_project_data management command #1192

Open

mihow wants to merge 19 commits into main from move-project-data

Conversation


mihow commented Mar 25, 2026

Summary

Management command to move deployments and all associated data from one project to another. Motivated by a request to move northern Quebec stations (Umiujaq, Kuujjuaq, Kangiqsujuaq, Inukjuak) from "Insectarium de Montreal" into a new "Nunavik" project.

This is a comprehensive data transfer that handles 16 categories of related data:

  • Deployments, Events, SourceImages, Occurrences, Jobs (direct project FK updates)
  • Detections, Classifications, Identifications (indirect, follow via FK chains)
  • Shared resources: S3StorageSources, Devices, Sites (clone if shared, reassign if exclusive)
  • Mixed SourceImageCollections (split into source/target copies)
  • Pipeline configs and ProcessingService links
  • Taxa M2M links for occurrence determinations
  • Identifier users added to target project with their existing role preserved
  • TaxaLists linked to target project
  • Default filter config copied (score threshold, include/exclude taxa, default pipeline)
  • Cached field recalculation on deployments, events, and both projects
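The direct-vs-indirect split in the list above can be illustrated with a toy, non-Django simulation (hypothetical flat records, not the actual command code): records carrying their own project FK are rewritten, while detection-level records inherit the move through their parent's FK chain and need no update of their own.

```python
# Toy simulation of the move; record shapes are hypothetical stand-ins for the models above.
def move_deployment_data(records, deployment_ids, target_project):
    """Records with a direct project FK (Deployment/Event/SourceImage/Occurrence/Job)
    are updated in place; records without one (Detection/Classification/Identification)
    follow their parent via the FK chain, so nothing on them needs rewriting."""
    moved = {"direct": 0, "indirect": 0}
    for rec in records:
        if rec.get("deployment_id") not in deployment_ids:
            continue
        if "project_id" in rec:
            rec["project_id"] = target_project
            moved["direct"] += 1
        else:
            moved["indirect"] += 1  # resolves through its parent; no project update
    return moved
```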

Usage

# Dry run (default) — shows full before/after analysis
python manage.py move_project_data --source-project 23 --create-project "Nunavik" --deployment-ids 84,163,284

# Execute
python manage.py move_project_data --source-project 23 --create-project "Nunavik" --deployment-ids 84,163,284 --execute

Features

  • Dry run by default with full scope analysis before any changes
  • Conditional scope warning showing data weight (lightweight for unprocessed images, detailed for processed data with identifications)
  • Per-deployment before/after snapshots with row counts for all model types
  • Conservation checks verifying source + target = original totals
  • Validation suite covering FK integrity, indirect access consistency, leak detection, and collection integrity
  • Atomic transaction — rolls back cleanly on any failure (validation runs inside the transaction)
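The conservation check amounts to verifying, per model type, that the post-move source and target counts sum to the pre-move total. A minimal sketch with hypothetical count dictionaries (not the command's real validation code):

```python
def conservation_ok(before, source_after, target_after):
    """Per model type, source-after + target-after must equal the pre-move count."""
    return all(
        source_after.get(model, 0) + target_after.get(model, 0) == count
        for model, count in before.items()
    )
```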

Fixes from review

  • Mutual exclusion check for --target-project / --create-project
  • Pipeline config clone preserves enabled and config fields
  • Collection clone preserves method and kwargs fields
  • Validation queries scoped to moved deployments (not entire target project)
  • Validation failure now raises CommandError (rolls back transaction)
  • Project creation uses Project.objects.create(create_defaults=True) via ProjectManager
  • Bulk taxa linking via target_project.taxa.add(*taxa_ids)
  • Removed unused Taxon import and f-prefix on strings without placeholders
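The mutual-exclusion fix in the first bullet can be sketched with a plain argparse mutually exclusive group (a hypothetical recreation; the real command wires its parser through Django's BaseCommand and raises CommandError rather than exiting):

```python
import argparse

# Hypothetical parser mirroring the two target-selection flags.
parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument("--target-project", type=int)
group.add_argument("--create-project", type=str)

# Exactly one of the two flags is accepted.
args = parser.parse_args(["--create-project", "Nunavik"])
```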

Takeaway-review fixes (2026-05-01)

  • Job→Collection cross-link fixed. When a SourceImageCollection was mixed (referenced by both moved and staying deployments), the script cloned it to the target project but moved Jobs still pointed at the source collection. Now: collection_clone_map is built during the split step and moved Jobs are remapped to the cloned target collection. With --no-clone-collections, the FK is nulled instead so no cross-link survives.
  • Validation now fails+rolls back if any moved Job ends up cross-linked to a SourceImageCollection on a non-target project.
  • session_time_gap_seconds is now copied to the target project (was the one ProjectSettingsMixin field missing from the default-filter copy block).
  • --fire-post-save flag added. Off by default. Bulk .update() bypasses post_save on Event/SourceImage/Occurrence/Job; no listeners exist on those models today, but the flag is the opt-in path if any are added later. Calls full Deployment.save() post-commit (also queues Celery event regrouping).
  • Operator notes printed after MOVE COMPLETE: cache-lag warning, signal status, DataExport residue count, source-side residue (cloned shared resources, multi-project TaxaLists, taxa M2M).
  • clone_or_reassign helper collapses 3× S3/Device/Site loops.
  • Step numbers replaced with semantic labels ([create-target], [move-deployment], ...) — old [N/12] and [N/16] ranges were both used and 5 different ops shared [10/12].
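The Job→Collection remap in the first bullet can be sketched as follows (dict-shaped jobs as a hypothetical stand-in, not the command's real code): mixed collections get an entry in collection_clone_map, and moved Jobs pointing at them are redirected to the clone, or nulled under --no-clone-collections.

```python
def remap_job_collections(jobs, collection_clone_map, clone_collections=True):
    """Redirect moved Jobs away from source-side mixed collections.

    jobs: hypothetical dicts with a 'collection_id' key.
    collection_clone_map: {mixed source collection id: cloned target collection id}.
    """
    for job in jobs:
        src_id = job.get("collection_id")
        if src_id in collection_clone_map:
            # Point at the target-side clone, or null the FK so no cross-link survives.
            job["collection_id"] = collection_clone_map[src_id] if clone_collections else None
    return jobs
```

Exclusive collections (reassigned wholesale) never appear in the map, so their Jobs are left untouched.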

Testing & validation

Automated tests (41 tests)

Test file: ami/main/tests/test_move_project_data.py

docker compose -f docker-compose.ci.yml run --rm django python manage.py test ami.main.tests.test_move_project_data --keepdb -v2
| Test class | Count | Coverage |
| --- | --- | --- |
| TestMoveToExistingProject | 4 | Basic move, taxa linking, conservation counts, jobs moved |
| TestDryRun | 1 | No DB mutations without --execute |
| TestCreateProject | 2 | --create-project flag, member copying |
| TestErrorHandling | 8 | All input validation error paths (incl. source==target, duplicate deployment ids) |
| TestSharedResourceCloning | 7 | Device/Site/S3 clone-vs-reassign, external device unchanged |
| TestCollectionHandling | 5 | Exclusive reassign, mixed split (preserves method/kwargs), --no-clone-collections, mixed-split remaps moved Jobs, --no-clone-collections nulls moved Job collection FK |
| TestPipelineConfigCloning | 3 | Config clone with fields, existing not duplicated, --no-clone-pipelines |
| TestProcessingServiceLinking | 2 | Services linked via raw SQL, no duplicates |
| TestIdentifierRolePreservation | 2 | Identifiers added to target, already-member not duplicated |
| TestTaxaListLinking | 1 | TaxaLists linked to target project |
| TestDefaultFilterCopying | 2 | Score threshold, include/exclude taxa copied |
| TestEdgeCases | 4 | Empty deployment, move all, multiple deployments, pre-existing taxa |

Manual validation (pre-production)

Before running on production data, validate on a staging DB or local backup:

  1. Dry run first — review the full output, check per-deployment counts match expectations

    python manage.py move_project_data --source-project 23 --create-project "Nunavik" --deployment-ids 84,163,284
  2. Execute — review the validation output at the end

    python manage.py move_project_data --source-project 23 --create-project "Nunavik" --deployment-ids 84,163,284 --execute
  3. Post-move spot checks in Django admin or shell:

    • Source project's deployment list no longer shows moved deployments
    • Target project's deployments, events, and occurrences are browsable
    • Images load correctly in the UI (S3 paths are unchanged)
    • Processing pipelines are available on the target project
    • Occurrence filters and taxa lists work on the target
    • Identifier users can log in and see their identifications in the target project
  4. If validation fails, the transaction is rolled back automatically and a CommandError is raised with details. No manual cleanup needed.
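Step 4 relies on Django's transaction.atomic semantics: any exception raised inside the block rolls every write back. A toy stand-in illustrating that guarantee (not Django code; the dict plays the database):

```python
class FakeAtomic:
    """Toy stand-in for transaction.atomic: snapshots state on entry and
    restores it if the block raises, mimicking a rollback."""
    def __init__(self, state):
        self.state = state
    def __enter__(self):
        self._snapshot = dict(self.state)
        return self.state
    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            self.state.clear()
            self.state.update(self._snapshot)  # rollback on failure
        return False                           # re-raise, like CommandError

db = {"deployment_84": "source-project"}
try:
    with FakeAtomic(db):
        db["deployment_84"] = "target-project"   # the move
        raise RuntimeError("validation failed")  # validation runs inside the block
except RuntimeError:
    pass
```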

Test plan

  • 41 automated tests passing
  • Tested locally on BCI project (2 deployments, 73k images, 57k occurrences, 81k classifications, 751 identifications)
  • All per-deployment row counts preserved after move
  • Conservation checks passed (source + target = original)
  • FK integrity verified (all moved data points to target project)
  • Indirect access verified (detections, classifications, identifications resolve correctly via project)
  • No data leaked to source project
  • Mixed SourceImageCollections split correctly
  • Shared devices cloned, NULL-project devices left as-is
  • Test on staging with real Nunavik data before production use

Summary by CodeRabbit

  • New Features

    • New command to move one or more deployments between projects with optional atomic execution
    • Dry-run preview mode and option to create the target project
    • Options for cloning or reassigning shared resources, splitting mixed image collections, copying pipeline configs, linking processing services, and preserving identifier roles and default filters
    • Comprehensive in-transaction validation and post-move recalculation
  • Documentation

    • Added deployment reassignment guide and project portability specification
  • Tests

    • Extensive end-to-end test coverage for move scenarios and edge cases

mihow and others added 6 commits March 25, 2026 15:18
Management command to move deployments and all related data between
projects. Handles all direct and indirect relationships including
Events, SourceImages, Occurrences, Jobs, Detections, Classifications,
Identifications, SourceImageCollections (with mixed-collection splitting),
pipeline configs, processing services, and taxa M2M links.

Features:
- Dry-run mode by default, --execute to commit
- Per-deployment before/after snapshots with row counts
- Conservation checks (source + target = original)
- FK integrity and indirect access validation
- Shared resource handling (clone vs reassign devices, sites, S3 sources)
- Raw SQL for ProcessingService M2M to avoid ORM column mismatch

Co-Authored-By: Claude <noreply@anthropic.com>
Documents the full relationship map, edge cases, and validation
checklist for moving deployments between projects.

Co-Authored-By: Claude <noreply@anthropic.com>
The reassign_deployments command now also recalculates:
- Event cached counts (captures_count, detections_count, occurrences_count)
- Both source and target project related calculated fields

Co-Authored-By: Claude <noreply@anthropic.com>
- Renamed from reassign_deployments to move_project_data to better
  communicate the scope of the operation (all occurrences, identifications,
  classifications, etc. — not just deployment records)
- Added conditional scope warning that shows full data weight when
  processed data exists, or a simple note for unprocessed transfers
- Added identifier membership: users who made identifications on moved
  data are auto-added to the target project with Identifier role
- Added default filter config copy (score threshold, include/exclude
  taxa, default processing pipeline)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
- Identifier users now get their existing source project role preserved
  (e.g. ProjectManager stays ProjectManager), falling back to Identifier
  role only for users who aren't source project members
- TaxaLists linked to source project are now also linked to target
  project (M2M add, not move — TaxaLists can be shared)
- Dry-run output shows TaxaLists and role assignments that will be made

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 25, 2026 23:49

netlify Bot commented Mar 25, 2026

Deploy Preview for antenna-ssec canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | d90d7f7 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/antenna-ssec/deploys/69f4c26a3bc9d10008bf2341 |


netlify Bot commented Mar 25, 2026

Deploy Preview for antenna-preview canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | d90d7f7 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/antenna-preview/deploys/69f4c26a6f9a120008a83b73 |


coderabbitai Bot commented Mar 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c3df104a-0484-449c-b4e6-4fd8ad73ff50

📥 Commits

Reviewing files that changed from the base of the PR and between ae41cb2 and d90d7f7.

📒 Files selected for processing (4)
  • ami/main/management/commands/move_project_data.py
  • ami/main/tests/test_main.py
  • ami/main/tests/test_move_project_data.py
  • docs/claude/planning/deployment-reassignment-guide.md
✅ Files skipped from review due to trivial changes (1)
  • docs/claude/planning/deployment-reassignment-guide.md

📝 Walkthrough

Walkthrough

Adds a new Django management command move_project_data to atomically move selected Deployments and their related DB records between Projects (with dry-run option), plus comprehensive tests and two documentation drafts: a deployment reassignment guide and a project portability specification.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Deployment Reassignment Command: ami/main/management/commands/move_project_data.py | New, large management command implementing CLI parsing, pre-move snapshots, in-transaction move/create target, shared resource clone-or-reassign logic, collection split/clone handling, pipeline/taxa/identifier linkage, extensive in-transaction validations, commit/after-commit recalculation, and reporting. |
| Tests for Move Command: ami/main/tests/test_move_project_data.py | New end-to-end test suite exercising dry-run and execute flows, error cases, create-target behavior, resource cloning semantics (S3/device/site), collection splitting vs. removal, pipeline/processing-service linking, taxa/identifier linking, default-filter copying, and many edge cases. |
| Deployment Reassignment Documentation: docs/claude/planning/deployment-reassignment-guide.md | New guide documenting models affected, shared-resource behaviors, collection-splitting strategy, validation checklist, and operational notes for reassigning deployments without moving physical S3 data. |
| Project Portability Specification: docs/claude/planning/project-portability-spec.md | New draft spec describing UUID-based natural keys, optional Organization namespace, export/import command schemas and ordering, FK/UUID mapping rules, conflict modes, and Darwin Core/export integration considerations. |

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI
    participant CMD as move_project_data\n(Command)
    participant DB as Database
    participant S3 as S3 (objects)
    participant PROC as ProcessingServices

    CLI->>CMD: invoke (source, deployments, target/create, flags)
    CMD->>DB: validate source, deployments, collect snapshots
    CMD->>DB: begin transaction
    CMD->>DB: create or select target project
    CMD->>DB: clone/reassign shared resources (device, site, storage)
    CMD->>DB: update deployments.project_id and related FKs (events, images, occurrences, jobs)
    CMD->>DB: split/clone or reassign source image collections as needed
    CMD->>DB: clone pipeline configs & link processing services (raw SQL)
    CMD->>DB: link taxa, taxa lists, and identifier memberships to target
    CMD->>DB: run in-transaction validations (counts, FK integrity, residue checks)
    CMD->>DB: commit transaction
    DB-->>CMD: transaction committed
    CMD->>DB: recompute cached fields and post-commit validations
    CMD->>CLI: output summary, diffs, and operator notes

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Hopping rows from one project to the next,
I count each event and mend what’s perplexed,
I split collections and stitch every link,
With careful hops and an atomic wink —
A rabbit’s nod to data moved with respect.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 61.04%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding a Django management command called move_project_data that enables moving deployments between projects. |
| Description check | ✅ Passed | The PR description is comprehensive and follows the template structure with Summary, List of Changes, Related Issues (section present but no specific issue referenced), Detailed Description, How to Test, and Checklist sections all completed. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



Copilot AI left a comment


Pull request overview

Adds an operator-facing management command to move one or more deployments (and their dependent data) from a source project to a target/new project, plus a written guide documenting the relationship map and validation steps for doing so safely.

Changes:

  • Added move_project_data Django management command implementing deployment reassignment with dry-run analysis, execution, and post-move validation.
  • Added planning/ops documentation describing the full relationship map, edge cases (shared resources, mixed collections), and a validation checklist.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| docs/claude/planning/deployment-reassignment-guide.md | New guide documenting deployment reassignment relationships, strategies, and validation checklist. |
| ami/main/management/commands/move_project_data.py | New management command to analyze and execute moving deployments and related data between projects. |


mihow and others added 2 commits March 26, 2026 12:21
Design spec for making projects portable between Antenna instances.
Covers UUID fields, Organization model, natural keys, export/import
commands, and Darwin Core integration. Includes 21 research areas
spanning internal data validation, biodiversity standards (GBIF,
iNaturalist, BOLD, Camtrap DP), and patterns from non-biodiversity
applications (GitLab, WordPress, Notion, Jira).

Status: draft — pending research and validation.

Co-Authored-By: Claude <noreply@anthropic.com>
Documents concrete use cases for each model's UUID beyond export/import:
Darwin Core field mappings (occurrenceID, eventID, etc.), ML pipeline
coordination (Pipeline/Algorithm slug collision risk), device tracking
across institutions, and scientific reproducibility requirements.

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ami/main/management/commands/move_project_data.py`:
- Line 344: The code is logging user.email via self.log(...) and sending it to
logger.info, which exposes PII; update the occurrences that reference user.email
(the self.log(...) call and the logger.info usage) to avoid storing emails by
either logging a non-PII identifier (e.g., user.id or user.pk), a masked email
(e.g., mask local part), or a stable hash of the email, while preserving
role_to_assign.display_name and role_source in the message; ensure both the
self.log(...) invocation and the corresponding logger.info(...) call are changed
consistently to use the non-PII value and not user.email.
- Line 333: These lines use f-strings with no interpolation (triggering F541)
and likely loop-captured variables in inner scopes (B007); replace f-string
literals like self.log(f"\n  Identifiers (users with identifications on moved
data):") with plain string literals (self.log("\n  Identifiers (users with
identifications on moved data):")) and similarly update the other occurrences at
the reported locations (lines showing self.log(f"...")); search for all
self.log(f"...") instances in move_project_data.py and convert those with no
placeholders to normal strings, and where B007 arises, ensure any loop variables
used inside nested functions/comprehensions are passed as defaults or copied
into a new local variable before inner usage.
- Around line 511-515: The cloned SourceImageCollection created in
SourceImageCollection.objects.create currently only copies name, project_id and
description which drops sampling metadata; update the creation to also copy the
sampling fields from the original collection by including method and kwargs
(e.g., set method=coll.method and kwargs=coll.kwargs) so the new_coll preserves
the original collection semantics when moved.
- Around line 175-187: The command currently prefers create_project when both
--create-project and --target-project are passed; update the options handling in
move_project_data (around create_project_name, target_project logic) to detect
when both options are provided and raise a CommandError (e.g., "Specify only one
of --target-project or --create-project") instead of silently preferring
create_project; keep the existing branches for the single-option cases
(resolving target_project via Project.objects.get and logging, or setting
target_project=None and create_project_name) unchanged.
- Around line 695-723: The validation wrongly compares all records in the target
project to only the moved deployments; update the *_via_project queries
(dets_via_project, cls_via_project, idents_via_project) to also filter by
deployment_ids so they count only records for the moved deployments in the
target project (e.g. add source_image__deployment_id__in=deployment_ids to
Detection/Classifications filters and
occurrence__deployment_id__in=deployment_ids to Identification filters while
keeping source_image__project_id=target_id / occurrence__project_id=target_id).

In `@docs/claude/planning/deployment-reassignment-guide.md`:
- Around line 36-37: Update the wording so it matches the actual command
behavior: change the Device and Site guidance that currently suggests setting
their FK to NULL to instead state that the command clones or reassigns Device
and Site records (do not describe NULL as an option), and update the
project-creation description to indicate the command only sets the target
project's name and owner (omit mention of description). Ensure the edits touch
the same Device/Site lines and the project creation sentences referenced (the
blocks around the Device/Site rows and the project creation paragraph) so the
doc reflects cloned/reassigned ownership and limited project fields.

📥 Commits

Reviewing files that changed from the base of the PR and between 5af9ce7 and b859a99.

📒 Files selected for processing (3)
  • ami/main/management/commands/move_project_data.py
  • docs/claude/planning/deployment-reassignment-guide.md
  • docs/claude/planning/project-portability-spec.md

mihow and others added 2 commits March 31, 2026 19:18
Fixes from Copilot and CodeRabbit review:
- Mutual exclusion check for --target-project / --create-project
- Pipeline config clone preserves enabled and config fields
- Collection clone preserves method and kwargs fields
- Validation queries scoped to moved deployments (not entire target)
- Validation failure raises CommandError (triggers transaction rollback)
- Project creation uses ProjectManager (create_defaults=True)
- Bulk taxa linking via target_project.taxa.add(*taxa_ids)
- Remove unused Taxon import and f-prefix on plain strings

Add 37 automated tests covering:
- Basic move, dry run, --create-project
- All 6 error handling paths
- Shared resource clone-vs-reassign (Device, Site, S3StorageSource)
- Collection split/reassign logic
- Pipeline config cloning
- ProcessingService linking
- Identifier role preservation
- TaxaList linking, default filter copying
- Edge cases (empty deployment, move all, multiple deployments)

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (1)
ami/main/management/commands/move_project_data.py (1)

346-346: ⚠️ Potential issue | 🟠 Major

Avoid logging identifier email addresses.

These two lines still emit user.email to both command output and logger.info, which creates unnecessary PII retention for an admin migration. Log user.pk or a masked address instead.

Also applies to: 576-576

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/main/management/commands/move_project_data.py` at line 346, The log
currently prints full PII via user.email in the self.log call; change it to use
a non-PII identifier (e.g., user.pk) or a masked email instead of user.email.
Update the occurrences where user.email is used in move_project_data.py (the
self.log call that formats f"{user.email}: {role_to_assign.display_name}
({role_source})" and the similar occurrence around line 576) to log user.pk or a
masked value while preserving role_to_assign.display_name and role_source.
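One way to satisfy the masked-email option mentioned above (a sketch of one possible helper; the PR ultimately switched to logging user IDs instead):

```python
def mask_email(email: str) -> str:
    """Keep only the first character of the local part for log output,
    e.g. 'mihow@example.org' -> 'm***@example.org'."""
    local, sep, domain = email.partition("@")
    if not sep:
        return "***"  # not an email; emit nothing identifying
    return (local[:1] or "*") + "***@" + domain
```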
🧹 Nitpick comments (1)
ami/main/tests/test_move_project_data.py (1)

325-362: Replace msg= parameter with assertRaisesRegex to actually verify exception messages.

The msg= parameter in assertRaises is only used when the assertion fails (i.e., no exception is raised). It does not verify the exception message contents. Use assertRaisesRegex instead to match both the exception type and message text.

Example fix
-        with self.assertRaises(CommandError, msg="does not exist"):
+        with self.assertRaisesRegex(CommandError, "does not exist"):
             _run_command(*self._base_args(), "--target-project", "99999")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/main/tests/test_move_project_data.py` around lines 325 - 362, Replace
uses of self.assertRaises(..., msg="...") with
self.assertRaisesRegex(CommandError, r"pattern") in the tests listed
(test_source_project_not_found, test_target_project_not_found,
test_deployment_not_found, test_deployment_wrong_project,
test_both_target_and_create, test_neither_target_nor_create) so the exception
message is actually verified; for example call
self.assertRaisesRegex(CommandError, r"Source project 99999 does not exist")
around the _run_command in test_source_project_not_found and use appropriate
regexes like r"does not exist", r"not found", r"not in source project", r"not
both", and r"Must specify" in the corresponding tests when invoking
_run_command.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ami/main/management/commands/move_project_data.py`:
- Around line 593-603: The current copy step merges taxa and only conditionally
copies the pipeline, allowing stale values to persist; instead clear and replace
the target's defaults: call clear() on
target_project.default_filters_include_taxa and
target_project.default_filters_exclude_taxa, then add all taxa from
source_project.default_filters_include_taxa and
source_project.default_filters_exclude_taxa; always assign
target_project.default_filters_score_threshold =
source_project.default_filters_score_threshold and unconditionally set
target_project.default_processing_pipeline =
source_project.default_processing_pipeline (even if None) before saving the
project in move_project_data.
- Around line 143-166: The deployment ID parsing is brittle: parse
options["deployment_ids"] into a cleaned, deduplicated list (e.g., split on ',',
strip tokens, ignore empty tokens, attempt int() inside a try/except to raise
CommandError on non-integer values) and assign to a new variable like
deployment_ids_normalized (or replace deployment_ids); then use
Deployment.objects.filter(pk__in=deployment_ids_normalized) and compare
deployments.count() against len(deployment_ids_normalized) (not the raw parsed
list) so duplicates/trailing commas don't cause false failures; ensure any
ValueError from int() is converted into a CommandError with a clear message.
- Around line 173-189: After resolving target_project (after the block that sets
target_project from options or when create_project_name is None), add a
validation that rejects using the same project as both source and target: check
options.get("source_project") (or the earlier source_project variable if
present) against the resolved target_project.pk and raise CommandError("Target
project must be different from source project") if they match; ensure this check
runs only when target_project is not None (i.e., when --target-project was
provided).
- Line 411: The validation that raises CommandError must run inside the same
transaction.atomic() so failures roll back; move the validation logic currently
after the transaction.atomic() block (the block starting at the
transaction.atomic() call and ending at that block's closing) into the atomic
block before it exits—specifically relocate the validation checks (the section
that raises CommandError) into the transaction.atomic() scope in
move_project_data.py so any CommandError thrown triggers a rollback; ensure all
subsequent writes and the commit point occur only after validation completes
successfully.

---

Duplicate comments:
In `@ami/main/management/commands/move_project_data.py`:
- Line 346: The log currently prints full PII via user.email in the self.log
call; change it to use a non-PII identifier (e.g., user.pk) or a masked email
instead of user.email. Update the occurrences where user.email is used in
move_project_data.py (the self.log call that formats f"{user.email}:
{role_to_assign.display_name} ({role_source})" and the similar occurrence around
line 576) to log user.pk or a masked value while preserving
role_to_assign.display_name and role_source.

---

Nitpick comments:
In `@ami/main/tests/test_move_project_data.py`:
- Around lines 325-362: Replace uses of self.assertRaises(..., msg="...") with
self.assertRaisesRegex(CommandError, r"pattern") in the listed tests
(test_source_project_not_found, test_target_project_not_found,
test_deployment_not_found, test_deployment_wrong_project,
test_both_target_and_create, test_neither_target_nor_create) so the exception
message is actually verified. For example, wrap the _run_command call in
test_source_project_not_found in self.assertRaisesRegex(CommandError,
r"Source project 99999 does not exist"), and use regexes such as r"does not
exist", r"not found", r"not in source project", r"not both", and r"Must
specify" in the corresponding tests.
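A runnable sketch of the pattern being requested, with a stubbed CommandError and a hypothetical run_command standing in for the real management-command invocation (the actual tests call _run_command against Django):

```python
import unittest


class CommandError(Exception):
    """Stand-in for django.core.management.CommandError."""


def run_command(source_project):
    # Hypothetical stand-in for calling the management command.
    if source_project == 99999:
        raise CommandError("Source project 99999 does not exist")


class MoveProjectDataTests(unittest.TestCase):
    def test_source_project_not_found(self):
        # assertRaises(..., msg="...") only customizes the *failure*
        # message; assertRaisesRegex actually checks the exception text.
        with self.assertRaisesRegex(CommandError, r"does not exist"):
            run_command(source_project=99999)


suite = unittest.TestLoader().loadTestsFromTestCase(MoveProjectDataTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The distinction is easy to miss: `msg=` on assertRaises never inspects the raised exception, so a command raising the wrong error message would still pass.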
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 21aec0e3-2104-4d9f-abb2-c75687561306

📥 Commits

Reviewing files that changed from the base of the PR and between b859a99 and ae41cb2.

📒 Files selected for processing (3)
  • ami/main/management/commands/move_project_data.py
  • ami/main/tests/__init__.py
  • ami/main/tests/test_move_project_data.py

mihow and others added 3 commits April 17, 2026 13:55
- Move all validation checks inside transaction.atomic() so failures
  trigger a full rollback instead of leaving partially-moved data
- Add source==target project guard (prevents self-move corruption)
- Deduplicate --deployment-ids at parse time
- Remove PII (email addresses) from log output, use user IDs instead
- Update deployment-reassignment-guide to match shared-vs-exclusive logic
- Add tests for source==target and duplicate deployment IDs (39 total)

Co-Authored-By: Claude <noreply@anthropic.com>
Resolves Django test loader collision with the new ami/main/tests/
package (added for test_move_project_data). Both can't coexist —
loader raises 'tests module incorrectly imported'.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow closed this Apr 29, 2026
@mihow mihow reopened this Apr 29, 2026
mihow and others added 6 commits April 29, 2026 13:38
…tep numbers

- Pull S3StorageSource/Device/Site clone-vs-reassign loop into a single
  module-level helper. Each call site collapses to ~5 lines.
- Replace numeric `[N/16]` step prefixes with semantic labels
  (`[create-target]`, `[move-deployment]`, etc). Numbering had drifted
  to two ranges and 5 different ops shared `[10/12]`, which made log
  bisection painful. Labels are stable when steps are added or removed.

Co-Authored-By: Claude <noreply@anthropic.com>
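The clone-vs-reassign decision that the commit above centralizes can be sketched as a pure function over deployment PK sets (names hypothetical; the real helper operates on Django querysets for S3StorageSource, Device, and Site):

```python
def clone_or_reassign(moved_ids: set, referencing_ids: set) -> str:
    """Decide how to carry a shared resource into the target project.

    moved_ids:       PKs of deployments being moved.
    referencing_ids: PKs of all deployments referencing this resource.

    - Exclusive to moved deployments -> reassign the resource outright.
    - Also referenced by staying deployments -> clone it into the
      target so the source project keeps its copy.
    """
    staying = referencing_ids - moved_ids
    return "reassign" if not staying else "clone"
```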
When a SourceImageCollection is mixed (referenced by both moved and staying
deployments), the script clones it into the target project. Previously the
Job's `source_image_collection_id` was untouched, so a moved Job was still
pointing at the source-project collection — re-running the Job would have
processed the wrong dataset.

- Track `collection_clone_map` (old_pk -> new_pk) during the split step.
- Remap moved Jobs to the cloned target collection PKs.
- For `--no-clone-collections`, null out the FK on moved Jobs instead of
  letting them cross-link (Job loses its rerun target, which is documented
  in the warn log).
- Surface DataExport residue in the log: moved Jobs whose `data_export.project_id`
  still points at source. Historical artifacts; not migrated.
- Validation: fail+rollback if any moved Job ends up cross-linked to a
  SourceImageCollection on a non-target project.
- Tests: assert remap on mixed-split, and that --no-clone nulls instead of
  leaving cross-link.

Co-Authored-By: Claude <noreply@anthropic.com>
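The Job remap described above can be modeled with plain dicts (a hypothetical sketch; the real code updates Job.source_image_collection_id on Django model instances):

```python
def remap_job_collections(jobs, moved_job_ids, mixed_collections, clone_map):
    """Point moved Jobs at cloned target collections.

    jobs:              {job_id: collection_id or None}
    mixed_collections: PKs of collections split between projects.
    clone_map:         {old_pk: new_pk} when collections are cloned, or
                       None under --no-clone-collections, in which case
                       the FK is nulled so a moved Job never cross-links
                       to a source-project collection.
    """
    for job_id in moved_job_ids:
        old = jobs.get(job_id)
        if old in mixed_collections:
            jobs[job_id] = clone_map[old] if clone_map else None
    return jobs
```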
ProjectSettingsMixin defines five settings fields. The default-filter copy
block was only carrying four (score threshold, include/exclude taxa,
default_processing_pipeline). `session_time_gap_seconds` controls Event
grouping for new captures and was silently inheriting the default (2h)
on a newly-created target instead of the source's value.

Co-Authored-By: Claude <noreply@anthropic.com>
…triggering

Bulk .update() bypasses post_save on Event/SourceImage/Occurrence/Job. The
script's compensation only covers cached-count fields. There are no
post_save listeners on those models in the codebase today, but if any are
added later (search index, audit log, cache warmer) they would silently
miss moved data.

`--fire-post-save` calls full Deployment.save() after commit, which fires
post_save on each moved deployment AND queues Celery event regrouping via
regroup_async=True. Off by default to keep big moves fast.

Co-Authored-By: Claude <noreply@anthropic.com>
Surface non-obvious post-move state directly in command output rather than
expecting operators to read PR descriptions or session notes:

- Cache lag: cachalot and UI may show stale counts briefly after commit
- Signal status: whether post_save fired (and what it implies)
- DataExport residue: count of moved Jobs whose data_export still points
  at another project (historical artifacts; intentionally left in place)
- Source-side residue: shared resources cloned, TaxaLists shared, taxa M2M
  copied — clarifies that the source project is not gutted of these

These are the things a human re-reading the log a week later would otherwise
have to dig out of code or docs.

Co-Authored-By: Claude <noreply@anthropic.com>