Skip to content

Allow users to customize the time gap between sessions#1292

Open
mihow wants to merge 4 commits intomainfrom
feat/configurable-session-gap
Open

Allow users to customize the time gap between sessions#1292
mihow wants to merge 4 commits intomainfrom
feat/configurable-session-gap

Conversation

@mihow
Copy link
Copy Markdown
Collaborator

@mihow mihow commented May 5, 2026

Closes #906.

Summary

The session_time_gap_seconds project setting was added in #918 but was never read by anything — group_images_into_events() still hardcoded a 120-minute gap, so changing the field had no effect.

This PR:

  • Reads deployment.project.session_time_gap_seconds inside group_images_into_events() when no explicit max_time_gap is passed. Falls back to 120 minutes if the deployment has no project, or if the configured value is None / 0 / negative (with a warning log). See ami/main/models.py:1395-1414.
  • Adds the field as a numeric input on the project's Processing settings page (ui/src/pages/project-details/processing-form.tsx) so admins can edit it without touching Django admin.
  • Adds a POST /api/v2/deployments/<pk>/regroup-sessions/ action that queues the existing ami.tasks.regroup_events Celery task. Lets admins re-trigger session grouping after changing the project setting (the existing auto-regroup only fires when new/ungrouped images are detected, so a setting change alone won't trigger it).
  • Adds a regroup_sessions_deployment custom Project permission, granted to the ProjectManager role, so non-superusers can hit the new endpoint. Mirrors how sync_deployment is wired.
  • Wraps ami.tasks.regroup_events in a per-deployment cache lock (cache.add(), 1 h TTL). Concurrent enqueues — double-clicks, Deployment.save() autoregroup racing with a manual admin trigger, etc. — collapse to a single run instead of overlapping on the same rows. Lower-level than the view, so all callers are protected (new API action, autoregroup, admin bulk action).
  • Adds five TestImageGrouping tests covering cross-midnight burst patterns, same-date merging behavior, the project-setting fallback, the non-positive clamp, and the lock idempotency.

One DB migration (0085_add_regroup_sessions_deployment_permission) registers the new custom permission and grants it to existing ProjectManager role groups via a data migration. The session_time_gap_seconds field itself already existed with default 7200 seconds (= 2 hours = 120 minutes), matching the previous hardcoded behavior.

Use case this fixes

A user reported that monitoring schedules with multi-burst nights (e.g. 2 hours on, 2 hours off, 2 hours on) sometimes group into one Event and sometimes into two, depending on whether the off-window crosses midnight. With this PR, raising the gap setting past the longest off-window collapses cross-midnight bursts into a single Event consistently.

The new test test_cross_midnight_bursts_split_by_short_gap documents this: at the default 2 h gap, two bursts spanning midnight produce 2 Events; at a 4 h gap the same captures produce 1 Event.

API usage

Request

curl -X POST \
  -H "Authorization: Token <your-token>" \
  -H "Content-Type: application/json" \
  https://your-antenna-host/api/v2/deployments/<deployment_pk>/regroup-sessions/

No request body required.

Response

202 Accepted:

{
  "task_id": "f3a1c2b8-9d4e-4f7a-bc01-2e3a4b5c6d7e",
  "deployment_id": 74,
  "project_id": 24
}

The response returns immediately; regrouping runs as a background Celery task. On the test deployment used here (~1.2 k captures), each task completed in 14–22 seconds. Track progress via Flower (http://<host>:5555) or the worker logs.

If a regroup is already in progress for the same deployment, subsequent calls still return 202 with a fresh task_id, but the task short-circuits at the worker (logged: regroup_events skipped for deployment <id>: another run is in progress.) without touching the DB.

Error responses

  • 404 Not Found — deployment pk does not exist or is not visible to the requesting user.
  • 401 Unauthorized / 403 Forbidden — token missing or user lacks the regroup_sessions_deployment permission on the deployment's project (granted to the ProjectManager role and superusers).

Caveats — what this PR does and doesn't change

The setting is now wired through, but Event keying still collapses some timestamp groups. Below are the scenarios actually tested and what they show.

Production data — 5-setting cycle on a real deployment

Cycled session_time_gap_seconds through five values via the new endpoint, on a deployment with ~1.2 k captures currently grouped into 69 Events at the default 2 h gap. Each task completed in 14–22 s.

Setting Observed events Change vs default
30 min 69 none
1 hour 69 none
2 hour (default) 69
4 hour 69 none
24 hour 60 merged 9 cross-midnight nights

Reading: tightening below ~24 h produced no observable Event-count change on this deployment. Loosening to 24 h merged 9 cross-midnight nights (−13 %).

Why tightening doesn't always change Event count

group_images_into_events() keys Events by group_by = start_date.date() and uses Event.objects.get_or_create(deployment, group_by) (ami/main/models.py:1455-1460). Two timestamp groups produced by a tighter gap that start on the same calendar date collide back into a single Event. So the gap setting only changes Event count when:

  1. The off-window between bursts crosses midnight, and
  2. The new gap value is longer than that off-window (so the bursts merge into one timestamp group whose start date stays on the earlier calendar day).

The existing comment at models.py:1516 already flags this: "#904 is expected to rework this reuse path more thoroughly."

Synthetic unit tests added in this PR

ami/main/tests.py::TestImageGrouping:

Test Pattern Setting Result
test_cross_midnight_bursts_split_by_short_gap Bursts at 22:00 day N and 01:00 day N+1, off-window 2 h 35 min 2 h gap 2 Events (split — user's pain)
↑ same ↑ same 4 h gap 1 Event (this PR's fix)
test_same_date_bursts_merge_regardless_of_gap Two bursts at 20:00 and 22:00 same calendar date, off-window 1 h 35 min 1 h gap 1 Event in DB (date collision masks gap — known limitation)
test_session_time_gap_seconds_is_used_when_no_explicit_gap Cross-midnight bursts project setting 7200 s 2 Events
↑ same ↑ same project setting 14400 s 1 Event
test_invalid_session_time_gap_falls_back_to_default Cross-midnight bursts project setting 0 / -1 / -7200 2 Events (default-gap split, not 12)
test_regroup_events_task_is_idempotent_under_concurrent_calls Lock pre-taken before second call Second call exits early; group_images_into_events not invoked

What this PR proves on its own

  • The setting is wired through (24 h → 60 events vs default 69 events confirms group_images_into_events reads it).
  • The new API action and Celery task path both succeed end-to-end.
  • Concurrent enqueues for the same deployment don't run overlapping regroups.
  • True per-session granularity below the day boundary is blocked on Fix incorrect session grouping  #904, which removes the group_by field and replaces date-keyed reuse with time-overlap matching. This PR is the prerequisite that lets Fix incorrect session grouping  #904's reuse-path rework actually act on a configurable threshold.

Related PRs

  • Fix incorrect session grouping  #904 — "Fix incorrect session grouping". Removes group_by field, adds use_existing flag, replaces unique constraint. Complementary to this PR: Fix incorrect session grouping  #904 makes the gap setting visibly meaningful below the day boundary.
  • A separate cleaner fix — keying Events by a noon-anchored "monitoring night" instead of calendar date — has been parked as a follow-up. See docs/claude/planning/session-grouping-strategies.md for the broader inventory.

Test plan

  • yarn build
  • docker compose build django ui
  • Pre-commit hooks
  • manage.py test ami.main.tests.TestImageGrouping — 11/11 pass (5 new + 6 existing)
  • manage.py test ami.main.tests.TestProjectPermissions ami.main.tests.TestRolePermissions — 13/13 pass (no regression from the new permission)
  • Functional verification end-to-end: 5 settings (30 min, 1 hr, 2 hr, 4 hr, 24 hr) cycled through POST /regroup-sessions/. Endpoint returns 202, tasks complete in 14–22 s, 24 hr setting confirmed to merge cross-midnight nights (see table above).

Notes / follow-ups

  • The numeric input submits a string ("7200") because <input type="number"> returns DOM strings. DRF IntegerField parses it transparently. Same pattern is used in 3 other forms in this codebase (capture-set, deployment-location, job-details). Standardizing the form helper to coerce numbers is a separate cleanup, not addressed here.
  • Field unit stays seconds (matches DB column name). The original issue requested minutes; if a rename is preferred, that's a separate PR with a migration.
  • The new regroup-sessions action could be surfaced as a row-action on the Stations list and as a button on the project settings screen — separate UI follow-up.
  • deployment_events_need_update() does not detect a setting change as a reason to regroup, so admins must hit the new API action (or the Django admin bulk action) after editing the gap. Auto-detection on setting change is a possible follow-up.
  • Wrapping regroup_events in the Job framework so admins see progress/logs in the Jobs UI — exploratory plan parked at docs/claude/planning/regroup-events-job-wrapper.md.

Co-author

Co-Authored-By: Claude noreply@anthropic.com

Summary by CodeRabbit

  • New Features

    • Regroup sessions action to reorganize deployment source images into sessions using a configurable session time gap
    • New project-level session time gap setting exposed in the processing form
  • Improvements

    • Regroup task made idempotent to prevent duplicate concurrent runs
  • Permissions

    • Added permission to allow managers to trigger regrouping
  • Tests

    • Added tests covering image grouping, cross-midnight and same-date behavior, defaults and idempotency

mihow and others added 3 commits May 4, 2026 13:16
The session_time_gap_seconds project setting was added in #918 but
never used. group_images_into_events() still hardcoded a 120-minute
gap. This wires the field through and exposes it in the Processing
settings UI so each project can override the session boundary.

Closes #906

Co-Authored-By: Claude <noreply@anthropic.com>
POST /api/v2/deployments/<pk>/regroup-sessions/ queues the existing
ami.tasks.regroup_events Celery task for the deployment. The task
calls group_images_into_events(), which now reads the project's
session_time_gap_seconds setting.

User-facing terminology is "sessions"; "Event" is retained as the
internal model name for backwards compatibility.

Returns 202 with task_id, deployment_id, project_id.

Refs #906

Co-Authored-By: Claude <noreply@anthropic.com>
Three new TestImageGrouping cases that exercise the new
session_time_gap_seconds wiring:

- test_cross_midnight_bursts_split_by_short_gap: documents the user-
  reported pain (2 bursts with off-window > default gap, crossing
  midnight, split into 2 Events on different dates) and proves that
  bumping max_time_gap past the off-window collapses them to 1 Event.
- test_same_date_bursts_merge_regardless_of_gap: documents the inverse
  behavior — bursts on the same calendar date collide on group_by
  regardless of gap setting. Acts as a regression target for any future
  rework of the date-keyed Event reuse path (per the #904 caveat at
  models.py:1516).
- test_session_time_gap_seconds_is_used_when_no_explicit_gap: verifies
  group_images_into_events falls back to the project setting when no
  max_time_gap is passed, including refreshing the cached relation
  after the setting changes.

Refs #906

Co-Authored-By: Claude <noreply@anthropic.com>
@netlify
Copy link
Copy Markdown

netlify Bot commented May 5, 2026

Deploy Preview for antenna-preview canceled.

Name Link
🔨 Latest commit 50d618e
🔍 Latest deploy log https://app.netlify.com/projects/antenna-preview/deploys/69fc2e4d35502600084962c5

@netlify
Copy link
Copy Markdown

netlify Bot commented May 5, 2026

Deploy Preview for antenna-ssec canceled.

Name Link
🔨 Latest commit 50d618e
🔍 Latest deploy log https://app.netlify.com/projects/antenna-ssec/deploys/69fc2e4efc7cb700083d7631

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 5, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR makes the session time gap threshold configurable per project, adds a regroup-sessions API endpoint to enqueue regrouping, updates grouping logic to use project settings with sensible fallbacks, adds a UI form field, creates a migration and permission, and adds tests for grouping and idempotency.

Changes

Session Grouping Configuration

Layer / File(s) Summary
Data Shape / Model
ami/main/models.py
group_images_into_events optionally accepts `max_time_gap: timedelta
Background Task / Locking
ami/tasks.py
regroup_events uses a per-deployment cache lock (REGROUP_EVENTS_LOCK_TTL) via cache.add() to avoid concurrent runs; early-returns when locked and ensures lock cleanup in a finally block.
Tests: Grouping + Idempotency
ami/main/tests.py
New _create_burst helper and tests for cross-midnight splitting, same-date merging, project-driven default gap, invalid setting fallback, and idempotency under concurrent calls.
API Integration
ami/main/api/views.py
Removed sync action; added regroup_sessions POST action on DeploymentViewSet that enqueues regroup_events and returns task/deployment/project IDs with HTTP 202.
Permissions / Migration
ami/main/migrations/0085_add_regroup_sessions_deployment_permission.py, ami/users/roles.py
Add regroup_sessions_deployment Project permission, migration to grant it to *_ProjectManager groups with rollback, and add the permission to ProjectManager.permissions.
UI Rendering
ui/src/pages/project-details/processing-form.tsx
Add sessionTimeGapSeconds to ProcessingFormValues, form config entry with label/description/validation, default from project.settings.sessionTimeGapSeconds, and render FormField in the processing form.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant API
  participant Celery as Worker
  participant Cache
  participant DB
  Client->>API: POST /deployments/{id}/regroup-sessions
  API->>Worker: enqueue regroup_events(deployment_id)
  API-->>Client: 202 Accepted + task_id
  Worker->>Cache: cache.add(regroup_lock_deployment_{id}, ttl)
  alt lock acquired
    Worker->>DB: load Deployment, call group_images_into_events(...)
    DB-->>Worker: grouped events result
    Worker->>Cache: cache.delete(regroup_lock_deployment_{id})
  else lock exists
    Worker-->>Worker: log and return early
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • RolnickLab/antenna#1268: Also modifies group_images_into_events in ami/main/models.py for grouping logic changes, with potential overlap in session grouping behavior.

Poem

🐰 I hopped through gaps of time and code,
Tucked sessions neatly in a row;
A lock to keep the workers civil,
A form to set how far they go—
Now projects choose the pause we show.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 70.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title directly captures the main PR objective of making the session time gap customizable per project.
Linked Issues check ✅ Passed The PR fully implements issue #906 requirements: moves max_time_gap into the configurable Project model, wires it into group_images_into_events(), provides admin UI access, and adds a mechanism to re-trigger grouping after setting changes.
Out of Scope Changes check ✅ Passed All changes are directly scoped to #906: the API endpoint, permission migration, UI field, task idempotency lock, and tests all support the configurable session gap objective with no extraneous additions.
Description check ✅ Passed The PR description is comprehensive and well-structured, covering summary, detailed changes, use cases, API documentation, testing, and deployment notes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/configurable-session-gap

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mihow mihow changed the title feat(projects): wire session_time_gap_seconds into event grouping Allow users to customize the time gap between sessions May 5, 2026
@mihow mihow marked this pull request as ready for review May 6, 2026 12:51
Copilot AI review requested due to automatic review settings May 6, 2026 12:51
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ami/main/api/views.py`:
- Around line 320-323: The view currently enqueues regroup_events_task
unconditionally allowing overlapping runs; modify the handler that calls
self.get_object() and regroup_events_task.delay(deployment.pk) to acquire a
per-deployment in-flight lock (e.g., a Redis/cache-based lock keyed by
f"regroup:{deployment.pk}") before queuing; if the lock cannot be obtained
return an early HTTP 409 (or similar) Response and log that a regroup is already
running, otherwise enqueue the task, keep the lock for the duration of the task
(or set a TTL and have the task release the lock) and ensure regroup_events_task
(or its wrapper) releases the lock on completion/failure so duplicate triggers
are prevented.

In `@ui/src/pages/project-details/processing-form.tsx`:
- Around line 30-35: The sessionTimeGapSeconds field currently validates but
will submit a string because React Hook Form doesn't coerce number inputs;
update the field's rules for sessionTimeGapSeconds to include a setValueAs that
converts the incoming value to a number (e.g., Number or parseInt) so the form
value is stored as a numeric type; modify the rules object for
sessionTimeGapSeconds to add setValueAs and keep required/min validations
intact.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 80284157-fc1e-41e0-8398-96a13aa37201

📥 Commits

Reviewing files that changed from the base of the PR and between 183f487 and 55cef8b.

📒 Files selected for processing (4)
  • ami/main/api/views.py
  • ami/main/models.py
  • ami/main/tests.py
  • ui/src/pages/project-details/processing-form.tsx

Comment thread ami/main/api/views.py
Comment thread ui/src/pages/project-details/processing-form.tsx
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR wires the existing per-project session_time_gap_seconds setting into backend session (Event) grouping, exposes the setting in the UI, and adds an API action to trigger regrouping asynchronously so admins can re-cluster historical captures after changing the threshold.

Changes:

  • Backend: group_images_into_events() now derives its default max_time_gap from deployment.project.session_time_gap_seconds when not explicitly provided.
  • API: adds POST /api/v2/deployments/<pk>/regroup-sessions/ to enqueue the existing ami.tasks.regroup_events Celery task.
  • UI + tests: exposes the setting as a numeric input on the project Processing settings form and adds unit tests covering cross-midnight behavior and the project-setting fallback.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
ami/main/models.py Uses the project’s session_time_gap_seconds as the default grouping gap instead of a hardcoded 120 minutes.
ami/main/api/views.py Adds a deployment-level action endpoint to queue regrouping in Celery.
ui/src/pages/project-details/processing-form.tsx Adds a numeric form field to edit sessionTimeGapSeconds in the Processing settings UI.
ami/main/tests.py Adds tests documenting cross-midnight split/merge behavior and verifying the project setting is used when no explicit gap is passed.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ami/main/api/views.py
Comment thread ui/src/pages/project-details/processing-form.tsx
Comment thread ami/main/models.py Outdated
…oup task

Addresses takeaway-review feedback on PR #1292:

- Add `regroup_sessions_deployment` custom Project permission so non-superusers
  with the ProjectManager role can hit `POST /deployments/<pk>/regroup-sessions/`
  (mirrors `sync_deployment`). Without this, `BaseModel.check_custom_permission`
  computes the codename `regroup_sessions_deployment` and the perm doesn't
  exist, so the action 403s for everyone except superusers.
- Wrap `ami.tasks.regroup_events` in a per-deployment cache lock. Concurrent
  enqueues (double-clicks, sync_captures auto-regroup racing with manual
  trigger) collapse to a single run instead of overlapping on the same rows.
  Lower-level than the view so all callers (Deployment.save, admin bulk
  action, new API action) are protected.
- Defensive clamp on `session_time_gap_seconds` — fall back to the historical
  120-minute default if the value is 0 / negative, with a warning log. Stops
  pathological "every gap = new event" behavior when admins enter bad values.
- Migration 0085 adds the new perm to `Project.Meta.permissions` and grants it
  to existing `ProjectManager` role groups via a data migration (mirrors how
  0084 revoked `delete_job`).
- Tests:
  - test_invalid_session_time_gap_falls_back_to_default exercises 0 / -1 / -7200
  - test_regroup_events_task_is_idempotent_under_concurrent_calls pre-takes
    the cache lock, asserts a second call skips group_images_into_events

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ami/tasks.py`:
- Around line 106-119: The current lock uses cache.add(lock_key, 1, ...) and
unconditionally cache.delete(lock_key) which can remove a newer owner's lock;
change to an ownership token pattern: when acquiring the lock (where cache.add
is called) generate a unique token (e.g. UUID) and store that token as the cache
value instead of 1 (use REGROUP_EVENTS_LOCK_TTL as before), keep the token in a
local variable, and on release only remove the key if the stored cache value
equals your token (fetch with cache.get(lock_key) and only call
cache.delete(lock_key) when it matches) or use the cache backend's atomic
compare-and-delete if available; update the lock handling around lock_key,
cache.add, cache.get and cache.delete to use this token ownership check.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 251316be-c276-4473-a3fc-36a150e056ed

📥 Commits

Reviewing files that changed from the base of the PR and between 55cef8b and 50d618e.

📒 Files selected for processing (5)
  • ami/main/migrations/0085_add_regroup_sessions_deployment_permission.py
  • ami/main/models.py
  • ami/main/tests.py
  • ami/tasks.py
  • ami/users/roles.py
✅ Files skipped from review due to trivial changes (1)
  • ami/users/roles.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • ami/main/models.py

Comment thread ami/tasks.py
Comment on lines +106 to +119
if not cache.add(lock_key, 1, timeout=REGROUP_EVENTS_LOCK_TTL):
logger.info(f"regroup_events skipped for deployment {deployment_id}: another run is in progress.")
return
logger.info(f"Grouping captures for {deployment}")
events = group_images_into_events(deployment)
logger.info(f"{deployment} now has {len(events)} events")
try:
try:
deployment = Deployment.objects.get(id=deployment_id)
except Deployment.DoesNotExist:
logger.error(f"Deployment with id {deployment_id} not found")
return
logger.info(f"Grouping captures for {deployment}")
events = group_images_into_events(deployment)
logger.info(f"{deployment} now has {len(events)} events")
finally:
cache.delete(lock_key)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Lock release can delete a newer lock owner

The finally: cache.delete(lock_key) path can clear a lock acquired by a later task when the first lock expires mid-run. That re-opens concurrent regroup execution and breaks the idempotency guarantee.

Suggested hardening
+import uuid
...
-    lock_key = f"regroup_events:lock:deployment:{deployment_id}"
-    if not cache.add(lock_key, 1, timeout=REGROUP_EVENTS_LOCK_TTL):
+    lock_key = f"regroup_events:lock:deployment:{deployment_id}"
+    lock_token = uuid.uuid4().hex
+    if not cache.add(lock_key, lock_token, timeout=REGROUP_EVENTS_LOCK_TTL):
         logger.info(f"regroup_events skipped for deployment {deployment_id}: another run is in progress.")
         return
     try:
         ...
     finally:
-        cache.delete(lock_key)
+        if cache.get(lock_key) == lock_token:
+            cache.delete(lock_key)

For full safety, use an ownership-checked atomic unlock primitive in the cache backend (token compare-and-delete).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/tasks.py` around lines 106 - 119, The current lock uses
cache.add(lock_key, 1, ...) and unconditionally cache.delete(lock_key) which can
remove a newer owner's lock; change to an ownership token pattern: when
acquiring the lock (where cache.add is called) generate a unique token (e.g.
UUID) and store that token as the cache value instead of 1 (use
REGROUP_EVENTS_LOCK_TTL as before), keep the token in a local variable, and on
release only remove the key if the stored cache value equals your token (fetch
with cache.get(lock_key) and only call cache.delete(lock_key) when it matches)
or use the cache backend's atomic compare-and-delete if available; update the
lock handling around lock_key, cache.add, cache.get and cache.delete to use this
token ownership check.

@mihow
Copy link
Copy Markdown
Collaborator Author

mihow commented May 7, 2026

Claude says: Push 50d618e addresses the takeaway-review feedback from CodeRabbit and Copilot:

  1. regroup_sessions_deployment custom permission (Copilot) — added to Project.Permissions and Project.Meta.permissions, granted to existing ProjectManager role groups via data migration 0085. Mirrors how sync_deployment is wired. Without this, the action 403'd for all non-superusers.
  2. Per-deployment idempotency lock on regroup_events (CodeRabbit) — cache.add() SETNX-style with a 1 h TTL, applied inside the Celery task itself rather than the view. So all callers (the new API action, Deployment.save autoregroup, the admin bulk action) are protected, not just the new endpoint. Concurrent enqueues collapse to a single run.
  3. Clamp non-positive session_time_gap_seconds (Copilot) — fall back to the historical 120-min default with a warning log for 0 or negative values, preventing pathological per-image splits.
  4. Form numeric coercion (CodeRabbit + Copilot) — left as-is. <input type="number"> returning a string for an integer field is the established pattern across processing-form, capture-set form, deployment-location form, and job-details form. DRF IntegerField parses strings transparently. Worth standardizing the helper later, not in this PR.

New tests covering the lock and the clamp:

  • test_invalid_session_time_gap_falls_back_to_default — exercises 0 / -1 / -7200
  • test_regroup_events_task_is_idempotent_under_concurrent_calls — pre-takes the lock and asserts the second call exits early without invoking group_images_into_events

11/11 TestImageGrouping and 13/13 permission tests green locally.

@mihow mihow requested a review from annavik May 7, 2026 06:24
Copy link
Copy Markdown
Member

@annavik annavik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!!

I still think this setting would make more sense in a new section, instead of in Project->Settings->Processing. I know you said this can affect the logic for occurrences once tracking is enabled, but I think this is a bit far-fetched. I'm thinking, if I were about to look for this setting as a user, where would I first go? I'm not sure the processing section would be the place. Maybe this belongs more with upcoming sampling schedule configuration we have discussed?

I don't have very strong opinions here though, so if you feel this is where it belongs, go for it! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make max_time_gap a configurable Project setting

3 participants