Feat/tree tests rollup populate command#1874
Feat/tree tests rollup populate command#1874gustavobtflores wants to merge 2 commits intokernelci:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a one-off Django management command to (re)populate tree_tests_rollup from existing Tests/Builds/Incidents data, and centralizes the rollup processed-item key hashing helper so both streaming aggregation and backfill paths use the same key generator.
Changes:
- Introduces
populate_tree_tests_rollupmanagement command to recompute/upserttree_tests_rollupand upsert correspondingProcessedListingItemsentries. - Moves
get_rollup_key()intoprocess_pending_helpersand reuses it fromprocess_pending_aggregations. - Tightens typing in
process_pending_helpersto useRollupKeyfor rollup dictionaries.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
backend/kernelCI_app/management/commands/process_pending_aggregations.py |
Imports shared get_rollup_key() instead of defining a local version. |
backend/kernelCI_app/management/commands/populate_tree_tests_rollup.py |
New command that aggregates tests into rollup buckets and writes rollup + processed markers. |
backend/kernelCI_app/management/commands/helpers/process_pending_helpers.py |
Adds shared get_rollup_key() and updates rollup typing to use RollupKey. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def _fetch_test_issues(test_ids: list[str]) -> dict[str, dict]: | ||
| """Bulk-fetch the first incident per test_id, returning {test_id: {issue_id, issue_version}}.""" | ||
| issues_map: dict[str, dict] = {} | ||
| incidents = Incidents.objects.filter( | ||
| test_id__in=test_ids, | ||
| ).values("test_id", "issue_id", "issue_version") | ||
|
|
||
| for inc in incidents: | ||
| issues_map.setdefault( | ||
| inc["test_id"], | ||
| { | ||
| "issue_id": inc["issue_id"], | ||
| "issue_version": inc["issue_version"], | ||
| }, | ||
| ) | ||
|
|
There was a problem hiding this comment.
_fetch_test_issues claims to return the “first incident per test_id”, but the queryset has no ordering, so which incident is chosen is not deterministic. Either add a deterministic order_by(...) that matches the intended definition of “first” or adjust the docstring to avoid implying determinism.
| """Iterate over checkouts matching the filter criteria.""" | ||
| checkouts_qs = Checkouts.objects.order_by("-start_time", "-id") | ||
|
|
||
| if since_days: |
There was a problem hiding this comment.
_iter_checkouts uses if since_days: which will skip filtering when --since-days 0 is provided. Use an explicit is not None check so 0 behaves as a valid value (now - 0 days).
| if since_days: | |
| if since_days is not None: |
| self.stderr.write(f"ERROR: Checkout with id={checkout_id} not found") | ||
| exit(1) | ||
|
|
||
| if limit: |
There was a problem hiding this comment.
_iter_checkouts uses if limit: which ignores --limit 0 and treats it like “no limit”. Use if limit is not None (and optionally validate non-negative) to make CLI behavior predictable.
| if limit: | |
| if limit is not None: |
| if not checkouts_qs.exists(): | ||
| self.stderr.write(f"ERROR: Checkout with id={checkout_id} not found") | ||
| exit(1) |
There was a problem hiding this comment.
Avoid calling exit(1) inside a Django management command; it bypasses Django’s error handling and can make failures harder to test/compose. Prefer raising CommandError (or SystemExit via sys.exit) after writing the message.
| processed_rows.extend( | ||
| ProcessedListingItems( | ||
| listing_item_key=get_rollup_key(t.id), | ||
| checkout_id=checkout.id, | ||
| status=simplify_status(t.status), | ||
| ) | ||
| for t in test_chunk | ||
| ) |
There was a problem hiding this comment.
This command chunks DB iteration, but still accumulates all ProcessedListingItems for the checkout in processed_rows before writing. For large checkouts this can create high memory pressure; consider bulk-upserting processed items per chunk (or flushing the list periodically) instead of buffering the full set.
| tests_qs = Tests.objects.filter(build_id__in=builds.keys()).select_related( | ||
| "build__checkout" | ||
| ) |
There was a problem hiding this comment.
tests_qs uses select_related("build__checkout"), but the loop only reads fields directly on Tests and you already fetched Builds (with checkout) separately. This adds an unnecessary join and extra data transfer; consider dropping the select_related here to reduce query cost.
| tests_qs = Tests.objects.filter(build_id__in=builds.keys()).select_related( | |
| "build__checkout" | |
| ) | |
| tests_qs = Tests.objects.filter(build_id__in=builds.keys()) |
No description provided.