Dq 665 contract delete #600

Open
shivahanumanthula-atlan wants to merge 5323 commits into apache:master from atlanhq:dq-665-contract-delete

@shivahanumanthula-atlan

What changes were proposed in this pull request?

(Please fill in changes proposed in this fix. Create an issue in ASF JIRA before opening a pull request and
set the title of the pull request which starts with
the corresponding JIRA issue number. (e.g. ATLAS-XXXX: Fix a typo in YYY))

How was this patch tested?

(Please explain how this patch was tested. Ex: unit tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

PRATHAM2002-DS and others added 30 commits February 18, 2026 14:09
MS-625 : Handle ClassCastException in ABAC evaluation
- Add tenant health verification after Temporal workflow completes
- Poll Temporal workflow status until completion (60min timeout)
- Health checks per tenant:
  - Connect to tenant vCluster via vcluster platform
  - Verify all atlas pods exist and containers are ready
  - Verify image pattern (repo-ring) and tag match expected
  - Port-forward to atlas service and check /api/atlas/admin/status
- Post health check results as PR comment
- Setup kubectl, vCluster CLI, and improve VPN connection handling
- Use GLOBALPROTECT_PORTAL_URL from vars instead of hardcoded value

Co-authored-by: Cursor <cursoragent@cursor.com>
feat: Add e2e health checks to cohort release workflow
GITHUB_PATH changes only apply to subsequent steps, not the current
step. Use full path $HOME/.temporalio/bin/temporal for version check.

Co-authored-by: Cursor <cursoragent@cursor.com>
fix: Use full path for temporal CLI in setup step
- Configure routing for 172.17.0.0/16 to VPN interface (Docker conflict)
- Increase VPN stabilization wait to 20s
- Verify vCluster Platform connectivity before login
- Match smoke test VPN configuration from maven.yml

Co-authored-by: Cursor <cursoragent@cursor.com>
fix: Add routing fix for vCluster Platform connectivity
Adds publishAsyncIngestionEvent to classification, relationship, and
business metadata endpoints in EntityMutationService, EntityREST, and
RelationshipREST. Includes unit tests for all new event types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tants

Centralises all hardcoded event type strings into a single constants
class so consumers can reference them without string duplication.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wait up to 5 minutes for pods to have correct image after ArgoCD sync
- Poll every 30 seconds for pod readiness and image tag
- Accounts for StatefulSet rolling update time after ArgoCD applies manifest
- Prevents false failures when checking immediately after Temporal completes

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Add filter-level gate in ActiveServerFilter that returns 503 for admin/repair
write endpoints while ENABLE_ASYNC_INGESTION flag is active. This prevents
graph state mutations that bypass the async Kafka producer pipeline.

Blocked: 24 endpoints across admin, repair, entity repair, and migration repair.
All GET requests, config APIs, and normal CRUD operations remain unaffected.
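A minimal sketch of the gate's decision, assuming blocked endpoints can be matched by HTTP method and path prefix. `AsyncIngestionGate`, `shouldBlock`, and both prefixes are hypothetical. The real check sits in `ActiveServerFilter` and covers an explicit list of 24 endpoints rather than two prefixes.

```java
import java.util.List;

/** Hypothetical sketch: decide whether a request gets HTTP 503 while async ingestion is on. */
final class AsyncIngestionGate {
    static final int SC_SERVICE_UNAVAILABLE = 503;

    // Illustrative prefixes only; not the actual list of 24 blocked endpoints.
    private static final List<String> BLOCKED_PREFIXES = List.of(
            "/api/atlas/admin/",
            "/api/atlas/v2/repair/");

    private AsyncIngestionGate() {}

    /** Returns true when the request must be rejected with SC_SERVICE_UNAVAILABLE. */
    static boolean shouldBlock(String method, String path, boolean asyncIngestionEnabled) {
        if (!asyncIngestionEnabled) {
            return false;                       // flag off: nothing is blocked
        }
        if ("GET".equalsIgnoreCase(method)) {
            return false;                       // reads are always allowed
        }
        for (String prefix : BLOCKED_PREFIXES) {
            if (path.startsWith(prefix)) {
                return true;                    // admin/repair write while the flag is on
            }
        }
        return false;                           // normal CRUD is unaffected
    }
}
```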

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Utility test that generates sample payloads to a file — should only be
run manually when needed, not as part of regular test suites.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Retry entire rollout check up to 2 times if timeout (total max: 30 min)
- Each attempt has 15 minute timeout with 30s poll interval
- Rename variables with _SECS suffix for clarity
- Handles cases where ArgoCD sync takes longer than expected
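The rollout wait described above can be sketched as a poll loop wrapped in a retry loop. `RolloutPoller` and the `ready` check are hypothetical stand-ins for the pod readiness and image-tag checks; the actual logic lives in the workflow, not in Java.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of "poll until ready, retry the whole check on timeout". */
final class RolloutPoller {
    private RolloutPoller() {}

    /**
     * Polls {@code ready} every {@code pollMillis} until it returns true or {@code timeoutMillis}
     * elapses; on timeout the whole check restarts, for {@code attempts} attempts in total.
     * Returns true as soon as any poll reports ready, false if every attempt times out.
     */
    static boolean waitForRollout(BooleanSupplier ready, int attempts,
                                  long timeoutMillis, long pollMillis) {
        for (int attempt = 1; attempt <= attempts; attempt++) {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (System.nanoTime() - deadline < 0) {   // overflow-safe deadline comparison
                if (ready.getAsBoolean()) {
                    return true;
                }
                try {
                    Thread.sleep(pollMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();  // preserve interrupt status and give up
                    return false;
                }
            }
        }
        return false;
    }
}
```

With the values from this commit, the call would be `waitForRollout(check, 2, TimeUnit.MINUTES.toMillis(15), TimeUnit.SECONDS.toMillis(30))`, which caps the total wait at 30 minutes.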

Co-authored-by: Cursor <cursoragent@cursor.com>
Remove VPN, vCluster, and kubectl setup from PR label release workflow.
Health verification (ArgoCD sync + pod rollout) will be handled by
Temporal workflows, eliminating idle billing during long waits.

GitHub Actions now only triggers Temporal and polls for completion.

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace paths-ignore with dorny/paths-filter so the job always runs
and reports a status. This fixes the issue where required checks
remain in "Waiting" state when the workflow is skipped due to paths-ignore.

When only docs/config files change, the job runs quickly and reports
success without running the actual tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
krishnanunni-atlan and others added 17 commits April 7, 2026 11:51
- Update manual-cohort-cleanup to support multi-label detection,
  allow_open_pr for selective rollback, and auto-default source
  to github when path is provided
- Update DEVELOPER-RUNBOOK with accurate tenant counts, release
  gates (SHA+branch validation), new gotchas, and GA/rollback flows
- Update implementation doc with correct atlas ring cohort names,
  bugs 4-6, image override validation, parallelism details, and
  expanded developer guide with selective rollback instructions

Made-with: Cursor
…6497)

* ms-802: Trim whitespaces in name attribute

* ms-802: Refactor code

* ms-802: Remove custom image
… soft-deleted (#6510)

When an entity is soft-deleted, tagDAO.deleteTags() was only called for
HARD/PURGE deletes, leaving tag rows in tags_by_id with is_deleted=false.
This caused orphaned propagated tag rows with:
- is_deleted=false (should be true)
- asset_metadata=null (entity no longer resolvable)
- tag_meta_json.entityStatus=DELETED

The fix removes the delete-type guard so tags are always soft-deleted in
Cassandra regardless of whether the entity delete is soft, hard, or purge.
Downstream propagation cleanup via async tasks is unaffected.
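A before/after sketch of the guard removal. `TagCleanupSketch`, `RecordingTagDao`, and `DeleteType` are hypothetical; only the shape of the condition is taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch contrasting the old guarded tag cleanup with the fixed, unconditional one. */
final class TagCleanupSketch {
    enum DeleteType { SOFT, HARD, PURGE }

    /** Stand-in for tagDAO; records which vertex ids had their tag rows soft-deleted. */
    static final class RecordingTagDao {
        final List<String> deleted = new ArrayList<>();

        void deleteTags(String vertexId) {
            deleted.add(vertexId);
        }
    }

    private TagCleanupSketch() {}

    /** Old behaviour: SOFT deletes skipped tag cleanup, leaving is_deleted=false rows behind. */
    static void onEntityDeleteOld(RecordingTagDao dao, String vertexId, DeleteType type) {
        if (type == DeleteType.HARD || type == DeleteType.PURGE) {
            dao.deleteTags(vertexId);
        }
    }

    /** Fixed behaviour: tag rows are soft-deleted for every delete type. */
    static void onEntityDeleteFixed(RecordingTagDao dao, String vertexId, DeleteType type) {
        dao.deleteTags(vertexId);
    }
}
```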

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… scaling (#6517)

ThreadPoolExecutor only creates new threads beyond corePoolSize when the
queue is FULL. With queue=200, the pool stays at ~40 threads while 200
requests queue up before any expansion toward maxThreads=400 happens.

This caused thread pool exhaustion on nasdaq-vc: all 3 pods hit 40 threads
with 60-109 requests queued, but the pool refused to grow because the queue
(capacity 200) wasn't full yet.

Queue=10 means the pool starts creating new threads after just 10 requests
are queued, matching the fast-scaling behavior of Jetty's native QueuedThreadPool.
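The queuing rule can be reproduced with the JDK's `ThreadPoolExecutor` directly. This demo (class and method names are illustrative) blocks every task on a latch and reports how many threads the pool created: with a roomy queue it stays at `corePoolSize`, and with a tiny queue it grows toward `maximumPoolSize`. It assumes `tasks <= max + queueCapacity`; beyond that the executor rejects work with `RejectedExecutionException`.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Demonstrates that ThreadPoolExecutor grows past corePoolSize only once its queue is full. */
final class QueueScalingDemo {
    private QueueScalingDemo() {}

    /**
     * Submits {@code tasks} blocking tasks to a pool with the given core/max sizes and a bounded
     * queue, and returns the pool size observed while all tasks are still blocked.
     */
    static int poolSizeAfterSubmitting(int core, int max, int queueCapacity, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(queueCapacity));
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await();            // keep every worker busy
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown();                    // let the blocked tasks finish
            pool.shutdown();
        }
    }
}
```

`poolSizeAfterSubmitting(2, 4, 10, 5)` leaves the pool at 2 threads, while `poolSizeAfterSubmitting(2, 4, 1, 5)` grows it to 4: the same effect seen with 40 core threads and a 200-slot queue.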

Changed in:
- AtlasConfiguration.java: default 100 -> 10
- helm/atlas configmap: 200 -> 10
- helm/atlas-read configmap: 200 -> 10
- helm/atlas configmap-leangraph: 200 -> 10

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: (ms-903) Emit parent ENTITY_UPDATE on sub-asset soft delete

* fix: (ms-903) Adding integration test cases
…or mixed index registration failures (#6436)

* feat: Add index health Prometheus metrics for mixed index audit

* feat: Adding tenant and pod filter

* fix: Micrometer gauges bind to object references at registration time. If the object is garbage collected, the gauge returns NaN. The gauge must be registered once, using the exact same objects that get updated later.

* fix: Removing custom branch

* fix: Adding java doc

* fix: Addressing the cursor bot reviews

* feat: Removing feature branch

* feat: Add self-healing for missing mixed index property keys (Phase 2)

* fix: Retry RepairIndex bean lookup for async reindex during startup

* fix: Retry RepairIndex bean lookup for async reindex during startup

* fix: Retry RepairIndex bean lookup for async reindex during startup

* chore: Remove non-Phase-2 files from PR

Revert maven.yml to master and remove local dev files
(grafana dashboard, docker-compose, synonym.txt, test file)
that are not part of the Phase 2 self-healing changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: adding feature branch

* fix: adding feature branch

* fix: adding feature branch

* trigger build

* chore: trigger CI build

* fix: Fixing review comments

* fix: Fixing review comments

* fix: Fixing review comments

* fix: removing feature branch

* feat: Self-healing for missing mixed index property keys (Phase 2a)

* fix: Now isStringField for the repair matches the original createIndexForAttribute logic exactly:
        - Original: isStringField = true only when primitiveClassType == String.class && IndexType.STRING.equals(indexType)
        - Repair: repairIsStringField = (primitiveClass == String.class) && isStringField

* fix: Increased test timeout from 60s to 120s — accommodates ES stabilization after schema repairs on fresh environments

* fix: don't self-heal if the property key was never created in the first place (fresh environment). Only self-heal if the property key EXISTS in the schema but is NOT in the mixed index. That's the actual production failure mode
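Both rules can be written as small predicates. `MixedIndexSelfHeal` and its parameter names are hypothetical; the conditions themselves are the ones stated in the two commits above.

```java
/** Hypothetical sketch of the Phase 2a self-heal decision and the repaired isStringField flag. */
final class MixedIndexSelfHeal {
    private MixedIndexSelfHeal() {}

    /**
     * Self-heal only the production failure mode: the property key EXISTS in the schema but is
     * missing from the mixed index. A key that was never created (fresh environment) is left alone.
     */
    static boolean shouldSelfHeal(boolean propertyKeyExists, boolean inMixedIndex) {
        return propertyKeyExists && !inMixedIndex;
    }

    /** Mirrors the original rule: a string field only for String.class with the STRING index type. */
    static boolean repairIsStringField(Class<?> primitiveClass, boolean isStringField) {
        return primitiveClass == String.class && isStringField;
    }
}
```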

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…6485)

* perf(MS-752): optimize getAccessControlEntity to avoid O(N×K) graph reads

For a Persona with K existing policies, the old code called
toAtlasEntityWithExtInfo() once per new policy, triggering
mapRelationshipAttributes (O(K × attrs) JanusGraph reads) and then
looping K existing policies calling toAtlasEntity() each time.

For N=20 new policies and K=500 existing ones this caused ~10,000 redundant
graph reads, producing >30s latency for CME Group.

Fix:
- AuthPolicyPreProcessor: promote noRelAttrRetriever to a constructor field
  (EntityGraphRetriever with ignoreRelationshipAttr=true) and rewrite
  getAccessControlEntity() to load the Persona vertex once via
  entityRetriever.getEntityVertex(), load scalar attrs only via
  noRelAttrRetriever.toAtlasEntity(), then traverse policy edges directly
  via GraphHelper.getActiveCollectionElementsUsingRelationship() — skipping
  mapRelationshipAttributes entirely.
- AccessControlUtils.objectToEntityList(): add null-guard around
  getReferredEntities().keySet() to prevent NPE when the optimised path
  builds AtlasEntityWithExtInfo manually with zero existing policies
  (referredEntities left null by the AtlasEntityWithExtInfo constructor).
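A minimal sketch of the null-guard, assuming the referred-entities map may be `null` when no existing policies were attached. `ReferredEntitiesGuard.referredGuids` is a hypothetical helper, not the actual `AccessControlUtils` code.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the null-guard added around getReferredEntities().keySet(). */
final class ReferredEntitiesGuard {
    private ReferredEntitiesGuard() {}

    /** Returns the referred-entity GUIDs, or an empty set when the map was never initialised. */
    static Set<String> referredGuids(Map<String, ?> referredEntities) {
        if (referredEntities == null) {
            return Collections.emptySet();   // zero existing policies: the map was left null
        }
        return new HashSet<>(referredEntities.keySet());
    }
}
```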

Also adds AuthPolicyPreProcessorLatencyTest (10 unit tests) documenting the
root-cause call-count behaviour and guarding against NPE regression.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: trigger Java CI build for hitesh/ms-752-fix branch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: retrigger Java CI build for hitesh/ms-752-fix

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: touch test file to trigger Java CI image build

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: add hitesh-ms-752-fix branch to Java CI image build

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: retrigger build on hitesh-ms-752-fix

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: touch source to trigger image build on hitesh-ms-752-fix

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* perf(MS-752): cache ES index name, connection entity, and reuse preprocessor per batch

Fix A — ESAliasStore: cache the resolved physical index name for VERTEX_INDEX_NAME.
The alias→index mapping is static at runtime; one HTTP GET per ESAliasStore instance
is sufficient. Previously this GET fired once per new policy in a bulk request.

Fix C — AuthPolicyPreProcessor: add a per-instance Map<String,AtlasEntity> connection
cache in validateConnectionAdmin(). All N policies in a bulk create typically target
the same Connection, so the graph read now happens once instead of N times.

Fix D — AtlasEntityStoreV2.executePreProcessor(): build a type→preprocessors map
once per request (computeIfAbsent) instead of calling getPreProcessor() (which creates
a fresh instance) for every entity. This is the prerequisite for Fix A and Fix C to
be effective across entities in the same bulk request.

Together with the Fix B already on this branch, the per-request I/O for a bulk create
of N=20 policies against a Persona with K=500 existing policies drops to:
  Persona graph reads   : N → 1
  Policy graph reads    : N×K → K  (Fix B)
  Connection graph reads: N → 1    (Fix C)
  ES index-name GETs   : N → 1    (Fix A + Fix D)
  ES alias PUTs        : N (unchanged; Fix E deferred)
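Fixes A, C and D share one memoise-per-request pattern, sketched here with `Map.computeIfAbsent`. `PerRequestCache` is a hypothetical class, and this commit is reverted below, so the sketch only illustrates the technique. It is not thread-safe and assumes one instance per request, matching the per-request scope described for Fix D.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of the per-request memoisation described in Fixes A, C and D. */
final class PerRequestCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader;
    private int loads;                           // how many times the expensive loader ran

    PerRequestCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, invoking the loader only on the first request for a key. */
    V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            loads++;
            return loader.apply(k);
        });
    }

    int loads() {
        return loads;
    }
}
```

For a bulk create of 20 policies that all target the same Connection, 20 `get` calls for that key trigger a single load.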

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "perf(MS-752): cache ES index name, connection entity, and reuse preprocessor per batch"

This reverts commit 023c447.

* fix: address review issues in getAccessControlEntity edge traversal

- Use AtlasRelationshipEdgeDirection (IN/OUT/BOTH) to pick the correct
  policy vertex from each edge, matching the EntityGraphRetriever pattern
  and avoiding the fragile getIdForDisplay() string comparison
- Use specific relationship type name key ("access_control_policies") when
  looking up the policiesAttr from the relationshipAttributes map, instead
  of blindly taking iterator().next() which may pick the wrong entry if
  multiple relationship types define the same attribute name
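Picking the far end of an edge by relationship direction can be sketched as follows. `EdgeDirectionSketch` and its `Edge` type are hypothetical stand-ins for the graph edge and `AtlasRelationshipEdgeDirection`. The IN case returning the out-vertex matches the test described later in this PR; the BOTH case comparing vertex ids is an assumption.

```java
/** Hypothetical sketch: choose the policy vertex from an edge using the relationship's direction. */
final class EdgeDirectionSketch {
    enum EdgeDirection { IN, OUT, BOTH }

    /** Minimal stand-in for a graph edge between two vertex ids. */
    static final class Edge {
        final String outVertex;
        final String inVertex;

        Edge(String outVertex, String inVertex) {
            this.outVertex = outVertex;
            this.inVertex = inVertex;
        }
    }

    private EdgeDirectionSketch() {}

    /**
     * Returns the vertex on the far side of {@code edge} from {@code self}.
     * IN: the edge points at self, so the other end is the out-vertex.
     * OUT: the edge starts at self, so the other end is the in-vertex.
     * BOTH: either end may be self, so pick whichever end is not self (assumption).
     */
    static String otherEnd(Edge edge, String self, EdgeDirection direction) {
        switch (direction) {
            case IN:
                return edge.outVertex;
            case OUT:
                return edge.inVertex;
            default:
                return self.equals(edge.outVertex) ? edge.inVertex : edge.outVertex;
        }
    }
}
```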

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: add /atlas-context skill — codebase awareness and learned constraints

Living document encoding architecture, I/O cost model, constraints, review
checklist, and accumulated lessons from MS-752 (Fix B, edge direction bug,
relationship map key, bulk policy perf). Includes self-update protocol so
Claude appends new lessons after each review/incident.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: exercise Fix B edge-traversal path in AuthPolicyPreProcessorLatencyTest

Adds getAccessControlEntity_traversesEdgesForExistingPolicies() to cover
Steps 2-3 of the Fix B optimisation — the GraphHelper edge loop that was
previously unreachable because mockPersonaFetch() stubbed typeRegistry to
return null.

The new test:
- Provides a real AtlasEntityType mock with a valid policiesAttrMap keyed
  by the specific relationship type "access_control_policies"
- Mocks policiesAttr.getRelationshipEdgeDirection() = IN so the correct
  vertex (edge.getOutVertex()) is selected as the policy vertex
- Uses mockStatic(GraphHelper.class) to return K mock edges from
  getActiveCollectionElementsUsingRelationship()
- Asserts noRelAttrRetriever.toAtlasEntity(vertex) is called K times and
  that all K policy GUIDs are registered in ret.getReferredEntities()

Closes the test gap flagged in the PR review (issue-3).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: guard setRelationshipAttribute(REL_ATTR_POLICIES) inside policiesAttr != null block

When typeRegistry.getEntityTypeByName() returns null, policiesAttr is null
and the edge traversal is skipped. Previously setRelationshipAttribute was
called unconditionally with an empty policyObjectIds, causing ESAliasStore
to rebuild the Persona's ES alias with zero existing policies — silently
wiping all K existing filter clauses from janusgraph_vertex_index.

Move setRelationshipAttribute inside the if-block and add a warn log for
the fallback case so the failure is observable in logs.
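A minimal sketch of the guarded write. `PolicyAttributeGuard`, the `"policies"` key value, and the use of `java.util.logging` are assumptions for illustration; only the rule (write the attribute only when its definition resolved, otherwise warn) comes from the commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical sketch of the guarded write: never overwrite the policy list with an empty fallback. */
final class PolicyAttributeGuard {
    private static final Logger LOG = Logger.getLogger(PolicyAttributeGuard.class.getName());
    static final String REL_ATTR_POLICIES = "policies";  // assumption: illustrative key value

    private PolicyAttributeGuard() {}

    /**
     * Sets the policies relationship attribute only when the attribute definition was resolved.
     * When it was not (the type lookup returned null), the map is left untouched and a warning is
     * logged, so a downstream alias rebuild never sees a spurious empty list.
     */
    static void setPolicies(Map<String, Object> relationshipAttributes,
                            Object policiesAttrDef, List<String> policyObjectIds) {
        if (policiesAttrDef != null) {
            relationshipAttributes.put(REL_ATTR_POLICIES, new ArrayList<>(policyObjectIds));
        } else {
            LOG.warning("policies attribute not resolved; skipping setRelationshipAttribute");
        }
    }
}
```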

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* undo

* docs: add /zerograph-deployer Claude skill

Step-by-step skill covering health check, connectivity test, trigger
(async/sync/dry-run), status polling, input params, CI pipeline, and
common ops for the mothership-zerograph deployer agent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: clarify masterBehavior_* tests as post-fix regression guards

Remove stale "switch assertion below" instructions and "BUG DOCUMENTATION"
framing. The assertions are already set to the post-Fix-B values (times(0))
— update Javadoc to reflect that these are regression guards, not master
baseline documentation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* remove the skill zg-deployer, not part of this PR

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The default per_page for listReviews is 30. PRs with many review
comments/iterations can exceed this, causing recent approvals on
the current HEAD to be missed.

This caused ring releases to fail with "No approvals found" even
when approvals existed on the current HEAD SHA.
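One way to avoid truncation is to keep requesting pages until a short page comes back, sketched below. `ReviewPager` and `PageFetcher` are hypothetical. Pages are 1-based as in the GitHub REST API, and passing a larger `per_page` (GitHub caps it at 100) reduces the number of calls.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: collect every review instead of trusting the first (default 30-item) page. */
final class ReviewPager {
    /** Stand-in for one paginated API call; pages are 1-based. */
    interface PageFetcher<T> {
        List<T> fetch(int page, int perPage);
    }

    private ReviewPager() {}

    /** Requests successive pages until a page comes back shorter than {@code perPage}. */
    static <T> List<T> fetchAll(PageFetcher<T> fetcher, int perPage) {
        List<T> all = new ArrayList<>();
        for (int page = 1; ; page++) {
            List<T> batch = fetcher.fetch(page, perPage);
            all.addAll(batch);
            if (batch.size() < perPage) {
                return all;                  // last (possibly empty) page reached
            }
        }
    }
}
```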

Made-with: Cursor
…butes (#6531)

* feat: Phase 2b — controlled reindex for self-healed mixed index attributes

* fix: adding the dedup on the producer side and removing the redis dependency

* fix: addressing review comments
* Atlas testing harness (#6471)

* added fix for deletion for <10K assets ES sync

* BulkPurgeService with crash recovery, concurrency safety, and ES cleanup

* removed unnecessary configs

* hardcoded the self graph loading to make new object

* made config fetching synchronized across threads

* fixed cancel API in case of cancel+cancel or cancel+purgeTrigger

* forced cancel should be cleaned post cancellation

* added the case fix for cancelPurge-> retrigger-> cancel

* enforced worker queue to be called only on force cancel

* addressed comments, moved cleanup to finally

* added test harness suite for all endpoints first commit

* added tests for harness

* fixed tests and data for harness

* fixed tests, added more tests related to deletes

* fixed time for es sync along with flagging

* added kafka helpers and more test on entity behaviour

* added lineage and propagation tests

* added tests for lineages and all

* added tests for lineages and all

* added the pending tests for no-op

* fixed audit search index time by adding correct filters

* fixed typedef, lineage, classification and business metadata tests

* fixed glossary tests as per the UI

* checkpoint benchmark1

* increased timeouts for searches

* fixed lineage specific tests

* updated suite

* added tests for evaluator, accessor, bulk unique attr (#6345)

* Testing harness extended (#6370)

* added tests for evaluator, accessor, bulk unique attr

* added glossary and attribute test

* Testing harness extended (#6401)

* chore: remove Tags V1 dead code from propagation tasks and ClassificationAssociator (#6305)

- ClassificationPropagationTasks: remove isTagV2Enabled() branches in Add, UpdateText,
  Delete, and RefreshPropagation tasks. V2 path (Cassandra) is now always taken.
  Also removed unused previousRestrictPropagation* local vars from Add.run().
- ClassificationAssociator: remove V1-only updateClassificationText(null, allVertices)
  guarded by !isTagV2Enabled(). Remove now-unused DynamicConfigStore import.

Part 1 of Tags V1 cleanup. Refs: MS-751

Co-authored-by: MetaClaw <metaclaw@atlan.com>
Co-authored-by: sriram-atlan <sriram.aravamuthan@atlan.com>

* Fix AtlasEntityHeader constructors to preserve docId, vertexId, and superTypeNames (#6304)

The copy constructor AtlasEntityHeader(AtlasEntityHeader) and the entity-based
constructor AtlasEntityHeader(AtlasEntity) were not copying docId, vertexId, or
superTypeNames fields. When these constructors are used in the notification
pipeline (e.g., convertDiffEntityToHeader), headers with null docId propagate
to Elasticsearch, causing ES documents to lose their document sync references.
This results in assets appearing as "not found" in the UI.
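A trimmed-down sketch of a copy constructor that carries the missing fields. `EntityHeaderSketch` is hypothetical and keeps only the three fields named above plus `guid` and `typeName`; the real `AtlasEntityHeader` has many more.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, trimmed-down header showing a copy constructor that carries every identity field. */
final class EntityHeaderSketch {
    String guid;
    String typeName;
    String docId;                 // ES document id; losing it breaks ES sync
    String vertexId;
    Set<String> superTypeNames;

    EntityHeaderSketch() {}

    /** Copy constructor: copies docId, vertexId and superTypeNames along with the basic fields. */
    EntityHeaderSketch(EntityHeaderSketch other) {
        this.guid = other.guid;
        this.typeName = other.typeName;
        this.docId = other.docId;
        this.vertexId = other.vertexId;
        this.superTypeNames = other.superTypeNames == null ? null : new HashSet<>(other.superTypeNames);
    }
}
```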

Co-authored-by: Mothership Agent <mothership@atlan.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: sriram-atlan <sriram.aravamuthan@atlan.com>

* bulk purge released forced refresh (#6323)

* fix(icarus): dynamic JVM options for memory management and adjust CPU limits (#6330)

* Add atlas_vertex_index ES alias for janusgraph_vertex_index on startup (#6336)

Create a stable ES alias "atlas_vertex_index" pointing to the actual
vertex index (e.g. janusgraph_vertex_index) during startup. This allows
consumers to use a backend-agnostic index name. The alias is created
once (idempotent check on every startup) and is best-effort — failures
do not block Atlas startup.
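The idempotent, best-effort behaviour can be sketched as check-then-create, with any failure logged and swallowed. `AliasClient` and its in-memory implementation are hypothetical, not the Elasticsearch client API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch of an idempotent, best-effort alias creation run on every startup. */
final class StartupAliasSketch {
    private static final Logger LOG = Logger.getLogger(StartupAliasSketch.class.getName());

    /** Minimal stand-in for the alias operations; not a real client interface. */
    interface AliasClient {
        boolean aliasExists(String alias);
        void createAlias(String index, String alias);
    }

    /** In-memory client used to exercise the sketch. */
    static final class InMemoryAliasClient implements AliasClient {
        final Map<String, String> aliases = new HashMap<>();
        int createCalls;

        public boolean aliasExists(String alias) { return aliases.containsKey(alias); }

        public void createAlias(String index, String alias) {
            createCalls++;
            aliases.put(alias, index);
        }
    }

    private StartupAliasSketch() {}

    /** Creates the alias if missing; any failure is logged and swallowed so startup continues. */
    static void ensureAlias(AliasClient client, String index, String alias) {
        try {
            if (client.aliasExists(alias)) {
                return;                                    // already present: no-op
            }
            client.createAlias(index, alias);
        } catch (RuntimeException e) {
            LOG.log(Level.WARNING, "Could not create alias " + alias + "; continuing startup", e);
        }
    }
}
```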

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: allow apostrophe in link URL validation regex (#6342)

* fix: allow apostrophe in link URL validation regex

Add unit tests for LinkPreProcessor URL validation.
Add branch to CI for testing.

* remove custom branch

* added tests for evaluator, accessor, bulk unique attr

* fix(helm): disable soft affinity for atlas-read cassandra-online-dc STS (#6347)

* fix(helm): disable soft affinity for atlas-read cassandra-online-dc STS

Remove multiarch preferredDuringSchedulingIgnoredDuringExecution blocks
from the nodeAffinity section of the cassandra-online-dc StatefulSet in
atlas-read. These blocks caused both soft (preferred) and hard (required)
affinity rules to coexist when multiarch was enabled, leading to mixed
affinity behavior. The STS now matches the normal atlas cassandra STS
which only uses requiredDuringSchedulingIgnoredDuringExecution in
non-Development/Enterprise deployments.

Fixes: MS-803

Co-Authored-By: Claude Code <noreply@anthropic.com>

* commit

---------

Co-authored-by: Claude Code <noreply@anthropic.com>

* added atlas mcp observability skills (#6315)

* added atlas mcp skills

* Removed hard paths in mcp.json

* docs(cohort-release): Add auto-sync check, dynamic redistribution, and release channel filtering (#6366)

- Document auto-sync safety check that skips tenants without ArgoCD auto-sync
- Add dynamic ring redistribution section (quarterly automation, data sources)
- Document release channel filtering (MAIN-BASE, GOLDEN-MAIN-BASE only)
- Add release result states explanation (success, partial_success, failed, skipped)
- Update tenant counts and asset ranges in runbook
- Add gotchas for skipped tenants and release channel exclusions

Made-with: Cursor

* added glossary and attribute test

* feat: reduce icarus memory from 4Gi to 2Gi (#6380)

* Switch entity_audits to niofs store type to free page cache for vertex index (#6324)

* Switch entity_audits ES index to niofs store type to eliminate page cache contention

entity_audits uses the default hybridfs store type, which memory-maps all segment
files at index open time. On production clusters, this consumes 19-400GB of virtual
address space per node, competing with janusgraph_vertex_index for the OS page cache
and degrading search performance.

niofs uses Java NIO FileChannel.read() instead of mmap — audit pages only enter the
page cache during active queries and are easily evictable, freeing page cache for the
vertex index that actually needs it.

Changes:
- ESBasedAuditRepository: add ensureStoreTypeNiofs() to createSession() startup flow.
  Uses a marker document (HEAD check) so the close/open migration runs exactly once
  across all pods and all future deployments (~1ms no-op on subsequent startups).
- es-audit-mappings.json: add settings block with store.type=niofs and
  refresh_interval=60s so new indices are created with niofs from the start.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Move writeNiofsMigrationMarker() into finally block per review feedback

Only write the marker when both the settings update and index reopen
succeed, preventing partial migration state.
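A sketch of the run-once migration, assuming a hypothetical `IndexOps` interface in place of the real Elasticsearch calls. The index is always reopened, but the marker is written only after both the settings update and the reopen succeed, so a failed run is retried on the next startup.

```java
/** Hypothetical sketch of a run-once index migration gated by a marker document. */
final class NiofsMigrationSketch {
    /** Minimal stand-in for the operations involved; not a real client interface. */
    interface IndexOps {
        boolean markerExists();                 // e.g. a HEAD request on the marker document
        void closeIndex();
        void setStoreType(String storeType);
        void openIndex();
        void writeMarker();
    }

    private NiofsMigrationSketch() {}

    /**
     * Runs close, settings update, open once. The marker is written only after the settings
     * update and the reopen both succeed, so a partial failure is retried on the next startup.
     */
    static void ensureStoreTypeNiofs(IndexOps ops) {
        if (ops.markerExists()) {
            return;                             // already migrated: cheap no-op
        }
        ops.closeIndex();
        try {
            ops.setStoreType("niofs");
        } finally {
            ops.openIndex();                    // always try to reopen the index
        }
        ops.writeMarker();                      // reached only if update and reopen succeeded
    }
}
```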

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(MS-701): emit parent entity UPDATE event on sub-asset relationship change (#6381)

* test(MS-701): add failing integration test for missing parent UPDATE on sub-asset add

When a sub-asset (Process) is added with a relationship to a parent (Table)
via bulk createOrUpdate, and the parent's own attributes haven't changed,
the parent entity is incorrectly excluded from the UPDATE response and
Kafka notifications.

This test demonstrates the bug: it creates a Table, then sends a bulk
request with the same unchanged Table + a new Process referencing it.
The assertion that Table appears as UPDATED will fail until the fix
is applied.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(MS-701): emit parent entity UPDATE event on sub-asset relationship change

When a sub-asset is added/deleted/restored via bulk createOrUpdate and
the parent entity's own attributes are unchanged, the parent was
incorrectly excluded from UPDATE notifications. This happened because
RequestContext.recordEntityUpdate() checks entitiesToSkipUpdate, which
blocks entities whose attributes didn't change — even when their
relationships did change.

Fix: Add recordEntityUpdateForRelationshipChange() to RequestContext
that bypasses the entitiesToSkipUpdate check (following the existing
pattern of recordEntityUpdateForNonRelationshipAttributes). Update all
call sites in EntityGraphMapper and DeleteHandlerV1 that record parent
entity updates due to relationship edge creation/deletion to use this
new method.

Affected call sites:
- EntityGraphMapper.recordEntityUpdate(vertex) — simple relationship update
- EntityGraphMapper.recordEntityUpdate(vertex, ctx, isAdd) — sub-asset add/remove
- EntityGraphMapper inverse reference update (line ~1563)
- DeleteHandlerV1.deleteEdgeReference() — both relationship and legacy edges
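A trimmed-down sketch of the two recording paths. `RequestContextSketch` and `skipUpdate` are hypothetical; only `recordEntityUpdate`, `recordEntityUpdateForRelationshipChange`, and the `entitiesToSkipUpdate` set are taken from the description above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical, trimmed-down RequestContext showing the skip-list bypass for relationship changes. */
final class RequestContextSketch {
    private final Set<String> entitiesToSkipUpdate = new LinkedHashSet<>();
    private final Set<String> updatedEntities = new LinkedHashSet<>();

    /** Marks an entity whose own attributes did not change. */
    void skipUpdate(String guid) {
        entitiesToSkipUpdate.add(guid);
    }

    /** Regular path: honours the skip list, so unchanged entities are not reported. */
    void recordEntityUpdate(String guid) {
        if (!entitiesToSkipUpdate.contains(guid)) {
            updatedEntities.add(guid);
        }
    }

    /** Relationship path: a parent whose edges changed is reported even if its attributes did not. */
    void recordEntityUpdateForRelationshipChange(String guid) {
        updatedEntities.add(guid);
    }

    Set<String> updatedEntities() {
        return updatedEntities;
    }
}
```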

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci(MS-701): add SubAssetAddParentUpdateNotificationTest to CI integration test list

The integration-tests.yml workflow uses an explicit -Dtest= list.
Without this change, the new test would never run in CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(MS-701): rewrite test to use AtlasInProcessBaseIT instead of Docker

The CI integration tests use AtlasInProcessBaseIT (starts Atlas in-process
via Jetty with testcontainers for infra). The previous test extended
AtlasDockerIntegrationTest which requires a private atlanhq/atlas:test
Docker image not available in CI.

Rewritten to use AtlasClientV2 API with the same test scenario: create
Table, then bulk createOrUpdate with unchanged Table + new Process
referencing it, and assert the Table appears as UPDATED in the response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(MS-701): add Kafka ENTITY_UPDATE notification assertion for parent Table

The test now verifies both:
1. REST response: Table appears in updatedEntities (existing)
2. Kafka: ENTITY_UPDATE notification emitted for Table on ATLAS_ENTITIES topic

Uses ApplicationProperties to get kafka bootstrap servers (same pattern
as AsyncIngestionIntegrationTest). Polls ATLAS_ENTITIES topic filtering
by GUID + operationType + eventTime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): Validate build ran on ring branch, not just matching SHA (#6389)

The pr-label-release workflow was checking only head_sha when validating
builds, allowing releases to proceed using builds from non-ring branches
that happened to share the same commit SHA.

This caused an incident where ring-ms-864-keycloak-jwks-fix used a build
from ms-864-keycloak-jwks-internal-url (both pointing to the same SHA)
without any actual build running on the ring branch.

Add branch validation for both maven build and integration tests to ensure
the workflows actually ran on the expected ring branch.
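A minimal sketch of the tightened check: a run qualifies only if its SHA, branch, and conclusion all match. `BuildValidation` and `WorkflowRun` are hypothetical; the real validation is in the `pr-label-release` workflow, not Java.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch: accept a build only if it matches BOTH the commit SHA and the ring branch. */
final class BuildValidation {
    /** Minimal stand-in for a workflow run record. */
    static final class WorkflowRun {
        final String headSha;
        final String headBranch;
        final String conclusion;

        WorkflowRun(String headSha, String headBranch, String conclusion) {
            this.headSha = headSha;
            this.headBranch = headBranch;
            this.conclusion = conclusion;
        }
    }

    private BuildValidation() {}

    /** Returns a successful run for {@code sha} that actually ran on {@code ringBranch}, if any. */
    static Optional<WorkflowRun> findValidRun(List<WorkflowRun> runs, String sha, String ringBranch) {
        return runs.stream()
                .filter(r -> sha.equals(r.headSha))
                .filter(r -> ringBranch.equals(r.headBranch))   // a matching SHA alone is not enough
                .filter(r -> "success".equals(r.conclusion))
                .findFirst();
    }
}
```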

Made-with: Cursor

* GOV-667 | Add duplicate policy name validation for Persona entities (#6375)

* GOV-667: Validate if policy name exists or not

* GOV-667: Removed comments

* GOV-667: Added unit tests

* GOV-667: allow purposes to have same names

* GOV-667: Fix minor issues

* GOV-667: only check for persona

* GOV-667: Changes to correctly perform unit tests

* GOV-667: Resolved review comments

* GOVFOUN-235: v1 implementation for Datasets (#6172)

* GOVFOUN-235: v1 implementation for Datasets

* GOVFOUN-235: normalize datasetType

* GOVFOUN-235: implement delete and make Qn immutable

* GOVFOUN-235: block updates to element count attr

* GOVFOUN-235: allow dataset to be linked to domain

* GOVFOUN-235: fix delete type

* GOVFOUN-235: Added tests

* GOVFOUN-235: Fixed typeDefs

* GOVFOUN-235: Fix tests

* GOVFOUN-235: fix failing test

* GOVFOUN-235: Fix minor bug

* GOVFOUN-235: allow admins to edit resources

* GOVFOUN-235: Enrich dataset info for audit

* GOVFOUN-235: Changes after typeDef review

* GOVFOUN-235: fix tests

* GOVFOUN-235: Resolve reviews

* GOVFOUN-235: Reverting previous commit

* fix: (MS-609) Improving Task Lifecycle Management in Apps Team Workflows (#6395)

* updated tests

---------

Co-authored-by: Arnab Saha <arniesaha@gmail.com>
Co-authored-by: MetaClaw <metaclaw@atlan.com>
Co-authored-by: sriram-atlan <sriram.aravamuthan@atlan.com>
Co-authored-by: mothership-ai[bot] <246624273+mothership-ai[bot]@users.noreply.github.com>
Co-authored-by: Mothership Agent <mothership@atlan.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Syed <150783904+syed-atlan@users.noreply.github.com>
Co-authored-by: LijiAlex <liji.a@atlan.com>
Co-authored-by: Hitesh Khandelwal <60309732+hitk6@users.noreply.github.com>
Co-authored-by: krishnanunni-atlan <krishnanunni.m@atlan.com>
Co-authored-by: ankitpatnaik-atlan <ankit.patnaik@atlan.com>
Co-authored-by: salman-atlan <salman.khurshid@atlan.com>

* remove unwanted files

* removed checks ignore

* Testing harness extended (#6470)

- es-audit-mappings.json: add settings block with store.type=niofs and
  refresh_interval=60s so new indices are created with niofs from the start.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Move writeNiofsMigrationMarker() into finally block per review feedback

Only write the marker when both the settings update and index reopen
succeed, preventing partial migration state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
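The run-once migration described above (marker check, close, change the store type, reopen, then write the marker only on success) can be sketched as below. `AuditIndexAdmin` and its methods are hypothetical stand-ins, not the real `ESBasedAuditRepository` API. `index.store.type` is the Elasticsearch setting name, and as a static setting it is changed while the index is closed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical abstraction over the Elasticsearch calls.
interface AuditIndexAdmin {
    boolean markerExists();                          // e.g. a HEAD request for the marker document
    void closeIndex();
    void updateSettings(Map<String, String> settings);
    void openIndex();
    void writeMarker();
}

class NiofsMigration {
    /** Returns true if the migration ran, false if the marker shows it was already done. */
    static boolean ensureStoreTypeNiofs(AuditIndexAdmin admin) {
        if (admin.markerExists()) {
            return false; // cheap no-op on every later startup
        }
        boolean settingsUpdated = false;
        admin.closeIndex();
        try {
            admin.updateSettings(Map.of("index.store.type", "niofs"));
            settingsUpdated = true;
        } finally {
            // Always reopen so a failed settings update never leaves the audit index closed.
            admin.openIndex();
            // Reached only if openIndex() did not throw; write the marker only if the update also succeeded.
            if (settingsUpdated) {
                admin.writeMarker();
            }
        }
        return true;
    }
}

// Fake that records the call order, used only to exercise the sketch.
class RecordingAdmin implements AuditIndexAdmin {
    final List<String> calls = new ArrayList<>();
    private final boolean failSettings;
    private boolean marker;

    RecordingAdmin(boolean failSettings) { this.failSettings = failSettings; }

    public boolean markerExists() { return marker; }
    public void closeIndex() { calls.add("close"); }
    public void updateSettings(Map<String, String> settings) {
        if (failSettings) throw new IllegalStateException("settings update rejected");
        calls.add("settings");
    }
    public void openIndex() { calls.add("open"); }
    public void writeMarker() { marker = true; calls.add("marker"); }
}
```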

* fix(MS-701): emit parent entity UPDATE event on sub-asset relationship change (#6381)

* test(MS-701): add failing integration test for missing parent UPDATE on sub-asset add

When a sub-asset (Process) is added with a relationship to a parent (Table)
via bulk createOrUpdate, and the parent's own attributes haven't changed,
the parent entity is incorrectly excluded from the UPDATE response and
Kafka notifications.

This test demonstrates the bug: it creates a Table, then sends a bulk
request with the same unchanged Table + a new Process referencing it.
The assertion that Table appears as UPDATED will fail until the fix
is applied.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(MS-701): emit parent entity UPDATE event on sub-asset relationship change

When a sub-asset is added/deleted/restored via bulk createOrUpdate and
the parent entity's own attributes are unchanged, the parent was
incorrectly excluded from UPDATE notifications. This happened because
RequestContext.recordEntityUpdate() checks entitiesToSkipUpdate, which
blocks entities whose attributes didn't change — even when their
relationships did change.

Fix: Add recordEntityUpdateForRelationshipChange() to RequestContext
that bypasses the entitiesToSkipUpdate check (following the existing
pattern of recordEntityUpdateForNonRelationshipAttributes). Update all
call sites in EntityGraphMapper and DeleteHandlerV1 that record parent
entity updates due to relationship edge creation/deletion to use this
new method.

Affected call sites:
- EntityGraphMapper.recordEntityUpdate(vertex) — simple relationship update
- EntityGraphMapper.recordEntityUpdate(vertex, ctx, isAdd) — sub-asset add/remove
- EntityGraphMapper inverse reference update (line ~1563)
- DeleteHandlerV1.deleteEdgeReference() — both relationship and legacy edges

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci(MS-701): add SubAssetAddParentUpdateNotificationTest to CI integration test list

The integration-tests.yml workflow uses an explicit -Dtest= list.
Without this change, the new test would never run in CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(MS-701): rewrite test to use AtlasInProcessBaseIT instead of Docker

The CI integration tests use AtlasInProcessBaseIT (starts Atlas in-process
via Jetty with testcontainers for infra). The previous test extended
AtlasDockerIntegrationTest which requires a private atlanhq/atlas:test
Docker image not available in CI.

Rewritten to use AtlasClientV2 API with the same test scenario: create
Table, then bulk createOrUpdate with unchanged Table + new Process
referencing it, and assert the Table appears as UPDATED in the response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(MS-701): add Kafka ENTITY_UPDATE notification assertion for parent Table

The test now verifies both:
1. REST response: Table appears in updatedEntities (existing)
2. Kafka: ENTITY_UPDATE notification emitted for Table on ATLAS_ENTITIES topic

Uses ApplicationProperties to get kafka bootstrap servers (same pattern
as AsyncIngestionIntegrationTest). Polls ATLAS_ENTITIES topic filtering
by GUID + operationType + eventTime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
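The core of the MS-701 fix is two recording paths: one that honours `entitiesToSkipUpdate` and one that bypasses it for relationship changes. A minimal sketch follows, using a hypothetical `RequestContextSketch` class whose method names mirror the commit message. It is not the real `RequestContext`, which tracks far more state.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for RequestContext.
class RequestContextSketch {
    private final Set<String> entitiesToSkipUpdate = new LinkedHashSet<>();
    private final Set<String> updatedEntities = new LinkedHashSet<>();

    // Marks an entity whose own attributes did not change.
    void skipUpdate(String guid) { entitiesToSkipUpdate.add(guid); }

    // Attribute-driven path: suppressed for entities in the skip set.
    void recordEntityUpdate(String guid) {
        if (!entitiesToSkipUpdate.contains(guid)) {
            updatedEntities.add(guid);
        }
    }

    // Relationship-driven path: an added or removed edge is a real change to the parent,
    // so the skip set is deliberately bypassed.
    void recordEntityUpdateForRelationshipChange(String guid) {
        updatedEntities.add(guid);
    }

    Set<String> getUpdatedEntities() { return updatedEntities; }
}
```

In the test scenario from the commit, the unchanged Table lands in the skip set. Only the relationship-driven call, made when the new Process edge is created, puts it back into the updated set.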

* fix(ci): Validate build ran on ring branch, not just matching SHA (#6389)

The pr-label-release workflow was checking only head_sha when validating
builds, allowing releases to proceed using builds from non-ring branches
that happened to share the same commit SHA.

This caused an incident where ring-ms-864-keycloak-jwks-fix used a build
from ms-864-keycloak-jwks-internal-url (both pointing to the same SHA)
without any actual build running on the ring branch.

Add branch validation for both maven build and integration tests to ensure
the workflows actually ran on the expected ring branch.

Made-with: Cursor
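The real check lives in a GitHub Actions workflow, so the following is only an illustration of the selection logic in Java. `WorkflowRun` is a hypothetical class whose fields mirror the GitHub REST API run fields `head_sha`, `head_branch` and `conclusion`. It contrasts the SHA-only lookup that let the incident happen with a lookup that also requires the ring branch.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified view of a workflow run.
class WorkflowRun {
    final String headSha;
    final String headBranch;
    final String conclusion;

    WorkflowRun(String headSha, String headBranch, String conclusion) {
        this.headSha = headSha;
        this.headBranch = headBranch;
        this.conclusion = conclusion;
    }
}

class BuildValidator {
    // Old behaviour: any successful run with the same SHA was accepted, whatever its branch.
    static Optional<WorkflowRun> findBySha(List<WorkflowRun> runs, String sha) {
        return runs.stream()
                .filter(r -> r.headSha.equals(sha))
                .filter(r -> "success".equals(r.conclusion))
                .findFirst();
    }

    // Fixed behaviour: the run must also have been triggered on the expected ring branch.
    static Optional<WorkflowRun> findForRingBranch(List<WorkflowRun> runs, String sha, String ringBranch) {
        return runs.stream()
                .filter(r -> r.headSha.equals(sha))
                .filter(r -> r.headBranch.equals(ringBranch))
                .filter(r -> "success".equals(r.conclusion))
                .findFirst();
    }
}
```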

* GOV-667 | Add duplicate policy name validation for Persona entities (#6375)

* GOV-667: Validate if policy name exists or not

* GOV-667: Removed comments

* GOV-667: Added unit tests

* GOV-667: allow purposes to have same names

* GOV-667: Fix minor issues

* GOV-667: only check for persona

* GOV-667: Changes to correctly perform unit tests

* GOV-667: Resolved review comments
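A minimal sketch of the GOV-667 rule as the commit titles describe it: duplicate policy names are rejected within a Persona, while Purposes may reuse names. The class, enum and method are hypothetical. Exact, case-sensitive matching is an assumption, since the log does not say how names are compared.

```java
import java.util.List;
import java.util.Objects;

class PolicyNameValidatorSketch {
    // Hypothetical discriminator for the two access-control entity kinds.
    enum AccessControlType { PERSONA, PURPOSE }

    /** Throws if a Persona already has a policy with this name; Purposes are not checked. */
    static void validateUniqueName(AccessControlType type, List<String> existingPolicyNames, String newName) {
        if (type != AccessControlType.PERSONA) {
            return; // purposes are allowed to share policy names
        }
        for (String existing : existingPolicyNames) {
            if (Objects.equals(existing, newName)) { // assumption: exact, case-sensitive match
                throw new IllegalArgumentException("Policy name already exists: " + newName);
            }
        }
    }
}
```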

* GOVFOUN-235: v1 implementation for Datasets (#6172)

* GOVFOUN-235: v1 implementation for Datasets

* GOVFOUN-235: normalize datasetType

* GOVFOUN-235: implement delete and make Qn immutable

* GOVFOUN-235: block updates to element count attr

* GOVFOUN-235: allow dataset to be linked to domain

* GOVFOUN-235: fix delete type

* GOVFOUN-235: Added tests

* GOVFOUN-235: Fixed typeDefs

* GOVFOUN-235: Fix tests

* GOVFOUN-235: fix failing test

* GOVFOUN-235: Fix minor bug

* GOVFOUN-235: allow admins to edit resources

* GOVFOUN-235: Enrich dataset info for audit

* GOVFOUN-235: Changes after typeDef review

* GOVFOUN-235: fix tests

* GOVFOUN-235: Resolve reviews

* GOVFOUN-235: Reverting previous commit

* fix: (MS-609) Improving Task Lifecycle Management in Apps Team Workflows (#6395)

* updated tests

* fixed tests

---------

Co-authored-by: Arnab Saha <arniesaha@gmail.com>
Co-authored-by: MetaClaw <metaclaw@atlan.com>
Co-authored-by: sriram-atlan <sriram.aravamuthan@atlan.com>
Co-authored-by: mothership-ai[bot] <246624273+mothership-ai[bot]@users.noreply.github.com>
Co-authored-by: Mothership Agent <mothership@atlan.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Syed <150783904+syed-atlan@users.noreply.github.com>
Co-authored-by: LijiAlex <liji.a@atlan.com>
Co-authored-by: Hitesh Khandelwal <60309732+hitk6@users.noreply.github.com>
Co-authored-by: krishnanunni-atlan <krishnanunni.m@atlan.com>
Co-authored-by: ankitpatnaik-atlan <ankit.patnaik@atlan.com>
Co-authored-by: salman-atlan <salman.khurshid@atlan.com>

---------

Co-authored-by: Arnab Saha <arniesaha@gmail.com>
Co-authored-by: MetaClaw <metaclaw@atlan.com>
Co-authored-by: sriram-atlan <sriram.aravamuthan@atlan.com>
Co-authored-by: mothership-ai[bot] <246624273+mothership-ai[bot]@users.noreply.github.com>
Co-authored-by: Mothership Agent <mothership@atlan.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Syed <150783904+syed-atlan@users.noreply.github.com>
Co-authored-by: LijiAlex <liji.a@atlan.com>
Co-authored-by: Hitesh Khandelwal <60309732+hitk6@users.noreply.github.com>
Co-authored-by: krishnanunni-atlan <krishnanunni.m@atlan.com>
Co-authored-by: ankitpatnaik-atlan <ankit.patnaik@atlan.com>
Co-authored-by: salman-atlan <salman.khurshid@atlan.com>

* Atlas testing harness (#6483)

* added fix for deletion for <10K assets ES sync

* BulkPurgeService with crash recovery, concurrency safety, and ES cleanup

* removed unnecessary configs

* hardcoded the self graph loading to make new object

* made config fetching synchronized across threads

* fixed cancel API for the cancel + cancel and cancel + purgeTrigger cases

* forced cancel should be cleaned post cancellation

* added fix for the cancelPurge -> retrigger -> cancel case

* enforced worker queue to be called only on force cancel

* addressed comments, moved cleanup to finally

* added test harness suite for all endpoints (first commit)

* added tests for harness

* fixed tests and data for harness

* fixed tests, added more tests related to deletes

* fixed time for es sync along with flagging

* added kafka helpers and more test on entity behaviour

* added lineage and propagation tests

* added tests for lineages and all

* added tests for lineages and all

* added the pending tests for no-op

* fixed audit search index time by adding correct filters

* fixed typedef, lineage, classification and business metadata tests

* fixed glossary tests as per the UI

* checkpoint benchmark1

* increased timeouts for searches

* fixed lineage specific tests

* updated suite

* added tests for evaluator, accessor, bulk unique attr (#6345)

* Testing harness extended (#6370)

* added tests for evaluator, accessor, bulk unique attr

* added glossary and attribute test

* Testing harness extended (#6401)

* chore: remove Tags V1 dead code from propagation tasks and ClassificationAssociator (#6305)

- ClassificationPropagationTasks: remove isTagV2Enabled() branches in Add, UpdateText,
  Delete, and RefreshPropagation tasks. V2 path (Cassandra) is now always taken.
  Also removed unused previousRestrictPropagation* local vars from Add.run().
- ClassificationAssociator: remove V1-only updateClassificationText(null, allVertices)
  guarded by !isTagV2Enabled(). Remove now-unused DynamicConfigStore import.

Part 1 of Tags V1 cleanup. Refs: MS-751

Co-authored-by: MetaClaw <metaclaw@atlan.com>
Co-authored-by: sriram-atlan <sriram.aravamuthan@atlan.com>

* Fix AtlasEntityHeader constructors to preserve docId, vertexId, and superTypeNames (#6304)

The copy constructor AtlasEntityHeader(AtlasEntityHeader) and the entity-based
constructor AtlasEntityHeader(AtlasEntity) were not copying docId, vertexId, or
superTypeNames fields. When these constructors are used in the notification
pipeline (e.g., convertDiffEntityToHeader), headers with null docId propagate
to Elasticsearch, causing ES documents to lose their document sync references.
This results in assets appearing as "not found" in the UI.

Co-authored-by: Mothership Agent <mothership@atlan.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: sriram-atlan <sriram.aravamuthan@atlan.com>

* bulk purge released forced refresh (#6323)

* fix(icarus): dynamic JVM options for memory management and adjust CPU limits (#6330)

* Add atlas_vertex_index ES alias for janusgraph_vertex_index on startup (#6336)

Create a stable ES alias "atlas_vertex_index" pointing to the actual
vertex index (e.g. janusgraph_vertex_index) during startup. This allows
consumers to use a backend-agnostic index name. The alias is created
once (idempotent check on every startup) and is best-effort — failures
do not block Atlas startup.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: allow apostrophe in link URL validation regex (#6342)

* fix: allow apostrophe in link URL validation regex

Add unit tests for LinkPreProcessor URL validation.
Add branch to CI for testing.

* remove custom branch

* added tests for evaluator, accessor, bulk unique attr

* fix(helm): disable soft affinity for atlas-read cassandra-online-dc STS (#6347)

* fix(helm): disable soft affinity for atlas-read cassandra-online-dc STS

Remove multiarch preferredDuringSchedulingIgnoredDuringExecution blocks
from the nodeAffinity section of the cassandra-online-dc StatefulSet in
atlas-read. These blocks caused both soft (preferred) and hard (required)
affinity rules to coexist when multiarch was enabled, leading to mixed
affinity behavior. The STS now matches the normal atlas cassandra STS
which only uses requiredDuringSchedulingIgnoredDuringExecution in
non-Development/Enterprise deployments.

Fixes: MS-803

Co-Authored-By: Claude Code <noreply@anthropic.com>

* commit

---------

Co-authored-by: Claude Code <noreply@anthropic.com>

* added atlas mcp observability skills (#6315)

* added atlas mcp skills

* Removed hard paths in mcp.json

* docs(cohort-release): Add auto-sync check, dynamic redistribution, and release channel filtering (#6366)

- Document auto-sync safety check that skips tenants without ArgoCD auto-sync
- Add dynamic ring redistribution section (quarterly automation, data sources)
- Document release channel filtering (MAIN-BASE, GOLDEN-MAIN-BASE only)
- Add release result states explanation (success, partial_success, failed, skipped)
- Update tenant counts and asset ranges in runbook
- Add gotchas for skipped tenants and release channel exclusions

Made-with: Cursor

* added glossary and attribute test

* feat: reduce icarus memory from 4Gi to 2Gi (#6380)

* Switch entity_audits to niofs store type to free page cache for vertex index (#6324)

* Switch entity_audits ES index to niofs store type to eliminate page cache contention

entity_audits uses the default hybridfs store type, which memory-maps all segment
files at index open time. On production clusters, this consumes 19-400GB of virtual
address space per node, competing with janusgraph_vertex_index for the OS page cache
and degrading search performance.

niofs uses Java NIO FileChannel.read() instead of mmap — audit pages only enter the
page cache during active queries and are easily evictable, freeing page cache for the
vertex index that actually needs it.

Changes:
- ESBasedAuditRepository: add ensureStoreTypeNiofs() to createSession() startup flow.
  Uses a marker document (HEAD check) so the close/open migration runs exactly once
  across all pods and all future deployments (~1ms no-op on subsequent startups).
- es-audit-mappings.json: add settings block with store.type=niofs and
  refresh_interval=60s so new indices are created with niofs from the start.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Move writeNiofsMigrationMarker() into finally block per review feedback

Only write the marker when both the settings update and index reopen
succeed, preventing partial migration state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(MS-701): emit parent entity UPDATE event on sub-asset relationship change (#6381)

* test(MS-701): add failing integration test for missing parent UPDATE on sub-asset add

When a sub-asset (Process) is added with a relationship to a parent (Table)
via bulk createOrUpdate, and the parent's own attributes haven't changed,
the parent entity is incorrectly excluded from the UPDATE response and
Kafka notifications.

This test demonstrates the bug: it creates a Table, then sends a bulk
request with the same unchanged Table + a new Process referencing it.
The assertion that Table appears as UPDATED will fail until the fix
is applied.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(MS-701): emit parent entity UPDATE event on sub-asset relationship change

When a sub-asset is added/deleted/restored via bulk createOrUpdate and
the parent entity's own attributes are unchanged, the parent was
incorrectly excluded from UPDATE notifications. This happened because
RequestContext.recordEntityUpdate() checks entitiesToSkipUpdate, which
blocks entities whose attributes didn't change — even when their
relationships did change.

Fix: Add recordEntityUpdateForRelationshipChange() to RequestContext
that bypasses the entitiesToSkipUpdate check (following the existing
pattern of recordEntityUpdateForNonRelationshipAttributes). Update all
call sites in EntityGraphMapper and DeleteHandlerV1 that record parent
entity updates due to relationship edge creation/deletion to use this
new method.

Affected call sites:
- EntityGraphMapper.recordEntityUpdate(vertex) — simple relationship update
- EntityGraphMapper.recordEntityUpdate(vertex, ctx, isAdd) — sub-asset add/remove
- EntityGraphMapper inverse reference update (line ~1563)
- DeleteHandlerV1.deleteEdgeReference() — both relationship and legacy edges

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci(MS-701): add SubAssetAddParentUpdateNotificationTest to CI integration test list

The integration-tests.yml workflow uses an explicit -Dtest= list.
Without this change, the new test would never run in CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(MS-701): rewrite test to use AtlasInProcessBaseIT instead of Docker

The CI integration tests use AtlasInProcessBaseIT (starts Atlas in-process
via Jetty with testcontainers for infra). The previous test extended
AtlasDockerIntegrationTest which requires a private atlanhq/atlas:test
Docker image not available in CI.

Rewritten to use AtlasClientV2 API with the same test scenario: create
Table, then bulk createOrUpdate with unchanged Table + new Process
referencing it, and assert the Table appears as UPDATED in the response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(MS-701): add Kafka ENTITY_UPDATE notification assertion for parent Table

The test now verifies both:
1. REST response: Table appears in updatedEntities (existing)
2. Kafka: ENTITY_UPDATE notification emitted for Table on ATLAS_ENTITIES topic

Uses ApplicationProperties to get kafka bootstrap servers (same pattern
as AsyncIngestionIntegrationTest). Polls ATLAS_ENTITIES topic filtering
by GUID + operationType + eventTime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): Validate build ran on ring branch, not just matching SHA (#6389)

The pr-label-release workflow was checking only head_sha when validating
builds, allowing releases to proceed using builds from non-ring branches
that happened to share the same commit SHA.

This caused an incident where ring-ms-864-keycloak-jwks-fix used a build
from ms-864-keycloak-jwks-internal-url (both pointing to the same SHA)
without any actual build running on the ring branch.

Add branch validation for both maven build and integration tests to ensure
the workflows actually ran on the expected ring branch.

Made-with: Cursor

* GOV-667 | Add duplicate policy name validation for Persona entities (#6375)

* GOV-667: Validate if policy name exists or not

* GOV-667: Removed comments

* GOV-667: Added unit tests

* GOV-667: allow purposes to have same names

* GOV-667: Fix minor issues

* GOV-667: only check for persona

* GOV-667: Changes to correctly perform unit tests

* GOV-667: Resolved review comments

* GOVFOUN-235: v1 implementation for Datasets (#6172)

* GOVFOUN-235: v1 implementation for Datasets

* GOVFOUN-235: normalize datasetType

* GOVFOUN-235: implement delete and make Qn immutable

* GOVFOUN-235: block updates to element count attr

* GOVFOUN-235: allow dataset to be linked to domain

* GOVFOUN-235: fix delete type

* GOVFOUN-235: Added tests

* GOVFOUN-235: Fixed typeDefs

* GOVFOUN-235: Fix tests

* GOVFOUN-235: fix failing test

* GOVFOUN-235: Fix minor bug

* GOVFOUN-235: allow admins to edit resources

* GOVFOUN-235: Enrich dataset info for audit

* GOVFOUN-235: Changes after typeDef review

* GOVFOUN-235: fix tests

* GOVFOUN-235: Resolve reviews

* GOVFOUN-235: Reverting previous commit

* fix: (MS-609) Improving Task Lifecycle Management in Apps Team Workflows (#6395)

* updated tests

---------

Co-authored-by: Arnab Saha <arniesaha@gmail.com>
Co-authored-by: MetaClaw <metaclaw@atlan.com>
Co-authored-by: sriram-atlan <sriram.aravamuthan@atlan.com>
Co-authored-by: mothership-ai[bot] <246624273+mothership-ai[bot]@users.noreply.github.com>
Co-authored-by: Mothership Agent <mothership@atlan.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Syed <150783904+syed-atlan@users.noreply.github.com>
Co-authored-by: LijiAlex <liji.a@atlan.com>
Co-authored-by: Hitesh Khandelwal <60309732+hitk6@users.noreply.github.com>
Co-authored-by: krishnanunni-atlan <krishnanunni.m@atlan.com>
Co-authored-by: ankitpatnaik-atlan <ankit.patnaik@atlan.com>
Co-authored-by: salman-atlan <salman.khurshid@atlan.com>

* remove unwanted files

* removed checks gitignore

* Testing harness extended (#6470)

* chore: remove Tags V1 dead code from propagation tasks and ClassificationAssociator (#6305)

- ClassificationPropagationTasks: remove isTagV2Enabled() branches in Add, UpdateText,
  Delete, and RefreshPropagation tasks. V2 path (Cassandra) is now always taken.
  Also removed unused previousRestrictPropagation* local vars from Add.run().
- ClassificationAssociator: remove V1-only updateClassificationText(null, allVertices)
  guarded by !isTagV2Enabled(). Remove now-unused DynamicConfigStore import.

Part 1 of Tags V1 cleanup. Refs: MS-751

Co-authored-by: MetaClaw <metaclaw@atlan.com>
Co-authored-by: sriram-atlan <sriram.aravamuthan@atlan.com>

* Fix AtlasEntityHeader constructors to preserve docId, vertexId, and superTypeNames (#6304)

The copy constructor AtlasEntityHeader(AtlasEntityHeader) and the entity-based
constructor AtlasEntityHeader(AtlasEntity) were not copying docId, vertexId, or
superTypeNames fields. When these constructors are used in the notification
pipeline (e.g., convertDiffEntityToHeader), headers with null docId propagate
to Elasticsearch, causing ES documents to lose their document sync references.
This results in assets appearing as "not found" in the UI.

Co-authored-by: Mothership Agent <mothership@atlan.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: sriram-atlan <sriram.aravamuthan@atlan.com>

* bulk purge released forced refresh (#6323)

* fix(icarus): dynamic JVM options for memory management and adjust CPU limits (#6330)

* Add atlas_vertex_index ES alias for janusgraph_vertex_index on startup (#6336)

Create a stable ES alias "atlas_vertex_index" pointing to the actual
vertex index (e.g. janusgraph_vertex_index) during startup. This allows
consumers to use a backend-agnostic index name. The alias is created
once (idempotent check on every startup) and is best-effort — failures
do not block Atlas startup.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: allow apostrophe in link URL validation regex (#6342)

* fix: allow apostrophe in link URL validation regex

Add unit tests for LinkPreProcessor URL validation.
Add branch to CI for testing.

* remove custom branch

* added tests for evaluator, accessor, bulk unique attr

* fix(helm): disable soft affinity for atlas-read cassandra-online-dc STS (#6347)

* fix(helm): disable soft affinity for atlas-read cassandra-online-dc STS

Remove multiarch preferredDuringSchedulingIgnoredDuringExecution blocks
from the nodeAffinity section of the cassandra-online-dc StatefulSet in
atlas-read. These blocks caused both soft (preferred) and hard (required)
affinity rules to coexist when multiarch was enabled, leading to mixed
affinity behavior. The STS now matches the normal atlas cassandra STS
which only uses requiredDuringSchedulingIgnoredDuringExecution in
non-Development/Enterprise deployments.

Fixes: MS-803

Co-Authored-By: Claude Code <noreply@anthropic.com>

* commit

---------

Co-authored-by: Claude Code <noreply@anthropic.com>

* added atlas mcp observability skills (#6315)

* added atlas mcp skills

* Removed hard paths in mcp.json

* docs(cohort-release): Add auto-sync check, dynamic redistribution, and release channel filtering (#6366)

- Document auto-sync safety check that skips tenants without ArgoCD auto-sync
- Add dynamic ring redistribution section (quarterly automation, data sources)
- Document release channel filtering (MAIN-BASE, GOLDEN-MAIN-BASE only)
- Add release result states explanation (success, partial_success, failed, skipped)
- Update tenant counts and asset ranges in runbook
- Add gotchas for skipped tenants and release channel exclusions

Made-with: Cursor

* added glossary and attribute test

* feat: reduce icarus memory from 4Gi to 2Gi (#6380)

* Switch entity_audits to niofs store type to free page cache for vertex index (#6324)

* Switch entity_audits ES index to niofs store type to eliminate page cache contention

entity_audits uses the default hybridfs store type, which memory-maps all segment
files at index open time. On production clusters, this consumes 19-400GB of virtual
address space per node, competing with janusgraph_vertex_index for the OS page cache
and degrading search performance.

niofs uses Java NIO FileChannel.read() instead of mmap — audit pages only enter the
page cache during active queries and are easily evictable, freeing page cache for the
vertex index that actually needs it.

Changes:
- ESBasedAuditRepository: add ensureStoreTypeNiofs() to createSession() startup flow.
  Uses a marker document (HEAD check) so the close/open migration runs exactly once
  across all pods and all future deployments (~1ms no-op on subsequent startups).
- es-audit-mappings.json: add settings block with store.type=niofs and
  refresh_interval=60s so new indices are created with niofs from the start.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Move writeNiofsMigrationMarker() into finally block per review feedback

Only write the marker when both the settings update and index reopen
succeed, preventing partial migration state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(MS-701): emit parent entity UPDATE event on sub-asset relationship change (#6381)

* test(MS-701): add failing integration test for missing parent UPDATE on sub-asset add

When a sub-asset (Process) is added with a relationship to a parent (Table)
via bulk createOrUpdate, and the parent's own attributes haven't changed,
the parent entity is incorrectly excluded from the UPDATE response and
Kafka notifications.

This test demonstrates the bug: it creates a Table, then sends a bulk
request with the same unchanged Table + a new Process referencing it.
The assertion that Table appears as UPDATED will fail until the fix
is applied.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(MS-701): emit parent entity UPDATE event on sub-asset relationship change

When a sub-asset is added/deleted/restored via bulk createOrUpdate and
the parent entity's own attributes are unchanged, the parent was
incorrectly excluded from UPDATE notifications. This happened because
RequestContext.recordEntityUpdate() checks entitiesToSkipUpdate, which
blocks entities whose attributes didn't change — even when their
relationships did change.

Fix: Add recordEntityUpdateForRelationshipChange() to RequestContext
that bypasses the entitiesToSkipUpdate check (following the existing
pattern of recordEntityUpdateForNonRelationshipAttributes). Update all
call sites in EntityGraphMapper and DeleteHandlerV1 that record parent
entity updates due to relationship edge creation/deletion to use this
new method.

Affected call sites:
- EntityGraphMapper.recordEntityUpdate(vertex) — simple relationship update
- EntityGraphMapper.recordEntityUpdate(vertex, ctx, isAdd) — sub-asset add/remove
- EntityGraphMapper inverse reference update (line ~1563)
- DeleteHandlerV1.deleteEdgeReference() — both relationship and legacy edges

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci(MS-701): add SubAssetAddParentUpdateNotificationTest to CI integration test list

The integration-tests.yml workflow uses an explicit -Dtest= list.
Without this change, the new test would never run in CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(MS-701): rewrite test to use AtlasInProcessBaseIT instead of Docker

The CI integration tests use AtlasInProcessBaseIT (starts Atlas in-process
via Jetty with testcontainers for infra). The previous test extended
AtlasDockerIntegrationTest which requires a private atlanhq/atlas:test
Docker image not available in CI.

Rewritten to use AtlasClientV2 API with the same test scenario: create
Table, then bulk createOrUpdate with unchanged Table + new Process
referencing it, and assert the Table appears as UPDATED in the response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(MS-701): add Kafka ENTITY_UPDATE notification assertion for parent Table

The test now verifies both:
1. REST response: Table appears in updatedEntities (existing)
2. Kafka: ENTITY_UPDATE notification emitted for Table on ATLAS_ENTITIES topic

Uses ApplicationProperties to get kafka bootstrap servers (same pattern
as AsyncIngestionIntegrationTest). Polls ATLAS_ENTITIES topic filtering
by GUID + operationType + eventTime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): Validate build ran on ring branch, not just matching SHA (#6389)

The pr-label-release workflow was checking only head_sha when validating
builds, allowing releases to proceed using builds from non-ring branches
that happened to share the same commit SHA.

This caused an incident where ring-ms-864-keycloak-jwks-fix used a build
from ms-864-keycloak-jwks-internal-url (both pointing to the same SHA)
without any actual build running on the ring branch.

Add branch validation for both maven build and integration tests to ensure
the workflows actually ran on the expected ring branch.

Made-with: Cursor

* GOV-667 | Add duplicate policy name validation for Persona entities (#6375)

* GOV-667: Validate if policy name exists or not

* GOV-667: Removed comments

* GOV-667: Added unit tests

* GOV-667: allow purposes to have same names

* GOV-667: Fix minor issues

* GOV-667: only check for persona

* GOV-667: Changes to correctly perform unit tests

* GOV-667: Resolved review comments

* GOVFOUN-235: v1 implementation for Datasets (#6172)

* GOVFOUN-235: v1 implementation for Datasets

* GOVFOUN-235: normalize datasetType

* GOVFOUN-235: implement delete and make Qn immutable

* GOVFOUN-235: block updates to element count attr

* GOVFOUN-235: allow dataset to be linked to domain

* GOVFOUN-235: fix delete type

* GOVFOUN-235: Added tests

* GOVFOUN-235: Fixed typeDefs

* GOVFOUN-235: Fix tests

* GOVFOUN-235: fix failing test

* GOVFOUN-235: Fix minor bug

* GOVFOUN-235: allow admins to edit resources

* GOVFOUN-235: Enrich dataset info for audit

* GOVFOUN-235: Changes after typeDef review

* GOVFOUN-235: fix tests

* GOVFOUN-235: Resolve reviews

* GOVFOUN-235: Reverting previous commit
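The immutability changes make qualifiedName immutable and block updates to the element-count attribute. A minimal update guard for them is sketched below. DatasetUpdateGuard is hypothetical. "elementCount" is a placeholder for the real element-count attribute, whose name the commits do not give.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class DatasetUpdateGuard {
    // "qualifiedName" is Atlas's unique attribute; "elementCount" is a
    // hypothetical placeholder for the dataset element-count attribute.
    static final List<String> IMMUTABLE_ATTRIBUTES = List.of("qualifiedName", "elementCount");

    private DatasetUpdateGuard() {}

    /** Names of immutable attributes that the incoming update tries to change. */
    static List<String> changedImmutableAttributes(Map<String, Object> existing,
                                                   Map<String, Object> incoming) {
        List<String> violations = new ArrayList<>();
        for (String attr : IMMUTABLE_ATTRIBUTES) {
            // Attributes absent from the request are left untouched, so they are not violations
            if (incoming.containsKey(attr) && !Objects.equals(existing.get(attr), incoming.get(attr))) {
                violations.add(attr);
            }
        }
        return violations;
    }
}
```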

* fix: (MS-609) Improving Task Lifecycle Management in Apps Team Workflows (#6395)

* updated tests

* fixed tests

---------

Co-authored-by: Arnab Saha <arniesaha@gmail.com>
Co-authored-by: MetaClaw <metaclaw@atlan.com>
Co-authored-by: sriram-atlan <sriram.aravamuthan@atlan.com>
Co-authored-by: mothership-ai[bot] <246624273+mothership-ai[bot]@users.noreply.github.com>
Co-authored-by: Mothership Agent <mothership@atlan.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Syed <150783904+syed-atlan@users.noreply.github.com>
Co-authored-by: LijiAlex <liji.a@atlan.com>
Co-authored-by: Hitesh Khandelwal <60309732+hitk6@users.noreply.github.com>
Co-authored-by: krishnanunni-atlan <krishnanunni.m@atlan.com>
Co-authored-by: ankitpatnaik-atlan <ankit.patnaik@atlan.com>
Co-authored-by: salman-atlan <salman.khurshid@atlan.com>

* fixed qn and glossary move

* removed atlas.java, checked out from master

* removed atlas.java, checked out from master

* added readMe.md

* added claude skill to run tests and do review

* added html reporter

* back-merged from master

* removed purge, added tenant checks

* fixed production guardrails

* ms-696: Es sync redesign for tags propagation flow

* ms-696: Removed the DLQReplayService

* ms-696: Hardening DLQ flow

* ms-696: Resolve PR comments

* ms-696: Removed bad code that can cause OOM issue

* ms-696: Fix the source Tag issue

* ms-696: Resolved comments

* ms-696: Resolved prometheus review suggestion

* ms-696: Better diff for this file

* ms-696: Added comments back

* ms-696: Removed custom image

* ms-696: Ring Dummy commit

* ms-696: Add topic to helm config

* ms-696: Added adaptive retry

* ms-696: Reduce async read batch size to 30

* ms-696: Handle task status for ES failures

* ms-696: Updated doc

* ms-696: Resolved PR comments

* MS-696 : Tag Denorm DLQ Replay Service (#6496)

* ms-696: DLQ Replay Service

* ms-696: Added Task reference in DLQ

* ms-696: Added Batching in DLQ replay service

* ms-696: Prevent memory leak

* ms-696: Handled Kafka connection error

* ms-696: Set maxPollRecords to 1 default value

* ms-696: Resolved PR comments and added metrics for consumer

* ms-696: Resolved indefinite retry error

* ms-696: Resolved metric errors

* ms-696: Index Task ES status

* dummy commit

---------

Co-authored-by: Krishnanunni M <krishnanunni.m@atlan.com>
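The commit titles do not describe the replay loop itself. As a generic illustration of the usual fix for an indefinite-retry bug, here is a bounded per-message retry over one batch. BoundedReplay and its parameters are hypothetical. This is not the service's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class BoundedReplay {
    private BoundedReplay() {}

    /** Result of replaying one batch: what succeeded and what ran out of attempts. */
    record Outcome(List<String> succeeded, List<String> exhausted) {}

    /**
     * Replays each message at most maxAttempts times. handler returns true on
     * success. Messages that still fail are reported instead of retried forever.
     */
    static Outcome replayBatch(List<String> batch, Predicate<String> handler, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        List<String> ok = new ArrayList<>();
        List<String> exhausted = new ArrayList<>();
        for (String msg : batch) {
            boolean done = false;
            for (int attempt = 1; attempt <= maxAttempts && !done; attempt++) {
                done = handler.test(msg);
            }
            (done ? ok : exhausted).add(msg);
        }
        return new Outcome(ok, exhausted);
    }
}
```

Messages that exhaust their attempts go back to the caller instead of blocking the consumer. What the real service does with them is not stated in these commits.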
…#6526)

* fix: (ms-928) Native ES nested type mapping support in typedef seeder

* fix: add indexTypeESMapping to equals(), hashCode(), toString() in AtlasAttributeDef

* fix: address review comments — fatal CREATE, non-fatal UPDATE, fix IndexRepairConsumer build

- Split applyESNestedMappings into CREATE (fatal) and UPDATE (non-fatal) paths
- Add INDEX_REPAIR_CONSUMER_ENABLED, INDEX_REPAIR_BATCH_SIZE, INDEX_REPAIR_BATCH_DELAY_MS to AtlasConfiguration
- Add INDEX_REPAIR_CONSUMER to ActiveStateChangeHandler.HandlerOrder
- Add REINDEX_REPAIRED_ATTRIBUTES to AtlasTaskType enum
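The CREATE/UPDATE split can be sketched as a mode switch around the mapping call. NestedMappingApplier is hypothetical, and so is the Consumer that stands in for the Elasticsearch put-mapping request.

```java
import java.util.function.Consumer;
import java.util.logging.Logger;

final class NestedMappingApplier {
    enum Mode { CREATE, UPDATE }

    private static final Logger LOG = Logger.getLogger(NestedMappingApplier.class.getName());

    private NestedMappingApplier() {}

    /**
     * Applies the mapping for one attribute and returns true on success.
     * CREATE failures propagate (fatal); UPDATE failures are logged and skipped.
     */
    static boolean apply(Mode mode, String attributeName, Consumer<String> putMapping) {
        try {
            putMapping.accept(attributeName);
            return true;
        } catch (RuntimeException e) {
            if (mode == Mode.CREATE) {
                throw new IllegalStateException("Failed to apply nested mapping for " + attributeName, e);
            }
            LOG.warning("Skipping nested mapping for " + attributeName + ": " + e.getMessage());
            return false;
        }
    }
}
```

One plausible reason for the asymmetry is that a type created without its nested mapping would be indexed incorrectly from the start. A failed update, by contrast, leaves the previously applied mapping in place.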

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove IndexRepairConsumer constants — not part of MS-928 scope

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
)

* feat: add bootstrap entity policies for Context Studio types

Add READ and CUD bootstrap policies for ContextRepository and
ContextArtifact entity types. READ access for all roles, CUD
restricted to admin and API token access.

* feat: add relationship policy for ContextRepository → ContextArtifact

Allows admin and API tokens to add/update/remove the containment
relationship between ContextRepository and ContextArtifact entities.

* feat: separate Skill + Context entity and relationship policies

Split into independent policy sets:
- READ/CUD_SKILL_ENTITIES for Skill + SkillArtifact
- READ/CUD_CONTEXT_ENTITIES for ContextRepository + ContextArtifact
- LINK_SKILL_TO_SKILL_ARTIFACT relationship policy (new)
- LINK_CONTEXT_REPOSITORY_TO_ARTIFACT relationship policy (existing)
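The access split for the Context types grants READ to everyone and limits create/update/delete to admins and API tokens. It reduces to a small decision function. Atlas bootstrap policies are JSON definitions, so this Java version only illustrates the intended outcome. The role strings "admin" and "api-token" are hypothetical labels.

```java
import java.util.Set;

final class ContextPolicySketch {
    enum Action { READ, CREATE, UPDATE, DELETE }

    // Entity types named in the commit; role labels are hypothetical.
    static final Set<String> CONTEXT_TYPES = Set.of("ContextRepository", "ContextArtifact");
    static final Set<String> CUD_ROLES = Set.of("admin", "api-token");

    private ContextPolicySketch() {}

    static boolean isAllowed(String typeName, Action action, String role) {
        if (!CONTEXT_TYPES.contains(typeName)) {
            return false; // not covered by these bootstrap policies
        }
        if (action == Action.READ) {
            return true; // READ is granted to every role
        }
        return CUD_ROLES.contains(role); // CUD only for admins and API tokens
    }
}
```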
…-665)

When a new entity (e.g., DataContract) is created in a bulk request alongside
a relationship attribute update on another entity (e.g., setting dataContractLatest
on a Table), the relationship target has a temporary GUID that hasn't been resolved
to the assigned GUID yet. This causes the vertex lookup to fail and the relationship
edge to never be created.

The fix mirrors the existing pattern in mapSoftRefValue() (line 1493) which already
handles this correctly by checking context.getGuidAssignments() for the temporary-to-
assigned GUID mapping.
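The fix depends on lookup order. A possibly temporary GUID is first resolved through the request's GUID assignments, and only then is the vertex looked up. Below is a minimal sketch in which plain maps stand in for context.getGuidAssignments() and for the graph. RelationshipTargetLookup is a hypothetical name.

```java
import java.util.Map;
import java.util.Optional;

final class RelationshipTargetLookup {
    private RelationshipTargetLookup() {}

    /** Temporary GUID -> assigned GUID if one exists; otherwise the input unchanged. */
    static String resolveGuid(String guid, Map<String, String> guidAssignments) {
        return guidAssignments.getOrDefault(guid, guid);
    }

    /** Looks up the relationship target only after resolving a possibly temporary GUID. */
    static Optional<String> findTarget(String guid,
                                       Map<String, String> guidAssignments,
                                       Map<String, String> vertexByGuid) {
        return Optional.ofNullable(vertexByGuid.get(resolveGuid(guid, guidAssignments)));
    }
}
```

Atlas clients typically give unsaved entities temporary GUIDs that start with "-". Without the resolve step, a lookup for such a GUID finds nothing, and the edge is silently skipped.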

Impact: Fixes SDK-created DataContracts where dataContractLatest relationship was
not being set, causing orphaned contracts invisible from the asset's contract tab.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…odes

Add processDelete override to ContractPreProcessor with two modes:
- Delete all versions (default): cascade-deletes all contract versions
  for an asset, cleans up hasContract attribute
- Delete latest only (x-atlan-contract-delete-scope: single header):
  deletes only the latest version, promotes previous version
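The following shows one way the mode could be selected from the request header. ContractDeleteScope is hypothetical. The sketch assumes header names arrive lower-cased in the map. As a design choice not stated in the commit, it rejects unknown values instead of falling back to the destructive delete-all default.

```java
import java.util.Map;

final class ContractDeleteScope {
    enum Scope { ALL_VERSIONS, LATEST_ONLY }

    static final String HEADER = "x-atlan-contract-delete-scope";

    private ContractDeleteScope() {}

    /**
     * No header: delete all versions (the default). "single": delete only the
     * latest version. Anything else is rejected rather than treated as delete-all.
     */
    static Scope fromHeaders(Map<String, String> headers) {
        String value = headers.get(HEADER);
        if (value == null || value.isBlank()) {
            return Scope.ALL_VERSIONS;
        }
        if ("single".equalsIgnoreCase(value.trim())) {
            return Scope.LATEST_ONLY;
        }
        throw new IllegalArgumentException("Unsupported " + HEADER + " value: " + value);
    }
}
```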

Patterns follow PersonaPreProcessor cascade delete and
ConnectionPreProcessor soft-delete skip.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… (DQ-665)

When deleting only the latest contract version, the dataContractLatest
relationship edge was left in DELETED state because JanusGraph doesn't
restore previously-replaced edges. This caused the UI to show "something
went wrong" on the asset's contract tab.

Fix: restoreAssetContractPointers() uses entityStore.createOrUpdate() to
re-establish the dataContractLatest (and dataContractLatestCertified if
VERIFIED) relationship edges pointing to the previous version.
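The set of pointers to re-establish can be sketched as a pure function of the promoted version. ContractPointerRestore and the ContractVersion record are hypothetical, and reading VERIFIED from a certificateStatus field is an assumption. The commit does not say what happens to dataContractLatestCertified when the promoted version is not VERIFIED. The sketch therefore leaves that attribute out of the update.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical view of the contract version being promoted.
record ContractVersion(String guid, String certificateStatus) {}

final class ContractPointerRestore {
    private ContractPointerRestore() {}

    /** Relationship attributes on the asset to re-point at the promoted version. */
    static Map<String, String> pointersFor(ContractVersion promoted) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("dataContractLatest", promoted.guid());
        if ("VERIFIED".equals(promoted.certificateStatus())) {
            attrs.put("dataContractLatestCertified", promoted.guid());
        }
        return attrs;
    }
}
```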

Also adds __state=ACTIVE filter to getSecondLatestVersion ES query to
avoid matching soft-deleted contracts.
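Here is the effect of the __state=ACTIVE filter, shown in memory rather than as an Elasticsearch query. ContractDoc and ContractVersions are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified contract document as returned by the search.
record ContractDoc(String guid, int version, String state) {}

final class ContractVersions {
    private ContractVersions() {}

    /** Second-highest version among ACTIVE contracts; soft-deleted ones are ignored. */
    static Optional<ContractDoc> secondLatestActive(List<ContractDoc> docs) {
        return docs.stream()
                .filter(d -> "ACTIVE".equals(d.state()))
                .sorted(Comparator.comparingInt(ContractDoc::version).reversed())
                .skip(1)
                .findFirst();
    }
}
```

Without the filter, a soft-deleted highest version would still take the top slot. The method would then return the latest live version instead of the one before it.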

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
shivahanumanthula-atlan and others added 3 commits April 17, 2026 14:57
- Save/restore skipAuthorizationCheck original value instead of
  hardcoding false (matches StakeholderTitlePreProcessor pattern)
- Use deleteByIds batch instead of per-element deleteById loop
- Remove branch from maven.yml CI trigger

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Will remove before merge to master.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
shivahanumanthula-atlan and others added 4 commits April 17, 2026 17:07
Without this filter, soft-deleted contracts could be returned as the
"current version", causing processSingleVersionDelete to incorrectly
reject valid delete operations.
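The check that was misfiring can be sketched as follows. SingleVersionDeleteCheck and ContractState are hypothetical. The commit message implies, but does not state outright, that single-version mode may only delete the current ACTIVE version.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified contract entry.
record ContractState(String guid, int version, String state) {}

final class SingleVersionDeleteCheck {
    private SingleVersionDeleteCheck() {}

    /** Current version = highest version among ACTIVE contracts only. */
    static Optional<ContractState> currentVersion(List<ContractState> contracts) {
        return contracts.stream()
                .filter(c -> "ACTIVE".equals(c.state()))
                .max(Comparator.comparingInt(ContractState::version));
    }

    /** A latest-only delete is allowed only when the target is the current ACTIVE version. */
    static boolean canDeleteLatestOnly(String targetGuid, List<ContractState> contracts) {
        return currentVersion(contracts).map(c -> c.guid().equals(targetGuid)).orElse(false);
    }
}
```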

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…owed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…DQ-665)

Reverts the graphHelper.getOrCreateEdge refactor. The createOrUpdate
approach for restoring asset contract pointers is an established pattern
used by other preprocessors. Re-adds CI branch trigger for image build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
shivahanumanthula-atlan and others added 3 commits April 20, 2026 17:55
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>