feat: add @payloadcms-vectorize/mongodb adapter#52

Merged
techiejd merged 55 commits into main from feat/mongodb-adapter
Apr 26, 2026
Conversation

@techiejd
Owner

Summary

  • New @payloadcms-vectorize/mongodb adapter targets MongoDB Atlas + self-hosted Community 8.2+ via unified $vectorSearch.
  • WHERE-clause parity with the PG adapter: pre-filter for equals/not_equals/in/not_in/gt/gte/lt/lte/exists/and/or; post-filter for like/contains/all.
  • Local dev + CI use mongodb/mongodb-atlas-local Docker image — no Atlas account or secrets.
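
The pre-filter/post-filter split described above can be pictured as a classification by operator. The following is a minimal sketch, not the adapter's actual `convertWhereToMongo`: the `Condition` shape and the `splitWhere` name are hypothetical, and the `and`/`or` recursion and `id`→`_id` casting mentioned in the commits are left out.

```typescript
// Hypothetical flat condition shape, used for illustration only.
type Condition = { field: string; op: string; value: unknown };

// Operators the PR description lists as pre-filterable (and/or omitted here).
const PRE_FILTER_OPS = new Set([
  'equals', 'not_equals', 'in', 'not_in', 'gt', 'gte', 'lt', 'lte', 'exists',
]);
// Operators the PR description lists as post-filter only.
const POST_FILTER_OPS = new Set(['like', 'contains', 'all']);

// Route each condition to the pre-filter or post-filter bucket.
function splitWhere(conditions: Condition[]): { pre: Condition[]; post: Condition[] } {
  const pre: Condition[] = [];
  const post: Condition[] = [];
  for (const c of conditions) {
    if (PRE_FILTER_OPS.has(c.op)) pre.push(c);
    else if (POST_FILTER_OPS.has(c.op)) post.push(c);
    else throw new Error(`Unsupported operator: ${c.op}`);
  }
  return { pre, post };
}
```

In this sketch, pre-filter conditions would go into the `$vectorSearch` filter, while post-filter conditions would be evaluated on the returned documents.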

Test plan

  • pnpm test:adapters:mongodb passes locally (79/79)
  • pnpm test:adapters:pg (61/61) and pnpm test:adapters:cf (90/90) still pass (no regressions)
  • pnpm build:types:all passes
  • Spec: docs/superpowers/specs/2026-04-25-mongodb-adapter.md
  • Plan: docs/superpowers/plans/2026-04-25-mongodb-adapter.md

techiejd added 30 commits April 25, 2026 16:25
Single adapter targeting Atlas (GA) and self-hosted Community 8.2+ via
a unified $vectorSearch API. Documents public API, data layout, method
semantics, WHERE clause pre/post-filter split, index lifecycle, dev/CI
environment (mongodb-atlas-local Docker image), test plan, and
acceptance criteria.

Reviewed by spec-document-reviewer subagent (status: Approved); five
advisory recommendations folded in.
…setup:mongodb, test:adapters:mongodb, build:adapters:mongodb), placed adjacent to existing PG/CF siblings
20 bite-sized TDD tasks covering: package skeleton, escapeRegExp,
types, lazy MongoClient, convertWhereToMongo (pre/post-filter split,
and/or recursion, id→_id ObjectId casting, evaluatePostFilter),
ensureSearchIndex with definition-mismatch detection, storeChunk,
search via $vectorSearch aggregation, deleteChunks/hasEmbeddingVersion,
docker-compose, compliance/where/integration suites, README,
root build+test wiring, changeset registration, CI job, end-to-end
verification.
…Chunk

Both pg and mongodb adapters previously spread data.extensionFields after
sourceCollection/docId/chunkIndex/chunkText/embeddingVersion, allowing
user-supplied extension data to silently overwrite the canonical reserved
values. Move the spread to the front so reserved fields always win at
runtime — defense-in-depth on top of the existing config-time field-name
collision check in createEmbeddingsCollection.
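
The effect of the spread order can be shown with a small sketch. The `ChunkData` type and `buildChunkDoc` function are hypothetical stand-ins, not the adapters' real code; only the field names come from the commit message.

```typescript
// Hypothetical input shape; field names follow the commit message.
type ChunkData = {
  sourceCollection: string;
  docId: string;
  chunkIndex: number;
  chunkText: string;
  embeddingVersion: string;
  extensionFields?: Record<string, unknown>;
};

function buildChunkDoc(data: ChunkData): Record<string, unknown> {
  return {
    // Spread first: later keys overwrite earlier ones, so reserved fields win.
    ...data.extensionFields,
    sourceCollection: data.sourceCollection,
    docId: data.docId,
    chunkIndex: data.chunkIndex,
    chunkText: data.chunkText,
    embeddingVersion: data.embeddingVersion,
  };
}
```

With the spread placed last instead, an extension field named `docId` would replace the canonical value without any error.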
…cing NaN

A search result without a numeric score is meaningless, because result
ordering depends on it. Replace the silent Number() coercion with an
explicit guard that throws if doc.score is not a number, with an
actionable message pointing at the $project stage.
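
A guard of this kind might look like the sketch below. The `requireNumericScore` name and the error wording are assumptions. The extra `NaN` check is also an addition of this sketch, since `typeof NaN` is `'number'`.

```typescript
// Hypothetical helper: return the score or fail loudly instead of coercing.
function requireNumericScore(doc: { score?: unknown }): number {
  // The NaN check is this sketch's assumption; the commit only says "not a number".
  if (typeof doc.score !== 'number' || Number.isNaN(doc.score)) {
    throw new Error(
      'Search result has no numeric score; check that the $project stage includes the score field',
    );
  }
  return doc.score;
}
```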
…compliance suite

Atlas Local rejects createSearchIndex against a non-existent collection
("Collection 'X' does not exist"). Materialize the collection idempotently
via db.createCollection in indexes.ts before the first search-index
creation. Atlas Cloud is more lenient, but the adapter must work in both.

Also lands the compliance suite (12 tests across getConfigExtension,
storeChunk, search, deleteChunks, hasEmbeddingVersion) that surfaced
this gap, plus shared dev/specs/{constants,utils}.ts helpers.
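
One way to make `createCollection` idempotent is to ignore the server's "NamespaceExists" error (code 48). The sketch below shows that approach. It assumes a minimal `DbLike` interface instead of the driver's `Db` type, and the source does not say whether indexes.ts does it this way.

```typescript
// Minimal structural stand-in for the driver's Db; only the call used here.
interface DbLike {
  createCollection(name: string): Promise<unknown>;
}

// MongoDB server error code for "NamespaceExists".
const NAMESPACE_EXISTS = 48;

// Create the collection if missing; treat "already exists" as success.
async function ensureCollectionExists(db: DbLike, name: string): Promise<void> {
  try {
    await db.createCollection(name);
  } catch (err) {
    const code = (err as { code?: number }).code;
    if (code !== NAMESPACE_EXISTS) throw err; // rethrow anything unexpected
  }
}
```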
…isibility lag, PENDING-state poll

Three plan-side bugs surfaced when the WHERE + integration suites ran
against the live Atlas Local container:

1. performVectorSearch helper default limit=100 violates Atlas's
   limit <= numCandidates (pool is numCandidates: 50). Lowered to 10.
2. ~1s eventual-consistency lag between insertOne and $vectorSearch
   visibility even after index READY — added 1200ms waits in WHERE
   beforeAll and integration immediate-search test.
3. Manually-created search index returns in PENDING; ensureSearchIndex
   treats non-READY/BUILDING as unexpected — added poll loop in the
   conflicting-definition test before triggering the conflict.
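
The first item's constraint (`limit <= numCandidates`) can be checked before the pipeline is built. In the sketch below, `buildVectorSearchStage` and its options type are hypothetical names, while the stage fields (`index`, `path`, `queryVector`, `numCandidates`, `limit`) are those of `$vectorSearch`.

```typescript
// Hypothetical options type for illustration.
type VectorSearchOptions = {
  index: string;
  path: string;
  queryVector: number[];
  limit: number;
  numCandidates: number;
};

// Build a $vectorSearch stage, rejecting limit > numCandidates up front.
function buildVectorSearchStage(opts: VectorSearchOptions) {
  if (opts.limit > opts.numCandidates) {
    throw new Error(
      `limit (${opts.limit}) must be <= numCandidates (${opts.numCandidates})`,
    );
  }
  return {
    $vectorSearch: {
      index: opts.index,
      path: opts.path,
      queryVector: opts.queryVector,
      numCandidates: opts.numCandidates,
      limit: opts.limit,
    },
  };
}
```

With `numCandidates: 50`, a `limit` of 100 is rejected here, while the lowered default of 10 passes.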
techiejd added 25 commits April 25, 2026 22:09
Task 4 stripped uri from _mongoConfig but compliance.spec.ts:29 still
asserted ext.custom._mongoConfig.uri === MONGO_URI, breaking the test
suite. Replace with a positive `not.toHaveProperty('uri')` assertion
so the security invariant is locked: re-introducing uri on the persisted
config would now fail this test with a clear "expected ... to not have
property 'uri'" message.

Manual revert proof performed (uri added back in src/index.ts → test
fails with expected message → restored).
Code-review follow-ups from Task 5:
- injectDbName now throws on URIs with a path component (e.g. SRV
  cluster strings carrying a default DB), instead of silently producing
  invalid double-path URIs like mongodb+srv://.../myapp/test.
- Drop unused dropTestDb import in integration.spec.ts.
- Drop the (payload as any).destroy() escape hatch in teardownDbs;
  BasePayload.destroy() is on the public type.
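
The `injectDbName` behavior in the first item could be sketched as follows. This is an illustrative reimplementation, not the test helper itself: the regular expression, the error wording, and the choice to treat a bare trailing `/` as "no path" are all assumptions.

```typescript
// Append a database name to a MongoDB URI, refusing URIs that already carry a path.
function injectDbName(uri: string, dbName: string): string {
  // Groups: scheme + authority, optional path, optional query string.
  const match = /^(mongodb(?:\+srv)?:\/\/[^/?]+)(\/[^?]*)?(\?.*)?$/.exec(uri);
  if (!match) throw new Error('Not a MongoDB connection string');
  const [, authority, path = '', query = ''] = match;
  if (path !== '' && path !== '/') {
    // Appending here would yield a double-path URI such as .../myapp/test.
    throw new Error(`URI already has a path component "${path}"`);
  }
  return `${authority}/${dbName}${query}`;
}
```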
Throws "limit must be a positive integer" before reaching $vectorSearch
when limit is 0, negative, NaN, or non-integer. Without this guard the
driver returns an opaque "Executor error during aggregate command"
which is hard to trace back to caller input.
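
A check matching that description is sketched below. The `assertValidLimit` name is hypothetical, and the error message is the one quoted in the commit.

```typescript
// Reject 0, negatives, NaN, and non-integers before building the pipeline.
function assertValidLimit(limit: number): void {
  // Number.isInteger is false for NaN, Infinity, and fractional values.
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new Error('limit must be a positive integer');
  }
}
```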
ensureSearchIndex polls listSearchIndexes every 1s until status === 'READY'
after creating a new index. Adds a fake-timers test that drives the mock
through []/BUILDING/BUILDING/READY and asserts createSearchIndex is called
once and listSearchIndexes is called 4 times.

Manual revert proof: relaxing the polling guard to also early-return on
BUILDING makes the test fail with "expected spy to be called 4 times,
but got 2 times".
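
The polling step might be sketched as below, assuming an injected `listIndexes` function instead of the driver call. The `waitForIndexReady` name, the injectable `sleep`, and the `maxAttempts` cap are assumptions of this sketch. The commit describes only the 1s interval and the READY condition.

```typescript
// Hypothetical minimal view of a search index listing entry.
type SearchIndexInfo = { name: string; status: string };

// Poll until the named index reports READY, sleeping 1s between attempts.
async function waitForIndexReady(
  listIndexes: () => Promise<SearchIndexInfo[]>,
  indexName: string,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  maxAttempts = 60, // assumed upper bound, not stated in the source
): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const index = (await listIndexes()).find((i) => i.name === indexName);
    if (index?.status === 'READY') return; // BUILDING and PENDING keep polling
    await sleep(1000);
  }
  throw new Error(`Search index "${indexName}" did not become READY`);
}
```

Passing `sleep` as a parameter lets a test drive the loop without real delays, similar in spirit to the fake-timers test described above.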
…rdering

mongot is free to return latestDefinition with the fields array in any
order and with object property keys in any order. The previous
JSON.stringify-based comparison flagged these as 'different definition'
and threw, forcing users to drop a perfectly valid index.

Canonicalize before compare:
- sort all object keys
- sort the fields array by canonical-JSON of each field

Other arrays (in case the schema grows later) keep their order so a real
ordered list can't be silently swallowed.

Tests:
- positive: a reordered-but-equivalent definition is accepted, no
  createSearchIndex call.
- negative: a definition with a different similarity still throws
  /different definition/.

Manual revert proofs:
- restoring JSON.stringify makes the positive test fail with
  /different definition/.
- stubbing definitionsEqual to always-true makes the negative test fail
  ('promise rejected' vs resolved undefined).
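
The canonicalize-then-compare approach can be sketched as follows. The `Json` type and the `canonicalize` function are illustrative, and only the `definitionsEqual` name appears in the commit, so the adapter's real implementation may differ.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Sort object keys everywhere; sort only the array stored under "fields".
function canonicalize(value: Json, key?: string): Json {
  if (Array.isArray(value)) {
    const items = value.map((v) => canonicalize(v));
    if (key !== 'fields') return items; // other arrays keep their order
    return items.sort((a, b) => {
      const sa = JSON.stringify(a);
      const sb = JSON.stringify(b);
      return sa < sb ? -1 : sa > sb ? 1 : 0;
    });
  }
  if (value !== null && typeof value === 'object') {
    const out: { [key: string]: Json } = {};
    for (const k of Object.keys(value).sort()) {
      out[k] = canonicalize(value[k] as Json, k);
    }
    return out;
  }
  return value;
}

function definitionsEqual(a: Json, b: Json): boolean {
  return JSON.stringify(canonicalize(a)) === JSON.stringify(canonicalize(b));
}
```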
…, tuning, contributing, sibling parity

Addresses /judge-readme punch list:
- Add npm/CI/license/Payload badges and TOC
- Add Prerequisites section (Atlas / self-hosted / driver / Payload / Node)
- Quick Start now wires plugin + collections and includes a "Verify it works" curl block (ingest + search)
- Soften "GA on Atlas" claim — status callout names mongodb-atlas-local CI; surfaced again under Limitations
- Reconcile post-filter info: mechanism canonical in "WHERE clause behavior", Limitations links to it
- Add "How it works", "Who is this for?", and Tuning numCandidates/forceExact sections
- Add Multiple Knowledge Pools worked example
- Add Contributing section with test setup/teardown commands and src layout map
- Link CHANGELOG.md
- Available Adapters table: add mongodb row
- Installation snippet: add mongodb pnpm add line
- Quick Start adapter docs links: add mongodb pointer
- Migrations: note mongodb auto-ensures the $vectorSearch index
- Adapter Configuration: link mongodb API Reference
- Adapter parity callout: describe mongodb pre/post filter split
- Roadmap: move mongodb from "Help wanted" to "Already shipped"
- adapters/README.md Available Adapters: add mongodb row
@techiejd techiejd merged commit 39076db into main Apr 26, 2026
8 checks passed
@github-actions github-actions Bot mentioned this pull request Apr 26, 2026