Merge mindsdb/engine into fork (catch-up, 26.0.1 → 26.2.0) #83

Open: gabrielbressan-tfy wants to merge 331 commits into main from merge/engine-catchup


@gabrielbressan-tfy

Collaborator

Why this PR

Our fork was tracking mindsdb/mindsdb, but that repo has been repurposed into the "Minds Platform" product (a different codebase: backend/ + a frontend submodule). The actual Python query engine we forked now lives at mindsdb/engine. The two still share full git history, so this is an ordinary catch-up merge — not a manual port.

This PR re-points upstream to mindsdb/engine and merges ~330 engine commits into our fork, reconciling them with all of our custom work (handlers, planner/select fixes).

  • Version: 26.0.1 → 26.2.0
  • Merged commits: ~330 from mindsdb/engine
  • Merge base: aeca4da7e (shared ancestor — clean 3-way merge)

Going forward, syncing is just git fetch upstream && git merge upstream/main (upstream = engine now). The hard reconciliation is baked into history with this PR.


What we kept (our work — fully preserved)

Custom handlers (engine does not ship these; they stay in-tree):
google_analytics · github · gmail · google_calendar · google_search · hubspot · bigquery · s3 · s3vectors · ms_one_drive · litellm · langsmith · xero · linkedin_ads · meta_ad_library · microsoft_ads · multi_format_api · sentry

Custom planner / select behavior (documented in CLAUDE.md):

  • JOIN column collection includes WHERE-clause columns
  • item.conditions filter exclusion + IS NULL guard
  • CTE clearing after plan_cte
  • order_by not forwarded to API handlers
  • applied-WHERE-column stripping in SubSelectStep
  • _collect_identifiers recursive column extraction

Framework stack: kept langchain (our RAG/reranker depend on it via the MindsDB abstraction). Engine has migrated away from langchain — see dependency notes below.


What comes with the upgrade (engine 26.0.1 → 26.2.0)

Architecture change — community-handler model

Engine moved ~140 standard handlers (most DB connectors + niche APIs: slack, twitter, youtube, mongodb, elasticsearch, clickhouse, etc.) out of the core tree into an on-demand fetch model:

  • New module mindsdb/integrations/utilities/community_handler_fetcher.py
  • Off by default — gated by MINDSDB_COMMUNITY_HANDLERS=true
  • When enabled: fetches an index from a separate repo (mindsdb/mindsdb-community-handlers) and lazily downloads each handler on first use into a local storage dir
  • ⚠️ That repo appears to be private currently (uses GITHUB_TOKEN), so those handlers may not be fetchable without access. If we need one of the dropped handlers, restore it from git history rather than relying on the fetcher.
  • In-tree handler count: 302 (old main) → 41 (engine core) → 68 (engine core + our 28 custom/kept)
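
The gating described above can be sketched roughly as follows. This is a minimal illustration, assuming only what the PR states: that the fetcher is off unless `MINDSDB_COMMUNITY_HANDLERS=true` is set. The helper names `community_handlers_enabled` and `resolve_handler`, and the idea of passing the in-tree handler set explicitly, are hypothetical and not the engine's actual API.

```python
import os


def community_handlers_enabled(env=None):
    """Hypothetical gate: True only when MINDSDB_COMMUNITY_HANDLERS is 'true'."""
    env = os.environ if env is None else env
    return env.get("MINDSDB_COMMUNITY_HANDLERS", "").strip().lower() == "true"


def resolve_handler(name, in_tree, env=None):
    """Return where a handler would come from: 'in-tree', 'community', or None."""
    if name in in_tree:
        # Custom/kept handlers are loaded from the repo and never need the fetcher.
        return "in-tree"
    if community_handlers_enabled(env):
        # With the flag on, the handler would be lazily downloaded on first use.
        return "community"
    # Dropped handler and fetcher off: restore it from git history instead.
    return None
```

With the flag unset, a dropped handler such as slack resolves to `None`, which is the case where restoring from git history applies.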

Security fixes

Features / improvements

Dependency bumps (came with engine; a few adjusted for our stack)

| Package | Old | New |
|---|---|---|
| flask / werkzeug | 3.0.3 / 3.0.6 | 3.1.3 / 3.1.6 |
| pandas | 2.2.3 | 2.3.1 |
| cryptography | >=35.0 | >=46.0.5 |
| redis | 5.x | 6.4.0 |
| protobuf | 4.24.4 | 6.33.5 |
| mind-castle | >=0.4.9 | 0.5.0 |
| pyjwt / orjson / python-multipart | | bumped |
| added | | xlrd, aiobotocore, google-genai, faiss-cpu (KB default) |

Decisions made during the merge

| Area | Decision |
|---|---|
| pydantic-ai | Engine bumped to ==1.77.0 (needs openai>=2.29), which conflicts with our langchain-openai==0.3.6 (openai<2.0). Engine dropped langchain; we depend on it. Pinned pydantic-ai>=0.0.14 → resolves to 1.30.1 (openai-1.x compatible). Kept our duckdb==1.4.0, aipdf==0.0.6.3. |
| Response types | Our DataHubResponse (inherited from old-upstream PRs mindsdb#10716/mindsdb#10632, never our own work) was removed; engine never had it. Switched to engine's OkResponse/TableResponse/DataHandlerResponse. |
| plan_join.py | Kept our JOIN fixes and restored engine's optimize_inner_join (they only collided textually). Adopted engine's check_use_limit (ours had regressed it: it dropped the feature and had an or/and precedence bug). |
| bigquery_handler | Merged both sides: our usage-metering + include/exclude tables and engine's dataset_project, service_account_json, LEFT-JOIN fix. |
| integrations.py | Adopted engine's community-handler loading (our in-tree handlers register via _register_handler_dir); preserved our HandlersCache cross-thread fix. |
| vectordatabase_handler | Kept our update()/hybrid_search() methods; switched to engine's response types. |
| hubspot | Kept ours wholesale (our OAuth2 rewrite; engine's changes were cosmetic). |
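
For context on the or/and precedence bug mentioned for check_use_limit: in Python, `and` binds tighter than `or`, so an unparenthesized mix of the two can group differently from what was intended. The snippet below is a generic illustration with hypothetical flag names; it is not the code from plan_join.py.

```python
def unparenthesized(has_limit, is_api_handler, has_join):
    # Python parses this as: has_limit or (is_api_handler and has_join)
    return has_limit or is_api_handler and has_join


def intended(has_limit, is_api_handler, has_join):
    # Explicit grouping: (has_limit or is_api_handler) and has_join
    return (has_limit or is_api_handler) and has_join


# The two versions disagree when the first flag is True and the last is False.
print(unparenthesized(True, False, False))  # True
print(intended(True, False, False))         # False
```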

Bugs fixed during integration

  • insert_step.py read response.data_frame on an OkResponse → AttributeError on KB insert. Aligned to engine's behavior (return ResultSet(affected_rows=...)). Behavioral note: KB insert now returns affected-row count only, not the inserted rows.
  • Handler-restore gotcha: git checkout --ours only restored modified files in the deleted handler dirs; unmodified __init__.py/__about__.py/icon/tests/ were silently dropped. Restored full dirs from history for github, gmail, google_analytics, google_calendar, google_search, litellm, ms_one_drive, s3.
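
The insert_step.py failure above comes down to reading an attribute that only one response type carries. A minimal sketch of the pattern follows, using simplified stand-in classes that only borrow the names `OkResponse` and `TableResponse`; the real engine classes have different fields and hold pandas DataFrames, and `affected_rows_of` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OkResponse:
    """Stand-in: acknowledges a write and carries only a row count."""
    affected_rows: Optional[int] = None


@dataclass
class TableResponse:
    """Stand-in: carries tabular data (a list of dicts here, for simplicity)."""
    data_frame: List[dict]


def affected_rows_of(response):
    """Return a row count without assuming every response has .data_frame."""
    if isinstance(response, OkResponse):
        # Reading response.data_frame here would raise AttributeError.
        return response.affected_rows
    return len(response.data_frame)
```

Branching on the response type keeps the caller working for both kinds of response, at the cost of returning only a count for writes.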

Testing

  • python -m compileall mindsdb — clean (no syntax errors)
  • ✅ Dependency resolution validated with uv pip compile
  • ✅ Image rebuilt; container boots clean (HTTP API on 47334, no import/startup errors)
  • ✅ KB insert, custom handlers, and RAG/reranker verified working in the dev container

Notes / follow-ups

  • The dependabot alerts on main are pre-existing, not introduced here (this PR actually pulls engine's security bumps).
  • Longer term we've diverged from engine at the framework level (langchain vs pydantic-ai); not blocking, but worth tracking.

tino097 and others added 30 commits March 3, 2026 13:10
…1973) (mindsdb#12281)

Co-authored-by: SyedaAnshrahGillani <syedaanshrah16@gmail.com>
StpMax and others added 26 commits April 17, 2026 12:03
Co-authored-by: Lucas Koontz <lucas.emanuel.koontz@gmail.com>
Co-authored-by: Minura Punchihewa <49385643+MinuraPunchihewa@users.noreply.github.com>
Co-authored-by: Jorge Torres <jorge.torres.maldonado@gmail.com>
Co-authored-by: Konstantin Sivakov <konstantin.sivakov@gmail.com>
Co-authored-by: Zoran Pandovski <zoran.pandovski@gmail.com>
- Rewrite README as a developer-first intro to the Query Engine:
  technical How-it-works with architecture diagram, zero-to-semantic-
  search SQL walkthrough aligned with the new docs syntax, and links
  migrated from mindsdb.com/docs.mindsdb.com to mindshub.ai and
  mindsdb.github.io/engine
- Position the engine as a standalone open-source project that
  optionally pairs with MindsHub agents
- Remove outdated content: header image, demo gif, DeepWiki badge,
  tutorials, Slack links, contributor rewards, contributors graph,
  agents examples
- docs: drop the MindsDB Cloud setup section, sharpen the page title,
  add SEO meta (description, canonical, robots, Open Graph, Twitter
  card, JSON-LD) and a 1200x630 og-image.png
- Update CONTRIBUTING.md and issue-template contact link to Discord
  and the engine repo; delete the ended integrations-contest template

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Revamp README and docs for the Query Engine era
Added a center alignment to the title and adjusted the closing div tag.
Re-point upstream from mindsdb/mindsdb (now the "minds" product) to
mindsdb/engine and reconcile ~330 engine commits with our work.

- Keep our handlers (google_analytics, github, gmail, google_calendar,
  google_search, hubspot, bigquery, s3, s3vectors, ms_one_drive, litellm,
  langsmith, xero, linkedin_ads, meta_ad_library, microsoft_ads,
  multi_format_api, sentry) + custom planner/select fixes
- Adopt engine's community-handler model, response types
  (OkResponse/TableResponse), and optimize_inner_join (kept our JOIN fixes)
- Keep langchain stack; pin pydantic-ai>=0.0.14 (engine dropped langchain,
  bumped pydantic-ai==1.77.0 which needs openai>=2.29, incompatible with
  langchain-openai==0.3.6)
- Align insert_step.py with engine's OkResponse (no .data_frame)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@gabrielbressan-tfy

Collaborator Author

I tested this locally:

  • Insert and query on a Knowledge Base via file upload in the marketplace
  • Queries on the datasources: BigQuery, S3, Google Ads, Shopify, and Xero
