Merge mindsdb/engine into fork (catch-up, 26.0.1 → 26.2.0) #83
Open
gabrielbressan-tfy wants to merge 331 commits into
…-2146-hubspot-multitable-join
…r's dependencies (mindsdb#12234)
…1973) (mindsdb#12281) Co-authored-by: SyedaAnshrahGillani <syedaanshrah16@gmail.com>
…-2146-hubspot-multitable-join
Co-authored-by: Lucas Koontz <lucas.emanuel.koontz@gmail.com> Co-authored-by: Minura Punchihewa <49385643+MinuraPunchihewa@users.noreply.github.com> Co-authored-by: Jorge Torres <jorge.torres.maldonado@gmail.com> Co-authored-by: Konstantin Sivakov <konstantin.sivakov@gmail.com> Co-authored-by: Zoran Pandovski <zoran.pandovski@gmail.com>
- Rewrite README as a developer-first intro to the Query Engine: technical How-it-works with architecture diagram, zero-to-semantic-search SQL walkthrough aligned with the new docs syntax, and links migrated from mindsdb.com/docs.mindsdb.com to mindshub.ai and mindsdb.github.io/engine
- Position the engine as a standalone open-source project that optionally pairs with MindsHub agents
- Remove outdated content: header image, demo gif, DeepWiki badge, tutorials, Slack links, contributor rewards, contributors graph, agents examples
- docs: drop the MindsDB Cloud setup section, sharpen the page title, add SEO meta (description, canonical, robots, Open Graph, Twitter card, JSON-LD) and a 1200x630 og-image.png
- Update CONTRIBUTING.md and issue-template contact link to Discord and the engine repo; delete the ended integrations-contest template
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Revamp README and docs for the Query Engine era
Added a center alignment to the title and adjusted the closing div tag.
Re-point upstream from mindsdb/mindsdb (now the "minds" product) to mindsdb/engine and reconcile ~330 engine commits with our work.
- Keep our handlers (google_analytics, github, gmail, google_calendar, google_search, hubspot, bigquery, s3, s3vectors, ms_one_drive, litellm, langsmith, xero, linkedin_ads, meta_ad_library, microsoft_ads, multi_format_api, sentry) + custom planner/select fixes
- Adopt engine's community-handler model, response types (OkResponse/TableResponse), and optimize_inner_join (kept our JOIN fixes)
- Keep langchain stack; pin pydantic-ai>=0.0.14 (engine dropped langchain, bumped pydantic-ai==1.77.0 which needs openai>=2.29, incompatible with langchain-openai==0.3.6)
- Align insert_step.py with engine's OkResponse (no .data_frame)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
I tested this locally here.
Why this PR
Our fork was tracking mindsdb/mindsdb, but that repo has been repurposed into the "Minds Platform" product (a different codebase: backend/ + a frontend submodule). The actual Python query engine we forked now lives at mindsdb/engine. The two still share full git history, so this is an ordinary catch-up merge, not a manual port.

This PR re-points upstream to mindsdb/engine and merges ~330 engine commits into our fork, reconciling them with all of our custom work (handlers, planner/select fixes).

- Version: 26.0.1 → 26.2.0
- Upstream: mindsdb/engine
- Merge base: aeca4da7e (shared ancestor; clean 3-way merge)

What we kept (our work, fully preserved)
Custom handlers (engine does not ship these; they stay in-tree):
google_analytics · github · gmail · google_calendar · google_search · hubspot · bigquery · s3 · s3vectors · ms_one_drive · litellm · langsmith · xero · linkedin_ads · meta_ad_library · microsoft_ads · multi_format_api · sentry

Custom planner / select behavior (documented in CLAUDE.md):
- item.conditions filter exclusion + IS NULL guard
- plan_cte order_by not forwarded to API handlers
- SubSelectStep _collect_identifiers recursive column extraction

Framework stack: kept langchain (our RAG/reranker depend on it via the MindsDB abstraction). Engine has migrated away from langchain; see dependency notes below.
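As a rough illustration of what "recursive column extraction" means for the _collect_identifiers item above, here is a minimal sketch on a toy expression tree. The node classes (Column, Constant, BinaryOp, Function) and the collect_identifiers function are hypothetical names invented for this example; they are not the fork's actual AST or the real _collect_identifiers implementation, which this description does not show.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical toy AST nodes, defined only for this illustration.
@dataclass
class Column:
    name: str


@dataclass
class Constant:
    value: object


@dataclass
class BinaryOp:
    op: str
    left: object
    right: object


@dataclass
class Function:
    name: str
    args: list = field(default_factory=list)


def collect_identifiers(node, found: Optional[List[str]] = None) -> List[str]:
    """Walk the tree depth-first and return each column name once, in first-seen order."""
    if found is None:
        found = []
    if isinstance(node, Column):
        if node.name not in found:
            found.append(node.name)
    elif isinstance(node, BinaryOp):
        collect_identifiers(node.left, found)
        collect_identifiers(node.right, found)
    elif isinstance(node, Function):
        for arg in node.args:
            collect_identifiers(arg, found)
    # Constants (and any unknown node types) contribute no columns.
    return found


# lower(email) > 'a' AND owner_id = email
expr = BinaryOp(
    "and",
    BinaryOp(">", Function("lower", [Column("email")]), Constant("a")),
    BinaryOp("=", Column("owner_id"), Column("email")),
)
print(collect_identifiers(expr))  # ['email', 'owner_id']
```

The point of recursing is that columns nested inside function calls or compound conditions are still found, which a flat scan of top-level targets would miss.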
What comes with the upgrade (engine 26.0.1 → 26.2.0)

Architecture change: community-handler model
Engine moved ~140 standard handlers (most DB connectors + niche APIs: slack, twitter, youtube, mongodb, elasticsearch, clickhouse, etc.) out of the core tree into an on-demand fetch model:
- mindsdb/integrations/utilities/community_handler_fetcher.py
- MINDSDB_COMMUNITY_HANDLERS=true
- mindsdb/mindsdb-community-handlers) and lazily downloads each handler on first use into a local storage dir
- GITHUB_TOKEN), so those handlers may not be fetchable without access. If we need one of the dropped handlers, restore it from git history rather than relying on the fetcher.
- 302 (old main) → 41 (engine core) → 68 (engine core + our 28 custom/kept)

Security fixes
- safe_extract path-traversal hardening ("security(utilities): harden path traversal validation in safe_extract" mindsdb/minds#12347, "Bugfix/files safe extract" mindsdb/minds#12010)
- 4.24.4 → 6.33.5 (Snyk)

Features / improvements
- 1.10 → 1.26
- optimize_inner_join: batches LIMITed inner-join fetches into partitioned steps (restored on top of our JOIN fixes)

Dependency bumps (came with engine; a few adjusted for our stack)
- xlrd, aiobotocore, google-genai, faiss-cpu (KB default)

Decisions made during the merge
- pydantic-ai==1.77.0 (needs openai>=2.29), which conflicts with our langchain-openai==0.3.6 (openai<2.0). Engine dropped langchain; we depend on it. Pinned pydantic-ai>=0.0.14 → resolves to 1.30.1 (openai-1.x compatible). Kept our duckdb==1.4.0, aipdf==0.0.6.3.
- DataHubResponse (inherited from old-upstream PRs mindsdb#10716/mindsdb#10632, never our own work) was removed; engine never had it. Switched to engine's OkResponse / TableResponse / DataHandlerResponse.
- plan_join.py: optimize_inner_join (they only collided textually). Adopted engine's check_use_limit (ours had regressed it: dropped the feature and had an or/and precedence bug).
- bigquery_handler: dataset_project, service_account_json, LEFT-JOIN fix.
- integrations.py: _register_handler_dir); preserved our HandlersCache cross-thread fix.
- vectordatabase_handler: update() / hybrid_search() methods; switched to engine's response types.

Bugs fixed during integration
- insert_step.py read response.data_frame on an OkResponse → AttributeError on KB insert. Aligned to engine's behavior (return ResultSet(affected_rows=...)). Behavioral note: KB insert now returns affected-row count only, not the inserted rows.
- git checkout --ours only restored modified files in the deleted handler dirs; unmodified __init__.py / __about__.py / icon / tests/ were silently dropped. Restored full dirs from history for github, gmail, google_analytics, google_calendar, google_search, litellm, ms_one_drive, s3.

Testing
- python -m compileall mindsdb: clean (no syntax errors)
- uv pip compile

Notes / follow-ups
main are pre-existing, not introduced here (this PR actually pulls engine's security bumps).
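To make the insert_step.py fix listed under "Bugs fixed during integration" concrete, the sketch below reproduces the failure mode with stand-in classes. OkResponse and ResultSet here are simplified, hypothetical dataclasses written for this example, not the engine's real classes, and the two helper functions are illustrative names. Passing response.affected_rows into ResultSet is also an assumption, since the description elides that argument.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OkResponse:
    """Hypothetical stand-in: a status-only response with no data_frame attribute."""
    affected_rows: Optional[int] = None


@dataclass
class ResultSet:
    """Hypothetical stand-in for the result returned by the insert step."""
    affected_rows: Optional[int] = None


def insert_result_before(response: OkResponse):
    # Old behavior: assumes every handler response carries a data_frame.
    return response.data_frame  # raises AttributeError on an OkResponse


def insert_result_after(response: OkResponse) -> ResultSet:
    # New behavior: report only how many rows were affected.
    # Forwarding response.affected_rows is an assumption of this sketch.
    return ResultSet(affected_rows=response.affected_rows)


response = OkResponse(affected_rows=3)

try:
    insert_result_before(response)
except AttributeError as exc:
    print(f"before the fix: {exc}")

print(insert_result_after(response))
```

Callers that previously read the inserted rows back from a KB insert would need to query for them separately, since only the affected-row count is returned.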