Scaffold server query pipeline & streaming API #353
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces the 'Atlas' scope component, integrating tldraw to render interactive text and chart cards on a canvas, alongside supporting elements like buttons, skeletons, and custom icons. On the backend, it implements a streaming query API powered by Gemini and Data Commons MCP tools, complete with query analysis, safety filtering, and metadata fetching. The review feedback focuses on aligning with repository conventions—such as renaming files to snake_case and applying category-first naming to components—and ensuring strict TypeScript compliance. Additionally, the reviewer recommends enhancing robustness by adding unmount cleanup hooks to prevent memory leaks, caching compiled regular expressions for better performance, and wrapping JSON parsing in try-catch blocks to prevent runtime TypeErrors.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
b80edeb to
c108463
Compare
656111f to
b2e7d1e
Compare
b2e7d1e to
a2aab0e
Compare
beets
left a comment
There was a problem hiding this comment.
Thanks for the PR! The scaffolding and overall flow for the data discovery query pipeline are well-designed. Since this is an initial commit, most of these comments are fine to be added as TODO's in the code for future refactor.
| headers: { 'Content-Type': 'application/json' }, | ||
| }); | ||
| } | ||
| const { query, atlasContext } = body; |
There was a problem hiding this comment.
selectedEntityDcids is defined in the request body but not yet destructured here. For follow-up queries (where parsed.places is empty), falling back to selectedEntityDcids will ensure previous place context is preserved.
Also, adding a brief verification check ensuring query is a valid string before invoking safety filters will safeguard against runtime TypeErrors.
| const entityDcid = | ||
| parsedResponse.entityDcid || resolvedPlaceDcid || place; |
There was a problem hiding this comment.
If a place cannot be resolved to a valid DCID, report a warning status or fail gracefully rather than calling the observations API with the raw name.
| ? `\n\nDATE CONSTRAINT: The user wants data limited to ${parsed.dateRange.start && parsed.dateRange.end ? `years ${parsed.dateRange.start} through ${parsed.dateRange.end}` : parsed.dateRange.start ? `years from ${parsed.dateRange.start} onward` : `years up to ${parsed.dateRange.end}`}.` | ||
| : ''; |
There was a problem hiding this comment.
Date constraints might not be limited to years. Could we pass the dates directly in the prompt without assumptions, perhaps as a structured input.
| }); | ||
|
|
||
| if (response.functionCalls && response.functionCalls.length > 0) { | ||
| const fc = response.functionCalls[0]; |
There was a problem hiding this comment.
Gemini models frequently output multiple tool calls concurrently. Currently, this extracts only response.functionCalls[0]. As we iterate, let's loop over all calls in response.functionCalls, execute each, and push their tool responses back in sequence to avoid API conversation history mismatch errors. Could add a TODO for now as a reminder.
| const cleaned = responseText | ||
| .replace(/```json/g, '') | ||
| .replace(/```/g, '') | ||
| .trim(); |
There was a problem hiding this comment.
Instead of stripping out the json markdown, consider searching for the json block instead to be resilient against unexpected model commentary:
.match(/\{[\s\S]*\}/)
There was a problem hiding this comment.
Should we consolidate all our agent instructions into the skills folder?
| .replace(/```json/g, '') | ||
| .replace(/```/g, '') | ||
| .trim(); |
There was a problem hiding this comment.
Similar to earlier comment on handling possible LLM preamble.
const jsonMatch = responseText.match(/\{[\s\S]*\}/);
const cleaned = jsonMatch ? jsonMatch[0] : responseText;
parsedResponse = JSON.parse(cleaned);
|
|
||
| const response = await genAI.models.generateContent({ | ||
| model: config.models.parseQuery, | ||
| contents: `${systemPrompt}\n\nQuery: "${query}"\nReturn ONLY its JSON object, no markdown, no other text or explanation.`, |
There was a problem hiding this comment.
Looks like the return formatting instruction is already part of the system prompt
| @@ -0,0 +1,59 @@ | |||
| import { getServiceConfig } from './config'; | |||
|
|
|||
| export interface ObservationResponse { | |||
There was a problem hiding this comment.
Nice :) thanks for adding the type for this.
| | { type: 'status'; message: string } | ||
| | { type: 'parsed_query'; data: ParsedQuery } | ||
| | { type: 'tool_call'; tool: string; args: Record<string, unknown> } | ||
| | { type: 'query_result'; result: QueryResult; place: string } | ||
| | { type: 'complete' } | ||
| | { type: 'error'; message: string }; |
There was a problem hiding this comment.
Could we lift out the types of these events to string constants?
Overview
Adds the streaming query API route and supporting server-side pipeline that powers Data Weaver's natural-language-to-chart workflow. A user's query flows through safety checks, analysis, Gemini tool-calling with the Data Commons MCP server, and streams structured chart specifications back to the client via SSE.
Changes Made
/api/query) — POST endpoint that accepts a natural-language query and streams SSE events (status,analysis,tool_call,chart_spec,metadata,error,complete)src/server/)safety.ts— layered prompt safety gate (regex patterns + LLM classification)analyze/— extracts places, topic, titles, and date ranges from the query via Geminiquery/— Gemini tool-loop that iteratively calls MCP tools to resolve variablesobservations/— fetches variable metadata/facets from Data Commonsmcp.ts— JSON-RPC client for the Data Commons MCP servergemini.ts— shared Gemini client factorydc-api.ts— Data Commons REST API helpersconfig.ts+config/— service config and skill prompt loadingtypes.ts— shared types for the query pipeline and SSE eventsuse-streaming-query.ts) — SSE parser with abort support and typed event dispatch