fix: query a graph source by alias instead of loading it as a ledger #1375

Open
bplatz wants to merge 5 commits into main from fix/graph-source-alias-query

@bplatz bplatz commented Jun 25, 2026

Contributor

Closes #1369.

Problem

Querying a registered Iceberg/R2RML graph source by alias
(POST /v1/fluree/query/<alias>) returned a 500:

Ledger error: Nameservice error: Serialization error: missing field `f:ledger`

Registration, fluree list, and fluree info all worked — only the direct
query failed, and it failed during alias resolution (before the catalog was
ever read). Querying the same source through the dataset path (from: <alias>)
worked, so the bug was specific to the alias-in-URL query route.

Root cause

Two gaps on the alias-query path:

  1. FileNameService::lookup had no graph-source guard. It deserialized the
    nameservice record as a ledger NsFileV2, but a graph source is stored as a
    GraphSourceNsFileV2 (no f:ledger field), so the deserialize failed. The
    sibling methods list_branches and all_records already skip graph-source
    records via is_graph_source_record; lookup was missing the same check.

  2. The ledger-query route had no graph-source fallback. It loaded the alias
    strictly as a ledger. The dataset path already handles this
    (load_view_from_source falls back to resolve_as_graph_source); the
    alias-in-URL path did not.

Fix

  • FileNameService::lookup now skips graph-source records (the same guard
    list_branches/all_records use), returning None so the alias resolves as
    "no ledger here" instead of erroring on the missing f:ledger.
  • The ledger-query route falls back to the connection/dataset path on a
    not-found ledger, which resolves the alias as a graph source and queries it
    via its source engine — the same fallback the dataset path already uses.
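The effect of the `lookup` guard can be sketched with a toy model. This is only an illustration under stated assumptions: `RawRecord`, the `kind` marker field, `lookup_unguarded`, and `lookup_guarded` are hypothetical names for this sketch and are not the nameservice's real types or schema. The only behavior taken from the description above is that parsing a graph-source record as a ledger fails on the missing `f:ledger`, and that the guarded lookup returns `None` instead.

```rust
use std::collections::HashMap;

/// Toy stand-in for a raw nameservice record: a flat map of fields.
/// (Hypothetical; the real records are JSON documents.)
type RawRecord = HashMap<&'static str, &'static str>;

#[derive(Debug, PartialEq)]
struct LedgerRecord {
    ledger: String,
}

/// Mimics deserializing a record as a ledger: fails when `f:ledger` is absent.
fn parse_ledger(raw: &RawRecord) -> Result<LedgerRecord, String> {
    raw.get("f:ledger")
        .map(|name| LedgerRecord { ledger: name.to_string() })
        .ok_or_else(|| "missing field `f:ledger`".to_string())
}

/// Hypothetical type check; the `kind` field is an assumption of this sketch.
fn is_graph_source(raw: &RawRecord) -> bool {
    raw.get("kind") == Some(&"graph-source")
}

/// Pre-fix behavior: every stored record is parsed as a ledger.
fn lookup_unguarded(
    store: &HashMap<&'static str, RawRecord>,
    alias: &str,
) -> Result<Option<LedgerRecord>, String> {
    match store.get(alias) {
        None => Ok(None),
        Some(raw) => parse_ledger(raw).map(Some),
    }
}

/// Post-fix behavior: graph-source records are reported as "no ledger here".
fn lookup_guarded(
    store: &HashMap<&'static str, RawRecord>,
    alias: &str,
) -> Result<Option<LedgerRecord>, String> {
    match store.get(alias) {
        None => Ok(None),
        Some(raw) if is_graph_source(raw) => Ok(None),
        Some(raw) => parse_ledger(raw).map(Some),
    }
}

/// One ledger record and one graph-source record sharing the same store.
fn sample_store() -> HashMap<&'static str, RawRecord> {
    let mut store = HashMap::new();
    store.insert(
        "realdb:main",
        HashMap::from([("kind", "ledger"), ("f:ledger", "realdb")]),
    );
    store.insert(
        "gs:main",
        HashMap::from([("kind", "graph-source"), ("catalog", "https://example.invalid")]),
    );
    store
}
```

In this model, `lookup_unguarded(&sample_store(), "gs:main")` yields the `f:ledger` error, while `lookup_guarded` yields `Ok(None)`, which is what lets a caller try graph-source resolution next.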

Tests

  • fluree-db-nameservice: a unit test that lookup of a graph-source alias
    returns None rather than erroring (reproduces the exact missing field f:ledger failure before the fix).
  • fluree-db-server: an integration test that registers an R2RML graph source
    and queries it by alias, asserting the response no longer fails by
    deserializing the record as a ledger.

The existing nameservice and server query suites pass unchanged; the memory
nameservice was unaffected (it stores graph sources in a separate map).

Reviewers: @aaj3f @zonotope

…1369)

Querying a registered Iceberg/R2RML graph source by alias
(`POST /v1/fluree/query/<alias>`) returned 500 `Serialization error: missing
field f:ledger`: the file nameservice's `lookup` deserialized the graph-source
record as a ledger `NsFileV2`, and the ledger-query path had no graph-source
fallback.

- `FileNameService::lookup` now skips graph-source records (the guard
  `list_branches`/`all_records` already use), returning `None` so the alias
  resolves as "no ledger here" instead of crashing on the missing `f:ledger`.
- The ledger-query route falls back to the connection/dataset path on a
  not-found ledger, which resolves the alias as a graph source and queries it
  via its source engine (the same fallback `load_view_from_source` already uses).

Regression tests: a nameservice unit test (lookup of a graph-source alias must
not error) and a server integration test (alias query must not fail
deserializing the record as a ledger).
@bplatz bplatz requested review from aaj3f and zonotope June 25, 2026 12:32
christophediprima commented Jun 26, 2026

Hi there!

I can confirm this removes the missing field f:ledger 500 (the nameservice lookup no longer deserializes a graph-source record as NsFileV2). 👍

Testing the branch against a readable Iceberg/R2RML graph source, querying by alias still didn't return rows. Digging in, completing #1369 needs three pieces — #1375 has part of (1); here are all three, each verified to return rows:

1. Nameservice lookup skip — also needed on the storage backend

#1375 adds the graph-source guard to FileNameService::lookup. The same guard is needed on StorageNameService::lookup (storage_ns.rs), or the storage backend still 500s with the deserialize error. (is_graph_source_record already exists there.)

2. Alias-in-URL route (POST /query/<alias>) — resolve + query the graph-source view

The single-alias routes (execute_query for JSON-LD, execute_sparql_ledger for SPARQL) load the alias strictly as a ledger and have no graph-source path. Resolve the alias to a genesis view tagged with graph_source_id and query it via the source engine:

// fluree-db-api: Fluree::resolve_graph_source(alias) -> Option<GraphDb>
let mut db = GraphDb::from_ledger_state(&LedgerState::new(LedgerSnapshot::genesis(&gs_id), Novelty::new(0)));
db.graph_source_id = Some(gs_id.into());   // engine auto-wraps patterns in GRAPH <id> {…}

// fluree-db-server query.rs: on a not-found ledger, in BOTH handlers:
if let Some(view) = state.fluree.resolve_graph_source(ledger_id).await? {
    return run_graph_source_view_query(&view, input /* JsonLd | Sparql */).await; // view.query().with_r2rml()…
}

3. SPARQL FROM <alias> (the connection/dataset path) — resolve the graph source when formatting

This was the subtle one. FROM <graph-source> on the base /query endpoint executes correctly (the source is scanned, rows produced), but then 500s Ledger not found — because after execution the builder re-resolves the alias via Fluree::db() to get a view for result formatting, and db() resolves strictly as a ledger:

// fluree-db-api/src/query/builder.rs — was, in execute_formatted / execute_formatted_string:
let view = self.fluree.db(alias.identifier.as_str()).await?;            // 500s for a graph source
// fix: graph-source-aware (db, falling back to a genesis graph-source view)
let view = self.fluree.load_graph_db_or_graph_source(alias.identifier.as_str()).await?;

(JSON-LD from dodged this: its connection executor returns the DataSetDb and formats with it, so it never re-resolves via db(). SPARQL returns only the result and re-resolves — hence the asymmetry where JSON-LD from worked but SPARQL FROM didn't.)
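The fallback shape that (2) and (3) rely on can be sketched in isolation. This is a minimal model, not the actual `fluree-db-api` code: `Registry`, `View`, and `ResolveError` are hypothetical stand-ins, and only the control flow (try the ledger, fall back to a graph source on not-found, propagate every other error) mirrors `load_graph_db_or_graph_source` as described here.

```rust
#[derive(Debug, PartialEq)]
enum ResolveError {
    NotFound(String),
    Storage(String),
}

#[derive(Debug, PartialEq)]
enum View {
    Ledger(String),
    GraphSource(String),
}

/// Hypothetical registry holding ledger and graph-source aliases separately.
struct Registry {
    ledgers: Vec<&'static str>,
    graph_sources: Vec<&'static str>,
    storage_down: bool,
}

impl Registry {
    /// Strict ledger resolution: a graph-source alias is simply "not found".
    fn load_ledger(&self, alias: &str) -> Result<View, ResolveError> {
        if self.storage_down {
            return Err(ResolveError::Storage("backend unavailable".to_string()));
        }
        if self.ledgers.iter().any(|l| *l == alias) {
            Ok(View::Ledger(alias.to_string()))
        } else {
            Err(ResolveError::NotFound(alias.to_string()))
        }
    }

    /// Returns `Ok(None)` when no graph source is registered under the alias.
    fn resolve_graph_source(&self, alias: &str) -> Result<Option<View>, ResolveError> {
        let found = self.graph_sources.iter().any(|g| *g == alias);
        Ok(found.then(|| View::GraphSource(alias.to_string())))
    }

    /// Ledger first; fall back to a graph source only on not-found.
    fn load_ledger_or_graph_source(&self, alias: &str) -> Result<View, ResolveError> {
        match self.load_ledger(alias) {
            Ok(view) => Ok(view),
            Err(ResolveError::NotFound(_)) => self
                .resolve_graph_source(alias)?
                .ok_or_else(|| ResolveError::NotFound(alias.to_string())),
            // Anything other than not-found is a real failure and is not masked.
            Err(other) => Err(other),
        }
    }
}

fn sample_registry(storage_down: bool) -> Registry {
    Registry {
        ledgers: vec!["realdb:main"],
        graph_sources: vec!["gs:main"],
        storage_down,
    }
}
```

With `sample_registry(false)`, `load_ledger("gs:main")` is `NotFound` (the strict behavior that caused the SPARQL formatting 500), while `load_ledger_or_graph_source("gs:main")` returns a graph-source view. Restricting the fallback to not-found keeps storage errors visible instead of turning them into a misleading "not found".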

Verified

With all three, a registered Iceberg/R2RML graph source returns rows for every shape:

  • POST /query/<alias> — JSON-LD ✅ and SPARQL ✅
  • POST /query + from: <alias> (JSON-LD) ✅
  • POST /query + FROM <alias> (SPARQL) ✅ ← previously 500

Also: the existing regression test points at a non-existent catalog and only asserts the error isn't f:ledger, so a Ledger not found response passes it — it wouldn't catch (2)/(3). A readable-source test (any in-tree R2RML source) would.
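To make the test gap concrete, here is a small sketch comparing the two assertion styles. `QueryOutcome`, `weak_check`, and `strong_check` are hypothetical names for this illustration; they model only the distinction described above, not the actual test code in the PR.

```rust
/// Simplified outcome of querying a graph source by alias.
#[derive(Debug)]
enum QueryOutcome {
    Rows(usize),
    Error(String),
}

/// Weak check: passes whenever the failure is not the `f:ledger` one.
fn weak_check(outcome: &QueryOutcome) -> bool {
    match outcome {
        QueryOutcome::Rows(_) => true,
        QueryOutcome::Error(msg) => !msg.contains("f:ledger"),
    }
}

/// Strong check: passes only when the query actually returned rows.
fn strong_check(outcome: &QueryOutcome) -> bool {
    matches!(outcome, QueryOutcome::Rows(n) if *n > 0)
}
```

A `Ledger not found` error passes `weak_check` but fails `strong_check`, which is why a test against a readable source that asserts on returned rows would catch (2) and (3).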

To be honest, this was generated by Claude in ultra-code mode, but I tested it myself and it solves the problem.

The complete diff is below (8 files; reuses your file.rs guard, adds the storage_ns.rs guard, the two handler hookups, and the formatter resolver). Happy to open it as a PR against your branch instead, or fold it here — whichever you prefer.

Complete diff: git apply against the PR branch (8 files, +517/-43)
diff --git a/fluree-db-api/src/query/builder.rs b/fluree-db-api/src/query/builder.rs
index ff9c8bf9f..3e522f406 100644
--- a/fluree-db-api/src/query/builder.rs
+++ b/fluree-db-api/src/query/builder.rs
@@ -1024,7 +1024,10 @@ impl<'a> FromQueryBuilder<'a> {
                 .first()
                 .or_else(|| spec.named_graphs.first())
             {
-                let view = self.fluree.db(alias.identifier.as_str()).await?;
+                let view = self
+                    .fluree
+                    .load_graph_db_or_graph_source(alias.identifier.as_str())
+                    .await?;
                 Ok(result
                     .format_async(view.as_graph_db_ref(), &format_config)
                     .await?)
@@ -1064,7 +1067,10 @@ impl<'a> FromQueryBuilder<'a> {
                             .first()
                             .or_else(|| spec.named_graphs.first())
                             .ok_or_else(|| ApiError::query("No graph specified for formatting"))?;
-                        let view = self.fluree.db(alias.identifier.as_str()).await?;
+                        let view = self
+                            .fluree
+                            .load_graph_db_or_graph_source(alias.identifier.as_str())
+                            .await?;
                         Ok(result
                             .format_async(view.as_graph_db_ref(), &format_config)
                             .await?)
@@ -1120,7 +1126,10 @@ impl<'a> FromQueryBuilder<'a> {
                     .first()
                     .or_else(|| spec.named_graphs.first())
                 {
-                    let view = self.fluree.db(alias.identifier.as_str()).await?;
+                    let view = self
+                        .fluree
+                        .load_graph_db_or_graph_source(alias.identifier.as_str())
+                        .await?;
                     Ok(result
                         .format_async(view.as_graph_db_ref(), &format_config)
                         .await?)
@@ -1165,7 +1174,10 @@ impl<'a> FromQueryBuilder<'a> {
                 .first()
                 .or_else(|| spec.named_graphs.first())
             {
-                let view = self.fluree.db(alias.identifier.as_str()).await?;
+                let view = self
+                    .fluree
+                    .load_graph_db_or_graph_source(alias.identifier.as_str())
+                    .await?;
                 crate::format::format_results_string_async(
                     &result,
                     &result.context,
@@ -1211,7 +1223,10 @@ impl<'a> FromQueryBuilder<'a> {
                             .first()
                             .or_else(|| spec.named_graphs.first())
                             .ok_or_else(|| ApiError::query("No graph specified for formatting"))?;
-                        let view = self.fluree.db(alias.identifier.as_str()).await?;
+                        let view = self
+                            .fluree
+                            .load_graph_db_or_graph_source(alias.identifier.as_str())
+                            .await?;
                         crate::format::format_results_string_async(
                             &result,
                             &result.context,
@@ -1273,7 +1288,10 @@ impl<'a> FromQueryBuilder<'a> {
                     .first()
                     .or_else(|| spec.named_graphs.first())
                 {
-                    let view = self.fluree.db(alias.identifier.as_str()).await?;
+                    let view = self
+                        .fluree
+                        .load_graph_db_or_graph_source(alias.identifier.as_str())
+                        .await?;
                     crate::format::format_results_string_async(
                         &result,
                         &result.context,
diff --git a/fluree-db-api/src/view/fluree_ext.rs b/fluree-db-api/src/view/fluree_ext.rs
index 584238ec5..17156e377 100644
--- a/fluree-db-api/src/view/fluree_ext.rs
+++ b/fluree-db-api/src/view/fluree_ext.rs
@@ -592,29 +592,42 @@ impl Fluree {
     pub async fn load_graph_db_or_graph_source(&self, ledger_id: &str) -> Result<GraphDb> {
         match self.load_graph_db(ledger_id).await {
             Ok(db) => Ok(db),
-            Err(ref e) if e.is_not_found() => {
-                let gs_id = fluree_db_core::normalize_ledger_id(ledger_id)
-                    .unwrap_or_else(|_| ledger_id.to_string());
-
-                let _record = self
-                    .nameservice()
-                    .lookup_graph_source(&gs_id)
-                    .await
-                    .map_err(|e| ApiError::internal(e.to_string()))?
-                    .ok_or_else(|| ApiError::NotFound(ledger_id.to_string()))?;
-
-                let snapshot = fluree_db_core::LedgerSnapshot::genesis(&gs_id);
-                let state = fluree_db_ledger::LedgerState::new(
-                    snapshot,
-                    fluree_db_novelty::Novelty::new(0),
-                );
-                let mut db = GraphDb::from_ledger_state(&state);
-                db.graph_source_id = Some(gs_id.into());
-                Ok(db)
-            }
+            Err(ref e) if e.is_not_found() => self
+                .resolve_graph_source(ledger_id)
+                .await?
+                .ok_or_else(|| ApiError::NotFound(ledger_id.to_string())),
             Err(e) => Err(e),
         }
     }
+
+    /// Resolve `ledger_id` as a registered graph source (Iceberg/R2RML, BM25,
+    /// vector, …), returning a minimal genesis view tagged with the graph
+    /// source id. Returns `Ok(None)` when no graph source is registered under
+    /// the alias.
+    ///
+    /// The `graph_source_id` tag causes query execution to auto-wrap patterns
+    /// in `GRAPH <gs_id> { ... }` so the configured provider resolves them.
+    pub async fn resolve_graph_source(&self, ledger_id: &str) -> Result<Option<GraphDb>> {
+        let gs_id = fluree_db_core::normalize_ledger_id(ledger_id)
+            .unwrap_or_else(|_| ledger_id.to_string());
+
+        if self
+            .nameservice()
+            .lookup_graph_source(&gs_id)
+            .await
+            .map_err(|e| ApiError::internal(e.to_string()))?
+            .is_none()
+        {
+            return Ok(None);
+        }
+
+        let snapshot = fluree_db_core::LedgerSnapshot::genesis(&gs_id);
+        let state =
+            fluree_db_ledger::LedgerState::new(snapshot, fluree_db_novelty::Novelty::new(0));
+        let mut db = GraphDb::from_ledger_state(&state);
+        db.graph_source_id = Some(gs_id.into());
+        Ok(Some(db))
+    }
 }
 
 // ============================================================================
diff --git a/fluree-db-api/tests/it_graph_source_r2rml.rs b/fluree-db-api/tests/it_graph_source_r2rml.rs
index d0f436dba..f35a18f90 100644
--- a/fluree-db-api/tests/it_graph_source_r2rml.rs
+++ b/fluree-db-api/tests/it_graph_source_r2rml.rs
@@ -1599,6 +1599,72 @@ async fn integration_create_r2rml_graph_source_with_mapping() {
     );
 }
 
+/// Regression for the Iceberg/R2RML graph-source alias-resolution bug
+/// (`Nameservice error: Serialization error: missing field f:ledger`).
+///
+/// Registering a graph source and then resolving it by alias through the same
+/// path the HTTP/CLI query handler uses (nameservice `lookup` / `db()` /
+/// `resolve_graph_source`) must NOT fail to deserialize the ledger `NsFileV2`
+/// record. This MUST use a file-backed nameservice: the in-memory backend keeps
+/// graph sources in a separate map and never deserializes the on-disk record,
+/// so it cannot reproduce the failure — which is exactly why the shipped tests
+/// (all `build_memory()`) missed it.
+#[tokio::test]
+async fn regression_graph_source_alias_resolves_on_file_backend() {
+    use fluree_db_api::R2rmlCreateConfig;
+
+    let tmp = tempfile::tempdir().unwrap();
+    let fluree = FlureeBuilder::file(tmp.path().to_str().unwrap())
+        .build()
+        .expect("file-backed Fluree should build");
+
+    // Register an Iceberg/R2RML graph source. The catalog URI is bogus, but the
+    // bug fires during alias resolution, before any catalog call.
+    let config =
+        R2rmlCreateConfig::new("gs", "https://example.invalid", "ns.t", AIRLINE_MAPPING_TTL)
+            .with_mapping_media_type("text/turtle");
+    fluree
+        .create_r2rml_graph_source(config)
+        .await
+        .expect("graph source registration should succeed");
+
+    // 1. Nameservice lookup of the alias is a clean not-found, not a
+    //    Serialization error ("missing field `f:ledger`").
+    let looked_up = fluree.nameservice().lookup("gs:main").await;
+    assert!(
+        matches!(looked_up, Ok(None)),
+        "lookup(gs:main) should be Ok(None) for a graph-source alias, got {looked_up:?}"
+    );
+
+    // 2. db() (the single-target query entrypoint) reports a clean NotFound —
+    //    never a 500/Serialization error mentioning `f:ledger`.
+    match fluree.db("gs:main").await {
+        Ok(_) => panic!("db(gs:main) must not resolve a graph source as a ledger"),
+        Err(e) => {
+            assert!(
+                e.is_not_found(),
+                "db(gs:main) should be NotFound, got: {e:?}"
+            );
+            let msg = e.to_string();
+            assert!(
+                !msg.contains("f:ledger") && !msg.to_lowercase().contains("serializ"),
+                "db(gs:main) must not be a serialization failure: {msg}"
+            );
+        }
+    }
+
+    // 3. The graph-source-aware resolver (used by the server/CLI single-target
+    //    query path) resolves the alias to a graph-source view.
+    let resolved = fluree
+        .resolve_graph_source("gs:main")
+        .await
+        .expect("resolve_graph_source should not error");
+    assert!(
+        resolved.is_some(),
+        "resolve_graph_source(gs:main) should resolve the registered graph source"
+    );
+}
+
 // =============================================================================
 // query_graph_source API Tests (GraphSourcePublisher impl)
 // =============================================================================
diff --git a/fluree-db-cli/src/commands/use_cmd.rs b/fluree-db-cli/src/commands/use_cmd.rs
index 540800ebe..fccaba0d1 100644
--- a/fluree-db-cli/src/commands/use_cmd.rs
+++ b/fluree-db-cli/src/commands/use_cmd.rs
@@ -15,6 +15,18 @@ pub async fn run(ledger: &str, dirs: &FlureeDir) -> CliResult<()> {
         return Ok(());
     }
 
+    // Check if it's a registered graph source (Iceberg/R2RML, BM25, vector, …)
+    if fluree
+        .nameservice()
+        .lookup_graph_source(&ledger_id)
+        .await?
+        .is_some()
+    {
+        config::write_active_ledger(dirs.data_dir(), ledger)?;
+        println!("Now using graph source '{ledger}'");
+        return Ok(());
+    }
+
     // Check if it's a tracked ledger
     let store = TomlSyncConfigStore::new(dirs.config_dir().to_path_buf());
     if store.get_tracked(ledger).is_some() || store.get_tracked(&ledger_id).is_some() {
diff --git a/fluree-db-nameservice/src/file.rs b/fluree-db-nameservice/src/file.rs
index eb3d946c7..f20727932 100644
--- a/fluree-db-nameservice/src/file.rs
+++ b/fluree-db-nameservice/src/file.rs
@@ -276,15 +276,30 @@ impl FileNameService {
 
     /// Load and merge main record with index file
     async fn load_record(&self, ledger_name: &str, branch: &str) -> Result<Option<NsRecord>> {
+        use fluree_db_core::StorageRead;
         let main_address = Self::ns_address(ledger_name, branch);
         let index_address = Self::index_address(ledger_name, branch);
 
-        // Read main record
-        let main_file: Option<NsFileV2> = self.read_json_from_address(&main_address).await?;
+        // Read the main record bytes once.
+        let main_bytes = match self.storage.read_bytes(&main_address).await {
+            Ok(bytes) => bytes,
+            Err(fluree_db_core::Error::NotFound(_)) => return Ok(None),
+            Err(e) => return Err(NameServiceError::from(e)),
+        };
 
-        let Some(main) = main_file else {
+        // Graph-source records share the `ns@v2/{name}/{branch}.json` address space
+        // with ledger records but use a different schema with no `f:ledger` field.
+        // Report them as "not a ledger" (Ok(None)) so single-alias resolution
+        // (`lookup` -> `LedgerState::load`) yields a clean not-found and callers can
+        // fall back to graph-source resolution — instead of failing to deserialize
+        // NsFileV2 with a "missing field `f:ledger`" error. This mirrors the
+        // type-aware `lookup_any` and is the single guard shared by all read paths
+        // (`lookup`, `list_branches`, `all_records`).
+        if Self::is_graph_source_from_bytes(&main_bytes) {
             return Ok(None);
-        };
+        }
+
+        let main: NsFileV2 = serde_json::from_slice(&main_bytes)?;
 
         // Read index file (if exists)
         let index_file: Option<NsIndexFileV2> = self.read_json_from_address(&index_address).await?;
@@ -352,6 +367,16 @@ impl FileNameService {
         Ok(Self::is_graph_source_from_json(&parsed))
     }
 
+    /// Check if raw JSON bytes represent a graph source record (exact match).
+    /// Unparseable bytes are treated as "not a graph source" so the caller can
+    /// surface the underlying deserialization error for the concrete record type.
+    fn is_graph_source_from_bytes(bytes: &[u8]) -> bool {
+        match serde_json::from_slice::<serde_json::Value>(bytes) {
+            Ok(parsed) => Self::is_graph_source_from_json(&parsed),
+            Err(_) => false,
+        }
+    }
+
     /// Check if parsed JSON represents a graph source record (exact match).
     /// Matches `"f:"` compact prefixes and full IRIs.
     fn is_graph_source_from_json(parsed: &serde_json::Value) -> bool {
@@ -449,10 +474,7 @@ impl crate::NameServiceLookup for FileNameService {
                 .trim_end_matches(".json")
                 .to_string();
 
-            if self.is_graph_source_record(ledger_name, &branch).await? {
-                continue;
-            }
-
+            // Graph-source records are skipped by `load_record` (returns Ok(None)).
             if let Ok(Some(record)) = self.load_record(ledger_name, &branch).await {
                 if !record.retracted {
                     records.push(record);
@@ -480,10 +502,7 @@ impl crate::NameServiceLookup for FileNameService {
                 continue;
             }
 
-            if self.is_graph_source_record(&parent, &file_stem).await? {
-                continue;
-            }
-
+            // Graph-source records are skipped by `load_record` (returns Ok(None)).
             if let Ok(Some(record)) = self.load_record(&parent, &file_stem).await {
                 records.push(record);
             }
@@ -2540,4 +2559,55 @@ mod tests {
         assert_eq!(record.commit_head_id, Some(cid));
         assert_eq!(record.commit_t, 2);
     }
+
+    /// Regression: a graph-source record shares the `ns@v2/{name}/{branch}.json`
+    /// key space with ledger records but uses a different schema (no `f:ledger`).
+    /// `lookup` must report it as a clean not-found (Ok(None)) instead of failing
+    /// to deserialize `NsFileV2` ("missing field `f:ledger`"), so the query/`use`
+    /// path can fall back to graph-source resolution.
+    #[tokio::test]
+    async fn test_file_ns_lookup_skips_graph_source_record() {
+        use crate::{GraphSourceLookup, GraphSourcePublisher, GraphSourceType, NsLookupResult};
+        let (_temp, ns) = setup().await;
+
+        ns.publish_commit("realdb:main", 1, &test_cid("commit-1"))
+            .await
+            .unwrap();
+        ns.publish_graph_source(
+            "gs",
+            "main",
+            GraphSourceType::Iceberg,
+            r#"{"catalog":"https://example.invalid","table":"ns.t"}"#,
+            &["realdb:main".to_string()],
+        )
+        .await
+        .unwrap();
+
+        // The bug: lookup of a graph-source alias used to fail to deserialize the
+        // ledger NsFileV2. It must now be a clean not-found.
+        let result = ns.lookup("gs:main").await;
+        assert!(
+            matches!(result, Ok(None)),
+            "lookup of a graph-source alias should be Ok(None), got {result:?}"
+        );
+
+        // The regular ledger still resolves.
+        assert!(ns.lookup("realdb:main").await.unwrap().is_some());
+
+        // The type-aware resolver still classifies each correctly.
+        assert!(matches!(
+            ns.lookup_any("gs:main").await.unwrap(),
+            NsLookupResult::GraphSource(_)
+        ));
+        assert!(matches!(
+            ns.lookup_any("realdb:main").await.unwrap(),
+            NsLookupResult::Ledger(_)
+        ));
+
+        // list_branches / all_records keep excluding graph-source records.
+        assert!(ns.list_branches("gs").await.unwrap().is_empty());
+        let all = ns.all_records().await.unwrap();
+        assert_eq!(all.len(), 1, "all_records should list only the ledger");
+        assert_eq!(all[0].ledger_id, "realdb:main");
+    }
 }
diff --git a/fluree-db-nameservice/src/storage_ns.rs b/fluree-db-nameservice/src/storage_ns.rs
index c2b46ed49..41e45e9ad 100644
--- a/fluree-db-nameservice/src/storage_ns.rs
+++ b/fluree-db-nameservice/src/storage_ns.rs
@@ -331,12 +331,28 @@ where
         let main_key = self.ns_key(ledger_name, branch);
         let index_key = self.index_key(ledger_name, branch);
 
-        // Read main record
-        let main_file: Option<NsFileV2> = self.read_json(&main_key).await?;
+        // Read the main record bytes once.
+        let main_bytes = match self.storage.read_bytes(&main_key).await {
+            Ok(bytes) => bytes,
+            Err(CoreError::NotFound(_)) => return Ok(None),
+            Err(e) => {
+                return Err(NameServiceError::storage(format!(
+                    "Failed to read {main_key}: {e}"
+                )))
+            }
+        };
 
-        let Some(main) = main_file else {
+        // Graph-source records share the `ns@v2/{name}/{branch}.json` key space with
+        // ledger records but use a different schema with no `f:ledger` field. Report
+        // them as "not a ledger" (Ok(None)) so single-alias resolution yields a clean
+        // not-found and callers can fall back to graph-source resolution — instead of
+        // failing to deserialize NsFileV2 with a "missing field `f:ledger`" error.
+        // Mirrors the type-aware `lookup_any`.
+        if Self::is_graph_source_from_bytes(&main_bytes) {
             return Ok(None);
-        };
+        }
+
+        let main: NsFileV2 = serde_json::from_slice(&main_bytes)?;
 
         // Read index file (if exists)
         let index_file: Option<NsIndexFileV2> = self.read_json(&index_key).await?;
@@ -2289,4 +2305,40 @@ mod tests {
         );
         assert_eq!(after_retract.payload.state, "retracted");
     }
+
+    /// Regression (storage backend twin of the file-backend test): a graph-source
+    /// record must surface from `lookup` as a clean not-found (Ok(None)) rather
+    /// than a "missing field `f:ledger`" deserialization error, so single-alias
+    /// query/`use` resolution can fall back to graph-source resolution.
+    #[tokio::test]
+    async fn test_storage_ns_lookup_skips_graph_source_record() {
+        use crate::{GraphSourceLookup, GraphSourcePublisher, GraphSourceType, NsLookupResult};
+        let ns = make_storage_ns();
+
+        publish_commit(&ns, "realdb:main", 1, &dummy_cid("commit-1")).await;
+        ns.publish_graph_source(
+            "gs",
+            "main",
+            GraphSourceType::Iceberg,
+            r#"{"catalog":"https://example.invalid","table":"ns.t"}"#,
+            &["realdb:main".to_string()],
+        )
+        .await
+        .unwrap();
+
+        let result = ns.lookup("gs:main").await;
+        assert!(
+            matches!(result, Ok(None)),
+            "lookup of a graph-source alias should be Ok(None), got {result:?}"
+        );
+        assert!(ns.lookup("realdb:main").await.unwrap().is_some());
+        assert!(matches!(
+            ns.lookup_any("gs:main").await.unwrap(),
+            NsLookupResult::GraphSource(_)
+        ));
+        assert!(matches!(
+            ns.lookup_any("realdb:main").await.unwrap(),
+            NsLookupResult::Ledger(_)
+        ));
+    }
 }
diff --git a/fluree-db-server/src/routes/query.rs b/fluree-db-server/src/routes/query.rs
index a0cb211e9..c714e65dd 100644
--- a/fluree-db-server/src/routes/query.rs
+++ b/fluree-db-server/src/routes/query.rs
@@ -1910,6 +1910,47 @@ fn delimited_response(bytes: Vec<u8>, format: DelimitedFormat) -> Response {
     ([(axum::http::header::CONTENT_TYPE, content_type)], bytes).into_response()
 }
 
+/// Query input for a single-target graph-source query.
+#[cfg(feature = "iceberg")]
+enum GraphSourceQueryInput<'a> {
+    JsonLd(&'a JsonValue),
+    Sparql(&'a str),
+}
+
+/// Execute a query against an already-resolved graph-source view via the
+/// R2RML-aware path, returning formatted JSON.
+///
+/// `view` must carry a `graph_source_id` (see [`Fluree::resolve_graph_source`]);
+/// `with_r2rml()` attaches the Iceberg/R2RML provider and the engine wraps the
+/// patterns in `GRAPH <gs> { ... }` so the provider resolves them. Graph-source
+/// queries support JSON output only — delimited / XML formats are rejected by
+/// the caller.
+#[cfg(feature = "iceberg")]
+async fn run_graph_source_view_query(
+    state: &AppState,
+    view: &GraphDb,
+    input: GraphSourceQueryInput<'_>,
+    format: Option<fluree_db_api::FormatterConfig>,
+    span: &tracing::Span,
+) -> Result<JsonValue> {
+    let builder = view
+        .query(state.fluree.as_ref())
+        .with_r2rml()
+        .execution_options(query_execution_options(state));
+    let builder = match input {
+        GraphSourceQueryInput::JsonLd(json) => builder.jsonld(json),
+        GraphSourceQueryInput::Sparql(sparql) => builder.sparql(sparql),
+    };
+    let builder = match format {
+        Some(cfg) => builder.format(cfg),
+        None => builder,
+    };
+    builder.execute_formatted().await.map_err(|e| {
+        set_span_error_code(span, "error:QueryFailed");
+        ServerError::Api(e)
+    })
+}
+
 async fn execute_query(
     state: &AppState,
     ledger_id: &str,
@@ -1978,7 +2019,36 @@ async fn execute_query(
     }
 
     // Shared storage mode: use load_ledger_for_query with freshness checking
-    let ledger = load_ledger_for_query(state, ledger_id, &span).await?;
+    let ledger = match load_ledger_for_query(state, ledger_id, &span).await {
+        Ok(ledger) => ledger,
+        Err(e) => {
+            // A graph-source alias (Iceberg/R2RML) is not a ledger. After the
+            // nameservice fix it surfaces here as a clean not-found; resolve it as
+            // a graph source and run via the R2RML-aware view path.
+            #[cfg(feature = "iceberg")]
+            if matches!(&e, ServerError::Api(api) if api.is_not_found()) {
+                if let Some(view) = state.fluree.resolve_graph_source(ledger_id).await? {
+                    if let Some(fmt) = delimited {
+                        return Err(ServerError::not_acceptable(format!(
+                            "{} format not supported for graph source queries",
+                            fmt.name().to_uppercase()
+                        )));
+                    }
+                    let result = run_graph_source_view_query(
+                        state,
+                        &view,
+                        GraphSourceQueryInput::JsonLd(query_json),
+                        None,
+                        &span,
+                    )
+                    .await?;
+                    tracing::info!(status = "success", graph_source = true);
+                    return Ok((HeaderMap::new(), Json(result)).into_response());
+                }
+            }
+            return Err(e);
+        }
+    };
     let graph = GraphDb::from_ledger_state(&ledger);
     let fluree = &state.fluree;
 
@@ -2368,6 +2438,34 @@ async fn execute_sparql_ledger(
         let (json_fmt_config, json_content_type) =
             sparql_json_response_format(parsed.ast.as_ref(), headers);
 
+        // Single-target graph source (Iceberg/R2RML) with no dataset clause:
+        // resolve the alias as a graph source and run via the R2RML-aware path.
+        // (A SPARQL FROM/FROM NAMED clause is handled by the dataset branch below.)
+        #[cfg(feature = "iceberg")]
+        if !has_dataset_clause {
+            if let Some(view) = state.fluree.resolve_graph_source(ledger_id).await? {
+                if wants_sparql_xml || wants_rdf_xml || delimited.is_some() {
+                    return Err(ServerError::not_acceptable(
+                        "Only JSON output is supported for graph source queries".to_string(),
+                    ));
+                }
+                let result = run_graph_source_view_query(
+                    state,
+                    &view,
+                    GraphSourceQueryInput::Sparql(sparql),
+                    Some(json_fmt_config.clone()),
+                    &span,
+                )
+                .await?;
+                tracing::info!(status = "success", graph_source = true);
+                return Ok((
+                    [(axum::http::header::CONTENT_TYPE, json_content_type)],
+                    Json(result),
+                )
+                    .into_response());
+            }
+        }
+
         // In proxy mode, use the unified Fluree method (returns pre-formatted JSON)
         if state.config.is_proxy_storage_mode() && !has_dataset_clause {
             if wants_sparql_xml {
diff --git a/fluree-db-server/tests/graph_source_query_integration.rs b/fluree-db-server/tests/graph_source_query_integration.rs
new file mode 100644
index 000000000..942203f85
--- /dev/null
+++ b/fluree-db-server/tests/graph_source_query_integration.rs
@@ -0,0 +1,145 @@
+//! Regression: a registered Iceberg/R2RML graph source must be queryable by
+//! alias over the HTTP query path (`POST /v1/fluree/query/<gs>`), instead of
+//! failing with `Serialization error: missing field f:ledger` (the alias being
+//! deserialized as a ledger `NsFileV2` record) or a bare "ledger not found".
+//!
+//! These exercise the broken path the shipped tests never covered: the server
+//! query handlers (`execute_query` for JSON-LD, `execute_sparql_ledger` for
+//! SPARQL) resolving a graph-source alias through the nameservice. The server
+//! uses a file-backed nameservice, which is required to reproduce the bug — the
+//! in-memory backend keeps graph sources in a separate map and never
+//! deserializes the on-disk record.
+#![cfg(feature = "iceberg")]
+
+use axum::body::Body;
+use fluree_db_api::R2rmlCreateConfig;
+use fluree_db_server::{routes::build_router, AppState, ServerConfig, TelemetryConfig};
+use http::{Request, StatusCode};
+use http_body_util::BodyExt;
+use serde_json::json;
+use std::sync::Arc;
+use tempfile::TempDir;
+use tower::ServiceExt;
+
+const MAPPING_TTL: &str = r#"
+@prefix rr: <http://www.w3.org/ns/r2rml#> .
+@prefix ex: <http://example.org/> .
+
+<http://example.org/mapping#M> a rr:TriplesMap ;
+    rr:logicalTable [ rr:tableName "openflights.airlines" ] ;
+    rr:subjectMap [
+        rr:template "http://example.org/airline/{id}" ;
+        rr:class ex:Airline
+    ] ;
+    rr:predicateObjectMap [
+        rr:predicate ex:name ;
+        rr:objectMap [ rr:column "name" ]
+    ] .
+"#;
+
+/// Build a file-backed server state with a single Iceberg/R2RML graph source
+/// `gs:main` registered. The catalog URI is bogus, but the historical bug fired
+/// during alias resolution, before any catalog call.
+async fn state_with_graph_source() -> (TempDir, Arc<AppState>) {
+    let tmp = tempfile::tempdir().expect("tempdir");
+    let cfg = ServerConfig {
+        cors_enabled: false,
+        indexing_enabled: false,
+        storage_path: Some(tmp.path().to_path_buf()),
+        ..Default::default()
+    };
+    let telemetry = TelemetryConfig::with_server_config(&cfg);
+    let state = Arc::new(AppState::new(cfg, telemetry).await.expect("AppState::new"));
+
+    state
+        .fluree
+        .create_r2rml_graph_source(
+            R2rmlCreateConfig::new(
+                "gs",
+                "https://example.invalid",
+                "openflights.airlines",
+                MAPPING_TTL,
+            )
+            .with_mapping_media_type("text/turtle"),
+        )
+        .await
+        .expect("graph source registration should succeed");
+
+    (tmp, state)
+}
+
+async fn body_text(resp: http::Response<Body>) -> (StatusCode, String) {
+    let status = resp.status();
+    let bytes = resp.into_body().collect().await.expect("body").to_bytes();
+    (status, String::from_utf8_lossy(&bytes).into_owned())
+}
+
+/// JSON-LD `POST /v1/fluree/query/gs:main` (the `execute_query` path).
+#[tokio::test]
+async fn jsonld_query_by_graph_source_alias_resolves() {
+    let (_tmp, state) = state_with_graph_source().await;
+    let app = build_router(state);
+
+    let body = json!({
+        "@context": {"ex": "http://example.org/"},
+        "select": ["?s"],
+        "where": [["?s", "a", "ex:Airline"]]
+    });
+    let resp = app
+        .oneshot(
+            Request::builder()
+                .method("POST")
+                .uri("/v1/fluree/query/gs:main")
+                .header("content-type", "application/json")
+                .body(Body::from(body.to_string()))
+                .unwrap(),
+        )
+        .await
+        .unwrap();
+
+    let (status, text) = body_text(resp).await;
+    // Before the fix: 500 "Serialization error: missing field `f:ledger`".
+    assert!(
+        !text.contains("f:ledger"),
+        "graph-source alias must not be deserialized as a ledger record; got {status}: {text}"
+    );
+    // It must resolve as a graph source, not report the ledger as missing.
+    assert_ne!(
+        status,
+        StatusCode::NOT_FOUND,
+        "graph-source alias should resolve, not 404; body: {text}"
+    );
+}
+
+/// SPARQL `POST /v1/fluree/query/gs:main` — the exact shape from the bug report
+/// (`execute_sparql_ledger` path). The query reaches the R2RML engine (which
+/// rejects the fully-unbound pattern), proving alias resolution succeeded;
+/// crucially it is no longer the `f:ledger` deserialization failure.
+#[tokio::test]
+async fn sparql_query_by_graph_source_alias_resolves() {
+    let (_tmp, state) = state_with_graph_source().await;
+    let app = build_router(state);
+
+    let resp = app
+        .oneshot(
+            Request::builder()
+                .method("POST")
+                .uri("/v1/fluree/query/gs:main")
+                .header("content-type", "application/sparql-query")
+                .body(Body::from("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 1"))
+                .unwrap(),
+        )
+        .await
+        .unwrap();
+
+    let (status, text) = body_text(resp).await;
+    assert!(
+        !text.contains("f:ledger"),
+        "graph-source alias must not be deserialized as a ledger record; got {status}: {text}"
+    );
+    assert_ne!(
+        status,
+        StatusCode::NOT_FOUND,
+        "graph-source alias should resolve, not 404; body: {text}"
+    );
+}

@bplatz bplatz changed the base branch from feature/shared-disk-cache to perf/iceberg-query-opt June 26, 2026 12:20
Finishes the alias-query support started in #1375, addressing the
reporter's follow-up findings:

- StorageNameService::lookup: add the same graph-source guard
  FileNameService::lookup already has, so the storage backend no longer
  500s ("missing field `f:ledger`") when an alias names a graph source.

- execute_sparql_ledger: the single-target SPARQL alias path had no
  graph-source fallback (the JSON-LD path already falls back to the
  dataset path). On a not-found ledger it now resolves via
  graph(alias).query() (auto-enables R2RML), returning JSON.

- FromQueryBuilder formatter: result formatting re-resolved the alias via
  db(), which is ledger-only and 500s for a graph source. Add
  db_or_graph_source (db() with a graph-source fallback) and use it at
  the six formatter sites, fixing SPARQL FROM <graph-source>.
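
The lookup guard described in the first bullet can be sketched roughly as follows. This is a minimal, std-only model and not the actual nameservice code: `RawRecord`, the `kind` discriminator field, and `parse_ledger` are hypothetical stand-ins, and only the name `is_graph_source_record` and the `f:ledger` field come from this PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a raw nameservice record (a flat field map).
type RawRecord = HashMap<String, String>;

#[derive(Debug, PartialEq)]
struct LedgerRecord {
    ledger: String,
}

#[derive(Debug, PartialEq)]
enum LookupError {
    MissingField(&'static str),
}

/// Assumption: graph-source records carry some discriminator; `kind` is made up here.
fn is_graph_source_record(raw: &RawRecord) -> bool {
    raw.get("kind").map(String::as_str) == Some("graph-source")
}

/// Parsing a record as a ledger fails when `f:ledger` is absent.
fn parse_ledger(raw: &RawRecord) -> Result<LedgerRecord, LookupError> {
    raw.get("f:ledger")
        .map(|l| LedgerRecord { ledger: l.clone() })
        .ok_or(LookupError::MissingField("f:ledger"))
}

/// Ledger lookup: a graph-source record is reported as "no ledger here"
/// (`Ok(None)`) instead of being parsed as a ledger and failing.
fn lookup(
    store: &HashMap<String, RawRecord>,
    alias: &str,
) -> Result<Option<LedgerRecord>, LookupError> {
    match store.get(alias) {
        None => Ok(None),
        Some(raw) if is_graph_source_record(raw) => Ok(None),
        Some(raw) => parse_ledger(raw).map(Some),
    }
}
```

Without the guard arm, the graph-source record would reach `parse_ledger` and produce the missing-field error, which mirrors the 500 described above.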

Consolidate the genesis graph-source view logic into a single
Fluree::resolve_graph_source; resolve_as_graph_source and
db_or_graph_source delegate to it. Remove the unused
load_graph_db_or_graph_source.
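
As an illustration of the delegation described above, here is a toy model of "ledger first, graph source on not-found". The `Registry`, `View`, and `ApiError` types are hypothetical, and the method names only echo `db`, `resolve_graph_source`, and `db_or_graph_source` from this PR; the real signatures are async and differ.

```rust
#[derive(Debug, PartialEq)]
enum View {
    Ledger(String),
    GraphSource(String),
}

#[derive(Debug, PartialEq)]
enum ApiError {
    NotFound(String),
}

impl ApiError {
    fn is_not_found(&self) -> bool {
        matches!(self, ApiError::NotFound(_))
    }
}

/// Hypothetical registry holding known ledger and graph-source aliases.
struct Registry {
    ledgers: Vec<String>,
    graph_sources: Vec<String>,
}

impl Registry {
    /// Ledger-only resolution: errors with NotFound for anything else.
    fn db(&self, alias: &str) -> Result<View, ApiError> {
        if self.ledgers.iter().any(|l| l == alias) {
            Ok(View::Ledger(alias.to_string()))
        } else {
            Err(ApiError::NotFound(alias.to_string()))
        }
    }

    /// Single place that knows how to build a graph-source view.
    fn resolve_graph_source(&self, alias: &str) -> Option<View> {
        self.graph_sources
            .iter()
            .find(|g| g.as_str() == alias)
            .map(|g| View::GraphSource(g.clone()))
    }

    /// `db()` with a graph-source fallback; the original not-found error
    /// is preserved when the alias is not a graph source either.
    fn db_or_graph_source(&self, alias: &str) -> Result<View, ApiError> {
        match self.db(alias) {
            Ok(view) => Ok(view),
            Err(e) if e.is_not_found() => self.resolve_graph_source(alias).ok_or(e),
            Err(e) => Err(e),
        }
    }
}
```

Keeping the fallback in one helper means each caller sees the same behaviour: a ledger wins if present, a graph source is tried next, and only then does the not-found error surface.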

Tests: storage-backend lookup regression (nameservice); strengthen the
server alias test to assert it routes to the graph-source engine rather
than 404 / deserialize as a ledger.
@bplatz

bplatz commented Jun 26, 2026

Contributor Author

Thanks @christophediprima -- that helped flesh out the remaining issues, which should all be addressed now.

Base automatically changed from perf/iceberg-query-opt to main June 27, 2026 00:37