From 6876448ecb00219a181504ca0eefa292fe3a09ad Mon Sep 17 00:00:00 2001
From: Tushar-TG-14 <tushar.sharma@tigergraph.com>
Date: Wed, 20 May 2026 04:03:57 +0530
Subject: [PATCH 1/2] DOC-2975-Revise data loading overview and connector
 details

Update the data loading overview to reflect changes in terminology and
structure, including the introduction of the data connector architecture
and data source categories.
---
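Reviewer note (not part of the commit): the four-step Loading Workflow
described on the page can be sketched end to end for the local-file case.
This is a minimal sketch, not text from the page. The graph name ldbc_snb
comes from the page's own example; the job name load_person, the vertex
type Person with its three attributes, the file variable person_file, and
the file path are all hypothetical and assume a matching schema exists.

[source,gsql]
----
# Step 1: specify the graph (name taken from the page's example)
USE GRAPH ldbc_snb

# Step 2 is skipped: local files need no DATA_SOURCE object

# Step 3: create the loading job (all names below are hypothetical)
CREATE LOADING JOB load_person FOR GRAPH ldbc_snb {
  # Placeholder file variable; the actual path is supplied at run time
  DEFINE FILENAME person_file;
  # Map header-named input columns to the attributes of the assumed Person vertex
  LOAD person_file TO VERTEX Person
    VALUES ($"id", $"firstName", $"lastName")
    USING SEPARATOR="|", HEADER="true";
}

# Step 4: run the job, binding the placeholder to a real file
RUN LOADING JOB load_person USING person_file="/home/tigergraph/data/person.csv"
----

Leaving the DEFINE FILENAME path empty and binding it in RUN LOADING JOB
keeps the job definition reusable across input files.
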
 .../pages/data-loading-overview.adoc          | 95 +++++++++++--------
 1 file changed, 58 insertions(+), 37 deletions(-)

diff --git a/modules/data-loading/pages/data-loading-overview.adoc b/modules/data-loading/pages/data-loading-overview.adoc
index fa6db01ab..08384a61e 100644
--- a/modules/data-loading/pages/data-loading-overview.adoc
+++ b/modules/data-loading/pages/data-loading-overview.adoc
@@ -1,63 +1,84 @@
 :toc:
-= Data Loading Overview
-:description: Overview of available loading methods and supported features.
+= Data Connector Overview
+:description: Overview of available data connectors, sources, and loading workflows.
 :page-aliases: data-loading:kafka-loader/index.adoc
 
-Once you have xref:{page-component-version}@gsql-ref:ddl-and-loading:defining-a-graph-schema.adoc[defined a graph schema], you can load data into the graph. This section focuses on how to configure TigerGraph for the different data sources, as well as different data formats and transport schemes.
+Once you have xref:{page-component-version}@gsql-ref:ddl-and-loading:defining-a-graph-schema.adoc[defined a graph schema], you can load data into the graph.
 
-== Loading System Architecture
+This section provides an overview of how to configure TigerGraph to connect to different data sources, including data warehouses, cloud storage, streaming systems, and lakehouse platforms.
 
-This diagram shows the supported data sources, which connector to use, and which TigerGraph component manages the data loading.
+== Connector Architecture Overview
 
-.TigerGraph Data Loading Options
-image::data-loading:loading-arch_3.11-rev2.png[Architectural diagram showing supported data sources, which connector to use, and which TigerGraph component manages the data loading]
+This diagram shows the supported data source categories, the connectors used to access them, and the TigerGraph components responsible for ingesting the data.
+
+.TigerGraph Data Connector Architecture
+image::data-loading:loading-arch_3.11-rev2.png[Architectural diagram showing supported data sources, connectors, and data ingestion components]
 //  source file: https://graphsql.atlassian.net/wiki/..../Data+Loading+Architecture+with+New+Spark+Connector
 
-== Data Sources
+== Data Source Categories
+
+TigerGraph supports multiple categories of data sources, each accessed through a specific connector or integration method.
 
-You have several options for data sources:
+* *Local Files*: Files located on the TigerGraph server can be loaded directly without defining a `DATA_SOURCE` object. This option typically provides the highest performance.
 
-* *Local Files*: Files residing on a TigerGraph server can be loaded without the need to create a GSQL DATA_SOURCE object. This option can have the highest performance.
+* *External Sources (via Kafka Connect)*: External systems are accessed by defining a `DATA_SOURCE` object, which uses the https://docs.confluent.io/platform/current/connect/index.html[Kafka Connect] framework. Kafka Connect provides a distributed and fault-tolerant data pipeline.
 
-* *Outside Sources*: Loading data from an outside source, such as cloud storage, requires one additional step to first define a DATA_SOURCE object, which uses the https://docs.confluent.io/platform/current/connect/index.html[Kafka Connect] framework.
-Kafka offers a distributed, fault-tolerant, real-time data pipeline with concurrency.
-By encapsulating the details of the data source connection in a DATA_SOURCE object, GSQL can treat the source like it treats a local file.
-You can use this approach for the following data sources:
++
+Using this approach, TigerGraph can treat external sources similarly to local files. Supported sources include:
 +
 ** Cloud storage (Amazon S3, Azure Blob Storage, Google Cloud Storage)
-** Data warehouse query results (Google BigQuery, Snowflake, PostgreSQL)
-** External Kafka cluster
+** Data warehouses (Google BigQuery, Snowflake, PostgreSQL)
+** External Kafka clusters
+** Lakehouse platforms such as Apache Iceberg (via Kafka Connect)
+
++
+See each connector's page for detailed configuration steps.
+
+* *Lakehouse (via Spark or Kafka Connect)*: Lakehouse platforms combine features of data lakes and data warehouses.
+
++
+TigerGraph supports:
++
+** Apache Iceberg via Kafka Connect
+** Apache Iceberg, DeltaLake (and other Spark-supported sources) via the Spark Connector
 +
-See the pages for the specific method that fits your data source.
+See xref:load-from-iceberg.adoc[Load from Apache Iceberg] for details.
 
-* *Spark*: The TigerGraph xref:data-loading:load-from-spark-dataframe.adoc[Spark Connector] is used with Apache Spark to read data from a Spark DataFrame (or Data Lake) and write to TigerGraph.
-Users can leverage it to connect TigerGraph to the Spark ecosystem and load data from any Spark data sources
+* *Spark*: The TigerGraph xref:data-loading:load-from-spark-dataframe.adoc[Spark Connector] integrates with Apache Spark to load data from Spark DataFrames or lakehouse storage systems into TigerGraph.
+
++
+This approach allows you to leverage the broader Spark ecosystem and its supported data sources.
 
 == Loading Workflow
 
-TigerGraph uses the same workflow for both local file and Kafka Connect loading:
+TigerGraph follows a consistent workflow for loading data, regardless of the source:
 
 . *Specify a graph*.
-Data is always loading to exactly one graph (though that graph could have global vertices and edges which are shared with other graphs). For example:
+Data is always loaded into a single graph. For example:
 +
 [source,gsql]
+----
 USE GRAPH ldbc_snb
+----
 
-. If you are using Kafka Connect, *define a `DATA_SOURCE` object*.
-See the details on the pages for
-xref:load-from-cloud.adoc[cloud storage],
+. If using an external connector, *define a `DATA_SOURCE` object*.
++
+See:
++
+xref:load-from-cloud.adoc[Cloud Storage],
 xref:load-from-warehouse.adoc#_bigquery[BigQuery],
 xref:load-from-warehouse.adoc#_snowflake[Snowflake],
-xref:load-from-warehouse.adoc#_postgresql[PostgreSQL] or
-xref:load-from-kafka.adoc#_configure_the_kafka_source[Kafka]
-
+xref:load-from-warehouse.adoc#_postgresql[PostgreSQL],
+xref:load-from-kafka.adoc[Kafka], or
+xref:load-from-iceberg.adoc[Apache Iceberg].
 
 . *Create a xref:#_loading_jobs[loading job]*.
 
-. *Run your loading job*.
+. *Run the loading job*.
 
 == Loading Jobs
-A loading job tells the database how to construct vertices and edges from data sources.
+
+A loading job defines how data is transformed into vertices and edges in the graph.
 
 [source,gsql]
 .CREATE LOADING JOB syntax
@@ -67,16 +88,16 @@ CREATE LOADING JOB <job_name> FOR GRAPH <graph_name> {
   <LOAD statements>
 }
 ----
-The opening line does some naming:
 
-* assigns a name to this job: (`<job_name>`)
-* associates this job with a graph (`<graph_name>`)
+The loading job definition includes:
+
+* A job name (`<job_name>`)
+* A target graph (`<graph_name>`)
 
-The loading job body has two parts:
+The body of the loading job consists of:
 
-. DEFINE statements create variables to refer to data sources.
-These can refer to actual files or be placeholder names. The actual data sources can be given when running the loading job.
+. *DEFINE statements*: Create variables that reference data sources. These can represent files or external queries.
 
-. LOAD statements specify how to take the data fields from files to construct vertices or edges.
+. *LOAD statements*: Specify how input data fields map to vertices and edges.
 
-NOTE: Refer to the xref:{page-component-version}@gsql-ref:ddl-and-loading:creating-a-loading-job.adoc[Creating a Loading Job] documentation for full details
+NOTE: For detailed syntax and examples, see xref:{page-component-version}@gsql-ref:ddl-and-loading:creating-a-loading-job.adoc[Creating a Loading Job].

From 7429bc01d08cf592ab51fa1da123daedcfca3ad7 Mon Sep 17 00:00:00 2001
From: Tushar-TG-14 <tushar.sharma@tigergraph.com>
Date: Wed, 20 May 2026 20:34:18 +0530
Subject: [PATCH 2/2] Update architecture image in data-loading-overview.adoc

---
 modules/data-loading/pages/data-loading-overview.adoc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/modules/data-loading/pages/data-loading-overview.adoc b/modules/data-loading/pages/data-loading-overview.adoc
index 08384a61e..66e8f4e6a 100644
--- a/modules/data-loading/pages/data-loading-overview.adoc
+++ b/modules/data-loading/pages/data-loading-overview.adoc
@@ -12,8 +12,9 @@ This section provides an overview of how to configure TigerGraph to connect to d
 This diagram shows the supported data source categories, the connectors used to access them, and the TigerGraph components responsible for ingesting the data.
 
 .TigerGraph Data Connector Architecture
-image::data-loading:loading-arch_3.11-rev2.png[Architectural diagram showing supported data sources, connectors, and data ingestion components]
+image::data-loading:data-connector-architecture_4.3.png[Architectural diagram showing supported data sources, connectors, and data ingestion components]
 //  source file: https://graphsql.atlassian.net/wiki/..../Data+Loading+Architecture+with+New+Spark+Connector
+// Image used before 4.3: loading-arch_3.11-rev2.png
 
 == Data Source Categories
 
@@ -41,6 +42,7 @@ TigerGraph supports:
 +
 ** Apache Iceberg via Kafka Connect
 ** Apache Iceberg, DeltaLake (and other Spark-supported sources) via the Spark Connector
+
 +
 See xref:load-from-iceberg.adoc[Load from Apache Iceberg] for details.