Currently, GraphScope supports reading graph data from local files, object storage, and non-secure HDFS. However, many production Hadoop clusters require Kerberos authentication for security compliance.
When attempting to load graph data from a Kerberized HDFS (e.g., hdfs://namenode:8020/path/to/data), there is no built-in mechanism to provide Kerberos credentials (keytab, principal, krb5.conf), making it impossible to access such data sources directly.
I would like GraphScope to support reading graph data from HDFS clusters with Kerberos authentication enabled. The expected solution could include:
- Configuration parameters in `graphscope.session()` to specify Kerberos settings, e.g.:

  ```python
  sess = graphscope.session(
      hdfs_kerberos_enabled=True,
      hdfs_kerberos_principal="user/host@REALM",
      hdfs_kerberos_keytab="/path/to/keytab",
      hdfs_krb5_conf="/etc/krb5.conf",  # optional
  )
  ```
- Automatic authentication before accessing HDFS files, ensuring all Engine pods can authenticate to the NameNode and DataNodes.
- Support for both HDFS input (loading graph data) and HDFS output (storing results).
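For context, a common workaround today is to acquire a Kerberos TGT on each engine host before any HDFS access, by shelling out to `kinit` with the keytab. The helper below is a hypothetical sketch of that approach (the principal, keytab path, and `krb5.conf` path are placeholders, not values GraphScope knows about); a built-in option in `graphscope.session()` would make this unnecessary:

```python
import os
import shutil
import subprocess

def kinit_with_keytab(principal, keytab, krb5_conf=None):
    """Acquire a Kerberos TGT via `kinit -kt <keytab> <principal>`.

    Returns the command line, so callers can log what was (or would be) run.
    """
    cmd = ["kinit", "-kt", keytab, principal]
    env = None
    if krb5_conf is not None:
        # MIT Kerberos honors KRB5_CONFIG for a non-default krb5.conf path.
        env = {**os.environ, "KRB5_CONFIG": krb5_conf}
    # Only invoke kinit when it is actually installed on this host.
    if shutil.which("kinit") is not None:
        subprocess.run(cmd, env=env, check=True)
    return cmd

# Hypothetical usage, before creating the session:
# kinit_with_keytab("user/host@REALM", "/path/to/keytab", "/etc/krb5.conf")
# sess = graphscope.session(...)
```

In a Kubernetes deployment this would need to run inside every Engine pod (e.g. via an init container or entrypoint script), since each pod must authenticate to the NameNode and DataNodes independently.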