Support reading HDFS files with Kerberos authentication #4654

@chanpion

Description

Currently, GraphScope supports reading graph data from local files, object storage, and non-secure HDFS. However, many production Hadoop clusters require Kerberos authentication for security compliance.

When attempting to load graph data from a Kerberized HDFS (e.g., hdfs://namenode:8020/path/to/data), there is no built-in mechanism to provide Kerberos credentials (keytab, principal, krb5.conf), making it impossible to access such data sources directly.

I would like GraphScope to support reading graph data from HDFS clusters with Kerberos authentication enabled. The expected solution could include:

  1. Configuration parameters in graphscope.session() to specify Kerberos settings, e.g.:

    sess = graphscope.session(
        hdfs_kerberos_enabled=True,
        hdfs_kerberos_principal="user/host@REALM",
        hdfs_kerberos_keytab="/path/to/keytab",
        hdfs_krb5_conf="/etc/krb5.conf"  # optional
    )

  2. Automatic authentication before accessing HDFS files, ensuring all engine pods can authenticate to the NameNode and DataNodes.

  3. Support for both HDFS input (loading graph data) and HDFS output (storing results).
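To make the second point more concrete, here is a minimal sketch of what the per-pod authentication step might look like. It is not GraphScope code. `KerberosConfig`, `build_kinit_command`, `build_kinit_env`, and `authenticate` are hypothetical names that mirror the proposed session parameters. The sketch assumes the MIT Kerberos `kinit` client is installed on each pod and that the keytab is mounted there. It relies only on the standard `kinit -kt <keytab> <principal>` invocation and the `KRB5_CONFIG` environment variable.

```python
import os
import subprocess
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class KerberosConfig:
    """Hypothetical container mirroring the proposed session parameters."""
    principal: str
    keytab: str
    krb5_conf: Optional[str] = None  # optional, as in the proposal


def build_kinit_command(cfg: KerberosConfig) -> List[str]:
    # `kinit -kt <keytab> <principal>` obtains a ticket from a keytab
    # without prompting for a password.
    return ["kinit", "-kt", cfg.keytab, cfg.principal]


def build_kinit_env(
    cfg: KerberosConfig, base_env: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    # Copy the environment so the caller's mapping is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    if cfg.krb5_conf:
        # KRB5_CONFIG points the Kerberos libraries at a non-default krb5.conf.
        env["KRB5_CONFIG"] = cfg.krb5_conf
    return env


def authenticate(cfg: KerberosConfig) -> None:
    # Assumption: this would run on every engine pod before any HDFS access.
    # check=True raises CalledProcessError if kinit fails.
    subprocess.run(build_kinit_command(cfg), env=build_kinit_env(cfg), check=True)


if __name__ == "__main__":
    cfg = KerberosConfig(
        principal="user/host@REALM",
        keytab="/path/to/keytab",
        krb5_conf="/etc/krb5.conf",
    )
    # Print the command instead of running it, since no KDC is assumed here.
    print(" ".join(build_kinit_command(cfg)))
```

Keeping the command and environment construction separate from the `subprocess` call makes the logic testable without a KDC. A real implementation would also need to renew tickets for long-running sessions, which this sketch does not cover.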
