Currently, GraphScope supports reading graph data from local files, object storage, and non-secure HDFS. However, many production Hadoop clusters require Kerberos authentication for security compliance.
When attempting to load graph data from a Kerberized HDFS (e.g., hdfs://namenode:8020/path/to/data), there is no built-in mechanism to provide Kerberos credentials (keytab, principal, krb5.conf), making it impossible to access such data sources directly.
I would like GraphScope to support reading graph data from HDFS clusters with Kerberos authentication enabled. The expected solution could include:
- Configuration parameters in `graphscope.session()` to specify Kerberos settings, e.g.:

  ```python
  sess = graphscope.session(
      hdfs_kerberos_enabled=True,
      hdfs_kerberos_principal="user/host@REALM",
      hdfs_kerberos_keytab="/path/to/keytab",
      hdfs_krb5_conf="/etc/krb5.conf",  # optional
  )
  ```
- Automatic authentication before accessing HDFS files, ensuring all Engine pods can authenticate to the NameNode and DataNodes.
- Support for both HDFS input (loading graph data) and HDFS output (storing results).
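For context, a common workaround today is to acquire a Kerberos TGT on each engine host before any HDFS access, by shelling out to `kinit` with the keytab. The helper below is a hypothetical sketch of that approach (the principal, keytab path, and `krb5.conf` path are placeholders, not values GraphScope knows about); a built-in option in `graphscope.session()` would make this unnecessary:

```python
import os
import shutil
import subprocess

def kinit_with_keytab(principal, keytab, krb5_conf=None):
    """Acquire a Kerberos TGT via `kinit -kt <keytab> <principal>`.

    Returns the command line, so callers can log what was (or would be) run.
    """
    cmd = ["kinit", "-kt", keytab, principal]
    env = None
    if krb5_conf is not None:
        # MIT Kerberos honors KRB5_CONFIG for a non-default krb5.conf path.
        env = {**os.environ, "KRB5_CONFIG": krb5_conf}
    # Only invoke kinit when it is actually installed on this host.
    if shutil.which("kinit") is not None:
        subprocess.run(cmd, env=env, check=True)
    return cmd

# Hypothetical usage, before creating the session:
# kinit_with_keytab("user/host@REALM", "/path/to/keytab", "/etc/krb5.conf")
# sess = graphscope.session(...)
```

In a Kubernetes deployment this would need to run inside every Engine pod (e.g. via an init container or entrypoint script), since each pod must authenticate to the NameNode and DataNodes independently.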