Add Pystp netcdf driver by RichardHitier · Pull Request #1 · SciQLop/PyISTP

RichardHitier · 2026-04-14T08:03:30Z

No description provided.

jeandet · 2026-05-06T13:37:12Z

Code review

Found 1 issue:

cdf_type() is inconsistent with values() for CDF_EPOCH variables. values() calls _get_units() which checks both 'units' and 'UNITS', and routes through _is_cdf_epoch() to return datetime64[ns]. cdf_type() only calls v.getncattr('units') (lowercase) and only handles the CF time ('since' in units) branch — there is no CDF_EPOCH branch at all. Verified against the bundled tests/resources/ac_h2s_mfi_cdaweb.nc (whose Epoch has UNITS='ms'): values('Epoch').dtype is datetime64[ns] while cdf_type('Epoch') returns 'CDF_DOUBLE'. Downstream consumers that key off cdf_type to identify time axes will misclassify the epoch.

Lines 128 to 141 in ec4e33c

    
           def cdf_type(self, var): 
        
               v = self._ds[var] 
        
               # CF time variable: float with a "units" attribute containing "since" 
        
               try: 
        
                   units = v.getncattr('units') 
        
                   if isinstance(units, str) and 'since' in units: 
        
                       return 'CDF_TIME_TT2000' 
        
               except AttributeError: 
        
                   pass 
        
               if v.dtype == str: 
        
                   return 'CDF_CHAR' 
        
               dtype_str = v.dtype.str.lstrip('<>=!') 
        
               return self._DTYPE_TO_CDF.get(dtype_str, f'CDF_UNKNOWN_{dtype_str}')

Suggest reusing _get_units() and adding a _is_cdf_epoch(var) branch returning 'CDF_EPOCH' (mirroring values()).

Other items considered but not flagged: netCDF4.Dataset is never closed (no close()/context manager) and the _DTYPE_TO_CDF 'S' key is unreachable (v.dtype.str.lstrip('<>=!') leaves the leading |) — both real but lower-impact; worth a follow-up.

🤖 Generated with Claude Code

_{- If this code review was useful, please react with 👍. Otherwise, react with 👎.}

brenard-irap

I have a few remarks, mostly related to design.
We should find some time in the coming days to discuss them together.

brenard-irap · 2026-05-07T13:13:50Z

 def load(file=None, buffer=None, master_file=None, master_buffer=None) -> _ISTPLoader:
    return _ISTPLoader(file=file, buffer=buffer, master_file=master_file, master_buffer=master_buffer)
+
+


The distinction between the load (for CDF) and load_netcdf methods based on the file format bothers me.
I would have preferred to keep a single load method.

The selection of the driver to use could be handled in the constructor of ISTPLoaderImpl (as is already partially done for the CDF driver type: pycdfpp or spacepy).
The format detection could be implemented by inspecting the first 4 bytes that define the magic number, for example.

Additionally, in load_netcdf, we lose the ability to provide a master file, which I find problematic (see examples below).

I would also like the reading of the data file and the reading of the master file to be independent, meaning they could potentially use different drivers.

Example 1 – ICON mission from CDAWeb:

The master file is provided in CDF: https://cdaweb.gsfc.nasa.gov/pub/software/cdawlib/0MASTERS/icon_l2-6_euv_00000000_v01.cdf
The data files are in netCDF: https://spdf.gsfc.nasa.gov/pub/data/icon/l2/l2-6_euv/

The data files look like ISTP-compliant files, but they are not actually compliant.
For example, in the netCDF data files, Var_Type is used to specify whether a variable is data or support_data.
However, the specification (https://github.com/IHDE-Alliance/ISTP_metadata/blob/main/ISTP_metadata_guidelines/docs/05_metadata-variable-attributes.md#istp-variable-attributes) clearly states: "Note that attribute names are case sensitive, and the names of the ISTP variable attributes must match the case as shown."
Therefore, VAR_TYPE should have been used for the netCDF files to be directly ISTP-compliant.
The master file, on the other hand, is properly ISTP-compliant and does use VAR_TYPE to define the data type.

Example 2 - AMDA:

For AMDA, we are considering decommissioning our DDSERVER data server and replacing it with Speasy.
The data in this database is in netCDF and is not ISTP-compliant.
Regenerating the entire database is not an option (several million files and multiple terabytes in volume).
What I would like to do instead is generate CDF/ISTP-compliant master files for each dataset.
This would put us in a situation similar to the ICON mission from CDAWeb:

master file in CDF / ISTP-compliant

data files in non-ISTP-compliant netCDF
This would avoid the need for AMDA-specific development, which would be ideal.

brenard-irap · 2026-05-07T13:16:53Z

+            driver_factory = current_driver
        if file is not None:
            log.debug(f"Loading {file}")
-        self.cdf = current_driver(file or buffer)


As mentioned in another comment, I would like to take into account the fact that the driver used to read the master file may differ from the one used to read the data file.

brenard-irap · 2026-05-07T13:43:11Z

+        return (unix_ms * 1_000_000).astype('datetime64[ns]')
+
+    def values(self, var, is_metadata_variable=False):  # NOSONAR
+        v = self._ds[var]


In my opinion, it is not the responsibility of the pyistp driver to interpret the data and convert it into datetime64; this should instead be handled by the consuming tool (in our case, the Speasy codec).
The data should be provided as-is, exactly as they appear in the file.
The pyistp library should only identify which variable contains the time information for the other variables.
This seems even more important given that, in the case of netCDF, the type of a time variable is not always clearly defined (unlike CDF, which uses CDF_EPOCH, CDF_EPOCH16, or TT2000).
As stated in the specification:
https://github.com/IHDE-Alliance/ISTP_metadata/blob/main/ISTP_metadata_guidelines/docs/04_metadata-variables.md#netcdf-times

NetCDF files can include the CDF time variables, with CDF_TIME_TT2000 especially recommended, but will require using the CDF library time routines for conversion. Otherwise, netCDF times are typically something like seconds from some specific time epoch, with UNITS = "seconds from 2000-01-01 UTC" or similar. In either case, the ISTP time variable attributes should be added.

If we move this interpretation logic into Speasy, we will be able to adapt it more easily depending on the provider.
For example, in AMDA / DDSERVER, the netCDF data files in our local database use a time format called DDTIME (which is not used anywhere else - for historical reasons).
It would not make sense to implement support for this format in pyistp, whereas in Speasy we could provide a callback mechanism in the netCDF codec to handle such very specific cases.

brenard-irap · 2026-05-07T13:46:21Z

    "Programming Language :: Python :: 3.12",
 ]
 dependencies = ['pycdfpp>=0.6.0']
+


Why make this dependency optional?

brenard-irap · 2026-05-07T13:50:17Z

+except ImportError:
+    pytest.skip("netcdf driver not implemented yet", allow_module_level=True)
+
+


I’ll try to provide you with additional test cases.
However, there are far fewer publicly available datasets in netCDF compared to CDF...

cdf_type() now adresses the cf_time and cdf_epoch branches

- remove the factory_driver argument propagated along ISTPLoaders - create a new _factory_driver() guessing the file format

sonarqubecloud · 2026-05-12T14:57:15Z

Quality Gate failed

Failed conditions
C Security Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

sonarqubecloud · 2026-05-19T14:24:48Z

Quality Gate failed

Failed conditions
C Security Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

jeandet · 2026-05-25T12:20:42Z

+def _resolve_epoch_virtual(cdf: Driver, axis_var: str) -> str:
+    # Some SPDF masters define DEPEND_0=Epoch_cdf, a VIRTUAL variable computed
+    # from Epoch. NetCDF data files only store Epoch directly, so we fall back
+    # to Epoch when Epoch_cdf is absent.
+    if axis_var == 'Epoch_cdf' and not cdf.has_variable('Epoch_cdf') and cdf.has_variable('Epoch'):
+        return 'Epoch'
+    return axis_var


Not sure about this here, we should not infer names at first, that will for sure break files at some point. This also won't work with files that store datasets with different time resolutions.

Yes, that what a quick workaround so we could adresse the Epoch_cdf DEPEND_0 attribute value.
Now, off course I would be happy to discuss about it (VIRTUAL variables in the master_cdf) and find a much cleaner and elegant solution.

Yeah, we discussed with @brenard-irap about this, indeed, this is not a trivial issue.
We kind of concluded that this dataset is not the right example after all, the virtual variable is a CDA thing and Speasy already use the API when a product has a virtual variable (either values or axis). In other words Speasy won't read NetCDF files where a cdf master has virtual variables.

ok. That leads me two the following questions:
1- should I remove that method ?
2- on what dataset could I run tests ?

Sorry for the delay,

ok. That leads me two the following questions: 1- should I remove that method ? 2- on what dataset could I run tests ?

I would say yes

Sadly, I have no idea so far, maybe @brenard-irap ? Worst case scenario, we make a synthetic test until we have such example and fix later in case we have bugs, the most important think is to no alter the current behavior with CDF files.

Sorry for the late reply, but it wasn’t easy to find a real example for testing...

The use of a virtual function to convert times into CDF_EPOCH is almost applied across all netCDF datasets at CDAWeb.

You will find the data file here and the master file here.

import pyistp loader = pyistp.load(file="daili_l2_o2-density-profiles_20220303_v7.nc", master_file="daili_l2_o2-density-profiles_00000000_v01.cdf") var = loader.data_variable("O2DENSITY") print(var.values) print(var.axes) print(var.axes[0].values)

Everything seems to be working fine to me. I suggest using it in the test test_nc_with_cdf_master instead of the ICON dataset, and removing the workaround with _resolve_epoch_virtual.

Thanks

Hi Richard,

I think you forgot to commit the following files in tests/resources:

icon_l2-6_euv_20220307_v03r004.nc

daili_l2_o2-density-profiles_20220303_v7.nc

daili_l2_o2-density-profiles_00000000_v01.cdf

yes, sorry, its fixed

jeandet · 2026-06-02T13:18:00Z

+    else:
+        with open(file_or_buffer, "rb") as f:
+            magic = f.read(4)
+    if magic == b'\x89HDF':


if magic in (b'\x89HDF', b'CDF'): ...

Tested on some random netCDF demo files:

❯ hexdump -C -n8 ~/Downloads/pres_temp_4D.nc 00000000 43 44 46 01 00 00 00 02 |CDF.....| 00000008 ❯ hexdump -C -n8 ~/Downloads/foo.nc 00000000 43 44 46 01 00 00 00 00 |CDF.....| 00000008

sonarqubecloud · 2026-06-02T13:34:43Z

Quality Gate failed

Failed conditions
C Security Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

RichardHitier · 2026-06-02T13:46:00Z

THANKS !

RichardHitier added 7 commits April 13, 2026 17:35

Add new unit tests

9cdb298

Add optionnal dependency

89d5e2d

NetCDF driver: first draft

72c70a0

Fix NetCDF driver time detection for CDAWeb ISTP files

f090582

Fix pep8 style

13000cc

Add dev dependencies for CI

598114e

Add load_netcdf() with new 'driver_factory' parameter

ec4e33c

jeandet requested review from brenard-irap and jeandet April 28, 2026 16:54

jeandet added the enhancement New feature or request label Apr 28, 2026

brenard-irap reviewed May 7, 2026

View reviewed changes

brenard-irap mentioned this pull request May 7, 2026

Add new net_cdf4 codec SciQLop/speasy#288

Open

RichardHitier added 2 commits May 11, 2026 10:51

Fix the values()/cdf_type() inconsistency

60ccb5e

cdf_type() now adresses the cf_time and cdf_epoch branches

Use a unique load() method for all file formats

c7bb13d

- remove the factory_driver argument propagated along ISTPLoaders - create a new _factory_driver() guessing the file format

RichardHitier added 7 commits May 12, 2026 17:07

Netcd4 dependency is mandatory now

254dd2c

Reformat Code

9380416

Fix VAR_TYPE case-insensitive lookup

806ffef

Fix DEPEND_N case-insensitive lookup

575f590

Fix UNITS case-insensitive lookup

b8bf933

Add support for milliseconds-since-unix-epoch time

2b3ff37

Add tests for ICON NC file support

bf2920b

jeandet reviewed May 25, 2026

View reviewed changes

RichardHitier force-pushed the pystp_netcdf branch from 3222f03 to 1e3ceed Compare May 28, 2026 08:54

Add tests for netCDF with master cdf

615d5bb

RichardHitier force-pushed the pystp_netcdf branch from 1e3ceed to 615d5bb Compare May 28, 2026 13:27

jeandet reviewed Jun 2, 2026

View reviewed changes

brenard-irap approved these changes Jun 2, 2026

View reviewed changes

Support the two official netCDF magic numbers

cf54ca2

jeandet merged commit cfe8eb8 into SciQLop:main Jun 2, 2026
19 of 21 checks passed

		def load(file=None, buffer=None, master_file=None, master_buffer=None) -> _ISTPLoader:
		return _ISTPLoader(file=file, buffer=buffer, master_file=master_file, master_buffer=master_buffer)

		except ImportError:
		pytest.skip("netcdf driver not implemented yet", allow_module_level=True)

Conversation

RichardHitier commented Apr 14, 2026

Uh oh!

jeandet commented May 6, 2026

Code review

Uh oh!

brenard-irap left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented May 12, 2026

Quality Gate failed

Uh oh!

sonarqubecloud Bot commented May 19, 2026

Quality Gate failed

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented Jun 2, 2026

Quality Gate failed

Uh oh!

Uh oh!

RichardHitier commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants