================= PyCDFpp (PyCDF++) ================= .. toctree:: :titlesonly: :hidden: :maxdepth: 2 generated/pycdfpp installation examples/index history contributing authors A NASA's `CDF `_ modern C++ library with Python bindings thanks to `PyBind11 `_. Why? CDF files are still used for space physics missions but few implementations are available. The main one is NASA's C implementation available `here `_ but it lacks multi-threads support (global shared state), has an old C interface and has a license which isn't compatible with most Linux distributions policy. There are also Java and Python implementations which are not usable in C++. Quickstart ========== Reading CDF files ----------------- Basic example from a local file: .. code-block:: python import pycdfpp cdf = pycdfpp.load("some_cdf.cdf") cdf_var_data = cdf["var_name"].values #builds a numpy view or a list of strings attribute_name_first_value = cdf.attributes['attribute_name'][0] Note that you can also load in memory files: .. code-block:: python import pycdfpp import requests import matplotlib.pyplot as plt tha_l2_fgm = pycdfpp.load(requests.get("https://spdf.gsfc.nasa.gov/pub/data/themis/tha/l2/fgm/2016/tha_l2_fgm_20160101_v01.cdf").content) plt.plot(tha_l2_fgm["tha_fgl_gsm"]) plt.show() Buffer protocol support: .. code-block:: python import pycdfpp import requests import xarray as xr import matplotlib.pyplot as plt tha_l2_fgm = pycdfpp.load(requests.get("https://spdf.gsfc.nasa.gov/pub/data/themis/tha/l2/fgm/2016/tha_l2_fgm_20160101_v01.cdf").content) xr.DataArray(tha_l2_fgm['tha_fgl_gsm'], dims=['time', 'components'], coords={'time':tha_l2_fgm['tha_fgl_time'].values, 'components':['x', 'y', 'z']}).plot.line(x='time') plt.show() # Works with matplotlib directly too plt.plot(tha_l2_fgm['tha_fgl_time'], tha_l2_fgm['tha_fgl_gsm']) plt.show() Datetimes handling: .. code-block:: python import pycdfpp import os # Due to an issue with pybind11 you have to force your timezone to UTC for # datetime conversion (not necessary for numpy datetime64) os.environ['TZ']='UTC' mms2_fgm_srvy = pycdfpp.load("mms2_fgm_srvy_l2_20200201_v5.230.0.cdf") # to convert any CDF variable holding any time type to python datetime: epoch_dt = pycdfpp.to_datetime(mms2_fgm_srvy["Epoch"]) # same with numpy datetime64: epoch_dt64 = pycdfpp.to_datetime64(mms2_fgm_srvy["Epoch"]) # note that using datetime64 is ~100x faster than datetime (~2ns/element on an average laptop) Lazy loading ------------ By default, ``pycdfpp.load`` uses lazy loading: variable metadata (name, shape, type, attributes) is read immediately, but variable **values** are only loaded from disk when first accessed. This makes opening large CDF files very fast when you only need a subset of variables. .. code-block:: python import pycdfpp # Lazy loading (default) — fast open, values loaded on demand cdf = pycdfpp.load("large_file.cdf") cdf["Epoch"].values_loaded # False — not yet read from disk data = cdf["Epoch"].values # triggers the actual read cdf["Epoch"].values_loaded # True # Eager loading — all values read upfront cdf = pycdfpp.load("large_file.cdf", lazy_load=False) Writing CDF files ----------------- Creating a basic CDF file: .. code-block:: python import pycdfpp import numpy as np from datetime import datetime cdf = pycdfpp.CDF() cdf.add_attribute("some attribute", [[1,2,3], [datetime(2018,1,1), datetime(2018,1,2)], "hello\nworld"]) cdf.add_variable("some variable", values=np.ones((10),dtype=np.float64)) pycdfpp.save(cdf, "some_cdf.cdf") Working with variable attributes --------------------------------- Variables can have their own attributes (distinct from global CDF attributes). These are commonly used for ISTP metadata such as ``DEPEND_0``, ``FILLVAL``, ``UNITS``, etc. .. code-block:: python import pycdfpp import numpy as np cdf = pycdfpp.CDF() var = cdf.add_variable("Flux", values=np.ones((100, 3), dtype=np.float64)) # Add attributes to a variable var.add_attribute("UNITS", "cm^-2 s^-1") var.add_attribute("DEPEND_0", "Epoch") var.add_attribute("FILLVAL", [-1e31]) # Read variable attributes print(var.attributes["UNITS"].value) # "cm^-2 s^-1" print(list(var.attributes)) # ["UNITS", "DEPEND_0", "FILLVAL"] # Modify an existing attribute value var.attributes["UNITS"].set_value("m^-2 s^-1") Modifying variables ------------------- You can update the values of an existing variable using ``set_values``: .. code-block:: python import pycdfpp import numpy as np cdf = pycdfpp.CDF() var = cdf.add_variable("data", values=np.zeros(10, dtype=np.float64)) # Replace values (must match shape and type, unless force=True) var.set_values(np.ones(10, dtype=np.float64)) # Force replacement with different shape or type var.set_values(np.arange(20, dtype=np.int32), force=True) Using compression ----------------- Variables and entire CDF files can use GZip or RLE compression: .. code-block:: python import pycdfpp import numpy as np cdf = pycdfpp.CDF() # Per-variable compression cdf.add_variable("compressed_var", values=np.arange(1000, dtype=np.float64), compression=pycdfpp.CompressionType.gzip_compression) # Whole-file compression cdf.compression = pycdfpp.CompressionType.gzip_compression pycdfpp.save(cdf, "compressed.cdf") Filtering CDF files ------------------- Use ``CDF.filter`` to create a copy containing only the variables and attributes you need: .. code-block:: python import pycdfpp cdf = pycdfpp.load("large_file.cdf") # Keep only specific variables by name filtered = cdf.filter(variables=["Epoch", "Flux"]) # Keep variables matching a regex pattern filtered = cdf.filter(variables="tha_fg.*") # Keep variables matching a callable filtered = cdf.filter(variables=lambda v: v.type == pycdfpp.DataType.CDF_DOUBLE) # Keep specific global attributes filtered = cdf.filter(variables=["Epoch"], attributes=["Project", "Source_name"]) # Filter in-place (modifies the original) cdf.filter(variables=["Epoch"], inplace=True) Exporting to dictionary ------------------------ Use ``pycdfpp.to_dict_skeleton`` to export the structure of a CDF, variable, or attribute as a plain dictionary (useful for JSON serialization or inspection): .. code-block:: python import pycdfpp import json cdf = pycdfpp.load("some_cdf.cdf") # Export full CDF structure (without variable data) skeleton = pycdfpp.to_dict_skeleton(cdf) print(json.dumps(skeleton, indent=2, default=str)) # Export a single variable's metadata var_info = pycdfpp.to_dict_skeleton(cdf["some_var"]) FAQ === How to control integer attributes types? ---------------------------------------- Use numpy types to control the type of integer attributes: .. code-block:: python import pycdfpp import numpy as np cdf = pycdfpp.CDF() cdf.add_attribute("int8 attribute", np.array([[1, 2, 3]], dtype=np.int8)) cdf.add_attribute("int16 attribute", np.array([[1, 2, 3]], dtype=np.int16)) cdf.add_attribute("int32 attribute", np.array([[1, 2, 3]], dtype=np.int32)) cdf.add_attribute("int64 attribute", np.array([[1, 2, 3]], dtype=np.int64)) # or: cdf.add_attribute("int8 attribute2", [[np.int8(1)]]) Why do I get a segfault when accessing attributes after modifying a CDF? ------------------------------------------------------------------------ PyCDFpp returns lightweight references into the underlying C++ data structures. Adding or removing variables or attributes may reallocate internal storage, which invalidates any previously obtained reference. Always re-fetch references after mutating the CDF: .. code-block:: python import pycdfpp import numpy as np cdf = pycdfpp.CDF() cdf.add_variable("var1", values=np.ones(10)) var1 = cdf["var1"] # UNSAFE — var1 may be invalidated by the next add_variable call cdf.add_variable("var2", values=np.zeros(5)) # var1.values # may segfault! # SAFE — re-fetch after mutation var1 = cdf["var1"] var1.values # ok The same applies to attribute references: do not hold onto a reference returned by ``var.attributes["name"]`` while adding or removing attributes on the same variable. How to make special values (FILLVAL, Pad values, etc.)? ------------------------------------------------------- Use the `pycdfpp.default_fill_value` and `pycdfpp.default_pad_value` functions to create Fill or Pad values depending on the CDF type: .. code-block:: python import pycdfpp import numpy as np cdf = pycdfpp.CDF() cdf.add_variable("int8 variable", values=np.ones((10), dtype=np.int8), attributes={"FILLVAL": [pycdfpp.default_fill_value(pycdfpp.DataType.CDF_INT1)]}) cdf.add_variable("tt2000 variable", values=np.ones((10), dtype=np.int64), attributes={"FILLVAL": [pycdfpp.default_fill_value(pycdfpp.DataType.CDF_TIME_TT2000)]})