Merged
5 changes: 5 additions & 0 deletions doc/conf.py
@@ -34,6 +34,7 @@
"sphinx.ext.intersphinx",
"sphinx.ext.extlinks",
"sphinx.ext.mathjax",
"myst_nb",
]

extlinks = {
@@ -302,3 +303,7 @@
"numpy": ("https://docs.scipy.org/doc/numpy/", None),
"xarray": ("https://docs.xarray.dev/en/stable/", None),
}

# Myst-NB configuration
nb_execution_mode = "force"
nb_execution_raise_on_error = True
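
With `nb_execution_mode = "force"`, every notebook is re-executed on each documentation build, which can be slow locally. A minimal sketch of one possible variation, assuming you want to keep `"force"` in CI and use MyST-NB's `"cache"` mode elsewhere; the check on a `CI` environment variable is a hypothetical convention and not part of this PR:

```python
import os

# Hypothetical convention: many CI providers export CI=true.
# Adjust this check to match your own build environment.
running_in_ci = os.environ.get("CI", "").lower() == "true"

# "force" re-executes every notebook on every build;
# "cache" re-executes a notebook only when its code cells have changed.
nb_execution_mode = "force" if running_in_ci else "cache"

# Fail the Sphinx build if any notebook cell raises an exception.
nb_execution_raise_on_error = True
```

Keeping `nb_execution_raise_on_error = True` in both cases means a broken example still fails the build rather than being published silently.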
97 changes: 97 additions & 0 deletions doc/dask.ipynb
@@ -0,0 +1,97 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "9b9a1d2c-7664-4fd9-b5cb-3a766d907fe7",
"metadata": {},
"source": [
"# Dask integration\n",
"\n",
"recursive-diff supports {class}`xarray.DataArray` and {class}`xarray.Dataset` objects backed by [Dask](https://dask.org). When it compares two such objects, the comparison is optimized to maximize parallelism and minimize memory usage.\n",
"\n",
"In this example, we're going to compare two arrays totaling about 3 GiB.\n",
"However, because they're lazily defined, the whole comparison will use only a few MiB of RAM and will run on all available threads:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e52dd2a-b565-4aee-8aa8-52c4a41e8914",
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"sys.path.insert(0, \"..\")\n",
"\n",
"import dask.array as da\n",
"import xarray\n",
"\n",
"from recursive_diff import display_diffs\n",
"\n",
"a = xarray.DataArray(da.ones((200_000, 1_000)), name=\"ones\")\n",
"b = xarray.DataArray(da.ones((200_000, 1_000)), name=\"ones\")\n",
"a[123_456, 789] = 1.01\n",
"b[133_700, 333] = 1.0000000001 # Below tolerance\n",
"\n",
"display_diffs(a, b)"
]
},
{
"cell_type": "markdown",
"id": "bf4417c1-3989-4512-8ea0-d9b5ecf31ab8",
"metadata": {},
"source": [
"## Dask clusters\n",
"If you have a Dask client active and compare chunked Xarray objects, the comparison will run on the Dask cluster.\n",
"\n",
"In this example we're using a ``LocalCluster``, but this works with remote clusters as well as [Coiled](https://coiled.io) clusters!\n",
"\n",
"You may use {func}`xarray.open_zarr` or {func}`xarray.open_dataset` to open Zarr or NetCDF files on S3. Provided the cluster workers run inside AWS, the data is read there, so even if your client is outside of AWS the data won't transfer over the internet and you won't pay egress charges.\n",
"S3 access is not yet supported by {func}`~recursive_diff.recursive_open`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9205439e-3c2b-43d7-a512-b8a4e986ea27",
"metadata": {},
"outputs": [],
"source": [
"import dask.distributed\n",
"\n",
"with dask.distributed.LocalCluster() as cluster, dask.distributed.Client(cluster):\n",
" display_diffs(a, b)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5822b326-f3e0-4be0-a015-a734ddbc816d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.14.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
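
The "3 GiB" figure quoted in the notebook above can be checked from the array shapes alone. A quick sketch, assuming the default float64 dtype (8 bytes per element) that `da.ones` inherits from NumPy; it uses plain arithmetic and does not allocate anything:

```python
# Shape of each array in the notebook example.
rows, cols = 200_000, 1_000

# Assumption: da.ones() defaults to float64, i.e. 8 bytes per element.
itemsize = 8
n_arrays = 2  # a and b

total_bytes = rows * cols * itemsize * n_arrays
total_gib = total_bytes / 2**30

print(f"{total_gib:.2f} GiB")  # 2.98 GiB
```

Since the arrays are defined lazily, this is the size they would occupy if fully materialized, not the memory the comparison actually needs.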
4 changes: 3 additions & 1 deletion doc/index.rst
Original file line number Diff line number Diff line change
@@ -40,8 +40,10 @@ Index

quickstart
installing
api
notebooks
dask
extend
api
cli
develop
whats-new
145 changes: 145 additions & 0 deletions doc/notebooks.ipynb
@@ -0,0 +1,145 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "9b9a1d2c-7664-4fd9-b5cb-3a766d907fe7",
"metadata": {},
"source": [
"# Working with Jupyter notebooks\n",
"\n",
"{func}`~recursive_diff.display_diffs` can be used to compare two NumPy, Pandas, or Xarray objects in a Jupyter notebook:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "737df1de-16b9-40d2-a704-29022846bbc0",
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"sys.path.insert(0, \"..\")\n",
"\n",
"import xarray\n",
"\n",
"from recursive_diff import display_diffs\n",
"\n",
"a = xarray.Dataset(\n",
" {\n",
" \"v1\": ((\"r\", \"c\"), [[1, 2], [3, 4]]),\n",
" \"v2\": (\"r\", [\"foo\", \"bar\"]),\n",
" \"r\": [\"r1\", \"r2\"],\n",
" \"extra\": [5],\n",
" },\n",
" attrs={\"some_tag\": \"Hello\"},\n",
")\n",
"\n",
"b = xarray.Dataset(\n",
" {\n",
" \"v1\": ((\"r\", \"c\"), [[1, 5], [3.1, 4]]),\n",
" \"v2\": (\"r\", [\"bar\", \"bar\"]),\n",
" \"r\": [\"r1\", \"r2\"],\n",
" },\n",
" attrs={\"some_tag\": \"World\"},\n",
")\n",
"\n",
"\n",
"display_diffs(a, b)"
]
},
{
"cell_type": "markdown",
"id": "cc84e5ad-4b39-4602-9da7-7aa63bbe6cb9",
"metadata": {},
"source": [
"Just like {func}`recursive_diff.recursive_diff`, {func}`~recursive_diff.display_diffs` can visualize differences in nested structures too:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7565dc16-7dc3-4ade-b8dd-0abfef712677",
"metadata": {},
"outputs": [],
"source": [
"c = {\"foo\": [1, 2, [3, 4]]}\n",
"d = {\"foo\": [1.0000000001, 5, [3]], \"bar\": 6}\n",
"\n",
"display_diffs(c, d)"
]
},
{
"cell_type": "markdown",
"id": "b85aa876-ad19-4393-bcf3-4c0dd866cbec",
"metadata": {},
"source": [
"## Comparing directories\n",
"\n",
"If you have two directories full of data, you can compare them in one go with {func}`~recursive_diff.recursive_open`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a6324359-3d59-4250-b70d-f9cd0d0bbde0",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import tempfile\n",
"\n",
"lhs = tempfile.TemporaryDirectory()\n",
"rhs = tempfile.TemporaryDirectory()\n",
"\n",
"a.to_zarr(f\"{lhs.name}/array.zarr\", mode=\"w\", zarr_format=2)\n",
"b.to_zarr(f\"{rhs.name}/array.zarr\", mode=\"w\", zarr_format=2)\n",
"with open(f\"{lhs.name}/nested.json\", \"w\") as fh:\n",
" json.dump(c, fh)\n",
"with open(f\"{rhs.name}/nested.json\", \"w\") as fh:\n",
" json.dump(d, fh)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1dc6c4ba-ba4a-4baf-8f48-933a0fa83717",
"metadata": {},
"outputs": [],
"source": [
"from recursive_diff import recursive_open\n",
"\n",
"display_diffs(recursive_open(lhs.name), recursive_open(rhs.name))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42ac0b27-178c-4c59-89c8-dd7c37464656",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.14.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
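
Both notebooks include values that differ by 1e-10 (for example `1` versus `1.0000000001`) and are expected to be treated as equal. A rough illustration of that kind of relative-tolerance check using only the standard library; the `rel_tol=1e-9` value is an assumption about recursive-diff's default, and `math.isclose` is a stand-in here, not the library's actual comparison code:

```python
import math

# Assumed default relative tolerance; check the recursive-diff API docs.
REL_TOL = 1e-9


def differs(lhs: float, rhs: float, rel_tol: float = REL_TOL) -> bool:
    """Return True if the two numbers differ by more than the tolerance."""
    return not math.isclose(lhs, rhs, rel_tol=rel_tol, abs_tol=0.0)


print(differs(1, 1.0000000001))  # False: the difference is below tolerance
print(differs(1.0, 1.01))        # True
print(differs(2, 5))             # True
```

Under this assumption, only the second and third pairs would be reported as differences.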
9 changes: 9 additions & 0 deletions doc/requirements.yml
@@ -6,9 +6,18 @@ channels:
dependencies:
- python 3.14.*
- python *
- dask-core *
- distributed *
- msgpack-python *
- pyyaml *
- netcdf4 *
- scipy *
- h5netcdf *
- zarr *
- pip *
- sphinx *
- sphinx_rtd_theme *
- myst-nb *
- numpy *
- pandas *
- xarray *