diff --git a/.readthedocs.yml b/.readthedocs.yml index 300ad3ce..50730e90 100644 --- a/.readthedocs.yml +++ b/.readthedocs.yml @@ -6,20 +6,22 @@ build: python: "3.11" jobs: post_checkout: + # Needed so setuptools_scm can determine the version from git tags. - git fetch --unshallow || true - - cp docs/*.md docs/sphinx/ || true - - cp docs/*.svg docs/sphinx/ || true - pre_build: - - cd docs && sphinx-apidoc -f -o sphinx -e -M ../src/simdb - + python: install: - method: pip path: . extra_requirements: + # `all` (server, postgres, imas-validator) so autodoc can import every + # module for the API reference; `build-docs` for Sphinx itself. + - all - build-docs sphinx: builder: html - configuration: docs/sphinx/conf.py + # The CLI reference and Python API pages are generated from conf.py at build + # time, so no pre_build / copy steps are needed. + configuration: docs/conf.py fail_on_warning: false diff --git a/README.md b/README.md index 68cd03b8..210264f3 100644 --- a/README.md +++ b/README.md @@ -4,122 +4,55 @@ [![Documentation Status](https://readthedocs.org/projects/simdb/badge/?version=latest)](https://simdb.readthedocs.io/en/latest/) [![CI](https://github.com/iterorganization/SimDB/actions/workflows/build_and_test.yml/badge.svg)](https://github.com/iterorganization/SimDB/actions) ---- - -## Overview - -**SimDB** is a powerful tool designed to track, manage, upload, and query simulations. Simulation data can be tagged with metadata, managed locally, and seamlessly transferred to remote SimDB services. Uploaded simulations can then be queried based on metadata. - ---- - -## Features - -- **CLI Tool:** Intuitive command line tool for all major operations. -- **Metadata Tagging:** Associate simulations with flexible, searchable metadata. -- **Remote Sync:** Transfer data to/from remote SimDB servers. -- **Developer Friendly:** Easy setup for contributing & extending codebase. - ---- +**SimDB** tracks, manages, validates and shares scientific simulations. 
You +describe a simulation and its data in a small manifest file, ingest it into your +local catalogue, and push it to a shared SimDB server where colleagues can query, +validate and reuse it. It is built for [IMAS](https://imas.iter.org/) +fusion-simulation workflows. ## Quickstart -Install SimDB (requires Python 3.11+): +SimDB requires Python 3.11 or newer. ```bash pip install imas-simdb -``` - -SimDB version: - -```bash simdb --version -simdb remote [NAME] version ``` -Ingest and upload your first simulation: +Catalogue a simulation and share it: ```bash -simdb simulation ingest -a SIM_ID MANIFEST_FILE -simdb simulation push [REMOTE] SIM_ID +simdb manifest create manifest.yaml # create and edit a manifest +simdb simulation ingest manifest.yaml # add it to your local catalogue +simdb simulation push [REMOTE] SIM_ID # push it to a server ``` Query simulations by metadata: ```bash -simdb simulation query [OPTIONS] [CONSTRAINTS] -simdb remote [REMOTE] query [OPTIONS] [CONSTRAINTS] +simdb simulation query code.name=SOLPS-ITER # local +simdb remote [REMOTE] query code.name=SOLPS-ITER # on a server ``` -_where:_ -- `SIM_ID` — UUID or alias for your simulation -- `REMOTE` — The remote server name (as configured locally) -- `MANIFEST_FILE` — YAML document that describes your simulation and its associated data -- `OPTION` - Additional optional parameters for the given command (see `--help` output) - -[See full installation guide in the documentation →](https://simdb.readthedocs.io/en/latest/install_guide.html) - ---- -## Command Line Interface +See the [quickstart guide](https://simdb.readthedocs.io/en/latest/getting-started/quickstart.html) +for a full walkthrough. -SimDB provides a CLI tool to manage your simulation workflow. 
-To view help and subcommands: - -```bash -simdb --help -``` +## Documentation -[Full CLI usage reference →](https://simdb.readthedocs.io/en/latest/cli.html) +Full documentation is at **[simdb.readthedocs.io](https://simdb.readthedocs.io/en/latest/)**: ---- - -## Usage Examples - -- Uploading data: - ```bash - simdb simulation ingest -a my_simulation my_sim_manifest.yaml - simdb simulation push ITER my_simulation - ``` -- Querying simulations: - ```bash - simdb simulation query code.name=ITER - simdb remote ITER query code.name=ITER - alias code.name - -------------------- - 103027/3 SOLPS-ITER - 103028/3 SOLPS-ITER - ``` - ---- - -## Accessing ITER Remotes - -To access data from the ITER remotes outside ITER systems, you'll need to [configure a SimDB remote](https://simdb.readthedocs.io/en/latest/iter_remotes.html). - ---- - -## Server Setup - -Setting up and maintaining a remote CLI server is documented [here](https://simdb.readthedocs.io/en/latest/maintenance_guide.html). - ---- - -## Developer Guide - -Want to contribute or run SimDB from source? -[See the developer guide →](https://simdb.readthedocs.io/en/latest/developer_guide.html) - ---- +- [Installation](https://simdb.readthedocs.io/en/latest/getting-started/installation.html) +- [Tutorial: catalogue your first simulation](https://simdb.readthedocs.io/en/latest/tutorials/first-simulation.html) +- [CLI reference](https://simdb.readthedocs.io/en/latest/reference/cli.html) +- [Connect to ITER](https://simdb.readthedocs.io/en/latest/how-to/connect-to-iter.html) +- [Operating a server](https://simdb.readthedocs.io/en/latest/how-to/operate-server/install-server.html) +- [Developer guide](https://simdb.readthedocs.io/en/latest/how-to/contribute/set-up-dev-env.html) ## License -The software is licensed under the **LGPLv3** License which allows for extensive freedom in using, modifying, and distributing it, provided that the license terms are met. -Details can be found in [LICENSE-LGPL](LICENSE.txt). 
- ---- +SimDB is licensed under the **LGPLv3** license. See [LICENSE.txt](LICENSE.txt). ## Contact -- Issues & Feature Requests: [GitHub Issues](https://github.com/deepakmaroo/SimDB/issues) +- Issues and feature requests: [GitHub Issues](https://github.com/iterorganization/SimDB/issues) - Documentation: [simdb.readthedocs.io](https://simdb.readthedocs.io/en/latest/) - ---- diff --git a/docs/.gitignore b/docs/.gitignore index fda04c5f..fd5503c4 100644 --- a/docs/.gitignore +++ b/docs/.gitignore @@ -1,4 +1,7 @@ -_build -/sphinx/*.rst -/sphinx/*.md -/sphinx/*.svg \ No newline at end of file +# Sphinx build output +_build/ + +# Generated at build time (see conf.py) +reference/cli.md +reference/python-api/simdb*.rst +reference/python-api/modules.rst diff --git a/docs/Makefile b/docs/Makefile index 44d7ed10..75b9c8dd 100644 --- a/docs/Makefile +++ b/docs/Makefile @@ -1,25 +1,25 @@ -# Minimal makefile for Sphinx documentation +# Minimal makefile for Sphinx documentation. # +# The documentation source lives directly in this directory (docs/). The CLI +# reference and Python API pages are generated automatically by conf.py during +# the build, so `make html` produces a complete site with no extra steps. -# You can set these variables from the command line. SPHINXOPTS = SPHINXBUILD = sphinx-build -SOURCEDIR = sphinx +SOURCEDIR = . BUILDDIR = _build -# Put it first so that "make" without argument is like "make help". 
help: @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) -.PHONY: help html Makefile +clean: + rm -rf "$(BUILDDIR)" + rm -f reference/cli.md + rm -f reference/python-api/simdb*.rst reference/python-api/modules.rst -# Copy source files from docs/ into sphinx/ before building (mirrors .readthedocs.yml) -html: Makefile - cp ../*.md $(SOURCEDIR)/ 2>/dev/null || true - cp ../*.svg $(SOURCEDIR)/ 2>/dev/null || true - @$(SPHINXBUILD) -M html "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) +.PHONY: help clean Makefile # Catch-all target: route all unknown targets to Sphinx using the new # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). %: Makefile - @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) \ No newline at end of file + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/docs/cli.md b/docs/cli.md deleted file mode 100644 index 37875bdf..00000000 --- a/docs/cli.md +++ /dev/null @@ -1,760 +0,0 @@ -# SimDB CLI commands - - -```text -Usage: simdb [OPTIONS] COMMAND [ARGS]... - -Options: - --version Show the version and exit. - -d, --debug Run in debug mode. - -v, --verbose Run with verbose output. - -c, --config-file FILENAME Config file to load. - --help Show this message and exit. - -Commands: - alias Query remote and local aliases. - config Query/update application configuration. - database Manage local simulation database. - manifest Create/check manifest file. - provenance Create the PROVENANCE_FILE from the current system. - remote Interact with the remote SimDB service. - sim Alias for simulation. - simulation Manage ingested simulations. -``` - -## Alias - - -```text -Usage: simdb alias [OPTIONS] [REMOTE] COMMAND [ARGS]... - - Query remote and local aliases. - -Options: - --username TEXT Username used to authenticate with the remote. - --password TEXT Password used to authenticate with the remote. - --help Show this message and exit. 
- -Commands: - list List aliases from the local database and the REMOTE (if... - make-unique Make the given alias unique, checking locally stored... - search Search the REMOTE for all aliases that contain the given... -``` - - -```text -Usage: simdb alias [REMOTE] list [OPTIONS] - - List aliases from the local database and the REMOTE (if specified). - -Options: - --local Only list the local aliases. - --help Show this message and exit. -``` - - -```text -Usage: simdb alias [REMOTE] make-unique [OPTIONS] ALIAS - - Make the given alias unique, checking locally stored simulations and the - remote. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb alias [REMOTE] search [OPTIONS] ALIAS - - Search the REMOTE for all aliases that contain the given VALUE. - -Options: - --help Show this message and exit. -``` - -## Config - - -```text -Usage: simdb config [OPTIONS] COMMAND [ARGS]... - - Query/update application configuration. - -Options: - --help Show this message and exit. - -Commands: - delete Delete the OPTION. - get Get the OPTION. - list List all configurations OPTIONS set. - path Print the location of the user configuration file. - set Set the OPTION to the given VALUE. -``` - - -```text -Usage: simdb config delete [OPTIONS] OPTION - - Delete the OPTION. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb config get [OPTIONS] OPTION - - Get the OPTION. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb config list [OPTIONS] - - List all configurations OPTIONS set. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb config path [OPTIONS] - - Print the location of the user configuration file. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb config set [OPTIONS] OPTION VALUE - - Set the OPTION to the given VALUE. - -Options: - --help Show this message and exit. 
-``` - - -## Manifest - - -```text -Usage: simdb manifest [OPTIONS] COMMAND [ARGS]... - - Create/check manifest file. - -Options: - --help Show this message and exit. - -Commands: - check Check manifest FILE_NAME. - create Create a new MANIFEST_FILE. -``` - - -```text -Usage: simdb manifest check [OPTIONS] FILE_NAME - - Check manifest FILE_NAME. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb manifest create [OPTIONS] MANIFEST_FILE - - Create a new MANIFEST_FILE. - -Options: - --help Show this message and exit. -``` - -## Provenance - - -```text -Usage: simdb provenance [OPTIONS] PROVENANCE_FILE - - Create the PROVENANCE_FILE from the current system. - -Options: - --help Show this message and exit. -``` - -## Remote - - -```text -Usage: simdb remote [OPTIONS] [NAME] COMMAND [ARGS]... - - Interact with the remote SimDB service. - - If NAME is provided this determines which remote server to communicate with, - otherwise the server in the config file with default=True is used. - -Options: - --username TEXT Username used to authenticate with the remote. - --password TEXT Password used to authenticate with the remote. - --help Show this message and exit. - -Commands: - admin Run admin commands on REMOTE SimDB server (requires admin... - config Configure the available remotes. - directory Print the storage directory of the remote. - info Print information about simulation with given SIM_ID (UUID... - list List simulations available on remote. - query Perform a metadata query to find matching remote simulations. - schema Show validation schemas for the given remote. - test Test that the remote is valid. - token Manage user authentication tokens. - trace Print provenance trace of simulation with given SIM_ID (UUID... - version Show the SimDB version of the remote. - watcher Manage simulation watchers on REMOTE SimDB server. -``` - - -```text -Usage: simdb remote [NAME] admin [OPTIONS] COMMAND [ARGS]... 
- - Run admin commands on REMOTE SimDB server (requires admin privileges). - - Requires user to have admin privileges on remote. - -Options: - --help Show this message and exit. - -Commands: - del-meta Remove a metadata value for the given simulation. - delete Delete a simulation. - set-meta Add or update a metadata value for the given simulation. - set-status Update the status metadata value for the given simulation. -``` - - -```text -Usage: simdb remote [NAME] admin del-meta [OPTIONS] SIM_ID KEY - - Remove a metadata value for the given simulation. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] admin delete [OPTIONS] SIM_ID - - Delete a simulation. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] admin set-meta [OPTIONS] SIM_ID KEY VALUE - - Add or update a metadata value for the given simulation. - -Options: - -t, --type [string|UUID|int|float] - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] admin set-status [OPTIONS] SIM_ID {NOT_VALIDATED|AC - CEPTED|FAILED|PASSED|DEPRECATED|DE - LETED} - - Update the status metadata value for the given simulation. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] config [OPTIONS] COMMAND [ARGS]... - - Configure the available remotes. - -Options: - --help Show this message and exit. - -Commands: - default Print the default remote. - delete Delete a remote. - get-default Get the name of the default remote. - list List available remotes. - new Add a new remote. - set-default Set a remote as default. - set-option Set a configuration option for a given remote. -``` - - -```text -Usage: simdb remote [NAME] config default [OPTIONS] - - Print the default remote. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] config delete [OPTIONS] NAME - - Delete a remote. - -Options: - --help Show this message and exit. 
-``` - - -```text -Usage: simdb remote [NAME] config get-default [OPTIONS] - - Get the name of the default remote. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] config list [OPTIONS] - - List available remotes. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] config new [OPTIONS] NAME URL - - Add a new remote. - -Options: - --firewall [F5] Specify the remote is behind a login firewall and what type - it is. - --username TEXT Username to use for remote. - --default Set the new remote as the default. - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] config set-default [OPTIONS] NAME - - Set a remote as default. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] config set-option [OPTIONS] NAME OPTION VALUE - - Set a configuration option for a given remote. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] directory [OPTIONS] - - Print the storage directory of the remote. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] info [OPTIONS] SIM_ID - - Print information about simulation with given SIM_ID (UUID or alias) from - remote. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] list [OPTIONS] - - List simulations available on remote. - -Options: - -m, --meta-data NAME Additional meta-data field to print. - -l, --limit INTEGER Limit number of returned entries (use 0 for no limit). - [default: 100] - --uuid Include UUID in the output. - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] query [OPTIONS] [CONSTRAINTS]... - - Perform a metadata query to find matching remote simulations. - - Each constraint must be in the form: - NAME=[mod]VALUE - - Where `[mod]` is an optional query modifier. 
Available query modifiers are: - eq: - This checks for equality (this is the same behaviour as not providing any modifier). - in: - This searches inside the value instead of looking for exact matches. - gt: - This checks for values greater than the given quantity. - agt: - This checks for any array elements are greater than the given quantity. - ge: - This checks for values greater than or equal to the given quantity. - age: - This checks for any array elements are greater than or equal to the given quantity. - lt: - This checks for values less than the given quantity. - alt: - This checks for any array elements are less than the given quantity. - le: - This checks for values less than or equal to the given quantity. - ale: - This checks for any array elements are less than or equal to the given quantity. - - Modifier examples: - alias=eq:foo performs exact match - summary.code.name=in:foo matches all names containing foo - summary.heating_current_drive.power_additional.value=agt:0 matches all simulations where any array element - of summary.heating_current_drive.power_additional.value is greater than 0 - - Any string comparisons are done in a case-insensitive manner. If multiple constraints are provided then simulations - are returned that match all given constraints. - - Examples: - sim remote query workflow.name=in:test finds all simulations where workflow.name contains test - (case-insensitive) - sim remote query pulse=gt:1000 run=0 finds all simulations where pulse is > 1000 and run = 0 - -Options: - -m, --meta-data TEXT Additional meta-data field to print. - -l, --limit INTEGER Limit number of returned entries (use 0 for no limit). - [default: 100] - --uuid Include UUID in the output. - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] schema [OPTIONS] - - Show validation schemas for the given remote. - -Options: - -d, --depth INTEGER Limit the depth of elements of the schema printed to - the console. 
[default: 2] - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] test [OPTIONS] - - Test that the remote is valid. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] token [OPTIONS] COMMAND [ARGS]... - - Manage user authentication tokens. - -Options: - --help Show this message and exit. - -Commands: - delete Delete the existing token for the given remote. - new Create a new token for the given remote. -``` - - -```text -Usage: simdb remote [NAME] token delete [OPTIONS] - - Delete the existing token for the given remote. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] token new [OPTIONS] - - Create a new token for the given remote. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] trace [OPTIONS] SIM_ID - - Print provenance trace of simulation with given SIM_ID (UUID or alias) from - remote. - - This shows a history of simulations that this simulation has replaced or - been replaced by and what those simulations replaced or where replaced by - and so on. - - If the outputs of this simulation are used as inputs of other simulations or - if the inputs are generated by other simulations then these dependencies are - also reported. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] version [OPTIONS] - - Show the SimDB version of the remote. - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] watcher [OPTIONS] COMMAND [ARGS]... - - Manage simulation watchers on REMOTE SimDB server. - -Options: - --help Show this message and exit. - -Commands: - add Register a user as a watcher for a simulation with given SIM_ID... - list List watchers for simulation with given SIM_ID (UUID or alias). - remove Remove a user from list of watchers on a simulation with given... 
-``` - - -```text -Usage: simdb remote [NAME] watcher add [OPTIONS] SIM_ID - - Register a user as a watcher for a simulation with given SIM_ID (UUID or - alias). - -Options: - -u, --user TEXT Name of the user to add as a watcher. - -e, --email TEXT Email of the user to add as a watcher. - -n, --notification [VALIDATION|REVISION|OBSOLESCENCE|ALL] - [default: ALL] - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] watcher list [OPTIONS] SIM_ID - - List watchers for simulation with given SIM_ID (UUID or alias). - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb remote [NAME] watcher remove [OPTIONS] SIM_ID - - Remove a user from list of watchers on a simulation with given SIM_ID (UUID - or alias). - -Options: - -u, --user TEXT Name of the user to remove as a watcher. - --help Show this message and exit. -``` - -## Simulation - - -```text -Usage: simdb simulation [OPTIONS] COMMAND [ARGS]... - - Manage ingested simulations. - -Options: - --help Show this message and exit. - -Commands: - delete Delete the ingested simulation with given SIM_ID (UUID or... - info Print information on the simulation with given SIM_ID (UUID... - ingest Ingest a MANIFEST_FILE. - list List ingested simulations. - modify Modify the ingested simulation. - pull Pull the simulation with the given SIM_ID (UUID or alias)... - push Push the simulation with the given SIM_ID (UUID or alias) to... - query Perform a metadata query to find matching local simulations. - validate Validate the ingested simulation with given SIM_ID (UUID or... -``` - - -```text -Usage: simdb simulation delete [OPTIONS] SIM_ID - - Delete the ingested simulation with given SIM_ID (UUID or alias). - - Use --all to reset the local database and delete all simulations. - -Options: - --all Reset the local database, deleting all simulations. - --help Show this message and exit. 
-``` - - -```text -Usage: simdb simulation info [OPTIONS] SIM_ID - - Print information on the simulation with given SIM_ID (UUID or alias). - -Options: - --help Show this message and exit. -``` - - -```text -Usage: simdb simulation ingest [OPTIONS] MANIFEST_FILE - - Ingest a MANIFEST_FILE. - -Options: - -a, --alias TEXT Alias to give to simulation (overwrites any set in - manifest). - --help Show this message and exit. -``` - - -```text -Usage: simdb simulation list [OPTIONS] - - List ingested simulations. - -Options: - -m, --meta-data TEXT Additional meta-data field to print. - -l, --limit INTEGER Limit number of returned entries (use 0 for no limit). - [default: 100] - --uuid Include UUID in the output. - --help Show this message and exit. -``` - - -```text -Usage: simdb simulation modify [OPTIONS] SIM_ID - - Modify the ingested simulation. - -Options: - -a, --alias ALIAS New alias. - --set-meta NAME=VALUE Add new meta or update existing. - --del-meta NAME Delete metadata entry. - --help Show this message and exit. -``` - - -```text -Usage: simdb simulation pull [OPTIONS] [REMOTE] SIM_ID DIRECTORY - - Pull the simulation with the given SIM_ID (UUID or alias) from the REMOTE. - -Options: - --username TEXT Username used to authenticate with the remote. - --password TEXT Password used to authenticate with the remote. - --help Show this message and exit. -``` - - -```text -Usage: simdb simulation push [OPTIONS] [REMOTE] SIM_ID - - Push the simulation with the given SIM_ID (UUID or alias) to the REMOTE. - -Options: - --username TEXT Username used to authenticate with the remote. - --password TEXT Password used to authenticate with the remote. - --replaces TEXT SIM_ID of simulation to deprecate and replace. - --add-watcher Add the current user as a watcher of the simulation. - --help Show this message and exit. -``` - - -```text -Usage: simdb simulation query [OPTIONS] [CONSTRAINTS]... - - Perform a metadata query to find matching local simulations. 
- - Each constraint must be in the form: - NAME=[mod]VALUE - - Where `[mod]` is an optional query modifier. Available query modifiers are: - eq: - This checks for equality (this is the same behaviour as not providing any modifier). - ne: - This checks for value that do not equal. - in: - This searches inside the value instead of looking for exact matches. - ni: - This searches inside the value for elements that do not match. - gt: - This checks for values greater than the given quantity. - ge: - This checks for values greater than or equal to the given quantity. - lt: - This checks for values less than the given quantity. - le: - This checks for values less than or equal to the given quantity. - - For the following modifiers, VALUE should not be provided. exist: - This - returns simulations where metadata with NAME exists, regardless of the - value. - - Modifier examples: - responsible_name=foo performs exact match - responsible_name=in:foo matches all names containing foo - pulse=gt:1000 matches all pulses > 1000 - sequence=exist: matches all simulations that have "sequence" metadata values - - Any string comparisons are done in a case-insensitive manner. If multiple constraints are provided then simulations - are returned that match all given constraints. - - Examples: - sim simulation query workflow.name=in:test finds all simulations where workflow.name contains test - (case-insensitive) - sim simulation query pulse=gt:1000 run=0 finds all simulations where pulse is > 1000 and run = 0 - -Options: - -m, --meta-data TEXT Additional meta-data field to print. - --uuid Include UUID in the output. - --help Show this message and exit. -``` - - -```text -Usage: simdb simulation validate [OPTIONS] [REMOTE] SIM_ID - - Validate the ingested simulation with given SIM_ID (UUID or alias) using - validation schema from REMOTE. - -Options: - --username TEXT Username used to authenticate with the remote. - --password TEXT Password used to authenticate with the remote. 
- --help Show this message and exit. -``` - diff --git a/docs/cli.md.in b/docs/cli.md.in index c1b8371b..a4114b2f 100644 --- a/docs/cli.md.in +++ b/docs/cli.md.in @@ -1,32 +1,36 @@ -# SimDB CLI commands +# CLI reference + +This page is the complete reference for the `simdb` command line interface. It +is generated automatically from `simdb --help`, so it always matches the +installed version of SimDB. Run `simdb COMMAND --help` at any time to get the +same information in your terminal. + +For a task-oriented introduction, start with the +[tutorial](../tutorials/first-simulation.md) and the +[how-to guides](../how-to/create-a-manifest.md). {{ }} -## Alias +## alias {{ alias }} -## Config +## config {{ config }} -## Database - -{{ database }} - -## Manifest +## manifest {{ manifest }} -## Provenance +## provenance {{ provenance }} -## Remote +## remote {{ remote }} -## Simulation +## simulation {{ simulation }} - diff --git a/docs/conf.py b/docs/conf.py new file mode 100644 index 00000000..f0c35fee --- /dev/null +++ b/docs/conf.py @@ -0,0 +1,158 @@ +# -*- coding: utf-8 -*- +# +# Configuration file for the Sphinx documentation builder. +# https://www.sphinx-doc.org/en/master/usage/configuration.html + +import os +import subprocess +import sys +from pathlib import Path + +from sphinx.util import logging + +# Make the SimDB package importable for autodoc. 
+sys.path.insert(0, os.path.abspath("../src"))
+import simdb  # noqa: E402 (must follow the sys.path insert above)
+
+logger = logging.getLogger(__name__)
+DOCS_DIR = Path(__file__).parent.resolve()
+
+# -- Project information -----------------------------------------------------
+
+project = "IMAS Simulation Database Management Tool"
+copyright = "2018-2025, ITER Organization"
+author = "ITER Organization"
+
+version = ".".join(simdb.version.split(".")[:2])
+project += f" Version {version}"
+release = simdb.__version__
+
+# -- General configuration ---------------------------------------------------
+
+extensions = [
+    "sphinx.ext.autodoc",
+    "sphinx.ext.napoleon",
+    "sphinx.ext.viewcode",
+    "sphinx.ext.intersphinx",
+    "sphinx.ext.mathjax",
+    "myst_parser",
+    "sphinx_immaterial",
+]
+
+source_suffix = {
+    ".rst": "restructuredtext",
+    ".md": "markdown",
+}
+
+master_doc = "index"
+language = "en"
+exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
+pygments_style = None
+
+# -- MyST (Markdown) configuration -------------------------------------------
+
+myst_enable_extensions = [
+    "colon_fence",
+    "deflist",
+    "substitution",
+]
+# Auto-generate anchors for headings up to level 3 so that cross-document links
+# such as ``[](../reference/manifest-format.md#metadata)`` resolve.
+myst_heading_anchors = 3
+
+# -- Autodoc configuration ---------------------------------------------------
+
+autodoc_default_options = {
+    "members": True,
+    "undoc-members": True,
+    "show-inheritance": True,
+}
+# Optional backends that need system libraries or are not pip-installable; mock
+# them so autodoc can still import the modules that reference them.
+autodoc_mock_imports = ["uda", "ldap", "easyad", "keycloak"]
+
+intersphinx_mapping = {
+    "python": ("https://docs.python.org/3", None),
+}
+
+# -- HTML output -------------------------------------------------------------
+
+html_theme = "sphinx_immaterial"
+html_title = "SimDB"
+html_static_path = ["_static"]
+
+html_theme_options = {
+    "palette": [
+        {
+            "media": "(prefers-color-scheme: light)",
+            "scheme": "default",
+            "primary": "blue",
+            "accent": "light-blue",
+            "toggle": {
+                "icon": "material/lightbulb-outline",
+                "name": "Switch to dark mode",
+            },
+        },
+        {
+            "media": "(prefers-color-scheme: dark)",
+            "scheme": "slate",
+            "primary": "blue",
+            "accent": "light-blue",
+            "toggle": {
+                "icon": "material/lightbulb",
+                "name": "Switch to light mode",
+            },
+        },
+    ],
+    "features": [
+        "navigation.tabs",
+        "navigation.top",
+        "toc.follow",
+        "search.share",
+    ],
+    "repo_url": "https://github.com/iterorganization/SimDB",
+    "repo_name": "SimDB",
+}
+
+htmlhelp_basename = "simdb"
+
+# -- Generated documentation -------------------------------------------------
+#
+# reference/cli.md and reference/python-api/* are generated at build time (from
+# `simdb --help` and sphinx-apidoc) so they cannot drift from the code. Running
+# them from config-inited keeps `sphinx-build` self-contained, including on RTD.
+
+
+def _generate_cli_reference() -> None:
+    script = DOCS_DIR / "generate_cli_docs.py"
+    subprocess.run([sys.executable, str(script)], check=True, cwd=DOCS_DIR)
+
+
+def _generate_api_reference() -> None:
+    from sphinx.ext import apidoc
+
+    out_dir = DOCS_DIR / "reference" / "python-api"
+    apidoc.main(
+        ["-f", "-e", "-M", "-o", str(out_dir), str(DOCS_DIR / ".." / "src" / "simdb")]
+    )
+    # ``modules.rst`` is the apidoc-generated top-level toc; we use our own
+    # hand-written ``index.md`` instead, so drop it to avoid an orphan warning.
+    (out_dir / "modules.rst").unlink(missing_ok=True)
+
+
+def _run_generators(app, config) -> None:
+    for name, fn in (("CLI reference", _generate_cli_reference),
+                     ("API reference", _generate_api_reference)):
+        try:
+            fn()
+        except Exception as exc:  # noqa: BLE001 - degrade gracefully
+            logger.warning(
+                "Could not generate %s (is `simdb` installed in this "
+                "environment?): %s",
+                name,
+                exc,
+            )
+
+
+def setup(app):
+    app.connect("config-inited", _run_generators)
diff --git a/docs/design.md b/docs/design.md
deleted file mode 100644
index 7f093536..00000000
--- a/docs/design.md
+++ /dev/null
@@ -1,260 +0,0 @@
-# SimDB technical design
-
-## Introduction
-
-This document summarises the design of the IMAS simulation management system
-(SimDB) as is organised as follows:
-* Supported platforms
-* High level description of the system and API design
-* Overview of the CLI functionality
-* Design of the metadata elements
-* Outline of simulation data validation
-
-## Supported Platforms
-
-The following platforms are support for SimDB: Linux, macOS, Windows.
-
-## High level description
-
-The system will have two major components: one local to the user, and the other
-remote.
-
-The local component will be provided as a Command Line Interface (CLI) tool,
-similar to tools such as git or openssl.
-
-```bash
-script
-```
-
-The commands will be divided into a hierarchical tree of commands, with each
-level of commands having their own help available, i.e.:
-
-```bash
-script --help
-script --help
-```
-
-The remote component will manage the reference database and associated
-metadata. Interactions between the two components will be through a REST API,
-using SSL encrypted HTTP (HTTPS).
-
-### Architecture overview
-
-The following images shows the high-level components of the system.
-
-![simdb architecture](simdb-architecture.svg)
-
-A description of the components is as follows:
-
-1.
The CLI tool: Used to manage the simulation metadata, file manifest and -provenance and to allow the user to query these elements. -2. The SQLite DBMS: To store the user ingested simulations before they have -been pushed to the remote system. -3. The Simulation Directory: The directory where the simulation has been -run and where the simulation files will be retrieved from when they are -pushed to the remote system. -4. The Remote REST API: The remote API which processes requests from -the user CLI to receive pushed simulations and store them ready for -validation and publishing. -5. The Staging Directory: The location the pushed simulation files are -transferred to while waiting for validation. -6. The Remote DBMS: The DBMS where the simulation metadata and -provenance will be saved for all uploaded simulation along with their -validation status flags. - -### Assumptions - -1. Interactions between the CLI and the Remote central database - 1. Are Stateless - 2. May not use a permanent network connection - 3. Will be based on a Simulation Identifier (a UUID) - 4. Will utilise a temporary directory for all exchanged objects - 1. Use a directory named as the UUID - 2. Moved on simulation COMMIT to a permanent directory - 5. Authentication and authorisation will be needed for each interaction on the remote database -2. The Provenance database may use a different DBMS - 1. DAG based schema - 1. Triple is two nodes, and a connected edge - 2. The schema can be written as standard SQL statements - -## CLI functionality - -The following functionality will be provided by the CLI tool. - -### Database Query -1. Query the user’s local database - 1. CLI text input with context -2. Query the remote central database - 1. CLI text input with context -3. Query Output - 1. Text written to command line formatted as YAML - 2. User command line redirection to output file - -### Request a Simulation UUID -1. CLI request with context - 1. context=[alias] -2. 
Output written to command line - -### File Manifest -1. Simulation Data Files - 1. Simulation Plan, Input files, Output files - 2. Location - 3. Class (Plan, Input, Output, Metadata, Provenance, …) - 4. Hash checksum -2. Data Import - 1. Set of Simulation Data Files - 2. Metadata file - 3. Provenance file -3. Data Export - 1. Set of Simulation Data Files - 2. Metadata file - 3. Provenance file - -### Data Import/Export -1. A JSON transport object containing all simulation data including -simulation plan, metadata, provenance, etc. -2. Binary IO streams sent via HTTP for each simulation file - 1. Input files - 2. Output files - 3. IMAS API log file - 4. UDA log file - -### Log Files -1. IMAS API Log - 1. Ordered list of all IMAS low level API calls -2. UDA Data Access Log - 1. Ordered list of all UDA data access and ingest calls - -### Metadata -1. Metadata file - 1. Name value pairs compliant with Dublin Core - 1. YAML format - 2. Ingested into the CLI SQLite DBMS - 2. Exchanged between local and remote system as part of the JSON -transport object -2. Provenance SQLite database file - 1. Preferably W3C PROV (RDF) triples, otherwise name value pairs - 1. Collected by future provenance instrumentation within - IMAS and written to a user SQLite database - 2. Ingested by the CLI SQLite DBMS - -### File Formats -1. Manifest - * YAML Ascii file -2. Metadata - * Name value pairs - * YAML Ascii file - * One pair per record -3. Provenance - * YAML Ascii file -4. Simulation Plan - * Microsoft Word or Adobe PDF -5. Simulation Input - * Ascii - * Binary: IDS -6. Simulation Output - * Binary: IDS -7. Configuration file - * Ascii - * Name value pairs -8. Git diff and status file - * Ascii -9. IMAS API Log - * CSV Ascii -10. UDA Log - * CSV Ascii -11. 
IMAS Open/Create arguments - * Name value pairs - -## Use case narratives and system processing actions - -### Prepare for a new simulation - -### Execute the simulation - -### Register the simulation locally using the imasdb CLI - -### Deposit the simulation remotely using the imasdb CLI - -## Contents of the metadata file proforma - -The proforma file contains the value descriptions. Text lines beginning # are -ignored. Names without values are not ingested. - -| Name | Value description | -| ---- | ----------------- | -| Title | The name given to the resource.

Typically, a Title will be a name by which the resource is formally known.

The title element may be repeated multiple times to include variants of the title. | -| Subject | The topic of the content of the resource.

Typically, a Subject will be expressed as keywords or key phrases or classification codes that describe the topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.

Select subject keywords from the Title or Description information, or from within a text resource.

Choose the most significant and unique words for keywords.

If multiple vocabulary terms or keywords are used, use separate iterations of the Subject element. | -| Description | An account of the content of the resource.

Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content. | -| Type | The nature or genre of the content of the resource.

Recommended best practice is to select a value from a controlled vocabulary. To describe the physical or digital manifestation of the resource, use the FORMAT element.

If the resource is composed of multiple mixed types then multiple or repeated Type elements should be used to describe the main components. | -| Source | A Reference to a resource from which the present resource is derived - in whole or part.

Recommended best practice is to reference the resource by means of a formal identification system. | -| Relation | A reference to a related resource.

Recommended best practice is to reference the resource by means of a formal identification system. | -| Coverage | The extent or scope of the content of the resource.

Coverage will typically include spatial location, temporal period, or jurisdiction (such as a named entity).

Recommended best practice is to select a value from a controlled vocabulary. Where appropriate, named places or time periods should be used in preference to numeric identifiers such as sets of co-ordinates or date ranges. Repeat for each class of coverage. | -| Creator | An entity primarily responsible for making the content of the resource.

Multiple creators should be listed separately. | -| Publisher | The entity responsible for making the resource available.

The intent of specifying this field is to identify the entity that provides access to the resource. If the Creator and Publisher are the same, do not repeat the name in the Publisher area. If the nature of the responsibility is ambiguous, the recommended practice is to use Publisher for organizations, and Creator for individuals. In cases of ambiguous responsibility, use Contributor. | -| Contributor | An entity (name) responsible for making contributions to the content of the resource. Examples of a Contributor include a person, an organization or a service. | -| Rights | Information about rights held in and over the resource. If the rights element is absent, no assumptions can be made about rights with respect to the resource. | -| Date | A date associated with an event in the life cycle of the resource. Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in ISO 8601 and follows the YYYY-MM-DD format.

If the full date is unknown, month and year (YYYY-MM) or just year (YYYY) may be used. | -| Format | The physical or digital manifestation of the resource. Typically, Format may include the media-type.

Recommended best practice is to select a value from a controlled vocabulary.

Repeat for each class of category. | -| Identifier | An unambiguous reference to the resource within a given context.

Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Examples of formal identification systems include the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN).

This element can also be used for local identifiers (e.g. ID numbers) assigned by the Creator of the resource to apply to a particular item. It should not be used for identification of the metadata record itself. | -| Language | A language of the intellectual content of the resource.

Recommended best practice for the values of the Language element is defined by RFC 3066 which, in conjunction with ISO 639, defines two- and three-letter primary language tags with optional sub-tags. Examples include "en" or "eng" for English, "akk" for Akkadian, and "en-GB" for English used in the United Kingdom. | -| Audience | A class of entity for whom the resource is intended or useful. A class of entity may be determined by the creator or the publisher or by a third party.

Audience terms are best utilized in the context of formal or informal controlled vocabularies.

Element of Qualified Dublin Core | -| Provenance | A statement of any changes in ownership and custody of the resource since its creation that are significant for its authenticity, integrity and interpretation. The statement may include a description of any changes successive custodians made to the resource.

Element of Qualified Dublin Core | -| RightsHolder | A person or organization owning or managing rights over the resource. Recommended best practice is to use the URI or name of the Rights Holder to indicate the entity.

Element of Qualified Dublin Core | -| InstructionalMethod | A process, used to engender knowledge, attitudes and skills, that the resource is designed to support. Instructional Method will typically include ways of presenting instructional materials or conducting instructional activities, patterns of learner-to-learner and learner-to-instructor interactions, and mechanisms by which group and individual levels of learning are measured. Instructional methods include all aspects of the instruction and learning processes from planning and implementation through evaluation and feedback.

Best practice is to use terms from controlled vocabularies, whether developed for the use of a particular project or in general use in an educational context. | -| AccrualMethod | The method by which items are added to a collection. Recommended best practice is to use a value from a controlled vocabulary. | -| AccrualPeriodicity | The frequency with which items are added to a collection. Recommended best practice is to use a value from a controlled vocabulary. | -| AccrualPolicy | The policy governing the addition of items to a collection. Recommended best practice is to use a value from a controlled vocabulary. | - -### DC element qualifiers - -Qualifier elements are terms that extend or refine the original Dublin Core Metadata Element Set. They are associated with an original element. - -There are two broad classes of qualifiers: -1. **Element Refinement** - make the meaning of an element narrower or more specific. -2. **Encoding Scheme** - these qualifiers identify schemes that aid in the interpretation of an element value. These schemes include controlled vocabularies and formal notations or parsing rules. A value expressed using an encoding scheme will thus be a token selected from a controlled vocabulary, or a string formatted in accordance with a formal notation. - -| DC Element | Element Refinement Qualifier | Element Encoding Scheme | -| --- | --- | --- | -| Title | *Alternative*

Any form of the title used as a substitute or alternative to the formal title of the resource.| | -| Creator | | | -| Subject | | | -| Description | *Abstract*
*tableOfContents*

A summary of the contents of the resource.

A list of subunits of the content of the resource. | | -| Publisher | | | -| Contributor | | | -| Date | Created
Valid
Available
Issued
Modified
DateAccepted
DateCopyrighted
DateSubmitted | DCMI Period
W3C-DTF | -| Type | | | -| Format | Extent
Medium | | -| Identifier | BibliographicCitation | | -| Source | | | -| Language | | ISO 639-2
RFC 3066 | -| Relation | Is Version Of
Has Version
Is Replaced By
Replaces
Is Required By
Requires
Is Part Of
Has Part
Is Referenced By
References
Is Format Of
Has Format
Conforms To | | -| Coverage | Spatial
Temporal | DCMI Period
W3C-DTF | -| Rights | AccessRights
Licence | | -| Audience | Mediator
EducationLevel | | -| Provenance | | | -| RightsHolder | | | -| InstructionalMethod | | | -| AccrualMethod | | | -| AccrualPeriodicity | | | - -## Simulation Validation Testing - -Testing cannot verify the accuracy of simulation results. It can however test that data complies with certain expectations: value range, value distribution, and value deviation from standard reference data. The results of testing can become a resource to be utilised in locating simulation data: the results become classifiers that are recorded in a relational database that may be queried by users and applications. - -If an IDS has been populated with data, there are several data quantities that must be assigned values: the ids_properties and code structures. Additionally, if ids_properties/homogeneous_time is set to the value 1, the array time must be filled with values other than the missing value. - -Data that originates from pre-existing IDS files and are used as inputs to the workflow model needs not be tested as they are not the results of the workflow. However, these need to be identified (whole IDS objects and specific individual IDS data entities) to the validation testing routines, so they can be skipped over. It is simpler to identify only the specific IDS objects that need be tested. - -### Initialisation - -To help assist in the generation of test comparison data, the application will have a start-up mode where the tests are not run; instead the statistics data are recorded. These can then be utilised to form the initial set of test comparison data. - -Start-up data may be written to a temporary SQL database table for analysis and aggregation. From this an appropriate set of comparison statistics may be generated. - -Additional test parameters that will need to be set are the check on missing values, and the check on mandatory data fields. - -| # | Test | Description | -| --- | --- | --- | -| 1 | Verify all data are within expected limits. | 1. 
Compare statistics drawn from the data against a standard set: Mean, Max, Min, Standard Deviation.
2. Verify there are no missing data within the set of data.
3. Verify data has been written – the data entity is a mandatory entity so must be populated with non-missing data.

The set of test values is identified within an SQL database using three classifiers: device name, experiment or simulation scenario, and the data entity name.

Data entity names that include hierarchical branching index values (structure arrays) may be classed using a wild card character, ‘*’, to signify any index value. This set of test values is to be used with all similar data entities. | -| 2 | Compare all data with reference data. The reference data may be data for a different occurrence number contained in the same data file. | All data entities within the IDSs to be validated are compared with the same IDS data entities from a reference dataset.

1. Difference in standard statistics: Mean, Max, Min, Standard Deviation, and number of elements.
2. If the coordinate data are known, the integral between curves (for the common coordinate range) as a percentage of the integral of the data to be tested.

As with test #1, expected test values are identified by querying a SQL database. | diff --git a/docs/developer_guide.md b/docs/developer_guide.md deleted file mode 100644 index d3e5259f..00000000 --- a/docs/developer_guide.md +++ /dev/null @@ -1,249 +0,0 @@ -# Developer Guide - -## Setting up developer environment - -Checking out develop branch of SimDB: - -```bash -git clone https://github.com/iterorganization/SimDB.git -cd SimDB -git checkout develop -``` - -Create a virtual environment and install SimDB with all development dependencies -using the `dev` dependency group defined in `pyproject.toml`: - -```bash -python3 -m venv venv --prompt SimDB -source venv/bin/activate -pip install -e . --group dev -``` - -You could also use [uv](https://docs.astral.sh/uv/getting-started/installation/) to install all dependencies: - -```bash -uv sync -``` - - -This installs SimDB along with all tools needed for testing, linting, type -checking, and running the server. - -## Running the tests - -In the SimDB root directory run: - -```bash -pytest -``` - -## Running a development server - -```bash -simdb_server -``` - -This will start a server on port 5000. You can test this server is running by opening http://localhost:5000 in a browser. - -## Swagger API documentation - -SimDB provides interactive Swagger API documentation for each API version. The documentation is automatically generated and accessible at different endpoints depending on the API version you want to explore. - -### Accessing API documentation - -- **v1.2 API**: http://localhost:5000/v1.2/docs -- **v1.2 API at ITER**: https://simdb.iter.org/scenarios/api/v1.2/docs - -### API version differences - -Each API version may have different endpoints and functionality: - -- **v1.2**: Latest API with improved performance and additional features - -Always check the appropriate version documentation for your use case. 
- -## Linting and formatting - -SimDB uses [Ruff](https://docs.astral.sh/ruff/) for both linting and code -formatting. Ruff replaces the previously used tools `flake8`, `pylint`, and -`black`. The Ruff configuration is defined in `pyproject.toml` under -`[tool.ruff]` and `[tool.ruff.lint]`. - -To check formatting: - -```bash -python -m ruff format --check -``` - -To auto-fix formatting: - -```bash -python -m ruff format -``` - -To check linting: - -```bash -python -m ruff check -``` - -To auto-fix linting issues where possible: - -```bash -python -m ruff check --fix -``` - -## Type checking - -SimDB uses [Ty](https://github.com/astral-sh/ty) for static type checking. - -To run type checking: - -```bash -python -m ty check src -``` - -## Continuous integration - -The CI pipeline (defined in `.github/workflows/build_and_test.yml`) runs -automatically on every push and pull request. It tests against Python 3.11, -3.12, and 3.13, and runs the following checks in order: - -1. **Formatting** – `ruff format --check` -2. **Linting** – `ruff check` -3. **Type checking** – `ty check src` -4. **Tests** – `pytest` with coverage reporting - -Make sure all four checks pass locally before opening a pull request. - -## Database migrations - -SimDB uses [Alembic](https://alembic.sqlalchemy.org/) to manage database -schema migrations. Migration scripts live in the `alembic/versions/` directory. - -### Configuration - -Alembic is configured via `alembic.ini` in the project root. The database URL -is **not** stored in `alembic.ini`; instead it must be provided through the -`DATABASE_URL` environment variable: - -```bash -export DATABASE_URL="postgresql+psycopg2://user:password@localhost/simdb" -``` - -For SQLite (e.g. 
during development): - -```bash -export DATABASE_URL="sqlite:///simdb.db" -``` - -### Applying migrations - -To upgrade the database to the latest schema revision: - -```bash -alembic upgrade head -``` - -To upgrade to a specific revision: - -```bash -alembic upgrade -``` - -To downgrade by one revision: - -```bash -alembic downgrade -1 -``` - -### Checking the current revision - -```bash -alembic current -``` - -### Viewing migration history - -```bash -alembic history --verbose -``` - -### Creating a new migration - -After modifying the SQLAlchemy models, generate a new migration script -automatically with: - -```bash -alembic revision --autogenerate -m "short description of change" -``` - -Review the generated file in `alembic/versions/` carefully before applying it, -as autogenerate may not capture every change (e.g. custom column types or -server defaults). - -To apply the new migration: - -```bash -alembic upgrade head -``` - -## Setting up PostgreSQL Database - -This section will guide you through setting up a PostgreSQL server for SimDB. - -Setup PostgreSQL configuration and data directory: - -```bash -mkdir $HOME/Path/To/PostgresSQL_Data -``` - -Initialize database with data directory: - -```bash -initdb -D $HOME/Path/To/PostgresSQL_Data -U simdb -``` - -Start database server: - -```bash -pg_ctl -D $HOME/Path/To/PostgresSQL_Data/ -l logfile start -``` - -Verify database server status (should show `/tmp:5432 - accepting connections`): - -```bash -pg_isready -``` - -Create a database named `simdb`: - -```bash -createdb simdb -U simdb -``` - -Access database from command-line (will prompt `simdb=#`): - -```bash -psql -U simdb -``` - -Set the `DATABASE_URL` environment variable and run migrations: - -```bash -export DATABASE_URL="postgresql+psycopg2://simdb@localhost/simdb" -alembic upgrade head -``` - -Update the `[database]` section of `app.cfg`: - -``` -... - -[database] -type = postgres -host = localhost -port = 5432 - -... 
-``` diff --git a/docs/explanation/architecture.md b/docs/explanation/architecture.md new file mode 100644 index 00000000..049afcb7 --- /dev/null +++ b/docs/explanation/architecture.md @@ -0,0 +1,82 @@ +# Architecture + +This page describes how SimDB is put together. For the user-facing concepts +(simulation, manifest, metadata, and so on), see [Concepts](concepts.md). + +## Two components + +SimDB has two major components: one local to the user and one remote. + +- The **local component** is a command line tool, similar in spirit to tools + like `git`. Commands are organised as a tree, each level with its own + `--help`. +- The **remote component** manages the shared reference database and its + metadata. + +The two communicate through a REST API over SSL-encrypted HTTP. Interactions are +stateless, do not assume a permanent connection, are keyed on a simulation's +UUID, and require authentication on each request. See the +[REST API reference](../reference/rest-api.md). + +## Component overview + +![SimDB architecture](simdb-architecture.svg) + +1. **CLI tool**: manages simulation metadata, the file manifest, and + provenance, and lets the user query these locally and on remotes. +2. **Local SQLite database**: stores the user's ingested simulations before + they are pushed to a remote. +3. **Simulation directory**: where a simulation was run and where its files are + read from when pushing. +4. **Remote REST API**: receives pushed simulations and stores them for + validation and publishing. +5. **Staging directory**: where pushed files are placed while awaiting + validation, named by the simulation UUID, then moved to permanent storage + once committed. +6. **Remote database**: stores metadata, provenance, and validation status for + every uploaded simulation (PostgreSQL in production, optionally SQLite). + +## Supported platforms + +SimDB runs on Linux, macOS, and Windows. 
+ +## Data flow + +### Ingest + +The user writes a [manifest](../reference/manifest-format.md) and runs +`simdb simulation ingest`. SimDB resolves each input and output URI, computes +checksums, and records the simulation and its files in the local SQLite +catalogue. + +### Push + +`simdb simulation push` transfers the simulation to a server. The metadata is +sent as a structured payload, and each referenced file is streamed over HTTP +(compressed in transit). For IMAS URIs, SimDB discovers the underlying files +from the backend; local IMAS URIs are rewritten to remote URIs using the +server's `imas_remote_host`/`imas_remote_port` settings so the data stays +reachable. Files land in the staging directory, are checksummed, validated, and +then committed to permanent storage. + +### Pull + +`simdb simulation pull` is the mirror of push: it copies a simulation's metadata +from the server into the local catalogue and downloads its data files into a +directory you choose. + +## Server stack + +In production the server runs as a WSGI application (the `simdb_server` entry +point) behind a dedicated web server such as Nginx, with Gunicorn as the WSGI +server, and PostgreSQL as the database. The schema is managed with +[Alembic](https://alembic.sqlalchemy.org/) migrations. See +[Operating a server](../how-to/operate-server/install-server.md). + +## Validation + +The server can validate simulations on upload: integrity checks confirm that +file checksums still match, metadata is checked against a +[Cerberus](https://docs.python-cerberus.org/) schema, and file contents can be +checked by a file validator such as the IDS validator. See +[Validation](validation.md). diff --git a/docs/explanation/concepts.md b/docs/explanation/concepts.md new file mode 100644 index 00000000..affe09d6 --- /dev/null +++ b/docs/explanation/concepts.md @@ -0,0 +1,111 @@ +# Concepts + +This page explains the ideas behind SimDB and how they fit together. 
If you just +want to get going, see the [quickstart](../getting-started/quickstart.md); come +back here when you want the bigger picture. + +## What SimDB is for + +A simulation produces data, but the data alone does not tell you what produced +it, with which code, for which machine, or whether anyone has checked it. +SimDB's job is to **catalogue** simulations: to record metadata about each run +and the data it is associated with, keep that record locally, and let you +publish it to a shared server where others can find, trust, and reuse it. + +## Simulation + +A **simulation** is the central entity. It represents one run or analysis and +carries: + +- a **UUID**, its permanent unique identifier; +- an optional **alias**, a human-readable name; +- a **status** (see [Lifecycle](#status-and-lifecycle)); +- lists of **input** and **output** files; +- free-form **metadata** (key/value pairs); +- **watchers** (people notified about changes). + +## Manifest + +You do not create a simulation by hand. You write a **manifest**, a small YAML +file that describes the simulation and points at its data, and ingest it. The +manifest is the input; the catalogued simulation is the result. See the +[manifest format](../reference/manifest-format.md). + +## Files and checksums + +Each input and output is tracked as a **file** record with its URI, a type +(an ordinary file, an IMAS entry, or a reference to another simulation), and a +**checksum**. Checksums (a SHA-1 hash for ordinary files, a content hash for +IMAS data) let SimDB detect whether data changed after it was catalogued, which +is the basis of integrity checking during push and validation. See +[Validation](validation.md). + +## Metadata + +**Metadata** is the searchable description of a simulation: the machine, the +code and version, a free-text description, and any other key/value pairs you +choose. 
Metadata is what you query on, both locally and on a server, using the +[query operators](../reference/query-operators.md). + +## Alias + +An **alias** is a friendly name you can use instead of the UUID, for example +`iter-baseline-2024` or `100001/1`. Aliases must be unique within a SimDB +instance and URL-safe, and become fixed once a simulation is pushed. See +[Alias rules](../reference/manifest-format.md#alias). + +## Local versus remote + +SimDB has two halves: + +- **Local catalogue**: a SQLite database on your own machine. Ingesting a + manifest adds the simulation here, visible only to you. +- **Remote server**: a shared SimDB service (backed by PostgreSQL in + production). Pushing a simulation copies its metadata and data to the server + so authorized colleagues can query and reuse it. + +A separate distinction applies to the *data* a simulation references: + +- **Local IMAS data** is reachable from the file system where you run the CLI. +- **Remote IMAS data** lives on a data server and is reached over the network. + +The two distinctions are independent: a locally-catalogued simulation can +reference either local or remote data. When you push, local IMAS URIs are +rewritten so the data stays reachable from the server. See +[URI schemes](../reference/uri-schemes.md). + +## Typical workflow + +1. Write a manifest describing the simulation and its data. +2. Ingest it into your local catalogue. +3. Manage and inspect it locally; adjust metadata as needed. +4. Validate it against a target server's rules. +5. Push it to the server to share it. + +Others can then query the server, pull a simulation back to their own machine, +and download its data. + +## Status and lifecycle + +On a server, a simulation has a **status** that records where it is in the +review lifecycle: + +| Status | Meaning | +| --- | --- | +| `not_validated` | Uploaded but not yet validated. | +| `accepted` | Accepted into the database. | +| `passed` | Passed validation. 
| +| `failed` | Failed validation. | +| `deprecated` | Superseded by a newer simulation. | +| `deleted` | Marked as deleted. | + +When a simulation replaces an earlier one (`simdb simulation push --replaces`), +the old one is marked `deprecated` and gains a `replaced_by` reference. You can +follow this chain of revisions with `simdb remote SERVER trace`. + +## Watchers + +A **watcher** is a person who has asked to be notified about a simulation. They +can subscribe to validation results, new revisions, obsolescence, or all +events. Watchers are managed with `simdb remote watcher` and notified by email +from the server. diff --git a/docs/explanation/glossary.md b/docs/explanation/glossary.md new file mode 100644 index 00000000..b9fb537c --- /dev/null +++ b/docs/explanation/glossary.md @@ -0,0 +1,71 @@ +# Glossary + +```{glossary} +Alias + A human-readable, URL-safe, unique name for a simulation, used instead of its + UUID. See [Alias](../reference/manifest-format.md#alias). + +Access Layer (AL) + The IMAS data access layer. SimDB reads data written with Access Layer 5 + (AL5) or later. Older Access Layer 4 (AL4) MDSplus data must be migrated + first. See [Migrate AL4 MDSplus data](../how-to/migrate-al4-mdsplus.md). + +Backend + The storage format used for an IMAS data entry, for example `hdf5` or + `mdsplus`. Specified in an [IMAS URI](../reference/uri-schemes.md). + +Cerberus + The Python library SimDB uses to validate simulation metadata against a + server's schema. See [Validation](validation.md). + +Checksum + A hash recorded for each data file (SHA-1 for ordinary files, a content hash + for IMAS data) used to detect changes. See [Validation](validation.md). + +HDF5 + A file-based storage format, usable as an IMAS backend (`imas:hdf5?...`). + +IDS + Interface Data Structure. The standardized data structure used by IMAS to + represent physics quantities. + +IMAS + Integrated Modelling and Analysis Suite. 
The data framework used by ITER and + the wider fusion community. SimDB reads IMAS data through + [imas-python](https://pypi.org/project/imas-python/). + +Manifest + A YAML file describing a simulation and the data it is associated with, used + to ingest the simulation. See the + [manifest format](../reference/manifest-format.md). + +MDSplus + A data system usable as an IMAS backend (`imas:mdsplus?...`). + +Metadata + Searchable key/value information attached to a simulation. See + [Concepts](concepts.md#metadata). + +Remote + A configured SimDB server that the client can push to and query. See + [Configure remotes](../how-to/configure-remotes.md). + +Simulation + The central SimDB entity: one run or analysis, with a UUID, optional alias, + status, files, and metadata. See [Concepts](concepts.md#simulation). + +Summary IDS + An IDS holding condensed summary information about a simulation, a common + source of metadata. + +UDA + Universal Data Access. A server protocol for reaching remote IMAS data, used + in [remote IMAS URIs](../reference/uri-schemes.md#remote-imas-data). + +UUID + The permanent unique identifier assigned to every simulation. + +Watcher + A user subscribed to notifications about a simulation. See + [Concepts](concepts.md#watchers). +``` diff --git a/docs/simdb-architecture.svg b/docs/explanation/simdb-architecture.svg similarity index 100% rename from docs/simdb-architecture.svg rename to docs/explanation/simdb-architecture.svg diff --git a/docs/explanation/validation.md b/docs/explanation/validation.md new file mode 100644 index 00000000..92c7613e --- /dev/null +++ b/docs/explanation/validation.md @@ -0,0 +1,64 @@ +# Validation + +Validation is how SimDB and a server decide whether a simulation is complete, +intact, and acceptable. There are three independent layers. 
For the commands, +see [Validate a simulation](../how-to/validate-a-simulation.md); for the server +settings, see [Configure validation](../how-to/operate-server/configure-validation.md). + +## 1. Integrity (checksums) + +When a simulation is ingested, SimDB records a checksum for every input and +output: a SHA-1 hash for ordinary files, and a content hash derived from the +IMAS data for IMAS entries. During push and validation these checksums are +recomputed and compared. A mismatch means the data changed since it was +catalogued (or a file is missing), and the simulation will not validate. + +This is the most common cause of validation failure: one of the data sources is +absent, or something changed after ingestion. + +## 2. Metadata schema + +A server can require specific metadata through a `validation-schema.yaml` file, +expressed as [Cerberus](https://docs.python-cerberus.org/) rules. For example, a +server might require a string `description` field. Different servers can have +different rules, so a simulation that is valid for one server may not be valid +for another. + +Inspect a server's schema with: + +```bash +simdb remote SERVER schema +``` + +and check a local simulation against it before pushing with: + +```bash +simdb simulation validate SERVER SIM_ID +``` + +Failing to provide a server's mandatory metadata is the second common cause of +validation failure. + +## 3. File-content validation + +Optionally, a server can inspect the *contents* of data files with a file +validator. The validator currently available is the **IDS validator** (the +`imas-validator` package), which applies rulesets to IMAS data, for example +checking that mandatory IDS quantities are populated and values fall within +expected ranges. + +File validation is configured server-side under `[file_validation]` (see +[Server configuration](../reference/server-configuration.md#file_validation)) +and runs when the server is set to validate uploads automatically. 
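The integrity layer described in section 1 can be made concrete with a short sketch. This is not SimDB's own implementation: the function names `sha1_of_file` and `is_intact` are hypothetical, and only ordinary files are covered (the content hash used for IMAS entries is not shown). The idea is simply to record a SHA-1 digest when a file is catalogued and recompute it later, treating a changed or missing file as a failure.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, recorded: str) -> bool:
    """Hypothetical check: True only if the file exists and still matches."""
    if not path.is_file():
        return False  # a missing data source also counts as a failure
    return sha1_of_file(path) == recorded
```

Calling `is_intact(path, recorded)` with the digest stored at ingest time would return `False` as soon as the file is edited or removed, which mirrors the most common cause of validation failure noted above.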
+ +## When validation runs + +- **On the client**, on demand, with `simdb simulation validate`. This runs + integrity and metadata checks against the target server's rules before you + push, so you can fix problems early. +- **On the server**, when `auto_validate` is enabled, uploaded simulations are + validated automatically (including any configured file validation). With + `error_on_fail` enabled, simulations that fail are rejected. + +The outcome is reflected in the simulation's +[status](concepts.md#status-and-lifecycle). diff --git a/docs/genapidocs.sh b/docs/genapidocs.sh deleted file mode 100755 index ec4530a1..00000000 --- a/docs/genapidocs.sh +++ /dev/null @@ -1,5 +0,0 @@ -#!/bin/bash - -# Run sphinx-apidoc to generate the latest documentation from the SimDB codebase. - -sphinx-apidoc -f -o sphinx -e -M ../src/simdb && rm modules.rst diff --git a/docs/generate_cli_docs.py b/docs/generate_cli_docs.py index 5717d376..443ccab4 100644 --- a/docs/generate_cli_docs.py +++ b/docs/generate_cli_docs.py @@ -1,10 +1,27 @@ +"""Render the CLI reference page. + +Reads ``cli.md.in`` and replaces every ``{{ command }}`` placeholder with the +captured ``simdb COMMAND --help`` output (recursing into sub-commands). The +result is written to ``reference/cli.md``. + +This runs automatically at documentation build time (see ``conf.py``), so the +CLI reference always matches the installed version of SimDB.
It can also be run +by hand from the ``docs/`` directory: + + python generate_cli_docs.py +""" + import os +from pathlib import Path + +DOCS_DIR = Path(__file__).parent.resolve() +TEMPLATE = DOCS_DIR / "cli.md.in" +OUTPUT = DOCS_DIR / "reference" / "cli.md" def run_command(cmd: str) -> str: stream = os.popen(cmd) - output = stream.read() - return output + return stream.read() def extract_command(line: str) -> str: @@ -24,41 +41,31 @@ def extract_sub_commands(output: str) -> list[str]: def generate_block(output: str) -> str: - return f""" -```text -{output.strip()} -``` - """ + return f"\n```text\n{output.strip()}\n```\n" def process_cmd(cmd: str) -> str: - print(cmd) output = run_command(f"simdb {cmd} --help") - if cmd: - sub_commands = extract_sub_commands(output) - else: - sub_commands = [] + sub_commands = extract_sub_commands(output) if cmd else [] text = generate_block(output) for sub_command in sub_commands: text += "\n" + process_cmd(f"{cmd} {sub_command}") - return text def process_line(line: str) -> str: - cmd = extract_command(line) - return process_cmd(cmd) - - -def main(): - with open("cli.md.in", "r") as f_in: - with open("cli.md", "w") as f_out: - for line in f_in: - if line.startswith("{{"): - f_out.write(process_line(line)) - else: - f_out.write(line) + return process_cmd(extract_command(line)) + + +def main() -> None: + OUTPUT.parent.mkdir(parents=True, exist_ok=True) + with open(TEMPLATE) as f_in, open(OUTPUT, "w") as f_out: + for line in f_in: + if line.startswith("{{"): + f_out.write(process_line(line)) + else: + f_out.write(line) if __name__ == "__main__": diff --git a/docs/getting-started/installation.md b/docs/getting-started/installation.md new file mode 100644 index 00000000..50523180 --- /dev/null +++ b/docs/getting-started/installation.md @@ -0,0 +1,65 @@ +# Installation + +SimDB is distributed as the `imas-simdb` Python package and requires +**Python 3.11 or newer**. 
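If you are unsure which interpreter `pip` will install into, a short check such as the following can confirm that it meets the stated minimum. This is only an illustrative sketch; the helper name `meets_minimum` is made up here and is not part of SimDB.

```python
import sys

MINIMUM = (3, 11)  # minimum Python version stated above


def meets_minimum(version: tuple[int, ...] = sys.version_info[:2]) -> bool:
    """Return True if the given (major, minor) version is new enough."""
    return tuple(version) >= MINIMUM


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old for SimDB"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```

Run it with the same `python` you intend to use for `pip install`, for example from inside the virtual environment.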
+ +## Install from PyPI + +```bash +pip install imas-simdb +``` + +## Install from source + +```bash +git clone https://github.com/iterorganization/SimDB.git +cd SimDB +python3 -m venv venv +source venv/bin/activate +pip install -e . +``` + +## Optional extras + +The base install gives you the client. Additional features are available as +extras: + +| Extra | Installs support for | +| --- | --- | +| `server` | Running a SimDB remote server. | +| `postgres` | PostgreSQL as the server database. | +| `imas-validator` | IDS file-content validation. | +| `auth-ldap` | LDAP authentication (server). | +| `auth-keycloak` | Keycloak authentication (server). | +| `auth-ad` | Active Directory authentication (server). | +| `auth` | All three authentication methods. | +| `build-docs` | Building this documentation. | +| `all` | `server`, `imas-validator`, and `postgres` together. | + +Install one or more extras with the usual pip syntax: + +```bash +pip install "imas-simdb[server]" +pip install "imas-simdb[server,postgres,imas-validator]" +pip install -e ".[all]" # from a source checkout +``` + +## Verify + +```bash +simdb --version +simdb --help +``` + +```{tip} +If you get `command not found: simdb`, the install location's `bin` directory is +not on your `PATH`. Activate your virtual environment, or add the pip script +directory to `PATH`. See [Troubleshooting](../troubleshooting.md). +``` + +## Next steps + +- Run through the [quickstart](quickstart.md). +- Working with ITER servers? See [Connect to ITER](../how-to/connect-to-iter.md). +- Setting up a server? See + [Install a server](../how-to/operate-server/install-server.md). diff --git a/docs/getting-started/quickstart.md b/docs/getting-started/quickstart.md new file mode 100644 index 00000000..2e2d35b8 --- /dev/null +++ b/docs/getting-started/quickstart.md @@ -0,0 +1,87 @@ +# Quickstart + +This page gets you from an installed SimDB to a catalogued, queried, and +(optionally) shared simulation in a few minutes. 
For a fuller walkthrough, see +the [tutorial](../tutorials/first-simulation.md). + +## Prerequisites + +- SimDB [installed](installation.md) (`simdb --version` works). + +## 1. Create a manifest + +A [manifest](../reference/manifest-format.md) describes your simulation. Start +from a template: + +```bash +simdb manifest create manifest.yaml +``` + +Edit it to point at your data and describe the run: + +```yaml +manifest_version: 2 +alias: my-first-simulation +inputs: + - uri: file:///path/to/input/parameters.txt +outputs: + - uri: file:///path/to/results/output.nc +metadata: + - machine: ITER + - code: + name: JETTO + version: "2024.1" + - description: My first catalogued simulation. +``` + +Check that it is well-formed: + +```bash +simdb manifest check manifest.yaml +``` + +## 2. Ingest it locally + +```bash +simdb simulation ingest manifest.yaml +``` + +The simulation is now in your local catalogue. List and inspect it: + +```bash +simdb simulation list +simdb simulation info my-first-simulation +``` + +## 3. Query locally + +```bash +simdb simulation query code.name=JETTO +``` + +See [query operators](../reference/query-operators.md) for the full syntax. + +## 4. Push to a server (optional) + +If you have access to a SimDB server, configure it once: + +```bash +simdb remote config new myserver https://example.org/simdb/api +simdb remote config set-default myserver +``` + +Then validate and push: + +```bash +simdb simulation validate my-first-simulation +simdb simulation push my-first-simulation +``` + +ITER users should follow [Connect to ITER](../how-to/connect-to-iter.md) +instead, which covers the firewall and certificate setup. 
+ +## Where to go next + +- [Catalogue your first simulation](../tutorials/first-simulation.md) (tutorial) +- [Create a manifest](../how-to/create-a-manifest.md) (how-to) +- [Concepts](../explanation/concepts.md) (the bigger picture) diff --git a/docs/how-to/authenticate.md b/docs/how-to/authenticate.md new file mode 100644 index 00000000..9d377c70 --- /dev/null +++ b/docs/how-to/authenticate.md @@ -0,0 +1,50 @@ +# Authenticate to a remote + +Every command that talks to a remote server must be authenticated. By default +you enter a username and password, which you must re-enter whenever your session +expires. Where the server supports it, a token avoids repeated prompts. + +```{note} +The commands below assume you have set a [default remote](configure-remotes.md). +If not, insert the remote name: `simdb remote NAME token new`. +``` + +## Username and password + +By default, remote commands prompt for your username and password. Store your +username so only the password is requested: + +```bash +simdb remote config set-option NAME username my-user +``` + +## Token-based authentication + +If the server supports tokens, generate one. You authenticate once with your +username and password, and SimDB stores a token for subsequent commands: + +```bash +simdb remote token new +``` + +While the token is valid (the lifetime is set per server, 30 days by default) +you can run remote commands without entering credentials again. Delete a stored +token with: + +```bash +simdb remote token delete +``` + +```{important} +Servers behind an F5 firewall (including the ITER server `simdb.iter.org`) +authenticate at the firewall and do **not** support SimDB tokens. For those, you +authenticate through the firewall on each session. See +[Connect to ITER](connect-to-iter.md). +``` + +## Where tokens are stored + +Tokens are saved against the remote in your +[client configuration](../reference/configuration.md) file, which must be +readable only by you (`0600` permissions). 
They are masked in +`simdb config list` output. diff --git a/docs/how-to/configure-remotes.md b/docs/how-to/configure-remotes.md new file mode 100644 index 00000000..dcd63af5 --- /dev/null +++ b/docs/how-to/configure-remotes.md @@ -0,0 +1,64 @@ +# Configure remotes + +A *remote* is a SimDB server the client can talk to. Remotes are stored in your +[client configuration](../reference/configuration.md) file. Manage them with the +`simdb remote config` commands rather than editing the file by hand, so it stays +valid. + +ITER users: see [Connect to ITER](connect-to-iter.md) for the ITER-specific +URL, firewall, and certificate setup. + +## List remotes + +```bash +simdb remote config list +``` + +## Add a remote + +```bash +simdb remote config new NAME URL +``` + +For example: + +```bash +simdb remote config new myserver https://example.org/simdb/api +``` + +## Set a default + +The default remote is used whenever you omit the remote name from a `remote` or +`simulation` command: + +```bash +simdb remote config set-default myserver +simdb remote config get-default +``` + +## Set per-remote options + +```bash +simdb remote config set-option myserver username my-user +simdb remote config set-option myserver firewall F5 +``` + +Common options are `url`, `username`, `firewall`, and `default`; see the +[configuration reference](../reference/configuration.md#remote-name). + +## Remove a remote + +```bash +simdb remote config delete myserver +``` + +## Test a remote + +```bash +simdb remote test # default remote +simdb remote myserver test # named remote +simdb remote myserver version +``` + +Next: set up [authentication](authenticate.md) so you do not have to enter +credentials on every command. 
diff --git a/docs/how-to/connect-to-iter.md b/docs/how-to/connect-to-iter.md new file mode 100644 index 00000000..c6ca8bb7 --- /dev/null +++ b/docs/how-to/connect-to-iter.md @@ -0,0 +1,76 @@ +# Connect to ITER + +This guide sets up the SimDB client to talk to the ITER server at +`simdb.iter.org`. It covers the remote configuration, the F5 firewall, and (for +ITER HPC nodes) installing the ITER SSL certificates. + +## Add the ITER remote + +On first run, SimDB pre-populates an `iter` remote. If you need to add it +manually: + +```bash +simdb remote config new iter https://simdb.iter.org/scenarios/api/ +simdb remote config set-option iter firewall F5 +``` + +Listing the remotes should then show the F5 firewall: + +```text +iter: https://simdb.iter.org/scenarios/api/ [firewall: F5] +``` + +Make it your default and set your ITER username: + +```bash +simdb remote config set-default iter +simdb remote config set-option iter username USERNAME +``` + +## Test the connection + +```bash +simdb remote iter list +``` + +or, if `iter` is your default: + +```bash +simdb remote list +``` + +You will be asked for your ITER username and password, which are checked at the +F5 firewall. + +```{important} +The ITER server authenticates at the F5 firewall and does **not** support SimDB +tokens, so `simdb remote token new` does not apply here. You authenticate +through the firewall on each session. +``` + +## Install the ITER SSL certificate (HPC nodes) + +To use the client on an ITER HPC node you must trust the ITER CA certificates.
+First download the root and issuing CA certificates: + +```bash +wget "http://pki.iter.org/CertEnroll/io-ws-pkiroot_ITER%20Organization%20Root%20CA.crt" +wget "http://pki.iter.org/CertEnroll/io-ws-pki1.iter.org_ITER%20Organization%20Issuing%20CA1.crt" +``` + +Convert them to PEM and concatenate into one bundle, here `$HOME/iter.pem`: + +```bash +openssl x509 -inform DER -in "io-ws-pki1.iter.org_ITER Organization Issuing CA1.crt" -out CA1.pem +openssl x509 -inform DER -in "io-ws-pkiroot_ITER Organization Root CA.crt" -out CA2.pem +cat CA1.pem CA2.pem > $HOME/iter.pem +``` + +Point SimDB at the bundle through the `SIMDB_REQUESTS_CA_BUNDLE` environment +variable: + +```bash +export SIMDB_REQUESTS_CA_BUNDLE=$HOME/iter.pem +``` + +Add that line to `$HOME/.bash_profile` so it is set for every session. diff --git a/docs/how-to/contribute/build-the-docs.md b/docs/how-to/contribute/build-the-docs.md new file mode 100644 index 00000000..71a7d62d --- /dev/null +++ b/docs/how-to/contribute/build-the-docs.md @@ -0,0 +1,50 @@ +# Build the documentation + +This documentation is built with [Sphinx](https://www.sphinx-doc.org/) and +[MyST](https://myst-parser.readthedocs.io/) (Markdown), using the +`sphinx-immaterial` theme. It is published on +[Read the Docs](https://simdb.readthedocs.io/). + +## Install the docs toolchain + +From a checkout, install the `build-docs` extra. Because the build imports and +runs `simdb`, install the package itself too: + +```bash +pip install -e ".[build-docs]" +``` + +## Build + +```bash +cd docs +make html +``` + +The site is written to `docs/_build/html`. Open `docs/_build/html/index.html` +in a browser. Use `make clean` to remove the build output and generated files. + +## How the build works + +The documentation source lives directly in `docs/` (there is no separate copy +step). 
Two parts are generated automatically at build time by `conf.py`, so they +never drift from the code: + +- **`reference/cli.md`** is rendered from `cli.md.in` by capturing the live + `simdb --help` output (`generate_cli_docs.py`). +- **`reference/python-api/`** is produced by `sphinx-apidoc` from the source + docstrings. + +Both are git-ignored; do not commit them. Because they are generated from the +installed package, `simdb` must be importable in the build environment (it is, +if you installed the package as shown above). + +## Writing conventions + +- Pages are Markdown (MyST). Use `{toctree}`, `{note}`, `{warning}`, and similar + fenced directives. +- The documentation follows a [Diataxis](https://diataxis.fr/) structure: + tutorials, how-to guides, reference, and explanation. Put new pages in the + section that matches their purpose and add them to the toctree in + `docs/index.md`. +- Cross-link between pages with relative Markdown links. diff --git a/docs/how-to/contribute/run-migrations.md b/docs/how-to/contribute/run-migrations.md new file mode 100644 index 00000000..0d8ca57c --- /dev/null +++ b/docs/how-to/contribute/run-migrations.md @@ -0,0 +1,53 @@ +# Run database migrations + +SimDB manages its database schema with [Alembic](https://alembic.sqlalchemy.org/). +Migration scripts live in `alembic/versions/`. + +## Configure the database URL + +Alembic is configured by `alembic.ini` in the project root, but the database URL +is **not** stored there. Provide it through the `DATABASE_URL` environment +variable. 
+ +PostgreSQL: + +```bash +export DATABASE_URL="postgresql+psycopg2://user:password@localhost/simdb" +``` + +SQLite (for development): + +```bash +export DATABASE_URL="sqlite:///simdb.db" +``` + +## Apply migrations + +```bash +alembic upgrade head # upgrade to the latest revision +alembic upgrade REVISION # upgrade to a specific revision +alembic downgrade -1 # downgrade one revision +``` + +## Inspect state + +```bash +alembic current # current revision +alembic history --verbose # full history +``` + +## Create a migration + +After changing the SQLAlchemy models, autogenerate a migration: + +```bash +alembic revision --autogenerate -m "short description of change" +``` + +Review the generated script in `alembic/versions/` carefully before applying it. +Autogenerate may miss some changes, such as custom column types or server +defaults. Then apply it: + +```bash +alembic upgrade head +``` diff --git a/docs/how-to/contribute/run-tests-and-lint.md b/docs/how-to/contribute/run-tests-and-lint.md new file mode 100644 index 00000000..b8749c11 --- /dev/null +++ b/docs/how-to/contribute/run-tests-and-lint.md @@ -0,0 +1,45 @@ +# Run tests, linting, and type checks + +SimDB's CI runs four checks on every push and pull request, against Python 3.11, +3.12, and 3.13. Run them locally before opening a pull request. + +## Tests + +From the project root: + +```bash +pytest +``` + +Tests run with coverage (configured in `pyproject.toml`). + +## Formatting and linting (Ruff) + +SimDB uses [Ruff](https://docs.astral.sh/ruff/) for both formatting and linting. +The configuration is in `pyproject.toml`.
+ +```bash +python -m ruff format --check # check formatting +python -m ruff format # auto-format +python -m ruff check # lint +python -m ruff check --fix # auto-fix lint issues +``` + +## Type checking (Ty) + +SimDB uses [Ty](https://github.com/astral-sh/ty) for static type checking: + +```bash +python -m ty check src +``` + +## CI order + +The pipeline (`.github/workflows/build_and_test.yml`) runs, in order: + +1. Formatting (`ruff format --check`) +2. Linting (`ruff check`) +3. Type checking (`ty check src`) +4. Tests (`pytest` with coverage) + +Make sure all four pass locally first. diff --git a/docs/how-to/contribute/set-up-dev-env.md b/docs/how-to/contribute/set-up-dev-env.md new file mode 100644 index 00000000..18692479 --- /dev/null +++ b/docs/how-to/contribute/set-up-dev-env.md @@ -0,0 +1,44 @@ +# Set up a development environment + +This guide sets up SimDB for local development. + +## Clone and check out + +```bash +git clone https://github.com/iterorganization/SimDB.git +cd SimDB +git checkout develop +``` + +`develop` is the main development branch; open pull requests against it. + +## Create a virtual environment and install + +Install SimDB with the `dev` dependency group, which pulls in everything needed +for testing, linting, type checking, and running the server: + +```bash +python3 -m venv venv --prompt SimDB +source venv/bin/activate +pip install -e . --group dev +``` + +Alternatively, use [uv](https://docs.astral.sh/uv/) to install all +dependencies: + +```bash +uv sync +``` + +## Verify + +```bash +simdb --version +pytest +``` + +## Next steps + +- [Run tests, linting, and type checks](run-tests-and-lint.md). +- [Run database migrations](run-migrations.md). +- [Build the documentation](build-the-docs.md). 
diff --git a/docs/how-to/create-a-manifest.md b/docs/how-to/create-a-manifest.md new file mode 100644 index 00000000..d45e2e83 --- /dev/null +++ b/docs/how-to/create-a-manifest.md @@ -0,0 +1,85 @@ +# Create a manifest + +A [manifest](../reference/manifest-format.md) is the YAML file you ingest to +catalogue a simulation. This guide shows how to write a good one. For the +complete specification of every field, see the +[manifest format reference](../reference/manifest-format.md). + +## Start from a template + +```bash +simdb manifest create manifest.yaml +``` + +This writes a starter manifest you can edit. + +## Set the version and alias + +Always use version 2, and give the simulation a descriptive, unique, URL-safe +alias: + +```yaml +manifest_version: 2 +alias: iter-baseline-scenario-2024 +``` + +Good alias conventions: + +- a descriptive name such as `machine-scenario-date`; or +- a `pulse_number/run_number` form such as `100001/1`. + +The alias is optional in the manifest; you can also set it at ingest time with +`simdb simulation ingest -a ALIAS`. Use `simdb alias make-unique` if you need a +guaranteed-unique name. + +## List inputs and outputs + +List every data file the simulation used (`inputs`) and produced (`outputs`), +as [URIs](../reference/uri-schemes.md): + +```yaml +inputs: + - uri: file:///work/sims/run42/input/parameters.txt + - uri: imas:hdf5?path=/work/imas/input_data +outputs: + - uri: file:///work/sims/run42/results/output.nc + - uri: imas:mdsplus?path=/work/imas/simulation_output +``` + +Tips: + +- Use absolute paths for `file` URIs. Glob patterns such as `*.nc` are expanded. +- For IMAS data, give the correct backend (`hdf5` or `mdsplus`). +- Include all relevant inputs (initial conditions, parameters, configuration) + and all outputs (results, diagnostics). + +## Describe it with metadata + +```yaml +metadata: + - machine: ITER + - code: + name: JETTO + version: "2024.1" + - description: |- + Baseline H-mode scenario simulation for ITER. 
+ 15 MA plasma current with a Q=10 target. +``` + +Conventions: + +- **machine**: the device name. Always include it. +- **code**: name and version, for reproducibility. +- **description**: context about the run's purpose. +- Add any other key/value pairs you want to be able to query on. Metadata names + may not contain `:`, `=`, or `#`. + +## Check it + +Before ingesting, validate the file's structure: + +```bash +simdb manifest check manifest.yaml +``` + +Then ingest it (see [Ingest and manage simulations](ingest-and-manage.md)). diff --git a/docs/how-to/ingest-and-manage.md b/docs/how-to/ingest-and-manage.md new file mode 100644 index 00000000..13afda44 --- /dev/null +++ b/docs/how-to/ingest-and-manage.md @@ -0,0 +1,71 @@ +# Ingest and manage simulations + +This guide covers working with simulations in your **local** catalogue: +ingesting, listing, inspecting, modifying metadata, and deleting. For querying, +see [Query simulations](query-simulations.md); for sharing, see +[Push and pull](push-pull.md). 
+ +## Ingest + +Add a simulation from a [manifest](create-a-manifest.md): + +```bash +simdb simulation ingest manifest.yaml +``` + +Override the manifest's alias at ingest time: + +```bash +simdb simulation ingest -a my-alias manifest.yaml +``` + +## List + +```bash +simdb simulation list +``` + +Useful options: + +```bash +simdb simulation list -m machine -m code.name # add metadata columns +simdb simulation list --uuid # show UUIDs +simdb simulation list -l 50 # limit rows (default 100) +``` + +## Inspect + +Show everything stored for one simulation, by alias or UUID: + +```bash +simdb simulation info my-alias +``` + +## Modify metadata + +Add, update, or delete metadata on a local simulation, or change its alias: + +```bash +simdb simulation modify my-alias --set-meta reviewed=yes +simdb simulation modify my-alias --del-meta reviewed +simdb simulation modify my-alias -a new-alias # change the alias +``` + +## Delete + +Delete a single simulation: + +```bash +simdb simulation delete my-alias +``` + +Reset the entire local catalogue: + +```bash +simdb simulation delete --all +``` + +```{note} +`simdb simulation delete --all` replaces the old `simdb database clear` command, +which has been removed. +``` diff --git a/docs/how-to/migrate-al4-mdsplus.md b/docs/how-to/migrate-al4-mdsplus.md new file mode 100644 index 00000000..65b92774 --- /dev/null +++ b/docs/how-to/migrate-al4-mdsplus.md @@ -0,0 +1,42 @@ +# Migrate AL4 MDSplus data + +SimDB reads IMAS data with +[imas-python](https://pypi.org/project/imas-python/), which requires MDSplus +data written with **Access Layer 5 (AL5)** or later. It cannot read older +**Access Layer 4 (AL4)** MDSplus data. If your data is AL4, migrate it to the +AL5 directory layout before referencing it in a manifest. + +## Migrate with `mdsplusIMASDB4to5` + +Use the `mdsplusIMASDB4to5` tool provided by IMAS-Core. It creates the AL5 +directory layout with links to your original files; the original data is not +removed.
+ +```text +mdsplusIMASDB4to5 [-h] [--dry-run] [-p PATH] [-d DATABASE] [-f] +``` + +| Option | Description | +| --- | --- | +| `--dry-run` | Print the actions without performing them. | +| `-p PATH`, `--path PATH` | Path of the imasdb to map (default `$HOME/public`). | +| `-d DATABASE`, `--database DATABASE` | A specific database to map (default: all). | +| `-f`, `--force` | Create the symlink even if the target file exists. | + +A dry run first is recommended: + +```bash +mdsplusIMASDB4to5 --dry-run -p /path/to/imasdb +``` + +## Reference the migrated data + +Once migrated, reference the new AL5 path in your manifest using the `mdsplus` +backend: + +```yaml +outputs: + - uri: imas:mdsplus?path= +``` + +For details of `mdsplusIMASDB4to5`, see the IMAS-Core documentation. diff --git a/docs/how-to/operate-server/configure-authentication.md b/docs/how-to/operate-server/configure-authentication.md new file mode 100644 index 00000000..90d62776 --- /dev/null +++ b/docs/how-to/operate-server/configure-authentication.md @@ -0,0 +1,66 @@ +# Configure authentication + +A server authenticates users according to its `[authentication]` configuration. +This guide shows the common setups; for every option see the +[server configuration reference](../../reference/server-configuration.md#authentication). + +## No authentication (testing only) + +```ini +[authentication] +type = None +``` + +## Behind a firewall + +When the server runs behind a firewall (such as F5) that authenticates users and +passes their identity in request headers, read the identity from those headers: + +```ini +[authentication] +firewall_auth = True +firewall_user = X-Forwarded-User +firewall_email = X-Forwarded-Email +``` + +Set `firewall_user` and `firewall_email` to the header names your firewall uses. + +## LDAP + +Requires the `auth-ldap` [extra](../../getting-started/installation.md#optional-extras). 
+ +```ini +[authentication] +type = LDAP +ldap_server = ldaps://ldap.example.org +ldap_bind = uid={username},ou=Users,dc=example,dc=org +ldap_query_base = dc=example,dc=org +ldap_query_filter = (uid={username}) +``` + +`{username}` is replaced with the authenticating user's name. See the +[reference](../../reference/server-configuration.md#ldap-type--ldap) for the +optional query-user, uid, and mail settings. + +## Active Directory + +Requires the `auth-ad` [extra](../../getting-started/installation.md#optional-extras). + +```ini +[authentication] +type = ActiveDirectory +ad_server = ad.example.org +ad_domain = EXAMPLE +ad_cert = /path/to/root-ca.crt +``` + +## Admin access + +The `admin` superuser (password set by `server.admin_password`) and any users in +the `admin` [role](../../reference/server-configuration.md#role-name) can use the +`simdb remote admin` commands: + +```ini +[role "admin"] +users = admin,alice,bob +``` diff --git a/docs/how-to/operate-server/configure-validation.md b/docs/how-to/operate-server/configure-validation.md new file mode 100644 index 00000000..b174a506 --- /dev/null +++ b/docs/how-to/operate-server/configure-validation.md @@ -0,0 +1,54 @@ +# Configure validation + +A server can validate simulations automatically when they are uploaded: +checksums, a metadata schema, and optionally the contents of data files. For the +concepts, see [Validation](../../explanation/validation.md). + +## Validate uploads automatically + +In `app.cfg`: + +```ini +[validation] +auto_validate = True +error_on_fail = True +``` + +- `auto_validate` runs validation (including any file validation) on every + upload. +- `error_on_fail` rejects simulations that fail. It requires `auto_validate`. 
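The dependency between the two options can be checked before a server is started. The following is a minimal sketch using Python's standard `configparser`; the function name `check_validation_options` is hypothetical, and treating an absent option as disabled is an assumption rather than documented SimDB behaviour.

```python
import configparser


def check_validation_options(cfg_text: str) -> list[str]:
    """Return a list of problems found in the [validation] section."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # Assumption: an option (or section) that is absent counts as disabled.
    auto = parser.getboolean("validation", "auto_validate", fallback=False)
    fail = parser.getboolean("validation", "error_on_fail", fallback=False)
    problems = []
    if fail and not auto:
        # error_on_fail only has an effect when uploads are validated.
        problems.append("error_on_fail requires auto_validate")
    return problems
```

Passing the text of `app.cfg` to this function returns an empty list when the two flags are consistent, and a one-item list when `error_on_fail` is set without `auto_validate`.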
+ +## Require metadata + +Create `validation-schema.yaml` in the same directory as `app.cfg`, using +[Cerberus](https://docs.python-cerberus.org/) rules: + +```yaml +description: + required: true + type: string +machine: + required: true + type: string +``` + +Clients can read the active schema with `simdb remote SERVER schema` and check +against it with `simdb simulation validate`. + +## Validate file contents (IDS validator) + +To check the contents of IMAS data files, enable a file validator. The +`ids_validator` requires the `imas-validator` +[extra](../../getting-started/installation.md#optional-extras). + +```ini +[file_validation] +type = ids_validator +bundled_ruleset = True +apply_generic = True +rule_filter_ids = summary,equilibrium +``` + +See the +[file validation options](../../reference/server-configuration.md#file_validation) +for the full list, including custom rule directories and ruleset filters. diff --git a/docs/how-to/operate-server/enable-ssl.md b/docs/how-to/operate-server/enable-ssl.md new file mode 100644 index 00000000..7078c5be --- /dev/null +++ b/docs/how-to/operate-server/enable-ssl.md @@ -0,0 +1,64 @@ +# Enable SSL + +A production SimDB server must serve over HTTPS. There are two ways to enable +SSL, depending on how you run the server. + +## Option A: TLS at Nginx (recommended) + +When [running behind Nginx and Gunicorn](run-behind-nginx-gunicorn.md), let +Nginx terminate TLS. 
Change `/etc/nginx/conf.d/simdb.conf` to listen on 443 and +redirect HTTP to HTTPS: + +```nginx +server { + listen 443 ssl; + server_name localhost; # or the server's address + + ssl_protocols TLSv1.2 TLSv1.3; + ssl_prefer_server_ciphers on; + ssl_ciphers ECDH+AESGCM:ECDH+AES256:ECDH+AES128:!ADH:!AECDH:!MD5; + + ssl_certificate /etc/pki/nginx/server.crt; + ssl_certificate_key /etc/pki/nginx/private/server.key; + + location / { + include proxy_params; + proxy_pass http://unix:/var/run/simdb.sock; + } +} + +server { + if ($host = localhost) { # or the server's address + return 301 https://$host$request_uri; + } + server_name localhost; + listen 80; + return 404; +} +``` + +Point `ssl_certificate` and `ssl_certificate_key` at a certificate and key +issued by a valid signing authority. + +## Option B: TLS at the built-in server + +For the built-in development server, set the SSL options in `app.cfg`: + +```ini +[server] +ssl_enabled = True +ssl_cert_file = /path/to/server.crt +ssl_key_file = /path/to/server.key +``` + +## Generating a self-signed certificate (testing only) + +For local testing you can generate a self-signed certificate. Use a real signing +authority in production. + +```bash +openssl req -x509 -out server.crt -keyout server.key \ + -newkey rsa:2048 -nodes -sha256 \ + -subj '/CN=localhost' -extensions EXT -config <( \ + printf "[dn]\nCN=localhost\n[req]\ndistinguished_name = dn\n[EXT]\nsubjectAltName=DNS:localhost\nkeyUsage=digitalSignature\nextendedKeyUsage=serverAuth") +``` diff --git a/docs/how-to/operate-server/install-server.md b/docs/how-to/operate-server/install-server.md new file mode 100644 index 00000000..4a6772e0 --- /dev/null +++ b/docs/how-to/operate-server/install-server.md @@ -0,0 +1,86 @@ +# Install a server + +This guide installs SimDB with the server components. For running it, see +[Run a development server](run-dev-server.md) and +[Run behind Nginx and Gunicorn](run-behind-nginx-gunicorn.md).
+ +## Install + +Clone SimDB and create a virtual environment: + +```bash +git clone https://github.com/iterorganization/SimDB.git +cd SimDB +python3 -m venv venv +source venv/bin/activate +``` + +Install with the `all` extra (server, PostgreSQL, and IDS validation): + +```bash +pip install -e ".[all]" +``` + +To install only what you need, combine the relevant +[extras](../../getting-started/installation.md#optional-extras), for example +`pip install -e ".[server,postgres]"`. + +Verify: + +```bash +simdb --version +``` + +## Create the server configuration + +The server reads `app.cfg` from the application configuration directory. Find +it with: + +```bash +dirname "$(simdb config path)" +``` + +On Linux this is typically `/home/$USER/.config/simdb`; on macOS, +`/Users/$USER/Library/Application Support/simdb`. + +Create `app.cfg` there with the settings for your deployment, and set its +permissions to owner-only: + +```bash +chmod 600 app.cfg +``` + +A minimal SQLite configuration: + +```ini +[flask] +secret_key = CHANGE_ME_TO_A_LONG_RANDOM_STRING + +[server] +upload_folder = /var/lib/simdb/simulations +admin_password = CHANGE_ME + +[database] +type = sqlite + +[authentication] +type = None +``` + +See the [server configuration reference](../../reference/server-configuration.md) +for every option, including authentication, validation, caching, email, and +roles, and for a PostgreSQL example. + +```{tip} +To stand up a complete server (with PostgreSQL and Redis) in one command, use +the [Docker Compose deployment](run-with-docker.md) instead of installing by +hand. +``` + +## Next steps + +- [Run with Docker Compose](run-with-docker.md) for an all-in-one deployment. +- [Set up PostgreSQL](set-up-postgresql.md) for production. +- [Configure authentication](configure-authentication.md). +- [Configure validation](configure-validation.md). +- [Run behind Nginx and Gunicorn](run-behind-nginx-gunicorn.md) for production. 
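The `CHANGE_ME` placeholders in the minimal configuration above should be replaced with unpredictable values, and `app.cfg` must stay owner-only. The sketch below shows one way to do both with the Python standard library. The helper names `new_secret` and `is_owner_only` are hypothetical, and the 48-byte key length is an arbitrary choice rather than a SimDB requirement.

```python
import os
import secrets
import stat


def new_secret(num_bytes: int = 48) -> str:
    """Return a URL-safe random string that can be pasted into app.cfg."""
    return secrets.token_urlsafe(num_bytes)


def is_owner_only(path: str) -> bool:
    """True when the file mode is exactly 0600 (owner read/write only)."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o600


if __name__ == "__main__":
    # Print a line that can be copied into the [flask] section.
    print("secret_key =", new_secret())
```

Running the script prints a fresh `secret_key` line; `is_owner_only("app.cfg")` can be used afterwards to confirm that the `chmod 600` step took effect.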
diff --git a/docs/how-to/operate-server/run-behind-nginx-gunicorn.md b/docs/how-to/operate-server/run-behind-nginx-gunicorn.md new file mode 100644 index 00000000..8de94e2d --- /dev/null +++ b/docs/how-to/operate-server/run-behind-nginx-gunicorn.md @@ -0,0 +1,71 @@ +# Run behind Nginx and Gunicorn + +In production, run the SimDB server as a WSGI service behind a dedicated web +server. This guide uses Gunicorn as the WSGI server and Nginx as the +proxy/load-balancer. It assumes Nginx and Gunicorn are already installed. + +## Set up the Gunicorn service + +Copy the init script from `src/simdb/remote/scripts/simdb.initd` in the SimDB +install directory to `/etc/init.d/simdb`. + +Change two lines in it: + +- Set `USER=simdb` to the user the workers should run as. +- Set `DAEMON=/home/simdb/venv/bin/gunicorn` to the `gunicorn` in your virtual + environment (find it with `which gunicorn` while the venv is active). + +Start and check the service: + +```bash +service simdb start +service simdb status +``` + +## Set up Nginx + +Create `/etc/nginx/conf.d/simdb.conf`: + +```nginx +server { + listen 80; + server_name localhost; # or the server's address + + location / { + include proxy_params; + proxy_pass http://unix:/var/run/simdb.sock; + } +} +``` + +The packaged `src/simdb/remote/scripts/simdb.nginx` can be copied instead. The +`proxy_pass` target must match the `BIND` value in the init script. + +If `/etc/nginx/proxy_params` does not exist, create it: + +```nginx +proxy_set_header Host $http_host; +proxy_set_header X-Real-IP $remote_addr; +proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; +proxy_set_header X-Forwarded-Proto $scheme; +``` + +Make sure `/etc/nginx/nginx.conf` includes `/etc/nginx/conf.d/*.conf` inside its +`http {}` block, then restart Nginx: + +```bash +service nginx restart +``` + +## Allow large uploads + +Simulation uploads can be large.
Raise the body-size limit (at least 100 MB) in +`/etc/nginx/nginx.conf`: + +```nginx +client_max_body_size 100m; +``` + +## Enable HTTPS + +For production, terminate TLS at Nginx. See [Enable SSL](enable-ssl.md). diff --git a/docs/how-to/operate-server/run-dev-server.md b/docs/how-to/operate-server/run-dev-server.md new file mode 100644 index 00000000..0fa5ab62 --- /dev/null +++ b/docs/how-to/operate-server/run-dev-server.md @@ -0,0 +1,42 @@ +# Run a development server + +SimDB ships a built-in HTTP server for testing and development. + +```{warning} +The built-in server is for testing and development only. For production, run +behind a dedicated web server: see +[Run behind Nginx and Gunicorn](run-behind-nginx-gunicorn.md). +``` + +## Prerequisites + +- SimDB [installed with the server extra](install-server.md). +- An `app.cfg` in the application configuration directory (see + [Install a server](install-server.md)). + +## Start it + +```bash +simdb_server +``` + +The server starts on port 5000. Check it by opening the server address +in a browser, or: + +```bash +curl http://0.0.0.0:5000 +``` + +which returns the available API URLs as JSON. + +## Interactive API docs + +Each API version publishes its own Swagger UI documentation. +See the +[REST API reference](../../reference/rest-api.md). + +## Troubleshooting the port + +If the server cannot bind to port 5000, another service is using it. Stop that +service, or change the port in the `simdb_server` script (find it with +`which simdb_server`). See also [Troubleshooting](../../troubleshooting.md). diff --git a/docs/how-to/operate-server/run-with-docker.md b/docs/how-to/operate-server/run-with-docker.md new file mode 100644 index 00000000..22b9412a --- /dev/null +++ b/docs/how-to/operate-server/run-with-docker.md @@ -0,0 +1,123 @@ +# Run with Docker Compose + +SimDB ships a Docker Compose setup that runs the server together with its +PostgreSQL database and a Redis instance, applying database migrations +automatically.
This is the quickest way to stand up a complete server. + +## Prerequisites + +- Docker and Docker Compose. +- A checkout of the SimDB repository (the `Dockerfile` and `docker-compose.yml` + live at its root). + +## Services + +`docker-compose.yml` defines the following: + +| Service | Role | +| --- | --- | +| `web` | The SimDB server, published on port 5000. | +| `postgres` | PostgreSQL database (user/password/db `simdb`). | +| `redis` | Redis, used as the message broker for background workers. | +| `migrations` | One-shot service that runs `alembic upgrade head`, then exits. | +| `worker`, `beat` | Optional Celery worker and scheduler (see [Background workers](#background-workers)). | + +The image is built from the `Dockerfile`, which uses `uv` and installs SimDB +with the `all` extra. + +## Configure + +The server reads its configuration from `config/simdb.cfg`, which is mounted into +the container. The image sets `SIMDB_SITE_CONFIG_PATH=/app/config/simdb.cfg`, so +edit `config/simdb.cfg` to configure the deployment. A starting point: + +```ini +[flask] +flask_env = development +debug = True +testing = True +secret_key = CHANGE_ME + +[authentication] +type = None + +[server] +upload_folder = /data/simdb/simulations +port = 5000 +ssl_enabled = False +admin_password = CHANGE_ME +imas_remote_host = localhost + +[database] +type = postgres +host = postgres +port = 5432 +username = simdb +password = simdb +database = simdb + +[validation] +path = ./validation +auto_validate = True +error_on_fail = True + +[celery] +broker_url = redis://redis:6379/0 +result_backend = redis://redis:6379/0 + +[partition] +data = /data/simdb/partition +``` + +Note that `[database].host` is `postgres` (the Compose service name) and the +Celery broker points at the `redis` service. The `[server].port` value must +match the published port. + +See the [server configuration reference](../../reference/server-configuration.md) +for all options. 
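The consistency rules noted above can be checked before starting the stack by parsing the file with Python's standard `configparser`. The sketch below is not part of SimDB: the `check_compose_config` helper is hypothetical, and its expectations (service names `postgres` and `redis`, published port 5000) are assumptions taken from the example configuration.

```python
import configparser
from urllib.parse import urlparse


def check_compose_config(text: str, published_port: int = 5000) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    # Disable interpolation so values containing '%' (e.g. secrets) parse as-is.
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(text)

    problems = []
    if config.get("database", "host", fallback=None) != "postgres":
        problems.append("[database].host should be the Compose service name 'postgres'")

    broker = config.get("celery", "broker_url", fallback="")
    if urlparse(broker).hostname != "redis":
        problems.append("[celery].broker_url should point at the 'redis' service")

    if config.getint("server", "port", fallback=published_port) != published_port:
        problems.append(f"[server].port should match the published port {published_port}")
    return problems
```

To use it against the real file, pass the file contents, for example `check_compose_config(open("config/simdb.cfg").read())`, and review any messages it returns.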
+ +## Start + +```bash +docker compose up --build +``` + +Compose builds the image, starts PostgreSQL and Redis, waits for them to become +healthy, runs the migrations, and then starts the `web` service. The server is +then reachable on the published port, 5000. + +To run in the background: + +```bash +docker compose up --build -d +``` + +Stop everything with: + +```bash +docker compose down +``` + +Database and Redis state persist in named volumes (`postgres_data`, +`redis_data`). + +## Background workers + +The `worker` and `beat` services run under the `with_workers` +[Compose profile](https://docs.docker.com/compose/profiles/), so they are off by +default. Enable them when you want asynchronous processing: + +```bash +docker compose --profile with_workers up --build +``` + +They use the `[celery]` broker and result backend from `config/simdb.cfg`. + +## Building for a specific Python version + +`docker-compose-pyver.yml` and the `PYVER` build argument let you build against +a chosen Python version, for example: + +```bash +PYVER=3.12 docker compose -f docker-compose-pyver.yml up --build +``` diff --git a/docs/how-to/operate-server/set-up-postgresql.md b/docs/how-to/operate-server/set-up-postgresql.md new file mode 100644 index 00000000..676b7a64 --- /dev/null +++ b/docs/how-to/operate-server/set-up-postgresql.md @@ -0,0 +1,89 @@ +# Set up PostgreSQL + +For a production server, use PostgreSQL as the database. This guide covers +installing PostgreSQL, creating the SimDB database, applying migrations, and +pointing the server at it. It is not an exhaustive PostgreSQL guide; see the +[PostgreSQL documentation](https://www.postgresql.org/) for more. + +If PostgreSQL is already running, skip to [Create the database](#create-the-database). + +## Install PostgreSQL + +Install from your system package manager.
On RHEL/CentOS: + +```bash +sudo yum -y install postgresql-server postgresql-contrib +``` + +Initialise the database cluster (this creates the default data directory, for +example `/var/lib/pgsql/data`): + +```bash +sudo postgresql-setup initdb +``` + +Start the service and enable it at boot: + +```bash +sudo systemctl start postgresql +sudo systemctl enable postgresql +``` + +## Create the database + +Connect as the `postgres` user: + +```bash +sudo -u postgres psql +``` + +Create the database and a role for the server (this assumes the server runs as +user `simdb`; change the role name to match your server user): + +```sql +CREATE DATABASE simdb; +CREATE ROLE simdb; +ALTER DATABASE simdb OWNER TO simdb; +ALTER ROLE "simdb" WITH LOGIN; +``` + +## Test the connection + +```python +import psycopg2 +psycopg2.connect("postgresql://simdb@localhost:5432") +``` + +```{tip} +If a local connection is refused, check `pg_hba.conf` in the PostgreSQL data +directory and ensure the connection method is `trust` (or another method you can +authenticate with) rather than `ident`. +``` + +## Apply migrations + +The schema is managed with [Alembic](https://alembic.sqlalchemy.org/). Point it +at the database and upgrade to the latest revision: + +```bash +export DATABASE_URL="postgresql+psycopg2://simdb@localhost/simdb" +alembic upgrade head +``` + +See [Run database migrations](../contribute/run-migrations.md) for more. + +## Point the server at PostgreSQL + +In `app.cfg`: + +```ini +[database] +type = postgres +host = localhost +port = 5432 +name = simdb +``` + +The server also needs the PostgreSQL driver, installed by the `postgres` +[extra](../../getting-started/installation.md#optional-extras) +(`pip install -e ".[postgres]"`). 
diff --git a/docs/how-to/push-pull.md b/docs/how-to/push-pull.md new file mode 100644 index 00000000..6b555080 --- /dev/null +++ b/docs/how-to/push-pull.md @@ -0,0 +1,59 @@ +# Push and pull simulations + +Pushing copies a local simulation up to a server; pulling copies a server +simulation down to your machine. Both need a [configured remote](configure-remotes.md). + +## Push + +Once a simulation is ingested locally and you are happy with its metadata, push +it to make it available to others: + +```bash +simdb simulation push SIM_ID +``` + +This uploads all metadata and copies every referenced input and output. For +`file` URIs the files are transferred directly; for `imas` URIs SimDB discovers +the files to transfer from the backend in the URI. Files are sent over HTTP. + +Validate first to catch problems early (see +[Validate a simulation](validate-a-simulation.md)): + +```bash +simdb simulation validate SIM_ID +``` + +### Replace an earlier simulation + +```bash +simdb simulation push SIM_ID --replaces OLD_SIM_ID +``` + +The previous simulation is marked `deprecated` and gains a `replaced_by` +reference to the new one. Inspect the revision history with +`simdb remote trace SIM_ID`. + +### Add a watcher while pushing + +```bash +simdb simulation push SIM_ID --add-watcher +``` + +See [watchers](../explanation/concepts.md#watchers) and the +`simdb remote watcher` commands in the [CLI reference](../reference/cli.md). + +## Pull + +Pull copies a simulation's metadata into your local catalogue and downloads its +data into a directory you choose: + +```bash +simdb simulation pull REMOTE SIM_ID DIRECTORY +``` + +- `REMOTE` is optional; the default remote is used if omitted. +- `SIM_ID` is the alias or UUID on the remote. +- `DIRECTORY` is where the data is downloaded. + +After pulling, the simulation appears in your local +[queries](query-simulations.md). 
diff --git a/docs/how-to/query-simulations.md b/docs/how-to/query-simulations.md new file mode 100644 index 00000000..ac89e53c --- /dev/null +++ b/docs/how-to/query-simulations.md @@ -0,0 +1,58 @@ +# Query simulations + +You can search by metadata both in your local catalogue and on a remote server. +This guide shows the commands; for the full operator list and syntax, see +[Query operators](../reference/query-operators.md). + +## Query locally + +```bash +simdb simulation query code.name=JETTO +simdb simulation query pulse=gt:1000 run=0 +``` + +Each constraint is `NAME=[modifier:]VALUE`. Multiple constraints are combined +with AND. Add metadata columns with `-m` and UUIDs with `--uuid`: + +```bash +simdb simulation query machine=ITER -m code.name --uuid +``` + +## Query a remote + +The same syntax works against a server. Remote queries additionally support the +array operators (`agt`, `age`, `alt`, `ale`): + +```bash +simdb remote query machine=ITER +simdb remote iter query code.name=SOLPS-ITER +``` + +If you have set a default remote, omit its name. Example output: + +```text +alias code.name +-------------------- +103027/3 SOLPS-ITER +103028/3 SOLPS-ITER +``` + +## Browse a remote + +List everything on a remote, or inspect one simulation: + +```bash +simdb remote list +simdb remote info SIM_ID +``` + +## Common operators + +| Constraint | Matches | +| --- | --- | +| `machine=ITER` | exactly `ITER` (case-insensitive) | +| `code.name=in:sol` | values containing `sol` | +| `pulse=gt:1000` | values greater than 1000 | +| `sequence=exist:` | simulations that have a `sequence` field | + +See [Query operators](../reference/query-operators.md) for the rest. 
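To make the constraint syntax concrete, here is a minimal sketch of how `NAME=[modifier:]VALUE` strings could be parsed and combined with AND over a list of metadata dictionaries. It is an illustration only: the function names are hypothetical, only the four forms in the table above are handled, `in:` is assumed to be case-insensitive, and SimDB's own matching (for example its type handling) may differ.

```python
def parse_constraint(text: str) -> tuple[str, str, str]:
    """Split 'NAME=[modifier:]VALUE' into (name, modifier, value)."""
    name, _, rest = text.partition("=")
    modifier, sep, value = rest.partition(":")
    if not sep or modifier not in {"in", "gt", "exist"}:
        # No recognised modifier: the whole right-hand side is the value.
        return name, "eq", rest
    return name, modifier, value


def matches(meta: dict, constraint: str) -> bool:
    """Check one simulation's flat metadata against one constraint."""
    name, modifier, value = parse_constraint(constraint)
    if name not in meta:
        return False
    if modifier == "exist":
        return True
    actual = meta[name]
    if modifier == "eq":
        return str(actual).lower() == value.lower()
    if modifier == "in":
        return value.lower() in str(actual).lower()
    # Remaining case is "gt": compare numerically, non-numbers never match.
    try:
        return float(actual) > float(value)
    except (TypeError, ValueError):
        return False


def query(simulations: list[dict], *constraints: str) -> list[dict]:
    """Keep simulations that satisfy every constraint (logical AND)."""
    return [s for s in simulations if all(matches(s, c) for c in constraints)]
```

For example, `query(sims, "code.name=in:sol", "pulse=gt:1000")` keeps only the entries whose `code.name` contains `sol` and whose `pulse` exceeds 1000.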
diff --git a/docs/how-to/use-the-dashboard.md b/docs/how-to/use-the-dashboard.md new file mode 100644 index 00000000..84d827a8 --- /dev/null +++ b/docs/how-to/use-the-dashboard.md @@ -0,0 +1,33 @@ +# Use the dashboard + +A SimDB server provides a web dashboard for browsing simulation metadata in a +browser, complementing the CLI. + +## Open a simulation by UUID + +Use this URL pattern: + +``` +https://SERVER/dashboard/uuid/UUID +``` + +For example, on the ITER server with UUID +`abcdef12345678901234567890abcdef`: + +``` +https://simdb.iter.org/dashboard/uuid/abcdef12345678901234567890abcdef +``` + +## Search in the dashboard + +Alternatively, enter a UUID or alias in the **Alias/UUID** search field on the +dashboard. + +```{tip} +- Use the full 32-character UUID (without dashes) if that is how it is stored. +- If your deployment uses a different base path, adjust the URL accordingly. +``` + +You can find a simulation's UUID from the CLI with +`simdb remote info SIM_ID` or by adding `--uuid` to a +[list or query](query-simulations.md). diff --git a/docs/how-to/validate-a-simulation.md b/docs/how-to/validate-a-simulation.md new file mode 100644 index 00000000..df13fd2c --- /dev/null +++ b/docs/how-to/validate-a-simulation.md @@ -0,0 +1,45 @@ +# Validate a simulation + +Validating checks that a simulation is intact and meets a server's +requirements, before you push it. For the concepts behind validation, see +[Validation](../explanation/validation.md). + +## Validate against a server + +```bash +simdb simulation validate SIM_ID +simdb simulation validate REMOTE SIM_ID # name a specific remote +``` + +A `validation successful` message means the simulation is ready to +[push](push-pull.md). + +## Common failures and fixes + +### A data source is missing or changed + +Validation recomputes the checksum of every input and output and compares it to +what was recorded at ingest. A mismatch (or a missing file) fails validation.
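As a rough illustration of the check described above, the sketch below recomputes each file's digest and compares it with a recorded value. The SHA-256 algorithm, the `recorded` mapping of path to digest, and the function names are assumptions made for the example; they are not taken from SimDB's implementation.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()  # assumption: SimDB may use a different algorithm
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_problems(recorded: dict[str, str]) -> dict[str, str]:
    """Map each failing path to 'missing' or 'changed'; empty means intact."""
    problems = {}
    for name, expected in recorded.items():
        path = Path(name)
        if not path.is_file():
            problems[name] = "missing"
        elif file_checksum(path) != expected:
            problems[name] = "changed"
    return problems
```

An empty result corresponds to a passing check; any `missing` or `changed` entry corresponds to the failure this section describes.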
+ +Fix: restore the file, or re-ingest the simulation from an updated manifest so +the recorded checksums match the current data. + +### Missing required metadata + +A server can require specific metadata. See what it requires: + +```bash +simdb remote schema -d 10 +``` + +Fix: add the required fields. For a not-yet-pushed local simulation you can add +metadata with `simdb simulation modify SIM_ID --set-meta NAME=VALUE`, or correct +the manifest and re-ingest. See +[Ingest and manage](ingest-and-manage.md#modify-metadata). + +## Server-side validation + +Servers can also validate automatically on upload (checksums, metadata schema, +and optional file-content validation such as the IDS validator). This is +controlled by the server's `[validation]` and `[file_validation]` settings; see +[Configure validation](operate-server/configure-validation.md). diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 00000000..31249f41 --- /dev/null +++ b/docs/index.md @@ -0,0 +1,123 @@ +# IMAS Simulation Database Management Tool + +**SimDB** tracks, manages, validates and shares scientific simulations. You +describe a simulation and the data it produced in a small manifest file, ingest +it into your local catalogue, and, when you are ready, push it to a shared +SimDB server where colleagues can query, validate and reuse it. + +SimDB is built for [IMAS](explanation/glossary.md) fusion-simulation workflows +but works with any files you want to catalogue. + +## Where to start + +- **Getting started**: [Install SimDB](getting-started/installation.md), then + follow the [quickstart](getting-started/quickstart.md). +- **Tutorials**: Learn by doing: + [catalogue your first simulation](tutorials/first-simulation.md) and + [push it to a server](tutorials/push-to-remote.md). 
+- **How-to guides**: Task recipes for + [manifests](how-to/create-a-manifest.md), + [queries](how-to/query-simulations.md), + [remotes](how-to/configure-remotes.md) and + [running a server](how-to/operate-server/install-server.md). +- **Reference**: The [CLI](reference/cli.md), + [configuration](reference/configuration.md), + [manifest format](reference/manifest-format.md) and + [Python API](reference/python-api/index.md). +- **Explanation**: Understand the [concepts](explanation/concepts.md) and + [architecture](explanation/architecture.md) behind SimDB. + +```{toctree} +:caption: Getting Started +:maxdepth: 2 +:hidden: + +getting-started/installation +getting-started/quickstart +``` + +```{toctree} +:caption: Tutorials +:maxdepth: 2 +:hidden: + +tutorials/first-simulation +tutorials/push-to-remote +``` + +```{toctree} +:caption: How-to Guides +:maxdepth: 2 +:hidden: + +how-to/create-a-manifest +how-to/ingest-and-manage +how-to/query-simulations +how-to/push-pull +how-to/validate-a-simulation +how-to/configure-remotes +how-to/authenticate +how-to/connect-to-iter +how-to/migrate-al4-mdsplus +how-to/use-the-dashboard +``` + +```{toctree} +:caption: Operating a Server +:maxdepth: 2 +:hidden: + +how-to/operate-server/install-server +how-to/operate-server/run-dev-server +how-to/operate-server/run-with-docker +how-to/operate-server/run-behind-nginx-gunicorn +how-to/operate-server/enable-ssl +how-to/operate-server/set-up-postgresql +how-to/operate-server/configure-authentication +how-to/operate-server/configure-validation +``` + +```{toctree} +:caption: Contributing +:maxdepth: 2 +:hidden: + +how-to/contribute/set-up-dev-env +how-to/contribute/run-tests-and-lint +how-to/contribute/run-migrations +how-to/contribute/build-the-docs +``` + +```{toctree} +:caption: Reference +:maxdepth: 2 +:hidden: + +reference/cli +reference/configuration +reference/server-configuration +reference/manifest-format +reference/uri-schemes +reference/query-operators +reference/rest-api 
+reference/python-api/index +``` + +```{toctree} +:caption: Explanation +:maxdepth: 2 +:hidden: + +explanation/concepts +explanation/architecture +explanation/validation +explanation/glossary +``` + +```{toctree} +:caption: Help +:maxdepth: 1 +:hidden: + +troubleshooting +``` diff --git a/docs/install_guide.md b/docs/install_guide.md deleted file mode 100644 index 3681f7b2..00000000 --- a/docs/install_guide.md +++ /dev/null @@ -1,65 +0,0 @@ -# SimDB Installation Guide - -## Installing simdb - -### Installing from source: - -``` -git clone https://github.com/iterorganization/SimDB.git -cd SimDB -python3 -m venv ./venv -. venv/bin/activate -pip3 install -e . -``` - -### Installing directly from PyPI: - -``` -pip install imas-simdb -``` - -### installing all dependencies (server, imas-validator, database): -``` -pip3 install -e .[all] -``` - -## Installing simdb with specific extras: -### Install IMAS-Validator -``` -pip3 install -e .[imas-validator] -``` - -### Install simdb server dependencies -``` -pip3 install -e .[server] -``` - -### Install PostgreSQL support -``` -pip3 install -e .[postgres] -``` - -### Install authentication dependencies -``` -pip3 install -e .[auth-ldap] -pip3 install -e .[auth-keycloak] -pip3 install -e .[auth-ad] -``` - -### Install documentation dependencies -``` -pip3 install -e .[build-docs] -``` - -### Multiple extras can be combined -``` -pip3 install -e .[server,postgres,imas-validator] -``` - -You should then be able to run the command: - -``` -simdb --help -``` - -**Note:** If you get an error such as `command not found: simdb` then you may need to add the bin folder in your pip install location to your path. 
diff --git a/docs/iter_certificate.md b/docs/iter_certificate.md deleted file mode 100644 index c20b7978..00000000 --- a/docs/iter_certificate.md +++ /dev/null @@ -1,24 +0,0 @@ -# Installing ITER SSL certificate - -To use the SimDB CLI on an ITER HPC node, you need to first download the root and issuing CA certificates: - -```bash -wget "http://pki.iter.org/CertEnroll/io-ws-pkiroot_ITER%20Organization%20Root%20CA.crt" -wget "http://pki.iter.org/CertEnroll/io-ws-pki1.iter.org_ITER%20Organization%20Issuing%20CA1.crt" -``` - -The certificates need to be converted into the PEM format and concatenated into a single file, in this case stored at `$HOME/iter.pem`: - -```bash -openssl x509 -inform DEM -in io-ws-pki1.iter.org_ITER\ Organization\ Issuing\ CA1.crt -out CA1.pem -openssl x509 -inform DEM -in io-ws-pkiroot_ITER\ Organization\ Root\ CA.crt -out CA2.pem -cat CA1.pem CA2.pem > $HOME/iter.pem -``` - -Before using the SimDB client you need to set the environment variable `SIMDB_REQUESTS_CA_BUNDLE` to point to the file created above: - -```bash -export SIMDB_REQUESTS_CA_BUNDLE=$HOME/iter.pem -``` - -This line can be added to `$HOME/.bash_profile` so that you don't need to set it for each bash terminal. \ No newline at end of file diff --git a/docs/iter_remotes.md b/docs/iter_remotes.md deleted file mode 100644 index 282956db..00000000 --- a/docs/iter_remotes.md +++ /dev/null @@ -1,53 +0,0 @@ -# Connecting to the ITER remotes - -## Adding the ITER remote - -The commands you need to set up an ITER remote is as follows: - -```shell -simdb remote config new iter https://simdb.iter.org/scenarios/api/ -simdb remote config set-option iter firewall F5 -``` - -Now when you list the remotes (using `simdb remote config list`) you should see: - -```shell -... -iter: https://simdb.iter.org/scenarios/api/ [firewall: F5] -... 
-``` - -You can make this your default remote using: - -```shell -simdb remote config set-default iter -``` - -You may also want to add your ITER username to remote configuration which you can do with: - -```shell -simdb remote config set-option iter username -``` - -## Testing the ITER remote - -Once the iter remote is set up you should be able to list simulations from ITER using: - -```shell -simdb remote iter list -``` - -or if you have set the iter remote to be your default: - -```shell -simdb remote list -``` - -This will ask for your username and password for authentication against the server. - - diff --git a/docs/maintenance_guide.md b/docs/maintenance_guide.md deleted file mode 100644 index e5664af2..00000000 --- a/docs/maintenance_guide.md +++ /dev/null @@ -1,419 +0,0 @@ -# SimDB server maintenance guide - -This guide describes the steps needed to set up and maintain a SimDB server as a production service. The first section details the general steps required to do this, followed by details on how this is done at ITER. - -## Installing SimDB - -First clone the master branch of SimDB: - -```bash -git clone https://github.com/iterorganization/SimDB.git -``` - -Next set up the virtual environment: - -```bash -cd simdb -python3 -m venv venv -source venv/bin/activate -``` - -And install SimDB: - -```bash -pip install -e .[all] -``` - -**Note:** If you plan to run the server with a PostgreSQL database you will also need to install the `psycopg2-binary` library. - -You can test the SimDB installation by running: - -```bash -simdb --version -``` - -## Running the server (using built-in http server) - -**Note:** Running the SimDB server using the built-in http server is for testing/development only and should not be used in production. In production you should run the SimDB server behind a dedicated web-server such as NGinx (see the [Running the server behind nginx & gunicorn](#running-the-server-behind-nginx--gunicorn) section below). 
- -Once simdb has been installed, before you can run the server you need to create the server configuration file. This file should be created in the application configuration directory which can be located by using: - -``` -dirname "$(simdb config path)" -``` - -For example on Linux this would be: - -``` -/home/$USER/.config/simdb -``` - -On macOS this would be: - -``` -/Users/$USER/Library/Application Support/simdb -``` - -In this directory you should create a file 'app.cfg' specifying the server configuration. This file must have permissions set to `0600` i.e. user read only. - -Options for the server configuration are: - -| Section | Option | Required | Description | -|-----------------|--------------------------|------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| database | type | yes | Database type [sqlite, postgres]. | -| database | file | yes (type=sqlite) | Database file (for sqlite) - defaults to remote.db in the user data directory if not specified. | -| database | host | yes (type=postgres) | Database host (for postgres). | -| database | port | yes (type=postgres) | Database port (for postgres). | -| database | name | yes (type=postgres) | Database name (for postgres). | -| server | upload_folder | yes | Root directory where SimDB simulation files are stored. | -| server | ssl_enabled | no | Flag [True, False] to specify whether the debug server uses SSL - this should be set to False for production servers behind dedicated webserver. Defaults to False. | -| server | ssl_cert_file | yes (ssl_enabled=True) | Path to SSL certificate file if ssl_enabled is True. | -| server | ssl_key_file | yes (ssl_enabled=True) | Path to SSL key file if ssl_enabled is True. | -| server | admin_password | yes | Password for admin superuser. 
| -| server | token_lifetime | no | Number of days generated tokens are valid for - defaults to 30 days. | -| server | imas_remote_host | no | Host name to set on ingested IMAS URIs which will be used to fetch the data via the specified IMAS remote access server. I.e. imas:hdf5?path=foo becomes imas://:/uda?path=foo&backend=hdf5 on ingest. | -| server | imas_remote_port | no | Port to set on ingested IMAS URIs on ingest. See imas_remote_host for more details. | -| flask | flask_env | no | Flask server environment [development, production] - defaults to production. | -| flask | debug | no | Flag [True, Flase] to specify whether Flask server is run with debug mode enabled - defaults to True if flask_env='development', otherwise False. | -| flask | testing | no | Flag [True, False] to specify whether exceptions are propagated rather than being handled by Flask's error handlers - defaults to False. | -| flask | secret_key | yes | Secret key used to encrypt server messages including authentication tokens - should be at least 20 characters long. | -| flask | swagger_ui_doc_expansion | no | Default state of the Swagger UI documentations [none, list, full]. | -| validation | auto_validate | no | Flag [True, False] to set whether the server should run validation on uploaded simulations (including running any selected file_validation) automatically. Defaults to False. | -| validation | error_on_fail | no | Flag [True, False] to set whether simulations that fail validation should be rejected - auto_validate must be set to True if this flag is set to True. Defaults to False | -| email | server | yes | SMTP server used to send emails from the SimDB server. | -| email | port | yes | SMTP server port port. | -| email | user | yes | SMTP server user to send emails from . | -| email | password | yes | SMTP server user password. | -| development | disable_checksum | yes | Flag [True, False] to set whether integrity checks should be perform or not. 
Defaults to False | - -### File validation options - -| Section | Option | Required | Description | -|------------------|--------------------------|--------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| file_validation | type | no | Name of file Validator to use to validate simulation data i.e "ids_validator", "test_validator" etc. At the moment only "ids_validator" is available. | -| file_validation | extra_rule_dirs | yes (ids_validator only) | Paths to directory containing additional rulesets used by ids_validator i.e. "./path/to/ruleset_dir_1,/path_to_ruleset_dir_2,etc." | -| file_validation | rulesets | yes (ids_validator only) | Name of rulesets [generic, extra_ruleset] (directory containing python scripts) used for IDS validation i.e. "my_custom_ruleset,summary_time,etc." | -| file_validation | bundled_ruleset | yes (ids_validator only) | Flag [True, False] to load the rulesets bundled with ids_validator. Defaults to True | -| file_validation | apply_generic | yes (ids_validator only) | Flag [True, False] to apply generic rulesets. Defaults to True | -| file_validation | rule_filter_name | yes (ids_validator only) | Only rulesets containing specified names will be applied, i.e. "summary_test_1,core_profiles_test_1,etc." | -| file_validation | rule_filter_ids | yes (ids_validator only) | Only rulesets concerning specified IDS will be applied, i.e. "summary,equilibrium,etc." 
| - -### Authentication options - -| Section | Option | Required | Description | -|----------------|----------------|----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| authentication | type | yes | Name of the authentication method used by the server to authenticate users - current options are [ActiveDirectory, LDAP, None]. See sections below for details of extra options required for the Active Directory and LDAP authentication options. | -| authentication | firewall_auth | no | Flag [True, False] to specify that the server is being run behind a firewall and that the authentication should be read from the firewall headers. | -| authentication | firewall_user | no | Name of the firewall header to use for the user name. Required if firewall_auth is True. | -| authentication | firewall_email | no | Name of the firewall header to use for the user's email. Required if firewall_auth is True. | - -### Activate Directory authentication options - -| Section | Option | Required | Description | -|----------------|-----------|----------|---------------------------------------------------------------| -| authentication | ad_server | yes | Active directory server used for user authentication. | -| authentication | ad_domain | yes | Active directory domain used for user authentication. | -| authentication | ad_cert | no | Path to the root ca certificate used for user authentication. | - -### LDAP authentication options - -| Section | Option | Required | Description | -|----------------|---------------------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| authentication | ldap_server | yes | LDAP server URI. 
| -| authentication | ldap_bind | yes | Bind string - this can contain {username} which will be replaced by the username of the user attempting to authenticate, i.e. "uid={username},ou=Users,dc=eufus,dc=eu". | -| authentication | ldap_query_user | no | Bind user used to run LDAP queries, i.e. "uid=f2bind,ou=Users,dc=eufus,dc=eu" - if not provided the queries are run as the authenticated user. | -| authentication | ldap_query_password | no | Password corresponding to ldap_query_user. Only required if ldap_query_user is specified. | -| authentication | ldap_query_base | yes | Base point to start the query from, i.e. "dc=eufus,dc=eu". | -| authentication | ldap_query_filter | yes | Query filter used to find the user - this can contain {username} which will be replaced by the username of the user attempting to authenticate, i.e. "(uid={username})". | -| authentication | ldap_query_uid | no | Name of the user parameter in the LDAP search query - defaults to 'uid'. | -| authentication | ldap_query_mail | no | Name of the email parameter in the LDAP search query - defaults to 'mail'. | - -### Caching options - -| Section | Option | Required | Description | -|---------|-----------------|----------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| cache | type | no | Type of caching to use. Options include NullCache (default), SimpleCache, FileSystemCache. SimpleCache is a memory based cache and FileSystemCache caches using files. Configuration options for these are given below. | -| cache | dir | no | Directory to store cache. Used only for FileSystemCache. | -| cache | default_timeout | no | The default timeout that is used if no timeout is specified. Unit of time is seconds. | -| cache | threshold | no | The maximum number of items the cache will store before it starts deleting some. 
Used only for SimpleCache and FileSystemCache | - -More caching options can be found in the [Flask-Caching documentation](https://flask-caching.readthedocs.io/en/latest/#built-in-cache-backends). You can convert the caching options for the library to SimDB configuration by removing the `CACHE_` prefix and converting to lowercase, i.e. `CACHE_ARGS` becomes `args` in the `[cache]` section. - -### Role options - -| Section | Option | Required | Description | -|---------|--------|----------|----------------------------------------------------------------| -| role | users | yes | A comma-separated list of the users assigned to this role. | - -Each role must be given a name in the section header, and whilst defining any roles is optional each `role` section must -have a `users` option. - -For example: - -```yaml -[role "admin"] -users = admin,user1,user2 -``` - -Currently only the `admin` role is used in SimDB (this is the set of users able to run CLI commands in the `admin` -command subgroup). - -### Example configuration files - -Example of app.cfg for SQLite: - -``` -[flask] -flask_env = development -debug = True -testing = True -secret_key = CHANGE_ME - -[server] -upload_folder = /tmp/simdb/simulations -ssl_enabled = False -admin_password = admin - -[database] -type = sqlite - -[validation] -auto_validate = True -error_on_fail = True - -[email] -server = smtp.email.com -port = 465 -user = test@email.com -password = abc123 - -[development] -disable_checksum = True -``` - -Example of app.cfg for PostgreSQL (see [Setting up PostgreSQL database](setting_up_postgres.md)): - -``` -... - -[database] -type = postgres -host = localhost -port = 5432 - -DB_TYPE = "postgres" -DB_HOST = "localhost" -DB_PORT = 5432 -UPLOAD_FOLDER = "/tmp/simdb/simulations" -DEBUG = False -SSL_ENABLED = True - -... 
-``` -Now create a validation schema in the application configuration directory, which can be located by using: - -``` -dirname "$(simdb config path)" -``` -In this directory, you should create a file `validation-schema.yaml` specifying the validation schema. -Example of validation-schema.yaml: - -``` -description: - required: true - type: string -``` - -Once the server configuration has been created you should be able to run - -``` -simdb_server -``` -And see some console output such as: - -``` - * Serving Flask app "simdb.remote.app" (lazy loading) - * Environment: production - WARNING: This is a development server. Do not use it in a production deployment. - Use a production WSGI server instead. - * Debug mode: on - * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit) -``` - -**Note:** If it fails to run with an error stating that it cannot bind to a port then you will need to check whatever service is running on port 5000 and shut this down if possible. If you need to modify the port you will need to edit the `simdb_server` script (which you can locate using `which simdb_server`), changing the port number. - -Follow the URL in the output (you can do this using a browser or using curl, e.g. `curl http://0.0.0.0:5000`), and you should see the returned JSON data: - -``` -{ urls: [ "http://0.0.0.0:5000/api/v0.1.1" ] } -``` - -This runs Flask's internal web server and should only be used for development or testing. For production the server should be run behind a dedicated web server and load balancer; see below for details of how to do this using Gunicorn and Nginx. - -## Using SSL - -If you want to run using SSL encryption you will need to provide a server certificate and private key in the application configuration directory.
- -One way to generate these is with the openssl command: - -``` -openssl req -x509 -out server.crt -keyout server.key \ --newkey rsa:2048 -nodes -sha256 \ - -subj '/CN=localhost' -extensions EXT -config <( \ -printf "[dn]\nCN=localhost\n[req]\ndistinguished_name = dn\n[EXT]\nsubjectAltName=DNS:localhost\nkeyUsage=digitalSignature\nextendedKeyUsage=serverAuth") -``` - -However, you will want to use a valid signing authority in production. - -## Running the server behind nginx & gunicorn - -To run the server in production you should run it as a WSGI service behind a dedicated web server. To run using nginx (as a load-balancer/proxy) and gunicorn (as the web server) we need to set up the services as follows. - -**Note:** The instructions below assume you already have nginx and gunicorn installed. - -### Set up gunicorn service - -Copy the init.d script from `src/simdb/remote/scripts/simdb.initd` in the simdb install directory (i.e. `/usr/local/lib/python3.7/site-packages/simdb/remote`) as `/etc/init.d/simdb`. - -You will need to modify the line `USER=simdb` to change the user to whichever user you wish to run simdb as (the gunicorn service will run as root but the workers will run in user space). You will also need to modify the line `DAEMON=/home/simdb/venv/bin/gunicorn` to change the path to point towards the gunicorn installed in your virtual environment - you can find this path by running `which gunicorn` whilst the virtual environment is active.
- -Once you have copied and modified the init.d script you can start the gunicorn service using: - -``` -service simdb start -``` - -And check that it is running using: - -``` -service simdb status -``` - -### Set up nginx service - -Create a simdb.conf script in `/etc/nginx/conf.d/simdb.conf` - -``` -server { - listen 80; - server_name localhost; # or the address of the server you are running - - location / { - include proxy_params; - proxy_pass http://unix:/var/run/simdb.sock; - } -} -``` - -Alternatively, copy the script provided as `simdb/remote/simdb.nginx` (in the simdb installation directory, i.e. `/usr/local/lib/python3.7/site-packages/simdb/remote`) to: - -``` -/etc/nginx/conf.d/simdb.conf -``` - -The `proxy_pass` line should point to the endpoint of the gunicorn service (set by the `BIND` variable in the init.d script). - -**Note:** If you do not have a proxy_params file in `/etc/nginx` you can create one containing the following: - -``` -proxy_set_header Host $http_host; -proxy_set_header X-Real-IP $remote_addr; -proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; -proxy_set_header X-Forwarded-Proto $scheme; -``` - -**Note:** check that the line `include /etc/nginx/conf.d/*.conf;` is defined in your `/etc/nginx/nginx.conf` script, if not you can add it inside the `http {}` section. - -Now you can restart nginx using: - -``` -service nginx restart -``` - -You should now be able to check the simdb server is running by going to the http address defined in your nginx site (localhost:80 in the example above). - -#### Nginx Request Entity Size - -You may need to increase the size of uploaded files that Nginx will accept. For SimDB this should be at least 100MB. - -You can set this by changing the following option in your `/etc/nginx/nginx.conf` file: - -``` -client_max_body_size 100m; -``` - -### Using SSL with the Gunicorn/Nginx - -In production, you should be using HTTPS not HTTP for the SimDB server. 
To do this with Nginx you can change the simdb.conf in the `/etc/nginx/sites-available` that you created in the previous section. - -Change this to be: - -``` -server { - listen 443 ssl; - server_name localhost; # or the address of the server you are running - - # Use only TLS - ssl_protocols TLSv1.1 TLSv1.2; - - # Tell client which ciphers are available - ssl_prefer_server_ciphers on; - ssl_ciphers ECDH+AESGCM:ECDH+AES256:ECDH+AES128:DH+3DES:!ADH:!AECDH:!MD5; - - # Certificates - ssl_certificate /etc/pki/nginx/server.crt; - ssl_certificate_key /etc/pki/nginx/private/server.key; - - location / { - include proxy_params; - proxy_pass http://unix:/var/run/simdb.sock; - } -} - -server { - # Redirect HTTP traffic to HTTPS - if ($host = localhost) { # or the address of the server you are running - return 301 https://$host$request_uri; - } - - server_name localhost; # or the address of the server you are running - listen 80; - - return 404; -} -``` - -The `ssl_certificate` and `ssl_certificate_key` should be set to point to the SSL certificate and key that you have generated using a valid signing authority for the server. - -## Setting up PostgreSQL database - -For the production server you should be using a production DBMS. To use PostgreSQL as the DBMS you can use the following instructions. - -First, install PostgreSQL: - -```bash -sudo yum -y install postgresql-server postgresql-contrib -``` - -Next, initialise the database: - -```bash -postgresql-setup initdb -``` - -You then need to connect to the database as the `postgres` user. You can do this using: - -```bash -sudo -u postgres psql -``` - -And run the following: - -```sql -CREATE DATABASE simdb; -CREATE ROLE simdb; -ALTER DATABASE simdb OWNER TO simdb; -ALTER ROLE "simdb" WITH LOGIN; -``` - -This is assuming your webserver is running as user `simdb`. If not, you should change the role name above to match whichever user you are running the server under. 
diff --git a/docs/reference/configuration.md b/docs/reference/configuration.md new file mode 100644 index 00000000..c4cfa3a5 --- /dev/null +++ b/docs/reference/configuration.md @@ -0,0 +1,117 @@ +# Client configuration + +The SimDB command line client is configured through an INI-style file named +`simdb.cfg`. This page documents where that file lives, how settings are +resolved, and which options it understands. For the server's `app.cfg`, see +[Server configuration](server-configuration.md). + +## Where the configuration lives + +SimDB reads up to two `simdb.cfg` files, plus environment variables: + +| Source | Typical location | Notes | +| --- | --- | --- | +| Site config | system config directory for `simdb` | Optional, shared by all users on the machine. | +| User config | per-user config directory for `simdb` | Created automatically on first run. | +| Environment | `SIMDB_*` variables | See [Environment variables](#environment-variables). | + +To print the exact path of your user config file: + +```bash +simdb config path +``` + +On Linux this is usually `~/.config/simdb/simdb.cfg`; on macOS it is +`~/Library/Application Support/simdb/simdb.cfg`. + +```{important} +On Linux and macOS the user config file **must** have `0600` permissions (read +and write for the owner only). SimDB refuses to start if the permissions are +wrong, because the file can contain authentication tokens. The CLI sets these +permissions for you when it writes the file; if you edit it by hand, run +`chmod 600 "$(simdb config path)"` afterwards. +``` + +## How settings are resolved + +Settings are applied in order, with later sources overriding earlier ones: + +1. Environment variables (`SIMDB_*`). +2. The site config file. +3. The user config file. + +Passing `-c/--config-file FILE` to `simdb` reads `FILE` instead of the site and +user files (environment variables still apply). 
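The "later sources override earlier ones" rule can be illustrated with Python's standard `configparser`. This is only a sketch under stated assumptions, not SimDB's actual loader: the file contents and the `layered_config` helper are hypothetical, and only the site and user files are modelled (environment variables are left out).

```python
import configparser

# Hypothetical file contents, for illustration only.
SITE_CFG = """
[user]
email = site-default@example.org

[remote "iter"]
url = https://simdb.example.org/
"""

USER_CFG = """
[user]
email = me@example.org
"""


def layered_config(*sources: str) -> configparser.ConfigParser:
    """Read INI sources in order; values from later sources override earlier ones."""
    parser = configparser.ConfigParser()
    for text in sources:
        parser.read_string(text)
    return parser


config = layered_config(SITE_CFG, USER_CFG)
print(config["user"]["email"])         # set in both files: the user file wins
print(config['remote "iter"']["url"])  # only set in the site file, so it is kept
```

Reading the site text first and the user text second mirrors the order given in the list above, so a per-user value replaces a machine-wide default while untouched site options remain visible.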
+ +## Managing configuration from the CLI + +It is recommended to manage the file through the CLI rather than editing it by +hand, so it always stays valid: + +```bash +simdb config list # show all options +simdb config get user.email # read one option +simdb config set user.email me@example.org +simdb config delete user.email +``` + +Option names use dotted notation. A name like `remote.iter.url` maps to the +`url` option of the `[remote "iter"]` section. Tokens are masked in +`simdb config list` output. + +Remotes have dedicated commands; prefer those over `config set` for remote +options. See [Configure remotes](../how-to/configure-remotes.md). + +## Options + +### `[user]` + +| Option | Description | +| --- | --- | +| `name` | Your username, used as the default when authenticating to remotes. | +| `email` | Your email address, used when registering as a watcher. | + +### `[remote "NAME"]` + +One section per configured remote server. Manage these with +`simdb remote config` (see [Configure remotes](../how-to/configure-remotes.md)). + +| Option | Description | +| --- | --- | +| `url` | Base URL of the remote SimDB API. Required. | +| `default` | `True` on the remote used when no remote name is given. | +| `username` | Username to authenticate with for this remote. | +| `token` | Authentication token, set by `simdb remote token new`. Masked in listings. | +| `firewall` | Firewall login type in front of the server, for example `F5`. | + +### `[db]` + +| Option | Description | +| --- | --- | +| `file` | Path to the local SQLite catalogue. Defaults to `sim.db` in the user data directory (for example `~/.local/share/simdb/sim.db`). | + +### `[development]` + +| Option | Description | +| --- | --- | +| `disable_checksum` | `True` to skip checksum calculation. For testing only; never set this for real data. | + +## Environment variables + +Any configuration option can be set with an environment variable. 
Take the +dotted option name, replace dots with underscores, prefix with `SIMDB_`, and +uppercase it: + +| Option | Environment variable | +| --- | --- | +| `remote.iter.url` | `SIMDB_REMOTE_ITER_URL` | +| `user.email` | `SIMDB_USER_EMAIL` | + +SimDB also recognises these special variables: + +| Variable | Effect | +| --- | --- | +| `SIMDB_CONFIG_FILE` | Path to a config file to load (same as `-c`). | +| `SIMDB_USER_CONFIG_PATH` | Override the location of the user config file. | +| `SIMDB_SITE_CONFIG_PATH` | Override the location of the site config file. | +| `SIMDB_REQUESTS_CA_BUNDLE` | Path to a CA certificate bundle for verifying HTTPS connections to remotes. See [Connect to ITER](../how-to/connect-to-iter.md). | diff --git a/docs/reference/manifest-format.md b/docs/reference/manifest-format.md new file mode 100644 index 00000000..46da7e64 --- /dev/null +++ b/docs/reference/manifest-format.md @@ -0,0 +1,122 @@ +# Manifest format + +A **manifest** is a YAML file that describes one simulation: a human-readable +alias, the input and output data it is associated with, and free-form metadata. +You ingest a manifest with `simdb simulation ingest` to add the simulation to +your local catalogue. + +For a step-by-step guide to writing one, see +[Create a manifest](../how-to/create-a-manifest.md). To generate a starter file, +run `simdb manifest create manifest.yaml`, and to check a file run +`simdb manifest check manifest.yaml`. + +## Example + +```yaml +manifest_version: 2 +alias: iter-baseline-scenario-2024 +inputs: + - uri: file:///work/sims/run42/input/parameters.txt + - uri: imas:hdf5?path=/work/imas/input_data +outputs: + - uri: file:///work/sims/run42/results/output.nc + - uri: imas:mdsplus?path=/work/imas/simulation_output +metadata: + - machine: ITER + - code: + name: JETTO + version: "2024.1" + - description: |- + Baseline H-mode scenario simulation for ITER. + 15 MA plasma current with a Q=10 target. 
+responsible_name: j.smith +``` + +## Top-level sections + +Only the sections listed below are allowed. Any other top-level key is a +validation error. + +| Section | Required | Description | +| --- | --- | --- | +| `manifest_version` | Yes | Manifest format version. Must be the integer `2`. If omitted, version 2 is assumed (with a warning). | +| `inputs` | Yes | List of input data objects (see [Inputs and outputs](#inputs-and-outputs)). May be empty. | +| `outputs` | Yes | List of output data objects. May be empty. | +| `alias` | No | Human-readable name for the simulation (see [Alias](#alias)). | +| `metadata` | No | List of metadata name/value pairs (see [Metadata](#metadata)). | +| `responsible_name` | No | Name of the person responsible for the simulation. | + +```{note} +Earlier manifest versions (0 and 1) used `path`/`imas`/`uuid` keys and a +`workflow` section. SimDB only ingests version 2 manifests. Rewrite any older +manifest in the version 2 form shown above. +``` + +## Inputs and outputs + +`inputs` and `outputs` are each a list of single-entry mappings with a `uri` +key: + +```yaml +inputs: + - uri: file:///absolute/path/to/file + - uri: imas:hdf5?path=/path/to/imas/data +``` + +Rules: + +- Only the `file` and `imas` URI schemes are allowed. See + [URI schemes](uri-schemes.md) for the full syntax. +- `file` paths must be absolute. Glob patterns are expanded, so + `file:///data/run42/*.nc` adds every matching file. The variables + `$MANIFEST_DIR` (the directory containing the manifest) and `~` are expanded. +- Duplicate URIs within the same section are rejected. + +When a simulation is pushed to a server, SimDB locates the files behind these +URIs, checksums them, and transfers copies to the server. + +## Alias + +The alias is an optional, human-readable identifier. Rules: + +- It must be unique within a given SimDB instance. 
+- It must be URL-safe: it may not contain characters that change when + percent-encoded (for example spaces, `#`, `%`, `?`, `=`, `,`, `*`, `(`, `)`). +- By convention it is descriptive, for example `iter-baseline-scenario` or a + `pulse_number/run_number` form such as `100001/1`. +- Once a simulation has been pushed to a server its alias is fixed. + +You can set the alias in the manifest, or override it at ingest time with +`simdb simulation ingest -a ALIAS manifest.yaml`. Helper commands +`simdb alias search` and `simdb alias make-unique` are documented in the +[CLI reference](cli.md). + +## Metadata + +`metadata` is a list of single name/value pairs. Values may be scalars or +nested mappings: + +```yaml +metadata: + - machine: ITER + - code: + name: SOLPS-ITER + version: "3.0.8" + - description: Free-text description of the run. + - pulse: 100001 +``` + +Rules and conventions: + +- Each list item must be a single name/value pair. +- Metadata names may not contain the characters `:`, `=`, or `#`, because these + are used by the [query syntax](query-operators.md). +- `machine`, `code`, and `description` are the conventional fields and are + expected by most servers; include them for every simulation. +- Servers can require additional metadata through a validation schema. Check a + server's requirements with `simdb remote SERVER schema` and validate before + pushing with `simdb simulation validate`. See + [Validate a simulation](../how-to/validate-a-simulation.md). + +Metadata is stored alongside the simulation and is what you search on with +[`simdb simulation query` and `simdb remote query`](query-operators.md). diff --git a/docs/reference/python-api/index.md b/docs/reference/python-api/index.md new file mode 100644 index 00000000..fdd6ab5f --- /dev/null +++ b/docs/reference/python-api/index.md @@ -0,0 +1,17 @@ +# Python API + +SimDB is primarily a command line tool, but its internals are importable as the +`simdb` Python package. 
The pages below are generated automatically from the +source code docstrings with `sphinx-apidoc`. + +```{note} +This API is intended for contributors and for advanced scripting. It is **not** +a stable public interface: module layout and signatures may change between +releases. For everyday use, prefer the [CLI](../cli.md). +``` + +```{toctree} +:maxdepth: 2 + +simdb +``` diff --git a/docs/reference/query-operators.md b/docs/reference/query-operators.md new file mode 100644 index 00000000..f35289d6 --- /dev/null +++ b/docs/reference/query-operators.md @@ -0,0 +1,78 @@ +# Query operators + +`simdb simulation query` (local) and `simdb remote query` (remote) find +simulations by matching against their [metadata](manifest-format.md#metadata). +This page lists the available comparison operators. For worked examples, see +[Query simulations](../how-to/query-simulations.md). + +## Constraint syntax + +Each constraint has the form: + +``` +NAME=[modifier:]VALUE +``` + +- `NAME` is the metadata field to match. Nested fields use dotted names, for + example `code.name`. +- `modifier` is one of the operators below. If omitted, equality (`eq`) is used. +- String comparisons are case-insensitive. +- Passing several constraints matches simulations that satisfy **all** of them + (logical AND). + +```bash +simdb simulation query code.name=SOLPS-ITER +simdb simulation query pulse=gt:1000 run=0 +simdb remote iter query machine=ITER +``` + +## Operators + +These operators are available for both local and remote queries: + +| Modifier | Meaning | +| --- | --- | +| `eq` | Equal to `VALUE` (the default when no modifier is given). | +| `ne` | Not equal to `VALUE`. | +| `in` | Field contains `VALUE` (substring match). | +| `ni` | Field does not contain `VALUE`. | +| `gt` | Greater than `VALUE`. | +| `ge` | Greater than or equal to `VALUE`. | +| `lt` | Less than `VALUE`. | +| `le` | Less than or equal to `VALUE`. | +| `exist` | The field exists, regardless of its value. 
Provide no value: `NAME=exist:`. | + +### Array operators (remote only) + +When a metadata value is an array, these operators match if **any** element +satisfies the comparison. They are available on remote queries: + +| Modifier | Meaning | +| --- | --- | +| `agt` | Any element greater than `VALUE`. | +| `age` | Any element greater than or equal to `VALUE`. | +| `alt` | Any element less than `VALUE`. | +| `ale` | Any element less than or equal to `VALUE`. | + +## Examples + +```bash +# Exact match (eq is implied) +simdb simulation query responsible_name=j.smith + +# Substring match, case-insensitive +simdb simulation query workflow.name=in:test + +# Numeric comparison combined with an exact match +simdb simulation query pulse=gt:1000 run=0 + +# Simulations that have a "sequence" metadata field at all +simdb simulation query sequence=exist: + +# Remote array query: any time slice above 5.0 +simdb remote iter query time=agt:5.0 +``` + +Use `-m/--meta-data NAME` to add extra metadata columns to the output, and +`--uuid` to include the simulation UUID. See the [CLI reference](cli.md) for all +options. diff --git a/docs/reference/rest-api.md b/docs/reference/rest-api.md new file mode 100644 index 00000000..feca53be --- /dev/null +++ b/docs/reference/rest-api.md @@ -0,0 +1,55 @@ +# REST API + +A SimDB server exposes a versioned REST API over HTTPS. The CLI talks to this +API for you, so most users never call it directly. This page is for people +integrating with the server or developing against it. + +## Versions + +The API is versioned. The current version is **v1.2**; clients require a server +that is compatible with `1.2.x`. + +| Version | Status | Notes | +| --- | --- | --- | +| v1 | Legacy | Original API. No new features. | +| v1.1 | Legacy | Incremental additions over v1. | +| v1.2 | Current | Adds a staging-directory endpoint and array query operators (`agt`, `age`, `alt`, `ale`). 
| + +## Interactive documentation (Swagger) + +Each version publishes interactive Swagger UI documentation that lists every +endpoint and lets you try requests: + +- Local development server: +- ITER: + +The root URL of a server returns the list of available API URLs as JSON. + +## Endpoint groups + +Within a version, endpoints are grouped by resource: + +| Group | Purpose | +| --- | --- | +| `simulations` | Create, query, retrieve, push/pull, and trace simulations. | +| `files` | Upload and download simulation data files. | +| `metadata` | Query simulation metadata. | +| `watchers` | Manage watchers on a simulation. | +| `validation_schema` | Retrieve the server's validation schema. | +| `token` | Issue authentication tokens. | + +## Authentication + +Requests are authenticated according to the server's configured method (token, +LDAP, Active Directory, or a firewall in front of the server). Token-based +authentication issues a JWT from the token endpoint, which the CLI stores per +remote. See [Authenticate](../how-to/authenticate.md) and the server's +[authentication configuration](server-configuration.md#authentication). + +## Checking a server from the CLI + +```bash +simdb remote SERVER version # server SimDB version +simdb remote SERVER test # validate connectivity and auth +simdb remote SERVER directory # storage directory (API >= 1.2) +``` diff --git a/docs/reference/server-configuration.md b/docs/reference/server-configuration.md new file mode 100644 index 00000000..f7608098 --- /dev/null +++ b/docs/reference/server-configuration.md @@ -0,0 +1,235 @@ +# Server configuration + +A SimDB server reads its settings from an INI-style file named `app.cfg` in the +application configuration directory. Find that directory with: + +```bash +dirname "$(simdb config path)" +``` + +The file must have `0600` permissions (owner read/write only), because it +contains secrets such as the admin password and Flask secret key. 
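A deployment script could check the permission requirement before starting the server. The sketch below uses only the Python standard library and assumes a POSIX system; `has_owner_only_permissions` is a hypothetical helper, not part of SimDB, and the temporary file stands in for a real `app.cfg`.

```python
import os
import stat
import tempfile


def has_owner_only_permissions(path: str) -> bool:
    """Return True if the file mode is exactly 0600 (owner read/write only)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600


# Demonstrate on a throwaway file standing in for app.cfg.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name

os.chmod(demo_path, 0o644)  # readable by group and others: too open
too_open = has_owner_only_permissions(demo_path)

os.chmod(demo_path, 0o600)  # owner read/write only
locked_down = has_owner_only_permissions(demo_path)

os.remove(demo_path)
print(too_open, locked_down)
```

If the check fails for a real configuration file, `os.chmod(path, 0o600)` (or `chmod 600` from a shell) restores the expected mode.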
+ +This page is the reference for every `app.cfg` option. For task-oriented setup, +see the [Operating a server](../how-to/operate-server/install-server.md) guides. + +## `[database]` + +| Option | Required | Description | +| --- | --- | --- | +| `type` | Yes | Database type: `sqlite` or `postgres`. | +| `file` | If `type=sqlite` | SQLite database file. Defaults to `remote.db` in the user data directory. | +| `host` | If `type=postgres` | Database host. | +| `port` | If `type=postgres` | Database port. | +| `name` | If `type=postgres` | Database name. | + +See [Set up PostgreSQL](../how-to/operate-server/set-up-postgresql.md). + +## `[server]` + +| Option | Required | Description | +| --- | --- | --- | +| `upload_folder` | Yes | Root directory where simulation files are stored. | +| `admin_password` | Yes | Password for the `admin` superuser. | +| `port` | No | Port the built-in server listens on. Defaults to 5000. | +| `ssl_enabled` | No | `True`/`False`: whether the built-in server uses SSL. Set `False` behind a dedicated web server. Defaults to `False`. | +| `ssl_cert_file` | If `ssl_enabled=True` | Path to the SSL certificate file. | +| `ssl_key_file` | If `ssl_enabled=True` | Path to the SSL key file. | +| `token_lifetime` | No | Days that generated tokens stay valid. Defaults to 30. | +| `imas_remote_host` | No | Host set on ingested IMAS URIs so data can be fetched via an IMAS remote access server. For example `imas:hdf5?path=foo` becomes `imas://:/uda?path=foo&backend=hdf5` on ingest. | +| `imas_remote_port` | No | Port set on ingested IMAS URIs. See `imas_remote_host`. | + +## `[flask]` + +| Option | Required | Description | +| --- | --- | --- | +| `secret_key` | Yes | Key used to sign server messages and authentication tokens. Use at least 20 characters. | +| `flask_env` | No | `development` or `production`. Defaults to `production`. | +| `debug` | No | `True`/`False`. Defaults to `True` when `flask_env=development`, otherwise `False`. 
| +| `testing` | No | `True`/`False`: propagate exceptions instead of handling them. Defaults to `False`. | +| `swagger_ui_doc_expansion` | No | Default Swagger UI state: `none`, `list`, or `full`. | + +## `[validation]` + +| Option | Required | Description | +| --- | --- | --- | +| `auto_validate` | No | `True`/`False`: run validation on uploaded simulations automatically. Defaults to `False`. | +| `error_on_fail` | No | `True`/`False`: reject simulations that fail validation. Requires `auto_validate=True`. Defaults to `False`. | + +## `[file_validation]` + +Options for validating the contents of simulation data files. Currently only +the `ids_validator` is available. See +[Configure validation](../how-to/operate-server/configure-validation.md). + +| Option | Required | Description | +| --- | --- | --- | +| `type` | No | Name of the file validator, for example `ids_validator`. | +| `extra_rule_dirs` | For `ids_validator` | Comma-separated directories containing extra rulesets. | +| `rulesets` | For `ids_validator` | Comma-separated ruleset names to apply. | +| `bundled_ruleset` | For `ids_validator` | `True`/`False`: load rulesets bundled with `ids_validator`. Defaults to `True`. | +| `apply_generic` | For `ids_validator` | `True`/`False`: apply generic rulesets. Defaults to `True`. | +| `rule_filter_name` | For `ids_validator` | Only apply rulesets whose names match these comma-separated values. | +| `rule_filter_ids` | For `ids_validator` | Only apply rulesets for these comma-separated IDS names. | + +## `[email]` + +Outgoing SMTP server used to send watcher notifications. + +| Option | Required | Description | +| --- | --- | --- | +| `server` | Yes | SMTP server hostname. | +| `port` | Yes | SMTP server port. | +| `user` | Yes | SMTP user to send mail from. | +| `password` | Yes | SMTP user password. 
| + +## `[authentication]` + +| Option | Required | Description | +| --- | --- | --- | +| `type` | Yes | Authentication method: `ActiveDirectory`, `LDAP`, or `None`. | +| `firewall_auth` | No | `True`/`False`: read authentication from firewall headers (server runs behind a firewall). | +| `firewall_user` | If `firewall_auth=True` | Name of the firewall header carrying the username. | +| `firewall_email` | If `firewall_auth=True` | Name of the firewall header carrying the user's email. | + +### Active Directory (`type = ActiveDirectory`) + +| Option | Required | Description | +| --- | --- | --- | +| `ad_server` | Yes | Active Directory server. | +| `ad_domain` | Yes | Active Directory domain. | +| `ad_cert` | No | Path to the root CA certificate. | + +### LDAP (`type = LDAP`) + +| Option | Required | Description | +| --- | --- | --- | +| `ldap_server` | Yes | LDAP server URI. | +| `ldap_bind` | Yes | Bind string. May contain `{username}`, for example `uid={username},ou=Users,dc=eufus,dc=eu`. | +| `ldap_query_base` | Yes | Search base, for example `dc=eufus,dc=eu`. | +| `ldap_query_filter` | Yes | Filter to find the user. May contain `{username}`, for example `(uid={username})`. | +| `ldap_query_user` | No | Bind user for queries. If omitted, queries run as the authenticated user. | +| `ldap_query_password` | No | Password for `ldap_query_user`. Required if `ldap_query_user` is set. | +| `ldap_query_uid` | No | Name of the user parameter in the search result. Defaults to `uid`. | +| `ldap_query_mail` | No | Name of the email parameter in the search result. Defaults to `mail`. | + +See [Configure authentication](../how-to/operate-server/configure-authentication.md). + +## `[cache]` + +| Option | Required | Description | +| --- | --- | --- | +| `type` | No | `NullCache` (default), `SimpleCache`, or `FileSystemCache`. | +| `dir` | No | Directory for `FileSystemCache`. | +| `default_timeout` | No | Default cache timeout in seconds. 
| +| `threshold` | No | Maximum number of items before eviction (`SimpleCache`/`FileSystemCache`). | + +More options are available; take any setting from the +[Flask-Caching documentation](https://flask-caching.readthedocs.io/en/latest/#built-in-cache-backends), +drop the `CACHE_` prefix and lowercase it, for example `CACHE_ARGS` becomes +`args`. + +## `[development]` + +| Option | Required | Description | +| --- | --- | --- | +| `disable_checksum` | No | `True`/`False`: skip integrity checks. For testing only. Defaults to `False`. | + +## `[celery]` + +Used by the optional background workers in the +[Docker Compose deployment](../how-to/operate-server/run-with-docker.md). + +| Option | Required | Description | +| --- | --- | --- | +| `broker_url` | For workers | Message broker URL, for example `redis://redis:6379/0`. | +| `result_backend` | For workers | Result backend URL, for example `redis://redis:6379/0`. | + +## `[partition]` + +| Option | Required | Description | +| --- | --- | --- | +| `data` | No | Directory used for partitioned data, for example `/data/simdb/partition`. | + +## `[role "NAME"]` + +Defines a named role. Each role section needs a `users` option. + +| Option | Required | Description | +| --- | --- | --- | +| `users` | Yes | Comma-separated list of usernames in this role. | + +Currently only the `admin` role is used: it grants access to the +`simdb remote admin` subcommands. 
+ +```ini +[role "admin"] +users = admin,user1,user2 +``` + +## Example: SQLite server + +```ini +[flask] +flask_env = development +debug = True +testing = True +secret_key = CHANGE_ME_TO_A_LONG_RANDOM_STRING + +[server] +upload_folder = /tmp/simdb/simulations +ssl_enabled = False +admin_password = admin + +[database] +type = sqlite + +[validation] +auto_validate = True +error_on_fail = True + +[email] +server = smtp.example.org +port = 465 +user = simdb@example.org +password = CHANGE_ME + +[authentication] +type = None +``` + +## Example: PostgreSQL server + +```ini +[server] +upload_folder = /var/lib/simdb/simulations +ssl_enabled = False +admin_password = CHANGE_ME + +[flask] +secret_key = CHANGE_ME_TO_A_LONG_RANDOM_STRING + +[database] +type = postgres +host = localhost +port = 5432 +name = simdb + +[authentication] +type = None +``` + +## Validation schema + +Servers can require specific metadata through a `validation-schema.yaml` file in +the same configuration directory as `app.cfg`. It uses +[Cerberus](https://docs.python-cerberus.org/) rules: + +```yaml +description: + required: true + type: string +``` + +Clients can inspect the active schema with `simdb remote SERVER schema`. See +[Validation](../explanation/validation.md). diff --git a/docs/reference/uri-schemes.md b/docs/reference/uri-schemes.md new file mode 100644 index 00000000..a8ab27d1 --- /dev/null +++ b/docs/reference/uri-schemes.md @@ -0,0 +1,102 @@ +# URI schemes + +The `inputs` and `outputs` in a [manifest](manifest-format.md) reference data +through URIs. SimDB understands two schemes: `file` for ordinary files and +`imas` for IMAS data entries. + +## `file` scheme + +For ordinary files on the machine where you run the CLI: + +``` +file:/// +``` + +Notes: + +- The path must be absolute (note the three slashes: `file://` + `/path`). +- Glob patterns are expanded, so `file:///data/run42/*.nc` references every + matching file. 
+- `$MANIFEST_DIR` (the directory containing the manifest) and `~` are expanded + before the path is resolved. + +Examples: + +```text +file:///work/sims/run42/input/parameters.txt +file:///work/sims/run42/results/*.nc +``` + +## `imas` scheme + +IMAS URIs locate an IMAS data entry. They come in two forms: local and remote. + +### Local IMAS data + +For an IMAS data entry accessible from the machine running the CLI: + +``` +imas:?path= +``` + +| Part | Description | +| --- | --- | +| `backend` | Backend used to open the data, for example `hdf5` or `mdsplus`. | +| `path` | Path to the folder containing the IMAS data files. | + +Alternatively, an entry can be addressed by shot, run, and database instead of a +path: `imas:?shot=&run=&database=`. + +Examples: + +```text +imas:mdsplus?path=/work/imas/shared/imasdb/iter/3/135011/2 +imas:hdf5?path=/work/imas/shared/imasdb/ITER_SCENARIOS/3/131002/60 +``` + +When a local IMAS URI is pushed to a server, SimDB rewrites it as a remote data +URI so the data can be reached from machines other than yours. The server's +`imas_remote_host`/`imas_remote_port` settings control this rewrite (see +[Server configuration](server-configuration.md)). + +### Remote IMAS data + +For an IMAS data entry hosted on a remote data server (for example a UDA +server): + +``` +imas://:/uda?path=&backend= +``` + +| Part | Description | +| --- | --- | +| `server` | Hostname of the remote data server, for example `uda.iter.org`. | +| `port` | Port to connect to. Optional; a default is used if omitted. | +| `path` | Path to the data on the remote server. | +| `backend` | Backend used to open the data. | + +Examples: + +```text +imas://uda.iter.org:56565/uda?path=/work/imas/shared/imasdb/ITER/3/131024/51&backend=hdf5 +imas://uda.iter.org/uda?path=/work/imas/shared/imasdb/ITER/3/131024/51&backend=hdf5 +``` + +```{note} +Make sure the chosen port is reachable through your network firewall. 
If a +remote IMAS URI cannot be reached, contact your system administrator. See also +[Troubleshooting](../troubleshooting.md). +``` + +```{note} +SimDB uses [imas-python](https://pypi.org/project/imas-python/) to read IMAS +data. MDSplus data must have been written with Access Layer 5 (AL5) or later; +Access Layer 4 (AL4) data must be migrated first. See +[Migrate AL4 MDSplus data](../how-to/migrate-al4-mdsplus.md). +``` + +## `simdb` scheme (internal) + +SimDB also uses a `simdb:` scheme internally to reference a simulation +already registered in the catalogue. This scheme is not used in manifests; you +do not need to write it by hand. diff --git a/docs/setting_up_postgres.md b/docs/setting_up_postgres.md deleted file mode 100644 index 2475e5de..00000000 --- a/docs/setting_up_postgres.md +++ /dev/null @@ -1,26 +0,0 @@ -# Setting up a PostgreSQL database - -This section will give some guidance to setting up a PostgreSQL server for SimDB. If -PostgreSQL is already set up and running on the machine this section can be skipped -and the connection details set in the SimDB configuration file. This is not intended -to be an exhaustive guide to PostgreSQL (more details can be found on the [PostgreSQL website](https://www.postgresql.org/)). - -## Installing PostgreSQL - -PostgreSQL should be installed from an available system package. This should install -the database and create the default data directory (/var/lib/pgsql/data on CentOS -Linux). The PostgreSQL service should then be started and enabled on the system -(`system postgres start` and `system postgres enable` on Linux). - -## Connecting to PostgreSQL - -The SimDB server will need to be able to connect to the database. You can test this connection using: - -```python -import psycopg2 -psycopg2.connect("postgresql://simdb@:") -``` - -replacing `` with the PostgreSQL hostname and `` with the PostgreSQL port, e.g. `"postgresql://simdb@localhost:5432"`. 
- -If you have issue connecting to a localhost database then you may need to check your `pg_hba.conf` in the PostgreSQL data folder and check the connection setting is set to `trust` rather than `ident`. \ No newline at end of file diff --git a/docs/sphinx/conf.py b/docs/sphinx/conf.py deleted file mode 100644 index bb2b19f8..00000000 --- a/docs/sphinx/conf.py +++ /dev/null @@ -1,234 +0,0 @@ -# -*- coding: utf-8 -*- -# -# Configuration file for the Sphinx documentation builder. -# -# This file does only contain a selection of the most common options. For a -# full list see the documentation: -# http://www.sphinx-doc.org/en/master/config - -# -- Path setup -------------------------------------------------------------- - -# If extensions (or modules to document with autodoc) are in another directory, -# add these directories to sys.path here. If the directory is relative to the -# documentation root, use os.path.abspath to make it absolute, like shown here. -# -import os -import sys - -sys.path.insert(0, os.path.abspath("../../src")) -import simdb - - -# -- Project information ----------------------------------------------------- - -project = "IMAS Simulation Database Management Tool" -copyright = "2018-2025, ITER Organization" -author = "J. Hollocombe, D. Muir" - -# The short X.Y version -version = ".".join(simdb.version.split(".")[:2]) -project += f" Version {version}" -# The full version, including alpha/beta/rc tags -release = simdb.__version__ - - -# -- General configuration --------------------------------------------------- - -# If your documentation needs a minimal Sphinx version, state it here. -# -# needs_sphinx = '1.0' - -# Add any Sphinx extension module names here, as strings. They can be -# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom -# ones. 
-extensions = [ - "sphinx.ext.autodoc", - "sphinx.ext.doctest", - "sphinx.ext.coverage", - "sphinx.ext.mathjax", - "sphinx.ext.viewcode", - "myst_parser", - "sphinx_immaterial", # Sphinx immaterial theme -] -# Add any paths that contain templates here, relative to this directory. -templates_path = ["_templates"] - -# The suffix(es) of source filenames. -# You can specify multiple suffix as a list of string: -# -source_suffix = { - ".rst": "restructuredtext", - ".md": "markdown", -} - -# source_parsers = { -# '.md': 'recommonmark.parser.CommonMarkParser', -# } - -# The master toctree document. -master_doc = "index" - -# The language for content autogenerated by Sphinx. Refer to documentation -# for a list of supported languages. -# -# This is also used if you do content translation via gettext catalogs. -# Usually you set "language" from the command line for these cases. -language = 'en' - -# List of patterns, relative to source directory, that match files and -# directories to ignore when looking for source files. -# This pattern also affects html_static_path and html_extra_path. -exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"] - -# The name of the Pygments (syntax highlighting) style to use. -pygments_style = None - - -# -- Options for HTML output ------------------------------------------------- - -# The theme to use for HTML and HTML Help pages. See the documentation for -# a list of builtin themes. -# -# html_theme = 'sphinx_rtd_theme' -html_theme = "sphinx_immaterial" - -# Theme options are theme-specific and customize the look and feel of a theme -# further. For a list of options available for each theme, see the -# documentation. 
-# -html_theme_options = { - "palette": [ - { - "media": "(prefers-color-scheme: light)", - "scheme": "default", - "primary": "blue", - "accent": "light-blue", - "toggle": { - "icon": "material/lightbulb-outline", - "name": "Switch to dark mode", - }, - }, - { - "media": "(prefers-color-scheme: dark)", - "scheme": "slate", - "primary": "blue", - "accent": "light-blue", - "toggle": { - "icon": "material/lightbulb", - "name": "Switch to light mode", - }, - }, - ], - "features": [ - "navigation.expand", - "navigation.tabs", - "navigation.sections", - "navigation.top", - "search.share", - "toc.follow", - "toc.sticky", - ], - "repo_url": "https://github.com/iterorganization/SimDB", - "repo_name": "SimDB", -} - -# Add any paths that contain custom static files (such as style sheets) here, -# relative to this directory. They are copied after the builtin static files, -# so a file named "default.css" will overwrite the builtin "default.css". -html_static_path = [] - -# Custom sidebar templates, must be a dictionary that maps document names -# to template names. -# to template names. -# -# The default sidebars (for documents that don't match any pattern) are -# defined by theme itself. Builtin themes are using these templates by -# default: ``['localtoc.html', 'relations.html', 'sourcelink.html', -# 'searchbox.html']``. -# -# html_sidebars = {} - - -# -- Options for HTMLHelp output --------------------------------------------- - -# Output file base name for HTML help builder. -htmlhelp_basename = "simdb" - - -# -- Options for LaTeX output ------------------------------------------------ - -latex_elements = { - # The paper size ('letterpaper' or 'a4paper'). - # - # 'papersize': 'letterpaper', - # The font size ('10pt', '11pt' or '12pt'). - # - # 'pointsize': '10pt', - # Additional stuff for the LaTeX preamble. - # - # 'preamble': '', - # Latex figure (float) alignment - # - # 'figure_align': 'htbp', -} - -# Grouping the document tree into LaTeX files. 
List of tuples -# (source start file, target name, title, -# author, documentclass [howto, manual, or own class]). -latex_documents = [ - ( - master_doc, - "simdb.tex", - "IMAS Simulation Database Documentation", - "J. Hollocombe, D. Muir", - "manual", - ), -] - - -# -- Options for manual page output ------------------------------------------ - -# One entry per manual page. List of tuples -# (source start file, name, description, authors, manual section). -man_pages = [ - (master_doc, "simdb", "IMAS Simulation Database Documentation", [author], 1) -] - - -# -- Options for Texinfo output ---------------------------------------------- - -# Grouping the document tree into Texinfo files. List of tuples -# (source start file, target name, title, author, -# dir menu entry, description, category) -texinfo_documents = [ - ( - master_doc, - "simdb", - "IMAS Simulation Database Documentation", - author, - "simdb", - "One line description of project.", - "Miscellaneous", - ), -] - - -# -- Options for Epub output ------------------------------------------------- - -# Bibliographic Dublin Core info. -epub_title = project - -# The unique identifier of the text. This can be a ISBN number -# or the project homepage. -# -# epub_identifier = '' - -# A unique identification for the text. -# -# epub_uid = '' - -# A list of files that should not be packed into the epub file. -epub_exclude_files = ["search.html"] - - -# -- Extension configuration ------------------------------------------------- diff --git a/docs/sphinx/index.rst b/docs/sphinx/index.rst deleted file mode 100644 index 4cecaeca..00000000 --- a/docs/sphinx/index.rst +++ /dev/null @@ -1,30 +0,0 @@ -IMAS Simulation Database Management Tool Version |version| -========================================== - -SimDB is the IMAS simulation database management tool designed to track, manage and validate simulations and allow for -these simulations to be sent for remote archiving and verification. 
- -A user guide can be found at `here `_. - -Contents --------- - -.. toctree:: - :maxdepth: 2 - :caption: User Documentation - - install_guide - tutorial - user_guide - cli - iter_remotes - .. iter_certificate - -.. toctree:: - :maxdepth: 2 - :caption: Developer Documentation - - developer_guide - maintenance_guide - simdb - design \ No newline at end of file diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md new file mode 100644 index 00000000..fc9c0b5c --- /dev/null +++ b/docs/troubleshooting.md @@ -0,0 +1,81 @@ +# Troubleshooting + +Common problems and how to resolve them. If your issue is not here, see +[Getting help](#getting-help). + +## `command not found: simdb` + +The install location's `bin` directory is not on your `PATH`. + +- Activate the virtual environment you installed into + (`source venv/bin/activate`), or +- add the pip script directory to your `PATH`. + +See [Installation](getting-started/installation.md). + +## Configuration file permission errors + +On Linux and macOS the user config file must be readable only by you. If SimDB +complains about incorrect permissions: + +```bash +chmod 600 "$(simdb config path)" +``` + +See [Client configuration](reference/configuration.md). + +## Cannot connect to a remote / SSL certificate errors + +- Check the remote URL: `simdb remote config list`, then `simdb remote test`. +- On an ITER HPC node, install the ITER CA bundle and set + `SIMDB_REQUESTS_CA_BUNDLE`. See [Connect to ITER](how-to/connect-to-iter.md). +- For remote IMAS data, make sure the data server's port is reachable through + your firewall. Contact your system administrator if not. + +## Authentication keeps prompting + +If the server supports tokens, create one so you are not asked every time: + +```bash +simdb remote token new +``` + +Servers behind an F5 firewall (including `simdb.iter.org`) do not support +tokens; you authenticate at the firewall each session. See +[Authenticate](how-to/authenticate.md). 
+ +## Validation fails + +The two usual causes: + +- **A data source is missing or its checksum changed** since ingestion. Restore + the file or re-ingest from an updated manifest. +- **Required metadata is missing.** Check the server's requirements with + `simdb remote schema -d 10` and add the fields. + +See [Validate a simulation](how-to/validate-a-simulation.md). + +## Cannot read MDSplus (AL4) data + +SimDB reads Access Layer 5 (AL5) data or later. Migrate AL4 MDSplus data first. +See [Migrate AL4 MDSplus data](how-to/migrate-al4-mdsplus.md). + +## Server will not start: port already in use + +Another service is using port 5000. Stop it, or change the port. For the +built-in server set `[server] port` in `app.cfg`; behind Nginx/Gunicorn change +the bind address. See +[Run a development server](how-to/operate-server/run-dev-server.md). + +## PostgreSQL connection refused + +For a local database, check `pg_hba.conf` in the PostgreSQL data directory and +ensure the connection method is `trust` (or another you can authenticate with) +rather than `ident`. See +[Set up PostgreSQL](how-to/operate-server/set-up-postgresql.md). + +## Getting help + +- Open an issue on [GitHub](https://github.com/iterorganization/SimDB/issues). +- Browse the [reference](reference/cli.md) and + [explanation](explanation/concepts.md) sections. diff --git a/docs/tutorial.md b/docs/tutorial.md deleted file mode 100644 index 5de45e84..00000000 --- a/docs/tutorial.md +++ /dev/null @@ -1,242 +0,0 @@ -# SimDB CLI Tutorial - -This tuturial covers the basics of how to use the SimDB CLI to catalogue a simulation and interacting with remote -simulation databases. - -## Checking the CLI - -The first thing to do is check that the SimDB CLI is available. You can do this by running: - -```bash -simdb --version -``` - -This should return something similar to: - -``` -simdb, version x.y.z -``` - -This indicates the CLI is available and shows what version has been installed. 
- -## CLI help - -The SimDB CLI has internal help documentation that you can run by providing the `--help` argument. This can be done at -different levels of commands and will show the help documentation for that level of command. For example `simdb --help` -shows the top level help, whereas `simdb remote --help` shows the help for the `remote` command. - -Running: - -```bash -simdb --help -``` - -Should show the following: - -``` -Usage: simdb [OPTIONS] COMMAND [ARGS]... - -Options: - --version Show the version and exit. - -d, --debug Run in debug mode. - -v, --verbose Run with verbose output. - -c, --config-file FILENAME Config file to load. - --help Show this message and exit. - -Commands: - alias Query remote and local aliases. - config Query/update application configuration. - database Manage local simulation database. - manifest Create/check manifest file. - provenance Create the PROVENANCE_FILE from the current system. - remote Interact with the remote SimDB service. - sim Alias for None. - simulation Manage ingested simulations. -``` - -## Creating a simulation manifest - -The first step in ingesting a simulation is to create the manifest file. This is a YAML document that describes your simulation and its associated data. - -### Quick Start - -You can create a new manifest file template using the command: - -```bash -simdb manifest create manifest.yaml -``` - -This generates a basic template that you can customize for your simulation. 
- -### Manifest Structure Guidelines - -**Manifest Version** - -Always use the latest manifest version to ensure compatibility: -```yaml -manifest_version: 2 -``` - -**Simulation Alias** - -Provide a unique, descriptive identifier for your simulation: -```yaml -alias: iter-baseline-scenario-2024 -``` - -Best Practices: -- Use descriptive names that indicate the simulation purpose -- Consider using a naming convention like `machine-scenario-date` -- Common patterns include: `pulse_number/run_number` (e.g., `100001/1`) -- Ensure uniqueness within your SimDB instance - -**Input and Output Files** - -Specify all data files associated with your simulation: - -```yaml -inputs: - - uri: file:///path/to/input/parameters.txt - - uri: imas:hdf5?path=/work/imas/input_data - -outputs: - - uri: file:///path/to/results/output.nc - - uri: imas:mdsplus?path=/work/imas/simulation_output -``` - -Guidelines: -- Use absolute paths for `file://` URIs -- For IMAS data, specify the correct backend (`hdf5` or `mdsplus`) -- Include all relevant input files (initial conditions, parameters, configuration) -- List all output files (results, diagnostics, visualizations) - -**Metadata Section** - -The metadata section contains descriptive information about your simulation: - -```yaml -metadata: - - machine: ITER - - code: - name: JETTO - version: "2024.1" - - description: |- - Baseline H-mode scenario simulation for ITER - 15MA plasma current with Q=10 target - - ids_properties: - creation_date: "2024-12-05 10:30:00" -``` - -Metadata Best Practices: -- **machine**: Always specify the tokamak or device name -- **code**: Include both name and version for reproducibility -- **description**: Provide context about the simulation purpose and key features -- **ids_properties**: Include creation date if not available in IDS data - -### Validating a Manifest File - -Before ingesting your manifest, it's important to validate it to ensure it's well-formed. 
SimDB provides a validation command: - -```bash -simdb manifest check manifest.yaml -``` - -This command will check your manifest file for: -- Correct YAML syntax -- Required fields (manifest_version, outputs, metadata etc.) -- Valid URI formats for inputs and outputs -- Proper metadata structure -- Alias naming rules compliance - -## Ingesting the manifest - -Now that you have a manifest file you can ingest it using the following command: - -```bash -simdb simulation ingest manifest.yaml -``` - -This will ingest the simulation into your local simulation database. You can see what has been ingested using: - -```bash -simdb simulation list -``` - -And the simulation you have just ingested with: - -```bash -simdb simulation info -``` - -## Pushing the simulation to a remote server - -The SimDB client is able to communication with multiple remote servers. You can see which remote servers are available -on your local client using: - -```bash -simdb remote config list -``` - -First, you will need to add the remote server and set it as default (name and url may differ): - -```bash -simdb remote --new iter https://simdb.iter.org/scenarios/api -simdb remote --set-default iter -``` - -You can test that the remote server is valid and also list the simulations available on it: - -```bash -simdb remote test -simdb remote list -``` - -You can now check that your simulation is valid for a given remote server, as different servers may have different -rules and required fields: - -```bash -simdb simulation validate -``` - -Typical validation issues are: -- one of the data sources (input or output) being absent or not verifying the checksum (i.e. something changed since the ingestion); -- failing to comply with the list of mandatory metadata for the targeted remote. 
- -It it possible to know which validation schema applies on a given remote: - -```bash -simdb remote schema -d 10 -``` - -If the `validate` command results in a `validation successful` message, then you can push your simulation: - -```bash -simdb simulation push -``` - -If the simulation is expected to replace another one already present in the remote server: - -```bash -simdb simulation push --replaces -``` - -The previous simulation will be marked as deprecated and contain a new `replaced_by` metadata that points to -``. It's also possible to see the chained history of older versions if they exist: - -```bash -simdb remote trace -``` - - -## Authentication - -Whenever you run a remote command you will notice that you have to authenticate against the remote server. This can be -avoided by creating an authentication token for servers that allow such a method (not applicable for simdb.iter.org -which uses F5 firewall as authentication layer): - -```bash -simdb remote token new -``` - -This will request a token from the remote server which is stored in a locally to allow you to authenticate against the -server without having to provide credentials on each command. diff --git a/docs/tutorials/first-simulation.md b/docs/tutorials/first-simulation.md new file mode 100644 index 00000000..42e38ca8 --- /dev/null +++ b/docs/tutorials/first-simulation.md @@ -0,0 +1,120 @@ +# Tutorial: catalogue your first simulation + +This tutorial walks you through cataloguing a simulation in your local SimDB, +from checking the CLI to inspecting the ingested result. It assumes SimDB is +[installed](../getting-started/installation.md). When you are done, continue with +[Push your simulation to a server](push-to-remote.md). + +## Step 1: check the CLI + +Confirm SimDB is available: + +```bash +simdb --version +``` + +You should see something like `simdb, version 0.15.2`. 
Every command has help +available with `--help`, at any level: + +```bash +simdb --help +simdb simulation --help +``` + +The top-level help lists the command groups: + +```text +Commands: + alias Query remote and local aliases. + config Query/update application configuration. + manifest Create/check manifest file. + provenance Create the PROVENANCE_FILE from the current system. + remote Interact with the remote SimDB service. + simulation Manage ingested simulations. +``` + +`sim` is an alias for `simulation`. + +## Step 2: create a manifest + +A simulation is described by a [manifest](../reference/manifest-format.md): a +YAML file listing the data the simulation used and produced, plus metadata about +it. Generate a starter template: + +```bash +simdb manifest create manifest.yaml +``` + +Open `manifest.yaml` and fill it in. A complete example: + +```yaml +manifest_version: 2 +alias: iter-baseline-scenario-2024 +inputs: + - uri: file:///work/sims/run42/input/parameters.txt + - uri: imas:hdf5?path=/work/imas/input_data +outputs: + - uri: file:///work/sims/run42/results/output.nc + - uri: imas:mdsplus?path=/work/imas/simulation_output +metadata: + - machine: ITER + - code: + name: JETTO + version: "2024.1" + - description: |- + Baseline H-mode scenario simulation for ITER. + 15 MA plasma current with a Q=10 target. +``` + +A few things to know (the [how-to](../how-to/create-a-manifest.md) and +[reference](../reference/manifest-format.md) cover them in full): + +- Always use `manifest_version: 2`. +- The `alias` is optional but recommended; it must be unique and URL-safe. +- `inputs` and `outputs` use `file` and `imas` + [URIs](../reference/uri-schemes.md). `file` paths must be absolute; glob + patterns are expanded. +- `machine`, `code`, and `description` are the conventional metadata fields. 
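
The `file` URI rule in the list above can be illustrated with a short check. This is a minimal sketch, not SimDB's own validation (that is what `simdb manifest check` runs); the helper name `check_file_uri` is hypothetical, and it covers only the "paths must be absolute" rule, using the Python standard library.

```python
from urllib.parse import urlsplit


def check_file_uri(uri: str) -> bool:
    """Return True if `uri` is a `file` URI with an absolute path.

    Hypothetical helper for illustration only; it is not part of SimDB.
    """
    parts = urlsplit(uri)
    # `file:///abs/path` parses to scheme "file", an empty host, and a
    # path starting with "/". With only two slashes, the first path
    # segment is parsed as the host instead, so the check fails.
    return (
        parts.scheme == "file"
        and parts.netloc == ""
        and parts.path.startswith("/")
    )


for uri in (
    "file:///work/sims/run42/input/parameters.txt",
    "file://work/sims/run42/input/parameters.txt",
):
    verdict = "ok" if check_file_uri(uri) else "not an absolute file URI"
    print(f"{uri} -> {verdict}")
```

The second URI is rejected because it has two slashes rather than three, the mistake the "must be absolute" rule guards against.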
+ +## Step 3: validate the manifest + +Before ingesting, check the file is well-formed: + +```bash +simdb manifest check manifest.yaml +``` + +This checks the YAML syntax, the required sections, the URI formats, the +metadata structure, and the alias rules. Fix any reported problems. + +## Step 4: ingest the simulation + +```bash +simdb simulation ingest manifest.yaml +``` + +This adds the simulation to your local catalogue, computing a checksum for each +referenced file. To override the manifest's alias at ingest time: + +```bash +simdb simulation ingest -a my-alias manifest.yaml +``` + +## Step 5: inspect what you ingested + +List your local simulations: + +```bash +simdb simulation list +``` + +And show the full detail of one, by alias or UUID: + +```bash +simdb simulation info iter-baseline-scenario-2024 +``` + +## What you have learned + +You created a manifest, validated it, ingested a simulation, and inspected it, +all locally. Next, share it: [Push your simulation to a server](push-to-remote.md). diff --git a/docs/tutorials/push-to-remote.md b/docs/tutorials/push-to-remote.md new file mode 100644 index 00000000..93ffa641 --- /dev/null +++ b/docs/tutorials/push-to-remote.md @@ -0,0 +1,96 @@ +# Tutorial: push your simulation to a server + +This tutorial continues from +[Catalogue your first simulation](first-simulation.md). You have a simulation in +your local catalogue; now you will validate it against a server and push it so +others can find it. + +```{note} +ITER users should follow [Connect to ITER](../how-to/connect-to-iter.md) for the +server URL, firewall, and certificate setup, then return here from Step 2. +``` + +## Step 1: add a remote + +A *remote* is a configured SimDB server. 
List the remotes you already have: + +```bash +simdb remote config list +``` + +Add a server and make it your default so you do not have to name it every time: + +```bash +simdb remote config new myserver https://example.org/simdb/api +simdb remote config set-default myserver +``` + +## Step 2: test the connection + +Check the remote is reachable and list what is already on it: + +```bash +simdb remote test +simdb remote list +``` + +You will be asked to authenticate. To avoid re-entering credentials on every +command, see [Authenticate](../how-to/authenticate.md). + +## Step 3: validate before pushing + +Different servers can require different metadata, so validate your simulation +against the target server first: + +```bash +simdb simulation validate iter-baseline-scenario-2024 +``` + +If validation fails, the two usual causes are: + +- a data file is missing or its checksum no longer matches (something changed + since ingestion); or +- the simulation is missing metadata the server requires. + +You can see what a server requires with: + +```bash +simdb remote schema -d 10 +``` + +Fix any issues (see [Validate a simulation](../how-to/validate-a-simulation.md)) +until you get a `validation successful` message. + +## Step 4: push + +```bash +simdb simulation push iter-baseline-scenario-2024 +``` + +This uploads the metadata and copies all referenced input and output data to the +server. For `file` URIs the files are transferred directly; for `imas` URIs +SimDB discovers the underlying files from the backend. + +### Replacing an earlier simulation + +If this run supersedes one already on the server: + +```bash +simdb simulation push iter-baseline-scenario-2024 --replaces OLD_SIM_ID +``` + +The previous simulation is marked `deprecated` and gains a `replaced_by` +reference pointing to the new one. 
Follow the chain of revisions with: + +```bash +simdb remote trace iter-baseline-scenario-2024 +``` + +## What you have learned + +You configured a remote, validated a simulation against it, and pushed it. +From here: + +- [Query simulations](../how-to/query-simulations.md) on the server. +- [Pull simulations](../how-to/push-pull.md) back to your machine. +- [Use the dashboard](../how-to/use-the-dashboard.md) to browse in a browser. diff --git a/docs/user_guide.md b/docs/user_guide.md deleted file mode 100644 index 6b4a8ebd..00000000 --- a/docs/user_guide.md +++ /dev/null @@ -1,304 +0,0 @@ -# SimDB user guide - -This page covers the core functionality of the SimDB command line, and some common use cases. - -Further details on the command line interface can be found [here](cli.md). - - -## Backwards Incompatible Changes - -This section describes backwards incompatible changes to the SimDB CLI -- `simdb database clear` removed: Use `simdb sim delete --all` instead to clear all local simulations. - - -## Basic usage - -SimDB is a command line interface (CLI) that can be used to store metadata about simulation runs and their associated data. These simulations are stored locally for the user until they are pushed to a remote SimDB server where they can then be queried by any user. - -To run the SimDB CLI you can use the following: - -```bash -simdb --version -``` - -This will print out the version of SimDB available. - -All of the SimDB commands have help available via the CLI by using the `--help` argument, i.e. - -```bash -simdb --help -``` - -Will print the top-level help, whereas - -```bash -simdb simulation --help -``` - -Will print the help available for the `simulation` command. - -## Key Concepts - -Before diving into SimDB functionality, it's important to understand these key terms: - -**Local vs Remote Simulations:** -- **Local simulation**: Simulation metadata and data stored in your personal SimDB database on your machine. Only you can access it. 
-- **Remote simulation**: Simulation metadata and data stored on a SimDB server, accessible to authorized users across the organization. - -**Local vs Remote IMAS Data:** -- **Local IMAS data**: IMAS datasets accessible directly from the file system where you're running the SimDB CLI. -- **Remote IMAS data**: IMAS datasets hosted on a remote data server and accessed via network protocols. - -**Workflow**: Typically, you create and manage simulations locally, then push them to a remote SimDB server for sharing. The data referenced by your simulation can be either local (on your machine) or remote (on a data server). - -**IMAS Access Layer compatibility**: -SimDB uses `imas-python` to read IMAS data. [imas-python](https://pypi.org/project/imas-python/) requires that MDSplus data files to be ingested were written with Access Layer 5 (AL5) or later, and does not support reading MDSplus files written with Access Layer 4 (AL4). If your IMAS data was written in MDSplus using AL4 (e.g., MDSplus-based AL4 databases), you must first convert it to AL5 format before use. See [AL4 MDSplus data migration](user_guide.md#al4-mdsplus-data-migration) below. - -## Local simulation management - -In order to ingest a local simulation you need a manifest file. This is a `yaml` file which contains details about the simulation and what data is associated with it. See the [Tutorial - Creating a simulation manifest](tutorial.md#creating-a-simulation-manifest) for detailed guidelines on how to create a well-formed manifest. - -An example manifest file is: - -```yaml -manifest_version: 2 -alias: simulation-alias -inputs: -- uri: file:///my/input/file -- uri: imas:hdf5?path=/path/to/imas/data -outputs: -- uri: file:///my/output/file -- uri: imas:hdf5?path=/path/to/more/data -metadata: -- machine: name of machine i.e. ITER. -- code: - name: code name i.e. ASTRA, JETTO, DINA, CORSICA, METIS, SOLPS, JINTRAC etc. 
- version: code version -- description: | - Sample plasma physics simulation for ITER tokamak modeling -- ids_properties: - creation_date: 'YYYY-MM-DD HH:mm:ss' -``` - -| Key | Description | -| --- | --- | -| manifest_version | The version of the manifest file format. Always use the latest version (currently 2) for new manifest files. This ensures compatibility and access to the latest features. | -| alias | An optional unique identifier for the simulation. If not provided here, you can specify it via the CLI during ingestion. Must follow alias naming rules (see below). | -| inputs/outputs | Lists of simulation input and output files. Supported URI schemes:
• file - Standard file system paths
• imas - IMAS entry URIs (see IMAS URI schema below) | -| metadata | Contains simulation metadata and properties. The metadata section associates information with the summary IDS data:
• summary - A hierarchical dictionary structure containing key-value pairs that provide summary information extracted from IDS datasets. This includes condensed representations of simulation results, computed quantities, free descriptions, any references, and creation dates if not available in summary IDS. - -## Alias Naming Rules -
  • Must be unique within the SimDB
-
  • Cannot start with a digit (0-9) or forward slash (/)
-
  • Cannot end with a forward slash (/)
-
  • Should be descriptive and meaningful for easy identification
- -Examples of valid aliases: -
  • iter-baseline-scenario
-
  • 100001/1 (pulse_number/run_number)
- -## IMAS URI schema - -IMAS URIs specified in the manifest can either be in the form of remote data URIs or local data URIs. - -The IMAS local data URI is used to locate an IMAS data entry accessible from the machine where the client -is being run. The URI schema looks like: - -``` -imas:?path= -``` - -Where: - -| Argument | Description | -|----------|-----------------------------------------------------------| -| Backend | The backend to use to open the files on the remote server | -| Path | The path to the folder containing the IMAS data files | - -Some examples of local URIs are: - -```text -imas:mdsplus?path=/work/imas/shared/imasdb/iter/3/135011/2 -imas:hdf5?path=/work/imas/shared/imasdb/ITER_SCENARIOS/3/131002/60 -``` - -When a local IMAS URI is pushed to the server, it is automatically transformed into a remote data URI -to enable access from machines that are remote from the server. - -The IMAS remote data URI is used to locate a remote IMAS data entry. The IMAS URI schema for remote data looks like: -``` -imas://:/uda?path=&backend= -``` - -Where: - -| Argument | Description | -|----------|-----------------------------------------------------------| -| Server | The name of the remote data server i.e. uda.iter.org | -| Port | The port to connect to on the remote data server | -| Path | The path to the data files on the remote server | -| Backend | The backend to use to open the files on the remote server | - -Example URIs: - -With explicit port: -`imas://uda.iter.org:56565/uda?path=/work/imas/shared/imasdb/ITER/3/131024/51&backend=hdf5` - -Without port (uses default): -`imas://uda.iter.org/uda?path=/work/imas/shared/imasdb/ITER/3/131024/51&backend=hdf5` - -**Note:** Ensure that the specified port is accessible through your network firewall. Contact your system administrator if you experience connectivity issues. - -## AL4 MDSplus data migration -SimDB uses [imas-python](https://pypi.org/project/imas-python/) to read IMAS data. 
`imas-python` requires that MDSplus data to be read was written with Access Layer 5 (AL5) or later and **does not support the older Access Layer 4 (AL4).** - -If you have existing IMAS data stored in an AL4 MDSplus, you must migrate it to the AL5 directory layout before referencing it in a SimDB manifest. This can be done using the `mdsplusIMASDB4to5` tool provided by IMAS-Core, which creates the new AL5 directory layout with links to the original data files (the original data is not removed). -```mdsplusIMASDB4to5 [-h] [--dry-run] [-p PATH] [-d DATABASE] [-f]``` - -| Options | Description | -|--------------------------------------|------------------------------------------------------------| -| `--dry-run` | Print actions but do not perform them | -| `-p PATH`, `--path PATH` | Specify path where imasdb to map (by default $HOME/public) | -| `-d DATABASE`, `--database DATABASE` | Specify a database to be map (by default all) | -| `-f`, `--force` | Force the creation of symlink even if the file exists | - -Once the migration is complete, reference the new AL5 path in your manifest using the mdsplus backend: -``` -outputs: -- uri: imas:mdsplus?path= -``` -For further details on the `mdsplusIMASDB4to5` tool, refer to the IMAS-Core documentation. - -## Remote SimDB servers - -### Configuration file - -SimDB stores remote server configuration in ~/.config/simdb/simdb.cfg. This file is automatically created on the first run of the SimDB CLI and is pre-populated with a connection entry for the default ITER SimDB server: -``` -[remote "iter"] -url = https://simdb.iter.org/scenarios/api -default = True -username = $USER -firewall = F5 -``` - -You can inspect the current remotes at any time with: - -```bash -simdb remote config list -``` -You can also view or edit `simdb.cfg` directly, but it is recommended to use the `simdb remote config` CLI commands to manage remotes to ensure the file stays correctly formatted. 
- -**ITER users:** See [Connecting to the ITER remotes](https://simdb.readthedocs.io/en/latest/iter_remotes.html) for a step-by-step guide to setting-up and testing the ITER remote connection. - -The SimDB CLI is able to interact with remote SimDB servers to push local simulations or to query existing simulations. This is done via the simdb remote command: - -```bash -simdb remote --help -``` - -Configuring of SimDB remotes is done via the `config` subcommand: - -```bash -simdb remote config --help -``` - -To see which remotes are available you can use the following: - -```bash -simdb remote config list -``` - -To add a new remote you can use: - -```bash -simdb remote config new -``` - -i.e. - -```bash -simdb remote config new ITER https://simdb.iter.org/scenarios/api/ -``` - -In order to not have to specify the remote name when using any of the SimDB CLI remote subcommands you can set a remote to be default. The default remote will be used whenever the remote name is not explicitly passed to a remote subcommand. Setting a default remote can be done using: - -```bash -simdb remote config set-default -``` - -### Authentication - -In order to interact with SimDB remote servers you must be authenticated against that server. By default, this is done using username/password which will need to be entered whenever your session times out or expires. - -If your server supports token-based authentication, you can generate an authentication token using username/password, which is then stored against that remote to reduce the number of times you have to manually enter your authentication details. While that token is valid (token lifetimes are determined on a per-server basis) you can run remote commands against that server without having to provide authentication details. 
- -In order to generate a remote authentication token you need to run: - -```bash -simdb remote token new -``` - -Running this command will require you to authenticate against the server as normal but once it has run it will store an authentication token against the remote so that you will not need to enter authentication credentials when running other remote commands. - -You can delete a stored token by running: - -```bash -simdb remote token delete -``` - -**Note:** All the commands in this section assume there is a default remote that has been set (see above) so omit the remote name in the command. If no default has been set then the remote name needs to be inserted into the command, i.e. `simdb remote token new`. - -## Pushing simulations to a remote - -Once you have ingested your simulation locally and are happy with the metadata that has been stored alongside it, you may choose to push this simulation to a remote SimDB server to make it publicly available. You do this by: - -```bash -simdb simulation push -``` - -This will upload all the metadata associated with your simulation to the remote server as well as taking copies of all input and output data specified. For non-IMAS data the `file` URIs will be used to locate the files to transfer, whereas for `imas` URIs SimDB will discover which files need to be transferred based on the IMAS backend specified in the URI. The files are copied to the server using an HTTP data transfer. - -## Pulling simulations from a remote - -The mirror to pushing simulations is the `pull` command. This command will pull the simulation metadata from the SimDB remote to your local SimDB database and download the simulation data into a directory of your choosing. Once you have pulled a simulation it will appear in any local SimDB queries you perform. The command looks as follows: - -```bash -simdb simulation pull [REMOTE] -``` - -The `REMOTE` argument is optional and if omitted will use your specified default remote. 
The `SIM_ID` is the alias or uuid of the simulation on the remote you wish to pull, and the `DIRECTORY` argument specifies the location you wish to download the data to. - -## Querying remotes - -You can query all the simulations available from a remote SimDB server using: - -```bash -simdb remote list -``` - -and you can see all the stored metadata against a remote simulation using: - -```bash -simdb remote info -``` - -## Accessing Simulation Metadata via the SimDB Dashboard - -You can view a simulation's metadata directly in the SimDB dashboard using its UUID. - -Format: -``` -https:///dashboard/uuid/ -``` - -Example (server: https://simdb.iter.org, UUID: `abcdef12345678901234567890abcdef`): -``` -https://simdb.iter.org/dashboard/uuid/abcdef12345678901234567890abcdef -``` - -Alternatively, you can search for simulations in the SimDB dashboard by entering the UUID or alias in the **Alias/UUID** search field on the dashboard interface. - -Notes: -- Use the full 32-character UUID (no dashes) if that is how it is stored. -- If your deployment uses a different base path, adjust `` accordingly. diff --git a/pyproject.toml b/pyproject.toml index 17816432..f8f56e2e 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -85,9 +85,7 @@ build-docs = [ "sphinx-immaterial>=0.11.14", "sphinx-autodoc-typehints>=1.12.0", "myst-parser>=0.18.0", - "nbsphinx>=0.8.0", "docutils>=0.17", - "recommonmark>=0.7.0", ] postgres = [ "psycopg2-binary>=2.8.0",