From a8400b5be9facb71b078b8f21b609071c635c4f4 Mon Sep 17 00:00:00 2001 From: Joshua Hampton Date: Wed, 11 Mar 2026 13:48:51 +0000 Subject: [PATCH 01/15] Add JASMIN link --- docs/source/index.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/source/index.rst b/docs/source/index.rst index 65fee66..3af98ae 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -36,3 +36,4 @@ Other options include specifying the particular checks to run or to compare with :hidden: GitHub + checksit on JASMIN From 5a9557782dc1a4c637b3f583cc9a0994573b7dca Mon Sep 17 00:00:00 2001 From: Joshua Hampton Date: Wed, 11 Mar 2026 13:53:13 +0000 Subject: [PATCH 02/15] Add how to create venv --- docs/source/install.rst | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/docs/source/install.rst b/docs/source/install.rst index a8277a5..008f377 100644 --- a/docs/source/install.rst +++ b/docs/source/install.rst @@ -5,15 +5,22 @@ Source ------ It is recommended to create a fresh Python virtual environment for installing -``checksit``, which can be installed directly from GitHub: +``checksit``: -.. code-block:: +.. code-block:: bash + + python -m venv checksit-venv + source checksit-venv/bin/activate + +Then, ``checksit`` can be installed from the source code on GitHub: + +.. code-block:: bash pip install git+https://github.com/cedadev/checksit.git or by cloning the repository and installing that: -.. code-block:: +.. 
code-block:: bash git clone https://github.com/cedadev/checksit.git cd checksit From 3ec399b3d94ef63689844b59987d199fa2966aff Mon Sep 17 00:00:00 2001 From: Joshua Hampton Date: Wed, 11 Mar 2026 13:56:19 +0000 Subject: [PATCH 03/15] Add jasmin admonition --- docs/source/index.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/source/index.rst b/docs/source/index.rst index 3af98ae..bad776a 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -9,6 +9,9 @@ On a basic level a user can point the checksit tool at a given file and it will some basic checks based on some matches that it will try to perform. Other options include specifying the particular checks to run or to compare with known 'good' files. +.. note:: + For a quick overview on how to run checksit on JASMIN, follow the link in the side bar. + .. toctree:: :hidden: :maxdepth: 1 From 51883c0254e3eb996f89a743c1e6dd6aec84c9b7 Mon Sep 17 00:00:00 2001 From: Joshua Hampton Date: Wed, 11 Mar 2026 15:28:54 +0000 Subject: [PATCH 04/15] Rewrite usage --- docs/source/usage.rst | 95 ++++++++++++++++++++++++++----------------- 1 file changed, 58 insertions(+), 37 deletions(-) diff --git a/docs/source/usage.rst b/docs/source/usage.rst index 0d7cee0..519417c 100644 --- a/docs/source/usage.rst +++ b/docs/source/usage.rst @@ -1,70 +1,91 @@ Usage ===== -Simplest check --------------- +Basic use +--------- -First, ``cd`` into the ``checksit`` repository and activate the environment ``checksit`` was -installed into. **As default, checksit needs to be run from the top level of the checksit -repository**. For installations that followed the directions on the installation page, that -will look like +Assuming that ``checksit`` was installed following the guide on the +`installation page `_, the command ``checksit`` should be available in +the terminal if the virtual environment is active. To check a file, the command +``check`` can be used, followed by the path to the file to check, for example: -.. 
code-block:: +.. code-block:: bash - cd ~/checksit - source venv/bin/activate + checksit check my-file.ext -Then ``checksit`` can be run using the following, as an example: +``checksit`` will then look at the file, attempt to work out what template or specs to +check the file with, and then print out the results of the checks. -.. code-block:: - checksit check /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc +How does ``checksit`` decide what checks to use? +------------------------------------------------ -``checksit`` will then look at the file given and attempt to find either a template file to -compare against or a series of specs to match with, and then print out the results of the checks. +``checksit`` uses templates and specs to perform checks on files. A template is a file +with a similar structure to the file being checked, and specs are files that define +rules and vocabularies that the contents of the file being checked need to meet. -Multiple Files --------------- -If you want to check multiple files, you can do so by using the ``check-files`` command and list -all the files to check, for example: +When checking a file, if no template or spec files are given in the ``check`` command, +``checksit`` will attempt to find the most suitable checks to use. +It does that with the following steps: +1. ``checksit`` looks to see if there are any file-specific checks that have been + defined for that particular file. These include the checks for NCAS data standards. + For more information on how these file-specific checks are determined, see the + `file-specifics page `_ +2. If no file-specific checks are found, and template checks have not been turned off, + ``checksit`` will look for a template file that matches the file being checked. 
+ - It finds a template file by first searching if the file to be checked matches + against any known rules in the checksit config file with a defined template file to + use (e.g. UKCP09 data). + - If that doesn't produce a template file ``checksit`` searches the template cache, + searching for a file with a similar name. + - If a template file still isn't found, ``checksit`` uses a default template, called + `ceda-base.yml`. -.. code-block:: - checksit check-files /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20681201-20691130.nc +Manually specifying templates and specs +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +A specific template can be chosen for ``checksit`` to use by specifying the file with +the ``-t/--template`` flag when running the check command: +.. code-block:: bash -Specify Template ----------------- + checksit check --template=template-cache/rls_rcp85_land-cpm_uk_2.2km_01_day_19801201-19811130.cdl /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc -A specific template can be chosen for ``checksit`` to use using the ``-t/--template`` flag +A spec file, or number of spec files, can also be given to ``checksit`` to compare the file against, +using the ``-s/--specs`` flag. These files, in YAML format, point to functions and define parameters +that will be used to check the file with .. 
code-block:: - checksit check --template=template-cache/rls_rcp85_land-cpm_uk_2.2km_01_day_19801201-19811130.cdl /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc + checksit check --specs=ceda-base /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc -If the file being checked is a file which you might want to check other files against, a template -can be created when checking this file by using the ``--auto-cache`` flag, e.g. +.. note:: + Multiple spec files should be separated by commas, with no spaces, e.g. + ``--specs=ceda-base,ceda-ukcp18`` -.. code-block:: +Even with specs defined, ``checksit`` will still attempt to find a template, or use a +given one, to check the file with. To only use specs, the template option must be +switched off by specifying ``-t off``. - checksit check --auto-cache /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc -Specify Specs -------------- +Multiple Files +-------------- -A spec file, or number of spec files, can also be given to ``checksit`` to compare the file against, -using the ``-s/--specs`` flag. These files, in YAML format, point to functions and define parameters -that will be used to check the file with +If you want to check multiple files, you can do so by using the ``check-files`` command and list +all the files to check, for example: .. 
code-block:: - checksit check --specs=ceda-base /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc + checksit check-files /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20681201-20691130.nc -``checksit`` will still attempt to find a template, or use a given one, to check the file with. To -only use specs, the template option can be switched off by specifying ``-t off``. +``checksit`` will check all files individually, meaning it could use different specs and/or template checks for each file, unless the template and/or specs are specifically given using the ``-t/--template`` and ``-s/--specs`` flags. Brief other flags ----------------- -Coming soon... +Some other options that can be given to the ``check`` and ``check-files`` commands include: +- ``-l/--log-mode``: whether ``checksit`` should output in "standard" (default) or "compact" mode. +- ``-w/--ignore-warnings``: if flag is given, warnings from file checks will not be printed in the output. +- ``-p/--skip-spellcheck``: if flag is given, spellcheck that attempts to find close matches to any failed checks will be skipped. From 96689ad9327515aafcedaec9ebfdb1c8f079deec Mon Sep 17 00:00:00 2001 From: Joshua Hampton Date: Wed, 11 Mar 2026 15:30:12 +0000 Subject: [PATCH 05/15] Link to understanding output --- docs/source/usage.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/source/usage.rst b/docs/source/usage.rst index 519417c..5c5626b 100644 --- a/docs/source/usage.rst +++ b/docs/source/usage.rst @@ -14,7 +14,9 @@ the terminal if the virtual environment is active. 
To check a file, the command checksit check my-file.ext ``checksit`` will then look at the file, attempt to work out what template or specs to -check the file with, and then print out the results of the checks. +check the file with, and then print out the results of the checks. See the +`understanding output page `_ for more information on how to +read and interpret the output from ``checksit``. How does ``checksit`` decide what checks to use? From 5bba790dd1bc4f641022b77b34bdc35da4443557 Mon Sep 17 00:00:00 2001 From: Joshua Hampton Date: Wed, 11 Mar 2026 16:01:51 +0000 Subject: [PATCH 06/15] Change section type --- docs/source/usage.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/usage.rst b/docs/source/usage.rst index 5c5626b..1097e36 100644 --- a/docs/source/usage.rst +++ b/docs/source/usage.rst @@ -45,7 +45,7 @@ It does that with the following steps: Manually specifying templates and specs -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +--------------------------------------- A specific template can be chosen for ``checksit`` to use by specifying the file with the ``-t/--template`` flag when running the check command: From d143467f1ef07c2a0df51c44ceaaf09ff79ed759 Mon Sep 17 00:00:00 2001 From: Joshua Hampton Date: Thu, 12 Mar 2026 10:10:08 +0000 Subject: [PATCH 07/15] Add subsection on where to find specs --- docs/source/usage.rst | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/docs/source/usage.rst b/docs/source/usage.rst index 1097e36..e70e6f3 100644 --- a/docs/source/usage.rst +++ b/docs/source/usage.rst @@ -71,6 +71,19 @@ given one, to check the file with. To only use specs, the template option must b switched off by specifying ``-t off``. +Where can I find specs to use? +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Specs are provided with the ``checksit`` Python package. Available specs can be seen in +the `GitHub repository `_ for +``checksit``. + +.. 
note:: + When specifying specs, the path to the spec file should be given relative to the + ``specs/groups`` folder, and without the file extension, e.g. ``--specs=ceda-base`` + or ``--specs=ncas-amof-2.0.0/amof-global-attrs``. + + Multiple Files -------------- From 000477654fdbf60b0a59021e0d3e6e7580773c46 Mon Sep 17 00:00:00 2001 From: Joshua Hampton Date: Thu, 12 Mar 2026 10:56:52 +0000 Subject: [PATCH 08/15] Rewrite to include how checksit identifies files --- docs/source/specifics.rst | 76 ++++++++++++++++++++++++++------------- 1 file changed, 52 insertions(+), 24 deletions(-) diff --git a/docs/source/specifics.rst b/docs/source/specifics.rst index f4d610d..c08fb1e 100644 --- a/docs/source/specifics.rst +++ b/docs/source/specifics.rst @@ -3,48 +3,56 @@ File specific actions ``checksit`` has some specific actions depending on the file given. -NCAS-GENERAL ------------- +NCAS Data +--------- -Files that are designed to the NCAS-GENERAL standard are recognised by ``checksit``\ , and specs -referring to the correct version of the standard are automatically searched for and used by -``checksit``\ , with specs to include checking file name format, global attributes, dimensions -and variables for the used deployment mode and data product. For example, for a file with data -from an automatic weather station (\ ``ncas-aws-10``\ ) using version 2.0.0 of the standard, +If the ``checksit check`` command is given a file and no template or specs are +specified, then ``checksit`` will try to identify if the file is meant to comply with +one of the NCAS standards (NCAS-General, NCAS-Radar or NCAS-Image). ``checksit`` will +designate a file as an "NCAS standard" file if one of the following conditions is met: -.. code-block:: - - checksit check ncas-aws-10_iao_20231117_surface-met_v1.0.nc +* The file contains the global attribute "Conventions" and the value of this attribute + contains "NCAS-" (case insensitive match). 
+* The file contains the "XMP-photoshop:Instructions" metadata tag and the value of this + tag contains "National Centre for Atmospheric Science" (case insensitive match). +* The name of the file starts with "ncas-" (case sensitive match). -is the same as +If any of these conditions match, then ``checksit`` will try to identify which NCAS +standard the file is meant to comply with. -.. code-block:: - checksit check -t off -s ncas-amof-2.0.0/amof-file-name,ncas-amof-2.0.0/amof-common-land,ncas-amof-2.0.0/amof-surface-met,ncas-amof-2.0.0/amof-global-attrs ncas-aws-10_iao_20231117_surface-met_v1.0.nc +NCAS-General +^^^^^^^^^^^^ -NCAS-IMAGE ----------- +If the name of the file ends with `.nc`, and the file contains the global attribute +"Conventions" with a value that contains one of "NCAS-General", "NCAS-AMOF", or +"NCAS-AMF" (case insensitive match), then the file is designated as an NCAS-General +file. ``checksit`` then determines which specs are needed to perform the correct +checks, including checking file name format, global attributes, dimensions, and +variables used for the deployment mode and data product. -The NCAS-IMAGE standard is also identified by ``checksit``\ , and the appropriate specs can be -found to check both global tags and photo or plot specific tags, i.e. +For example, for a file with data from an automatic weather station +(\ ``ncas-aws-10``\ ) using version 2.0.0 of the standard, .. code-block:: - checksit check ncas-cam-9_cao_20231117_photo_v1.0.nc + checksit check ncas-aws-10_iao_20231117_surface-met_v1.0.nc is the same as .. 
code-block:: - checksit check -t off -s ncas-image-1.0.0/amof-image-global-attrs,ncas-image-1.0.0/amof-photo ncas-cam-9_cao_20231117_photo_v1.0.nc + checksit check -t off -s ncas-amof-2.0.0/amof-file-name,ncas-amof-2.0.0/amof-common-land,ncas-amof-2.0.0/amof-surface-met,ncas-amof-2.0.0/amof-global-attrs ncas-aws-10_iao_20231117_surface-met_v1.0.nc + NCAS-Radar ----------- +^^^^^^^^^^ -The NCAS-Radar standard is also recognised by ``checksit``\ , with the correct specs identified and -used if no template or spec options are specified. Unlike the NCAS-GENERAL and NCAS-IMAGE standards, -NCAS-Radar does not have specific data product specs, instead there are a number of different spec -files covering different areas of the standard. These spec files are: +If the file name ends with `.nc`, and the file contains the global attribute +"Conventions" with a value that contains "NCAS-Radar" (case insensitive match), then +the file is identified as an NCAS-Radar file. There are a number of different spec +files that cover different areas of the standard which ``checksit`` will use to check +against the files. These spec files are: .. code-block:: @@ -59,3 +67,23 @@ files covering different areas of the standard. These spec files are: radar-parameters sensor-pointing-variables sweep-variables + + +NCAS-Image +^^^^^^^^^^ + +If the name of the file ends with one of `.png`, `.jpg`, or `.jpeg` (case insensitive +match), and the file contains the "XMP-photoshop:Instructions" metadata tag with a +value that contains "National Centre for Atmospheric Science" (case insensitive match), +then the file is identified as an NCAS-Image file. The appropriate specs are then found +to check both global tags and photo or plot specific tags. For example, + +.. code-block:: + + checksit check ncas-cam-9_cao_20231117_photo_v1.0.nc + +is the same as + +.. 
code-block:: + + checksit check -t off -s ncas-image-1.0.0/amof-image-global-attrs,ncas-image-1.0.0/amof-photo ncas-cam-9_cao_20231117_photo_v1.0.nc From 8c767bbd3689ebae858c078f406df05718d91635 Mon Sep 17 00:00:00 2001 From: Joshua Hampton Date: Thu, 12 Mar 2026 11:17:52 +0000 Subject: [PATCH 09/15] Simplify and put links to docs more prominently --- README.md | 121 ++++++++---------------------------------------------- 1 file changed, 17 insertions(+), 104 deletions(-) diff --git a/README.md b/README.md index b994852..15d0e84 100644 --- a/README.md +++ b/README.md @@ -4,6 +4,16 @@ File-checking made simple + +## Documentation + +See the [Read The Docs page](https://checksit.readthedocs.io/en/latest) for more +details on how to install and run checksit. + +Visit the [JASMIN help page](https://help.jasmin.ac.uk/docs/software-on-jasmin/community-software-checksit/) +for guidance on how to use checksit on JASMIN. + + ## Installation Create a venv, then install, either directly from GitHub: @@ -20,113 +30,16 @@ pip install . ## Usage -A brief description of how to use checksit is given here. For more detail, visit the [documentation site](https://checksit.readthedocs.io/en/latest). - -checksit is comprised of four key components - [check](#checksit-check), [describe](#checksit-describe), [show-specs](#checksit-show-specs), and [summary](#checksit-summary) - +A brief description of how to use checksit is given here. For more detail, visit the +[documentation site](https://checksit.readthedocs.io/en/latest). -## checksit check +### checksit check -Check file against a template. - -### Basic Usage +To check a file: ``` checksit check /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc ``` -* Checks format of file. 
-* checksit searches its template cache for a similar file to compare against - - -### Main Features - -#### Define template -``` -checksit check --template=template-cache/rls_rcp85_land-cpm_uk_2.2km_01_day_19801201-19811130.cdl /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc -``` -* Use `--template` flag to define a template to use -* Template can be in template-cache or any file user has access to -* Note: cdl files are a representation of a netCDF file, being the output from `ncdump -h` on the netCDF file - - -#### Map variable names -``` -checksit check -m cltAnom=cloud_area_fraction /gws/nopw/j04/cmip6_prep_vol1/ukcp18/data/land-prob/v20211110/uk/25km/rcp85/sample/b8110/30y/cltAnom/mon/v20211110/cltAnom_rcp85_land-prob_uk_25km_sample_b8110_30y_mon_20091201-20991130.nc -``` -* Allows mapping of variable name, for the case that the name of a variable is different between the file to be checked and the template -* Format - `-m