NeXus–NOMAD metainfo generator: Phase 3 (parser v2 + app v2) #810
Draft
lukaspie wants to merge 21 commits into
added 20 commits
June 29, 2026 18:24
NamedConceptContext was missing a links field, so groups that define only <link> children (no fields) were not generating named concept classes.
- Add links list to NamedConceptContext
- Collect link children in _build_named_concept()
- Guard on concept.quantities or concept.links
- Template named concept block uses concept.links (not top-level links)
- NXmonopd now generates MonopdData(Data) with polar_angle and data as NeXusLink quantities pointing to the detector fields
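The guard described in this commit can be sketched as follows. This is a minimal illustration, not the generator's code: the `class_name` field and the `should_emit` helper are hypothetical, and only the `quantities`/`links` fields and the `MonopdData` example come from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class NamedConceptContext:
    """Hypothetical stand-in for the generator's context object."""

    class_name: str  # assumed field, for illustration only
    quantities: list[str] = field(default_factory=list)
    links: list[str] = field(default_factory=list)  # the field this commit adds


def should_emit(concept: NamedConceptContext) -> bool:
    # A named concept class is generated if the group defines
    # fields OR links of its own (previously: fields only).
    return bool(concept.quantities or concept.links)


link_only = NamedConceptContext("MonopdData", links=["polar_angle", "data"])
empty = NamedConceptContext("Unused")
```

With this guard, a link-only group such as the NXmonopd data group yields a class, while a group with neither fields nor links is still skipped.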
…tiple BASESECTIONS_MAP bases
- _package.py: use dir(mod) instead of __all__ to include all generated sections from a module, including named concept classes. Previously __all__ only listed the primary class, leaving named concepts in the per-module auto-package with an unresolvable qualified name that crashed the JS frontend (_allBaseSections error).
- _mapping.py: BASESECTIONS_MAP values changed from single-tuple to list[str] (fully-qualified class names). NXentry now maps to both Measurement and EntryData so Entry(Object, basesections.Measurement, EntryData) is generated, making archive.data = Arpes() valid.
- nxdl_to_metainfo.py: _base_from_extends() updated to return list[str] of extra bases; added _split_fqn() helper; build_context() passes nomad_extra_bases list to template. render() now accepts out_path so ruff runs with the correct pyproject.toml context for isort fixes.
- base_class.py.j2: template iterates nomad_extra_bases, emits imports for non-basesections modules in correct isort position.
- entry.py: regenerated with Entry(Object, basesections.Measurement, EntryData).
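To make the list-of-FQNs mapping concrete, here is a small sketch of how extra bases could be turned into a class header. The helper names (`split_fqn`, `base_expr`, `class_header`) and the exact fully-qualified strings in the map are assumptions for illustration; the real `_split_fqn()` and template logic may differ.

```python
def split_fqn(fqn: str) -> tuple[str, str]:
    """Split 'pkg.module.ClassName' into ('pkg.module', 'ClassName')."""
    module, _, name = fqn.rpartition(".")
    if not module:
        raise ValueError(f"not a fully-qualified name: {fqn!r}")
    return module, name


# Assumed shape of the mapping: NX class -> list of fully-qualified base classes.
BASESECTIONS_MAP: dict[str, list[str]] = {
    "NXentry": [
        "nomad.datamodel.metainfo.basesections.Measurement",
        "nomad.datamodel.data.EntryData",
    ],
}


def base_expr(fqn: str) -> str:
    # Classes from a 'basesections' module are referenced through the module
    # name; everything else is assumed to be imported by its bare class name.
    module, name = split_fqn(fqn)
    if module.rpartition(".")[2] == "basesections":
        return f"basesections.{name}"
    return name


def class_header(class_name: str, nx_base: str, nx_class: str) -> str:
    extra = [base_expr(fqn) for fqn in BASESECTIONS_MAP.get(nx_class, [])]
    return f"class {class_name}({', '.join([nx_base, *extra])}):"


header = class_header("Entry", "Object", "NXentry")
```

Here `header` reproduces the class line quoted in the commit message, and an NX class without a map entry falls back to its NeXus base alone.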
…s; fix inheritance chains
Generator (nxdl_to_metainfo.py):
- Named concepts now generated when a group has app-specific sub-groups (children with nx_class absent from the base NXDL class), not only when it has own fields/links. Fixes missing MpesInstrument, XpsInstrument etc.
- Application-derived named concepts inherit from the parent application's concept class (XpsInstrument(MpesInstrument), AfmInstrument(SpmInstrument)) rather than the generic NX base class. Uses group_naming_at(1) to compute the parent concept name from the parent's actual group XML, handling variadic-to-specific slot specialization correctly.
- own_children() used throughout _build_named_concept so each concept only claims members defined at its own NXDL level.
- _section_fqn and all FQN strings use _METAINFO_PACKAGE_ROOT constant.
nexus_tree.py (NexusNode API additions):
- own_children(): children whose nxdl_base matches this node's own file
- children_at_definition(file): children at a specific NXDL definition level
- definition_file_at(idx): .base path of inheritance[idx]
- NexusGroup.group_naming_at(idx): (name, name_type, nx_class) at a specific inheritance level for concept-name computation without building full nodes
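The idea behind own_children() can be illustrated with a toy tree. The `Node` class below is a hypothetical stand-in for NexusNode, and the child names and file names are made up; only the filtering rule (a child's nxdl_base equals the node's own file) is taken from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified stand-in for NexusNode."""

    name: str
    nxdl_base: str  # NXDL file in which this node is defined
    children: list["Node"] = field(default_factory=list)

    def own_children(self) -> list["Node"]:
        # Keep only children defined at this node's own NXDL level,
        # dropping members inherited from a parent definition.
        return [c for c in self.children if c.nxdl_base == self.nxdl_base]


instrument = Node(
    "instrument",
    "NXxps.nxdl.xml",
    children=[
        Node("inherited_child", "NXmpes.nxdl.xml"),
        Node("app_specific_child", "NXxps.nxdl.xml"),
    ],
)
own_names = [c.name for c in instrument.own_children()]
```

Because the inherited child is filtered out, a derived concept such as XpsInstrument would only declare what NXxps itself adds and get the rest through inheritance from MpesInstrument.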
…classes
LinkContext now carries target_quantity, resolved via NexusNode tree traversal of the NXDL concept path. The Jinja2 template emits the target's type/dimensionality/unit/shape instead of a bare type=str, so link quantities (e.g. NXlauetof time_of_flight) carry real numeric types and units. Regenerated all 20 affected application files.
Entry and all category="application" classes inherit EntryData via Entry. Group them under one "Experiment" category in NOMAD Oasis's "Create new entry from schema" dialog, configured once in the generator template.
Generator now computes eln_component/eln_default for every Quantity based on its Python type and shape. The Jinja2 template emits a_eln=ELNAnnotation(...) with the appropriate ELNComponentEnum variant for all scalar, non-Bytes quantities (str → StringEditQuantity, Datetime → DateTimeEditQuantity, bool → BoolEditQuantity, MEnum → EnumEditQuantity with single-value default, numeric → NumberEditQuantity with a_display unit). Arrays are excluded. SchemaAnnotation(enabled=False) added to NXentry and NXroot m_def to hide these base classes from the "Create from schema" dialog. All 280 generated files updated; 473 tests pass.
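The type-to-component mapping above can be sketched as a plain function. This is an illustration under assumptions: the function names and signatures are hypothetical, `datetime.datetime` stands in for the metainfo Datetime type, an explicit list of values stands in for MEnum, and components are returned as strings rather than ELNComponentEnum members so the sketch runs without NOMAD installed.

```python
import datetime
from typing import Optional, Sequence


def eln_component(
    py_type: type,
    shape: Sequence = (),
    enum_values: Optional[Sequence[str]] = None,
) -> Optional[str]:
    """Return the ELN component name for a quantity, or None to skip it."""
    if shape:  # arrays are excluded
        return None
    if enum_values:  # stand-in for an MEnum-typed quantity
        return "EnumEditQuantity"
    if py_type is bytes:  # Bytes quantities get no ELN annotation
        return None
    if py_type is str:
        return "StringEditQuantity"
    if py_type is datetime.datetime:
        return "DateTimeEditQuantity"
    if py_type is bool:
        return "BoolEditQuantity"
    if py_type in (int, float):
        return "NumberEditQuantity"
    return None


def eln_default(enum_values: Optional[Sequence[str]]) -> Optional[str]:
    # A single-valued enum has only one possible choice, so use it as default.
    if enum_values and len(enum_values) == 1:
        return enum_values[0]
    return None
```

For example, a scalar float maps to NumberEditQuantity, while the same type with a non-empty shape is skipped.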
New parallel parser stack using the generated Python metainfo (nexus_base_classes + nexus_applications) with annotation-based HDF5 navigation instead of XML/NXDL schema resolution at parse time.
- NexusParserV2 + NomadVisitorV2: walks HDF5 via NexusSchemaResolver, resolves quantities via per-class _SectionIndex (no __field/__group suffixes, no _rename_nx_for_nomad)
- One archive per NXentry; archive.data is the Entry/Arpes/Xps instance directly
- m_nx_data_path stored as a JSON HDF5-path -> archive-path map
- New nexus_app_v2 explore app
- handler.py: prescan() + on_prescan_group() hook for pre-pass metadata collection
- Acceptance-gate tests in test_parsing_v2.py
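A rough sketch of the per-class index idea, assuming the annotations can be read as simple name-to-NeXus-name mappings. The `SectionIndex` class, its fields, and the example HDF5 path are hypothetical; the real _SectionIndex also has to deal with named concepts, variadic names, and several sub-sections sharing one nx_class.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SectionIndex:
    """Hypothetical lookup tables built once per generated section class."""

    subsection_by_nx_class: dict[str, str] = field(default_factory=dict)
    quantity_by_field: dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_annotations(
        cls, groups: dict[str, str], fields: dict[str, str]
    ) -> "SectionIndex":
        # groups: sub-section name -> nx_class taken from its group annotation
        # fields: quantity name -> NXDL field name taken from its field annotation
        return cls(
            subsection_by_nx_class={nx: name for name, nx in groups.items()},
            quantity_by_field={nx: name for name, nx in fields.items()},
        )


index = SectionIndex.from_annotations(
    groups={"instrument": "NXinstrument", "sample": "NXsample"},
    fields={"title": "title", "start_time": "start_time"},
)

# While walking the file, an HDF5 group is looked up by its NX_class attribute.
target = index.subsection_by_nx_class["NXinstrument"]

# The HDF5-path -> archive-path map is stored as a JSON string.
path_map = {"/entry/instrument": f"data.{target}[0]"}
serialized = json.dumps(path_map)
```

Building the tables once per class keeps parse-time work to dictionary lookups, which is the point of dropping NXDL resolution from the parse step.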
Every NeXus file now also produces a "root" child archive holding a
Root(Experiment) instance that links all NXentry archives from that
file as ExperimentSteps (resolved lazily in normalize() via
m_context + generate_entry_id).
- BASESECTIONS_MAP: NXroot -> Experiment + EntryData
- Root entry_name = "{file stem} (NeXus file)"; results.eln.methods
lists the unique technique names (Arpes, Xps, ...) across all
NXentries in the file
- A NomadVisitorV2 pass over the root group populates Root's own
NXroot attributes (file_name, file_time, creator, NeXus_version, ...)
- Per-entry entry_name is now "{file stem} - {entry name}" so entries
named "entry"/"entry1" remain distinguishable across files
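The two naming rules above amount to simple string formatting on the file stem. A minimal sketch, with hypothetical function names and an invented example path:

```python
from pathlib import PurePosixPath


def entry_archive_name(mainfile: str, entry_name: str) -> str:
    # "{file stem} - {entry name}" keeps "entry"/"entry1" distinguishable
    # between different files.
    return f"{PurePosixPath(mainfile).stem} - {entry_name}"


def root_archive_name(mainfile: str) -> str:
    # The per-file Root archive is named after the file itself.
    return f"{PurePosixPath(mainfile).stem} (NeXus file)"


names = [
    entry_archive_name("raw/scan_042.nxs", "entry1"),
    entry_archive_name("raw/scan_043.nxs", "entry1"),
    root_archive_name("raw/scan_042.nxs"),
]
```

Two files that both contain a group called entry1 now produce different entry names.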
NeXusLink quantities whose target type was resolved by the generator (type != str) are now populated like regular fields instead of storing the HDF5 link path as a string -- h5py transparently dereferences the link so hdf_node already carries the target's real data. Conflict-renamed quantities (e.g. name_quantity for NXDL "name") now mirror their value into the NOMAD-native quantity they shadow (e.g. name), via a new _SectionIndex.shadow_map. Fixes the remaining NXlauetof assertions: name_group.time_of_flight and sample.name.
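The mirroring step can be sketched with a plain dictionary. `SimpleNamespace` stands in for a generated section, and `set_with_shadow` is a hypothetical helper; only the name_quantity/name pair and the idea of a shadow map come from the commit message.

```python
from types import SimpleNamespace

# Renamed (NXDL-faithful) quantity -> NOMAD-native quantity it shadows.
SHADOW_MAP = {"name_quantity": "name"}


def set_with_shadow(section, quantity: str, value, shadow_map: dict) -> None:
    # Always store the value under the NXDL-faithful quantity name ...
    setattr(section, quantity, value)
    # ... and mirror it into the native quantity when one is shadowed.
    shadowed = shadow_map.get(quantity)
    if shadowed is not None:
        setattr(section, shadowed, value)


sample = SimpleNamespace()
set_with_shadow(sample, "name_quantity", "Au(111)", SHADOW_MAP)
set_with_shadow(sample, "temperature", 300.0, SHADOW_MAP)
```

After these calls both `sample.name_quantity` and `sample.name` hold the same value, while quantities without a shadow entry are set once.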
_SectionIndex.find_subsection used a naive .replace("GROUPNAME","").replace("NAME","")
+ startswith check for NXDL nameType="partial" groups, requiring the HDF5 group name
to literally start with the NXDL name (e.g. "eventID..."). The canonical algorithm
(get_nx_namefit) instead treats uppercase letter runs in the given name as wildcards,
so "event1" is just as valid a match for "eventID" as "eventID1". Fixed to use
get_nx_namefit(hdf_name, annotation.name, name_partial=True).
Also adds test_parsing_opt_v2.py, a v2 port of test_parsing_opt.py's storage-layout/
dtype matrix (33 parametrized cases), asserting structural reachability of the
NXdata group. The original mean/min/max/size/ndim value assertions are commented out
pending FIELD_STATISTICS support for array-shaped quantities in NXdata-derived
classes, deferred to Phase 4.
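The difference between the old prefix check and wildcard-style matching for nameType="partial" groups can be illustrated as below. This is a simplified regex stand-in, not the pynxtools get_nx_namefit implementation, which may differ in detail (for example in what it returns); both function names here are hypothetical.

```python
import re


def naive_partial_match(hdf_name: str, nxdl_name: str) -> bool:
    # Old behaviour: strip two placeholder words, then require a literal prefix.
    prefix = nxdl_name.replace("GROUPNAME", "").replace("NAME", "")
    return hdf_name.startswith(prefix)


def wildcard_partial_match(hdf_name: str, nxdl_name: str) -> bool:
    # Treat every run of uppercase letters in the NXDL name as a wildcard.
    # re.split with a capturing group puts the uppercase runs at odd indices.
    parts = re.split(r"([A-Z]+)", nxdl_name)
    pattern = "".join(
        r"[A-Za-z0-9_.]*" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    return re.fullmatch(pattern, hdf_name) is not None
```

With the NXDL name "eventID", the naive check accepts "eventID1" but rejects "event1", whereas the wildcard check accepts both and still rejects an unrelated name such as "detector".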
Builds on #801 (Phases 1 + 2). This branch is fully rebased onto it. For the full plan, all phases, and the Architecture Decision Records, see the `data-modeling` repo: https://github.com/FAIRmat-NFDI/data-modeling/tree/main/nexus-metainfo (specifically here for Phase 3 implementation).

Summary
Phase 1+2 generated the static Python metainfo (base classes + applications). This PR adds a new parser (`nexus_parser_v2`) that reads HDF5 directly into the generated Python classes via their `a_nexus_*` annotations, and a new search app (`nexus_app_v2`) that mirrors the old app but uses the new names in the schema. Both are purely additive, new entry points; nothing existing is touched or removed (for now).

Testing this branch
To test with the new schema, install `pynxtools` from this branch and add these entry points to your `nomad.yaml` (in addition to or instead of the existing `nexus_schema`/`nexus_parser`/`nexus_app`):
Upload any NeXus (`.nxs`/HDF5) file as usual; `nexus_parser_v2` matches it and produces archives using the Phase 1+2 generated classes (`Entry`, `Xps`, `Arpes`, …), browsable in the "NeXus v2" app.

Key decisions
1. One archive per `NXentry`, plus one `Root` archive per file
A NeXus file can hold multiple `NXentry` groups. Each becomes its own archive: `archive.data` is the `Entry`/`Xps`/`Arpes`/… instance directly. There is no wrapper section, unlike parser v1's `Root` container. A separate `Root(Experiment)` archive is also produced per file, populated from the file's own NXroot attributes (`file_name`, `file_time`, `creator`, `NeXus_version`, …) and linking every `NXentry` archive in that file as an `ExperimentStep` (resolved lazily via `m_context` + `generate_entry_id`, since sibling entries aren't necessarily already persisted). `NXroot` is mapped to `basesections.Experiment` in `BASESECTIONS_MAP` for this. Entry names are disambiguated as `"{file stem} - {entry name}"` so that groups named `entry`/`entry1` (by far the most common case) stay distinguishable across files; the Root archive itself is named `"{file stem} (NeXus file)"`.

2. Annotation-driven matching replaces XML/NXDL resolution at parse time
`parser_v2.py` does not re-derive the schema from NXDL at parse time at all. Instead, it builds a `_SectionIndex` per generated `Section` class once, from the `a_nexus_group`/`a_nexus_field` annotations already baked into the Phase 1+2 classes (`NeXusGroup.nx_class` → `SubSection`, field concept name → `Quantity`), and uses `NexusSchemaResolver` only for HDF5-side concept matching (variadic names, `NXdata` signal/axes hints). With the new schema, paths read more naturally (e.g. `data.instrument[0].energy_resolution.resolution`, not `data.ENTRY[0].INSTRUMENT[0].energy_resolution.resolution__field`).

3. A new search app, `nexus_app_v2`
Separate from the existing NeXus app, but with the same layout. The columns/menu/dashboard are built against the v2 paths (no `__field`/`__group` suffixes). Locked to `section_defs.definition_qualified_name = pynxtools.nomad.metainfo.base_classes.Entry`, so it only ever shows v2-parsed entries (and the application definitions that inherit from `Entry`).

4. Link quantities and conflict-renamed quantities are populated correctly at parse time
Two parser-level fixes were needed for the typed-link resolution and BaseSection-collision renaming (both already generator-side in #801) to actually produce correct values. First, a `NeXusLink` quantity whose target type was resolved to something other than `str` is now populated like a regular field (h5py already dereferences the link transparently) instead of storing the raw HDF5 link path as a string. Second, a quantity renamed for a `BaseSection` collision (e.g. `name_quantity` for an NXDL field literally named `name`) now also mirrors its value into the NOMAD-native quantity it shadows (e.g. `name`) via a `_SectionIndex.shadow_map`, so both the NXDL-faithful path and the BaseSection-native path (used by `normalize()`, search, etc.) carry the value.

5. `results.material` / chemical formula normalization wired up for v2 archives
`Entry`/`Root` instances now populate `results.material` (chemical formula, elements) during normalization, matching parser v1's behavior. This is needed for the new app's "Elements" menu (periodic table, formula filters) to return anything.

What is NOT in this PR (deferred, see #801's phase table)
- `pynxtools-*` plugin apps to the v2 schema/parser
- `NORMALIZER_MAP` logic to `normalize()` methods
- `metainfo_to_nxdl.py` round-trip exporter
- `schema.py` / `parser.py` / `NexusBaseSection` / `NORMALIZER_MAP`