
carstenartur/Taxonomy


Taxonomy Architecture Analyzer


Turn a business requirement into a validated architecture view — in one step.

Describe what you need in plain English. The Taxonomy Architecture Analyzer scores every node in the C3 Taxonomy Catalogue (~2,500 elements across 8 architecture layers) using AI, discovers architecture relations, and generates exportable diagrams — with full names, impact hotspots, and clear layer labels visible at every step.

flowchart LR
    A["✏️ Requirement"] --> B["🤖 AI Analysis"]
    B --> C["🌳 Scored Tree"]
    C --> D["🏗️ Architecture View"]
    D --> E["📐 Export"]

    style A fill:#e3f2fd
    style B fill:#fff3e0
    style C fill:#e8f5e9
    style D fill:#f3e5f5
    style E fill:#fce4ec

The AI analysis step is hierarchical — it scores root categories first, then distributes each root's relevance budget into its children at every taxonomy level. This produces a scoring trace (how relevance narrows from root to leaf) and an architecture impact view (which layers and relations are affected). See the showcase below for a worked example.


Architecture Impact Showcase

"Provide an integrated communication platform for hospital staff, enabling real-time voice and data exchange between departments, with a clinical dashboard application for patient handoff tracking and team coordination."

How scoring works — hierarchical, not isolated:

The system does not match a few keywords to final nodes. Instead, scoring progresses hierarchically through the taxonomy — the AI first evaluates each of the 8 root categories (Capabilities, Communications Services, Core Services, …), then distributes each root's relevance budget into its children at every taxonomy level. Intermediate nodes also receive scores and carry architectural meaning. This produces two complementary results:

  1. Scoring Trace — how relevance narrows step-by-step from root → intermediate → leaf (example paths shown below)
  2. Architecture Impact — which concrete elements and cross-layer relations are affected (the graph and table below)

What the system found for this requirement:

| Scoring path (root → leaf) | Root budget | Leaf result | Why |
|---|---|---|---|
| CP → CP-1000 → CP-1023 Communication and Information System Capabilities | 92% | 85% | Direct communication capability match |
| CO → CO-1000 → CO-1011 Communications Access Services | 88% | 80% | Real-time voice/data exchange |
| CR → CR-1000 → CR-1047 Infrastructure Services | 81% | 75% | Infrastructure supporting the platform |
| UA → UA-1000 → UA-1179 → UA-1574 Unified Communication Applications | 74% | 62% | Hospital staff unified comms application |
| BP → BP-1000 → BP-1327 → BP-1490 → BP-1697 Medical Command, Control And Communication | 71% | 52% | Clinical workflow and coordination |
| IP → IP-1000 → IP-1078 → IP-1023 → IP-2106 CIS Coordination | 60% | 38% | Data exchange coordination products |
| BR → BR-1000 → BR-1057 → BR-1023 → BR-1334 CIS Coordination and Advice Roles | 55% | 32% | Staff coordination roles |

Each root score is a budget distributed into children. A leaf score of 85% under a 92% root means that child consumed most of the parent's relevance. Intermediate nodes (CP-1000, CO-1000, CR-1000, …) also received scores — they are not just structural pass-throughs.
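The narrowing claim can be checked mechanically on any single path. The minimal Java sketch below uses a hypothetical `ScoringPathCheck` class (not part of the project) and the CP path scores from the scoring trace (92% → 90% → 85%). It verifies that no node scores higher than its parent and reports what fraction of the root score reaches the leaf.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the Taxonomy codebase: checks the
// "hierarchical narrowing" property on one root-to-leaf scoring path.
final class ScoringPathCheck {

    // One step on a path: a taxonomy code and its relevance score (0-100).
    record Step(String code, int score) {}

    // True if no node on the path scores higher than its parent.
    static boolean isNarrowing(List<Step> path) {
        for (int i = 1; i < path.size(); i++) {
            if (path.get(i).score() > path.get(i - 1).score()) {
                return false;
            }
        }
        return true;
    }

    // Fraction of the root score that is still present at the leaf.
    static double leafRetention(List<Step> path) {
        int root = path.get(0).score();
        int leaf = path.get(path.size() - 1).score();
        return (double) leaf / root;
    }

    public static void main(String[] args) {
        // Scores of the CP path as listed in the scoring trace.
        List<Step> cp = List.of(
                new Step("CP", 92),
                new Step("CP-1000", 90),
                new Step("CP-1023", 85));
        System.out.println(String.format(Locale.ROOT,
                "narrowing=%b, leaf retains %.2f of the root score",
                isNarrowing(cp), leafRetention(cp)));
    }
}
```

For the CP path the check passes and the leaf retains about 0.92 of the root score (85 / 92).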

About leaf specificity: The scoring trace shows how the system navigates from abstract roots to the most specific available node.

The result spans 7 architecture layers with traceable relations, ⚠ impact hotspots (≥ 80%), and exportable diagrams. The graph below was generated by the real pipeline — not hand-crafted.

6 concrete architecture elements across 7 layers, connected by 17 traced relations — generated by the real pipeline.

flowchart TD
    subgraph Capabilities["🔵 Capabilities"]
        CP_1023(["Communication and Information System Ca…<br/>★ ⚠ 85%"])
    end
    subgraph Business_Roles["🟢 Roles"]
        BR["Business Roles<br/>61%"]
    end
    subgraph Business_Processes["🟢 Processes"]
        BP_1490["Health Services<br/>58%"]
    end
    subgraph Core_Services["🟠 Core Services"]
        CR_1047(["Infrastructure Services<br/>★ 75%"])
    end
    subgraph COI_Services["🟠 COI Services"]
        CI["COI Services<br/>74%"]
    end
    subgraph User_Applications["🟣 Applications"]
        UA_1574["Unified Communication Applications<br/>62%"]
    end
    subgraph Communications_Services["🔴 Communications"]
        CO_1011(["Communications Access Services<br/>★ ⚠ 80%"])
        CO_1050["Transit Services<br/>55%"]
    end
    CP_1023 -->|realizes| CO_1011
    CP_1023 -->|realizes| CR_1047
    CO_1011 -->|depends on| CR_1047
    CO_1011 -->|supports| CR_1047
    CR_1047 -->|fulfills| CP_1023
    CR_1047 -->|supports| UA_1574
    UA_1574 -->|uses| CO_1011
    UA_1574 -->|uses| CR_1047
    CO_1011 -->|supports| BP_1490
    CR_1047 -->|supports| BP_1490
    UA_1574 -->|supports| BP_1490
    CP_1023 -->|realizes| CO_1050
    classDef cap fill:#4A90D9,color:#fff,stroke:#2171B5
    classDef proc fill:#27AE60,color:#fff,stroke:#1E8449
    classDef role fill:#27AE60,color:#fff,stroke:#1E8449
    classDef svc fill:#F39C12,color:#fff,stroke:#D68910
    classDef app fill:#8E44AD,color:#fff,stroke:#6C3483
    classDef info fill:#3498DB,color:#fff,stroke:#2980B9
    classDef comm fill:#E74C3C,color:#fff,stroke:#C0392B
    classDef hotspot fill:#D32F2F,color:#fff,stroke:#B71C1C,stroke-width:3px
    class CP_1023 cap
    class CP_1023 hotspot
    class BR role
    class BP_1490 proc
    class CR_1047 svc
    class CI svc
    class UA_1574 app
    class CO_1011 comm
    class CO_1011 hotspot
    class CO_1050 comm

Legend: ★ = direct match (anchor) · ⚠ = impact hotspot (≥ 80%) · Rounded nodes = anchors/hotspots · % = relevance score · Arrow labels = relation type
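The legend rules are simple enough to state as code. This is a sketch under assumptions: `NodeBadge` and its methods are hypothetical names, and only the ≥ 80% hotspot threshold and the ★/⚠ markers come from the legend. It rebuilds the score labels shown in the diagram, for example ★ ⚠ 85% for CP-1023 and ★ 75% for CR-1047.

```java
// Hypothetical sketch (not the project's code) of the legend rules:
// a star marks a direct-match anchor, a warning sign marks an impact
// hotspot (score >= 80).
final class NodeBadge {

    // Threshold stated in the legend: "impact hotspot (>= 80%)".
    static final int HOTSPOT_THRESHOLD = 80;

    static boolean isHotspot(int score) {
        return score >= HOTSPOT_THRESHOLD;
    }

    // Builds the text shown under a node name, e.g. "★ ⚠ 85%".
    static String badge(boolean anchor, int score) {
        StringBuilder sb = new StringBuilder();
        if (anchor) {
            sb.append("\u2605 "); // star: direct match (anchor)
        }
        if (isHotspot(score)) {
            sb.append("\u26A0 "); // warning sign: impact hotspot
        }
        return sb.append(score).append('%').toString();
    }

    public static void main(String[] args) {
        // Values from the diagram: CP-1023, CO-1011, CR-1047, BP-1490.
        System.out.println(badge(true, 85));
        System.out.println(badge(true, 80));
        System.out.println(badge(true, 75));
        System.out.println(badge(false, 58));
    }
}
```

CO-1011 at exactly 80% still receives ⚠, because the legend's threshold is inclusive.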

Scoring trace — how relevance narrows from root to leaf

The LLM does not score leaf nodes in isolation. It first evaluates each root category (distributing a total relevance budget), then recursively narrows the score into child nodes at each taxonomy level. Every intermediate node receives a score and carries architectural meaning — the result is hierarchical narrowing, not isolated leaf matching.

Scoring Path Score Role
CP Capabilities 92% Root category
 ├ CP-1000 Capabilities 90% Intermediate (L1) — narrows 92%
  ├ CP-1023 Communication and Information System Capabilities 85% Leaf — narrowed from 90%
  ├ CP-1010 Battlespace Management Capabilities 40% Intermediate (L2) — narrows 90%
   └ CP-1030 Cyberspace Battlespace Management Capabilities 30% Leaf — narrowed from 40%
CO Communications Services 88% Root category
 ├ CO-1000 Communications Services 86% Intermediate (L1) — narrows 88%
  ├ CO-1011 Communications Access Services 80% Leaf — narrowed from 86%
  ├ CO-1063 Transport Services 70% Intermediate (L2) — narrows 86%
   ├ CO-1050 Transit Services 55% Intermediate (L3) — narrows 70%
    └ CO-1019 Frame Switching Services 52% Leaf — narrowed from 55%
CR Core Services 81% Root category
 ├ CR-1000 Core Services 79% Intermediate (L1) — narrows 81%
  ├ CR-1047 Infrastructure Services 75% Intermediate (L2) — narrows 79%
   ├ CR-1039 Infrastructure CIS Security Services 52% Intermediate (L3) — narrows 75%
    └ CR-1021 Digital Certificate Services 48% Leaf — narrowed from 52%
UA User Applications 74% Root category
 ├ UA-1000 User Applications 72% Intermediate (L1) — narrows 74%
  ├ UA-1179 Communication and Collaboration Applications 68% Intermediate (L2) — narrows 72%
   └ UA-1574 Unified Communication Applications 62% Leaf — narrowed from 68%
BP Business Processes 71% Root category
 ├ BP-1000 Business Processes 69% Intermediate (L1) — narrows 71%
  ├ BP-1327 Enable 65% Intermediate (L2) — narrows 69%
   ├ BP-1490 Health Services 58% Intermediate (L3) — narrows 65%
    └ BP-1697 Medical Command, Control And Communication 52% Leaf — narrowed from 58%
IP Information Products 60% Root category
 ├ IP-1000 Information Products 58% Intermediate (L1) — narrows 60%
  ├ IP-1078 Operation Enabling Information Products 48% Intermediate (L2) — narrows 58%
   ├ IP-1023 CIS Information Products 42% Intermediate (L3) — narrows 48%
    └ IP-2106 CIS Coordination 38% Leaf — narrowed from 42%
BR Business Roles 55% Root category
 ├ BR-1000 Business Roles 53% Intermediate (L1) — narrows 55%
  ├ BR-1057 Functional Military Roles 45% Intermediate (L2) — narrows 53%
   ├ BR-1023 CIS Staff Roles 38% Intermediate (L3) — narrows 45%
    └ BR-1334 CIS Coordination and Advice Roles 32% Leaf — narrowed from 38%
CI COI Services 45% Root category
 └ (no leaf nodes scored above threshold)

Each root score is the budget that the LLM distributes among its children. A leaf score of 85% under a root of 92% means that child consumed most of the parent's relevance.
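The following is a minimal sketch of top-down narrowing. It assumes a tree of nodes and a hypothetical `share` function that stands in for the LLM's per-child judgement. The trace suggests that sibling scores do not have to add up to the parent: CP-1023 at 85% and CP-1010 at 40% both sit under CP-1000 at 90%. The sketch therefore treats the parent score as an upper bound rather than a pool to split. `HierarchicalNarrowing`, the share values, and the pruning threshold of 25 are all invented for illustration, since the README does not state the real threshold.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Illustrative sketch only: top-down narrowing where each child's score is
// bounded by its parent's. The "share" function is a hypothetical stand-in
// for the LLM's judgement; it is not the project's scoring code.
final class HierarchicalNarrowing {

    record Node(String code, List<Node> children) {
        static Node leaf(String code) {
            return new Node(code, List.of());
        }
    }

    // Visits the tree depth-first, records each node's score, and skips
    // branches whose score falls below the threshold.
    static void narrow(Node node, int score, ToDoubleFunction<Node> share,
                       int threshold, Map<String, Integer> out) {
        if (score < threshold) {
            return; // prune: this node and its subtree are not reported
        }
        out.put(node.code(), score);
        for (Node child : node.children()) {
            // Clamp the share to [0, 1] so a child never outscores its parent.
            double s = Math.max(0.0, Math.min(1.0, share.applyAsDouble(child)));
            narrow(child, (int) Math.round(score * s), share, threshold, out);
        }
    }

    // The CP branch of the scoring trace.
    static Node cpSubtree() {
        return new Node("CP", List.of(
                new Node("CP-1000", List.of(
                        Node.leaf("CP-1023"),
                        new Node("CP-1010", List.of(Node.leaf("CP-1030")))))));
    }

    public static void main(String[] args) {
        // Hypothetical shares, chosen so the results land on the trace's CP numbers.
        Map<String, Double> shares = Map.of(
                "CP-1000", 0.98, "CP-1023", 0.945,
                "CP-1010", 0.445, "CP-1030", 0.75);
        Map<String, Integer> scores = new LinkedHashMap<>();
        narrow(cpSubtree(), 92, n -> shares.getOrDefault(n.code(), 0.0), 25, scores);
        System.out.println(scores);
    }
}
```

With these made-up shares, the sketch prints the CP branch of the trace: {CP=92, CP-1000=90, CP-1023=85, CP-1010=40, CP-1030=30}. Raising the threshold to 50 drops CP-1010 together with its child. This is the kind of pruning that produces the "no leaf nodes scored above threshold" line for CI.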

Pipeline details — included elements and relationships

Included Elements — selected by the pipeline (anchors + propagated + enriched leaf nodes):

| Code | Name | Layer | Relevance | Path | Role | Included because |
|---|---|---|---|---|---|---|
| CP | Capabilities | Capabilities | 92% | CP (root) | ★ Anchor | direct-match |
| CP-1000 | Capabilities | Capabilities | 90% | CP > CP-1000 | ★ Anchor | direct-match |
| CO | Communications Services | Communications Services | 88% | CO (root) | ★ Anchor | direct-match |
| CO-1000 | Communications Services | Communications Services | 86% | CO > CO-1000 | ★ Anchor | direct-match |
| CP-1023 | Communication and Information System Capabilities | Capabilities | 85% | CP > CP-1000 > CP-1023 | ★ Anchor | direct-match |
| CR | Core Services | Core Services | 81% | CR (root) | ★ Anchor | direct-match |
| CO-1011 | Communications Access Services | Communications Services | 80% | CO > CO-1000 > CO-1011 | ★ Anchor | direct-match |
| CR-1000 | Core Services | Core Services | 79% | CR > CR-1000 | ★ Anchor | direct-match |
| CR-1047 | Infrastructure Services | Core Services | 75% | CR > CR-1000 > CR-1047 | ★ Anchor | direct-match |
| UA | User Applications | User Applications | 74% | UA (root) | ★ Anchor | direct-match |
| UA-1000 | User Applications | User Applications | 72% | UA > UA-1000 | ★ Anchor | direct-match |
| BP | Business Processes | Business Processes | 71% | BP (root) | ★ Anchor | direct-match |
| CO-1063 | Transport Services | Communications Services | 70% | CO > CO-1000 > CO-1063 | ★ Anchor | direct-match |
| CI | COI Services | COI Services | 74% | CI (root) | Propagated | propagated via REALIZES from CP |
| UA-1179 | Communication and Collaboration Applications | User Applications | 68% | UA > UA-1000 > UA-1179 | Enriched leaf | leaf-enrichment: top-scoring in UA |
| BP-1327 | Enable | Business Processes | 65% | BP > BP-1000 > BP-1327 | Enriched leaf | leaf-enrichment: top-scoring in BP |
| UA-1574 | Unified Communication Applications | User Applications | 62% | UA > UA-1000 > UA-1179 > UA-1574 | Enriched leaf | leaf-enrichment: top-scoring in UA |
| BR | Business Roles | Business Roles | 61% | BR (root) | Propagated | propagated via SUPPORTS from CR |
| BP-1490 | Health Services | Business Processes | 58% | BP > BP-1000 > BP-1327 > BP-1490 | Enriched leaf | leaf-enrichment: top-scoring in BP |
| CO-1050 | Transit Services | Communications Services | 55% | CO > CO-1000 > CO-1063 > CO-1050 | Enriched leaf | leaf-enrichment: top-scoring in CO |

Impact Relationships — concrete cross-category architecture connections:

| Source | Target | Relation type | Relevance | Derived from |
|---|---|---|---|---|
| CP-1023 Communication and Information System Capabilities | CR-1047 Infrastructure Services | REALIZES | 75% | impact: CP-1023 → CR-1047 (derived from CP → CR) |
| CR-1047 Infrastructure Services | CP-1023 Communication and Information System Capabilities | FULFILLS | 75% | impact: CR-1047 → CP-1023 (derived from CR → CP) |
| CR-1047 Infrastructure Services | UA-1574 Unified Communication Applications | SUPPORTS | 62% | impact: CR-1047 → UA-1574 (derived from CR → UA) |
| UA-1574 Unified Communication Applications | CR-1047 Infrastructure Services | USES | 62% | impact: UA-1574 → CR-1047 (derived from UA → CR) |
| CR-1047 Infrastructure Services | BP-1490 Health Services | SUPPORTS | 58% | impact: CR-1047 → BP-1490 (derived from CR → BP) |
| UA-1574 Unified Communication Applications | BP-1490 Health Services | SUPPORTS | 58% | impact: UA-1574 → BP-1490 (derived from UA → BP) |
| CP-1023 Communication and Information System Capabilities | CO-1050 Transit Services | REALIZES | 55% | impact: CP-1023 → CO-1050 (derived from CP → CO) |
| CO-1050 Transit Services | BP-1490 Health Services | SUPPORTS | 55% | impact: CO-1050 → BP-1490 (derived from CO → BP) |
| CO-1050 Transit Services | CR-1047 Infrastructure Services | DEPENDS_ON | 55% | impact: CO-1050 → CR-1047 (derived from CO → CR) |
| CO-1050 Transit Services | CR-1047 Infrastructure Services | SUPPORTS | 55% | impact: CO-1050 → CR-1047 (derived from CO → CR) |
| UA-1574 Unified Communication Applications | CO-1050 Transit Services | USES | 55% | impact: UA-1574 → CO-1050 (derived from UA → CO) |
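The "Derived from" column suggests that each element-level relation is lifted from a root-level relation between the two elements' root categories. The sketch below shows one way to do that. `ImpactDerivation` and its types are hypothetical, and scoring a relation by the weaker endpoint (`Math.min`) is an assumption, although it matches every relevance value in the table above. Fed the five elements from that table and all eighteen root relations from the trace table below, the sketch yields exactly the eleven rows shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch, not the project's implementation: lift root-level
// relations (e.g. CP -> CR REALIZES) to the concrete elements selected under
// those roots. Using min() of the two endpoint scores is an assumption.
final class ImpactDerivation {

    enum RelationType { REALIZES, FULFILLS, SUPPORTS, USES, DEPENDS_ON }

    record RootRelation(String sourceRoot, String targetRoot, RelationType type) {}

    record Impact(String source, String target, RelationType type, int relevance) {}

    // Root category of a code: the prefix before the first '-', e.g. "CP-1023" -> "CP".
    static String rootOf(String code) {
        int dash = code.indexOf('-');
        return dash < 0 ? code : code.substring(0, dash);
    }

    static List<Impact> derive(Map<String, Integer> elements, List<RootRelation> rootRelations) {
        List<Impact> impacts = new ArrayList<>();
        for (RootRelation rel : rootRelations) {
            for (Map.Entry<String, Integer> src : elements.entrySet()) {
                if (!rootOf(src.getKey()).equals(rel.sourceRoot())) {
                    continue;
                }
                for (Map.Entry<String, Integer> tgt : elements.entrySet()) {
                    if (rootOf(tgt.getKey()).equals(rel.targetRoot())) {
                        // Assumed rule: a relation is only as relevant as its weaker endpoint.
                        int relevance = Math.min(src.getValue(), tgt.getValue());
                        impacts.add(new Impact(src.getKey(), tgt.getKey(), rel.type(), relevance));
                    }
                }
            }
        }
        return impacts;
    }

    public static void main(String[] args) {
        // Concrete elements and scores from the impact table.
        Map<String, Integer> elements = Map.of(
                "CP-1023", 85, "CR-1047", 75, "UA-1574", 62, "BP-1490", 58, "CO-1050", 55);
        // A few root-level relations from the trace table.
        List<RootRelation> roots = List.of(
                new RootRelation("CP", "CR", RelationType.REALIZES),
                new RootRelation("CR", "UA", RelationType.SUPPORTS),
                new RootRelation("CO", "BP", RelationType.SUPPORTS),
                new RootRelation("CP", "CI", RelationType.REALIZES)); // no CI element: nothing derived
        derive(elements, roots).forEach(System.out::println);
    }
}
```

The CP → CI relation produces nothing here, because no concrete CI element was selected. This mirrors the table, where CI appears only at root level.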

Trace Relationships — root-level propagation for scoring traceability:

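| Source | Target | Relation type | Propagated relevance | Hop |
|---|---|---|---|---|
| CP | CI | REALIZES | 74% | 1 |
| CP | CO | REALIZES | 74% | 1 |
| CP | CR | REALIZES | 74% | 1 |
| CO | BP | SUPPORTS | 66% | 1 |
| CO | CR | SUPPORTS | 66% | 1 |
| CR | BP | SUPPORTS | 61% | 1 |
| CR | BR | SUPPORTS | 61% | 1 |
| CR | UA | SUPPORTS | 61% | 1 |
| CR | CP | FULFILLS | 57% | 1 |
| UA | BP | SUPPORTS | 55% | 1 |
| CO | CR | DEPENDS_ON | 53% | 1 |
| CR | CI | DEPENDS_ON | 49% | 1 |
| UA | CI | USES | 48% | 1 |
| UA | CO | USES | 48% | 1 |
| UA | CR | USES | 48% | 1 |
| CI | BP | SUPPORTS | 39% | 2 |
| CI | BR | SUPPORTS | 39% | 2 |
| CI | CP | FULFILLS | 36% | 2 |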
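The Hop column suggests breadth-first propagation. Relations leaving the anchor roots (CP, CO, CR, UA, BP) are hop 1. Relations leaving CI, which is reached only through CP → CI, are hop 2. The sketch below shows that mechanism under stated assumptions. `RelevancePropagation` and its per-type weights are hypothetical, so the hop numbers follow the same pattern as the table while the percentages do not necessarily match it. The README does not document the real weighting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch, not the project's algorithm: breadth-first propagation
// from anchor roots, recording one trace row per traversed relation and hop.
// The per-type weights are invented for this example.
final class RelevancePropagation {

    enum RelationType { REALIZES, FULFILLS, SUPPORTS, USES, DEPENDS_ON }

    record Edge(String source, String target, RelationType type) {}

    record TraceRow(String source, String target, RelationType type, int relevance, int hop) {}

    // Hypothetical attenuation factor per relation type.
    static final Map<RelationType, Double> WEIGHT = Map.of(
            RelationType.REALIZES, 0.8,
            RelationType.SUPPORTS, 0.75,
            RelationType.FULFILLS, 0.7,
            RelationType.USES, 0.65,
            RelationType.DEPENDS_ON, 0.6);

    static List<TraceRow> propagate(Map<String, Integer> anchors, List<Edge> edges, int maxHops) {
        Map<String, Integer> best = new HashMap<>(anchors);
        Set<String> frontier = new LinkedHashSet<>(anchors.keySet());
        List<TraceRow> trace = new ArrayList<>();
        for (int hop = 1; hop <= maxHops && !frontier.isEmpty(); hop++) {
            // Read source scores from a snapshot so edge order within a hop does not matter.
            Map<String, Integer> snapshot = Map.copyOf(best);
            Set<String> next = new LinkedHashSet<>();
            for (Edge e : edges) {
                if (!frontier.contains(e.source())) {
                    continue;
                }
                int value = (int) Math.round(snapshot.get(e.source()) * WEIGHT.get(e.type()));
                trace.add(new TraceRow(e.source(), e.target(), e.type(), value, hop));
                if (!best.containsKey(e.target())) {
                    next.add(e.target()); // first reached now: expand it on the next hop
                }
                best.merge(e.target(), value, Math::max); // never lowers an existing score
            }
            frontier = next;
        }
        return trace;
    }

    public static void main(String[] args) {
        // Anchor roots and their direct scores, as in the included-elements table.
        Map<String, Integer> anchors = Map.of("CP", 92, "CO", 88, "CR", 81, "UA", 74, "BP", 71);
        List<Edge> edges = List.of(
                new Edge("CP", "CI", RelationType.REALIZES),
                new Edge("CR", "BR", RelationType.SUPPORTS),
                new Edge("CI", "BP", RelationType.SUPPORTS));
        propagate(anchors, edges, 2).forEach(System.out::println);
    }
}
```

In the example run, CP → CI and CR → BR are recorded at hop 1 and CI → BP at hop 2. BP keeps its own anchor score, because the merge step only ever raises a node's value.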

Interactive views — the same requirement analyzed in the web UI:

The scored taxonomy tree (left) shows hierarchical scoring across all 8 root categories — colour-coded by relevance, with every intermediate and leaf node carrying its own score. The architecture impact view (right) shows the derived multi-layer result with swimlanes, anchors (★), hotspots (⚠), and the detail table.


Scored taxonomy tree — each node shows its relevance score, colour-coded from green (high) to grey (low). Root, intermediate, and leaf nodes all carry scores.


Architecture impact view — swimlane layout groups elements by layer. The detail table below lists every included element with its hierarchy path, relevance, and inclusion reason.


Core Workflow (UI)

| Step | What you do | What happens |
|---|---|---|
| 1 | Enter a requirement in the analysis panel | Free-text input |
| 2 | Click Analyze with AI | AI scores every taxonomy node (0–100) |
| 3 | Explore the scored tree | Colour-coded results across 6 view modes |
| 4 | Review relations and proposals | Accept/reject AI-generated architecture relations |
| 5 | Export | One-click export to ArchiMate XML, Visio, Mermaid, or JSON |


Key Features

| Area | Capabilities |
|---|---|
| Analysis | AI-scored taxonomy mapping · semantic, hybrid, and graph search · relevance propagation · full node names and layer labels |
| Architecture | Interactive impact maps with ★ anchors and ⚠️ hotspots · relation proposals with review workflow · gap analysis · pattern detection |
| Graph | Upstream/downstream exploration · failure-impact analysis · requirement impact |
| DSL | Text-based architecture DSL · JGit-backed versioning with branching and merge |
| Export | ArchiMate 3.x XML · Visio .vsdx · Mermaid · JSON · Reports (Markdown, HTML, DOCX) |

Installation

Prerequisites

| Requirement | Notes |
|---|---|
| Java 21+ | JDK for building, JRE for running |
| Maven 3.9+ | Build only |
| LLM API key or LLM_PROVIDER=LOCAL_ONNX | Required for AI analysis; browsing and search work without it |
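In the Run locally commands below, the three modes differ only in the environment variables set before `mvn spring-boot:run`. A hypothetical helper could mirror that as a pre-flight check. It is a sketch, not the application's startup code: it recognises only `GEMINI_API_KEY` and `LLM_PROVIDER=LOCAL_ONNX`, and it deliberately ignores the other providers covered in the AI Providers guide.

```java
import java.util.Map;

// Hypothetical pre-flight helper (not part of the project): predicts which of
// the three "Run locally" variants an environment corresponds to. Only the two
// variables named in this README are considered.
final class LaunchModeCheck {

    enum Mode { LOCAL_ONNX, GEMINI, BROWSE_ONLY }

    static Mode resolve(Map<String, String> env) {
        if ("LOCAL_ONNX".equals(env.get("LLM_PROVIDER"))) {
            return Mode.LOCAL_ONNX; // fully offline analysis
        }
        String key = env.get("GEMINI_API_KEY");
        if (key != null && !key.isBlank()) {
            return Mode.GEMINI; // default provider with an API key
        }
        return Mode.BROWSE_ONLY; // browsing and search only, no AI analysis
    }

    public static void main(String[] args) {
        System.out.println("Expected mode: " + resolve(System.getenv()));
    }
}
```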

Where to start

| Your goal | Recommended option |
|---|---|
| 🚀 Quickest way to try it | Container Image: one docker run command |
| 🏢 Production deployment | Docker + HTTPS: Caddy reverse proxy with automatic TLS |
| 🛠️ Development & contribution | Run locally: Maven + JDK |

Run locally (development only)

git clone https://github.com/carstenartur/Taxonomy.git
cd Taxonomy

# Build the sibling modules first (required once, or after changes)
mvn install -DskipTests

# Then start the application from the app module
cd taxonomy-app

# With Gemini (default)
GEMINI_API_KEY=your-key mvn spring-boot:run

# Fully offline (no API key)
LLM_PROVIDER=LOCAL_ONNX mvn spring-boot:run

# Browse-only (no AI analysis)
mvn spring-boot:run

Open http://localhost:8080 and log in with admin / admin.

⚠️ localhost only. The commands above start an unencrypted HTTP server for local development. Never expose port 8080 to the internet. For any non-local deployment, use the Docker + HTTPS setup below.

→ Now follow the Core Workflow above to run your first analysis.

Container Image

The official Docker image is published to GitHub Container Registry on every push to main:

ghcr.io/carstenartur/taxonomy

Pull and run (quick start):

docker pull ghcr.io/carstenartur/taxonomy:latest
docker run -p 8080:8080 ghcr.io/carstenartur/taxonomy:latest
# Open http://localhost:8080 — never expose port 8080 to the internet

| Tag | Example | Description |
|---|---|---|
| latest | ghcr.io/carstenartur/taxonomy:latest | Most recent build from the default branch (main) |
| main | ghcr.io/carstenartur/taxonomy:main | Identical to latest (branch-name tag) |
| sha-<hash> | ghcr.io/carstenartur/taxonomy:sha-abc1234 | Pinned to a specific commit; use for reproducible deployments |
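For reproducible deployments, the sha-<hash> tag can be derived from a commit hash. The following is a hypothetical helper. It assumes the tag embeds the 7-character short hash, as the example sha-abc1234 suggests.

```java
// Hypothetical helper (not part of the project) that builds a commit-pinned
// image reference. It assumes the tag uses the 7-character short hash, as the
// example tag sha-abc1234 suggests.
final class ImageTag {

    static final String IMAGE = "ghcr.io/carstenartur/taxonomy";

    static String pinnedReference(String commitSha) {
        if (commitSha == null || !commitSha.matches("[0-9a-f]{7,40}")) {
            throw new IllegalArgumentException("expected a lowercase hex commit hash: " + commitSha);
        }
        return IMAGE + ":sha-" + commitSha.substring(0, 7);
    }

    public static void main(String[] args) {
        // Pass a full or short commit hash, e.g. the output of `git rev-parse HEAD`.
        String sha = args.length > 0 ? args[0] : "abc1234";
        System.out.println("docker pull " + pinnedReference(sha));
    }
}
```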

See the Container Image Guide for Docker Compose usage, environment variables, volume mounts, and upgrade notes.

Docker (production — with HTTPS)

For any deployment beyond localhost, use Docker Compose with a reverse proxy that provides automatic HTTPS. The repository includes a ready-to-use docker-compose.prod.yml with Caddy for automatic TLS certificate provisioning:

# 1. Clone and configure
git clone https://github.com/carstenartur/Taxonomy.git
cd Taxonomy
cp .env.example .env          # edit .env with your domain and API key

# 2. Start (HTTPS on port 443, automatic Let's Encrypt certificate)
docker compose -f docker-compose.prod.yml up -d

Open https://your-domain.example.com and log in with the password you set in .env.

See the Deployment Guide for VPS, Render.com, and cloud deployment instructions, alternative reverse proxies (nginx), and Spring Boot native SSL.

Docker without HTTPS (local testing only):

docker run -p 8080:8080 -e LLM_PROVIDER=LOCAL_ONNX ghcr.io/carstenartur/taxonomy:latest
# Access at http://localhost:8080 — never expose this to the internet

Build & Test

mvn compile           # Compile only
mvn test              # Unit + Spring context tests (no Docker needed)
mvn verify            # Unit + integration tests (requires Docker)

Advanced: REST API (Automation & Integration)


The primary way to use this product is through the web-based GUI (see Core Workflow above).

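For scripting, CI pipelines, and system integration, a REST API is available. The API Reference and Curl Examples documents listed under Documentation describe it with request/response and end-to-end examples.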

Note: The REST API is not intended as a replacement for the GUI for end-user workflows. All user-facing features are designed to be used through the web interface first.


Repository Structure

Taxonomy/
├── taxonomy-domain/     # Pure domain types (DTOs, enums) — no framework dependencies
├── taxonomy-dsl/        # Architecture DSL: parser, serializer, validator, differ
├── taxonomy-export/     # Export formats: ArchiMate, Visio, Mermaid, Diagram
├── taxonomy-app/        # Spring Boot application: REST API, services, persistence, UI
├── docs/                # Documentation and auto-generated screenshots
└── pom.xml              # Parent POM (4 modules, Spring Boot 4, Java 21)

Documentation

Essential guides

| Document | Description |
|---|---|
| User Guide | End-user guide with screenshots and workflow walkthroughs |
| Examples | Worked examples for analysis, impact, proposals, export |
| Deployment | Docker, Render.com, health checks |
| Security | Authentication, roles, permissions, deployment hardening |
| Architecture | System design, modules, DSL storage, pipelines |
| Developer Guide | Module architecture, testing, extending the system |
| Workspace Versioning | Context bar, variants, sync, merge, cherry-pick |
| AI Providers | Supported LLM providers and configuration |
All documentation (configuration, operations, integrations, and more):

| Document | Description |
|---|---|
| Concepts & Glossary | Key terms and domain model |
| Document Import | PDF/DOCX import, candidate extraction, source provenance |
| Framework Import | Import APQC, ArchiMate, C4, UAF frameworks |
| Preferences | Runtime preferences (LLM, DSL/Git, size limits) |
| API Reference | REST API quick-reference with request/response examples |
| Curl Examples | End-to-end automation examples |
| Configuration | Environment variables and settings |
| Container Image | GHCR image, Docker Compose, tags, volumes, upgrades |
| Deployment Checklist | Pre-deployment verification checklist |
| Database Setup | PostgreSQL, MSSQL, Oracle configuration |
| Keycloak & SSO | SSO/OIDC/SAML integration with Keycloak |
| Keycloak Migration | Migrating from form-login to Keycloak/OIDC |
| Operations Guide | Operational procedures and monitoring |
| Git Integration | JGit DFS repository, branching, REST endpoints |
| Repository Topology | Workspace provisioning, topology modes, sync |
| Relation Seeds | Seed data format, provenance, CSV schema |
| Feature Matrix | Feature completeness tracking (GUI, REST, docs, i18n) |
| UI Gap Analysis | JavaScript module inventory and workspace UI status |
| AI Transparency | AI/LLM usage transparency documentation |
| Data Protection | GDPR and data protection compliance |
| Knowledge Conservation | Use case: architecture knowledge preservation |

Government Readiness (Behördentauglichkeit)

The Taxonomy Architecture Analyzer includes comprehensive documentation for deployment in German government and public administration environments. See also: Security, Data Protection, AI Transparency, Deployment Checklist, and Knowledge Conservation in the Documentation table above.

| Document | Description |
|---|---|
| BSI KI Checklist | BSI criteria checklist for AI models in federal administration |
| AI Literacy Concept | Training concept per EU AI Act Art. 4 (AI Literacy) |
| Accessibility / BITV 2.0 | BITV 2.0 / WCAG 2.1 accessibility concept and action plan |
| Digital Sovereignty | Digital sovereignty, openCode compatibility, DVC architecture |
| Administration Integration | FIM / 115 / XÖV integration roadmap |

Key capabilities for government use:

  • 🔒 Air-gapped operation: LLM_PROVIDER=LOCAL_ONNX for fully offline deployment
  • 🇪🇺 EU data residency — Mistral (France/EU) as cloud LLM alternative
  • 📋 SBOM — CycloneDX Software Bill of Materials generated at build time
  • 🏛️ Open Source — MIT license, full source code, no vendor lock-in
  • 🔐 SSO/OIDC — Keycloak integration for government identity providers (see Keycloak & SSO Setup)

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-feature)
  3. Run tests (mvn test)
  4. Commit your changes
  5. Open a pull request

Important: For user-facing features, please read the Definition of Done before opening a PR. Features that only add a REST endpoint without GUI support are not considered complete.

License

This project is licensed under the MIT License.

About

AI-assisted architecture and taxonomy workbench with requirements analysis, versioned DSL, JGit-backed history, and export to ArchiMate, Visio, Mermaid, and JSON.
