Update release with new fabrica-based services; remove old services #50
travisbcotton wants to merge 63 commits into
9bf779d to ef8d070
|
Just a couple of other notes before merging. We need to update the
We also need to update |
|
Another note... we're going to update the CoreDHCP config in
Here's a snippet of what the tutorial config should look like after the changes:
- coresmd: |
    svc_base_uri=https://demo.openchami.cluster:8443
    ipxe_base_uri=http://172.16.0.254:8081
    ca_cert=/root_ca/root_ca.crt
    cache_valid=30s
    lease_time=1h
    single_port=false
- bootloop: |
    lease_file=/tmp/coredhcp.db
    script_path=default
    lease_time=5m
    ipv4_start=172.16.0.200
    ipv4_end=172.16.0.250
|
A couple of changes:
|
8ab666e to fc525d0
|
We'll need |
|
We'll have to note these major changes in the release notes once this is merged. We'll want to bump the minor version on the tag. |
651ce7d to caa1bd3
|
Should we provide a
Edit: Just to add, here's the default boot-service config.yaml:
systemd/configs/boot-service.yaml
# SPDX-FileCopyrightText: 2025 OpenCHAMI Contributors
#
# SPDX-License-Identifier: MIT
# OpenCHAMI Boot Service Configuration Example
#
# This is a comprehensive example configuration file for the OpenCHAMI boot service.
# To use this configuration:
# 1. Copy this file to config.yaml: cp config.example.yaml config.yaml
# 2. Customize the settings below for your environment
# 3. Remove or comment out sections you don't need
#
# Configuration precedence (highest to lowest):
# 1. Command-line flags
# 2. Environment variables (e.g., BOOT_SERVICE_PORT=8082)
# 3. Configuration file (config.yaml)
# 4. Default values
# =============================================================================
# SERVER CONFIGURATION
# =============================================================================
# HTTP server settings
port: 8082 # Port to listen on
host: "0.0.0.0" # Interface to bind to (0.0.0.0 for all interfaces)
read_timeout: 30 # HTTP read timeout in seconds
write_timeout: 30 # HTTP write timeout in seconds
idle_timeout: 120 # HTTP idle timeout in seconds
# =============================================================================
# STORAGE CONFIGURATION
# =============================================================================
# Data storage settings
data_dir: "./data" # Directory for storing boot configurations
storage_type: "file" # Storage backend: "file", "database" (future)
# Database settings (when storage_type: "database")
# database:
#   driver: "postgres"
#   host: "localhost"
#   port: 5432
#   name: "boot_service"
#   user: "boot_user"
#   password: "boot_password"
#   ssl_mode: "require"
#   max_connections: 25
#   connection_timeout: 30
# =============================================================================
# FEATURE TOGGLES
# =============================================================================
# Authentication
enable_auth: false # Enable TokenSmith JWT authentication
# Set to true for production environments
# Metrics and monitoring
enable_metrics: true # Enable Prometheus metrics endpoint
metrics_port: 9092 # Port for metrics endpoint (/metrics)
# API compatibility
enable_legacy_api: true # Enable legacy BSS-compatible endpoints
# Disable to force use of new API only
# =============================================================================
# AUTHENTICATION CONFIGURATION (when enable_auth: true)
# =============================================================================
auth:
  # Core authentication settings
  enabled: false # Must match enable_auth above
  # JWT validation method (choose one):
  # Option 1: JWKS URL (recommended for production)
  jwks_url: "https://auth.openchami.org/.well-known/jwks.json"
  jwks_refresh_interval: "1h" # How often to refresh JWKS cache
  # Option 2: Static RSA public key (for development/testing)
  # jwt_public_key: |
  #   -----BEGIN PUBLIC KEY-----
  #   MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA...
  #   -----END PUBLIC KEY-----
  # JWT validation options
  jwt_issuer: "https://auth.openchami.org" # Expected token issuer
  jwt_audience: "boot-service" # Expected token audience
  validate_expiration: true # Check token expiration
  validate_issuer: true # Validate issuer claim
  validate_audience: true # Validate audience claim
  # Authorization requirements
  required_claims: ["sub", "iss", "aud"] # Required JWT claims
  required_scopes: ["boot:read"] # Required OAuth2 scopes
  # Development/testing options (never use in production)
  allow_empty_token: false # Allow requests without tokens
  non_enforcing: false # Log auth failures but don't block requests
# =============================================================================
# HARDWARE STATE MANAGER INTEGRATION
# =============================================================================
# HSM (Hardware State Manager) settings
hsm_url: "http://localhost:27779" # URL of the HSM service
# Set to your HSM endpoint
# TokenSmith-backed HSM service authentication
# When both hsm_url and tokensmith_url are configured, boot-service exchanges a
# bootstrap token for short-lived service tokens and adds them to HSM requests.
# Standardized env vars: TOKENSMITH_URL, TOKENSMITH_BOOTSTRAP_TOKEN,
# TOKENSMITH_TARGET_SERVICE, TOKENSMITH_SCOPES, TOKENSMITH_REFRESH_SKEW_SEC
tokensmith_url: "http://localhost:8080"
tokensmith_target_service: "hsm"
tokensmith_scopes: "hsm:read"
tokensmith_refresh_skew_sec: 120
# tokensmith_bootstrap_token: "<bootstrap-jwt>" # Prefer env var for secrets
# Environment fallback: TOKENSMITH_BOOTSTRAP_TOKEN
# HSM authentication (when HSM requires auth)
# hsm_auth:
#   type: "service_token" # Authentication type for HSM
#   service_name: "boot-service"
#   token_endpoint: "http://tokensmith:8080/token"
# =============================================================================
# EXTERNAL SERVICES
# =============================================================================
# TokenSmith authentication service (when enable_auth: true)
tokensmith:
  url: "http://localhost:8080" # TokenSmith service URL
  timeout: 30 # Request timeout in seconds
  # Service-to-service authentication
  service_auth:
    enabled: false # Enable service tokens
    service_name: "boot-service" # This service's identifier
    token_endpoint: "/token" # Token endpoint path
# BSS (Boot Script Service) integration
bss:
  enabled: false # Enable BSS integration
  url: "http://localhost:27778" # BSS service URL
  timeout: 30 # Request timeout in seconds
# =============================================================================
# LOGGING AND MONITORING
# =============================================================================
# Logging configuration
logging:
  level: "info" # Log level: debug, info, warn, error
  format: "json" # Log format: json, text
  output: "stdout" # Log output: stdout, stderr, file
  # file: "/var/log/boot-service.log" # Log file (when output: file)
# Health check configuration
health:
  enabled: true # Enable health check endpoint
  endpoint: "/health" # Health check URL path
  timeout: 5 # Health check timeout in seconds
# =============================================================================
# PERFORMANCE AND SCALING
# =============================================================================
# Request limits
limits:
  max_request_size: "10MB" # Maximum request body size
  max_concurrent: 100 # Maximum concurrent requests
  rate_limit: 1000 # Requests per minute per IP
# Caching (future feature)
# cache:
#   enabled: false
#   type: "memory" # Cache type: memory, redis
#   ttl: "5m" # Cache TTL
#   max_size: "100MB" # Maximum cache size
# =============================================================================
# DEVELOPMENT AND TESTING
# =============================================================================
# Development mode settings
development:
  enabled: false # Enable development mode
  cors_enabled: true # Enable CORS for browser testing
  cors_origins: ["*"] # Allowed CORS origins
  debug_endpoints: false # Enable debug/diagnostic endpoints
  mock_services: false # Use mock external services
# =============================================================================
# DEPLOYMENT ENVIRONMENT EXAMPLES
# =============================================================================
# Uncomment and modify one of these sections for your deployment environment:
# --- Development Environment ---
# enable_auth: false
# enable_metrics: true
# logging:
#   level: "debug"
# development:
#   enabled: true
#   debug_endpoints: true
# --- Production Environment ---
# enable_auth: true
# enable_metrics: true
# auth:
#   enabled: true
#   jwks_url: "https://auth.openchami.org/.well-known/jwks.json"
#   jwt_issuer: "https://auth.openchami.org"
#   jwt_audience: "boot-service"
#   required_scopes: ["boot:read"]
# logging:
#   level: "info"
#   format: "json"
# --- Kubernetes/Container Environment ---
# port: 8080
# host: "0.0.0.0"
# data_dir: "/data"
# auth:
#   jwks_url: "http://tokensmith:8080/.well-known/jwks.json"
#   jwt_issuer: "openchami-tokensmith"
#   jwt_audience: "openchami-cluster"
# hsm_url: "http://smd:27779"
# logging:
#   format: "json"
#   output: "stdout"
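To illustrate the precedence order listed in the config header (command-line flag, then environment variable, then config file, then default), here is a minimal shell sketch. `resolve_port` is a hypothetical helper and is not part of boot-service. `BOOT_SERVICE_PORT` comes from the config comments; treating 8082 as the built-in default is an assumption based on the value shown in the file.

```shell
#!/bin/sh
# resolve_port FLAG_VALUE CONFIG_FILE
# Hypothetical helper: prints the port chosen by the documented precedence
# (flag > environment variable > config file > default).
resolve_port() {
    _rp_flag=$1
    _rp_file=$2
    _rp_file_value=""
    if [ -n "$_rp_file" ] && [ -r "$_rp_file" ]; then
        # Take the first top-level "port:" entry; ignore any trailing comment.
        _rp_file_value=$(sed -n 's/^port:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$_rp_file" | head -n 1)
    fi
    if [ -n "$_rp_flag" ]; then
        echo "$_rp_flag"
    elif [ -n "${BOOT_SERVICE_PORT:-}" ]; then
        echo "$BOOT_SERVICE_PORT"
    elif [ -n "$_rp_file_value" ]; then
        echo "$_rp_file_value"
    else
        echo 8082 # assumed default, taken from the example config
    fi
}
```

For example, `resolve_port "" config.yaml` prints the file's `port:` value unless `BOOT_SERVICE_PORT` is set in the environment, in which case the environment value is printed.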
We may want to add default hostname rules since the default if none is to prefix with
The above will make the node hostnames be like
synackd left a comment
Initial code review without testing this yet.
240792f to a800adc
synackd left a comment
Testing now. I get:
sed: can't read /etc/containers/systemd/opaal.container: No such file or directory
when running the openchami-certificate-update script.
If getting rid of hydra, we'll want to remove references to it, e.g. in
release/scripts/openchami_profile.sh
Line 27 in d77457c
We can probably just get rid of those functions.
We might have to mount a volume now and set --data-dir with some of the upstream changes to how metadata-service works. I get a "permission denied" error when I try to start it with pr-8 with the current Exec.
I tried adding a volume like Volume=/opt/workdir/data:/data and chmod 777 /opt/workdir/data and that fixes the permission denied issue for me.
May 12 14:51:27 openchami-testing.novalocal systemd[1]: Started The metadata-service container.
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: 2026/05/12 14:51:27 Starting github.com/OpenCHAMI/metadata-service server...
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: Error: failed to initialize file storage: failed to create file backend: failed to create base directory /data: mkdir /data: permission denied
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: Usage:
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: ochami-metadata-server serve [flags]
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]:
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: Flags:
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --data-dir string Directory for file storage (default "/data")
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: -h, --help help for serve
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --host string Host to bind to (default "0.0.0.0")
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --idle-timeout int Idle timeout in seconds (default 60)
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: -p, --port int Port to listen on (default 8080)
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --read-timeout int Read timeout in seconds (default 15)
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --wireguard-only Restrict access to WireGuard network only
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --wireguard-server string Enable WireGuard userspace controller (CIDR, e.g. 100.97.0.1/16)
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --wireguard-state-file string Path to WireGuard state file for persistence (default "/data/wireguard/state.yaml")
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --write-timeout int Write timeout in seconds (default 15)
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]:
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: Global Flags:
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --config string config file (default is $HOME/.ochami-metadata.yaml)
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --debug Enable debug logging
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]:
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: 2026/05/12 14:51:27 failed to initialize file storage: failed to create file backend: failed to create base directory /data: mkdir /data: permission denied
We'll also have to remove the --tokensmith-url flag, at least for now.
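The host-side half of the volume workaround described above could be scripted roughly as follows. This is only a sketch: `prepare_data_dir` is a hypothetical helper, the `chmod 777` simply mirrors what was tried in testing, and the write check runs as the calling user on the host, so it does not prove that the container's user can write to the mount.

```shell
#!/bin/sh
# prepare_data_dir DIR
# Hypothetical helper: create DIR, open its permissions as in the workaround,
# and confirm that the calling user can write to it.
prepare_data_dir() {
    _pd_dir=$1
    mkdir -p "$_pd_dir" || return 1
    chmod 777 "$_pd_dir" || return 1
    _pd_probe="$_pd_dir/.write-test.$$"
    if touch "$_pd_probe" 2>/dev/null; then
        rm -f "$_pd_probe"
        echo "ok: $_pd_dir is writable"
    else
        echo "error: $_pd_dir is not writable" >&2
        return 1
    fi
}
```

With the directory in place (for example `prepare_data_dir /opt/workdir/data`), the container unit would then carry the `Volume=/opt/workdir/data:/data` line mentioned above.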
I added a volume for metadata to store things. I also removed --tokensmith-url until that is added back in.
I think this was literally added back today with this PR:
OpenCHAMI/metadata-service#12
Unresolving for further discussion/action. I assume we'll want this.
Posted a comment to bump the linked issue. Looks like it needs to be rebased/resolved before merging.
Signed-off-by: Travis Cotton <trcotton@lanl.gov>
…container to use it Signed-off-by: Travis Cotton <trcotton@lanl.gov>
…se it Signed-off-by: Travis Cotton <trcotton@lanl.gov>
3cc4ae4 to 49b143d
…ient arg Signed-off-by: Travis Cotton <trcotton@lanl.gov>
Signed-off-by: Devon Bautista <17506592+synackd@users.noreply.github.com>
|
Alex asked me to try testing this out. I have tried using the (PR-52) OpenCHAMI Installer that I developed and it got stuck waiting for Hydra:
The step in question (which comes from the quadlet tutorial) is trying (10 times) to get the DEMO_ACCESS_TOKEN using:
Is this no longer the correct way to obtain the access token? I notice that Hydra has been removed and ??? replaced with TokenSmith ???. Is there a new way to get the token from TokenSmith?
|
That's because that function hasn't been updated yet. See https://github.com/OpenCHAMI/openchami.org/pull/100/changes/BASE..08154d8ad060aae113378f3e9a6fb3f1d27515a4#diff-b2078f278f9bf7a1610c6ae9134a53c18cc543d5d13f4c4425b9123ca9184e17R1242 and related discussion. |
|
Thanks! That helps. From there I should be able to figure out what I will need to do to get the installer up to date... |
|
Just wondering, since all of the stuff needed to generate the token is known but different, why not keep the abstraction in the form of |
That's the plan. We just haven't made the update yet for the tutorial to work. Working changes for the tutorial documentation are here. |
|
Ok. So, the fact that this is still in progress is a reflection of future work you are planning, and the tutorial changes are, at least for now, a parallel work in progress that will be minimized as this moves toward completion, but is currently needed to allow people to work with this PR? That makes sense. That also informs how I will work alongside this. I will plan to incorporate the changes to the tutorial locally and temporarily into my code so I can get a sense of the deviation, but not plan to update my code with those changes until I see the final result. |
Yes, we're trying to change everything at once so everything works in the tutorial like before and we can keep it as frictionless as possible. We're still figuring out some parts of it and testing the new services here, though. FYI, if you need the new command for the access token to continue testing, here it is:
    export DEMO_ACCESS_TOKEN=$(sudo podman exec tokensmith /bin/sh -c "/usr/local/bin/tokensmith user-token create --audience smd --key-file /tokensmith/data/keys/private.pem --subject 'admin@example.com' --scopes 'admin' --enable-local-user-mint")
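Since the installer step mentioned earlier retries the token fetch 10 times, here is a minimal sketch of a generic retry wrapper that the TokenSmith command above could be dropped into. `retry` is a hypothetical helper, not something shipped in the release scripts, and the 2-second delay in the commented usage is an arbitrary assumption.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND until it exits 0, or give up after
# ATTEMPTS tries, sleeping DELAY_SECONDS between tries.
retry() {
    _rt_attempts=$1
    _rt_delay=$2
    shift 2
    _rt_n=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$_rt_n" -ge "$_rt_attempts" ]; then
            echo "retry: giving up after $_rt_n attempts: $*" >&2
            return 1
        fi
        _rt_n=$((_rt_n + 1))
        sleep "$_rt_delay"
    done
}

# Possible usage with the command quoted above (not run here):
#   export DEMO_ACCESS_TOKEN=$(retry 10 2 sudo podman exec tokensmith /bin/sh -c \
#     "/usr/local/bin/tokensmith user-token create --audience smd --key-file /tokensmith/data/keys/private.pem --subject 'admin@example.com' --scopes 'admin' --enable-local-user-mint")
```

One caveat with capturing the output this way: anything a failed attempt writes to stdout would end up in the variable ahead of the token, so it is worth checking the captured value before using it.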
|
Yep. I picked that up from the linked tutorial PR. Thanks! |