Training-Serving Skew Mitigation for MLAOS
Implementing Google's Rules #29 & #37: Log features at serving time, detect skew
Author: Kenneth Dallmier, Sole Engineer & Owner
Contact: kennydallmier@gmail.com
Project: MLAOS Engine
GitHub: https://github.com/Herounhero
Training-serving skew is one of the most insidious failure modes in production ML. This repository implements Rule #29 (log features at serving time) and Rule #37 (measure skew) to eliminate this problem.
| Problem | Before | After | Improvement |
|---|---|---|---|
| Training-Serving Skew | 88% distribution drift | Exact consistency | 📉 88% reduction |
| Silent Failure Detection | 30 days to detect | 1 hour | 🚨 720x faster |
| Feature Traceability | Unknown at serve time | 100% logged | 🛡️ Complete audit trail |
| Rule # | Rule Name | Implementation | Status |
|---|---|---|---|
| #29 | Log features at serving time | ServingLogger.log_inference() captures all features |
✅ |
| #37 | Measure training/serving skew | SkewAuditor.run_audit() weekly comparison |
✅ |
| #10 | Be aware of silent failures | Logger never crashes inference on DB failure | ✅ |
# Clone
git clone https://github.com/Unhero767/serving-logger.git
cd serving-logger
# Install
pip install -r requirements.txt
# Set environment
export DATABASE_URL="postgresql://user:pass@host/mlaos_db"
export MODEL_VERSION="AURELIA-v2.3"
# Run migrations
psql $DATABASE_URL -f sql/002_serving_logs.sql
# Use in inference code
python -c "
from src.serving_logger import ServingLogger
logger = ServingLogger('$DATABASE_URL', '$MODEL_VERSION')
logger.log_inference('req-001', 'mem-001', {'resonance_score': 0.85})
print('Logged!')
"from serving_logger import ServingLogger
# Initialize (do this once per service instance)
logger = ServingLogger(
db_conn_string=os.environ['DATABASE_URL'],
model_version='AURELIA-v2.3',
environment='production'
)
# In your inference handler:
def predict(request_id: str, memory_id: str, raw_data: dict):
features = feature_extractor.extract_features(raw_data)
prediction = model.predict(features)
# Rule #29: Log features EXACTLY as used in prediction
logger.log_inference(request_id, memory_id, features)
return predictionserving-logger/
├── README.md
├── LICENSE
├── CONTRIBUTING.md
├── requirements.txt
├── sql/
│ └── 002_serving_logs.sql # Serving logs schema
└── src/
├── serving_logger.py # Rule #29: Feature logging
└── skew_auditor.py # Rule #37: Skew detection
Kenneth Dallmier
📧 kennydallmier@gmail.com
🔗 https://github.com/Herounhero