MISSION SHADOW

"This is for the record. History is written by the victor. History is filled with liars." This isn't a course project. This is production infrastructure that bleeds when you shoot it.

MISSION BRIEF

A self-healing progressive delivery platform: the same automated canary-analysis pattern Netflix runs in production, built here on student AWS credits. Every canary deployment, every traffic split, every automated rollback was wired together from zero. No tutorials. No hand-holding.

When your deployment fails, the system kills it before users notice. When the infrastructure burns, one command rebuilds it from the ashes. That's not theory. That's tactical reality.

THE ARSENAL

INFILTRATION ROUTE

```
User sends request
      ↓
AWS Classic ELB intercepts
      ↓
Istio Ingress Gateway (service mesh entry point)
      ↓
VirtualService routes traffic
      ├── 95% → Stable pods (v3, battle-tested)
      └── 5% → Canary pods (v4, under surveillance)
              ↓
    Prometheus scrapes /metrics every 15s
              ↓
    AnalysisRun evaluates SLIs
      ├── Success rate < 99%? → ABORT MISSION
      ├── P95 latency > 300ms? → ABORT MISSION
      └── All green? → Promote canary to stable
              ↓
    Argo Rollouts executes the kill order
              ↓
    Bad deployment dead in 60 seconds.
    Users never knew it existed.
```
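The abort/promote decision in the route above comes down to two threshold checks. In Argo Rollouts that logic lives in the AnalysisTemplates (as `successCondition` expressions evaluated against Prometheus query results); the sketch below restates the same rule in plain Python so the thresholds are easy to reason about. The class and function names are illustrative, not taken from the repo.

```python
from dataclasses import dataclass

# Thresholds from the route diagram; constant names are hypothetical.
MIN_SUCCESS_RATE = 0.99   # abort if success rate drops below 99%
MAX_P95_MS = 300.0        # abort if P95 latency exceeds 300 ms


@dataclass
class Measurement:
    success_rate: float  # fraction of non-error responses, 0.0 to 1.0
    p95_ms: float        # 95th percentile latency in milliseconds


def evaluate(m: Measurement) -> str:
    """Return 'abort' if any SLI is breached, otherwise 'promote'."""
    if m.success_rate < MIN_SUCCESS_RATE:
        return "abort"
    if m.p95_ms > MAX_P95_MS:
        return "abort"
    return "promote"


if __name__ == "__main__":
    print(evaluate(Measurement(success_rate=0.995, p95_ms=120.0)))  # promote
    print(evaluate(Measurement(success_rate=0.70, p95_ms=120.0)))   # abort (Scenario 1 style)
    print(evaluate(Measurement(success_rate=0.999, p95_ms=510.0)))  # abort (Scenario 3 style)
```

Note the boundaries: exactly 99% success or exactly 300 ms still passes, because the diagram only aborts on `< 99%` and `> 300ms`.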

TACTICAL OBJECTIVES

Extract the target. Leave no trace.

One command deploys the infrastructure:

```bash
cd infra
terraform apply
```

One command destroys all evidence:

```bash
./cleanup.sh
```

Cluster up: 12 minutes.
Cluster down: 3 minutes.
Because infrastructure is expendable. The mission isn't.

ENGAGEMENT RULES

When a deployment goes rogue, the system doesn't ask permission. It terminates.

v4 deploys → metrics degrade → analysis fails → canary aborted → stable keeps serving

Zero human intervention. That's the difference between a platform and a science project.
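The chain above (deploy, degrade, fail analysis, abort) is a small control loop. A minimal sketch, assuming a hypothetical weight ladder of 5% → 50% → 100% (only the 5% step appears in this README) and an `analysis_passes` callback standing in for an AnalysisRun; `run_canary` is an illustrative name, not an Argo Rollouts API:

```python
from typing import Callable, List, Tuple


def run_canary(
    weights: List[int], analysis_passes: Callable[[int], bool]
) -> Tuple[str, List[int]]:
    """Walk the canary through traffic weights; abort on the first failed analysis.

    Returns the outcome and the history of canary traffic weights (percent).
    """
    history: List[int] = []
    for weight in weights:
        history.append(weight)          # shift `weight`% of traffic to the canary
        if not analysis_passes(weight):
            history.append(0)           # abort: all traffic back to stable
            return "aborted", history
    return "promoted", history          # canary becomes the new stable


if __name__ == "__main__":
    print(run_canary([5, 50, 100], lambda w: True))    # ('promoted', [5, 50, 100])
    print(run_canary([5, 50, 100], lambda w: False))   # ('aborted', [5, 0])
```

Aborting drops the canary weight back to 0, which is why stable keeps serving: traffic never waits on the bad version to finish its run.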

COMMAND CENTER

Real traffic. Real mesh. Real automatic rollback.

PROOF OF EXECUTION

Canary deployment revision history — v4 aborted, v3 stable

Prometheus tracking request success rate in real-time

Istio service mesh metrics — traffic split visualization

RECONNAISSANCE & DEPLOYMENT

```bash
# Clone the op (later steps assume it lives at ~/mission-shadow)
git clone https://github.com/kaaaaash/mission-shadow.git
cd mission-shadow

# Configure AWS credentials
aws configure
# AWS Access Key ID: [YOUR_KEY]
# AWS Secret Access Key: [YOUR_SECRET]
# Default region: us-east-1

# Deploy infrastructure
cd infra
terraform init
terraform apply

# Wait ~12 minutes for the cluster to come online,
# then point kubectl at it
aws eks update-kubeconfig --region us-east-1 --name mission-shadow

# Install Argo Rollouts
kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f \
  https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml

# Install Istio
cd ~
curl -L https://istio.io/downloadIstio | sh -
cd istio-1.*
export PATH=$PWD/bin:$PATH
istioctl install --set profile=default -y

# Enable sidecar injection (pods created after this get an Envoy sidecar)
kubectl label namespace default istio-injection=enabled

# Deploy the payload
cd ~/mission-shadow/k8s
kubectl apply -f services-rollout.yaml
kubectl apply -f istio-traffic.yaml
kubectl apply -f analysis/
kubectl apply -f rollout.yaml

# Verify operational status (requires the kubectl-argo-rollouts plugin on your PATH)
kubectl argo rollouts get rollout shadow-payment

# Access Kiali dashboard (Kiali is an Istio addon; if it isn't installed yet,
# apply samples/addons/kiali.yaml from the Istio release directory first)
istioctl dashboard kiali

# Mission complete.
```

INFRASTRUCTURE LAYOUT

```
mission-shadow/
├── app/
│   ├── main.py              ← FastAPI service + Prometheus metrics
│   ├── Dockerfile           ← Container image definition
│   └── requirements.txt     ← Python dependencies
├── infra/
│   ├── main.tf              ← EKS cluster, VPC, NAT gateway
│   ├── variables.tf         ← Region, instance types, versions
│   ├── outputs.tf           ← Cluster endpoint, config
│   └── cleanup.sh           ← Nightly destroy script
├── k8s/
│   ├── rollout.yaml         ← Argo Rollouts canary strategy
│   ├── services-rollout.yaml ← Stable + canary services
│   ├── istio-traffic.yaml   ← VirtualService + DestinationRule
│   └── analysis/
│       ├── success-rate.yaml    ← 99% SLI threshold
│       └── latency.yaml         ← 300ms P95 threshold
├── docs/
│   ├── DAY1-2_HANDOVER.md   ← Infrastructure setup log
│   ├── DAY4_HANDOVER.md     ← Argo Rollouts integration
│   └── DAY5_HANDOVER.md     ← Istio service mesh deployment
└── screenshots/             ← Mission evidence
```
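For orientation, this is roughly what a canary `Rollout` with Istio traffic routing looks like, expressed as a Python dict and dumped to JSON (which `kubectl apply -f` also accepts). It is a sketch, not the contents of `k8s/rollout.yaml`: the Service names, VirtualService name, replica count, and image are assumptions, and the template names simply mirror the file names in `k8s/analysis/`. Only the `shadow-payment` name, the 5% canary weight, and the two SLI checks come from this README.

```python
import json

NAME = "shadow-payment"  # rollout name used in the README's verify step


def build_rollout(image: str) -> dict:
    """Sketch of an Argo Rollouts canary spec routed through Istio.

    Service, VirtualService, and AnalysisTemplate names are hypothetical.
    """
    labels = {"app": NAME}
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Rollout",
        "metadata": {"name": NAME},
        "spec": {
            "replicas": 2,  # assumption
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": "app", "image": image}]},
            },
            "strategy": {
                "canary": {
                    # The VirtualService routes to these Services;
                    # Argo Rollouts adjusts the weights between them.
                    "stableService": f"{NAME}-stable",
                    "canaryService": f"{NAME}-canary",
                    "trafficRouting": {
                        "istio": {"virtualService": {"name": f"{NAME}-vsvc"}}
                    },
                    "steps": [
                        {"setWeight": 5},  # 5% to canary, per the route diagram
                        # Run both SLI checks; a failed AnalysisRun aborts the rollout
                        {"analysis": {"templates": [
                            {"templateName": "success-rate"},
                            {"templateName": "latency"},
                        ]}},
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_rollout("<registry>/shadow-payment:v4"), indent=2))
```

With no steps after the analysis, a passing AnalysisRun lets the rollout run to completion and the canary becomes the new stable.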

## OPERATION TIMELINE

| Day | Objective | Status |
|-----|-----------|--------|
| 1-2 | Infrastructure recon + EKS deployment | ✅ Complete |
| 3 | FastAPI payload development + ECR push | ✅ Complete |
| 4 | Argo Rollouts integration + canary validation | ✅ Complete |
| 5 | Istio service mesh infiltration | ✅ Complete |
| 6A | Prometheus metrics instrumentation | ✅ Complete |
| 6B | Webhook warfare (sidecar injection debugging) | ✅ Complete |
| 6C | Automated rollback + chaos engineering | ✅ Complete |
| 7 | Documentation + exfiltration | ✅ Complete |

Total duration: 7 days
Total cost: ~$35 USD
Downtime incidents: 0
Manual rollbacks required: 0

BATTLE-TESTED SCENARIOS

✅ Scenario 1: Broken Deployment

Action: Deploy v4 with 30% error rate
Expected: AnalysisRun detects failure, aborts canary
Result: Canary terminated in 60s, stable v3 served 100% of traffic
Casualties: Zero

✅ Scenario 2: Pod Assassination

Action: `kubectl delete pod <pod-name> --force`
Expected: Kubernetes recreates pod, Istio reroutes traffic
Result: Pod back online in 8s, zero dropped requests
Casualties: Zero

✅ Scenario 3: High Latency Injection

Action: Deploy v5 with 500ms delay
Expected: P95 latency breaches 300ms threshold
Result: AnalysisRun fails, rollback triggered
Casualties: Zero
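Scenario 3 hinges on how P95 is computed. Prometheus does not store percentiles; a query of the form `histogram_quantile(0.95, sum(rate(<latency>_bucket[1m])) by (le))` (metric name hypothetical) estimates P95 from cumulative histogram buckets, interpolating linearly inside the bucket where the 95th-percentile rank falls. A simplified Python sketch of that estimation, with bucket bounds in milliseconds; it skips PromQL edge cases such as non-monotonic buckets:

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Approximate PromQL histogram_quantile for 0 < q <= 1.

    `buckets` holds (upper_bound, cumulative_count) pairs sorted by bound,
    ending with (math.inf, total_count).
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank lands in the +Inf bucket: report the last finite bound
                return prev_bound
            # Linear interpolation within the bucket that contains the rank
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    healthy = [(100, 90), (300, 100), (600, 100), (math.inf, 100)]
    delayed = [(100, 50), (300, 80), (600, 100), (math.inf, 100)]
    print(histogram_quantile(0.95, healthy))  # 200.0 -> passes the 300 ms gate
    print(histogram_quantile(0.95, delayed))  # 525.0 -> breaches it
```

One consequence: the estimate is only as sharp as the bucket layout. If no bucket boundary sits near 300 ms, a latency regression can be over- or under-estimated, so placing a boundary at the SLI threshold helps.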

## COST ANALYSIS

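| Resource | Cost/hour (USD) | Daily cost (USD, 24 h) |
|----------|-----------------|------------------------|
| EKS control plane | $0.10 | $2.40 |
| 2x t3.medium nodes | $0.08 | $1.92 |
| 2x Classic ELB | $0.05 | $1.20 |
| TOTAL | $0.23 | $5.52 |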

Project total (upper bound, 24/7 uptime): 7 days × $5.52 = $38.64
Remaining credits: $118 - $39 = $79

Nightly destroy protocol saved: ~$150 in potential waste
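The cost table is straight hourly-rate arithmetic. This sketch reproduces it and also shows what the same week costs if the cluster only runs part of each day (the 8 hours/day figure is an assumption for illustration). Rates are the table's own; anything the table omits, such as the NAT gateway defined in `main.tf`, is not counted.

```python
# Hourly rates (USD) copied from the cost table above
HOURLY_USD = {
    "EKS control plane": 0.10,
    "2x t3.medium nodes": 0.08,
    "2x Classic ELB": 0.05,
}


def daily_cost(hours_up: float = 24.0) -> float:
    """Cost of one day with the cluster running for `hours_up` hours."""
    return round(sum(HOURLY_USD.values()) * hours_up, 2)


def project_cost(days: int, hours_up_per_day: float = 24.0) -> float:
    """Cost of `days` days at the given daily uptime."""
    return round(daily_cost(hours_up_per_day) * days, 2)


if __name__ == "__main__":
    print(daily_cost())                           # 5.52
    print(project_cost(7))                        # 38.64 (24/7 upper bound)
    print(project_cost(7, hours_up_per_day=8))    # 12.88 (hypothetical 8 h/day)
```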

AFTER ACTION REPORT

What was proven:

Canary deployments contain blast radius (bad code ≠ outage)
Automated rollback works without humans in the loop
Service mesh enables surgical traffic control
Kubernetes + Istio + Argo = production-grade platform
Student budget + 7 days = enterprise deployment pipeline
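"Contain blast radius" can be put in numbers. Assuming uniform weighted routing and no retries, the share of all requests that fail while a canary is live is the canary weight times the canary's error rate. Plugging in the route diagram's 5% weight and Scenario 1's 30% error rate gives an upper bound on exposure; the 50 req/s traffic level used for the absolute count is hypothetical.

```python
def blast_radius(canary_weight: float, canary_error_rate: float) -> float:
    """Fraction of all requests that fail while the canary is live.

    Assumes uniform weighted routing and no client or mesh retries.
    """
    return canary_weight * canary_error_rate


def failed_requests(rps: float, seconds: float,
                    canary_weight: float, canary_error_rate: float) -> float:
    """Expected failed requests over a window of `seconds` at `rps` total traffic."""
    return rps * seconds * blast_radius(canary_weight, canary_error_rate)


if __name__ == "__main__":
    print(f"canary at 5%:  {blast_radius(0.05, 0.30):.1%} of requests fail")  # 1.5%
    print(f"big-bang 100%: {blast_radius(1.00, 0.30):.1%} of requests fail")  # 30.0%
    # Hypothetical 50 req/s over the 60 s it took to abort:
    print(round(failed_requests(50, 60, 0.05, 0.30)))  # 45
```

That is a 20x reduction versus shipping v4 straight to 100%, before any mesh-level retries push the user-visible number lower still.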

What was learned:

Istio webhook timeouts are infrastructure issues, not deployment issues
Never reuse Docker tags (immutable deployments are law)
AWS free tier blocks t3.medium, plan accordingly
ELB ghost ENIs require manual cleanup before VPC destroy
Prometheus metrics must be instrumented, not assumed
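"Instrumented, not assumed" matters because the success-rate SLI is derived from counters the app itself has to increment. Between two scrapes (15 s apart in this setup), success rate is one minus new errors over new requests. PromQL's `rate()` and `increase()` do the same while also handling counter resets, which this sketch deliberately skips. The counter fields are hypothetical stand-ins for whatever `main.py` exports.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    total: float   # cumulative request counter at scrape time (hypothetical)
    errors: float  # cumulative error (e.g. 5xx) counter at scrape time (hypothetical)


def success_rate(before: Snapshot, after: Snapshot) -> float:
    """Success rate over the window between two scrapes.

    Assumes counters only go up (no pod restart resets in between).
    """
    requests = after.total - before.total
    failures = after.errors - before.errors
    if requests <= 0:
        # No instrumented traffic in the window: the SLI is undefined, not 100%
        raise ValueError("no requests counted in window")
    return 1.0 - failures / requests


if __name__ == "__main__":
    print(success_rate(Snapshot(1000, 5), Snapshot(1600, 11)))  # ~0.99
    print(success_rate(Snapshot(0, 0), Snapshot(200, 60)))      # ~0.7
```

If the counters are never incremented, there is no denominator, and the sketch refuses to invent an answer rather than reporting a healthy canary.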

What comes next:

Multi-cluster federation for global deployments
GitOps with ArgoCD for declarative deployments
Flagger for automated progressive delivery
OpenTelemetry for distributed tracing
Chaos Mesh for production resilience testing

OPERATOR NOTES

This was built solo. No team. No senior engineer reviewing PRs. Just documentation, trial and error, and 47 terraform destroy cycles.

If you're reading this thinking "I could never build this" — wrong.
Six months ago I didn't know what a service mesh was.
Now I'm running one in production (well, "production" being my AWS sandbox, but the architecture is identical).

The difference between a resume project and real infrastructure isn't complexity.
It's whether the thing actually works when you break it.

EXFILTRATION PROTOCOL

Mission complete. Destroy all evidence:

```bash
cd ~/mission-shadow/infra
./cleanup.sh
```

Verify zero burn rate:

```bash
aws eks list-clusters --region us-east-1
# Should return an empty list: {"clusters": []}

aws elb describe-load-balancers --region us-east-1
# Should return an empty list: {"LoadBalancerDescriptions": []}
```

Cluster destroyed. Costs zeroed. Mission archived.
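The two verification checks above can be scripted so cleanup doesn't rely on eyeballing terminal output. A sketch that takes the JSON printed by the two commands (run with `--output json`) and reports whether either resource type is still around; the function name is illustrative:

```python
import json


def burn_rate_zero(eks_json: str, elb_json: str) -> bool:
    """True when no EKS clusters and no Classic ELBs remain.

    eks_json: stdout of `aws eks list-clusters --output json`
    elb_json: stdout of `aws elb describe-load-balancers --output json`
    """
    clusters = json.loads(eks_json).get("clusters", [])
    load_balancers = json.loads(elb_json).get("LoadBalancerDescriptions", [])
    return not clusters and not load_balancers


if __name__ == "__main__":
    print(burn_rate_zero('{"clusters": []}', '{"LoadBalancerDescriptions": []}'))  # True
    print(burn_rate_zero('{"clusters": ["mission-shadow"]}',
                         '{"LoadBalancerDescriptions": []}'))                    # False
```

These two calls don't cover everything `main.tf` creates; given the ghost-ENI lesson, the same pattern could be extended to `aws ec2 describe-nat-gateways` and `aws ec2 describe-network-interfaces`.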

╔═══════════════════════════════════════════════════════════════╗
║                                                               ║
║  "Remember — no Russian."                                     ║
║                                                               ║
║  This platform doesn't care about your code's feelings.       ║
║  It cares about uptime.                                       ║
║                                                               ║
║  Mission Shadow: COMPLETE ✅                                  ║
║  Operator: kaash                                              ║
║  Status: EXFILTRATED                                          ║
║                                                               ║
╚═══════════════════════════════════════════════════════════════╝

Bravo Six, going dark.
