A Spring Boot service fully wired to the Elastic Stack (Elasticsearch + Logstash + Kibana) via Docker Compose. Demonstrates production-grade structured JSON logging, MDC-based request tracing, a custom Elasticsearch index template, and a Logstash pipeline — the exact observability setup that reduced MTTD (Mean Time To Detect) production incidents from ~45 minutes to ~5 minutes in a banking microservice platform.
- The MTTD Problem
- Architecture
- What This Demonstrates
- Tech Stack
- Project Structure
- Getting Started
- Explore in Kibana
- Log Structure
- How MDC Tracing Works
- Logstash Pipeline
- Elasticsearch Index Template
- API Reference
- Running Tests
- Configuration Reference
- Production Considerations
- Extending This Example
Before centralized logging, detecting a production incident typically looked like this:
1. User reports an error
2. Engineer SSHs into server-1 → greps through /var/log/app.log
3. No match → SSH into server-2 → repeat
4. Find a stack trace — but which request caused it?
5. Manually correlate timestamps across 3 files
6. Total time: 30–60 minutes
With the Elastic Stack and structured logging:
1. Alert fires (Kibana rule: level:ERROR count > 5 in 1 minute)
2. Engineer opens Kibana → filters: level:ERROR
3. Clicks on an error → copies requestId
4. Filters: requestId:"abc-123" → all logs for that request appear
5. Root cause visible in the full request context
6. Total time: 2–5 minutes
The key enabler: every log line is structured JSON with consistent, searchable fields — not free-text output from printf statements. This repository shows exactly how that is implemented.
┌──────────────────────────────────────────────────────────────────┐
│ Spring Boot :8080 │
│ │
│ HTTP Request │
│ │ │
│ ▼ │
│ MdcRequestFilter │
│ │ injects: requestId, userId, method, uri → MDC │
│ ▼ │
│ RequestLoggingFilter │
│ │ logs: http_request event (INFO) │
│ ▼ │
│ Controller → Service │
│ │ logs: business events with StructuredArguments │
│ │ (order_created, order_cancelled, order_processing_error) │
│ ▼ │
│ RequestLoggingFilter (after) │
│ │ logs: http_response event with statusCode + durationMs │
│ ▼ │
│ MDC.clear() │
│ │
│ Logback (logstash-logback-encoder) │
│ │ writes structured JSON to TCP socket │
└──────┼───────────────────────────────────────────────────────────┘
│ TCP JSON stream
▼
┌──────────────────────────────────────────────────────────────────┐
│ Logstash :5044 │
│ input: tcp (JSON lines) │
│ filter: parse timestamp, normalize level, │
│ tag slow requests (>500ms), tag server errors │
│ output: elasticsearch index logs-{service}-{yyyy.MM.dd} │
└──────┬───────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ Elasticsearch :9200 │
│ index: logs-order-service-2025.04.18 │
│ template: custom field mappings (keyword, integer, date) │
│ ILM: 30-day retention, daily rollover │
└──────┬───────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ Kibana :5601 │
│ data view: logs-* │
│ Discover: full-text and field-level search │
│ Dashboards: error rate, latency histogram, event timeline │
└──────────────────────────────────────────────────────────────────┘
| Concept | Implementation |
|---|---|
| Structured JSON logging | logstash-logback-encoder — every log line is valid JSON |
| MDC request tracing | MdcRequestFilter — injects requestId into every log line |
| Request/response logging | RequestLoggingFilter — logs all HTTP events with timing |
| Business event logging | StructuredArguments.keyValue() — searchable key-value fields |
| Log level strategy | DEBUG/INFO/WARN/ERROR with consistent field schemas per level |
| Logstash pipeline | TCP input → filter enrichment → ES output |
| Custom index template | Explicit field type mappings for correct Kibana aggregations |
| ILM policy | Daily rollover, 30-day retention, warm tier after 7 days |
| Kibana data view | Auto-configured logs-* pattern pointing at @timestamp |
| Error simulation endpoint | Generate ERROR logs on demand to test alerting |
| Component | Version | Role |
|---|---|---|
| Spring Boot | 3.2.4 | Application framework |
| logstash-logback-encoder | 7.4 | Structured JSON log formatting |
| Elasticsearch | 8.12.2 | Log storage and search |
| Logstash | 8.12.2 | Log ingestion and enrichment |
| Kibana | 8.12.2 | Search, visualization, alerting |
| Micrometer Prometheus | managed | Metrics endpoint for scraping |
| Lombok | — | Boilerplate reduction |
elastic-stack-observability/
├── src/main/java/ir/aliakrami/observability/
│ ├── ObservabilityApplication.java
│ ├── filter/
│ │ ├── MdcRequestFilter.java ← Injects requestId/userId into MDC
│ │ └── RequestLoggingFilter.java ← Logs all HTTP requests/responses
│ ├── controller/
│ │ └── OrderController.java ← REST API + simulate-error endpoint
│ ├── service/
│ │ └── OrderService.java ← Business logic with structured logging
│ └── model/
│ └── Order.java
├── src/main/resources/
│ ├── application.yml ← Logstash host/port/enabled config
│ └── logback-spring.xml ← Logback config: JSON encoder, TCP appender
├── elk/
│ ├── logstash/
│ │ ├── config/logstash.yml ← Logstash main settings
│ │ └── pipeline/order-service.conf ← Pipeline: input → filter → output
│ └── elasticsearch/
│ ├── index-template.json ← Custom field mappings
│ └── ilm-policy.json ← 30-day retention policy
├── scripts/
│ ├── setup-elk.sh ← Registers template, ILM, Kibana view
│ └── generate-logs.sh ← Pumps varied log events for exploration
├── src/test/java/ir/aliakrami/observability/
│ └── ObservabilityApplicationTest.java
├── docker-compose.yml ← Full ELK stack + Spring Boot
├── Dockerfile
└── README.md
- Docker + Docker Compose (8 GB RAM recommended for ELK)
- Java 17+ and Maven (for running tests locally)
- curl + bash
git clone https://github.com/aliakrami/elastic-stack-observability.git
cd elastic-stack-observability
docker compose up --build

The stack takes about 60–90 seconds to fully initialize. Services start in order:
1. Elasticsearch → 2. Logstash + Kibana → 3. Spring Boot app

bash scripts/setup-elk.sh

This registers the index template and ILM policy, and creates the logs-* Kibana data view automatically.

bash scripts/generate-logs.sh

This sends a mix of successful orders, status transitions, 404 requests, and simulated ERROR events, giving you a rich dataset to explore in Kibana immediately.
# App health
curl http://localhost:8080/actuator/health
# Elasticsearch cluster status (URL quoted so the shell does not interpret the ?)
curl "http://localhost:9200/_cluster/health?pretty"
# Logstash monitoring API
curl "http://localhost:9600/?pretty"
# Kibana (open is macOS-specific; on Linux use xdg-open or paste the URL into a browser)
open http://localhost:5601

Open http://localhost:5601 → Discover → select the logs-* data view.
| Query | What you see |
|---|---|
| level: ERROR | All error events; the first stop in incident investigation |
| event: order_created | Every successful order creation |
| event: order_cancelled | All cancellations; worth monitoring for spikes |
| event: http_response | All HTTP responses with status and duration |
| statusCode >= 400 | All client and server errors |
| durationMs > 200 | Slow requests |
| tags: slow_request | Requests tagged by Logstash as slow (>500ms) |
| tags: server_error | 5xx responses tagged by Logstash |
1. Find any log entry
2. Copy its requestId value
3. Run query: requestId: "paste-id-here"
4. All log lines for that request appear, including the incoming request, business events, and response time
This is the core MTTD improvement: one query replaces the 30–60 minutes of grepping through log files on separate servers.
| Visualization | Fields | Chart type |
|---|---|---|
| Error rate over time | level: ERROR, @timestamp | Line chart |
| HTTP status distribution | statusCode, count | Pie chart |
| Top slow endpoints | uri, avg durationMs | Bar chart |
| Order event timeline | event, @timestamp | Line chart |
| Cancellation rate | event: order_cancelled vs event: order_created | Metric |
Every log line produced by the service is a JSON object. Example:
{
"@timestamp": "2025-04-18T10:23:45.123+0330",
"level": "INFO",
"message": "Order created successfully",
"logger": "ir.aliakrami.observability.service.OrderService",
"thread": "http-nio-8080-exec-3",
"service": "order-service",
"environment": "docker",
"requestId": "7f3a1c2d-9b8e-4f1a-b5c6-2d7e3f4a5b6c",
"userId": "customer-1",
"orderId": "a3f1c2d4-...",
"event": "order_created",
"customerId": "customer-1",
"totalAmount": 49.99,
"status": "PENDING"
}

| Field | Type | Source | Description |
|---|---|---|---|
| @timestamp | date | Logback encoder | Event timestamp (ISO-8601) |
| level | keyword | Logback | Log level: DEBUG / INFO / WARN / ERROR |
| message | text | Logger call | Human-readable description |
| service | keyword | application.yml | Service name, constant per deployment |
| environment | keyword | Spring profile | local / docker / production |
| requestId | keyword | MdcRequestFilter | Unique per HTTP request; use for tracing |
| userId | keyword | MdcRequestFilter | From X-User-Id header |
| orderId | keyword | MDC.put() in service | Set during order operations |
| event | keyword | StructuredArguments | Machine-readable event name |
| statusCode | integer | RequestLoggingFilter | HTTP response status |
| durationMs | long | RequestLoggingFilter | Request processing time in milliseconds |
| method | keyword | MdcRequestFilter | HTTP method |
| uri | keyword | MdcRequestFilter | Request path |
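To make the flat structure concrete, here is a minimal sketch of how context fields (MDC-style) and per-event fields could be merged into a single JSON object. It is an illustration only: in this service the encoding is done by logstash-logback-encoder, and the class and method names below are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: a hypothetical, simplified stand-in for a JSON log encoder.
final class JsonLogLineSketch {

    // Merges context fields and event fields into one flat JSON object.
    static String toJson(String level, String message,
                         Map<String, String> context, Map<String, Object> eventFields) {
        Map<String, Object> all = new LinkedHashMap<>();
        all.put("level", level);
        all.put("message", message);
        all.putAll(context);      // e.g. requestId, userId
        all.putAll(eventFields);  // e.g. event, orderId, totalAmount

        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : all.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":");
            Object v = e.getValue();
            if (v instanceof Number || v instanceof Boolean) {
                sb.append(v); // numbers stay numeric so they can be mapped as integer/long
            } else {
                sb.append('"').append(escape(String.valueOf(v))).append('"');
            }
        }
        return sb.append('}').toString();
    }

    // Escapes only backslash, quote and newline; a real encoder handles many more cases.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        Map<String, String> ctx = new LinkedHashMap<>();
        ctx.put("requestId", "7f3a1c2d");
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("event", "order_created");
        fields.put("totalAmount", 49.99);
        System.out.println(toJson("INFO", "Order created successfully", ctx, fields));
    }
}
```

The point of the sketch is that every field ends up at the top level of the object, which is what makes a query such as requestId: "..." or event: order_created possible without parsing the message text.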
MDC (Mapped Diagnostic Context) is a thread-local key-value store built into SLF4J. When you call MDC.put("requestId", "abc-123"), that value appears in every subsequent log line on that thread — not just the line where you wrote it.
Request arrives
│
▼
MdcRequestFilter.doFilter()
MDC.put("requestId", "7f3a...")
MDC.put("userId", "customer-1")
│
▼
Controller method runs
log.info("Processing..") → JSON includes requestId + userId ✓
│
▼
Service method runs
log.debug("Validating") → JSON includes requestId + userId ✓
log.info("Order created") → JSON includes requestId + userId ✓
MDC.put("orderId", "a3f1...") → orderId also appears now ✓
│
▼
Response sent
RequestLoggingFilter logs response → JSON includes requestId + userId ✓
│
▼
finally: MDC.clear()
Thread returns to pool — MDC is clean for the next request
Because every log line in a request carries the same requestId, filtering on it in Kibana shows the complete picture of what happened during that request — across all classes, all layers, all log levels.
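The flow above can be sketched with nothing but a ThreadLocal. This is a hypothetical mini-MDC written for illustration, not the code of MdcRequestFilter; the real service uses org.slf4j.MDC, and the log method here is a stand-in that simply records the message together with the current context.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: a hypothetical mini-MDC built on ThreadLocal.
final class MiniMdcSketch {

    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CONTEXT.get().put(key, value); }

    static Map<String, String> snapshot() { return new HashMap<>(CONTEXT.get()); }

    static void clear() { CONTEXT.remove(); }

    // Stand-in for a logger: appends the message plus the current context (sorted by key).
    static void log(List<String> sink, String message) {
        sink.add(message + " " + new TreeMap<>(CONTEXT.get()));
    }

    // Mirrors the filter flow: set context, run the handler, always clear in finally.
    static List<String> handleRequest(String requestId, String userId) {
        List<String> sink = new ArrayList<>();
        try {
            put("requestId", requestId);
            put("userId", userId);
            log(sink, "Processing");      // controller layer
            put("orderId", "a3f1");       // service layer adds a field mid-request
            log(sink, "Order created");   // now carries orderId as well
        } finally {
            clear();                      // pooled threads must not leak context
        }
        return sink;
    }

    public static void main(String[] args) {
        handleRequest("7f3a", "customer-1").forEach(System.out::println);
    }
}
```

Clearing in a finally block matters because servlet containers reuse threads: without it, the next request handled by the same thread would inherit the previous request's requestId.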
The pipeline at elk/logstash/pipeline/order-service.conf does three things:
Input: Accepts JSON lines over TCP from Spring Boot's LogstashTcpSocketAppender. It can alternatively accept Beats input (Filebeat) on port 5045.
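"JSON lines over TCP" means one JSON document per line, terminated by a newline. The sketch below shows that framing with plain JDK sockets. It is an assumption-laden illustration: the class is hypothetical, the appender in this project handles the connection itself, and roundTrip uses a loopback ServerSocket as a stand-in for Logstash rather than talking to a real pipeline.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

// Illustration only: newline-delimited JSON over a TCP socket.
final class JsonLinesOverTcpSketch {

    // Sends one JSON document followed by '\n', which marks the end of the event.
    static void send(String host, int port, String json) {
        try (Socket socket = new Socket(host, port);
             OutputStream out = socket.getOutputStream()) {
            out.write((json + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loopback stand-in for the receiving side: accepts one connection, returns the first line.
    static String roundTrip(String json) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            CompletableFuture<String> received = CompletableFuture.supplyAsync(() -> {
                try (Socket s = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
                    return in.readLine();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            send(server.getInetAddress().getHostAddress(), server.getLocalPort(), json);
            return received.join();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("{\"level\":\"INFO\",\"event\":\"order_created\"}"));
    }
}
```

With the stack running, calling send("localhost", 5044, ...) with a single-line JSON document would be one way to push a hand-written event into the pipeline for testing.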
Filter:
- Parses @timestamp using the date filter (uses app time, not ingestion time)
- Uppercases level for consistent filtering
- Converts durationMs and statusCode to numeric types for aggregations
- Tags slow requests (durationMs > 500) with slow_request
- Tags server errors (statusCode >= 500) with server_error
- Adds ingest_timestamp and logstash_host metadata fields
Output: Writes to daily rolling Elasticsearch indices: logs-order-service-2025.04.18
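The enrichment and index-naming rules described above can be expressed in a few lines of Java. This is a sketch of the logic only, not a translation of order-service.conf: the class and method names are hypothetical, and timestamp parsing and the metadata fields are left out.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustration only: the filter and output rules as plain Java.
final class PipelineFilterSketch {

    private static final DateTimeFormatter INDEX_DATE = DateTimeFormatter.ofPattern("yyyy.MM.dd");

    // Returns an enriched copy of the event; the input map is not modified.
    static Map<String, Object> enrich(Map<String, Object> event) {
        Map<String, Object> out = new HashMap<>(event);
        List<String> tags = new ArrayList<>();

        Object level = out.get("level");
        if (level != null) out.put("level", level.toString().toUpperCase(Locale.ROOT));

        Long duration = toLong(out.get("durationMs"));
        if (duration != null) {
            out.put("durationMs", duration);             // numeric, so aggregations work
            if (duration > 500) tags.add("slow_request");
        }
        Long status = toLong(out.get("statusCode"));
        if (status != null) {
            out.put("statusCode", status.intValue());
            if (status >= 500) tags.add("server_error");
        }
        out.put("tags", tags);
        return out;
    }

    // Builds the daily index name, e.g. logs-order-service-2025.04.18.
    static String indexName(String service, LocalDate day) {
        return "logs-" + service + "-" + INDEX_DATE.format(day);
    }

    // Accepts numbers or numeric strings; returns null for anything else.
    private static Long toLong(Object value) {
        if (value == null) return null;
        if (value instanceof Number) return ((Number) value).longValue();
        try {
            return Long.parseLong(value.toString().trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

Note that a request taking exactly 500 ms is not tagged, because the rule is strictly greater than 500.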
The template at elk/elasticsearch/index-template.json applies to all logs-* indices and defines explicit field types.
Why explicit mappings matter: Without them, Elasticsearch uses dynamic mapping and may index statusCode as text instead of integer — breaking range queries like statusCode >= 500 and numeric aggregations in Kibana.
Key mappings:
- level, service, requestId, orderId, event, uri → keyword (exact match, aggregatable)
- message → text + .keyword sub-field (full-text search + exact match)
- statusCode → integer (range queries, average aggregations)
- durationMs → long (percentile aggregations for P95 latency)
- @timestamp → date (time-series queries, Kibana time filter)
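Why a string-typed number breaks range queries can be shown with an analogy in plain Java. This is not Elasticsearch code; it only assumes that values stored as strings are ordered character by character, while values stored as numbers are ordered by magnitude. The class name is hypothetical.

```java
// Illustration only: string ordering versus numeric ordering of the same values.
final class MappingTypeSketch {

    // Compares character by character, the way string terms are ordered.
    static boolean atLeastAsString(String value, String threshold) {
        return value.compareTo(threshold) >= 0;
    }

    // Compares by magnitude, the way a numeric field is ordered.
    static boolean atLeastAsNumber(String value, long threshold) {
        return Long.parseLong(value) >= threshold;
    }

    public static void main(String[] args) {
        // A 1000 ms request should match "durationMs >= 500".
        System.out.println(atLeastAsString("1000", "500")); // false: '1' sorts before '5'
        System.out.println(atLeastAsNumber("1000", 500));   // true
    }
}
```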
| Method | Endpoint | Description |
|---|---|---|
| GET | /actuator/health | Health check |
| POST | /api/v1/orders | Create an order |
| GET | /api/v1/orders | List all orders |
| GET | /api/v1/orders/{orderId} | Get an order |
| PATCH | /api/v1/orders/{orderId}/status | Update order status |
| POST | /api/v1/orders/{orderId}/simulate-error | Generate an ERROR log entry |
curl -s -X POST http://localhost:8080/api/v1/orders \
-H "Content-Type: application/json" \
-H "X-User-Id: customer-123" \
-d '{
"customerId": "customer-123",
"productId": "product-456",
"quantity": 2,
"totalAmount": 99.99
}' | jq .

curl -s -X POST \
"http://localhost:8080/api/v1/orders/order-001/simulate-error?type=PAYMENT_TIMEOUT"

Then in Kibana: level: ERROR → the error appears with full context.
# Set OrderService to DEBUG
curl -s -X POST http://localhost:8080/actuator/loggers/ir.aliakrami.observability.service \
-H "Content-Type: application/json" \
-d '{"configuredLevel": "DEBUG"}'
# Revert to INFO
curl -s -X POST http://localhost:8080/actuator/loggers/ir.aliakrami.observability.service \
-H "Content-Type: application/json" \
-d '{"configuredLevel": "INFO"}'

No ELK stack required; the tests run against the Spring Boot app only.

./mvnw test

Tests verify: order creation, retrieval, 404 handling, status updates, error simulation, X-Request-Id header injection, and MDC filter behaviour.
| Environment Variable | Default | Description |
|---|---|---|
| LOGSTASH_HOST | localhost | Logstash hostname |
| LOGSTASH_PORT | 5044 | Logstash TCP port |
| LOGSTASH_ENABLED | false | Set to true to ship logs to Logstash |
| SPRING_PROFILES_ACTIVE | (none) | Set to docker for JSON console output |
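The defaulting behaviour in the table can be sketched as follows. This only illustrates the semantics of "use the environment variable if set, otherwise the default"; it does not reproduce how application.yml wires these values, and the class is hypothetical.

```java
import java.util.Map;

// Illustration only: resolves the three Logstash settings with the documented defaults.
final class LogstashSettingsSketch {
    final String host;
    final int port;
    final boolean enabled;

    private LogstashSettingsSketch(String host, int port, boolean enabled) {
        this.host = host;
        this.port = port;
        this.enabled = enabled;
    }

    // Takes the environment as a map so the logic can be exercised without real env vars.
    static LogstashSettingsSketch fromEnv(Map<String, String> env) {
        String host = env.getOrDefault("LOGSTASH_HOST", "localhost");
        int port = Integer.parseInt(env.getOrDefault("LOGSTASH_PORT", "5044"));
        boolean enabled = Boolean.parseBoolean(env.getOrDefault("LOGSTASH_ENABLED", "false"));
        return new LogstashSettingsSketch(host, port, enabled);
    }

    public static void main(String[] args) {
        LogstashSettingsSketch s = fromEnv(System.getenv());
        System.out.println(s.host + ":" + s.port + " enabled=" + s.enabled);
    }
}
```

Because LOGSTASH_ENABLED defaults to false, a plain local run never attempts a TCP connection to Logstash.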
# Human-readable console logs, no Logstash connection
./mvnw spring-boot:run

# Start only ELK (not the app container)
docker compose up elasticsearch logstash kibana
# Run the app pointing at Logstash
LOGSTASH_ENABLED=true SPRING_PROFILES_ACTIVE=docker ./mvnw spring-boot:run

Security: The Docker Compose stack disables Elasticsearch security for local development simplicity (xpack.security.enabled=false). In production, enable TLS and use API keys or username/password authentication in the Logstash output.
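Async logging: The Logstash appender is wrapped in an AsyncAppender with a 512-message queue, so log calls hand events to a background thread instead of writing to the socket themselves. If Logstash is unreachable, events accumulate in that in-memory queue. discardingThreshold=0 (already configured) stops Logback from discarding TRACE, DEBUG, and INFO events when the queue is nearly full. Once the queue is completely full, Logback's AsyncAppender either blocks the logging thread (the default, neverBlock=false) or drops events (neverBlock=true), so decide which guarantee matters more for your service: never blocking or never dropping.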
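The behaviour of a bounded in-memory queue under load can be sketched with the JDK's ArrayBlockingQueue. This is not Logback's implementation; it is a hypothetical class that only shows the two choices a full queue leaves: drop the event or make the caller wait.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Illustration only: the drop-or-block choice of a bounded log queue.
final class BoundedLogQueueSketch {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    BoundedLogQueueSketch(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Never blocks the caller: when the queue is full the event is counted as dropped.
    boolean offerNonBlocking(String event) {
        boolean accepted = queue.offer(event);
        if (!accepted) dropped.incrementAndGet();
        return accepted;
    }

    // Never drops: when the queue is full the caller waits until a consumer frees a slot.
    void putBlocking(String event) throws InterruptedException {
        queue.put(event);
    }

    int queued() { return queue.size(); }

    long dropped() { return dropped.get(); }

    public static void main(String[] args) {
        BoundedLogQueueSketch q = new BoundedLogQueueSketch(512);
        for (int i = 0; i < 600; i++) {
            q.offerNonBlocking("event-" + i); // no consumer is draining, as if Logstash were down
        }
        System.out.println("queued=" + q.queued() + " dropped=" + q.dropped()); // queued=512 dropped=88
    }
}
```

Whichever mode is chosen, sizing the queue for the longest Logstash outage you want to ride through is the main tuning knob.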
Log volume: DEBUG logs are excluded from the Logstash appender in production by setting the root logger to INFO. Keep DEBUG only for local development.
Index lifecycle: The ILM policy rolls indices daily and deletes after 30 days. Adjust min_age in ilm-policy.json to match your compliance and storage requirements.
Alerting: In Kibana → Stack Management → Rules, create a threshold rule: trigger when level: ERROR count exceeds N in 5 minutes. Connect to email, Slack, or PagerDuty.
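The condition behind such a threshold rule is "more than N ERROR events inside a time window". A minimal sketch of that condition follows. Kibana evaluates rules on a schedule against Elasticsearch rather than per event, so this hypothetical class illustrates only the counting logic, and it assumes timestamps arrive in non-decreasing order.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;

// Illustration only: sliding-window count of ERROR events.
final class ErrorThresholdSketch {
    private final int threshold;
    private final Duration window;
    private final Deque<Instant> errors = new ArrayDeque<>();

    ErrorThresholdSketch(int threshold, Duration window) {
        this.threshold = threshold;
        this.window = window;
    }

    // Records one ERROR event and reports whether the count in the window exceeds the threshold.
    boolean recordError(Instant at) {
        errors.addLast(at);
        Instant cutoff = at.minus(window);
        // Evict events that are no longer inside the window (cutoff, at].
        while (!errors.isEmpty() && !errors.peekFirst().isAfter(cutoff)) {
            errors.removeFirst();
        }
        return errors.size() > threshold;
    }

    public static void main(String[] args) {
        ErrorThresholdSketch rule = new ErrorThresholdSketch(5, Duration.ofMinutes(1));
        Instant start = Instant.parse("2025-04-18T10:00:00Z");
        for (int i = 0; i < 6; i++) {
            // Only the sixth error within the minute crosses "count > 5".
            System.out.println(rule.recordError(start.plusSeconds(i * 10L)));
        }
    }
}
```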
Correlation with metrics: Pair this with a Prometheus + Grafana stack (see management.endpoints in application.yml) to correlate error spikes in logs with latency spikes in metrics.
| Goal | What to add |
|---|---|
| Distributed tracing | Add Micrometer Tracing + Zipkin; traceId will populate automatically in logs |
| Filebeat (file-based shipping) | Add a Filebeat container reading container logs from /var/lib/docker/containers |
| Alerting | Configure Kibana Rules: level:ERROR count > 5 → Slack/email webhook |
| Multi-service logs | Add a second service; logs from both appear in logs-* filtered by service field |
| Kibana dashboard export | Export saved dashboards as NDJSON and commit to elk/kibana/dashboards/ |
| APM | Add Elastic APM agent for distributed tracing integrated with logs |
- order-service — Spring Boot microservice skeleton with Kafka integration
- kafka-dead-letter-retry — Production-grade Kafka retry and dead-letter handling
- keycloak-sso-spring-boot — OAuth2 resource server with Keycloak RBAC
- apisix-gateway-example — API gateway with JWT auth and rate limiting
Ali Akrami — Senior Backend Engineer specializing in Java, microservices, distributed systems, and cloud-native architecture.