elastic-stack-observability

A Spring Boot service fully wired to the Elastic Stack (Elasticsearch + Logstash + Kibana) via Docker Compose. It demonstrates production-grade structured JSON logging, MDC-based request tracing, a custom Elasticsearch index template, and a Logstash pipeline. This is the observability setup that reduced mean time to detect (MTTD) for production incidents from ~45 minutes to ~5 minutes on a banking microservice platform.

The MTTD Problem

Before centralized logging, detecting a production incident typically looked like this:

1. User reports an error
2. Engineer SSHs into server-1 → greps through /var/log/app.log
3. No match → SSH into server-2 → repeat
4. Find a stack trace — but which request caused it?
5. Manually correlate timestamps across 3 files
6. Total time: 30–60 minutes

With the Elastic Stack and structured logging:

1. Alert fires (Kibana rule: level:ERROR count > 5 in 1 minute)
2. Engineer opens Kibana → filters: level:ERROR
3. Clicks on an error → copies requestId
4. Filters: requestId:"abc-123" → all logs for that request appear
5. Root cause visible in the full request context
6. Total time: 2–5 minutes

The key enabler: every log line is structured JSON with consistent, searchable fields — not free-text output from printf statements. This repository shows exactly how that is implemented.

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                     Spring Boot :8080                            │
│                                                                  │
│  HTTP Request                                                    │
│      │                                                           │
│      ▼                                                           │
│  MdcRequestFilter                                                │
│      │  injects: requestId, userId, method, uri → MDC           │
│      ▼                                                           │
│  RequestLoggingFilter                                            │
│      │  logs: http_request event (INFO)                          │
│      ▼                                                           │
│  Controller → Service                                            │
│      │  logs: business events with StructuredArguments           │
│      │  (order_created, order_cancelled, order_processing_error) │
│      ▼                                                           │
│  RequestLoggingFilter (after)                                    │
│      │  logs: http_response event with statusCode + durationMs  │
│      ▼                                                           │
│  MDC.clear()                                                     │
│                                                                  │
│  Logback (logstash-logback-encoder)                              │
│      │  writes structured JSON to TCP socket                     │
└──────┼───────────────────────────────────────────────────────────┘
       │ TCP JSON stream
       ▼
┌──────────────────────────────────────────────────────────────────┐
│                     Logstash :5044                               │
│  input:  tcp (JSON lines)                                        │
│  filter: parse timestamp, normalize level,                       │
│          tag slow requests (>500ms), tag server errors           │
│  output: elasticsearch index logs-{service}-{yyyy.MM.dd}         │
└──────┬───────────────────────────────────────────────────────────┘
       │
       ▼
┌──────────────────────────────────────────────────────────────────┐
│              Elasticsearch :9200                                 │
│  index:    logs-order-service-2025.04.18                         │
│  template: custom field mappings (keyword, integer, date)        │
│  ILM:      30-day retention, daily rollover                      │
└──────┬───────────────────────────────────────────────────────────┘
       │
       ▼
┌──────────────────────────────────────────────────────────────────┐
│                     Kibana :5601                                 │
│  data view:  logs-*                                              │
│  Discover:   full-text and field-level search                    │
│  Dashboards: error rate, latency histogram, event timeline       │
└──────────────────────────────────────────────────────────────────┘

What This Demonstrates

Concept	Implementation
Structured JSON logging	`logstash-logback-encoder` — every log line is valid JSON
MDC request tracing	`MdcRequestFilter` — injects `requestId` into every log line
Request/response logging	`RequestLoggingFilter` — logs all HTTP events with timing
Business event logging	`StructuredArguments.keyValue()` — searchable key-value fields
Log level strategy	DEBUG/INFO/WARN/ERROR with consistent field schemas per level
Logstash pipeline	TCP input → filter enrichment → ES output
Custom index template	Explicit field type mappings for correct Kibana aggregations
ILM policy	Daily rollover, 30-day retention, warm tier after 7 days
Kibana data view	Auto-configured `logs-*` pattern pointing at `@timestamp`
Error simulation endpoint	Generate ERROR logs on demand to test alerting

Tech Stack

Component	Version	Role
Spring Boot	3.2.4	Application framework
`logstash-logback-encoder`	7.4	Structured JSON log formatting
Elasticsearch	8.12.2	Log storage and search
Logstash	8.12.2	Log ingestion and enrichment
Kibana	8.12.2	Search, visualization, alerting
Micrometer Prometheus	managed	Metrics endpoint for scraping
Lombok	—	Boilerplate reduction

Project Structure

elastic-stack-observability/
├── src/main/java/ir/aliakrami/observability/
│   ├── ObservabilityApplication.java
│   ├── filter/
│   │   ├── MdcRequestFilter.java         ← Injects requestId/userId into MDC
│   │   └── RequestLoggingFilter.java     ← Logs all HTTP requests/responses
│   ├── controller/
│   │   └── OrderController.java          ← REST API + simulate-error endpoint
│   ├── service/
│   │   └── OrderService.java             ← Business logic with structured logging
│   └── model/
│       └── Order.java
├── src/main/resources/
│   ├── application.yml                   ← Logstash host/port/enabled config
│   └── logback-spring.xml               ← Logback config: JSON encoder, TCP appender
├── elk/
│   ├── logstash/
│   │   ├── config/logstash.yml           ← Logstash main settings
│   │   └── pipeline/order-service.conf  ← Pipeline: input → filter → output
│   └── elasticsearch/
│       ├── index-template.json           ← Custom field mappings
│       └── ilm-policy.json              ← 30-day retention policy
├── scripts/
│   ├── setup-elk.sh                      ← Registers template, ILM, Kibana view
│   └── generate-logs.sh                 ← Pumps varied log events for exploration
├── src/test/java/ir/aliakrami/observability/
│   └── ObservabilityApplicationTest.java
├── docker-compose.yml                    ← Full ELK stack + Spring Boot
├── Dockerfile
└── README.md

Getting Started

Prerequisites

Docker + Docker Compose (8 GB RAM recommended for ELK)
Java 17+ and Maven (for running tests locally)
curl + bash

Start the full stack

git clone https://github.com/aliakrami/elastic-stack-observability.git
cd elastic-stack-observability

docker compose up --build

The stack takes about 60–90 seconds to fully initialize. Services start in order:

1. Elasticsearch → 2. Logstash + Kibana → 3. Spring Boot app

Register index template and Kibana data view

bash scripts/setup-elk.sh

This registers the index template and the ILM policy, and creates the logs-* Kibana data view.

Generate log data

bash scripts/generate-logs.sh

This sends a mix of successful orders, status transitions, 404 requests, and simulated ERROR events — giving you a rich dataset to explore in Kibana immediately.

Verify everything is running

# App health
curl http://localhost:8080/actuator/health

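# Elasticsearch cluster status (URL quoted so the shell does not treat ? as a glob)
curl "http://localhost:9200/_cluster/health?pretty"

# Logstash monitoring API
curl "http://localhost:9600/?pretty"

# Kibana (open is macOS-only; on Linux use xdg-open or paste the URL into a browser)
open http://localhost:5601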

Explore in Kibana

Open http://localhost:5601 → Discover → select the logs-* data view.

Kibana queries to try

Query	What you see
`level: ERROR`	All error events — the first stop in incident investigation
`event: order_created`	Every successful order creation
`event: order_cancelled`	All cancellations — worth monitoring for spikes
`event: http_response`	All HTTP responses with status and duration
`statusCode >= 400`	All client and server errors
`durationMs > 200`	Requests slower than 200 ms (a lower bar than the 500 ms `slow_request` tag)
`tags: slow_request`	Requests tagged by Logstash as slow (>500ms)
`tags: server_error`	5xx responses tagged by Logstash

Trace a single request end-to-end

1. Find any log entry
2. Copy its requestId value
3. Run query: requestId: "paste-id-here"
4. All log lines for that request appear, including the incoming request, business events, and response time

This is the core MTTD improvement: one query replaces 30–60 minutes of grepping through log files on individual servers.

Suggested Kibana visualizations to build

Visualization	Fields	Chart type
Error rate over time	`level: ERROR`, `@timestamp`	Line chart
HTTP status distribution	`statusCode`, count	Pie chart
Top slow endpoints	`uri`, avg `durationMs`	Bar chart
Order event timeline	`event`, `@timestamp`	Line chart
Cancellation rate	`event: order_cancelled` vs `event: order_created`	Metric

Log Structure

Every log line produced by the service is a JSON object. Example:

{
  "@timestamp":  "2025-04-18T10:23:45.123+0330",
  "level":       "INFO",
  "message":     "Order created successfully",
  "logger":      "ir.aliakrami.observability.service.OrderService",
  "thread":      "http-nio-8080-exec-3",
  "service":     "order-service",
  "environment": "docker",
  "requestId":   "7f3a1c2d-9b8e-4f1a-b5c6-2d7e3f4a5b6c",
  "userId":      "customer-1",
  "orderId":     "a3f1c2d4-...",
  "event":       "order_created",
  "customerId":  "customer-1",
  "totalAmount": 49.99,
  "status":      "PENDING"
}

Field reference

Field	Type	Source	Description
`@timestamp`	date	Logback encoder	Event timestamp (ISO-8601)
`level`	keyword	Logback	Log level: DEBUG / INFO / WARN / ERROR
`message`	text	Logger call	Human-readable description
`service`	keyword	`application.yml`	Service name — constant per deployment
`environment`	keyword	Spring profile	local / docker / production
`requestId`	keyword	`MdcRequestFilter`	Unique per HTTP request — use for tracing
`userId`	keyword	`MdcRequestFilter`	From `X-User-Id` header
`orderId`	keyword	`MDC.put()` in service	Set during order operations
`event`	keyword	`StructuredArguments`	Machine-readable event name
`statusCode`	integer	`RequestLoggingFilter`	HTTP response status
`durationMs`	long	`RequestLoggingFilter`	Request processing time
`method`	keyword	`MdcRequestFilter`	HTTP method
`uri`	keyword	`MdcRequestFilter`	Request path

How MDC Tracing Works

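MDC (Mapped Diagnostic Context) is a thread-local key-value store exposed through the SLF4J API and implemented by the logging backend (Logback here). When you call MDC.put("requestId", "abc-123"), that value appears in every subsequent log line on that thread, not just the line where you wrote it, until it is removed or cleared. Because the store is thread-local, work handed off to another thread does not see these values unless they are copied over explicitly.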

Request arrives
    │
    ▼
MdcRequestFilter.doFilter()
    MDC.put("requestId", "7f3a...")
    MDC.put("userId",    "customer-1")
    │
    ▼
Controller method runs
    log.info("Processing..") → JSON includes requestId + userId ✓
    │
    ▼
Service method runs
    log.debug("Validating") → JSON includes requestId + userId ✓
    log.info("Order created") → JSON includes requestId + userId ✓
    MDC.put("orderId", "a3f1...") → orderId also appears now ✓
    │
    ▼
Response sent
    RequestLoggingFilter logs response → JSON includes requestId + userId ✓
    │
    ▼
finally: MDC.clear()
    Thread returns to pool — MDC is clean for the next request

Because every log line in a request carries the same requestId, filtering on it in Kibana shows the complete picture of what happened during that request — across all classes, all layers, all log levels.
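The flow above can be modelled in a few lines of plain Java. This is only a toy sketch of the thread-local semantics, assuming a hypothetical class named `MdcSketch`; it is not SLF4J's MDC and not this repository's `MdcRequestFilter`, and the "log line" is just the message followed by the current context.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of MDC semantics (hypothetical; not SLF4J's implementation). */
final class MdcSketch {
    // One map per thread, which is the property the real MDC relies on.
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CONTEXT.get().put(key, value); }

    static void clear() { CONTEXT.get().clear(); }

    /** Renders a "log line": the message plus a sorted snapshot of the context. */
    static String log(String message) {
        return message + " " + new TreeMap<>(CONTEXT.get());
    }

    /** Stand-in for the filter -> controller -> service chain on one thread. */
    static List<String> handleRequest(String requestId, String userId) {
        List<String> lines = new ArrayList<>();
        put("requestId", requestId);
        put("userId", userId);
        try {
            lines.add(log("Processing"));      // controller layer
            put("orderId", "a3f1");            // service adds a field mid-request
            lines.add(log("Order created"));   // later lines carry it as well
        } finally {
            clear();                           // thread returns to the pool clean
        }
        lines.add(log("After request"));       // context is empty again
        return lines;
    }

    public static void main(String[] args) {
        handleRequest("7f3a", "customer-1").forEach(System.out::println);
    }
}
```

The `finally` block is the important design choice: without it, a pooled thread would carry the previous request's `requestId` into the next request it serves.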

Logstash Pipeline

The pipeline at elk/logstash/pipeline/order-service.conf does three things:

Input: Accepts JSON lines over TCP from Spring Boot's LogstashTcpSocketAppender. The pipeline can also accept Beats input (Filebeat) on port 5045.

Filter:

Parses @timestamp using the date filter (uses app time, not ingestion time)
Uppercases level for consistent filtering
Converts durationMs and statusCode to numeric types for aggregations
Tags slow requests (durationMs > 500) with slow_request
Tags server errors (statusCode >= 500) with server_error
Adds ingest_timestamp and logstash_host metadata fields

Output: Writes to daily rolling Elasticsearch indices: logs-order-service-2025.04.18
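To make the filter rules concrete, here they are restated as plain Java. This is a sketch for illustration only: the real logic lives in the Logstash pipeline file, and the class name `EnrichmentSketch` and its method names are hypothetical. Only the thresholds (more than 500 ms, status 500 and above) and the tag names come from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-Java restatement of the filter rules (hypothetical; not the pipeline itself). */
final class EnrichmentSketch {
    static final long SLOW_REQUEST_THRESHOLD_MS = 500;

    /** Normalizes the level to upper case so "warn" and "WARN" filter the same way. */
    static String normalizeLevel(String level) {
        return level.toUpperCase(Locale.ROOT);
    }

    /** Converts the string fields to numbers first, then derives the tags. */
    static List<String> tagsFor(String durationMs, String statusCode) {
        long duration = Long.parseLong(durationMs);
        int status = Integer.parseInt(statusCode);
        List<String> tags = new ArrayList<>();
        if (duration > SLOW_REQUEST_THRESHOLD_MS) {
            tags.add("slow_request");
        }
        if (status >= 500) {
            tags.add("server_error");
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(normalizeLevel("warn"));
        System.out.println(tagsFor("750", "503"));
        System.out.println(tagsFor("500", "404"));
    }
}
```

Note that a request taking exactly 500 ms is not tagged, because the rule is strictly greater than 500.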

Elasticsearch Index Template

The template at elk/elasticsearch/index-template.json applies to all logs-* indices and defines explicit field types.

Why explicit mappings matter: Without them, Elasticsearch falls back to dynamic mapping. A numeric value that arrives as a JSON string (for example "503") is then mapped as text rather than integer, which breaks range queries like statusCode >= 500 and numeric aggregations in Kibana.

Key mappings:

level, service, requestId, orderId, event, uri → keyword (exact match, aggregatable)
message → text + .keyword sub-field (full-text search + exact match)
statusCode → integer (range queries, average aggregations)
durationMs → long (percentile aggregations for P95 latency)
@timestamp → date (time-series queries, Kibana time filter)
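The difference between a string mapping and a numeric mapping is easiest to see with a range check. The sketch below is an analogy in plain Java, not Elasticsearch code: it assumes string terms are compared character by character, and the class name `MappingSketch` is hypothetical.

```java
/** Analogy for numeric vs. string range checks (hypothetical; not Elasticsearch code). */
final class MappingSketch {
    /** Range check on a number, as a long or integer mapping allows. */
    static boolean numericAtLeast(long value, long bound) {
        return value >= bound;
    }

    /** Range check on string terms: characters are compared left to right. */
    static boolean lexicographicAtLeast(String value, String bound) {
        return value.compareTo(bound) >= 0;
    }

    public static void main(String[] args) {
        // A 1200 ms request is clearly slower than 500 ms...
        System.out.println(numericAtLeast(1200, 500));
        // ...but as strings, "1200" sorts before "500" because '1' < '5'.
        System.out.println(lexicographicAtLeast("1200", "500"));
    }
}
```

Three-digit status codes happen to sort the same way either way, so the problem tends to show up first on fields like durationMs, where values have different numbers of digits.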

API Reference

Method	Endpoint	Description
`GET`	`/actuator/health`	Health check
`POST`	`/api/v1/orders`	Create an order
`GET`	`/api/v1/orders`	List all orders
`GET`	`/api/v1/orders/{orderId}`	Get an order
`PATCH`	`/api/v1/orders/{orderId}/status`	Update order status
`POST`	`/api/v1/orders/{orderId}/simulate-error`	Generate an ERROR log entry

Create order

curl -s -X POST http://localhost:8080/api/v1/orders \
  -H "Content-Type: application/json" \
  -H "X-User-Id: customer-123" \
  -d '{
    "customerId": "customer-123",
    "productId":  "product-456",
    "quantity":   2,
    "totalAmount": 99.99
  }' | jq .

Simulate an error (for Kibana alerting demo)

curl -s -X POST \
  "http://localhost:8080/api/v1/orders/order-001/simulate-error?type=PAYMENT_TIMEOUT"

Then in Kibana: level: ERROR → the error appears with full context.

Change log level at runtime (no restart needed)

# Set OrderService to DEBUG
curl -s -X POST http://localhost:8080/actuator/loggers/ir.aliakrami.observability.service \
  -H "Content-Type: application/json" \
  -d '{"configuredLevel": "DEBUG"}'

# Revert to INFO
curl -s -X POST http://localhost:8080/actuator/loggers/ir.aliakrami.observability.service \
  -H "Content-Type: application/json" \
  -d '{"configuredLevel": "INFO"}'

Running Tests

No ELK stack required — the tests run against the Spring Boot app only.

mvn test

Tests verify: order creation, retrieval, 404 handling, status updates, error simulation, X-Request-Id header injection, and MDC filter behaviour.

Configuration Reference

Environment Variable	Default	Description
`LOGSTASH_HOST`	`localhost`	Logstash hostname
`LOGSTASH_PORT`	`5044`	Logstash TCP port
`LOGSTASH_ENABLED`	`false`	Set to `true` to ship logs to Logstash
`SPRING_PROFILES_ACTIVE`	(none)	Set to `docker` for JSON console output

Run locally without ELK

# Human-readable console logs, no Logstash connection
mvn spring-boot:run

Run locally with ELK

# Start only ELK (not the app container)
docker compose up elasticsearch logstash kibana

# Run the app pointing at Logstash
LOGSTASH_ENABLED=true SPRING_PROFILES_ACTIVE=docker mvn spring-boot:run

Production Considerations

Security: The Docker Compose stack disables Elasticsearch security for local development simplicity (xpack.security.enabled=false). In production, enable TLS and use API keys or username/password authentication in the Logstash output.

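Async logging: The Logstash appender is wrapped in an AsyncAppender with a 512-message queue, so logging calls hand events to a background thread instead of writing to the socket themselves. If Logstash is unreachable, events queue in memory. discardingThreshold=0 (already configured) turns off Logback's default of discarding TRACE, DEBUG, and INFO events once the queue is 80% full, so no level is dropped early. Be aware of the trade-off when the queue fills completely: Logback's AsyncAppender blocks the calling thread unless neverBlock is set to true, in which case new events are dropped instead.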
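What happens when the in-memory queue fills up is the part worth understanding before going to production. As a rough illustration only, the sketch below models a non-blocking hand-off with a bounded `java.util.concurrent` queue. It is not Logback's AsyncAppender, and the class name `AsyncQueueSketch` and its methods are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Bounded-queue model of a non-blocking log hand-off (hypothetical sketch). */
final class AsyncQueueSketch {
    private final BlockingQueue<String> queue;
    private int dropped = 0;

    AsyncQueueSketch(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Never waits: returns false and counts a drop when the queue is full. */
    boolean offerOrDrop(String event) {
        boolean accepted = queue.offer(event);
        if (!accepted) {
            dropped++;
        }
        return accepted;
    }

    int queued() { return queue.size(); }

    int dropped() { return dropped; }

    public static void main(String[] args) {
        // No consumer is draining the queue, as if Logstash were unreachable.
        AsyncQueueSketch sketch = new AsyncQueueSketch(2);
        for (String event : new String[] {"e1", "e2", "e3"}) {
            System.out.println(event + " accepted: " + sketch.offerOrDrop(event));
        }
        System.out.println("queued=" + sketch.queued() + " dropped=" + sketch.dropped());
    }
}
```

A blocking hand-off would call `queue.put(event)` instead, which never drops an event but makes the caller wait until there is room. Which behaviour is acceptable depends on whether losing log lines or stalling request threads is the worse outcome for the service.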

Log volume: DEBUG logs are excluded from the Logstash appender in production by setting the root logger to INFO. Keep DEBUG only for local development.

Index lifecycle: The ILM policy rolls indices daily and deletes after 30 days. Adjust min_age in ilm-policy.json to match your compliance and storage requirements.

Alerting: In Kibana → Stack Management → Rules, create a threshold rule: trigger when level: ERROR count exceeds N in 5 minutes. Connect to email, Slack, or PagerDuty.

Correlation with metrics: Pair this with a Prometheus + Grafana stack (see management.endpoints in application.yml) to correlate error spikes in logs with latency spikes in metrics.

Extending This Example

Goal	What to add
Distributed tracing	Add Micrometer Tracing + Zipkin; `traceId` will populate automatically in logs
Filebeat (file-based shipping)	Add a Filebeat container reading container logs from `/var/lib/docker/containers`
Alerting	Configure Kibana Rules: `level:ERROR count > 5` → Slack/email webhook
Multi-service logs	Add a second service; logs from both appear in `logs-*` filtered by `service` field
Kibana dashboard export	Export saved dashboards as NDJSON and commit to `elk/kibana/dashboards/`
APM	Add Elastic APM agent for distributed tracing integrated with logs

Related Projects

order-service — Spring Boot microservice skeleton with Kafka integration
kafka-dead-letter-retry — Production-grade Kafka retry and dead-letter handling
keycloak-sso-spring-boot — OAuth2 resource server with Keycloak RBAC
apisix-gateway-example — API gateway with JWT auth and rate limiting

Author

Ali Akrami — Senior Backend Engineer specializing in Java, microservices, distributed systems, and cloud-native architecture.
