elastic-stack-observability

A Spring Boot service fully wired to the Elastic Stack (Elasticsearch + Logstash + Kibana) via Docker Compose. It demonstrates production-grade structured JSON logging, MDC-based request tracing, a custom Elasticsearch index template, and a Logstash pipeline. This is the observability setup that reduced MTTD (Mean Time To Detect) for production incidents from ~45 minutes to ~5 minutes in a banking microservice platform.



The MTTD Problem

Before centralized logging, detecting a production incident typically looked like this:

1. User reports an error
2. Engineer SSHs into server-1 → grep through /var/log/app.log
3. No match → SSH into server-2 → repeat
4. Find a stack trace — but which request caused it?
5. Manually correlate timestamps across 3 files
6. Total time: 30–60 minutes

With the Elastic Stack and structured logging:

1. Alert fires (Kibana rule: level:ERROR count > 5 in 1 minute)
2. Engineer opens Kibana → filters: level:ERROR
3. Clicks on an error → copies requestId
4. Filters: requestId:"abc-123" → all logs for that request appear
5. Root cause visible in the full request context
6. Total time: 2–5 minutes

The key enabler: every log line is structured JSON with consistent, searchable fields — not free-text output from printf statements. This repository shows exactly how that is implemented.


Architecture

┌──────────────────────────────────────────────────────────────────┐
│                     Spring Boot :8080                            │
│                                                                  │
│  HTTP Request                                                    │
│      │                                                           │
│      ▼                                                           │
│  MdcRequestFilter                                                │
│      │  injects: requestId, userId, method, uri → MDC           │
│      ▼                                                           │
│  RequestLoggingFilter                                            │
│      │  logs: http_request event (INFO)                          │
│      ▼                                                           │
│  Controller → Service                                            │
│      │  logs: business events with StructuredArguments           │
│      │  (order_created, order_cancelled, order_processing_error) │
│      ▼                                                           │
│  RequestLoggingFilter (after)                                    │
│      │  logs: http_response event with statusCode + durationMs  │
│      ▼                                                           │
│  MDC.clear()                                                     │
│                                                                  │
│  Logback (logstash-logback-encoder)                              │
│      │  writes structured JSON to TCP socket                     │
└──────┼───────────────────────────────────────────────────────────┘
       │ TCP JSON stream
       ▼
┌──────────────────────────────────────────────────────────────────┐
│                     Logstash :5044                               │
│  input:  tcp (JSON lines)                                        │
│  filter: parse timestamp, normalize level,                       │
│          tag slow requests (>500ms), tag server errors           │
│  output: elasticsearch index logs-{service}-{yyyy.MM.dd}         │
└──────┬───────────────────────────────────────────────────────────┘
       │
       ▼
┌──────────────────────────────────────────────────────────────────┐
│              Elasticsearch :9200                                 │
│  index:    logs-order-service-2025.04.18                         │
│  template: custom field mappings (keyword, integer, date)        │
│  ILM:      30-day retention, daily rollover                      │
└──────┬───────────────────────────────────────────────────────────┘
       │
       ▼
┌──────────────────────────────────────────────────────────────────┐
│                     Kibana :5601                                 │
│  data view:  logs-*                                              │
│  Discover:   full-text and field-level search                    │
│  Dashboards: error rate, latency histogram, event timeline       │
└──────────────────────────────────────────────────────────────────┘

What This Demonstrates

| Concept | Implementation |
| --- | --- |
| Structured JSON logging | logstash-logback-encoder: every log line is valid JSON |
| MDC request tracing | MdcRequestFilter: injects requestId into every log line |
| Request/response logging | RequestLoggingFilter: logs all HTTP events with timing |
| Business event logging | StructuredArguments.keyValue(): searchable key-value fields |
| Log level strategy | DEBUG/INFO/WARN/ERROR with consistent field schemas per level |
| Logstash pipeline | TCP input → filter enrichment → ES output |
| Custom index template | Explicit field type mappings for correct Kibana aggregations |
| ILM policy | Daily rollover, 30-day retention, warm tier after 7 days |
| Kibana data view | Auto-configured logs-* pattern pointing at @timestamp |
| Error simulation endpoint | Generate ERROR logs on demand to test alerting |

Tech Stack

| Component | Version | Role |
| --- | --- | --- |
| Spring Boot | 3.2.4 | Application framework |
| logstash-logback-encoder | 7.4 | Structured JSON log formatting |
| Elasticsearch | 8.12.2 | Log storage and search |
| Logstash | 8.12.2 | Log ingestion and enrichment |
| Kibana | 8.12.2 | Search, visualization, alerting |
| Micrometer Prometheus | managed | Metrics endpoint for scraping |
| Lombok | | Boilerplate reduction |

Project Structure

elastic-stack-observability/
├── src/main/java/ir/aliakrami/observability/
│   ├── ObservabilityApplication.java
│   ├── filter/
│   │   ├── MdcRequestFilter.java         ← Injects requestId/userId into MDC
│   │   └── RequestLoggingFilter.java     ← Logs all HTTP requests/responses
│   ├── controller/
│   │   └── OrderController.java          ← REST API + simulate-error endpoint
│   ├── service/
│   │   └── OrderService.java             ← Business logic with structured logging
│   └── model/
│       └── Order.java
├── src/main/resources/
│   ├── application.yml                   ← Logstash host/port/enabled config
│   └── logback-spring.xml               ← Logback config: JSON encoder, TCP appender
├── elk/
│   ├── logstash/
│   │   ├── config/logstash.yml           ← Logstash main settings
│   │   └── pipeline/order-service.conf  ← Pipeline: input → filter → output
│   └── elasticsearch/
│       ├── index-template.json           ← Custom field mappings
│       └── ilm-policy.json              ← 30-day retention policy
├── scripts/
│   ├── setup-elk.sh                      ← Registers template, ILM, Kibana view
│   └── generate-logs.sh                 ← Pumps varied log events for exploration
├── src/test/java/ir/aliakrami/observability/
│   └── ObservabilityApplicationTest.java
├── docker-compose.yml                    ← Full ELK stack + Spring Boot
├── Dockerfile
└── README.md

Getting Started

Prerequisites

  • Docker + Docker Compose (8 GB RAM recommended for ELK)
  • Java 17+ and Maven (for running tests locally)
  • curl + bash

Start the full stack

git clone https://github.com/aliakrami/elastic-stack-observability.git
cd elastic-stack-observability

docker compose up --build

The stack takes about 60–90 seconds to fully initialize. Services start in order:

  1. Elasticsearch → 2. Logstash + Kibana → 3. Spring Boot app

Register index template and Kibana data view

bash scripts/setup-elk.sh

This registers the index template, ILM policy, and creates the logs-* Kibana data view automatically.

Generate log data

bash scripts/generate-logs.sh

This sends a mix of successful orders, status transitions, 404 requests, and simulated ERROR events — giving you a rich dataset to explore in Kibana immediately.

Verify everything is running

# App health
curl http://localhost:8080/actuator/health

# Elasticsearch cluster status
curl "http://localhost:9200/_cluster/health?pretty"

# Logstash monitoring API
curl "http://localhost:9600/?pretty"

# Kibana (open is macOS-specific; on Linux use xdg-open or paste the URL into a browser)
open http://localhost:5601

Explore in Kibana

Open http://localhost:5601 → Discover → select the logs-* data view.

Kibana queries to try

| Query | What you see |
| --- | --- |
| level: ERROR | All error events: the first stop in incident investigation |
| event: order_created | Every successful order creation |
| event: order_cancelled | All cancellations; worth monitoring for spikes |
| event: http_response | All HTTP responses with status and duration |
| statusCode >= 400 | All client and server errors |
| durationMs > 200 | Slow requests |
| tags: slow_request | Requests tagged by Logstash as slow (>500ms) |
| tags: server_error | 5xx responses tagged by Logstash |

Trace a single request end-to-end

  1. Find any log entry
  2. Copy its requestId value
  3. Run query: requestId: "paste-id-here"
  4. All log lines for that request appear — including the incoming request, business events, and response time

This is the core MTTD improvement: one query replaces 30–60 minutes of grepping log files across servers.
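
To make the tracing steps above concrete, here is a minimal in-memory sketch of what the requestId filter does. It is an illustration only: `RequestTrace` and `byRequestId` are hypothetical names, and the real filtering happens inside Elasticsearch, not in application code.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for the Kibana query  requestId: "abc-123".
final class RequestTrace {

    /** Returns only the log events that carry the given requestId, in their original order. */
    static List<Map<String, Object>> byRequestId(List<Map<String, Object>> events, String requestId) {
        return events.stream()
                .filter(e -> requestId.equals(e.get("requestId")))
                .collect(Collectors.toList());
    }
}
```

Because every event is a map of named fields rather than a free-text line, the filter is an exact match on one key and needs no regular expressions.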

Suggested Kibana visualizations to build

Visualization Fields Chart type
Error rate over time level: ERROR, @timestamp Line chart
HTTP status distribution statusCode, count Pie chart
Top slow endpoints uri, avg durationMs Bar chart
Order event timeline event, @timestamp Line chart
Cancellation rate event: order_cancelled vs event: order_created Metric

Log Structure

Every log line produced by the service is a JSON object. Example:

{
  "@timestamp":  "2025-04-18T10:23:45.123+0330",
  "level":       "INFO",
  "message":     "Order created successfully",
  "logger":      "ir.aliakrami.observability.service.OrderService",
  "thread":      "http-nio-8080-exec-3",
  "service":     "order-service",
  "environment": "docker",
  "requestId":   "7f3a1c2d-9b8e-4f1a-b5c6-2d7e3f4a5b6c",
  "userId":      "customer-1",
  "orderId":     "a3f1c2d4-...",
  "event":       "order_created",
  "customerId":  "customer-1",
  "totalAmount": 49.99,
  "status":      "PENDING"
}
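
As a rough illustration of how such a line comes together, the sketch below merges context fields and event fields into one flat JSON object. It uses only the JDK; `JsonLogLine` and `render` are hypothetical names, and in the service itself this work is done by logstash-logback-encoder, which also handles timestamps and full JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified renderer: NOT the encoder used by the service.
final class JsonLogLine {

    /** Merges level, message, context (MDC-like) and event fields into one flat JSON object. */
    static String render(String level, String message,
                         Map<String, String> context, Map<String, Object> fields) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("level", level);
        doc.put("message", message);
        doc.putAll(context);   // e.g. requestId, userId
        doc.putAll(fields);    // e.g. event, totalAmount

        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (sb.length() > 1) sb.append(',');
            sb.append('"').append(escape(e.getKey())).append("\":");
            Object v = e.getValue();
            if (v instanceof Number || v instanceof Boolean) {
                sb.append(v);  // numbers stay numeric so they can be aggregated
            } else {
                sb.append('"').append(escape(String.valueOf(v))).append('"');
            }
        }
        return sb.append('}').toString();
    }

    // Escapes only backslashes and quotes; a real encoder also escapes control characters.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

The point of the sketch is the shape of the output: context and business fields sit at the top level next to level and message, so each one is individually searchable.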

Field reference

| Field | Type | Source | Description |
| --- | --- | --- | --- |
| @timestamp | date | Logback encoder | Event timestamp (ISO-8601) |
| level | keyword | Logback | Log level: DEBUG / INFO / WARN / ERROR |
| message | text | Logger call | Human-readable description |
| service | keyword | application.yml | Service name, constant per deployment |
| environment | keyword | Spring profile | local / docker / production |
| requestId | keyword | MdcRequestFilter | Unique per HTTP request; use for tracing |
| userId | keyword | MdcRequestFilter | From X-User-Id header |
| orderId | keyword | MDC.put() in service | Set during order operations |
| event | keyword | StructuredArguments | Machine-readable event name |
| statusCode | integer | RequestLoggingFilter | HTTP response status |
| durationMs | long | RequestLoggingFilter | Request processing time in milliseconds |
| method | keyword | MdcRequestFilter | HTTP method |
| uri | keyword | MdcRequestFilter | Request path |

How MDC Tracing Works

MDC (Mapped Diagnostic Context) is a thread-local key-value store built into SLF4J. When you call MDC.put("requestId", "abc-123"), that value appears in every subsequent log line on that thread — not just the line where you wrote it.

Request arrives
    │
    ▼
MdcRequestFilter.doFilter()
    MDC.put("requestId", "7f3a...")
    MDC.put("userId",    "customer-1")
    │
    ▼
Controller method runs
    log.info("Processing..") → JSON includes requestId + userId ✓
    │
    ▼
Service method runs
    log.debug("Validating") → JSON includes requestId + userId ✓
    log.info("Order created") → JSON includes requestId + userId ✓
    MDC.put("orderId", "a3f1...") → orderId also appears now ✓
    │
    ▼
Response sent
    RequestLoggingFilter logs response → JSON includes requestId + userId ✓
    │
    ▼
finally: MDC.clear()
    Thread returns to pool — MDC is clean for the next request

Because every log line in a request carries the same requestId, filtering on it in Kibana shows the complete picture of what happened during that request — across all classes, all layers, all log levels.
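
The thread-local mechanism described above can be sketched with nothing but the JDK. `MiniMdc` is a hypothetical teaching class, not SLF4J's implementation; it only shows why a value put once is visible to later calls on the same thread, invisible to other threads, and gone after clear().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical miniature of an MDC: one key-value map per thread.
final class MiniMdc {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CONTEXT.get().put(key, value); }
    static String get(String key)             { return CONTEXT.get().get(key); }
    static void clear()                       { CONTEXT.remove(); }

    /** Simulates one request: set context, do work, always clear in finally. */
    static String handle(String requestId) {
        put("requestId", requestId);
        try {
            // Any code called from here sees the same requestId without it being passed along.
            return "processing requestId=" + get("requestId");
        } finally {
            clear(); // the pooled thread must not leak context into the next request
        }
    }

    /** Reads a key from a fresh thread to show that context is per-thread. */
    static String readFromNewThread(String key) {
        String[] seen = new String[1];
        Thread t = new Thread(() -> seen[0] = get(key));
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }
}
```

The same per-thread behaviour is why work handed off to another thread (for example an executor) does not see the MDC values unless they are copied over explicitly.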


Logstash Pipeline

The pipeline at elk/logstash/pipeline/order-service.conf does three things:

Input: Accepts JSON lines over TCP from Spring Boot's LogstashTcpSocketAppender. Alternatively accepts Beats input (Filebeat) on port 5045.

Filter:

  • Parses @timestamp using the date filter (uses app time, not ingestion time)
  • Uppercases level for consistent filtering
  • Converts durationMs and statusCode to numeric types for aggregations
  • Tags slow requests (durationMs > 500) with slow_request
  • Tags server errors (statusCode >= 500) with server_error
  • Adds ingest_timestamp and logstash_host metadata fields

Output: Writes to daily rolling Elasticsearch indices: logs-order-service-2025.04.18
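
For readers more comfortable with Java than with Logstash configuration, the enrichment rules and the index naming described above can be restated as plain functions. This is not the pipeline itself; `PipelineSketch` and its method names are hypothetical, and the thresholds are taken from the filter list above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical restatement of the filter and output rules; the real logic lives in order-service.conf.
final class PipelineSketch {
    static final long SLOW_REQUEST_MS = 500;

    /** Uppercases the level so "warn" and "WARN" filter identically. */
    static String normalizeLevel(String level) {
        return level.toUpperCase(Locale.ROOT);
    }

    /** Tags derived from numeric fields: slow_request above 500 ms, server_error for 5xx. */
    static List<String> tags(long durationMs, int statusCode) {
        List<String> tags = new ArrayList<>();
        if (durationMs > SLOW_REQUEST_MS) tags.add("slow_request");
        if (statusCode >= 500) tags.add("server_error");
        return tags;
    }

    /** Daily index name in the form logs-{service}-{yyyy.MM.dd}. */
    static String indexName(String service, LocalDate day) {
        return "logs-" + service + "-" + day.format(DateTimeFormatter.ofPattern("yyyy.MM.dd"));
    }
}
```

Note that a request taking exactly 500 ms is not tagged, since the rule is a strict greater-than.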


Elasticsearch Index Template

The template at elk/elasticsearch/index-template.json applies to all logs-* indices and defines explicit field types.

Why explicit mappings matter: Without them, Elasticsearch falls back to dynamic mapping. If statusCode ever arrives as a JSON string, it is mapped as text instead of integer, so range queries like statusCode >= 500 compare strings rather than numbers and numeric aggregations in Kibana stop working.

Key mappings:

  • level, service, requestId, orderId, event, uri → keyword (exact match, aggregatable)
  • message → text + .keyword sub-field (full-text search + exact match)
  • statusCode → integer (range queries, average aggregations)
  • durationMs → long (percentile aggregations for P95 latency)
  • @timestamp → date (time-series queries, Kibana time filter)
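
The effect of a wrong field type on range queries can be illustrated in a few lines of Java. The sketch assumes that values stored as strings are compared lexicographically, which is how string ordering works in general; `MappingPitfall` and its methods are hypothetical names and no Elasticsearch API is involved.

```java
// Hypothetical illustration of numeric vs string ordering; not Elasticsearch code.
final class MappingPitfall {

    /** Range check as it behaves on a numeric field such as durationMs mapped as long. */
    static boolean numericAtLeast(long value, long threshold) {
        return value >= threshold;
    }

    /** The same check when both sides are compared as strings, character by character. */
    static boolean lexicalAtLeast(String value, String threshold) {
        return value.compareTo(threshold) >= 0;
    }
}
```

With these definitions a 1200 ms request passes the numeric check against 500 but fails the string check, because "1" sorts before "5", while a 60 ms request wrongly passes the string check. Three-digit status codes happen to sort the same both ways, which is why the problem is easy to miss until a field like durationMs is involved.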

API Reference

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /actuator/health | Health check |
| POST | /api/v1/orders | Create an order |
| GET | /api/v1/orders | List all orders |
| GET | /api/v1/orders/{orderId} | Get an order |
| PATCH | /api/v1/orders/{orderId}/status | Update order status |
| POST | /api/v1/orders/{orderId}/simulate-error | Generate an ERROR log entry |

Create order

curl -s -X POST http://localhost:8080/api/v1/orders \
  -H "Content-Type: application/json" \
  -H "X-User-Id: customer-123" \
  -d '{
    "customerId": "customer-123",
    "productId":  "product-456",
    "quantity":   2,
    "totalAmount": 99.99
  }' | jq .

Simulate an error (for Kibana alerting demo)

curl -s -X POST \
  "http://localhost:8080/api/v1/orders/order-001/simulate-error?type=PAYMENT_TIMEOUT"

Then in Kibana: level: ERROR → the error appears with full context.

Change log level at runtime (no restart needed)

# Set OrderService to DEBUG
curl -s -X POST http://localhost:8080/actuator/loggers/ir.aliakrami.observability.service \
  -H "Content-Type: application/json" \
  -d '{"configuredLevel": "DEBUG"}'

# Revert to INFO
curl -s -X POST http://localhost:8080/actuator/loggers/ir.aliakrami.observability.service \
  -H "Content-Type: application/json" \
  -d '{"configuredLevel": "INFO"}'

Running Tests

No ELK stack required — the tests run against the Spring Boot app only.

./mvnw test

Tests verify: order creation, retrieval, 404 handling, status updates, error simulation, X-Request-Id header injection, and MDC filter behaviour.


Configuration Reference

| Environment Variable | Default | Description |
| --- | --- | --- |
| LOGSTASH_HOST | localhost | Logstash hostname |
| LOGSTASH_PORT | 5044 | Logstash TCP port |
| LOGSTASH_ENABLED | false | Set to true to ship logs to Logstash |
| SPRING_PROFILES_ACTIVE | (none) | Set to docker for JSON console output |

Run locally without ELK

# Human-readable console logs, no Logstash connection
./mvnw spring-boot:run

Run locally with ELK

# Start only ELK (not the app container)
docker compose up elasticsearch logstash kibana

# Run the app pointing at Logstash
LOGSTASH_ENABLED=true SPRING_PROFILES_ACTIVE=docker ./mvnw spring-boot:run

Production Considerations

Security: The Docker Compose stack disables Elasticsearch security for local development simplicity (xpack.security.enabled=false). In production, enable TLS and use API keys or username/password authentication in the Logstash output.

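Async logging: The Logstash appender is wrapped in an AsyncAppender with a 512-message queue. If Logstash is unreachable, logs queue in memory. discardingThreshold=0 (already configured) stops Logback from discarding TRACE, DEBUG, and INFO events when the queue is nearly full. Once the queue is completely full, behaviour depends on neverBlock: with the default (false) the logging call blocks until space frees up, so nothing is dropped but the application can stall; with neverBlock=true the application never blocks, but excess events are dropped. A bounded queue cannot guarantee both, so choose deliberately.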
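
The behaviour of a bounded in-memory log queue can be sketched with a plain `ArrayBlockingQueue`. This is not Logback's AsyncAppender; `BoundedLogQueue` is a hypothetical class that only shows what a non-blocking enqueue has to do once the queue is full.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical bounded buffer illustrating the block-or-drop choice of any async appender.
final class BoundedLogQueue {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    BoundedLogQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Non-blocking enqueue: never stalls the caller, but returns false and counts a
     * drop when the queue is full. The blocking alternative would be queue.put(event),
     * which waits for free space instead of dropping.
     */
    boolean offerOrDrop(String event) {
        boolean accepted = queue.offer(event);
        if (!accepted) dropped.incrementAndGet();
        return accepted;
    }

    int size()     { return queue.size(); }
    long dropped() { return dropped.get(); }
}
```

Tracking the drop count, as the sketch does, is a cheap way to find out whether the queue size is adequate for real traffic.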

Log volume: DEBUG logs are excluded from the Logstash appender in production by setting the root logger to INFO. Keep DEBUG only for local development.

Index lifecycle: The ILM policy rolls indices daily and deletes after 30 days. Adjust min_age in ilm-policy.json to match your compliance and storage requirements.
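
To get a feel for what a 30-day retention means for date-suffixed indices, the arithmetic can be sketched as below. ILM itself measures age from index creation or rollover time rather than from the index name, so this is only an illustration; `RetentionSketch` and `expired` are hypothetical names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: derives the age of a daily index from its yyyy.MM.dd suffix.
final class RetentionSketch {
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("uuuu.MM.dd");

    /** True when an index such as logs-order-service-2025.04.18 is older than retentionDays. */
    static boolean expired(String indexName, LocalDate today, long retentionDays) {
        String suffix = indexName.substring(indexName.length() - 10); // "2025.04.18"
        LocalDate day = LocalDate.parse(suffix, SUFFIX);
        return ChronoUnit.DAYS.between(day, today) > retentionDays;
    }
}
```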

Alerting: In Kibana → Stack Management → Rules, create a threshold rule: trigger when level: ERROR count exceeds N in 5 minutes. Connect to email, Slack, or PagerDuty.
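
The logic of such a threshold rule is simple enough to sketch in Java. This is a simplified model, not Kibana's rule engine; `ErrorThresholdRule` and `fires` are hypothetical names, and the window and threshold are parameters you would tune.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical model of "ERROR count exceeds N within a time window".
final class ErrorThresholdRule {

    /** Fires when more than `threshold` error timestamps fall inside (now - window, now]. */
    static boolean fires(List<Instant> errorTimestamps, Instant now, Duration window, int threshold) {
        Instant from = now.minus(window);
        long count = errorTimestamps.stream()
                .filter(t -> t.isAfter(from) && !t.isAfter(now))
                .count();
        return count > threshold;
    }
}
```

A short window with a low threshold detects incidents faster but is noisier; a longer window smooths out isolated errors at the cost of detection time.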

Correlation with metrics: Pair this with a Prometheus + Grafana stack (see management.endpoints in application.yml) to correlate error spikes in logs with latency spikes in metrics.


Extending This Example

| Goal | What to add |
| --- | --- |
| Distributed tracing | Add Micrometer Tracing + Zipkin; traceId will populate automatically in logs |
| Filebeat (file-based shipping) | Add a Filebeat container reading container logs from /var/lib/docker/containers |
| Alerting | Configure Kibana Rules: level:ERROR count > 5 → Slack/email webhook |
| Multi-service logs | Add a second service; logs from both appear in logs-* filtered by service field |
| Kibana dashboard export | Export saved dashboards as NDJSON and commit to elk/kibana/dashboards/ |
| APM | Add Elastic APM agent for distributed tracing integrated with logs |

Author

Ali Akrami — Senior Backend Engineer specializing in Java, microservices, distributed systems, and cloud-native architecture.

LinkedIn
