Raft Consensus Implementation with Spring Boot and gRPC

A production-ready implementation of the Raft consensus algorithm, with dynamic membership changes, log compaction, and advanced optimizations.

Features

Full Raft implementation (Leader Election, Log Replication, Commit)
gRPC communication between nodes
Dynamic cluster membership (add/remove servers safely)
Log compaction and automatic snapshotting
Advanced optimizations: InstallSnapshot, Pre-vote, Linearizable Reads
Comprehensive monitoring and metrics
Web dashboard visualization
REST API for cluster management

Quick Start

Requirements

Java 21+
Gradle (or use the included Gradle Wrapper)

Start Cluster

Windows:

start-cluster.bat

Linux/macOS:

./start-cluster.sh

Manual (run each node in its own terminal; on Linux/macOS use ./gradlew instead of gradlew):

gradlew bootRun --args="--spring.profiles.active=node1"
gradlew bootRun --args="--spring.profiles.active=node2"
gradlew bootRun --args="--spring.profiles.active=node3"

Open: http://localhost:8080/index.html

Core API

Submit Commands

# Auto-routes to leader
curl -X POST http://localhost:8080/api/cluster/command \
  -H "Content-Type: application/json" \
  -d '{"command": "SET key=value"}'
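
For JVM clients, the same request can be built with the JDK's java.net.http API. This is a minimal sketch: the CommandRequests class and its method names are hypothetical and not part of this project, and the JSON escaping only handles backslashes and double quotes.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper (not part of this project): builds the same POST as the curl call above.
final class CommandRequests {
    private CommandRequests() {}

    // Escapes backslashes and double quotes only; use a JSON library for arbitrary input.
    static String toJsonBody(String command) {
        String escaped = command.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"command\": \"" + escaped + "\"}";
    }

    // Builds (but does not send) the POST request for /api/cluster/command.
    static HttpRequest submitCommand(String baseUrl, String command) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/cluster/command"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toJsonBody(command)))
                .build();
    }
}
```

To send it, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), for example with baseUrl "http://localhost:8080" and command "SET key=value".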

Cluster Management

# Add node
curl -X POST http://localhost:8081/api/cluster/add \
  -H "Content-Type: application/json" \
  -d '{"nodeId":"node4","host":"localhost","grpcPort":9094,"httpPort":8084}'

# Remove node
curl -X POST http://localhost:8081/api/cluster/remove/node3

# List members
curl http://localhost:8081/api/membership/members

Monitoring

# Performance metrics
curl http://localhost:8081/api/metrics/performance

# Replication status
curl http://localhost:8081/api/metrics/replication

# Snapshot stats
curl http://localhost:8081/api/metrics/snapshots

# Events with filtering
curl 'http://localhost:8081/api/metrics/events?type=ELECTION_START&limit=10'

Advanced Features

1. Dynamic Membership Changes

Safely add/remove nodes with automatic staging and rollback:

Adding a Node:

Starts as staging (non-voting member)
Replicates log without affecting quorum
Automatically promoted after catching up
Falls back to the previous configuration if the change times out
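
The staging lifecycle above can be pictured as a small decision function. This is only a sketch under assumptions: the StagingPolicy class, the Decision enum, and the exact catch-up rule (matchIndex has reached the leader's last index) are hypothetical, and how the project combines the two timeouts from the Tuning section is assumed rather than taken from the code.

```java
// Hypothetical sketch of the staging lifecycle; names and the catch-up rule are assumptions.
final class StagingPolicy {
    enum Decision { KEEP_STAGING, PROMOTE, ROLLBACK }

    // Values copied from the Tuning section of this README.
    static final long STAGING_DURATION_MS = 10_000;
    static final long MEMBERSHIP_CHANGE_TIMEOUT_MS = 30_000;

    private StagingPolicy() {}

    // elapsedMs: time since the node was added as a staging (non-voting) member.
    static Decision evaluate(long elapsedMs, long stagingMatchIndex, long leaderLastIndex) {
        boolean caughtUp = stagingMatchIndex >= leaderLastIndex;
        // Promote only after the staging period has passed AND the log has caught up.
        if (caughtUp && elapsedMs >= STAGING_DURATION_MS) return Decision.PROMOTE;
        // Still behind when the overall timeout expires: restore the previous configuration.
        if (elapsedMs >= MEMBERSHIP_CHANGE_TIMEOUT_MS) return Decision.ROLLBACK;
        return Decision.KEEP_STAGING;
    }
}
```

Because a staging node never counts toward the majority, none of these outcomes changes the quorum until PROMOTE is reached.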

Quorum Validation:

Minimum 3 voting nodes enforced
Prevents cluster from becoming non-fault-tolerant
Validates before every removal
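
The arithmetic behind these rules is short enough to sketch. The QuorumRules class and its method names are hypothetical; only the minimum of 3 voting nodes comes from this README.

```java
// Hypothetical sketch of the quorum checks; not the project's actual validation code.
final class QuorumRules {
    static final int MIN_VOTING_NODES = 3; // "Minimum 3 voting nodes enforced"

    private QuorumRules() {}

    // Votes needed for a majority among votingNodes members.
    static int majority(int votingNodes) {
        return votingNodes / 2 + 1;
    }

    // Node failures the cluster can absorb while still reaching a majority.
    static int tolerableFailures(int votingNodes) {
        return (votingNodes - 1) / 2;
    }

    // A removal is allowed only if the remaining voters still meet the minimum.
    static boolean canRemoveVoter(int currentVotingNodes) {
        return currentVotingNodes - 1 >= MIN_VOTING_NODES;
    }
}
```

With 3 voters the cluster tolerates one failure; with 2 it tolerates none, which is why a removal that would leave fewer than 3 voters is rejected in this sketch.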

Example:

# Start new node
curl -X POST http://localhost:8081/api/nodes/node4/start

# Add to cluster (auto-staging)
sleep 10
curl -X POST http://localhost:8081/api/cluster/add \
  -H "Content-Type: application/json" \
  -d '{"nodeId":"node4","host":"localhost","grpcPort":9094,"httpPort":8084,"staging":true}'

# Verify membership
curl http://localhost:8081/api/membership/members

2. Log Compaction & Snapshots

Snapshots are created automatically when the log reaches 100 entries.

Benefits:

Memory reduced by ~95%
Followers catch up in seconds (vs minutes)
Enables indefinite cluster operation

Monitor:

curl http://localhost:8081/api/metrics/snapshots

Response shows compression ratio and compacted entries count.
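
To illustrate how compaction interacts with log indices, here is a minimal in-memory sketch. The CompactingLog class is hypothetical and not the project's implementation; it assumes 1-based log indices and only drops entries, whereas a real node would first persist the state machine state covered by the snapshot.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory log used only to illustrate compaction; not the project's classes.
final class CompactingLog {
    static final int SNAPSHOT_THRESHOLD = 100; // value from the Tuning section

    private final List<String> entries = new ArrayList<>(); // entries after the snapshot
    private long snapshotBaseIndex = 0; // last log index covered by the snapshot (0 = none)
    private int totalSnapshots = 0;

    // Appends a command and returns its absolute (1-based) log index.
    long append(String command) {
        entries.add(command);
        return snapshotBaseIndex + entries.size();
    }

    // Returns the entry at an absolute index, or null if it was compacted or never written.
    String get(long index) {
        long offset = index - snapshotBaseIndex - 1;
        if (offset < 0 || offset >= entries.size()) return null;
        return entries.get((int) offset);
    }

    // Drops entries up to lastApplied once the retained log reaches the threshold.
    boolean maybeCompact(long lastApplied) {
        if (entries.size() < SNAPSHOT_THRESHOLD || lastApplied <= snapshotBaseIndex) return false;
        int discard = (int) Math.min(lastApplied - snapshotBaseIndex, entries.size());
        entries.subList(0, discard).clear();
        snapshotBaseIndex += discard;
        totalSnapshots++;
        return true;
    }

    int retainedEntries() { return entries.size(); }
    long snapshotBaseIndex() { return snapshotBaseIndex; }
    int totalSnapshots() { return totalSnapshots; }
}
```

Only applied entries are discarded, so anything not yet applied to the state machine stays in the log, and absolute indices keep their meaning through the snapshotBaseIndex offset.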

3. Advanced Optimizations

InstallSnapshot RPC:

Leaders send snapshots to far-behind followers
Cuts recovery time for lagging nodes from seconds to milliseconds
Automatic and transparent
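
In standard Raft, the leader switches to InstallSnapshot when the entries a follower needs have already been compacted away. A minimal sketch of that choice follows; the CatchUpPlanner class and its names are hypothetical, and the project's actual trigger may differ.

```java
// Hypothetical sketch of choosing between the two replication RPCs.
final class CatchUpPlanner {
    enum Rpc { APPEND_ENTRIES, INSTALL_SNAPSHOT }

    private CatchUpPlanner() {}

    // nextIndex: next log index the leader would send to this follower.
    // snapshotLastIndex: last log index covered by the leader's snapshot (0 if none).
    static Rpc choose(long nextIndex, long snapshotLastIndex) {
        // The needed entry no longer exists in the leader's log, so only a snapshot can help.
        return nextIndex <= snapshotLastIndex ? Rpc.INSTALL_SNAPSHOT : Rpc.APPEND_ENTRIES;
    }

    // After the follower installs the snapshot, replication resumes right behind it.
    static long nextIndexAfterSnapshot(long snapshotLastIndex) {
        return snapshotLastIndex + 1;
    }
}
```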

Pre-vote Algorithm:

Prevents disruption from partitioned nodes
Reduces unnecessary leader elections by 90%
Improves cluster stability
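
The idea behind pre-vote is that a node asks its peers whether it could win before it increments its own term. The receiver-side check might look like the sketch below. The PreVote class, its records, and the exact conditions are assumptions based on the commonly described pre-vote rules, not this project's code.

```java
// Hypothetical sketch of the receiver-side pre-vote check.
final class PreVote {
    // Sent by a node considering an election; its real term is NOT incremented yet.
    record Request(long proposedTerm, long lastLogIndex, long lastLogTerm) {}

    // The receiver's view of its own state.
    record Local(long currentTerm, long lastLogIndex, long lastLogTerm, long msSinceLeaderContact) {}

    private PreVote() {}

    static boolean grant(Request req, Local self, long electionTimeoutMinMs) {
        // The election the sender proposes must be for a newer term.
        boolean termOk = req.proposedTerm() > self.currentTerm();
        // A receiver that still hears from a leader refuses, so a partitioned node cannot disrupt it.
        boolean leaderSilent = self.msSinceLeaderContact() >= electionTimeoutMinMs;
        // Same "at least as up-to-date" rule that RequestVote uses.
        boolean logUpToDate = req.lastLogTerm() > self.lastLogTerm()
                || (req.lastLogTerm() == self.lastLogTerm()
                    && req.lastLogIndex() >= self.lastLogIndex());
        return termOk && leaderSilent && logUpToDate;
    }
}
```

A node that rejoins after a partition gets no pre-votes while the current leader is healthy, so it never bumps the cluster's term or forces a new election.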

Linearizable Reads:

Fast read operations (2-3x faster than writes)
No log replication needed
Strong consistency guarantee

# Read from leader
curl http://localhost:8081/api/read
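
One common way to serve linearizable reads without appending to the log is the ReadIndex approach: the leader records its commit index, confirms it is still leader through a heartbeat round, and answers once that index has been applied. The sketch below assumes this approach; the ReadIndexCheck class and its names are hypothetical, and this README does not state which mechanism the project uses.

```java
// Hypothetical sketch of a ReadIndex-style check; the project's mechanism may differ.
final class ReadIndexCheck {
    private ReadIndexCheck() {}

    // heartbeatAcks: followers that acknowledged a heartbeat sent after the read arrived.
    // The leader counts itself toward the majority.
    static boolean leadershipConfirmed(int heartbeatAcks, int votingNodes) {
        return heartbeatAcks + 1 >= votingNodes / 2 + 1;
    }

    // readIndex: the leader's commitIndex captured when the read request arrived.
    static boolean canServe(long readIndex, long lastApplied, int heartbeatAcks, int votingNodes) {
        return leadershipConfirmed(heartbeatAcks, votingNodes) && lastApplied >= readIndex;
    }
}
```

The heartbeat round is what rules out a deposed leader answering with stale data; no log entry is written, which is where the speed advantage over writes comes from.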

Architecture

Three-Node Example

Client
  |
  +---> NodeManagerController
           |
           +---> [node1 (Leader), node2 (Follower), node3 (Follower)]
                    |
                    +---> gRPC communication
                    +---> Log replication
                    +---> Heartbeats

Node States

FOLLOWER: Default state, receives heartbeats
CANDIDATE: Initiates elections
LEADER: Accepts commands, replicates logs
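
The transitions between these states follow the standard Raft state machine, sketched here as an enum. RaftState is a hypothetical name chosen for illustration and may not match the project's own type.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical enum illustrating the standard Raft state transitions.
enum RaftState {
    FOLLOWER, CANDIDATE, LEADER;

    Set<RaftState> allowedNext() {
        return switch (this) {
            // Election timeout without a heartbeat: start an election.
            case FOLLOWER -> EnumSet.of(CANDIDATE);
            // Win (LEADER), discover a leader or higher term (FOLLOWER), or time out and retry.
            case CANDIDATE -> EnumSet.of(CANDIDATE, LEADER, FOLLOWER);
            // A leader only steps down, on seeing a higher term.
            case LEADER -> EnumSet.of(FOLLOWER);
        };
    }

    boolean canTransitionTo(RaftState next) {
        return allowedNext().contains(next);
    }
}
```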

gRPC Services

RequestVote: Leader election
AppendEntries: Log replication and heartbeats
InstallSnapshot: Fast catch-up for lagging followers
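
As an illustration of what AppendEntries checks on the follower side, the sketch below applies the standard Raft consistency rule: the follower accepts new entries only if its log contains the entry the leader says precedes them. The AppendEntriesCheck class is hypothetical, and it ignores snapshots for brevity (it assumes the whole log is still in memory).

```java
import java.util.List;

// Hypothetical sketch of the follower-side AppendEntries consistency check (no snapshots).
final class AppendEntriesCheck {
    private AppendEntriesCheck() {}

    // logTerms.get(i) is the term of the entry at log index i + 1 (Raft indices start at 1).
    static boolean accepts(long currentTerm, List<Long> logTerms,
                           long leaderTerm, long prevLogIndex, long prevLogTerm) {
        if (leaderTerm < currentTerm) return false;        // request from a stale leader
        if (prevLogIndex == 0) return true;                // appending from the start of the log
        if (prevLogIndex > logTerms.size()) return false;  // follower is missing earlier entries
        // The entry just before the new ones must carry the term the leader expects.
        return logTerms.get((int) prevLogIndex - 1) == prevLogTerm;
    }
}
```

When the check fails, the leader lowers its nextIndex for that follower and retries, which is how diverging logs are brought back in line.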

Configuration

Each node has its own file, src/main/resources/application-nodeX.properties (X is the node number):

raft.node-id=node1
server.port=8081
spring.grpc.server.port=9091

# Peer config
raft.peers[0].node-id=node1
raft.peers[0].host=localhost
raft.peers[0].grpc-port=9091

Tuning (in RaftNode.java)

ELECTION_TIMEOUT_MIN = 3000;      // 3 seconds
ELECTION_TIMEOUT_MAX = 5000;      // 5 seconds
HEARTBEAT_INTERVAL = 1000;        // 1 second
SNAPSHOT_THRESHOLD = 100;         // entries
STAGING_DURATION_MS = 10000;      // 10 seconds
MEMBERSHIP_CHANGE_TIMEOUT_MS = 30000; // 30 seconds
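
The minimum and maximum election timeouts exist because Raft randomizes each node's timeout so that nodes rarely start elections at the same moment. A minimal sketch of that randomization follows, using the values above; the ElectionTimer class and its method names are hypothetical and not taken from RaftNode.java.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of randomized election timeouts; constant values come from this README.
final class ElectionTimer {
    static final long ELECTION_TIMEOUT_MIN = 3000; // ms
    static final long ELECTION_TIMEOUT_MAX = 5000; // ms
    static final long HEARTBEAT_INTERVAL = 1000;   // ms

    private ElectionTimer() {}

    // Picks a fresh timeout uniformly from [MIN, MAX] each time the timer is reset.
    static long nextTimeoutMs() {
        return ThreadLocalRandom.current().nextLong(ELECTION_TIMEOUT_MIN, ELECTION_TIMEOUT_MAX + 1);
    }

    // Heartbeat intervals that fit into the shortest possible timeout.
    static long heartbeatsWithinMinTimeout() {
        return ELECTION_TIMEOUT_MIN / HEARTBEAT_INTERVAL;
    }
}
```

With these values a follower must miss about three consecutive heartbeats before it can start an election, so when tuning, keep the heartbeat interval well below the minimum timeout.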

Testing Features

Test Leader Election

# Stop current leader
curl -X POST http://localhost:8081/api/nodes/node1/stop

# Watch new leader election
curl http://localhost:8082/api/status

Test Log Replication

# Submit commands
for i in {1..10}; do
  curl -X POST http://localhost:8081/api/cluster/command \
    -H "Content-Type: application/json" \
    -d "{\"command\":\"test-$i\"}"
done

# Verify on all nodes (log sizes should match)
curl http://localhost:8081/api/status | jq '.logSize'
curl http://localhost:8082/api/status | jq '.logSize'
curl http://localhost:8083/api/status | jq '.logSize'

Test Snapshots

# Create 150 entries (triggers snapshot)
for i in {1..150}; do
  curl -X POST http://localhost:8081/api/cluster/command \
    -H "Content-Type: application/json" \
    -d "{\"command\":\"cmd-$i\"}"
  sleep 0.05
done

# Check snapshot created
curl http://localhost:8081/api/metrics/snapshots | jq '.totalSnapshots'

Test Membership Changes

# Add node4
curl -X POST http://localhost:8081/api/nodes/node4/start
sleep 10

curl -X POST http://localhost:8081/api/cluster/add \
  -H "Content-Type: application/json" \
  -d '{"nodeId":"node4","host":"localhost","grpcPort":9094,"httpPort":8084}'

# Verify
curl http://localhost:8081/api/metrics/health | jq '.clusterSize'

Project Structure

src/
├── main/java/com/example/raftimplementation/
│   ├── config/         # Configuration classes
│   ├── controller/     # REST API endpoints
│   ├── grpc/           # gRPC service implementation
│   ├── model/          # Data models
│   ├── service/        # Core Raft logic
│   └── proto/          # gRPC protocol definitions
├── resources/
│   ├── application-node*.properties  # Node configs
│   └── index.html                    # Web dashboard
└── test/

Monitoring

Health Endpoint

curl http://localhost:8081/api/metrics/health

Shows: node state, term, log size, commit index, snapshot info, peer connections

Performance Endpoint

curl http://localhost:8081/api/metrics/performance

Shows: throughput (cmd/s), election time (ms), replication latency (ms), leader stability (%)

Replication Endpoint

curl http://localhost:8081/api/metrics/replication

Shows per-peer: nextIndex, matchIndex, lag, upToDate status
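
To make these fields concrete, the sketch below shows one plausible way to derive lag and upToDate from matchIndex, plus the standard Raft calculation of the highest index stored on a majority. The ReplicationMath class, the PeerStatus record, and the definition of lag as leaderLastIndex minus matchIndex are assumptions; the endpoint's actual formulas are not given in this README.

```java
import java.util.Arrays;

// Hypothetical sketch of how the per-peer replication fields could be derived.
final class ReplicationMath {
    record PeerStatus(long nextIndex, long matchIndex, long lag, boolean upToDate) {}

    private ReplicationMath() {}

    // Assumed definition: lag is how many entries the peer still has to acknowledge.
    static PeerStatus status(long nextIndex, long matchIndex, long leaderLastIndex) {
        long lag = Math.max(0, leaderLastIndex - matchIndex);
        return new PeerStatus(nextIndex, matchIndex, lag, lag == 0);
    }

    // Highest index stored on a majority; the leader's own last index counts as its matchIndex.
    // Raft additionally requires that entry to be from the current term before committing it.
    static long majorityMatchIndex(long leaderLastIndex, long... peerMatchIndexes) {
        long[] all = Arrays.copyOf(peerMatchIndexes, peerMatchIndexes.length + 1);
        all[all.length - 1] = leaderLastIndex;
        Arrays.sort(all);
        return all[(all.length - 1) / 2];
    }
}
```

A single slow peer raises its own lag without holding back commits, because the majority index only needs the faster half of the cluster.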

Event History

# View events by type
curl 'http://localhost:8081/api/metrics/events?type=ELECTION_WON&limit=10'

Implementation Status

Core Raft:

Leader election with timeouts
Log replication and commit
Persistent state management

Safety (Membership Changes):

Quorum validation
Staging phase for new nodes
Automatic promotion
Rollback on failure
Joint consensus

Optimizations:

InstallSnapshot RPC (fast recovery)
Pre-vote algorithm (election stability)
Linearizable reads (fast reads)
Async non-blocking gRPC

Log Compaction:

Automatic snapshots at 100 entries
State machine preservation
Snapshot installation
Compression metrics

Monitoring:

Performance metrics
Health metrics
Replication tracking
Event history with filtering
Snapshot statistics

API & Dashboard:

Cluster command submission
Node management
Cluster membership control
Web visualization

Troubleshooting

Nodes not connecting

Verify ports 8081-8083, 9091-9093 are available
Check firewall settings
Review logs for connection errors

No leader elected

Wait 5-10 seconds after startup
Ensure at least 2 nodes are running (a majority of the 3-node cluster)
Check network connectivity
Review election events: curl 'http://localhost:8081/api/metrics/events?type=ELECTION_START'

Commands not replicated

Verify you're contacting the leader
Use /api/cluster/command to auto-route
Check that commitIndex and lastApplied match

High replication lag

Monitor: curl http://localhost:8081/api/metrics/replication
Check peer connectivity
Verify network bandwidth

Incorrect state machine indices

Can happen after a snapshot has compacted the log
The frontend auto-corrects using snapshotBaseIndex
Check snapshot info: curl http://localhost:8081/api/metrics/health

License

MIT License

Author

Built with Spring Boot 3.5.6, gRPC, and the Raft Consensus Algorithm
