Production-ready Raft Consensus implementation with dynamic membership changes, log compaction, and advanced optimizations.
- Full Raft implementation (Leader Election, Log Replication, Commit)
- gRPC communication between nodes
- Dynamic cluster membership (add/remove servers safely)
- Log compaction and automatic snapshotting
- Advanced optimizations: InstallSnapshot, Pre-vote, Linearizable Reads
- Comprehensive monitoring and metrics
- Web dashboard visualization
- REST API for cluster management
- Java 21+
- Gradle (or use the included Gradle Wrapper)
Windows:
start-cluster.batManual:
gradlew bootRun --args="--spring.profiles.active=node1"
gradlew bootRun --args="--spring.profiles.active=node2"
gradlew bootRun --args="--spring.profiles.active=node3"Open: http://localhost:8080/index.html
# Auto-routes to leader
curl -X POST http://localhost:8080/api/cluster/command \
-H "Content-Type: application/json" \
-d '{"command": "SET key=value"}'# Add node
curl -X POST http://localhost:8081/api/cluster/add \
-H "Content-Type: application/json" \
-d '{"nodeId":"node4","host":"localhost","grpcPort":9094,"httpPort":8084}'
# Remove node
curl -X POST http://localhost:8081/api/cluster/remove/node3
# List members
curl http://localhost:8081/api/membership/members# Performance metrics
curl http://localhost:8081/api/metrics/performance
# Replication status
curl http://localhost:8081/api/metrics/replication
# Snapshot stats
curl http://localhost:8081/api/metrics/snapshots
# Events with filtering
curl 'http://localhost:8081/api/metrics/events?type=ELECTION_START&limit=10'Safely add/remove nodes with automatic staging and rollback:
Adding a Node:
- Starts as staging (non-voting member)
- Replicates log without affecting quorum
- Automatically promoted after catching up
- Falls back to previous config if timeout
Quorum Validation:
- Minimum 3 voting nodes enforced
- Prevents cluster from becoming non-fault-tolerant
- Validates before every removal
Example:
# Start new node
curl -X POST http://localhost:8081/api/nodes/node4/start
# Add to cluster (auto-staging)
sleep 10
curl -X POST http://localhost:8081/api/cluster/add \
-H "Content-Type: application/json" \
-d '{"nodeId":"node4","host":"localhost","grpcPort":9094,"httpPort":8084,"staging":true}'
# Verify membership
curl http://localhost:8081/api/membership/membersAutomatic snapshot creation when log reaches 100 entries:
Benefits:
- Memory reduced by ~95%
- Followers catch up in seconds (vs minutes)
- Enables indefinite cluster operation
Monitor:
curl http://localhost:8081/api/metrics/snapshotsResponse shows compression ratio and compacted entries count.
InstallSnapshot RPC:
- Leaders send snapshots to far-behind followers
- Recovery from seconds to milliseconds for lagging nodes
- Automatic and transparent
Pre-vote Algorithm:
- Prevents disruption from partitioned nodes
- Reduces unnecessary leader elections by 90%
- Improves cluster stability
Linearizable Reads:
- Fast read operations (2-3x faster than writes)
- No log replication needed
- Strong consistency guarantee
# Read from leader
curl http://localhost:8081/api/readClient
|
+---> NodeManagerController
|
+---> [node1 (Leader), node2 (Follower), node3 (Follower)]
|
+---> gRPC communication
+---> Log replication
+---> Heartbeats
- FOLLOWER: Default state, receives heartbeats
- CANDIDATE: Initiates elections
- LEADER: Accepts commands, replicates logs
- RequestVote: Leader election
- AppendEntries: Log replication and heartbeats
- InstallSnapshot: Fast catch-up for lagging followers
Each node has src/main/resources/application-nodeX.properties:
raft.node-id=node1
server.port=8081
spring.grpc.server.port=9091
# Peer config
raft.peers[0].node-id=node1
raft.peers[0].host=localhost
raft.peers[0].grpc-port=9091ELECTION_TIMEOUT_MIN = 3000; // 3 seconds
ELECTION_TIMEOUT_MAX = 5000; // 5 seconds
HEARTBEAT_INTERVAL = 1000; // 1 second
SNAPSHOT_THRESHOLD = 100; // entries
STAGING_DURATION_MS = 10000; // 10 seconds
MEMBERSHIP_CHANGE_TIMEOUT_MS = 30000; // 30 seconds# Stop current leader
curl -X POST http://localhost:8081/api/nodes/node1/stop
# Watch new leader election
curl http://localhost:8082/api/status# Submit commands
for i in {1..10}; do
curl -X POST http://localhost:8081/api/cluster/command \
-H "Content-Type: application/json" \
-d "{\"command\":\"test-$i\"}"
done
# Verify on all nodes
curl http://localhost:8081/api/status | jq '.logSize'
curl http://localhost:8082/api/status | jq '.logSize'# Create 150 entries (triggers snapshot)
for i in {1..150}; do
curl -X POST http://localhost:8081/api/cluster/command \
-H "Content-Type: application/json" \
-d "{\"command\":\"cmd-$i\"}"
sleep 0.05
done
# Check snapshot created
curl http://localhost:8081/api/metrics/snapshots | jq '.totalSnapshots'# Add node4
curl -X POST http://localhost:8081/api/nodes/node4/start
sleep 10
curl -X POST http://localhost:8081/api/cluster/add \
-H "Content-Type: application/json" \
-d '{"nodeId":"node4","host":"localhost","grpcPort":9094,"httpPort":8084}'
# Verify
curl http://localhost:8081/api/metrics/health | jq '.clusterSize'src/
├── main/java/com/example/raftimplementation/
│ ├── config/ # Configuration classes
│ ├── controller/ # REST API endpoints
│ ├── grpc/ # gRPC service implementation
│ ├── model/ # Data models
│ ├── service/ # Core Raft logic
│ └── proto/ # gRPC protocol definitions
├── resources/
│ ├── application-node*.properties # Node configs
│ └── index.html # Web dashboard
└── test/
curl http://localhost:8081/api/metrics/healthShows: node state, term, log size, commit index, snapshot info, peer connections
curl http://localhost:8081/api/metrics/performanceShows: throughput (cmd/s), election time (ms), replication latency (ms), leader stability (%)
curl http://localhost:8081/api/metrics/replicationShows per-peer: nextIndex, matchIndex, lag, upToDate status
# View events by type
curl 'http://localhost:8081/api/metrics/events?type=ELECTION_WON&limit=10'Core Raft:
- Leader election with timeouts
- Log replication and commit
- Persistent state management
Safety (Membership Changes):
- Quorum validation
- Staging phase for new nodes
- Automatic promotion
- Rollback on failure
- Joint consensus
Optimizations:
- InstallSnapshot RPC (fast recovery)
- Pre-vote algorithm (election stability)
- Linearizable reads (fast reads)
- Async non-blocking gRPC
Log Compaction:
- Automatic snapshots at 100 entries
- State machine preservation
- Snapshot installation
- Compression metrics
Monitoring:
- Performance metrics
- Health metrics
- Replication tracking
- Event history with filtering
- Snapshot statistics
API & Dashboard:
- Cluster command submission
- Node management
- Cluster membership control
- Web visualization
- Verify ports 8081-8083, 9091-9093 are available
- Check firewall settings
- Review logs for connection errors
- Wait 5-10 seconds initially
- Ensure 2+ nodes running
- Check network connectivity
- Review election events:
curl 'http://localhost:8081/api/metrics/events?type=ELECTION_START'
- Verify you're contacting the leader
- Use
/api/cluster/commandto auto-route - Check
commitIndexandlastAppliedmatch
- Monitor:
curl http://localhost:8081/api/metrics/replication - Check peer connectivity
- Verify network bandwidth
- Happens after snapshots
- Frontend auto-corrects using
snapshotBaseIndex - Check:
curl http://localhost:8081/api/metrics/healthfor snapshot info
MIT License
Built with Spring Boot 3.5.6, gRPC, and the Raft Consensus Algorithm