GaleMind ML Inference Server v0.1 - A high-performance machine learning inference server providing both REST and gRPC APIs.
- Rust (1.70+): Install from rustup.rs
- Make: Required for using the Makefile commands
- Pushing a commit to the `main` or `develop` branch triggers a Docker image build and, on success, pushes the image to `galemindzen`'s private Docker Hub repository.
- Pushing a `v*` tag onto any commit triggers a Docker image build and, on success, pushes the image to `galemindzen`'s private Docker Hub repository.
  - If the `*` part of the tag ends in `+k8s`, the image is also deployed to Galemind's Linode Kubernetes cluster.
- Clone the repository:

```bash
git clone <repository-url>
cd galemind-server
```

- Install dependencies:
Make sure you have installed `libssl-dev`; the Rust `openssl` crate depends on it. On Debian derivatives:

```bash
sudo apt install libssl-dev
```

Make sure you have installed `protobuf-compiler`; the `grpc_server` crate depends on it.
On Debian derivatives:

```bash
sudo apt install protobuf-compiler
```

Then build the project:

```bash
cargo build
```

Common tasks are available through the Makefile:

```bash
# Build the entire project (includes format and test)
make all
# Run tests only
make test
# Format code
make format
# Run the server
make run
```

Or with Cargo directly:

```bash
# Build the project
cargo build
# Build for production (optimized)
cargo build --release
# Run tests
cargo test
# Format code
cargo fmt
```

Set the required environment variables in the `.env` file (recommended):
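```bash
export MODELS_DIR=/path/to/your/models
```

Using the Makefile (which automatically loads environment variables from `.env`):

```bash
make run
```

Or using Cargo directly (make sure the variables are set in your shell, for example with `source .env`):

```bash
cargo run -p galemind start
```

The server supports the following command-line options: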
```bash
cargo run -p galemind start \
  --rest-host 0.0.0.0 \
  --rest-port 8080 \
  --grpc-host 0.0.0.0 \
  --grpc-port 50051
```

- REST API: available at `http://localhost:8080` (default)
- gRPC API: available at `localhost:50051` (default)
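For readers wiring up a similar launcher, the flags above can be resolved into socket addresses with only the standard library. This is a hypothetical sketch, not the server's actual argument parser: `ServerConfig` and `parse_args` are illustrative names, the `0.0.0.0` default host is an assumption, and only the default ports 8080 and 50051 and the `MODELS_DIR` variable come from this README. Hostnames such as `localhost` are rejected here because `SocketAddr` parsing accepts IP literals only.

```rust
use std::env;
use std::net::SocketAddr;

/// Hypothetical configuration mirroring the documented CLI flags.
#[derive(Debug, PartialEq)]
struct ServerConfig {
    rest_addr: SocketAddr,
    grpc_addr: SocketAddr,
    models_dir: Option<String>,
}

/// Resolve `--rest-host`, `--rest-port`, `--grpc-host` and `--grpc-port`
/// (each followed by a value), falling back to the documented default ports.
/// The 0.0.0.0 default host is an assumption of this sketch.
fn parse_args(args: &[&str], models_dir: Option<String>) -> Result<ServerConfig, String> {
    let mut rest_host = String::from("0.0.0.0");
    let mut rest_port: u16 = 8080;
    let mut grpc_host = String::from("0.0.0.0");
    let mut grpc_port: u16 = 50051;

    let mut iter = args.iter();
    while let Some(&flag) = iter.next() {
        let &value = iter
            .next()
            .ok_or_else(|| format!("missing value for {flag}"))?;
        match flag {
            "--rest-host" => rest_host = value.to_string(),
            "--rest-port" => {
                rest_port = value.parse::<u16>().map_err(|e| format!("{flag}: {e}"))?
            }
            "--grpc-host" => grpc_host = value.to_string(),
            "--grpc-port" => {
                grpc_port = value.parse::<u16>().map_err(|e| format!("{flag}: {e}"))?
            }
            other => return Err(format!("unknown option {other}")),
        }
    }

    // SocketAddr parsing accepts IP literals only; hostnames are not resolved.
    let rest_addr = format!("{rest_host}:{rest_port}")
        .parse::<SocketAddr>()
        .map_err(|e| format!("invalid REST address: {e}"))?;
    let grpc_addr = format!("{grpc_host}:{grpc_port}")
        .parse::<SocketAddr>()
        .map_err(|e| format!("invalid gRPC address: {e}"))?;

    Ok(ServerConfig { rest_addr, grpc_addr, models_dir })
}

fn main() {
    // MODELS_DIR comes from the environment (e.g. loaded from .env).
    let models_dir = env::var("MODELS_DIR").ok();
    match parse_args(&["--rest-port", "9090"], models_dir) {
        Ok(config) => println!("REST on {}, gRPC on {}", config.rest_addr, config.grpc_addr),
        Err(e) => eprintln!("configuration error: {e}"),
    }
}
```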
The REST server supports both the native Galemind protocol and an OpenAI-compatible API; the protocol is selected per request with the `X-Protocol-Inference` header.
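Header-based dispatch can be sketched as a small function that maps the optional `X-Protocol-Inference` value to a protocol. Only the two values `openai` and `galemind`, and the fallback to Galemind when the header is absent, come from this README; case-insensitive matching and rejecting unknown values are assumptions, and `InferenceProtocol` and `select_protocol` are hypothetical names.

```rust
/// Inference protocols selectable through the `X-Protocol-Inference` header.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InferenceProtocol {
    Galemind,
    OpenAi,
}

/// Map the (optional) header value to a protocol. A missing header selects
/// the native Galemind protocol. Case-insensitive matching and the error for
/// unknown values are assumptions of this sketch.
fn select_protocol(header: Option<&str>) -> Result<InferenceProtocol, String> {
    match header.map(|v| v.trim().to_ascii_lowercase()) {
        None => Ok(InferenceProtocol::Galemind),
        Some(v) if v == "galemind" => Ok(InferenceProtocol::Galemind),
        Some(v) if v == "openai" => Ok(InferenceProtocol::OpenAi),
        Some(other) => Err(format!("unsupported X-Protocol-Inference value: {other}")),
    }
}

fn main() {
    for header in [None, Some("openai"), Some("Galemind"), Some("grpc")] {
        println!("{header:?} -> {:?}", select_protocol(header));
    }
}
```

Returning an error for unknown values, rather than silently falling back, makes a misspelled header visible to the client; whether the real server does this is not stated here.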
Send the `X-Protocol-Inference: openai` header to use the OpenAI-compatible endpoints. Chat completion:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Protocol-Inference: openai" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "temperature": 0.7,
    "max_tokens": 150
  }'
```

List models:

```bash
curl -X GET http://localhost:8080/v1/models \
  -H "X-Protocol-Inference: openai"
```

Check model readiness:

```bash
curl -X GET http://localhost:8080/v1/models/gpt-3.5-turbo/ready \
  -H "X-Protocol-Inference: openai"
```

Send the `X-Protocol-Inference: galemind` header (or omit the header, since Galemind is the default) to use the native Galemind protocol:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Protocol-Inference: galemind" \
  -d '{
    "id": "test-request-1",
    "inputs": [
      {
        "name": "input_text",
        "shape": [1],
        "datatype": "string",
        "data": ["Hello, how are you?"]
      }
    ]
  }'
```

List models:

```bash
curl -X GET http://localhost:8080/v1/models \
  -H "X-Protocol-Inference: galemind"
```

Check model readiness:

```bash
curl -X GET http://localhost:8080/v1/models/my-model/ready \
  -H "X-Protocol-Inference: galemind"
```

The gRPC server provides a unified inference interface with:
- Protocol Selection: Choose between Galemind and OpenAI protocols
- Multiple Content Types: Text, Binary, and Base64 content support
- Streaming Support: Bidirectional streaming (`UnifiedInferStream`) with chunk sequencing and end-of-stream detection
- Backward Compatibility: Full compatibility with the existing `ModelInfer` methods
```protobuf
rpc UnifiedInfer(UnifiedInferRequest) returns (UnifiedInferResponse);
rpc UnifiedInferStream(stream UnifiedInferRequest) returns (stream UnifiedInferResponse);

message UnifiedInferRequest {
  InferenceProtocol protocol = 1;                 // PROTOCOL_GALEMIND or PROTOCOL_OPENAI
  optional ModelInferRequest legacy_request = 2;  // For backward compatibility
  MessageContent content = 3;                     // Enhanced content with type support
  optional StreamMetadata stream_metadata = 4;    // Streaming metadata
  string model_name = 5;
  string model_version = 6;
  string request_id = 7;
  map<string, InferParameter> parameters = 8;
  map<string, string> metadata = 9;
}
```

Supported content types:

- `CONTENT_TYPE_TEXT`: Plain text content
- `CONTENT_TYPE_BINARY`: Raw binary data
- `CONTENT_TYPE_BASE64`: Base64-encoded content
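On the Rust side, the three content types can be modeled as an enum that normalizes every payload to raw bytes. This is an illustrative sketch, not the generated protobuf types: `MessageContent`, `to_bytes`, and `decode_base64` are hypothetical names, and the hand-written decoder (standard alphabet, `=` padding, no whitespace) stands in for whatever base64 library the server actually uses.

```rust
/// Hypothetical Rust mirror of the three content types.
#[derive(Debug, Clone, PartialEq)]
enum MessageContent {
    Text(String),
    Binary(Vec<u8>),
    Base64(String),
}

impl MessageContent {
    /// Normalize any content type to raw bytes.
    fn to_bytes(&self) -> Result<Vec<u8>, String> {
        match self {
            MessageContent::Text(s) => Ok(s.as_bytes().to_vec()),
            MessageContent::Binary(b) => Ok(b.clone()),
            MessageContent::Base64(s) => decode_base64(s),
        }
    }
}

/// Minimal decoder for the standard base64 alphabet with `=` padding.
fn decode_base64(input: &str) -> Result<Vec<u8>, String> {
    fn value(c: u8) -> Option<u32> {
        match c {
            b'A'..=b'Z' => Some((c - b'A') as u32),
            b'a'..=b'z' => Some((c - b'a') as u32 + 26),
            b'0'..=b'9' => Some((c - b'0') as u32 + 52),
            b'+' => Some(62),
            b'/' => Some(63),
            _ => None,
        }
    }

    let bytes = input.trim_end_matches('=').as_bytes();
    // Input must be a multiple of 4 characters with at most two padding chars.
    if input.len() % 4 != 0 || input.len() - bytes.len() > 2 {
        return Err("invalid base64 length or padding".into());
    }

    let mut out = Vec::with_capacity(bytes.len() * 3 / 4);
    let mut buffer: u32 = 0; // pending bits, right-aligned
    let mut bits: u32 = 0; // number of pending bits
    for &c in bytes {
        let v = value(c).ok_or_else(|| format!("invalid base64 character {:?}", c as char))?;
        buffer = (buffer << 6) | v;
        bits += 6;
        if bits >= 8 {
            // Emit the top 8 pending bits and keep the remainder.
            bits -= 8;
            out.push((buffer >> bits) as u8);
            buffer &= (1u32 << bits) - 1;
        }
    }
    Ok(out)
}

fn main() {
    let payloads = [
        MessageContent::Text("Hello".to_string()),
        MessageContent::Binary(vec![0x89, 0x50, 0x4e, 0x47]),
        MessageContent::Base64("SGVsbG8=".to_string()),
    ];
    for payload in &payloads {
        println!("{payload:?} -> {:?}", payload.to_bytes());
    }
}
```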
Streaming features:

- Stream ID: Unique identifier for a stream session
- Chunk Sequencing: Chunks are processed in order of their sequence numbers
- End-of-Stream Detection: Automatic stream completion handling
- Stream Reconstruction: Chunked content is automatically recombined
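The chunking and reconstruction behavior can be illustrated end to end: split a message into numbered chunks carrying `StreamMetadata`-like fields, then buffer chunks per `stream_id` until the end-of-stream chunk and every earlier sequence number have arrived. The struct mirrors the field names used in this README's streaming example (`stream_id`, `chunk_sequence`, `is_streaming`, `end_of_stream`, `total_chunks`), but `Chunk`, `split_into_chunks`, and `StreamAssembler` are hypothetical, and the rule that the end-of-stream chunk carries the highest sequence number is an assumption.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical mirror of the streaming metadata fields.
#[derive(Debug, Clone, PartialEq)]
struct StreamMetadata {
    stream_id: String,
    chunk_sequence: u32, // 1-based, as in the README example
    is_streaming: bool,
    end_of_stream: bool,
    total_chunks: u32,
}

#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    metadata: StreamMetadata,
    text: String,
}

/// Split `message` into at most `max_chunks` text chunks, flagging the last one.
fn split_into_chunks(stream_id: &str, message: &str, max_chunks: usize) -> Vec<Chunk> {
    let n = max_chunks.max(1);
    let chars: Vec<char> = message.chars().collect();
    let size = ((chars.len() + n - 1) / n).max(1);
    let pieces: Vec<String> = if chars.is_empty() {
        vec![String::new()]
    } else {
        chars.chunks(size).map(|c| c.iter().collect()).collect()
    };
    let total = pieces.len() as u32;
    pieces
        .into_iter()
        .enumerate()
        .map(|(i, text)| Chunk {
            metadata: StreamMetadata {
                stream_id: stream_id.to_string(),
                chunk_sequence: i as u32 + 1,
                is_streaming: true,
                end_of_stream: i as u32 + 1 == total,
                total_chunks: total,
            },
            text,
        })
        .collect()
}

#[derive(Default)]
struct PartialStream {
    parts: BTreeMap<u32, String>, // ordered by chunk_sequence
    last_sequence: Option<u32>,   // known once the end-of-stream chunk arrives
}

#[derive(Default)]
struct StreamAssembler {
    streams: HashMap<String, PartialStream>,
}

impl StreamAssembler {
    /// Buffer a chunk; return the combined text once chunks 1..=last are present.
    fn push(&mut self, chunk: Chunk) -> Option<String> {
        let Chunk { metadata, text } = chunk;
        let entry = self.streams.entry(metadata.stream_id.clone()).or_default();
        entry.parts.insert(metadata.chunk_sequence, text);
        if metadata.end_of_stream {
            entry.last_sequence = Some(metadata.chunk_sequence);
        }
        let last = entry.last_sequence?;
        if !entry.parts.keys().copied().eq(1..=last) {
            return None; // still waiting for missing chunks
        }
        let finished = self.streams.remove(&metadata.stream_id)?;
        Some(finished.parts.into_values().collect())
    }
}

fn main() {
    let mut chunks = split_into_chunks("stream_456", "First part of message, final part of message", 3);
    // Simulate out-of-order delivery: the end-of-stream chunk arrives first.
    chunks.reverse();
    let mut assembler = StreamAssembler::default();
    for chunk in chunks {
        let seq = chunk.metadata.chunk_sequence;
        match assembler.push(chunk) {
            Some(text) => println!("chunk {seq} completed the stream: {text}"),
            None => println!("chunk {seq} buffered"),
        }
    }
}
```

Keying the buffer by sequence number in a `BTreeMap` tolerates out-of-order delivery, which is why `main` feeds the chunks in reverse.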
The unified interface maintains full backward compatibility:

- Legacy Support: The original `ModelInfer` and `ModelInferAsync` methods continue to work
- Legacy Request Field: Use the `legacy_request` field in `UnifiedInferRequest` to wrap existing requests
- Protocol Fallback: Defaults to the Galemind protocol when no protocol is specified
Single text request using the OpenAI protocol:

```text
UnifiedInferRequest {
  protocol: PROTOCOL_OPENAI
  content: {
    content_type: CONTENT_TYPE_TEXT
    text_content: "Hello, how are you?"
  }
  model_name: "gpt-3.5-turbo"
  request_id: "req_123"
}
```

Streaming request split into three chunks (first and final chunk shown):

```text
// Chunk 1
UnifiedInferRequest {
  protocol: PROTOCOL_GALEMIND
  content: {
    content_type: CONTENT_TYPE_TEXT
    text_content: "First part of message"
  }
  stream_metadata: {
    stream_id: "stream_456"
    chunk_sequence: 1
    is_streaming: true
    end_of_stream: false
    total_chunks: 3
  }
  model_name: "my-model"
  request_id: "req_456"
}

// Final Chunk
UnifiedInferRequest {
  protocol: PROTOCOL_GALEMIND
  content: {
    content_type: CONTENT_TYPE_TEXT
    text_content: "Final part of message"
  }
  stream_metadata: {
    stream_id: "stream_456"
    chunk_sequence: 3
    is_streaming: true
    end_of_stream: true
    total_chunks: 3
  }
  model_name: "my-model"
  request_id: "req_456"
}
```

Binary content request:

```text
UnifiedInferRequest {
  protocol: PROTOCOL_GALEMIND
  content: {
    content_type: CONTENT_TYPE_BINARY
    binary_content: [raw_bytes_here]
  }
  model_name: "image-processor"
  request_id: "req_789"
}
```

| Command | Description |
|---|---|
| `make all` | Format code, run tests, and build the project |
| `make test` | Run all tests |
| `make format` | Format code using `cargo fmt` |
| `make run` | Start the GaleMind server |
This is a Rust workspace containing:

- `src/galemind/`: Main server application
- `src/grpc_server/`: gRPC server implementation
- `src/rest_server/`: REST API server implementation
See the LICENSE file for details.