GaleMind Server

GaleMind ML Inference Server v0.1 - A high-performance machine learning inference server providing both REST and gRPC APIs.

Prerequisites

Rust (1.70+): Install from rustup.rs
Make: Required for using the Makefile commands

Github deployment notes

Pushing a commit to main or develop branch will trigger Docker image building and (upon success) pushing to galemindzen's Docker hub private repo.
Pushing a v* tag onto any commit will trigger its docker image building and (upon success) pushing to galemindzen's Docker hub private repo;
- if that "*" ends in +k8s, then also deploy onto Galemind's Linode Kubernetes cluster.

Installation

Clone the repository:

git clone <repository-url>
cd galemind-server

Install dependencies:
Make sure you have installed libssl-dev! Rust openSSL crate depends on it.
For Debian derivatives:

sudo apt install libssl-dev

Make sure you have installed protobuf-compiler! Rust grpc_server crate depends on it.
For Debian derivatives:

sudo apt install protobuf-compiler

cargo build

Compilation

Using Makefile (Recommended)

# Build the entire project (includes format and test)
make all

# Run tests only
make test

# Format code
make format

# Run the server
make run

Using Cargo directly

# Build the project
cargo build

# Build for production (optimized)
cargo build --release

# Run tests
cargo test

# Format code
cargo fmt

Usage

Environment Variables

Set the required environment variables in the .env file (recommended):

export MODELS_DIR=/path/to/your/models

Starting the Server

Using Makefile (automatically loads environment variables from .env):

make run

Or using cargo directly:

cargo run -p galemind start

Server Configuration

The server supports the following command-line options:

cargo run -p galemind start \
  --rest-host 0.0.0.0 \
  --rest-port 8080 \
  --grpc-host 0.0.0.0 \
  --grpc-port 50051

REST API: Available at http://localhost:8080 (default)
gRPC API: Available at localhost:50051 (default)

API Usage

The REST server supports both the native Galemind protocol and OpenAI-compatible API through the X-Protocol-Inference header.

OpenAI Protocol

Use X-Protocol-Inference: openai header to interact with OpenAI-compatible endpoints:

Chat Completions

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Protocol-Inference: openai" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "temperature": 0.7,
    "max_tokens": 150
  }'

List Models

curl -X GET http://localhost:8080/v1/models \
  -H "X-Protocol-Inference: openai"

Model Ready Check

curl -X GET http://localhost:8080/v1/models/gpt-3.5-turbo/ready \
  -H "X-Protocol-Inference: openai"

Galemind Protocol

Use X-Protocol-Inference: galemind header (or omit header for default) to use the native Galemind protocol:

Inference Request

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Protocol-Inference: galemind" \
  -d '{
    "id": "test-request-1",
    "inputs": [
      {
        "name": "input_text",
        "shape": [1],
        "datatype": "string",
        "data": ["Hello, how are you?"]
      }
    ]
  }'

List Models

curl -X GET http://localhost:8080/v1/models \
  -H "X-Protocol-Inference: galemind"

Model Ready Check

curl -X GET http://localhost:8080/v1/models/my-model/ready \
  -H "X-Protocol-Inference: galemind"

gRPC Unified Interface

The gRPC server now supports an enhanced unified interface that provides:

Protocol Selection: Choose between Galemind and OpenAI protocols
Multiple Content Types: Text, Binary, and Base64 content support
Streaming Support: Advanced streaming with chunk management and end-of-stream detection
Backward Compatibility: Full compatibility with existing ModelInfer methods

New gRPC Service Methods

UnifiedInfer (Single Request/Response)

rpc UnifiedInfer(UnifiedInferRequest) returns (UnifiedInferResponse)

UnifiedInferStream (Bidirectional Streaming)

rpc UnifiedInferStream(stream UnifiedInferRequest) returns (stream UnifiedInferResponse)

Message Structure

UnifiedInferRequest

message UnifiedInferRequest {
  InferenceProtocol protocol = 1;           // PROTOCOL_GALEMIND or PROTOCOL_OPENAI
  optional ModelInferRequest legacy_request = 2;  // For backward compatibility
  MessageContent content = 3;               // Enhanced content with type support
  optional StreamMetadata stream_metadata = 4;    // Streaming metadata
  string model_name = 5;
  string model_version = 6;
  string request_id = 7;
  map<string, InferParameter> parameters = 8;
  map<string, string> metadata = 9;
}

Content Types

CONTENT_TYPE_TEXT: Plain text content
CONTENT_TYPE_BINARY: Raw binary data
CONTENT_TYPE_BASE64: Base64-encoded content

Streaming Features

Stream ID: Unique identifier for stream sessions
Chunk Sequencing: Ordered chunk processing with sequence numbers
End-of-Stream Detection: Automatic stream completion handling
Stream Reconstruction: Automatic combining of chunked content

Backward Compatibility

The enhanced interface maintains full backward compatibility:

Legacy Support: Original ModelInfer and ModelInferAsync methods continue to work
Legacy Request Field: Use legacy_request field in UnifiedInferRequest to wrap existing requests
Protocol Fallback: Defaults to Galemind protocol when not specified

Example Usage Patterns

Single Request with OpenAI Protocol

UnifiedInferRequest {
  protocol: PROTOCOL_OPENAI
  content: {
    content_type: CONTENT_TYPE_TEXT
    text_content: "Hello, how are you?"
  }
  model_name: "gpt-3.5-turbo"
  request_id: "req_123"
}

Streaming with Chunks

// Chunk 1
UnifiedInferRequest {
  protocol: PROTOCOL_GALEMIND
  content: {
    content_type: CONTENT_TYPE_TEXT
    text_content: "First part of message"
  }
  stream_metadata: {
    stream_id: "stream_456"
    chunk_sequence: 1
    is_streaming: true
    end_of_stream: false
    total_chunks: 3
  }
  model_name: "my-model"
  request_id: "req_456"
}

// Final Chunk
UnifiedInferRequest {
  protocol: PROTOCOL_GALEMIND
  content: {
    content_type: CONTENT_TYPE_TEXT
    text_content: "Final part of message"
  }
  stream_metadata: {
    stream_id: "stream_456"
    chunk_sequence: 3
    is_streaming: true
    end_of_stream: true
    total_chunks: 3
  }
  model_name: "my-model"
  request_id: "req_456"
}

Binary Content Processing

UnifiedInferRequest {
  protocol: PROTOCOL_GALEMIND
  content: {
    content_type: CONTENT_TYPE_BINARY
    binary_content: [raw_bytes_here]
  }
  model_name: "image-processor"
  request_id: "req_789"
}

Available Make Commands

Command	Description
`make all`	Format code, run tests, and build the project
`make test`	Run all tests
`make format`	Format code using cargo fmt
`make run`	Start the GaleMind server

Project Structure

This is a Rust workspace containing:

src/galemind/ - Main server application
src/grpc_server/ - gRPC server implementation
src/rest_server/ - REST API server implementation

License

See the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
.devcontainer		.devcontainer
.github		.github
doc/design		doc/design
infra		infra
src		src
.env		.env
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
tarpaulin.toml		tarpaulin.toml

Folders and files

Latest commit

History

Repository files navigation

GaleMind Server

Prerequisites

Github deployment notes

Installation

Compilation

Using Makefile (Recommended)

Using Cargo directly

Usage

Environment Variables

Starting the Server

Server Configuration

API Usage

OpenAI Protocol

Chat Completions

List Models

Model Ready Check

Galemind Protocol

Inference Request

List Models

Model Ready Check

gRPC Unified Interface

New gRPC Service Methods

UnifiedInfer (Single Request/Response)

UnifiedInferStream (Bidirectional Streaming)

Message Structure

UnifiedInferRequest

Content Types

Streaming Features

Backward Compatibility

Example Usage Patterns

Single Request with OpenAI Protocol

Streaming with Chunks

Binary Content Processing

Available Make Commands

Project Structure

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages