Feature Request: [with working POC] Sleep endpoint for server #24153

@strawberrymelonpanda

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

A POST to /sleep makes the current server enter the sleeping state immediately, releasing GPU memory on demand. In router mode, a POST to /models/sleep with the body {"model": <model>} does the same for that model's instance.

Re-waking works as it already does. Any incoming request automatically wakes the server/instance from sleep.
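For anyone who wants to script this, here is a minimal sketch of how the two calls could be built from Python using only the standard library. It assumes the POC endpoints behave as described above. The base URL, the empty body for /sleep, and the helper names `build_sleep_request` and `send` are my own assumptions for illustration, not part of the patch, and the response format is not specified here.

```python
import json
import urllib.request
from typing import Optional


def build_sleep_request(base_url: str, model: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the POST request described in this issue.

    Hypothetical helper: the name and the empty body for /sleep are assumptions.
    """
    base = base_url.rstrip("/")
    if model is None:
        # Single-server mode: POST /sleep (assuming no request body is needed).
        return urllib.request.Request(f"{base}/sleep", data=b"", method="POST")
    # Router mode: POST /models/sleep with {"model": <model>}.
    body = json.dumps({"model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/models/sleep",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

A script could call `send(build_sleep_request("http://localhost:8080"))` before starting its other GPU task (the address is an assumption), then simply issue a normal request afterwards to wake the server.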

Motivation

The current sleep mechanism is time-based. I wanted a way to programmatically send the model to sleep and release the GPU memory without fully stopping the server or unloading the model.

This way I can have a script release the GPU memory, do another task, and have the model wake again on the next request.

Possible Implementation

I have a working POC that I've been using for several weeks here:
https://gist.github.com/strawberrymelonpanda/07baa1560eb8978b100e7307f8dc882f

This is an AI-assisted implementation. I'm not a C++ developer and so I am unable to meet the contributor guidelines of "explaining every line", but I'd still like to share the solution.

I'm looking for someone to pick this up if the feature is desirable. @ngxson, I think this is your area.

A brief description of the changes via AI, which can explain it better than I can:

The core change is a single flag (req_sleep) in the task queue that, when set, causes the idle loop to immediately transition to the sleeping state. The rest is wiring this through the API layer and the router-to-child communication channel.
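To make the idea concrete, here is a toy Python model of that behaviour. It is not the patch and not the server's C++ code: the class `ToyQueue`, its methods, and the tick-based idle timer are all hypothetical stand-ins. Only the `req_sleep` flag name comes from the description above.

```python
class ToyQueue:
    """Toy stand-in for the server task queue (hypothetical, not llama.cpp code)."""

    def __init__(self, idle_timeout_s: float) -> None:
        self.idle_timeout_s = idle_timeout_s
        self.idle_s = 0.0
        self.sleeping = False
        self.req_sleep = False  # the single flag described above

    def request_sleep(self) -> None:
        """What a sleep endpoint handler would do: just raise the flag."""
        self.req_sleep = True

    def on_request(self) -> None:
        """Any incoming request wakes the server and resets the idle timer."""
        self.sleeping = False
        self.idle_s = 0.0

    def idle_tick(self, dt_s: float) -> None:
        """One pass of the idle loop while no tasks are pending."""
        if self.sleeping:
            return
        self.idle_s += dt_s
        # Sleep either on the existing timeout or immediately when the flag is set.
        if self.req_sleep or self.idle_s >= self.idle_timeout_s:
            self.req_sleep = False  # consume the flag
            self.sleeping = True    # the real server would release GPU memory here
```

The point of the sketch is that the on-demand path reuses the same transition as the timeout path, so waking works the same way in both cases.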

git apply --stat sleep.patch reports:
7 files changed, 90 insertions(+), 4 deletions(-)

I tried to make sure there were no extraneous edits.

Labels: enhancement (New feature or request)