Dynamic Resource Allocation For AI-native Services (DRAFAS).
The setup requires a cluster with at least one server and one NVIDIA GPU.
- Install K3s following the official installation instructions.
- Install the NVIDIA GPU Operator for K3s.
helm install --wait nvidiagpu -n gpu-operator --create-namespace \
  --values utils_services/gpu-operator-values.yaml \
  nvidia/gpu-operator
- Set up GPU slicing with MPS or time-slicing. MPS is used by default. To use time-slicing instead, modify utils_services/mps-slicing-config-all.yaml before running the commands below.
kubectl create -n gpu-operator -f utils_services/mps-slicing-config-all.yaml
kubectl patch clusterpolicies.nvidia.com/cluster-policy \
  -n gpu-operator --type merge \
  -p '{"spec": {"devicePlugin": {"config": {"name": "mps-slicing-config-all", "default": "any"}}}}'
# Check that the patch was applied correctly.
kubectl get events -n gpu-operator --sort-by='.lastTimestamp'
- Install Prometheus.
kubectl apply -f utils_services/prometheus-deployment.yaml
- Install Istio following the official instructions, then configure Istio to expose only the necessary metrics.
istioctl install -f utils_services/istio-operator.yaml
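For the GPU slicing step above, the file to modify is a ConfigMap read by the NVIDIA device plugin. The Python sketch below generates such a ConfigMap for either sharing mode. It assumes the repository file follows the device plugin's documented `sharing` schema and uses the data key `any` (matching `"default": "any"` in the patch command). The function name `build_slicing_configmap` and the replica count are illustrative only, so compare the output with the repository file before replacing it.

```python
import json


def build_slicing_configmap(mode: str = "mps", replicas: int = 4) -> dict:
    """Return a ConfigMap manifest for GPU sharing (hypothetical helper).

    mode: "mps" or "timeSlicing", the two keys of the plugin's `sharing` section.
    replicas: how many shares each physical GPU is advertised as (illustrative).
    """
    if mode not in ("mps", "timeSlicing"):
        raise ValueError(f"unsupported sharing mode: {mode}")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")

    plugin_config = {
        "version": "v1",
        "sharing": {
            mode: {
                "resources": [{"name": "nvidia.com/gpu", "replicas": replicas}]
            }
        },
    }
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        # Same name as referenced by the kubectl patch command.
        "metadata": {"name": "mps-slicing-config-all"},
        # The plugin config is stored as a string; JSON is also valid YAML.
        "data": {"any": json.dumps(plugin_config, indent=2)},
    }


if __name__ == "__main__":
    # Print a time-slicing variant as a JSON manifest.
    print(json.dumps(build_slicing_configmap("timeSlicing"), indent=2))
```

Since kubectl accepts JSON manifests, the printed output can be saved to a file and passed to `kubectl create -n gpu-operator -f` in place of the YAML file.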
We provide three prototype AI services for evaluation: a chatbot service (ollama), an image classification service (pytorch), and a text-to-speech service (coqui). To deploy them, run the commands below from the K3s manager node:
kubectl apply -f ai_services/ollama-deployment.yaml
kubectl apply -f ai_services/pytorch-deployment.yaml
kubectl apply -f ai_services/coqui-deployment.yaml
The auto-scaling agent (DRL-based or rule-based) should be deployed on a separate machine (no GPU is required).
- A virtual environment should be used.
cd drafas
python3.10 -m venv .env
source .env/bin/activate
pip install -r requirement.txt
- Run the agent.
# This command starts all three agents. To start each agent manually, check the script for details.
./utils_services/run_test_real_env.sh drl
# To use the rule-based agent instead, run the command below.
# ./utils_services/run_test_real_env.sh th
# For custom commands, see the help message.
python drl/drl.py --help
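The `th` option selects the rule-based agent. As a rough illustration of what a threshold rule of this kind can look like (not the repository's actual logic), the sketch below reads one value from a Prometheus instant-query response and maps it to a replica count. The thresholds, the replica bounds, the latency metric, and both function names are hypothetical; only the response layout follows the Prometheus HTTP API.

```python
def first_sample_value(response: dict) -> float:
    """Extract the first sample from a Prometheus instant-query response."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query failed")
    result = response["data"]["result"]
    if not result:
        raise ValueError("query returned no samples")
    # Each vector sample is {"metric": {...}, "value": [timestamp, "value-as-string"]}.
    return float(result[0]["value"][1])


def decide_replicas(current: int, latency_s: float,
                    upper_s: float = 2.0, lower_s: float = 0.5,
                    min_replicas: int = 1, max_replicas: int = 4) -> int:
    """Add a replica above the upper threshold, remove one below the lower."""
    if latency_s > upper_s:
        target = current + 1
    elif latency_s < lower_s:
        target = current - 1
    else:
        target = current
    # Clamp to the allowed range (hypothetical bounds).
    return max(min_replicas, min(max_replicas, target))


# Hand-written sample payload in the Prometheus instant-query format.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"destination_workload": "ollama"},
             "value": [1700000000.0, "2.7"]}
        ],
    },
}
print(decide_replicas(current=2, latency_s=first_sample_value(sample)))  # prints 3
```

Keeping a gap between the upper and lower thresholds avoids flapping between two replica counts when the metric hovers near a single cut-off.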
The client emulator sends inference requests to the AI services. If you run the client in a different Python environment from the one used for the auto-scaling agent, first prepare a Python virtual environment in the same way as for the agent.
- Generate the request content dataset.
python client/generate_data.py
- Start sending requests to the AI services from the client emulator.
./utils_services/run_test_real_env_client.sh drl
# For custom client commands, see the help message.
python client/client.py --help
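As an illustration of the kind of request a client might send to the chatbot service, the sketch below builds (but does not send) an HTTP request in the format of Ollama's stock `/api/generate` endpoint. The base URL, model name, and function name are placeholders, and the repository's client may structure its requests differently.

```python
import json
import urllib.request


def build_ollama_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host and model; replace with the address of the deployed service.
req = build_ollama_request("http://ollama.example:11434", "llama3", "Hello")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=60)
```

Setting `"stream": False` asks for a single JSON reply instead of a stream of chunks, which makes end-to-end latency easier to measure per request.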
In DRAFAS, network slicing is optional and can be provided by 5G/B5G network slicing. For quick testing, two simple network slices can be created using the Helm chart from the Gradiant repository, following the tutorial. The slice parameters can be set up via the Open5GS web interface.
For fast training and evaluation, we implemented a simulator in simpy_env.py. To train in the simulator:
taskset -c 2-3 python drl/drl.py --mode train --service ollama
# for custom train command, see help
python drl/drl.py --help
To understand how DRAFAS works, please see our publication (to be updated).