RecFlex

RecFlex is a recommendation model optimization system that optimizes the GPU kernel of fused embedding operations. It allows each feature field in the fused kernel to use its own schedule.

RecFlex uses an inter-feature interference-aware tuning framework to find schedules that perform well in general, based on recent input data. In the local stage, it tunes the schedule of each feature with explicitly controlled occupancy and simulated inter-feature interference. In the global stage, it tunes the occupancy value. RecFlex also incorporates a feature-specific schedule fusion compiler to generate fused GPU kernels. The thread binding of the fused kernels is determined at runtime to reduce workload imbalance and resource waste.
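
The two-stage structure described above can be illustrated with a toy sketch. This is not the RecFlex tuner: the schedule names, the `toy_cost` function, and the occupancy values are all hypothetical, the cost model is a lookup table rather than a measured kernel, and inter-feature interference is not modeled at all. It only shows the control flow of "pick a schedule per feature for each occupancy, then pick the occupancy".

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cost model: (feature, schedule, occupancy) -> estimated latency.
CostFn = Callable[[str, str, int], float]


def tune_local(features: List[str], schedules: List[str],
               occupancies: List[int], cost: CostFn) -> Dict[int, Dict[str, str]]:
    """Local stage (toy): for each occupancy, pick the cheapest schedule per feature."""
    best: Dict[int, Dict[str, str]] = {}
    for occ in occupancies:
        best[occ] = {f: min(schedules, key=lambda s: cost(f, s, occ))
                     for f in features}
    return best


def tune_global(best_per_occ: Dict[int, Dict[str, str]],
                cost: CostFn) -> Tuple[int, Dict[str, str]]:
    """Global stage (toy): pick the occupancy with the lowest summed cost."""
    def total(occ: int) -> float:
        return sum(cost(f, s, occ) for f, s in best_per_occ[occ].items())
    best_occ = min(best_per_occ, key=total)
    return best_occ, best_per_occ[best_occ]


def toy_cost(feature: str, schedule: str, occupancy: int) -> float:
    """Made-up latencies; a real tuner would measure kernels on recent inputs."""
    base = {
        ("f_short", "thread_per_row"): 1.0,
        ("f_short", "warp_per_row"): 3.0,
        ("f_long", "thread_per_row"): 6.0,
        ("f_long", "warp_per_row"): 2.0,
    }[(feature, schedule)]
    # Arbitrary penalty for occupancies far from 4, for illustration only.
    return base * (1.0 + abs(occupancy - 4) / 4.0)


features = ["f_short", "f_long"]
schedules = ["thread_per_row", "warp_per_row"]
occupancies = [2, 4, 8]

best_occ, plan = tune_global(
    tune_local(features, schedules, occupancies, toy_cost), toy_cost)
print(best_occ, plan)
```

Summing per-feature costs is a simplification chosen to keep the sketch short; the latency of a real fused kernel is not the sum of its parts.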

Figure: RecFlex overview.

Experimental results show that RecFlex achieves average speedups of 1.95×, 11.57×, and 7.92× over TorchRec, HugeCTR, and RECom, respectively, which are state-of-the-art recommendation model libraries or compilers.

Get Started

Set up Development Environment

To try RecFlex, you can directly run a docker container with the image nvcr.io/nvidia/pytorch:23.04-py3.

In our paper, to be compatible with TorchRec, we instead use the image nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04. Start the container:

docker run -d --gpus all --net=host --name recflex -it nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04

Then execute the following commands inside the container:

apt-get update

# install python3.8 and pip
apt-get install -y wget python3.8 python3.8-distutils python3.8-dev
wget https://bootstrap.pypa.io/get-pip.py
python3.8 get-pip.py
update-alternatives --install /usr/bin/python python /usr/bin/python3.8 1

# install torch and torchrec
pip install torch==2.2.0 --index-url https://download.pytorch.org/whl/cu118
pip install fbgemm-gpu==0.6.0 --index-url https://download.pytorch.org/whl/cu118
pip install torchmetrics==1.0.3
pip install torchrec==0.6.0 --index-url https://download.pytorch.org/whl/cu118

Then, install CMake:

cd ~
wget https://github.com/Kitware/CMake/releases/download/v3.28.3/cmake-3.28.3-linux-x86_64.tar.gz
tar -zxvf cmake-3.28.3-linux-x86_64.tar.gz
export PATH=$HOME/cmake-3.28.3-linux-x86_64/bin:$PATH
echo 'export PATH=$HOME/cmake-3.28.3-linux-x86_64/bin:$PATH' >> ~/.bashrc

Then install Nsight Systems:

cd ~
wget https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2023_4_1_97/nsight-systems-2023.4.1_2023.4.1.97-1_amd64.deb
apt-get install -y ./nsight-systems-2023.4.1_2023.4.1.97-1_amd64.deb
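
After these steps, a quick sanity check can confirm that the pieces installed above are visible. The script below is a minimal sketch and not part of RecFlex. It assumes the Python packages are importable as `torch`, `fbgemm_gpu`, `torchmetrics`, and `torchrec`, and that the CMake and Nsight Systems installs put `cmake` and `nsys` on the `PATH`. It only checks presence, not versions or GPU access.

```python
import importlib.util
import shutil
import sys

# Module and tool names are assumptions based on the install commands above.
REQUIRED_MODULES = ["torch", "fbgemm_gpu", "torchmetrics", "torchrec"]
REQUIRED_TOOLS = ["cmake", "nsys"]


def check_environment() -> dict:
    """Report the Python version and whether each module or tool can be found."""
    report = {"python": "%d.%d" % sys.version_info[:2]}
    for module in REQUIRED_MODULES:
        # find_spec locates a top-level module without importing it.
        report[module] = importlib.util.find_spec(module) is not None
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


report = check_environment()
for name, status in report.items():
    print(f"{name}: {status}")
```

Any entry printed as `False` points at the install step to revisit.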

Install RecFlex

Clone this repository into a directory named RecFlex, then install the RecFlex package:

cd RecFlex
pip install .
bash post_install.sh

Run a Simple Test

cd examples
bash test.sh

Benchmark Models and Datasets

The benchmark models used in the paper are simplified DLRMs with no bottom MLP. The pooling operations used in the models are reduce-sum for all features. After embedding lookups of all features, the concatenation operation is applied so that the embedding dimension can vary across features.
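
The embedding part of such a model can be sketched in plain Python. This is an illustration of reduce-sum pooling followed by concatenation, not the RecFlex kernel or the benchmark code. The feature names, table contents, and the `sum_pool` and `embed_and_concat` helpers are hypothetical.

```python
from typing import Dict, List


def sum_pool(table: List[List[float]], ids: List[int]) -> List[float]:
    """Reduce-sum pooling: add up the embedding rows selected by ids."""
    dim = len(table[0])
    pooled = [0.0] * dim
    for row_id in ids:
        for d in range(dim):
            pooled[d] += table[row_id][d]
    return pooled


def embed_and_concat(tables: Dict[str, List[List[float]]],
                     batch_ids: Dict[str, List[List[int]]]) -> List[List[float]]:
    """For each sample, pool every feature and concatenate the pooled vectors."""
    feature_names = list(tables)
    batch_size = len(next(iter(batch_ids.values())))
    output = []
    for b in range(batch_size):
        row: List[float] = []
        for name in feature_names:
            row.extend(sum_pool(tables[name], batch_ids[name][b]))
        output.append(row)
    return output


# Two features with different embedding dimensions (2 and 3).
tables = {
    "user": [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    "item": [[0.5, 0.5, 0.5], [1.0, 0.0, 1.0]],
}
# Per-feature lookup ids for a batch of two samples; lengths vary per sample.
batch_ids = {
    "user": [[0, 2], [1]],
    "item": [[1], [0, 1]],
}

output = embed_and_concat(tables, batch_ids)
print(output)  # [[6.0, 8.0, 1.0, 0.0, 1.0], [3.0, 4.0, 1.5, 0.5, 1.5]]
```

Because the pooled vectors are concatenated rather than summed or stacked, each output row has width 2 + 3 = 5 and the two features do not need matching dimensions.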

For the input datasets, we use the script data_synthesis/data_generate.py and the configurations under examples/models to generate them. See data_synthesis/README.md for details.

Citation

If you use this codebase or otherwise find our work valuable, please cite:

@inproceedings{pan2024recflex,
  title = {RecFlex: Enabling Feature Heterogeneity-Aware Optimization for Deep Recommendation Models with Flexible Schedules},
  author = {Pan, Zaifeng and Zheng, Zhen and Zhang, Feng and Xie, Bing and Wu, Ruofan and Smith, Shaden and Liu, Chuanjie and Ruwase, Olatunji and Du, Xiaoyong and Ding, Yufei},
  booktitle = {SC24: International Conference for High Performance Computing, Networking, Storage and Analysis},
  year = {2024},
  organization = {IEEE}
}
