CS undergraduate at Peking University
I like building model systems that leave the paper, touch real hardware, and survive the awkward details of latency, memory, power, kernels, and devices.
I am interested in the systems side of AI: how models are represented, lowered, scheduled, profiled, optimized, and deployed on constrained hardware.
My favorite kind of work sits between research and engineering:
- turning a model idea into something runnable;
- finding where the real bottleneck hides;
- making benchmarks reproducible enough to trust;
- building small teams that can move from uncertainty to working systems.
- Reading systems, ML, architecture, and security papers with an engineer's eye.
- Profiling code until performance numbers become explainable.
- Working close to hardware: CPU backends, mobile devices, heterogeneous compute, memory layout, kernels, and runtime details.
- Writing clean notes, reproducible scripts, and documentation that future-me will not resent.
- Exploring tools that make engineering collaboration sharper, faster, and calmer.
Model systems and inference
- LLM inference pipelines, low-bit model deployment, runtime integration, benchmark design, throughput / latency / memory / power evaluation.
Systems and backend engineering
- C/C++, Python, CUDA, CMake, Linux.
- CPU backend debugging and optimization.
- Familiar with SIMD / NEON / AVX optimization ideas.
Mobile and heterogeneous computing
- Android / iOS deployment workflows.
- OpenCL backend adaptation.
- Mobile GPU kernel optimization exploration.
Performance analysis
- perf, ncu, NVIDIA Nsight.
- CPU/GPU profiling, hotspot diagnosis, kernel-level performance analysis.
Collaboration
- Task decomposition, code review, experiment organization, technical writing, and keeping a small engineering group pointed in the same direction.
- Efficient and trustworthy deployment of large models on edge devices.
- Model safety from a systems perspective: evaluation, runtime behavior, deployment reliability, and hardware-aware constraints.
- Better runtimes for low-bit and non-standard model representations.
- Mobile GPU kernels and heterogeneous scheduling for practical AI inference.
- Reproducible benchmarking infrastructure for model systems.
- The boundary between AI systems, architecture, and cybersecurity evaluation.
I am trying to become the kind of engineer-researcher who can:
understand the model, read the kernel, measure the system, explain the tradeoff, and make the next version faster without making it less trustworthy.
- Email: zybi@stu.pku.edu.cn


