Writing CUDA kernels, optimizing transformer attention, and building high-throughput inference infrastructure. I work at the hardware-software boundary where milliseconds matter.
I specialize in the performance layer of deep learning, the part most engineers skip: custom CUDA kernels, tiled matrix multiplication, numerically stable softmax, pybind11 bindings. I have benchmarked four attention implementations on NVIDIA L4 hardware and built production inference endpoints. For my research at UTA (IEEE ICC 2026), I built GPU-optimized LLM fine-tuning pipelines over 10K+ examples generated by Sionna 6G digital twin simulations.
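As a small illustration of the numerically stable softmax mentioned above, here is a minimal NumPy sketch of the technique (the actual kernels are in CUDA; this just shows the max-subtraction trick that keeps `exp()` from overflowing):

```python
import numpy as np

def stable_softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max before exponentiating so exp() never overflows;
    # the shift cancels out because softmax is invariant to constant offsets.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)

# Logits this large would overflow a naive exp(x) / sum(exp(x)) in float64.
logits = np.array([1000.0, 1001.0, 1002.0])
probs = stable_softmax(logits)
```

The same shift is what a fused attention kernel tracks per row (often incrementally, as in online softmax) so the whole score matrix never has to be materialized.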
Graduate Research Assistant
Jun 2025 - Present
Led the GPU-optimized LLM fine-tuning pipeline for CTMap (IEEE ICC 2026 and OJ-COMS 2026). Built CUDA-accelerated data pipelines processing Sionna 6G wireless simulation outputs. Profiled and optimized transformer inference, achieving measurable throughput gains on research workloads.
Machine Learning Engineer Intern
Dec 2025 - Feb 2026
Optimized deep learning computer vision pipelines for real-time intra-oral image inference on AWS SageMaker. Reduced inference latency through model quantization and batching strategies, validated against 50K+ labeled medical images.
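For context on the quantization side of that work, here is a minimal NumPy sketch of symmetric post-training int8 weight quantization, the general technique behind the latency reduction (illustrative only; the production pipeline and its SageMaker tooling are not shown):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    # Symmetric per-tensor quantization: map [-max|w|, max|w|] to [-127, 127].
    scale = np.max(np.abs(w)) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, s = quantize_int8(w)
max_err = np.abs(w - dequantize(q, s)).max()  # bounded by half a quantization step
```

Int8 weights quarter the memory traffic versus float32, which is often the dominant cost for latency-bound inference; the rounding error stays within half a quantization step per weight.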
UC Berkeley EECS • Jul 2025
Oracle Cloud • Jun 2024 - Jun 2026
Let's talk CUDA kernels, inference optimization, or high-performance deep learning systems.