Deep Learning Engineer

GPU Optimization & Deep Learning Systems

Writing CUDA kernels, optimizing transformer attention, and building high-throughput inference infrastructure. I work at the hardware-software boundary where milliseconds matter.

5.64× CUDA Attention Speedup
12.3× FlashAttention Throughput
99.7% Memory Reduction
3.7× AWS Inferentia Speedup

About This Role

I specialize in the performance layer of deep learning: the part most engineers skip. Custom CUDA kernels, tiled matrix multiplication, numerically stable softmax, pybind11 bindings. I have benchmarked four attention implementations on NVIDIA L4 hardware and built production inference endpoints. My research at UTA (IEEE ICC 2026) covers GPU-optimized LLM fine-tuning pipelines trained on 10K+ examples from Sionna 6G digital twin simulation outputs.
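The numerically stable softmax mentioned above follows a standard pattern: subtract the row maximum before exponentiating so large logits never overflow. The production kernels are CUDA; this is just a minimal Python sketch of the trick itself (function name is illustrative):

```python
import math

def stable_softmax(xs):
    """Numerically stable softmax: subtracting max(xs) before
    exp() keeps every exponent <= 0, so nothing overflows."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Logits this large would overflow a naive exp(); here they are fine.
probs = stable_softmax([1000.0, 1001.0, 1002.0])
```

The same max-subtraction runs per-row inside a tiled CUDA softmax, fused with the reduction that computes the row max and sum.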

Technical Skills

CUDA
C++
PyTorch
Hugging Face
Python
AWS Inferentia
Docker
Linux

Deep Learning Projects

Sorted by most recently updated on GitHub


Experience

The University of Texas at Arlington

Graduate Research Assistant

Jun 2025 - Present

Led a GPU-optimized LLM fine-tuning pipeline for CTMap (IEEE ICC 2026 and OJ-COMS 2026). Built CUDA-accelerated data pipelines processing Sionna 6G wireless simulation outputs. Profiled and optimized transformer inference, achieving significant throughput gains on research workloads.
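Profiling work like this usually starts with a coarse throughput probe before reaching for Nsight or torch.profiler. A minimal sketch, assuming a generic callable workload (names are illustrative; timing real CUDA work would also require a device synchronize before and after the timed region):

```python
import time

def measure_throughput(fn, batch, warmup=3, iters=10):
    """Coarse throughput probe: run warmup iterations to amortize
    one-time costs, then time repeated calls and report items/sec."""
    for _ in range(warmup):
        fn(batch)                      # warm caches, JIT, allocators
    start = time.perf_counter()
    for _ in range(iters):
        fn(batch)
    elapsed = time.perf_counter() - start
    return len(batch) * iters / elapsed

# Example: probe a trivial CPU workload.
tput = measure_throughput(lambda b: [x * x for x in b], list(range(1024)))
```

Only after the coarse numbers isolate a slow stage does it pay to drop into kernel-level profiling.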

ReplyQuickAI (DentalScan)

Machine Learning Engineer Intern

Dec 2025 - Feb 2026

Optimized deep learning computer vision pipelines for real-time intra-oral image inference on AWS SageMaker, reducing inference latency through model quantization and batching strategies on a dataset of 50K+ labeled medical images.
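The memory side of quantization is easy to see in isolation. A hypothetical symmetric int8 sketch in plain Python (production pipelines would use PyTorch or SageMaker quantization tooling, not hand-rolled code like this):

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats in [-max|w|, max|w|]
    onto integers in [-127, 127], keeping one float scale so the
    original values can be approximately recovered."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid /0 on all-zeros
    return [round(w / scale) for w in weights], scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [v * scale for v in q]

q, s = quantize_int8([-1.0, 0.5, 1.0])
approx = dequantize_int8(q, s)  # close to the originals, 4x smaller storage
```

Each float32 weight shrinks to one int8 code plus a shared per-tensor scale, which is where the ~4x memory cut (and the attendant bandwidth savings at inference time) comes from.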

Certifications

Advanced Large Language Model Agents

UC Berkeley EECS • Jul 2025

Oracle GenAI Professional

Oracle Cloud • Jun 2024 - Jun 2026

Need GPU-Level ML Performance?

Let's talk CUDA kernels, inference optimization, or high-performance deep learning systems.