Sai Teja Srivillibhutturu


About Me

LLM Engineer and researcher specializing in fine-tuning, agentic systems, and GPU-optimized inference, with applied work in Healthcare AI. MS in Computer Science from UT Arlington (May 2025).

I build and optimize LLM systems, from SFT pipelines and LoRA fine-tuning to CUDA kernels and production inference infrastructure. I apply this across two domains: 6G wireless networks (2× IEEE published) and Healthcare AI, where I have deployed clinical LLMs across EPIC/FHIR-integrated hospital systems.

2× IEEE published researcher: IEEE ICC 2026 and IEEE OJ-COMS 2026, on LLM-enabled path planning and digital twin guided AI for 6G wireless networks.

My Skills

Python
PyTorch
Hugging Face
LoRA / PEFT
vLLM
LangChain
RAG
LLM Agents
TRL / SFT
CUDA
FlashAttention
TensorFlow
C++
Sionna (6G)
OpenStreetMap
AWS
Docker
Kubernetes
Pinecone
ChromaDB
Tavily
Groq
Gradio
FastAPI
FHIR / EPIC
Medical Imaging
HIPAA
DICOM
Java
Go
Spring Boot
PostgreSQL
MongoDB
Redis
Airflow
Spark
ONNX / TensorRT
Git
Linux

Experience

Qure.ai Technologies • Healthcare AI

AI Solutions Engineer Intern

Mar 2026 - May 2026 • New York, NY (Remote)

Configured LLMs for protocol-specific clinical workflows, orchestrating radiologist report parsing and EMR data extraction across EPIC/FHIR-integrated hospital systems including MedStar, Mount Sinai, and UFL. Deployed and maintained AI inference endpoints supporting 6+ live health system sites under real-time SLAs across US time zones. Prepared HIPAA-aligned technical documentation including DPIAs, security questionnaires, and EPIC bidirectional integration guides for clinical AI deployment.

Python • LLMs • EPIC/FHIR • AWS • Clinical AI

ReplyQuickAI (DentalScan) • Healthcare ML

Machine Learning Engineer Intern

Dec 2025 - Feb 2026 • United States

Built computer vision pipelines for intra-oral image analysis across 6+ clinical categories (including gingivitis staging, plaque detection, and recession classification) on a 50K+ labeled dataset. Engineered an automated retraining pipeline on AWS SageMaker that incorporates dentist-corrected labels, iteratively improving model accuracy across production inference endpoints.

PyTorch • CNNs • AWS SageMaker • Computer Vision • Python

The University of Texas at Arlington

Graduate Research Assistant

Jun 2025 - Present • Arlington, TX

Built TopGPT, a full-stack LLM application fine-tuned on 3+ textbooks with RAG over 1,000+ research paper chunks stored in Pinecone on AWS. Led CTMap, an LLM fine-tuning pipeline for mmWave 6G path planning, resulting in two accepted IEEE publications: IEEE ICC 2026 (conference) and IEEE OJ-COMS 2026 (journal). Built SFT pipelines encoding Sionna channel maps and OpenStreetMap graphs into transformer-readable formats.

PyTorch • RAG • Pinecone • AWS • LLM Fine-tuning • Sionna

The University of Texas at Arlington

Graduate Teaching Assistant

Aug 2024 - May 2025 • Arlington, TX

Supported graduate courses in Numerical Methods for 50+ students, assisting with algorithmic problem solving, optimization, and computational modeling. Concurrently developed CTMap, an LLM-enabled 6G path planning system accepted at IEEE ICC 2026 and extended to a journal paper at IEEE OJ-COMS 2026, fine-tuning LLMs on Dijkstra-generated coordinate paths applied to Sionna wireless network simulation outputs.

Python • LLM Fine-tuning • OpenStreetMap • Sionna 6G • Dijkstra

Tata Consultancy Services

Senior Software Engineer → Software Engineer

Jun 2019 - May 2023 • 4 years • Chennai

Designed and owned Java-based distributed data processing services handling millions of records daily across production systems serving 10+ enterprise clients. Led system design and architecture reviews for data-intensive microservices. Built fault-tolerant backend pipelines with high availability and reduced processing latency by 40% through service refactoring.

Java • Spring Boot • Microservices • SQL • REST APIs

Education

The University of Texas at Arlington

Master of Science - Computer Science & Engineering

Specialization in Deep Learning

Aug 2023 - May 2025

4.0 / 4.0

Completed 10+ rigorous graduate-level courses, building a solid foundation in advanced computing and ML systems. Graduated May 2025 with 4.0 GPA.

Graduate Coursework

Neural Networks
Artificial Intelligence
Computer Vision
Machine Learning
Data Analysis & Modeling
Data Mining
Design & Analysis of Algorithms
Numerical Methods
Scalable Systems & Optimization

Andhra University

Bachelor of Technology - Computer Science Engineering

Jun 2015 - Apr 2019

8.28 / 10.0

Relevant Coursework

Data Structures
Algorithms
Object Oriented Programming
Computer Networks
Database Management Systems
Artificial Intelligence
Cyber Security & Digital Forensics
Big Data Analytics
Data Mining
Discrete Mathematics
Formal Languages & Automata
Internet of Things

Featured Projects

IEEE OJ-COMS 2026

Digital Twin–Guided AI Path Planning for Connectivity-Aware Mobility

Sai Teja Srivillibhutturu et al. • IEEE Open Journal of the Communications Society 2026

Comprehensive digital twin–guided AI framework for connectivity-aware mobility in 6G networks. Uses Sionna ray-tracing simulations to train AI agents that predict link quality along candidate routes and select paths maximizing sustained mmWave wireless connectivity.

IEEE ICC 2026

CTMap: LLM-Enabled Connectivity-Aware Path Planning for mmWave 6G Networks

Sai Teja Srivillibhutturu et al. • IEEE International Conference on Communications 2026

Fine-tuned LLMs on Dijkstra-generated coordinate paths from OpenStreetMap, applied to Sionna 6G wireless simulation outputs to produce real-time optimal paths for mmWave connectivity-aware routing.
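As an illustration only (not the published CTMap code), the following minimal sketch shows how a Dijkstra shortest path over a small road graph could be turned into a coordinate path of the kind described above. The graph, node names, edge weights, and coordinates are all hypothetical; a real pipeline would presumably derive them from OpenStreetMap data.

```python
import heapq

def dijkstra_path(graph, coords, start, goal):
    """Return the shortest start-to-goal path as a list of (lat, lon) tuples.

    graph:  {node: {neighbor: edge_weight}} with non-negative weights
    coords: {node: (lat, lon)}
    Returns an empty list if the goal is unreachable.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == goal:
            break
        for nbr, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if goal not in dist:
        return []
    # Walk predecessors back from the goal, then reverse.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return [coords[n] for n in path]

# Hypothetical toy graph: four intersections with made-up weights and coordinates.
GRAPH = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 2.0, "D": 5.0},
    "C": {"A": 4.0, "B": 2.0, "D": 1.0},
    "D": {"B": 5.0, "C": 1.0},
}
COORDS = {
    "A": (32.7300, -97.1150),
    "B": (32.7310, -97.1140),
    "C": (32.7320, -97.1130),
    "D": (32.7330, -97.1120),
}

path = dijkstra_path(GRAPH, COORDS, "A", "D")
```

Here the cheapest route visits A, B, C, D in order, so `path` holds those four coordinate pairs; a list like this could then be serialized as text for supervised fine-tuning.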

CUDA Attention Kernel + AWS Neuron SDK

Deep Learning

Two production-grade systems: (1) Custom CUDA C++ scaled-dot-product attention kernel with tiled QKᵀ, numerically stable softmax, and pybind11 PyTorch binding, 5.64x faster than PyTorch at N=32, correctness verified at max_diff < 1e-7. (2) GPT-2 ported to AWS Inferentia via neuronx-cc 3-step pipeline achieving 3.7x speedup (45ms to 12ms, 1,800 to 6,700 tokens/sec). Deployed as FastAPI REST endpoint on HuggingFace Spaces.

CUDA C++ • PyTorch • AWS Inferentia • pybind11 • FastAPI
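For readers unfamiliar with the numerically stable softmax mentioned in the kernel description, here is a minimal pure-Python reference sketch of scaled dot-product attention. It is not the CUDA kernel itself and does no tiling; the function names are illustrative. The key step is subtracting the row maximum before exponentiating, which avoids overflow without changing the result.

```python
import math

def stable_softmax(scores):
    """Softmax that subtracts the max first so exp() cannot overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Reference scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Q, K, V are lists of rows (lists of floats); K and V have the same
    number of rows, and Q and K share the feature dimension d.
    """
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in Q:
        # One row of Q K^T, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        weights = stable_softmax(scores)
        # Weighted sum of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Large logits would overflow a naive exp(); the stable version handles them.
probs = stable_softmax([1000.0, 1000.0])
result = attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]])
```

A reference implementation like this is also the usual baseline for correctness checks such as the max_diff comparison quoted above.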

Production LLM Fine-Tuning: Qwen-7B SFT

LLM / GenAI

Supervised fine-tuning of Qwen2.5-7B using LoRA (r=8, alpha=16) on UltraFeedback, training only 0.5% of parameters (35M of 7B) with QLoRA 4-bit quantization and FP16 mixed precision. Achieved 17% training loss reduction (1.412 to 1.176) and 0.855 BERTScore in 30 minutes on a T4 GPU. Model merged and deployed to HuggingFace Hub.

PyTorch • LoRA / PEFT • QLoRA • TRL • HuggingFace
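The arithmetic behind the LoRA figures above can be sketched as follows. LoRA adds two low-rank matrices per adapted weight (an r × d_in matrix and a d_out × r matrix) and scales their product by alpha / r. The 4096 × 4096 layer size below is a hypothetical example, not the actual Qwen2.5-7B layer shape, and which modules were adapted is not stated in the project description.

```python
def lora_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return r * (d_in + d_out)

def lora_scaling(alpha, r):
    """Factor applied to the low-rank update: alpha / r."""
    return alpha / r

r, alpha = 8, 16                              # values quoted in the project
scaling = lora_scaling(alpha, r)              # update is scaled by 2.0
per_layer = lora_params(4096, 4096, r)        # hypothetical 4096 x 4096 layer
trainable_fraction = 35e6 / 7e9               # 35M of 7B parameters

print(f"scaling={scaling}, per_layer={per_layer}, "
      f"fraction={trainable_fraction:.1%}")
```

With these numbers the trainable share works out to 0.5%, matching the figure in the description.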

Attention Mechanism Optimization Suite

Deep Learning

Benchmarking framework comparing 4 attention implementations on NVIDIA L4 (seq len 1024, batch 32): FlashAttention-2 achieves 12.3x throughput (573K to 6.03M tok/s) and 99.7% memory reduction (12,582 MB to 38 MB) vs vanilla. Includes batch-size auto-tuner using binary search and ONNX/TensorRT export benchmarks. Key finding: algorithm-level optimization (FlashAttention) outperforms hardware-level optimization (TensorRT) by 6x for attention operations.

PyTorch • FlashAttention-2 • xFormers • ONNX Runtime • CUDA
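The binary-search batch-size auto-tuner mentioned above can be sketched roughly as follows. This is an illustrative version, not the project's code: `fits` is a hypothetical predicate, and the sketch assumes that if a batch size fits in memory then every smaller one does too. In practice such a predicate might wrap a trial forward pass and report whether it ran out of memory.

```python
def max_batch_size(fits, low=1, high=1024):
    """Largest batch size in [low, high] for which fits(b) is True.

    Assumes fits is monotonic (True up to some threshold, False after).
    Returns 0 if no batch size in the range fits.
    """
    best = 0
    while low <= high:
        mid = (low + high) // 2
        if fits(mid):
            best = mid       # mid works; try something larger
            low = mid + 1
        else:
            high = mid - 1   # mid is too large; shrink the range
    return best

# Hypothetical memory model: 100 MB per sample against a 3,250 MB budget.
def fits_in_budget(batch_size, per_sample_mb=100, budget_mb=3250):
    return batch_size * per_sample_mb <= budget_mb

best = max_batch_size(fits_in_budget)
```

Binary search needs only about log2(high) trial runs, which matters when each trial is a real GPU forward pass.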

Advanced AI Agent System

GenAI

Multi-strategy reasoning system implementing 4 research papers: Chain-of-Thought with self-consistency voting (3 independent paths), Tree-of-Thoughts with beam search (width=3, depth=3), ReAct with real-time web search via Tavily API and ChromaDB vector memory, and Multi-Agent collaboration (Planner, Worker, Critic). LLM-based auto-classifier routes each query to the optimal strategy. Rate-limited (10/min, 100/day), with real-time streaming. Built with Groq LLM, deployed on HuggingFace Spaces.

Python • Groq • Tavily • ChromaDB • Gradio
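The self-consistency voting step described above can be illustrated with a small sketch: several independent reasoning paths each produce a final answer, and the most common answer wins. This is an assumed, simplified version (the function name and the normalization rule are hypothetical), with the LLM calls replaced by plain strings.

```python
from collections import Counter

def self_consistency_vote(answers):
    """Return the most common answer after light normalization.

    Empty or whitespace-only answers are ignored. On a tie, the answer
    that appeared first wins. Returns None if nothing usable remains.
    """
    normalized = [a.strip().lower() for a in answers if a and a.strip()]
    if not normalized:
        return None
    return Counter(normalized).most_common(1)[0][0]

# Three hypothetical final answers from three independent reasoning paths.
winner = self_consistency_vote(["42", " 42 ", "41"])
```

Real answers often need stronger normalization (for example, extracting a number from a sentence) before votes can be compared.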

Certifications

UC Berkeley

Advanced Large Language Model Agents

UC Berkeley EECS • Jul 2025

AWS

AWS Data Engineer Associate

Amazon Web Services • Dec 2024 - Dec 2027

Microsoft

Microsoft Fabric Data Engineer Associate

Microsoft • Aug 2025

Oracle

Oracle GenAI Professional

Oracle Cloud • Jun 2024 - Jun 2026

Let's Connect

Open to LLM Engineering, Healthcare AI, ML Research, and Deep Learning roles, applying language models to real clinical and physical systems.
