Projects & Open Source

LLM infrastructure, generative AI, production ML systems, and 176+ open-source repositories on GitHub.

Featured Work

LLM infrastructure, generative AI, and production ML systems

LLM Inference Optimization (vLLM)

High-throughput LLM serving infrastructure at Red Hat — optimizing KV cache, continuous batching, and model parallelism for foundation models.

vLLM · LLM Serving · KV Cache · Distributed Inference

LLM Fine-Tuning & Training Hub

Scalable training pipelines for large language models using LoRA, QLoRA, and distributed training with torchtune integration.

torchtune · LoRA · FSDP · PyTorch

Diffusion Models for Image Generation

Diffusion-based image generation and restoration models — commercialized on Samsung Galaxy S23/S24 with real-time mobile inference via QAT.

Diffusion Models · QAT · Samsung SoC · PyTorch

Diffusion Language Models

Text generation using diffusion-style denoising — iteratively refining noisy sequences into coherent text, an alternative to autoregressive decoding.

Diffusion LM · Flow Matching · PyTorch · Transformers

LLM & Model Quantization

End-to-end quantization pipelines for LLMs and vision models, covering GPTQ, QAT, INT8, and pruning, for efficient deployment on edge and cloud.

Quantization · ONNX · GPTQ · Pruning
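As a rough illustration of the INT8 step mentioned above, here is a minimal sketch of symmetric per-tensor quantization. The function names `quantize_int8` and `dequantize` are hypothetical and chosen for this example; they are not taken from the project, and real pipelines typically add calibration data, per-channel scales, or GPTQ-style error correction.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (illustrative sketch only)."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric INT8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map INT8 codes back to approximate floating-point values."""
    return [qi * scale for qi in q]


weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
```

Each restored value differs from the original by at most about half a quantization step (`scale / 2`), which is the usual trade-off between memory footprint and precision.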

FlashDet — End-to-End Detection System

Complete desktop app with LoRA/QLoRA fine-tuning, knowledge distillation, ONNX export, and INT8 quantization: go from training to deployment without writing code.

LoRA · KD · Quantization · PyQt5

Open Source

GitHub Repositories

LLMs, diffusion models, efficient inference, and production ML systems

LLM_FineTune

A comprehensive, chapter-by-chapter guide to LLMs — from probability basics to scaling laws, with hands-on fine-tuning code.

LLM · Fine-Tuning · Scaling Laws

LLMFineTune (Desktop App)

PyQt5 desktop GUI for fine-tuning, evaluating, and deploying LLMs using torchtune — no command-line required.

torchtune · LoRA · GUI App

LLMs_Model

Comprehensive guides for working with Large Language Models — architectures, training, and deployment strategies.

LLM Architectures · Transformers · Deployment

AwesomeKVCache-and-LLMCompression

Curated collection of 150+ research papers on KV Cache Management, KV Cache Compression, and LLM Compression for efficient inference.

KV Cache · LLM Compression · Efficient Inference

LoRA

Implementation of LoRA: Low-Rank Adaptation of Large Language Models — parameter-efficient fine-tuning from scratch.

LoRA · PEFT · Fine-Tuning
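For readers unfamiliar with the idea, the sketch below shows the core of a LoRA layer in plain Python: a frozen weight `W` plus a trainable low-rank update scaled by `alpha / r`. The class name `LoRALinear`, the helper `matvec`, and all dimensions are assumptions made for illustration and are not taken from the repository.

```python
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


class LoRALinear:
    """Frozen weight W plus a low-rank update (alpha / r) * B @ A (sketch)."""

    def __init__(self, d_in, d_out, r=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        # Stand-in for a pretrained weight; it stays frozen during fine-tuning.
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # A is small and random, B is zero, so the update is zero at initialization.
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)                     # frozen path: W x
        delta = matvec(self.B, matvec(self.A, x))    # low-rank path: B (A x)
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        # Only A (r x d_in) and B (d_out x r) would receive gradients.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


layer = LoRALinear(d_in=8, d_out=8, r=2)
output = layer.forward([1.0] * 8)
```

With these toy sizes the adapter has 32 trainable parameters against 64 in the full weight matrix; the saving grows quickly as the layer dimensions increase while `r` stays small.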

Diffusion-Language-Model

Diffusion-style denoising for text generation — iteratively refining noisy sequences into coherent text, an alternative to autoregressive LLMs.

Diffusion LM · Text Generation · PyTorch

REGLA — Gated Linear Attention

Refining Gated Linear Attention — efficient alternative to softmax attention for scalable sequence modeling.

Linear Attention · Efficient Transformers · Research

Attention_mechanisms

Three in-depth surveys covering efficient transformer architectures, attention variants, and optimization techniques.

Attention · Transformers · Efficiency

FlashDet

End-to-end object detection system with a PyQt5 desktop app: LoRA/QLoRA fine-tuning, knowledge distillation, ONNX export, and INT8 quantization. Models range from 0.49M to 2.44M parameters and run at 100+ FPS.

LoRA · Knowledge Distillation · Desktop App

DataDrift

Research paper on data drift in production ML — taxonomy (covariate/concept/label shift), mathematical formulations (KL, PSI, Wasserstein), monitoring architectures, and 200+ curated papers.

Data Drift · MLOps · 200+ Papers
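To make one of the drift metrics named above concrete, here is a minimal sketch of the Population Stability Index (PSI), computed as the sum over bins of (actual - expected) * ln(actual / expected). The function name `psi`, the equal-width binning, and the `eps` floor for empty bins are assumptions for this example and do not describe the paper's own formulation.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a current sample."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the reference range; guard against zero width.
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range fall into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined
        # (proportions are not renormalized afterwards).
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


reference = [i / 100 for i in range(100)]
shifted = [v + 0.3 for v in reference]
drift_score = psi(reference, shifted)
```

Identical samples give a PSI of zero, and the score increases as the current distribution moves away from the reference. Alerting thresholds vary by team and are not specified here.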

ml_system_design

Comprehensive guide to ML System Design — covering LLM serving, training pipelines, scaling, and real-world architecture patterns.

System Design · LLM Serving · Architecture

ImageObjectRemoval

Remove objects from photos, along with their shadows and reflections, using generative inpainting: end-to-end diffusion-based restoration.

Generative AI · Inpainting · Diffusion

DSA

Well-organized Data Structures and Algorithms repository (170+ stars) covering fundamentals to advanced topics for coding interviews.

DSA · 170 Stars · Algorithms