I'm a Senior Full Stack Developer specializing in Generative AI and Large Language Models, with extensive experience across healthcare, logistics, and AI domains. My journey began at the University at Buffalo, where I earned my MS in Engineering Science and worked as a Graduate Teaching Assistant for Machine Learning with Search Engines. My research foundation includes a multimodal search engine that combines visual embeddings with NLP for hybrid retrieval, achieving a 35% improvement in result relevance over traditional keyword search.
Currently, I lead development of AI-powered applications at Mastery Logistics Systems, building LLM-powered logistics optimization tools, RAG systems, and vector search solutions. My work spans healthcare platforms at UPMC and analytics pipelines at Cigna. My latest research explores Small Language Models in Agentic AI systems, investigating how SLMs can replace LLMs for cost and efficiency gains.
Deep expertise in Large Language Models, RAG systems, multimodal AI, and information retrieval
Expertise in multiple programming languages and modern frontend technologies
Building scalable backend systems with modern architecture patterns and APIs
Deploying and managing applications in cloud environments with CI/CD pipelines
Building data pipelines and managing various database systems for large-scale applications
Small Language Models in Agentic AI
Aug 2025 – Present
As an individual contributor, I explored how small language models (SLMs) can replace large language models (LLMs) in agentic systems, building on Belcak et al., 2025 (arXiv:2506.02153 [cs.AI], DOI: 10.48550/arXiv.2506.02153). My work combined literature study with practical side experiments, including fine-tuning open-source SLMs (2–7B parameters) with LoRA/QLoRA on small synthetic datasets for tool calling and structured outputs.
Technologies: SLMs (2–7B), LoRA/QLoRA, PyTorch, Python, Prompt Engineering, Agentic AI, MetaGPT, Open Operator
Impact: Benchmarked SLMs locally on a consumer GPU and compared their latency and cost against LLM APIs, demonstrating that 40–70% of agent calls could be offloaded to SLMs.
Outcome: This independent research and hands-on validation supported the position that SLM-first architectures can deliver significant cost and efficiency gains in real-world agentic AI applications.
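The cost side of that comparison comes down to simple arithmetic, sketched below. This is an illustration only: the function names and the per-call prices are hypothetical placeholders, not figures from my benchmarks. It shows how the fraction of calls offloaded to an SLM translates into a blended per-call cost and a relative saving.

```python
def blended_cost_per_call(offload_fraction, slm_cost, llm_cost):
    """Average cost per agent call when a fraction of calls goes to an SLM.

    offload_fraction: share of calls served by the SLM (0.0 to 1.0).
    slm_cost, llm_cost: hypothetical per-call costs in the same currency.
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    return offload_fraction * slm_cost + (1.0 - offload_fraction) * llm_cost


def relative_saving(offload_fraction, slm_cost, llm_cost):
    """Saving relative to an LLM-only baseline, as a fraction of that baseline."""
    blended = blended_cost_per_call(offload_fraction, slm_cost, llm_cost)
    return 1.0 - blended / llm_cost


# Hypothetical prices: the SLM call costs one tenth of the LLM call.
SLM_COST = 0.001
LLM_COST = 0.010

for fraction in (0.4, 0.7):
    saving = relative_saving(fraction, SLM_COST, LLM_COST)
    print(f"offload {fraction:.0%}: saving {saving:.0%}")
```

Under these assumed prices the saving grows linearly with the offload fraction, so the realistic range depends heavily on the actual SLM-to-LLM price ratio and on how many calls an SLM can handle reliably.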
Intelligent Visual Search Engine
Jan 2022 – May 2023
Designed a multimodal search engine for the 'Machine Learning with Search Engines' course, enabling image-to-image and image-to-text retrieval using ResNet-based embeddings and cosine similarity search in Elasticsearch.
Technologies: ResNet, Elasticsearch, NLP, Python, PyTorch, Multimodal AI
Impact: Combined visual embeddings with NLP-based caption search for hybrid retrieval, improving result relevance by 35% over keyword-only search.
Outcome: Successfully demonstrated the effectiveness of multimodal approaches in search engine technology, leading to improved user experience and search accuracy.
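A minimal sketch of the hybrid retrieval idea follows. It is not the project's actual code: the real system used ResNet embeddings and Elasticsearch, while this example uses tiny hand-written vectors, a simple term-overlap score in place of caption search, and a hypothetical `alpha` weight to blend the two signals.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_score(query, caption):
    """Fraction of query terms that appear in the caption (stand-in for caption search)."""
    query_terms = set(query.lower().split())
    caption_terms = set(caption.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & caption_terms) / len(query_terms)


def hybrid_search(query_vec, query_text, docs, alpha=0.5, top_k=3):
    """Rank docs by alpha * visual similarity + (1 - alpha) * caption score."""
    scored = []
    for doc in docs:
        visual = cosine_similarity(query_vec, doc["embedding"])
        textual = keyword_score(query_text, doc["caption"])
        scored.append((alpha * visual + (1.0 - alpha) * textual, doc["id"]))
    scored.sort(reverse=True)
    return scored[:top_k]


# Toy index: 3-dimensional vectors stand in for real image embeddings.
DOCS = [
    {"id": "img-1", "embedding": [1.0, 0.0, 0.0], "caption": "red truck on highway"},
    {"id": "img-2", "embedding": [0.0, 1.0, 0.0], "caption": "blue truck in warehouse"},
    {"id": "img-3", "embedding": [0.9, 0.1, 0.0], "caption": "red car in city"},
]

results = hybrid_search([1.0, 0.0, 0.0], "red truck", DOCS)
for score, doc_id in results:
    print(f"{doc_id}: {score:.3f}")
```

Blending the two scores lets a visually similar image with a partially matching caption (img-3) outrank an image that matches only on a keyword (img-2), which is the behavior keyword-only search cannot provide.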
AI-Powered Logistics Optimization
Oct 2024 – Present
Building a generative AI system for shipment scheduling and route planning, with vector search for contextual retrieval. Integrating RAG pipelines with logistics microservices for real-time decision-making.
Technologies: LLMs, RAG, Vector Search, AWS Bedrock, SageMaker, Python, React
Impact: Reducing planning time by 60% and boosting throughput by 25% through AI-driven optimization.
Outcome: Creating production-ready AI systems that directly impact business operations and efficiency.
Document Intelligence & Summarization
Oct 2024 – Present
Developing an NLP platform that summarizes shipment contracts and compliance documents using transformer-based models and RAG pipelines.
Technologies: LLMs, RAG, Transformers, NLP, Python, LangChain
Impact: Reducing manual document review time by 55% and improving compliance turnaround speed.
Outcome: Automating complex document processing tasks to improve operational efficiency and accuracy.
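The retrieval step of such a summarization pipeline can be sketched roughly as follows. This is a simplified illustration, not the production design: the chunk sizes, the term-overlap ranking (a stand-in for vector search), and the prompt wording are all assumptions, and the model call itself is left out.

```python
def chunk_words(text, size=40, overlap=10):
    """Split text into overlapping word windows of at most `size` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def build_prompt(question, chunks, top_k=2):
    """Pick the chunks sharing the most terms with the question and build a prompt."""
    question_terms = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(question_terms & set(chunk.lower().split())),
        reverse=True,
    )
    context = "\n---\n".join(ranked[:top_k])
    return f"Summarize the passages below with respect to: {question}\n\n{context}"


# Hypothetical contract snippets, already chunked.
snippets = [
    "liability clause applies to carrier",
    "payment due in thirty days",
    "carrier liability is capped",
]
prompt = build_prompt("carrier liability", snippets)
print(prompt)
```

Overlapping windows reduce the chance that a clause is cut in half at a chunk boundary; the prompt would then be passed to whichever summarization model the pipeline uses.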
Made with Angular + Tailwind; AI by LLMs/RAG
© Sampath Garimella 2025