Professional Summary
Senior AI Engineer specializing in the architecture and deployment of production-grade Agentic AI systems. I combine hands-on development of multi-agent RAG workflows (LangGraph, Python) with deep expertise in optimizing distributed inference infrastructure (H100/H200, vLLM). Proven track record of building self-correcting retrieval systems, implementing multimodal ingestion pipelines (OCR/Vision), and architecting scalable orchestration layers (Ray, LeaderWorkerSet) for Tier-1 AI workloads.
Professional Experience
AI/ML Cloud Engineer
Leveraging deep systems knowledge to build and scale next-generation AI applications for the world's leading GPU cloud.
- Architected Agentic RAG Platform (LangGraph): Designed and deployed a modular agentic retrieval system to automate technical solution discovery, replacing monolithic code with LangGraph workflows that support conditional routing, self-correction, and state management.
- Multimodal Data Integration: Built ingestion pipelines using OCR and vision models to extract text from technical diagrams and screenshots, surfacing previously inaccessible visual data into the LLM's context window.
- Advanced Reasoning: Implemented Adaptive Multi-Query workflows with LLM-driven gap analysis, enabling the system to detect missing information, generate targeted follow-up queries, and merge results from vector stores and knowledge graphs (Dgraph).
- Hybrid Retrieval Pipeline: Built a retrieval engine combining semantic search (vector) and entity relationships (graph), using document expansion algorithms to improve context relevance fivefold.
- High-Performance Inference Optimization: Engineered a high-throughput inference engine using vLLM and SGLang, achieving a 19x throughput increase over standard baselines. Deployed custom H200 remote reranking APIs with optimized KV-cache management and continuous batching.
- Distributed Workload Orchestration: Architected scalable compute clusters using Ray and Kubernetes LeaderWorkerSet (LWS) to manage distributed training and inference lifecycles. Resolved critical multi-node bottlenecks by tuning NCCL topology awareness and Slurm scheduling parameters for massive-scale GPU fleets.
- Distributed Model Caching Architecture: Designed a transparent caching proxy for the Hugging Face Hub using Olah and JuiceFS backed by S3. Enabled block-level caching and LRU eviction for partial model downloads, significantly reducing startup latency and egress costs.
- DevPod Solution Architecture: Designed and deployed a scalable Golden Path development environment on Kubernetes, enabling researchers to spin up consistent, GPU-accelerated workspaces instantly. Authored the public reference architecture driving enterprise adoption.
Senior Linux Programmer / Administrator
- Data Pipeline Engineering: Engineered real-time data integration pipelines using Python and Bash, ensuring data accuracy across a distributed enterprise Linux environment (2,000+ endpoints).
- Infrastructure Reliability: Managed the stability of critical Point-of-Sale (POS) infrastructure, implementing automated remediation scripts that significantly reduced operational downtime.
- API Integration: Developed custom API integrations to streamline workflows and architected standardized VMware vSphere deployment processes.
Senior Linux Developer / Administrator
- Multi-Cloud Architecture: Managed complex infrastructure spanning AWS, Azure, and Google Cloud, migrating legacy on-prem workloads to hybrid cloud environments.
- Kubernetes Operations: Designed and managed high-availability Kubernetes clusters, implementing full-stack observability with Prometheus and Grafana before such tooling became the industry standard.
- Full-Stack Development: Developed and maintained performant web applications (PHP/Perl/JS), optimizing code for scalability under high traffic loads.