Machine Learning for Socio-Technical Systems Lab (ML4STS), University of Rhode Island
Benchmarking LLM-Agents at Fair Machine Learning
I worked as a software engineer and researcher maintaining an LLM agent, and helped build a benchmarking framework to test how fair and reliable these agents actually are. Much of my time went into keeping the agentic systems running smoothly, managing SLURM batch jobs, and tuning CUDA performance; I also contributed to a research paper that's currently in the pipeline.
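For a sense of what the SLURM side of that work looks like, here's a minimal sketch of submitting a GPU array job from Python. The script name (`benchmark_agent.py`), file names, and resource values are placeholders I've made up for illustration, not the lab's actual configuration.

```python
import subprocess
from pathlib import Path

# Hypothetical batch script: one array task per benchmark dataset.
# Partition, time, and GPU settings are illustrative placeholders.
script = """\
#!/bin/bash
#SBATCH --job-name=agent-bench
#SBATCH --array=0-9            # one task per benchmark dataset
#SBATCH --gres=gpu:1
#SBATCH --time=02:00:00
#SBATCH --output=logs/%x_%a.out

python benchmark_agent.py --task-id "$SLURM_ARRAY_TASK_ID"
"""

Path("logs").mkdir(exist_ok=True)   # sbatch won't create the log directory
Path("run_bench.sh").write_text(script)
subprocess.run(["sbatch", "run_bench.sh"], check=True)
```

Array jobs like this make it easy to fan one benchmark run out across many datasets while keeping each task's logs separate.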
The whole project gave me a much clearer picture of how these agent systems behave in the real world and just how tricky it is to measure fairness and performance in large language models.
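To make "measuring fairness" concrete, here's one of the simplest group-fairness metrics a benchmark like this can report: the demographic parity gap, the difference in positive-prediction rates between two groups. The function name and the toy data below are my own illustration, not the framework's actual API.

```python
import numpy as np

def demographic_parity_gap(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Absolute difference in positive-prediction rates between two groups.

    y_pred: binary predictions (0/1) produced by the agent's model.
    group:  binary protected-attribute labels (0/1) for each example.
    """
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return abs(rate_a - rate_b)

# Toy example: group 1 receives positive predictions more often.
y_pred = np.array([1, 0, 1, 0, 1, 1, 1, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_gap(y_pred, group))  # 0.25 (0.75 vs. 0.5)
```

A gap of 0 would mean both groups are predicted positive at the same rate; the hard part in practice is that metrics like this can disagree with each other, which is exactly what makes benchmarking fairness tricky.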
Skills: Python, HPC, SLURM, CUDA, AI Systems, Benchmarking, Research