
FairAgentBench

Machine Learning for Socio-Technical Systems Lab (ML4STS), University of Rhode Island

Benchmarking LLM Agents on Fair Machine Learning

I worked as a software engineer and researcher maintaining an LLM agent, and I helped build a benchmarking framework to test how fair and reliable these agents actually are. Much of my time went into keeping the agentic systems running smoothly: managing SLURM batch jobs, tuning CUDA performance, and contributing to a research paper that's in the pipeline.
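To give a flavor of the SLURM side of that work, here is a minimal sketch of the kind of batch script used to launch a GPU benchmark run. Every specific in it is an assumption for illustration: the partition name, module name, entry-point script `run_benchmark.py`, and its flags are hypothetical, not the project's actual configuration.

```shell
#!/bin/bash
# Hypothetical SLURM batch script for one LLM-agent benchmark run.
#SBATCH --job-name=fairagent-bench
#SBATCH --partition=gpu          # assumed GPU partition name
#SBATCH --gres=gpu:1             # request a single GPU
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=04:00:00
#SBATCH --output=logs/%x-%j.out  # %x = job name, %j = job ID

module load cuda                 # site-specific; module name is assumed
source .venv/bin/activate

# run_benchmark.py and its flags are placeholders for the real entry point.
python run_benchmark.py --model "$MODEL_NAME" --output "results/$SLURM_JOB_ID"
```

A setup like this makes each run reproducible and traceable: logs and results are keyed to the SLURM job ID, so failed or anomalous benchmark runs can be matched back to their exact job.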

The whole project gave me a much clearer picture of how these agent systems behave in the real world and just how tricky it is to measure fairness and performance in large language models.

Skills: Python, HPC, SLURM, CUDA, AI Systems, Benchmarking, Research