LLM Evaluation Engineer
Engineering | Full-time | Remote | Mid-level | $65k–$95k
About This Role
We're seeking an LLM Evaluation Engineer to join our AI-first recruiting platform. In this role, you'll design and execute rigorous evaluation frameworks for large language models, develop benchmark datasets, and analyze model performance across recruitment-specific tasks. You'll work directly with our product and ML teams to ensure our AI systems deliver accurate, fair, and reliable candidate assessments at scale.
Responsibilities
- Develop and maintain LLM evaluation metrics and benchmark datasets for recruitment tasks
- Design A/B testing frameworks for model performance comparison
- Write clean, production-grade Python code for evaluation pipelines
- Analyze model outputs for bias, hallucination, and accuracy across recruitment use cases
- Collaborate with ML engineers to implement evaluation findings into model improvements
- Document evaluation methodologies and create clear performance reports
Skills
- LLM evaluation frameworks
- Prompt engineering
- Python
- Data annotation & labeling
- Statistical analysis
- Model benchmarking
- RAG systems
- NLP
