LLM Evaluation Engineer
Engineering | Full-time | Remote | Mid-level | $65k–$95k
About This Role
We're seeking an LLM Evaluation Engineer to join our AI-first recruiting platform. In this role, you'll design and execute rigorous evaluation frameworks for large language models, develop benchmark datasets, and analyze model performance across recruitment-specific tasks. You'll work directly with our product and ML teams to ensure our AI systems deliver accurate, fair, and reliable candidate assessments at scale.
Responsibilities
- Develop and maintain LLM evaluation metrics and benchmark datasets for recruitment tasks
- Design A/B testing frameworks for model performance comparison
- Write clean, production-grade Python code for evaluation pipelines
- Analyze model outputs for bias, hallucination, and accuracy across recruitment use cases
- Collaborate with ML engineers to implement evaluation findings into model improvements
- Document evaluation methodologies and create clear performance reports
Skills
- LLM evaluation frameworks
- Prompt engineering
- Python
- Data annotation & labeling
- Statistical analysis
- Model benchmarking
- RAG systems
- NLP
