Pivots Hiring

LLM Evaluation Engineer

Engineering · Full-time · Remote · Mid-level · $65k–$95k

About This Role

We're seeking an LLM Evaluation Engineer to join our AI-first recruiting platform. In this role, you'll design and execute rigorous evaluation frameworks for large language models, develop benchmark datasets, and analyze model performance across recruitment-specific tasks. You'll work directly with our product and ML teams to ensure our AI systems deliver accurate, fair, and reliable candidate assessments at scale.

Responsibilities

  • Develop and maintain LLM evaluation metrics and benchmark datasets
  • Design A/B testing frameworks for model performance comparison
  • Write clean, production-grade Python code for evaluation pipelines
  • Analyze model outputs for bias, hallucination, and accuracy across recruitment use cases
  • Collaborate with ML engineers to implement evaluation findings into model improvements
  • Document evaluation methodologies and create clear performance reports

Skills

LLM evaluation frameworks · Prompt engineering · Python · Data annotation & labeling · Statistical analysis · Model benchmarking · RAG systems · NLP
