Remote

LLM Evaluation Engineer

Engineering · Full-time · Mid-level · $65k–$95k
AI Screened · Remote B2B · EU Talent Pool · 1 applicant
About This Role

We're seeking an LLM Evaluation Engineer to join our AI-first recruiting platform. In this role, you'll design and execute rigorous evaluation frameworks for large language models, develop benchmark datasets, and analyze model performance across recruitment-specific tasks. You'll work directly with our product and ML teams to ensure our AI systems deliver accurate, fair, and reliable candidate assessments at scale.

Responsibilities

  • Develop and maintain LLM evaluation metrics and benchmark datasets
  • Design A/B testing frameworks for model performance comparison
  • Write clean, production-grade Python code for evaluation pipelines
  • Analyze model outputs for bias, hallucination, and accuracy across recruitment use cases
  • Collaborate with ML engineers to implement evaluation findings into model improvements
  • Document evaluation methodologies and create clear performance reports
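
To make the first responsibility concrete, a benchmark-driven evaluation metric can be sketched in a few lines of Python. This is an illustrative minimal example, not this team's actual pipeline; the names `EvalCase`, `exact_match_score`, and `stub_model` are hypothetical, and in practice the stub would be replaced by a real LLM call:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EvalCase:
    prompt: str
    expected: str  # reference answer for this benchmark item

def exact_match_score(cases: List[EvalCase], predict: Callable[[str], str]) -> float:
    """Fraction of benchmark cases where the model output exactly matches
    the reference (case- and whitespace-insensitive)."""
    hits = sum(
        1 for c in cases
        if predict(c.prompt).strip().lower() == c.expected.strip().lower()
    )
    return hits / len(cases)

# Toy benchmark dataset and a stub standing in for a real model call.
benchmark = [
    EvalCase("Is Python a programming language? (yes/no)", "yes"),
    EvalCase("Is 2 + 2 equal to 5? (yes/no)", "no"),
]

def stub_model(prompt: str) -> str:
    return "yes" if "Python" in prompt else "no"

print(exact_match_score(benchmark, stub_model))  # 1.0 on this toy set
```

Real evaluation pipelines layer richer metrics on the same shape (per-category breakdowns, bias probes, hallucination checks against retrieved context), but the core loop of scoring model outputs against a curated benchmark stays the same.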

Required Skills

  • LLM evaluation frameworks
  • Prompt engineering
  • Python
  • Data annotation & labeling
  • Statistical analysis
  • Model benchmarking
  • RAG systems
  • NLP