Overall Score

38

LLM Evaluation Engineer

2y relevant experience

Not Qualified

Executive Summary

This candidate is a highly experienced market intelligence analyst and data scientist whose profile reflects deep expertise in business analytics, sales modeling, and statistical analysis — but falls significantly short of the LLM Evaluation Engineer requirements. Their ML background is real but applied to tabular and time-series forecasting domains, with no evidence of engagement with large language models, generative AI, prompt engineering, or RAG systems. Despite Python proficiency and a strong analytical foundation, the core technical requirements for this specialized role are largely unmet. While their transferable skills in statistical testing and ML fundamentals could eventually support an LLM evaluation career with significant upskilling, they are not positioned for this role at present. A NOT_FIT decision is recommended at this time.

Top Strengths

  • Strong analytical and quantitative mindset with 15 years of cross-industry experience
  • Real-world ML modeling experience (LSTM, PySpark ALS, forest models, clustering) applied to business problems
  • Solid Python and SQL skills that form a foundation for growth into ML engineering
  • Statistical rigor including hypothesis testing, cointegration, regression, and time-series forecasting
  • Demonstrated ability to collaborate with cross-functional teams and communicate technical findings to stakeholders

Key Concerns

  • Zero demonstrated experience with LLMs, generative AI, prompt engineering, or any aspect of the core job requirements — this is a fundamental role mismatch
  • Career identity is Market Intelligence / Business Analyst, not ML Engineer; making this transition would require significant upskilling not evidenced in the application

Culture Fit

52%

Growth Potential

Moderate

Salary Estimate

$60k-$80k

Assessment Reasoning

The NOT_FIT decision is driven by a fundamental mismatch between the candidate's background and the core requirements of the LLM Evaluation Engineer role. The position requires demonstrated expertise in LLM evaluation frameworks, prompt engineering, RAG evaluation, and model benchmarking — skills that are entirely absent from the candidate's profile. While they have genuine Python and ML experience, it is anchored in business analytics and sales modeling (PySpark ALS, LSTM for churn prediction, market sizing), not in NLP, generative AI, or language model quality assurance. The role requires 3+ years specifically in ML evaluation, testing, or QA; the candidate has 0 years in any of these areas. Additionally, the absence of LinkedIn, GitHub, or any public technical presence makes it impossible to identify any hidden LLM experience. Their overall score of 38 falls well below the 50-point BORDERLINE threshold. The candidate may be a strong fit for a Data Analyst or ML Analyst role in a different context, but they do not meet the minimum requirements for this position.

Interview Focus Areas

  • Understanding of LLMs and whether the candidate has any unreported hands-on experience with generative AI tools or evaluation
  • Motivation for the career pivot into LLM evaluation engineering and what self-directed learning has been done
  • Depth of Python engineering skills beyond scripting — can they write maintainable, testable code for pipelines?
  • Familiarity with, or willingness to rapidly learn, LLM evaluation platforms (LangSmith, Arize, W&B)

Code Review

Fair · Junior Level

No code samples or GitHub profile were provided, limiting direct code quality assessment. Based on resume evidence, the candidate writes Python at an analytical/scripting level rather than as a software engineer. Their code appears functional for data processing tasks, but there is no indication of the clean, maintainable, production-quality Python evaluation pipelines this role requires.

Python · Pandas · PySpark · NumPy · SQL · PostgreSQL · JavaScript · ArangoDB · Git
  • +Demonstrable Python usage across multiple roles including Pandas, PySpark, FuzzyWuzzy, and NumPy
  • +Evidence of working with APIs (Swagger, REST, JavaScript Foxx Microservices) suggesting some software engineering exposure
  • +Patent co-authorship (pattern logging and virtual periphery) hints at some engineering depth
  • -No GitHub profile or public code samples to assess actual code quality, structure, or style
  • -Python usage described is primarily scripting/analytical (Jupyter notebooks, Pandas transformations) rather than production-grade engineering
  • -No evidence of writing evaluation pipelines, test automation, or software engineering best practices
  • -No familiarity with PyTorch/Transformers or OpenAI/Anthropic APIs visible in the resume

Experience Overview

15y total · 2y relevant

This candidate is a seasoned market intelligence and business systems analyst with meaningful Python and ML exposure, but their background is fundamentally misaligned with the LLM Evaluation Engineer role. Despite 15 years of professional experience, virtually none of it involves large language models, generative AI evaluation, prompt engineering, or RAG systems. Their ML work at Acronis is real but oriented toward sales modeling and forecasting rather than model quality assurance or NLP.

Matching Skills

Python · Statistical testing · Machine learning fundamentals · Data annotation & quality assurance (partial)

Skills to Verify

LLM evaluation frameworks · Prompt engineering · Model benchmarking · RAG evaluation · LLM-specific testing pipelines · OpenAI API / Anthropic Claude experience · CI/CD for ML pipelines · Weights & Biases / LangSmith / Arize
Candidate information is anonymized. Personal details are hidden for fair evaluation.