Boston, MA
MLOps Engineer
Engineering · Full time · Senior level
AI Screened · Remote B2B · EU Talent Pool · 24 applicants
About This Role
You will own the complete ML lifecycle infrastructure that bridges research and production at scale. This is a systems-ownership role in which you'll architect deployment pipelines, build automation tooling, and solve observability challenges at the intersection of data science and platform engineering. You will work autonomously, make high-impact decisions, and design solutions that accelerate experimentation cycles while keeping production models reliable. Your work will directly shape how data scientists move from notebooks to deployed models, and how those models maintain performance under real-world conditions.
Our Stack
- ML Platform: Python · MLflow/Kubeflow · Ray · Model registries · Feature stores
- Infrastructure: Kubernetes · Docker · Terraform · Multi-cloud (AWS/GCP/Azure) · Service mesh
- CI/CD & Automation: GitHub Actions · ArgoCD · Custom build tooling · Infrastructure testing frameworks
- Observability: Prometheus · Grafana · Datadog · Distributed tracing · Custom metrics pipelines
What You'll Do
- Architect and implement end-to-end ML deployment pipelines using Kubernetes, Docker, and infrastructure-as-code (Terraform), ensuring scalability and reliability across training, serving, and monitoring workflows
- Design novel automation tooling and frameworks that reduce model deployment time from weeks to hours, enabling rapid iteration for data science teams
- Analyze production model performance and deployment bottlenecks through deep instrumentation with Prometheus, Grafana, and custom observability solutions; implement data-driven optimizations
- Own MLOps platform strategy and technical direction, evaluating tradeoffs between MLflow, Kubeflow, and emerging tools to establish standards across the organization
- Partner extensively with data scientists and ML engineers to translate research requirements into production-ready infrastructure, adapting solutions to diverse model architectures and deployment patterns
- Maintain uptime and correctness for production ML services, responding to incidents with root cause analysis and implementing preventive measures to ensure model predictions remain accurate
- Drive CI/CD best practices for ML workflows on AWS/GCP/Azure, establishing rigorous testing frameworks that validate model behavior before production release
What We're Looking For
- 5–8 years building and operating production software systems, with significant focus on infrastructure, deployment pipelines, or platform engineering
- Deep expertise in Kubernetes and container orchestration — you've designed multi-tenant clusters, debugged complex networking issues, and optimized resource allocation in production environments
- Strong Python proficiency for building automation tooling, data pipelines, and production services — not just scripting, but designing maintainable systems
- Hands-on experience with at least one major cloud platform (AWS, GCP, or Azure) including IAM, networking, storage, and compute services — you understand cost optimization and security best practices
- Proven track record implementing CI/CD pipelines for complex systems — you've automated testing, deployment, and rollback strategies with minimal manual intervention
- Solid foundation in ML/data science workflows — familiarity with model training, evaluation, versioning, and deployment lifecycle (prior MLOps or ML platform experience strongly preferred)
- Strong analytical and debugging skills — ability to diagnose performance bottlenecks, resource contention, and failure modes across distributed systems using metrics, logs, and traces
Nice to Have
- Experience with ML experiment tracking and model management tools (MLflow, Kubeflow, Weights & Biases, or similar platforms)
- Proficiency with infrastructure-as-code tools like Terraform, Pulumi, or CloudFormation for managing cloud resources declaratively
- Hands-on experience building observability solutions with Prometheus, Grafana, or modern APM tools — you've designed dashboards and alerting strategies that balance signal and noise
- Open-source contributions to ML infrastructure projects (Kubeflow, Ray, MLflow, etc.)
- Custom Kubernetes operators or controllers to automate ML workflows
Required Skills
Kubernetes · Python · Docker · Terraform · MLflow · CI/CD · AWS · Prometheus
