San Francisco, CA

Director of AI Engineering

Engineering · Full time · Lead level
AI Screened · Remote B2B · EU Talent Pool · 56 applicants

About This Role

You will build and lead the team that transforms cutting-edge AI research into production systems serving millions of users. This is a dual-track leadership role: you'll architect the technical foundation for our ML platform while growing a high-performing engineering organization. You'll own the full AI/ML roadmap, balancing innovation with reliability as we scale infrastructure to handle exponential growth. The gap between research breakthroughs and battle-tested production systems is where we need you.

Our Stack

  • Modern ML frameworks: PyTorch · TensorFlow · Hugging Face · MLflow · Kubernetes
  • Cloud-native infrastructure on AWS (SageMaker, ECS, Lambda) with observability via Datadog and Grafana
  • Cutting-edge MLOps: feature stores, automated retraining pipelines, shadow deployments, and real-time model monitoring
  • Collaborative tools: GitHub · Terraform · Linear · Notion for documentation and cross-functional alignment

What You'll Do

  • Define and execute the technical vision for AI/ML infrastructure, establishing architectural patterns that balance research velocity with production reliability
  • Build and scale a high-performing AI engineering team, recruiting senior talent and developing engineering leaders who will shape our technical culture
  • Drive strategic technical decisions across the ML lifecycle, from experimentation frameworks and model serving infrastructure to monitoring, A/B testing, and cost optimization at scale
  • Partner with research, product, and platform teams to translate business objectives into technical roadmaps, navigating ambiguity and aligning stakeholders across the organization
  • Architect production ML systems using PyTorch/TensorFlow, Kubernetes, and cloud ML platforms (AWS SageMaker/GCP Vertex AI), ensuring models perform reliably under real-world conditions
  • Establish MLOps practices and tooling that accelerate iteration cycles while maintaining rigorous standards for model quality, fairness, and observability
  • Mentor senior engineers and engineering managers, fostering analytical rigor and innovative problem-solving across technical decision-making

What We're Looking For

  • 10+ years of software engineering experience building production systems, with at least 3 years leading and scaling ML/AI engineering teams through rapid growth
  • Deep expertise in modern ML frameworks (PyTorch or TensorFlow) with a track record of moving models from research prototypes to production systems serving millions of users
  • Proven experience designing and operating MLOps infrastructure at scale: CI/CD for ML, model versioning, A/B testing frameworks, and monitoring for model performance drift
  • Hands-on architecture experience with cloud ML platforms (AWS SageMaker, GCP Vertex AI, or Azure ML) and production model serving infrastructure (Kubernetes-based serving, auto-scaling, latency optimization)
  • Strategic technical leadership: you've defined multi-quarter ML roadmaps, made build-vs-buy decisions on core infrastructure, and balanced research exploration with production reliability
  • Strong record of hiring, mentoring, and developing senior ML engineers; you've built high-performing teams and shaped engineering culture through periods of ambiguity
  • Ability to drive alignment across product, research, and platform teams on complex, long-cycle AI initiatives where requirements evolve as you learn
  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent depth of experience building ML systems from the ground up

Nice to Have

  • Experience in fintech or payments domain, particularly building ML systems for fraud detection, risk modeling, or transaction processing at scale
  • Prior experience at a high-growth startup or tech company where you navigated the transition from MVP to scaled production ML infrastructure
  • Contributions to open-source ML tooling or active participation in the ML research/engineering community (conference talks, publications, thought leadership)

Required Skills

PyTorch · TensorFlow · MLOps · AWS SageMaker · Kubernetes · Model Serving · Team Leadership · Technical Strategy
