Director of AI Engineering

Engineering · Full time · San Francisco, CA · Lead level

About This Role

You will build and lead the team that transforms cutting-edge AI research into production systems serving millions of users. This is a dual-track leadership role: you'll architect the technical foundation for our ML platform while growing a high-performing engineering organization. You'll own the full AI/ML roadmap, balancing innovation with reliability as we scale infrastructure to handle exponential growth. The gap between research breakthroughs and battle-tested production systems is where we need you.

Our Stack

- Modern ML frameworks: PyTorch · TensorFlow · Hugging Face · MLflow · Kubernetes
- Cloud-native infrastructure on AWS (SageMaker, ECS, Lambda) with observability via Datadog and Grafana
- Cutting-edge MLOps: feature stores, automated retraining pipelines, shadow deployments, and real-time model monitoring
- Collaborative tools: GitHub · Terraform · Linear · Notion for documentation and cross-functional alignment

What You'll Do

- Define and execute the technical vision for AI/ML infrastructure, establishing architectural patterns that balance research velocity with production reliability
- Build and scale a high-performing AI engineering team, recruiting senior talent and developing engineering leaders who will shape our technical culture
- Drive strategic technical decisions across the ML lifecycle, from experimentation frameworks and model serving infrastructure to monitoring, A/B testing, and cost optimization at scale
- Partner with research, product, and platform teams to translate business objectives into technical roadmaps, navigating ambiguity and aligning stakeholders across the organization
- Architect production ML systems using PyTorch/TensorFlow, Kubernetes, and cloud ML platforms (AWS SageMaker/GCP Vertex AI), ensuring models perform reliably under real-world conditions
- Establish MLOps practices and tooling that accelerate iteration cycles while maintaining rigorous standards for model quality, fairness, and observability
- Mentor senior engineers and engineering managers, fostering analytical rigor and innovative problem-solving across technical decision-making

What We're Looking For

- 10+ years of software engineering experience building production systems, with at least 3 years leading and scaling ML/AI engineering teams through rapid growth
- Deep expertise in modern ML frameworks (PyTorch or TensorFlow) with a track record of moving models from research prototypes to production systems serving millions of users
- Proven experience designing and operating MLOps infrastructure at scale: CI/CD for ML, model versioning, A/B testing frameworks, and monitoring for model performance drift
- Hands-on architecture experience with cloud ML platforms (AWS SageMaker, GCP Vertex AI, or Azure ML) and production model serving infrastructure (Kubernetes-based serving, auto-scaling, latency optimization)
- Strategic technical leadership: you've defined multi-quarter ML roadmaps, made build-vs-buy decisions on core infrastructure, and balanced research exploration with production reliability
- Strong record of hiring, mentoring, and developing senior ML engineers; you've built high-performing teams and shaped engineering culture through periods of ambiguity
- Ability to drive alignment across product, research, and platform teams on complex, long-cycle AI initiatives where requirements evolve as you learn
- Bachelor's degree in Computer Science, Engineering, or related field, or equivalent depth of experience building ML systems from the ground up

Nice to Have

- Experience in fintech or payments domain, particularly building ML systems for fraud detection, risk modeling, or transaction processing at scale
- Prior experience at a high-growth startup or tech company where you navigated the transition from MVP to scaled production ML infrastructure
- Contributions to open-source ML tooling or active participation in the ML research/engineering community (conference talks, publications, thought leadership)

Skills

PyTorch · TensorFlow · MLOps · AWS SageMaker · Kubernetes · Model Serving · Team Leadership · Technical Strategy
