Austin, TX
Senior ML Engineer
Engineering · Full-time · Senior level
AI Screened · Remote B2B · EU Talent Pool · 194 applicants
About the Role
We're building ML systems that serve millions of users with sub-100ms latency requirements. As a Senior ML Engineer, you'll own the full lifecycle from research prototypes to production deployment—architecting scalable model serving infrastructure, instrumenting rigorous A/B testing frameworks, and driving technical decisions that directly impact product outcomes. You'll work in Austin's growing ML/AI community, collaborating with data engineers and product teams to turn ambitious ideas into reliable, high-performance systems that solve real business problems.
Our Stack
- Modern Python ML stack: PyTorch · TensorFlow · scikit-learn · pandas · NumPy
- Cloud-native ML infrastructure on AWS, GCP, or Azure with Kubernetes orchestration
- MLOps tooling: MLflow · Kubeflow · feature stores · experiment tracking · model registries
- Observability and monitoring: Datadog · Grafana · custom metrics dashboards for model performance
What You'll Do
- Design and implement end-to-end ML pipelines from data ingestion through model training, evaluation, and production deployment using PyTorch/TensorFlow and cloud platforms
- Own the architecture and performance optimization of model serving infrastructure, ensuring sub-100ms latency and 99.9% uptime for user-facing ML features
- Establish rigorous experimentation frameworks including A/B testing, statistical analysis, and continuous monitoring to validate model improvements with measurable business impact
- Drive technical roadmap decisions for ML infrastructure, evaluating emerging tools and adopting those that pragmatically advance system capabilities
- Analyze production model behavior through deep metric analysis, debug performance degradation, and iterate rapidly to maintain model quality as data distributions shift
- Collaborate with backend engineers to integrate ML predictions into product APIs, and partner with data engineers to build robust data pipelines feeding model training at scale
- Mentor team members through technical design reviews, code reviews, and knowledge sharing on ML systems best practices
What We're Looking For
- 5+ years building and deploying machine learning systems in production—you've debugged model serving latency at 3am, not just trained models in notebooks
- Deep expertise in Python ML frameworks (PyTorch or TensorFlow) with focus on model optimization, efficient training pipelines, and production-grade code quality
- Hands-on experience with cloud ML infrastructure (AWS SageMaker, GCP Vertex AI, or Azure ML)—you've architected scalable model deployment pipelines, not just followed tutorials
- Strong fundamentals in system design and distributed computing—ability to reason about tradeoffs in model serving architecture, caching strategies, and fault tolerance
- Production experience with containerization and orchestration (Docker, Kubernetes) for ML workloads—you understand resource allocation, autoscaling, and cost optimization
- Proficiency in MLOps practices and tools (MLflow, Kubeflow, or similar) with track record of building reproducible training pipelines and automated model evaluation frameworks
- Solid SQL skills and data pipeline experience—ability to work closely with data engineers to ensure high-quality training data and efficient feature engineering
- Demonstrated ability to translate business requirements into technical ML solutions, then drive them from experimentation through production deployment with measurable impact
Nice to Have
- Experience building and maintaining real-time model serving infrastructure with sub-100ms latency requirements and high availability guarantees
- Track record of A/B testing ML models in production and using rigorous statistical analysis to validate improvements before full rollout
- Familiarity with infrastructure-as-code (Terraform, CloudFormation) and GitOps workflows for managing ML infrastructure reproducibly
Required Skills
Python · PyTorch · TensorFlow · AWS · Docker · Kubernetes · MLflow · SQL
