Platform Engineer

Engineering | Full time | Austin, TX | Senior level

About This Role

We're looking for an experienced platform engineer to architect and own the infrastructure foundations that power our engineering organization. You'll design fault-tolerant systems using Kubernetes, Terraform, and modern observability tools while directly impacting developer velocity across dozens of teams. This role offers high autonomy to drive platform strategy, tackle complex distributed systems challenges, and shape our cloud-native infrastructure as we scale.

What You'll Do

- Own end-to-end platform reliability: monitoring, alerting, incident response, and systematic analysis to eliminate failure modes
- Architect and implement cloud infrastructure using Kubernetes, Terraform, and AWS, establishing patterns that balance reliability with developer autonomy
- Design and build CI/CD pipelines and developer tooling that reduce deployment friction and enable teams to ship confidently
- Investigate production performance bottlenecks using observability data, then propose and implement infrastructure optimizations that measurably improve system behavior
- Evaluate emerging cloud-native technologies and infrastructure patterns, advocating for adoption when they solve real problems better than current approaches
- Partner with engineering teams to understand their infrastructure needs, translating requirements into platform capabilities with clear interfaces and documentation
- Mentor engineers on infrastructure best practices, lead design reviews for platform changes, and establish standards that scale

What We're Looking For

- 6-8 years of production infrastructure engineering experience - building and operating platforms that support engineering teams at scale
- Deep Kubernetes expertise - you've architected multi-cluster deployments, debugged CNI issues in production, and can articulate tradeoffs from real battle scars
- Strong Infrastructure as Code proficiency with Terraform - you write modular, reusable configurations and have managed state across multiple environments with systematic change control
- Hands-on AWS experience - designing fault-tolerant architectures using VPCs, IAM, ECS/EKS, RDS, S3, and other core services. You understand the cost and reliability implications of your design choices
- Proven ability to build and maintain CI/CD pipelines (GitHub Actions, GitLab CI, or Jenkins) that enable safe, frequent deployments with rollback strategies baked in
- Fluency in Python, Go, or Bash for automation and tooling - you build internal tools that eliminate toil and improve developer workflows
- Strong analytical problem-solving skills with systematic approaches to diagnosing distributed system failures, analyzing performance bottlenecks, and validating infrastructure changes through observability data
- Self-directed ownership mindset - you've independently scoped, designed, and delivered infrastructure projects that had measurable impact on engineering velocity or system reliability

Nice to Have

- Experience with observability tooling (Prometheus, Grafana, Datadog, or ELK stack) and using metrics to drive architecture decisions and capacity planning
- Familiarity with GitOps workflows (ArgoCD, Flux) or service mesh architectures (Istio, Consul) in production environments
- Prior work in high-growth organizations where you helped evolve infrastructure from startup-scale to enterprise-grade reliability

Requirements

- **6-8 years of production infrastructure engineering experience**, building and operating platforms that support engineering teams at scale
- **Deep Kubernetes expertise**: you've architected multi-cluster deployments, debugged CNI issues in production, and can articulate the tradeoffs between StatefulSets and Deployments from real battle scars
- **Strong Infrastructure as Code proficiency with Terraform**: you write modular, reusable configurations and have managed state across multiple environments with systematic change control
- **Hands-on AWS experience** designing fault-tolerant architectures using VPCs, IAM, ECS/EKS, RDS, S3, and other core services; you understand the cost and reliability implications of your design choices
- **Proven ability to build and maintain CI/CD pipelines** (GitHub Actions, GitLab CI, or Jenkins) that enable safe, frequent deployments with rollback strategies baked in
- **Fluency in Python, Go, or Bash for automation and tooling**: you build internal tools that eliminate toil and improve developer workflows, not just one-off scripts
- **Strong analytical problem-solving skills** with systematic approaches to diagnosing distributed system failures, analyzing performance bottlenecks, and validating infrastructure changes through observability data
- **Self-directed ownership mindset**: you've independently scoped, designed, and delivered infrastructure projects that had measurable impact on engineering velocity or system reliability

Skills

Kubernetes, Terraform, AWS, Docker, Python, CI/CD, Observability
