Machine Learning, Platform Engineer
Company: Together AI
Location: San Francisco
Posted on: April 1, 2026
Job Description:
About the Role

Our team focuses on enabling custom models and dedicated inference
on Together. We are responsible for building a container platform,
optimizing autoscaling, minimizing cold starts, achieving the best
end-to-end model performance, and providing a best-in-class
developer experience with great tooling. We often focus on video or
audio generation across the stack: CUDA kernels, PyTorch
optimization, inference engines, container orchestration, queueing
theory, etc. An ideal candidate will be great at
profiling/optimization but know the word Kubernetes, or be
intimately familiar with multi-cluster scheduling and have some
sense of ML bottlenecks.

Responsibilities

New hires may work on
multi-cluster orchestration, portfolio optimization, predictive
autoscaling, control planes, model bring-up, model optimization,
APIs for managing deployments, inference worker SDKs, and CLI
tools.
- Analyze and improve the robustness and scalability of existing
  distributed systems, APIs, databases, and infrastructure
- Partner with product teams to understand functional requirements
  and deliver solutions that meet business needs
- Write clear, well-tested, and maintainable software and IaC for
  both new and existing systems
- Conduct design and code reviews, create developer documentation,
  and develop testing strategies for robustness and fault tolerance

Requirements

- 5 years of demonstrated experience building large-scale,
  fault-tolerant distributed systems
- Experience running serverless inference platforms, doing model
  bring-up on short notice, being on call, or running a cloud
  provider is a very big plus
- Good taste and ability to thoughtfully discuss how what you’ve
  built has failed over time
- Experience designing, analyzing, and improving the efficiency,
  scalability, and stability of various system resources
- Excellent understanding of low-level operating systems concepts
  including concurrency, networking and storage, performance and
  scale
- Expert-level programmer in one or more of Python, Golang, Rust,
  C++, or Haskell
- Proficiency in writing and maintaining Infrastructure as Code
  (IaC) using tools like Terraform
- Experience with Kubernetes internals or other container
  orchestration systems
- Sound judgement for when to use and when to not use LLMs for code
- Bachelor’s or Master’s degree in Computer Science, Computer
  Engineering, or a related technical field, or equivalent
  practical experience
- Writing-heavy roles or companies are a plus

About Together AI

Together AI is a
research-driven artificial intelligence company. We believe open
and transparent AI systems will drive innovation and create the
best outcomes for society, and together we are on a mission to
significantly lower the cost of modern AI systems by co-designing
software, hardware, algorithms, and models. We have contributed to
leading open-source research, models, and datasets to advance the
frontier of AI, and our team has been behind technological
advancements such as FlashAttention, Hyena, FlexGen, and RedPajama.
We invite you to join a passionate group of researchers and
engineers in our journey to build the next generation of AI
infrastructure.

Compensation

We offer competitive compensation,
startup equity, health insurance, and other competitive benefits.
The US base salary range for this full-time position is $160,000 -
$250,000 plus equity and benefits. Our salary ranges are determined
by location, level, and role. Individual compensation will be
determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is
proud to offer equal employment opportunity to everyone regardless
of race, color, ancestry, religion, sex, national origin, sexual
orientation, age, citizenship, marital status, disability, gender
identity, veteran status, and more. Please see our privacy policy
at https://www.together.ai/privacy