
Jobs at Inferact (Now Hiring) — 10 open

Inferact

Member of Technical Staff, TPU Performance Engineering

Singapore, Singapore · On-site

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of…

Skills: TPU Performance Engineering, JAX, XLA, Pallas, ML Kernel Optimization

Inferact

Member of Technical Staff, AMD GPU Performance Engineering

San Francisco, California, United States · Remote OK

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of…

Skills: ROCm, HIP, Triton, CK, AITER

Inferact

Member of Technical Staff, AMD GPU Performance Engineering

Singapore, Singapore · On-site

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of…

Skills: AMD GPU Optimization, TPU Performance Engineering, ROCm, HIP, Triton

Inferact

Member of Technical Staff, Kernel Engineering

Singapore, Singapore · On-site

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of…

Skills: CUDA, C++, Python, GPU Architecture, Kernel Optimization

Inferact

Member of Technical Staff, Inference

Singapore, Singapore · On-site

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of…

Skills: Python, PyTorch, vLLM, TensorRT-LLM, SGLang

Inferact

Member of Technical Staff, Performance and Scale

Singapore, Singapore · On-site

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of…

Skills: Rust, Go, C++, Distributed Systems, Network Protocols

Inferact

Member of Technical Staff, Cloud Orchestration

Singapore, Singapore · On-site

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of…

Skills: Kubernetes, Container Orchestration, Kubernetes Operators, Python, Rust

Inferact

Member of Technical Staff, Inference

San Francisco, California, United States · Remote OK

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of…

Skills: Python, PyTorch, vLLM, TensorRT-LLM, SGLang

Inferact

Member of Technical Staff, Developer Relations

San Francisco, California, United States · Remote OK

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Overview: Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the inters…

Skills: LLM Inference Systems, Developer Relations, GPU Serving, Technical Writing, Model Serving

Inferact

Member of Technical Staff, TPU Performance Engineering

San Francisco, California, United States · Remote OK

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of…

Skills: AMD GPU Optimization, TPU Performance Engineering, ROCm, HIP, Triton

Member of Technical Staff, TPU Performance Engineering

Inferact

Singapore, Singapore • On-site

Senior · Visa sponsorship

  • S$200k–S$400k/yr
  • Full-time
  • Bachelor's degree
  • Medical coverage, Dental coverage, Vision coverage, Equity
  • Visa sponsorship available
  • Posted 2d ago
  • ~40 hrs/week

Responsibilities

Build and optimize TPU backends, compiler integrations, and runtime paths to make vLLM a first-class inference engine on Google TPUs. Improve production-relevant model serving by optimizing kernels and benchmarking latency and throughput.

Requirements

Requires a Bachelor's degree in CS or a related field (or equivalent experience) and hands-on experience optimizing TPU workloads using JAX, XLA, or Pallas. Candidates should have a deep understanding of TPU execution, memory behavior, and ML kernel optimization.

Full job description

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of models and hardware, a position that took years to build.

About the Role

We're looking for a TPU performance engineer to make vLLM a first-class inference engine on Google TPUs. You'll build and optimize TPU backends, compiler integrations, runtime paths, and benchmarking infrastructure using JAX, XLA, Pallas, and related tooling so vLLM can deliver frontier inference performance on TPU hardware.

You'll work at the boundary of inference systems, kernels, compilers, and hardware architecture, improving production-relevant model serving on TPU with clear correctness, latency, and throughput benchmarks. Your work will help make TPU support in vLLM usable, fast, benchmarked, and maintainable.

Skills and Qualifications

Minimum qualifications:

  • Bachelor's degree or equivalent experience in computer science, engineering, systems, machine learning, or similar.

  • Hands-on experience building or optimizing TPU workloads using JAX, XLA, Pallas, or related compiler and runtime tooling.

  • Deep understanding of TPU execution, memory behavior, compilation, and performance constraints for ML workloads.

  • Experience optimizing ML kernels or inference paths such as attention, GEMM, sampling, KV cache, fused kernels, or backend runtime paths.

  • Strong performance profiling and benchmarking skills, with the ability to use measurements, compiler artifacts, correctness tests, and reproducible benchmarks to guide optimization work.

Preferred qualifications:

  • Experience with vLLM, SGLang, TensorRT-LLM, XLA-based serving, or other LLM inference systems.

  • Familiarity with batching, KV cache, decoding, serving tradeoffs, and backend performance constraints in production inference systems.

  • Experience with compiler technologies such as XLA, MLIR, LLVM, Pallas, or other kernel DSLs, including lowering, fusion, and backend code generation.

  • Knowledge of quantization methods such as INT8, FP8, mixed precision, or TPU-specific numeric formats, including accuracy and performance tradeoffs.

Bonus points if you have:

  • Contributed to vLLM, JAX/XLA, Pallas, PyTorch/XLA, compiler projects, or other open-source ML infrastructure.

  • Built TPU benchmarking infrastructure or automated performance regression detection for accelerator workloads.

  • Worked directly with Google TPU ecosystem stakeholders, accelerator platform teams, or early-access programs to ship backend, compiler, or inference performance improvements.

Logistics

  • Location: This role is based in Singapore.

  • Compensation: Depending on background, skills, and experience, the expected salary range for this position is S$200,000 to S$400,000 per year, plus equity.

  • Visa sponsorship: We sponsor visas on a case-by-case basis.

  • Benefits: Inferact offers a generous benefits package, including medical, dental, and vision coverage.

Related keywords

TPU, vLLM, JAX, XLA, Pallas, MLIR, LLVM, GEMM, KV Cache, Attention, Fused Kernels, Quantization, INT8, FP8, Mixed Precision, Inference Engine

About Inferact

Building the future of inference

Industry: Software Development
Company size: 11-50 employees
Founded: 2025
Headquarters: San Francisco, CA
LinkedIn followers: 2,863
Total funding: $150M

Inferact is a startup founded by creators and core maintainers of vLLM, the most popular open-source LLM inference engine. Our mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster.

Offices: San Francisco, CA, US

Information Technology, Software, Artificial Intelligence (AI)
Clera home

Your AI-talent agent. Connecting talents with dream jobs.

Earn $5,000

Tools

  • Salary Calculator
  • Resume Review
  • Startup Map

Explore

  • Jobs
  • Discover Jobs
  • Companies
  • Acquihire
  • Referral

Company

  • Manifesto
  • Engineering
  • We are hiring!
  • FAQs
  • Blog
  • Press

© 2026 Clera Labs, Inc.
