Jobs at Inferact (Now Hiring) — 10 open

Member of Technical Staff, TPU Performance Engineering

Singapore, Singapore · On-site

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of…

Skills: TPU Performance Engineering, JAX, XLA, Pallas, ML Kernel Optimization

Inferact

Member of Technical Staff, AMD GPU Performance Engineering

San Francisco, California, United States · Remote OK

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Skills: ROCm, HIP, Triton, CK, AITER

Inferact

Member of Technical Staff, AMD GPU Performance Engineering

Singapore, Singapore · On-site

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Skills: AMD GPU Optimization, TPU Performance Engineering, ROCm, HIP, Triton

Inferact

Member of Technical Staff, Kernel Engineering

Singapore, Singapore · On-site

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Skills: CUDA, C++, Python, GPU Architecture, Kernel Optimization

Inferact

Member of Technical Staff, Inference

Singapore, Singapore · On-site

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Skills: Python, PyTorch, vLLM, TensorRT-LLM, SGLang

Inferact

Member of Technical Staff, Performance and Scale

Singapore, Singapore · On-site

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Skills: Rust, Go, C++, Distributed Systems, Network Protocols

Inferact

Member of Technical Staff, Cloud Orchestration

Singapore, Singapore · On-site

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Skills: Kubernetes, Container Orchestration, Kubernetes Operators, Python, Rust

Inferact

Member of Technical Staff, Inference

San Francisco, California, United States · Remote OK

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Skills: Python, PyTorch, vLLM, TensorRT-LLM, SGLang

Inferact

Member of Technical Staff, Developer Relations

San Francisco, California, United States · Remote OK

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the inters…

Skills: LLM Inference Systems, Developer Relations, GPU Serving, Technical Writing, Model Serving

Inferact

Member of Technical Staff, TPU Performance Engineering

San Francisco, California, United States · Remote OK

$200k–$400k/yr

Senior · Visa sponsorship · $150M raised

Skills: AMD GPU Optimization, TPU Performance Engineering, ROCm, HIP, Triton

Member of Technical Staff, TPU Performance Engineering

Inferact

Singapore, Singapore • On-site


Senior · Visa sponsorship


$200k–$400k/yr
Full-time
Bachelor's degree
Medical coverage, Dental coverage, Vision coverage, Equity
Visa sponsorship available
Posted 2d ago
~40 hrs/week

Responsibilities

Build and optimize TPU backends, compiler integrations, and runtime paths to make vLLM a first-class inference engine on Google TPUs. Improve production-relevant model serving by optimizing kernels and benchmarking latency and throughput.

Requirements

Requires a Bachelor's degree in CS or a related field and hands-on experience optimizing TPU workloads using JAX, XLA, or Pallas. Candidates should have deep knowledge of TPU execution, memory behavior, and ML kernel optimization.

Full job description

About the Role

We're looking for a TPU performance engineer to make vLLM a first-class inference engine on Google TPUs. You'll build and optimize TPU backends, compiler integrations, runtime paths, and benchmarking infrastructure using JAX, XLA, Pallas, and related tooling so vLLM can deliver frontier inference performance on TPU hardware.

You'll work at the boundary of inference systems, kernels, compilers, and hardware architecture, improving production-relevant model serving on TPU with clear correctness, latency, and throughput benchmarks. Your work will help make TPU support in vLLM usable, fast, benchmarked, and maintainable.

Skills and Qualifications

Minimum qualifications:

Bachelor's degree or equivalent experience in computer science, engineering, systems, machine learning, or a similar field.
Hands-on experience building or optimizing TPU workloads using JAX, XLA, Pallas, or related compiler and runtime tooling.
Deep understanding of TPU execution, memory behavior, compilation, and performance constraints for ML workloads.
Experience optimizing ML kernels or inference paths such as attention, GEMM, sampling, KV cache, fused kernels, or backend runtime paths.
Strong performance profiling and benchmarking skills, with the ability to use measurements, compiler artifacts, correctness tests, and reproducible benchmarks to guide optimization work.

Preferred qualifications:

Experience with vLLM, SGLang, TensorRT-LLM, XLA-based serving, or other LLM inference systems.
Familiarity with batching, KV cache, decoding, serving tradeoffs, and backend performance constraints in production inference systems.
Experience with compiler technologies such as XLA, MLIR, LLVM, Pallas, or other kernel DSLs, including lowering, fusion, and backend code generation.
Knowledge of quantization methods such as INT8, FP8, mixed precision, or TPU-specific numeric formats, including accuracy and performance tradeoffs.

Bonus points if you have:

Contributed to vLLM, JAX/XLA, Pallas, PyTorch/XLA, compiler projects, or other open-source ML infrastructure.
Built TPU benchmarking infrastructure or automated performance regression detection for accelerator workloads.
Worked directly with Google TPU ecosystem stakeholders, accelerator platform teams, or early-access programs to ship backend, compiler, or inference performance improvements.

Logistics

Location: This role is based in Singapore.
Compensation: Depending on background, skills, and experience, the expected salary range for this position is S$200,000 to S$400,000 annually, plus equity.
Visa sponsorship: We sponsor visas on a case-by-case basis.
Benefits: Inferact offers a generous benefits package, including medical, dental, and vision coverage.

Related keywords

TPU, vLLM, JAX, XLA, Pallas, MLIR, LLVM, GEMM, KV Cache, Attention, Fused Kernels, Quantization, INT8, FP8, Mixed Precision, Inference Engine

About Inferact


Building the future of inference

Industry: Software Development
Company size: 11-50 employees
Founded: 2025
Headquarters: San Francisco, CA
LinkedIn followers: 2,863
Total funding: $150M

Inferact is a startup founded by creators and core maintainers of vLLM, the most popular open-source LLM inference engine. Our mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster.

Offices: San Francisco, CA, US

Information Technology, Software, Artificial Intelligence (AI)


Similar companies hiring

Amazon (4949), Prolific (3401), AgileEngine (1668), Bosch (1656), Speechify (1456), Google (969), Booz Allen Hamilton (777), Microsoft (721), Transport AI (669), SAP (579), Salesforce (514), Meta (456)

