
Team: Core Speech Research
Location: Bangalore, India
Type: Full-time
Experience: No fixed bar — skill and depth matter more than years
Smallest.ai builds real-time voice intelligence systems operating at enterprise scale.
We work across speech recognition, speech generation, and speech-to-speech systems with a strong focus on low latency, multilingual intelligence, and production reliability.
Our goal is simple: Smaller models. Lower latency. Higher intelligence.
As a Speech Research Scientist, you will work on the core speech stack at Smallest.ai.
You will research, train, evaluate, and productionize models across:
Speech to Text (ASR)
Text to Speech (TTS)
Speech to Speech (S2S)
This is not an offline research role.
You will work at the intersection of research, engineering, and real-world deployment.
Streaming and non-streaming ASR
Multilingual and code-mixed speech
Low-latency decoding and inference
Long-context speech modeling
Robustness to accents, noise, and telephony audio
Neural TTS and generative speech models
Controllable speech generation including emotion, style, pitch, rate, and prosody
Speaker adaptation and voice cloning
Stability, expressiveness, and naturalness optimization
End-to-end speech-to-speech models
Streaming voice-to-voice architectures
Codec-based or token-based speech representations
Low-latency conversational speech generation
Multilingual speaker understanding
Cross-lingual speaker embeddings
Speaker identification and verification
Accent and dialect robustness
Low-resource language modeling
Multi-speaker diarization
Overlapping speech detection and separation
Speaker-aware ASR pipelines
Joint diarization and recognition modeling
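The speaker-verification items above usually reduce to comparing fixed-size speaker embeddings. A minimal sketch, assuming embeddings are plain float vectors; the 0.7 threshold is a placeholder, not a recommended operating point:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two speaker embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def same_speaker(emb_a, emb_b, threshold=0.7):
    # Verification decision: accept if similarity clears a tuned threshold.
    # The threshold here is illustrative; real systems calibrate it on
    # held-out trials (e.g. to a target false-accept rate).
    return cosine_similarity(emb_a, emb_b) >= threshold
```

The same pairwise scoring is the building block for diarization clustering, where segments are grouped by embedding similarity.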
Full-duplex speech models
Simultaneous listening and speaking
Interruption handling and barge-in detection
Half-duplex conversational models
Turn detection
Latency-aware response generation
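As a point of reference for the turn-detection items above, here is a deliberately naive energy-based endpointing sketch. Production systems use learned VAD and endpointing models; the threshold and frame counts below are made-up illustrative values:

```python
def detect_turn_end(frame_energies, threshold=0.01, min_silence_frames=30):
    # Fire a turn-end event after `min_silence_frames` consecutive
    # low-energy frames (e.g. 30 frames of 10 ms = 300 ms of silence).
    silence = 0
    for i, energy in enumerate(frame_energies):
        silence = silence + 1 if energy < threshold else 0
        if silence >= min_silence_frames:
            return i  # frame index at which the turn is declared over
    return None  # speaker still talking (or stream ended mid-turn)
```

The tension this toy exposes is the real research problem: a short silence window cuts off slow speakers, a long one inflates response latency.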
Novel model architectures and training strategies
Large-scale multilingual datasets and pipelines
Evaluation frameworks for WER (word error rate), DER (diarization error rate), MOS (mean opinion score), latency, and RTF (real-time factor)
Streaming inference systems for real-time speech
Research prototypes converted into production models
Your work will directly power live customer-facing systems.
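Of the evaluation metrics listed above, WER is the workhorse for ASR. A minimal, self-contained sketch of its standard edit-distance computation:

```python
def wer(ref: str, hyp: str) -> float:
    # Word Error Rate: word-level Levenshtein distance (substitutions,
    # insertions, deletions) normalized by reference length.
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / max(len(r), 1)
```

For example, `wer("the cat sat on the mat", "the cat sat mat")` counts two deletions against six reference words, giving 2/6. Production pipelines add text normalization before scoring, which this sketch omits.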
Strong background in speech processing or deep learning
Deep expertise in at least one of the following:
ASR
TTS
Speech-to-speech systems
Strong understanding of modern architectures:
Transformers, Conformers, diffusion or flow-based models
Experience with CTC, Transducer, attention-based decoding
Strong proficiency in PyTorch
Experience training models at scale
Multilingual speech experience (Indic or European languages)
Speaker embeddings and diarization systems
Parameter-efficient fine-tuning methods such as LoRA
Streaming inference optimization
Deployment experience using ONNX, TensorRT, or Triton
Publications, open-source contributions, or serious personal research projects
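On the decoding side, the CTC experience asked for above rests on a simple collapse rule: merge repeated per-frame tokens, then drop blanks. A sketch with illustrative token IDs (blank = 0):

```python
def ctc_greedy_collapse(frame_ids, blank=0):
    # Greedy CTC decoding of per-frame argmax token IDs:
    # collapse consecutive repeats, then remove blank tokens.
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out

# Frames: _ h h _ e l l _ l o o  ->  h e l l o
# (toy id->char mapping; the blank between the two l's preserves the repeat)
ctc_greedy_collapse([0, 7, 7, 0, 4, 11, 11, 0, 11, 14, 14])
```

The blank symbol is what lets CTC emit genuine repeated characters, which is why the example above keeps both `l` tokens.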
Depth over buzzwords
Clean experiments and reproducibility
Strong benchmarking discipline
Latency, memory, and throughput awareness
Research that translates into shipped systems
We value people who ask:
“How does this behave at scale?”
Not just: “Does this work on the dataset?”
Work on real-world speech systems at scale
Direct ownership from research to production
Close collaboration with founders and infrastructure teams
Fast iteration cycles with minimal bureaucracy
Competitive compensation and meaningful ESOPs
One of the deepest speech research stacks in India
It would be great if you could also share:
Resume
Research papers, GitHub repositories, or technical writing
Examples of models you trained or systems you built
A short note on what aspect of speech research excites you most
Email: [email protected]