
Team: Core AI Research
Location: Bangalore, India
Type: Full-time
Experience: No fixed bar — depth and ownership matter more than years
Smallest.ai builds real-time intelligence systems that operate under strict latency, cost, and reliability constraints.
We work on small, fast, controllable language models designed to run in production — not just in demos.
Our focus areas include:
- Small Language Models (SLMs)
- Long- and short-term memory systems
- Streaming inference
- Agent architectures that reason, adapt, and improve over time
We optimize for: Smaller models. Faster tokens. Real memory.
As an LLM and Memory Researcher, you will design and train models that can:
- Think under latency constraints
- Use memory effectively across time
- Adapt from interaction history
- Operate in streaming environments
- Power real-world agents and workflows
You will work across model architecture, training, memory systems, and deployment.
This role sits at the intersection of research, systems, and product intelligence.
You will work on:

Model architecture:
- Small language model design (1B–8B class)
- Dense and Mixture-of-Experts variants
- Fast decoding architectures
- KV-cache optimization and compression
- Long-context and sliding-window attention
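To make one of these topics concrete: sliding-window attention keeps only the most recent positions in the KV cache, bounding memory regardless of sequence length. A minimal sketch in plain Python (illustrative only; `SlidingWindowKVCache` is a hypothetical name, and a real cache holds per-head tensors, not strings):

```python
from collections import deque

class SlidingWindowKVCache:
    """Keeps only the most recent `window` key/value pairs.

    Older entries are evicted automatically, so memory stays
    O(window) no matter how long the stream runs.
    """

    def __init__(self, window: int):
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, key, value):
        # deque with maxlen evicts the oldest entry on overflow
        self.keys.append(key)
        self.values.append(value)

    def snapshot(self):
        # An attention step would attend over exactly these entries.
        return list(self.keys), list(self.values)


cache = SlidingWindowKVCache(window=3)
for t in range(5):
    cache.append(f"k{t}", f"v{t}")

keys, values = cache.snapshot()
# Only the last 3 positions survive: k2, k3, k4
```

The trade-off this illustrates is exactly the one the role works with: bounded latency and memory in exchange for a truncated attention horizon.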
Memory systems:
- Short-term working memory
- Long-term persistent memory
- Retrieval-augmented memory (RAG)
- Structured memory representations
- Episodic and semantic memory modeling
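The core retrieval primitive behind the memory topics above can be sketched in a few lines: store (embedding, text) pairs and rank them by cosine similarity at query time. This is a toy sketch with hypothetical names (`VectorMemory`, hand-written 2-d vectors); production systems use learned embeddings and approximate nearest-neighbor indexes:

```python
import math

class VectorMemory:
    """Toy long-term memory: stores (embedding, text) pairs and
    retrieves the entries most similar to a query embedding."""

    def __init__(self):
        self.entries = []  # list of (vector, text) pairs

    def add(self, vector, text):
        self.entries.append((vector, text))

    def retrieve(self, query, k=1):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0

        # Rank all stored entries by similarity to the query.
        ranked = sorted(self.entries,
                        key=lambda e: cosine(query, e[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]


mem = VectorMemory()
mem.add([1.0, 0.0], "user prefers short answers")
mem.add([0.0, 1.0], "user is based in Bangalore")

# A query vector close to the first entry retrieves it.
top = mem.retrieve([0.9, 0.1], k=1)
```

Going "beyond vanilla RAG", as the requirements below put it, means layering structure, recency, and write policies on top of this basic lookup.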
Training:
- Pretraining and continual training strategies
- Instruction tuning and alignment
- Preference learning and RLHF-style methods
- Online adaptation and feedback loops
- Parameter-efficient fine-tuning (LoRA, adapters, partial freeze)
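Of the techniques above, LoRA is the easiest to show in miniature: the pretrained weight stays frozen and only a low-rank correction is trained. A pure-Python sketch (hypothetical `LoRALinear` class; a real implementation operates on PyTorch tensors and registers only A and B as trainable parameters):

```python
class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha/r) * B @ A.

    W: (out, in), A: (r, in), B: (out, r). Only A and B would receive
    gradients, shrinking trainable parameters from out*in to r*(out+in).
    """

    def __init__(self, W, r, alpha):
        self.W = W                      # frozen pretrained weight
        self.r, self.alpha = r, alpha
        out_dim, in_dim = len(W), len(W[0])
        # B starts at zero so the adapter is a no-op before training.
        self.A = [[0.01] * in_dim for _ in range(r)]
        self.B = [[0.0] * r for _ in range(out_dim)]

    def forward(self, x):
        scale = self.alpha / self.r
        y = []
        for i, w_row in enumerate(self.W):
            base = sum(w * xi for w, xi in zip(w_row, x))
            # Low-rank path: B @ (A @ x), scaled by alpha/r.
            delta = sum(self.B[i][j] *
                        sum(self.A[j][k] * x[k] for k in range(len(x)))
                        for j in range(self.r))
            y.append(base + scale * delta)
        return y


W = [[1.0, 0.0], [0.0, 1.0]]      # identity weight, frozen
layer = LoRALinear(W, r=1, alpha=2)
out = layer.forward([3.0, 4.0])
# With B initialised to zero the adapter contributes nothing yet,
# so the output equals W @ x = [3.0, 4.0].
```

The zero-initialised B matrix is the standard trick: the adapted model starts exactly at the pretrained behaviour, and fine-tuning only moves it as far as the data demands.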
Reasoning and agents:
- Multi-step reasoning under latency budgets
- Tool use and function calling
- Agent memory orchestration
- Fast-think vs. slow-think model architectures
- Self-reflection and corrective reasoning
Streaming inference:
- Token-level streaming input and output
- Interruptible generation
- Partial context updates
- Low-latency response formation
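Interruptible generation, in particular, reduces to a simple control pattern: emit tokens incrementally and check a cancellation flag before each one, so a caller can cut generation off mid-stream. A minimal sketch with hypothetical names (`stream_generate`; a real decoder would be producing tokens from a model, not a list):

```python
import threading

def stream_generate(tokens, cancel: threading.Event):
    """Yield tokens one at a time, checking a cancellation flag
    before each emission so generation can be stopped mid-stream
    (e.g. when a caller barges in during a live conversation)."""
    for tok in tokens:
        if cancel.is_set():
            return  # stop immediately; partial output was already streamed
        yield tok


cancel = threading.Event()
emitted = []
for i, tok in enumerate(stream_generate(["The", "answer", "is", "42"], cancel)):
    emitted.append(tok)
    if i == 1:           # the caller interrupts after the second token
        cancel.set()
# Only the first two tokens were delivered before the interrupt took effect.
```

The same flag-check placement is what makes partial context updates possible: between token emissions there is a well-defined point at which new input can be spliced in.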
You will build:
- Novel memory architectures for LLMs
- Training pipelines for small and efficient language models
- Memory-aware inference engines
- Evaluation frameworks for reasoning, memory retention, and hallucination
- Research prototypes deployed into real production agents
Your work will directly affect live systems running at scale.
What we are looking for:
- Strong foundation in machine learning and deep learning
- Deep experience with large or small language models
- Strong understanding of:
  - Transformer architectures
  - Attention mechanisms
  - Positional encoding and context modeling
- Proficiency with PyTorch
- Experience training or fine-tuning LLMs end-to-end
Nice to have:
- Experience with long-context modeling
- Memory or retrieval systems beyond vanilla RAG
- Reinforcement learning or RLHF pipelines
- Agent frameworks or orchestration layers
- Experience with model quantization and inference optimization
- Publications, open-source work, or deep independent research
We care about:
- First-principles thinking
- Clear experimental design
- Measurable gains, not vague improvements
- Understanding the trade-offs between quality, latency, and cost
- Research that survives production constraints
We value people who ask:
“What happens after 10 million conversations?”
Not just: “What score does this get on a benchmark?”
Why join:
- Work on real, deployed LLM systems
- Build memory systems few companies attempt
- Direct ownership from research to production
- A culture of high autonomy and fast execution
- Competitive compensation and meaningful ESOPs
- A deep focus on small, fast, and efficient AI
Along with your application, please share:
- Your resume
- Research papers, GitHub repositories, or technical writing
- Examples of models you have trained or systems you have built
- A short note on the aspect of LLM or memory research that excites you most
Email: [email protected]