Clera - Your AI talent agent

LLM and Memory Researcher | Bangalore

Full-time • Bengaluru

Summary

Location

Bengaluru

Type

Full-time

Experience

0-2 years

Company links

Website

About this role

LLM and Memory Researcher — Bangalore

Team: Core AI Research
Location: Bangalore, India
Type: Full-time
Experience: No fixed bar — depth and ownership matter more than years

About Smallest.ai

Smallest.ai builds real-time intelligence systems that operate under strict latency, cost, and reliability constraints.

We work on small, fast, controllable language models designed to run in production — not just in demos.

Our focus areas include:

  • Small Language Models (SLMs)

  • Long- and short-term memory systems

  • Streaming inference

  • Agent architectures that reason, adapt, and improve over time

We optimize for: Smaller models. Faster tokens. Real memory.

Role Overview

As an LLM and Memory Researcher, you will design and train models that can:

  • Think under latency constraints

  • Use memory effectively across time

  • Adapt from interaction history

  • Operate in streaming environments

  • Power real-world agents and workflows

You will work across model architecture, training, memory systems, and deployment.

This role sits at the intersection of research, systems, and product intelligence.

Core Research Areas

A. Language Model Architecture

  • Small language model design (1B–8B class)

  • Dense and Mixture-of-Experts variants

  • Fast decoding architectures

  • KV-cache optimization and compression

  • Long-context and sliding-window attention

B. Memory Systems

  • Short-term working memory

  • Long-term persistent memory

  • Retrieval-augmented memory (RAG)

  • Structured memory representations

  • Episodic and semantic memory modeling

C. Training and Adaptation

  • Pretraining and continual training strategies

  • Instruction tuning and alignment

  • Preference learning and RLHF-style methods

  • Online adaptation and feedback loops

  • Parameter-efficient fine-tuning (LoRA, adapters, partial freeze)

D. Reasoning and Planning

  • Multi-step reasoning under latency budgets

  • Tool use and function calling

  • Agent memory orchestration

  • Fast-think vs slow-think model architectures

  • Self-reflection and corrective reasoning

E. Streaming Inference

  • Token-level streaming input and output

  • Interruptible generation

  • Partial context updates

  • Low-latency response formation

What You Will Build

  • Novel memory architectures for LLMs

  • Training pipelines for small and efficient language models

  • Memory-aware inference engines

  • Evaluation frameworks for reasoning, memory retention, and hallucination

  • Research prototypes deployed into real production agents

Your work will directly affect live systems running at scale.

Required Skills

  • Strong foundation in machine learning and deep learning

  • Deep experience with large or small language models

  • Strong understanding of:

    • Transformer architectures

    • Attention mechanisms

    • Positional encoding and context modeling

  • Proficiency with PyTorch

  • Experience training or fine-tuning LLMs end-to-end

Strong Plus

  • Experience with long-context modeling

  • Memory or retrieval systems beyond vanilla RAG

  • Reinforcement learning or RLHF pipelines

  • Agent frameworks or orchestration layers

  • Experience with model quantization and inference optimization

  • Publications, open-source work, or deep independent research

What We Care About

  • First-principles thinking

  • Clear experimental design

  • Measurable gains, not vague improvements

  • Understanding trade-offs between quality, latency, and cost

  • Research that survives production constraints

We value people who ask:

“What happens after 10 million conversations?”

Not just: “What score does this get on a benchmark?”

Why Smallest.ai

  • Work on real deployed LLM systems

  • Build memory systems few companies attempt

  • Direct ownership from research to production

  • High autonomy and fast execution culture

  • Competitive compensation and meaningful ESOPs

  • Deep focus on small, fast, and efficient AI

How to Apply

Along with your application, please share:

  • Resume

  • Research papers, GitHub repositories, or technical writing

  • Examples of models you trained or systems you built

  • A short note on what aspect of LLM or memory research excites you most

Email: [email protected]


Frequently Asked Questions

What does an LLM and Memory Researcher do at Smallest?

As an LLM and Memory Researcher at Smallest, you will design and train models that can think under latency constraints and use memory effectively across time, working across model architecture, training, memory systems, and deployment.

Is the LLM and Memory Researcher position at Smallest remote?

The LLM and Memory Researcher position at Smallest is based in Bengaluru, India. Contact the company through Clera for specific work arrangement details.

How do I apply for the LLM and Memory Researcher position at Smallest?

You can apply directly through Clera. Click the "Apply Now" button above to start your application. Clera's AI-powered platform will help match your profile with this opportunity and guide you through the application process.
© 2026 Clera Labs, Inc. · Terms · Privacy · Help

This role is hosted on Smallest's careers site.