
Team: Core AI Research
Location: Bangalore, India
Type: Full-time
Experience: No fixed bar — depth and ownership matter more than years
Smallest.ai builds real-time intelligence systems that operate under strict latency, cost, and reliability constraints.
We work on small, fast, controllable language models designed to run in production — not just in demos.
Our focus areas include:
- Small Language Models (SLMs)
- Long- and short-term memory systems
- Streaming inference
- Agent architectures that reason, adapt, and improve over time
We optimize for: Smaller models. Faster tokens. Real memory.
As an LLM and Memory Researcher, you will design and train models that can:
- Think under latency constraints
- Use memory effectively across time
- Adapt from interaction history
- Operate in streaming environments
- Power real-world agents and workflows
You will work across model architecture, training, memory systems, and deployment.
This role sits at the intersection of research, systems, and product intelligence.
You will work on:

Model architecture:
- Small language model design (1B–8B class)
- Dense and Mixture-of-Experts variants
- Fast decoding architectures
- KV-cache optimization and compression
- Long-context and sliding-window attention
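To make one of these topics concrete: sliding-window attention keeps only the most recent positions in the KV cache, bounding memory regardless of sequence length. A minimal sketch in plain Python (illustrative only; `SlidingWindowKVCache` is a hypothetical name, and a real cache holds per-head tensors, not strings):

```python
from collections import deque

class SlidingWindowKVCache:
    """Keeps only the most recent `window` key/value pairs.

    Older entries are evicted automatically, so memory stays
    O(window) no matter how long the stream runs.
    """

    def __init__(self, window: int):
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, key, value):
        # deque with maxlen evicts the oldest entry on overflow
        self.keys.append(key)
        self.values.append(value)

    def snapshot(self):
        # An attention step would attend over exactly these entries.
        return list(self.keys), list(self.values)


cache = SlidingWindowKVCache(window=3)
for t in range(5):
    cache.append(f"k{t}", f"v{t}")

keys, values = cache.snapshot()
# Only the last 3 positions survive: k2, k3, k4
```

The trade-off this illustrates is exactly the one the role works with: bounded latency and memory in exchange for a truncated attention horizon.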
Memory systems:
- Short-term working memory
- Long-term persistent memory
- Retrieval-augmented memory (RAG)
- Structured memory representations
- Episodic and semantic memory modeling
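The core retrieval primitive behind the memory topics above can be sketched in a few lines: store (embedding, text) pairs and rank them by cosine similarity at query time. This is a toy sketch with hypothetical names (`VectorMemory`, hand-written 2-d vectors); production systems use learned embeddings and approximate nearest-neighbor indexes:

```python
import math

class VectorMemory:
    """Toy long-term memory: stores (embedding, text) pairs and
    retrieves the entries most similar to a query embedding."""

    def __init__(self):
        self.entries = []  # list of (vector, text) pairs

    def add(self, vector, text):
        self.entries.append((vector, text))

    def retrieve(self, query, k=1):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0

        # Rank all stored entries by similarity to the query.
        ranked = sorted(self.entries,
                        key=lambda e: cosine(query, e[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]


mem = VectorMemory()
mem.add([1.0, 0.0], "user prefers short answers")
mem.add([0.0, 1.0], "user is based in Bangalore")

# A query vector close to the first entry retrieves it.
top = mem.retrieve([0.9, 0.1], k=1)
```

Going "beyond vanilla RAG", as the requirements below put it, means layering structure, recency, and write policies on top of this basic lookup.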
Training:
- Pretraining and continual training strategies
- Instruction tuning and alignment
- Preference learning and RLHF-style methods
- Online adaptation and feedback loops
- Parameter-efficient fine-tuning (LoRA, adapters, partial freeze)
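Of the techniques above, LoRA is the easiest to show in miniature: the pretrained weight stays frozen and only a low-rank correction is trained. A pure-Python sketch (hypothetical `LoRALinear` class; a real implementation operates on PyTorch tensors and registers only A and B as trainable parameters):

```python
class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha/r) * B @ A.

    W: (out, in), A: (r, in), B: (out, r). Only A and B would receive
    gradients, shrinking trainable parameters from out*in to r*(out+in).
    """

    def __init__(self, W, r, alpha):
        self.W = W                      # frozen pretrained weight
        self.r, self.alpha = r, alpha
        out_dim, in_dim = len(W), len(W[0])
        # B starts at zero so the adapter is a no-op before training.
        self.A = [[0.01] * in_dim for _ in range(r)]
        self.B = [[0.0] * r for _ in range(out_dim)]

    def forward(self, x):
        scale = self.alpha / self.r
        y = []
        for i, w_row in enumerate(self.W):
            base = sum(w * xi for w, xi in zip(w_row, x))
            # Low-rank path: B @ (A @ x), scaled by alpha/r.
            delta = sum(self.B[i][j] *
                        sum(self.A[j][k] * x[k] for k in range(len(x)))
                        for j in range(self.r))
            y.append(base + scale * delta)
        return y


W = [[1.0, 0.0], [0.0, 1.0]]      # identity weight, frozen
layer = LoRALinear(W, r=1, alpha=2)
out = layer.forward([3.0, 4.0])
# With B initialised to zero the adapter contributes nothing yet,
# so the output equals W @ x = [3.0, 4.0].
```

The zero-initialised B matrix is the standard trick: the adapted model starts exactly at the pretrained behaviour, and fine-tuning only moves it as far as the data demands.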
Reasoning and agents:
- Multi-step reasoning under latency budgets
- Tool use and function calling
- Agent memory orchestration
- Fast-think vs. slow-think model architectures
- Self-reflection and corrective reasoning
Streaming inference:
- Token-level streaming input and output
- Interruptible generation
- Partial context updates
- Low-latency response formation
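Interruptible generation, in particular, reduces to a simple control pattern: emit tokens incrementally and check a cancellation flag before each one, so a caller can cut generation off mid-stream. A minimal sketch with hypothetical names (`stream_generate`; a real decoder would be producing tokens from a model, not a list):

```python
import threading

def stream_generate(tokens, cancel: threading.Event):
    """Yield tokens one at a time, checking a cancellation flag
    before each emission so generation can be stopped mid-stream
    (e.g. when a caller barges in during a live conversation)."""
    for tok in tokens:
        if cancel.is_set():
            return  # stop immediately; partial output was already streamed
        yield tok


cancel = threading.Event()
emitted = []
for i, tok in enumerate(stream_generate(["The", "answer", "is", "42"], cancel)):
    emitted.append(tok)
    if i == 1:           # the caller interrupts after the second token
        cancel.set()
# Only the first two tokens were delivered before the interrupt took effect.
```

The same flag-check placement is what makes partial context updates possible: between token emissions there is a well-defined point at which new input can be spliced in.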
You will build:
- Novel memory architectures for LLMs
- Training pipelines for small and efficient language models
- Memory-aware inference engines
- Evaluation frameworks for reasoning, memory retention, and hallucination
- Research prototypes deployed into real production agents
Your work will directly affect live systems running at scale.
What we are looking for:
- Strong foundation in machine learning and deep learning
- Deep experience with large or small language models
- Strong understanding of:
  - Transformer architectures
  - Attention mechanisms
  - Positional encoding and context modeling
- Proficiency with PyTorch
- Experience training or fine-tuning LLMs end-to-end
Nice to have:
- Experience with long-context modeling
- Memory or retrieval systems beyond vanilla RAG
- Reinforcement learning or RLHF pipelines
- Agent frameworks or orchestration layers
- Experience with model quantization and inference optimization
- Publications, open-source work, or deep independent research
We care about:
- First-principles thinking
- Clear experimental design
- Measurable gains, not vague improvements
- Understanding the trade-offs between quality, latency, and cost
- Research that survives production constraints
We value people who ask:
“What happens after 10 million conversations?”
Not just: “What score does this get on a benchmark?”
Why join:
- Work on real, deployed LLM systems
- Build memory systems few companies attempt
- Direct ownership from research to production
- A culture of high autonomy and fast execution
- Competitive compensation and meaningful ESOPs
- A deep focus on small, fast, and efficient AI
Along with your application, please share:
- Your resume
- Research papers, GitHub repositories, or technical writing
- Examples of models you have trained or systems you have built
- A short note on the aspect of LLM or memory research that excites you most
Email: [email protected]