Research at Clera
We're not a research lab publishing papers. We're an engineering team solving hard problems in production. These are the questions we're actively working on: the ones where the answer isn't obvious and getting it right matters for real people.
Calibration & Match Quality
Status: Active
How do you know a match is good before anyone interviews? We're building calibration systems that score candidate-role fit across multiple dimensions: skills, seniority, trajectory, preferences, and company stage. The goal: fewer false positives, more first-round conversions.
Open questions
- What signals predict a successful placement vs. a wasted interview?
- How do you calibrate match confidence when historical data is sparse?
- Can you detect misalignment before either side notices?
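One common way to check whether match confidence is calibrated is expected calibration error (ECE): group predictions into confidence bins and compare the average predicted confidence in each bin with the observed success rate. The sketch below is illustrative only, not a description of Clera's system. The function name, the bin count, and the choice of "first-round conversion" as the outcome are assumptions.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    outcomes: Sequence[int],
    n_bins: int = 5,
) -> float:
    """Weighted average gap between predicted confidence and observed success rate.

    confidences: predicted match probabilities in [0, 1].
    outcomes: 1 if the match converted (assumed here: passed first round), else 0.
    """
    if len(confidences) != len(outcomes):
        raise ValueError("confidences and outcomes must have the same length")
    if not confidences:
        return 0.0

    # Group predictions into equal-width confidence bins.
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for conf, outcome in zip(confidences, outcomes):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 lands in the last bin
        bins[index].append((conf, outcome))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in bucket) / len(bucket)
        success_rate = sum(o for _, o in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(mean_conf - success_rate)
    return ece
```

With sparse historical data, bins that hold only a handful of matches give noisy estimates, so fewer, wider bins may be the safer choice.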
Autonomous Agent Workflows
Status: Active
Clera's AI agents handle candidate communication, interview scheduling, status updates, and preparation autonomously. We're researching how to give agents more responsibility while keeping humans in the loop where it matters. The hard part isn't automation; it's knowing when to stop automating.
Open questions
- How do you teach an agent to escalate gracefully?
- What's the right level of autonomy for career-critical conversations?
- How do you evaluate agent quality beyond task completion?
Structured Retrieval for Talent
Status: Active
We use structured search (Typesense) over vector search for candidate retrieval, and we think that's the right call for our domain. We're researching hybrid approaches that combine the precision of structured filters with the flexibility of semantic understanding, without the hallucination risks of pure embedding-based search.
Open questions
- When does semantic search actually outperform structured queries for recruiting?
- How do you index career trajectories, not just current titles?
- What's the retrieval-augmented generation setup that minimizes hallucinated qualifications?
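One shape a hybrid approach can take is "filter first, rank second": structured filters decide who is eligible, and semantic similarity only orders the candidates that passed. The sketch below is plain in-memory Python, not Typesense code. The `Candidate` fields, the `hybrid_search` function, and the embeddings are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    skills: set[str]
    years_experience: int
    embedding: list[float]  # hypothetical profile embedding


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(
    candidates: list[Candidate],
    required_skills: set[str],
    min_years: int,
    query_embedding: list[float],
    limit: int = 10,
) -> list[Candidate]:
    # Step 1: hard structured filters. Semantic similarity can never
    # bring back a candidate who fails these.
    eligible = [
        c
        for c in candidates
        if required_skills <= c.skills and c.years_experience >= min_years
    ]
    # Step 2: semantic ordering, applied only to the eligible set.
    eligible.sort(key=lambda c: cosine(c.embedding, query_embedding), reverse=True)
    return eligible[:limit]
```

The design choice here is that the embedding never overrides a hard requirement, so a semantically similar profile that lacks a required skill is still excluded.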
LLM Evaluation & Prompt Engineering
Status: Active
Every prompt in production gets traced via Langfuse. We're building evaluation frameworks that go beyond "did the model respond" to "did the response actually help the candidate." This includes automated quality scoring, regression detection, and A/B testing of prompt strategies across different model providers.
Open questions
- How do you measure whether a match explanation is useful vs. just plausible?
- What evaluation metrics predict downstream outcomes (interviews, hires)?
- How do you A/B test prompts when each candidate interaction is unique?
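Because no two conversations are alike, one option is to randomize at the candidate level and compare outcomes across groups rather than across individual messages. A minimal sketch of deterministic, hash-based variant assignment follows. The function name, experiment key, and variant labels are hypothetical, and this is not tied to Langfuse or any other tool.

```python
import hashlib


def assign_variant(candidate_id: str, experiment: str, variants: list[str]) -> str:
    """Deterministically map a candidate to one prompt variant.

    The same candidate always gets the same variant within an experiment,
    while different experiments shuffle candidates independently.
    """
    if not variants:
        raise ValueError("variants must not be empty")
    key = f"{experiment}:{candidate_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as an integer bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]
```

Hashing avoids storing assignments and keeps a candidate on one prompt strategy for the whole experiment, so downstream outcomes (interviews, hires) can be attributed to a single variant.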
Multi-Signal Profile Understanding
Status: Active
A candidate is more than their resume. We combine CV data, LinkedIn profiles, conversation history, stated preferences, and behavioral signals to build a comprehensive understanding of what someone is looking for and what they're good at. The research challenge: how to weight conflicting signals and handle incomplete information gracefully.
Open questions
- When a candidate's CV says one thing and their preferences say another, who wins?
- How do you infer career goals from behavior rather than just stated intent?
- What's the minimum information needed for a reliable match?
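A simple baseline for conflicting and missing signals is a reliability-weighted average that also reports how much of the total evidence was available. The sketch below is an assumption-heavy illustration: the source names, the weights, and the idea of scoring each signal on a 0 to 1 scale are all hypothetical, not Clera's actual model.

```python
from typing import Optional

# Hypothetical reliability weights per signal source (higher = trusted more).
SOURCE_WEIGHTS = {
    "stated_preference": 1.0,
    "conversation": 0.8,
    "behavior": 0.6,
    "cv": 0.4,
}


def combine_signals(
    signals: dict[str, Optional[float]],
) -> tuple[Optional[float], float]:
    """Return (weighted estimate, coverage).

    signals maps a source name to a score in [0, 1], or None if missing.
    coverage is the share of total weight that was actually observed, so a
    caller can refuse to match when too little is known.
    """
    total_weight = sum(SOURCE_WEIGHTS.values())
    present = {
        source: value
        for source, value in signals.items()
        if value is not None and source in SOURCE_WEIGHTS
    }
    used_weight = sum(SOURCE_WEIGHTS[source] for source in present)
    if used_weight == 0:
        return None, 0.0  # nothing usable: no estimate at all
    estimate = (
        sum(SOURCE_WEIGHTS[source] * value for source, value in present.items())
        / used_weight
    )
    return estimate, used_weight / total_weight
```

For example, a CV score of 0.0 and a stated preference of 1.0 for the same attribute yield an estimate that leans toward the stated preference, with a coverage of 0.5 signalling that half the weighted evidence is missing.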
Feedback Loops & Continuous Learning
Status: Active
Every match outcome teaches us something, but the signal is noisy and delayed. A candidate who doesn't respond might be busy, not uninterested. A rejected candidate might have been perfect for a different role. We're building feedback systems that learn from messy, real-world outcomes without overfitting to noise.
Open questions
- How do you learn from negative signals without creating bias?
- What's the feedback delay problem in recruiting and how do you handle it?
- Can you separate "bad match" from "bad timing"?
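One way to avoid treating silence as an early negative is to leave an outcome unlabeled until an observation window has passed. The sketch below illustrates that idea only. The function name and the 14-day window are hypothetical choices, not values from our systems.

```python
from datetime import datetime, timedelta
from typing import Optional


def label_outcome(
    sent_at: datetime,
    responded_at: Optional[datetime],
    now: datetime,
    window: timedelta = timedelta(days=14),  # hypothetical observation window
) -> Optional[int]:
    """Return 1 (responded), 0 (no response after the window), or None (pending)."""
    if responded_at is not None:
        return 1
    if now - sent_at >= window:
        return 0
    # Too early to tell: keep the example out of training for now.
    return None
```

This only delays the negative label. A 0 can still mean "bad timing" rather than "bad match", so it reduces premature negatives without separating the two causes.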
The Bigger Questions
Some problems don't have clean technical solutions. We think about these too.
Should AI agents disclose their nature in every interaction, or only when asked?
We lean toward always disclosing. But the UX implications are real.
How do you build trust in AI recommendations when the stakes are someone's career?
Explanations help. But we're still learning what candidates actually want to see.
What does "fairness" mean in AI matching when every role has different requirements?
Equal treatment isn't always equitable. We're thinking through this carefully.
Can you measure recruiter quality the same way you'd measure model quality?
We think so. If AI agents are the new recruiters, they need the same accountability.
How Research Works Here
Start with a real problem
Every research question comes from production. We don't explore hypotheticals; we solve problems that our users hit today.
Ship the simplest version
Build the dumbest thing that could work, measure it, and iterate. Most of our best systems started as a single prompt and a Langfuse trace.
Measure what matters
Not model accuracy in isolation, but real outcomes. Did the candidate get an interview? Did the company respond? Did the hire stick? That's what counts.
Interested in working on these problems? We're hiring engineers who like hard problems.