Senior AI Evaluation Scientist

Summary

Location: McLean
Salary: $135k - $170k
Type: OTHER

About this role

Overview

We are seeking an experienced Senior AI Evaluation Scientist to design and lead rigorous evaluation programs for predictive and generative AI systems across our enterprise and client engagements. This role is critical to ensuring that AI solutions are accurate, reliable, safe, and aligned with mission outcomes. The Senior AI Evaluation Scientist will develop evaluation frameworks, build automated testing pipelines, and act as a subject-matter expert on AI quality, risk, and performance measurement. This role blends deep technical expertise with analytical rigor, experimentation, and cross-functional collaboration. 

Contributions

  • Lead the design and implementation of comprehensive evaluation frameworks for generative and predictive AI models, covering accuracy, robustness, relevance, trustworthiness, fairness, hallucination rates, and safety.

  • Develop and maintain automated evaluation pipelines that continuously audit model outputs, monitor quality drift, and validate alignment with mission-specific constraints. 

  • Create custom benchmark datasets, challenge sets, and adversarial evaluation strategies tailored to client domains and regulatory requirements. 

  • Conduct in-depth error analysis, model behavior studies, and sensitivity assessments to inform iterative improvements in prompts, retrieval systems, models, and orchestration frameworks. 

  • Partner with AI Product Engineers, LLMOps Engineers, and Data Scientists to drive model improvements through structured experimentation, A/B testing, and scientifically grounded evaluation cycles. 

  • Advise teams on measurement methodologies, statistical significance, and best practices for Trustworthy AI evaluation in alignment with NIST AI RMF, MLSecOps, and agency governance requirements. 

  • Document evaluation results, risks, and findings for technical and non-technical audiences, including engineering teams, leadership, and government clients. 

  • Contribute to the development of standardized tools, reusable templates, and evaluation components to improve repeatability and quality across engagements. 

  • Stay informed of advances in LLM assessment, safety science, red-teaming methodologies, and evaluation frameworks emerging from academia and industry. 

  • Mentor junior evaluation staff and help grow Steampunk’s AI measurement and evaluation capabilities. 

  • You will contribute to the growth of our AI & Data Exploitation Practice! 

Qualifications

  • Ability to hold a position of public trust with the U.S. government. 

  • Bachelor’s degree and 8 years of experience.

  • 5+ years of experience evaluating machine learning, NLP, or generative AI systems, with strong familiarity with LLMs and retrieval-based architectures. 

  • Deep understanding of evaluation metrics, statistical testing, dataset construction, experimental design, and model validation methodologies. 

  • Hands-on experience with Python and libraries such as PyTorch, Hugging Face, LangChain, scikit-learn, and evaluation tooling (LLM-as-a-judge, rubric-based evaluators, or custom harnesses). 

  • Proficiency in AI evaluation frameworks such as Ragas.

  • Demonstrated experience designing automated evaluation pipelines and integrating them into CI/CD or LLMOps workflows. 

  • Strong understanding of AI governance, responsible AI principles, bias detection, fairness metrics, and risk identification. 

  • Experience working with structured and unstructured datasets across multiple modalities (text, tabular, documents). 

  • Familiarity with vector databases, RAG architectures, and multi-step LLM workflows. 

  • Familiarity with the OWASP LLM Top 10 Risks.

  • Excellent analytical, written, and verbal communication skills, with the ability to translate evaluation insights into clear technical recommendations. 

  • Proven ability to collaborate with cross-functional engineering and product teams while independently driving evaluation strategy. 

  • Experience working in agile or iterative development environments and documenting scientific processes clearly. 

About Steampunk

Steampunk relies on several factors to determine salary, including but not limited to geographic location, contractual requirements, education, knowledge, skills, competencies, and experience. The projected compensation range for this position is $135,000 to $170,000.  The estimate displayed represents a typical annual salary range for this position. Annual salary is just one aspect of Steampunk’s total compensation package for employees. Learn more about additional Steampunk benefits here. 

 

Identity Statement

As part of the application process, you are expected to be on camera during interviews and assessments. We reserve the right to take your picture to verify your identity and prevent fraud.

 

Steampunk is a Change Agent in the Federal contracting industry, bringing new thinking to clients in the Homeland, Federal Civilian, Health, and DoD sectors. Through our Human-Centered delivery methodology, we are fundamentally changing the expectations our Federal clients have for true shared accountability in solving their toughest mission challenges. As an employee-owned company, we focus on investing in our employees to enable them to do the greatest work of their careers, and on rewarding them for outstanding contributions to our growth. If you want to learn more about our story, visit http://www.steampunk.com.

 

 

Other facts

Tech stack
AI Evaluation, Machine Learning, NLP, Generative AI, Statistical Testing, Dataset Construction, Experimental Design, Model Validation, Python, PyTorch, Hugging Face, LangChain, Scikit-learn, AI Governance, Bias Detection, Fairness Metrics

About Steampunk

When Steampunk Digital Consulting was established, we committed ourselves to the concept of providing our clients with Unfair Ideas, and we’ve been re-establishing ourselves every day since to deliver the Unfair Ideas that give our clients an unfair advantage. Because we believe that’s what this business requires: a fresh perspective, a new approach, and an interdependent group of people dedicated to taking on whatever new challenge or opportunity the day brings. That’s why we bring a holistic approach to marketing, gathering together the best of all disciplines under one roof. We understand that today’s modern companies need a creative partner that works the way they do: fast, open, and collaboratively. Which, done correctly, is itself a little unfair.

We truly believe in the transformative power of illustration and design and their ability to simplify communications, elevate experiences, engage and inspire people everywhere. Good design and good relationships come from collaboration.

Team size: 1 employee
Industry: Advertising Services
Founding Year: 2017

What you'll do

  • The Senior AI Evaluation Scientist will design and implement evaluation frameworks for AI models, ensuring their accuracy and reliability. They will also develop automated evaluation pipelines and collaborate with cross-functional teams to drive model improvements.

This role is hosted on Steampunk's careers site.

Frequently Asked Questions

What does Steampunk pay for a Senior AI Evaluation Scientist?

Steampunk offers a competitive compensation package for the Senior AI Evaluation Scientist role. The salary range is USD 135k - 170k per year. Apply through Clera to learn more about the full compensation details.

What does a Senior AI Evaluation Scientist do at Steampunk?

As a Senior AI Evaluation Scientist at Steampunk, you will design and implement evaluation frameworks for AI models, ensuring their accuracy and reliability. You will also develop automated evaluation pipelines and collaborate with cross-functional teams to drive model improvements.

Why join Steampunk as a Senior AI Evaluation Scientist?

Steampunk is a leading Advertising Services company. The Senior AI Evaluation Scientist role offers competitive compensation.

Is the Senior AI Evaluation Scientist position at Steampunk remote?

The Senior AI Evaluation Scientist position at Steampunk is based in McLean, Virginia, United States. Contact the company through Clera for specific work arrangement details.

How do I apply for the Senior AI Evaluation Scientist position at Steampunk?

You can apply for the Senior AI Evaluation Scientist position at Steampunk through Clera. Clera's AI-powered platform will help match your profile with this opportunity and guide you through the application process. You can also learn more about Steampunk on their website.