As an ML Engineer at Basis, you’ll own end-to-end projects that bring intelligence into production. You’ll act as the responsible party for systems that help our agents reason, plan, and evaluate themselves — meaning you’ll scope, build, and deliver from first principles. You’ll have full autonomy: plan your projects, define success, run experiments, and decide when your system is ready to ship.
You’ll move fast, instrument deeply, and design for clarity — building the scaffolding that lets models act safely and improve continuously. This is a role for engineers who want to operate like researchers and builders at once: reasoning, experimenting, and shipping systems that get smarter over time.
Design and iterate on multi-agent architectures that automate real accounting workflows.
Encode autonomy boundaries, tool usage, and fallback behaviors that make agents safe and reliable.
Manage context and memory for coherence across steps; plan and execute agent loops with measurable success criteria.
Route, evaluate, and optimize models under real-world constraints (latency, cost, accuracy).
2. Design evaluation and experimentation frameworks
Build scalable evaluation pipelines (offline + online) that run hundreds of experiments automatically.
Define golden tasks, labeling strategies, and metrics that make performance measurable and comparable.
Instrument the stack to detect regressions, track error taxonomies, and drive closed-loop improvement.
Use data and experiments to drive product and architectural decisions—not just intuition.
Architect prompt stacks and instruction hierarchies that structure model reasoning.
Build retrieval and indexing pipelines that surface relevant context efficiently.
Parse messy documents into structured representations that agents can reason about.
Design guardrails and validation layers to keep behavior safe and deterministic.
Scope your projects with clarity; write concise specs and architecture docs that eliminate ambiguity.
Build, test, and instrument your systems end-to-end.
Communicate progress clearly: what’s built, what’s learned, what’s next.
Collaborate tightly within your pod — teaching, unblocking, and sharing learnings as you go.
Basis develops AI agents for knowledge work in accounting, aiming to transform the field by 2030 with real agents doing real work in the real economy. The company has 25 employees in New York's Flatiron District, and its agents' capabilities improve monthly.