We are seeking an AI Systems & Data Engineer to join our team. We are building a fast, flexible platform on a robust, event-driven architecture. This role requires expertise in building data pipelines in the Databricks environment, specifically for ingesting unstructured data, and in leveraging that data to build AI agents.
💥 What You’ll Do
- Design and operate Databricks pipelines in Python to ingest and normalize large-scale unstructured data
- Build streaming and batch ingestion using Auto Loader, Delta Live Tables, and Workflows
- Model and maintain AI-ready lakehouse tables with Delta Lake and Unity Catalog
- Prepare retrieval and context datasets for RAG and agent systems
- Orchestrate Temporal-based workflows to coordinate data prep, validation, and AI handoff
- Enforce data quality, lineage, and access controls across pipelines
- Optimize PySpark jobs for performance, reliability, and cost
- Integrate pipeline outputs into production AI systems and APIs
- Monitor freshness, schema drift, and pipeline health
🧰 Tech Stack (So Far)
- Python (primary language for all LLM + orchestration work)
- LangChain + LangGraph + LangSmith
- Databricks + PySpark for data processing, labeling, and training-context preparation
- Gemini + model routing logic
- Postgres, plus custom orchestration via MCP
- GitHub Actions, GCP
You’ll play a crucial role in rolling out products that have immediate impact.
💻 How We Build
- Engineers come first: your time, focus, and judgment are respected
- Deep work > chaos: fixed cycles & cooldowns protect focus and keep context switching low
- Autonomy is the default: trusted builders who own outcomes, no babysitters
- Ship daily, safely: merge early, integrate vertically, ship often, use feature flags, and keep momentum
- Outcomes over optics: solve real problems, not ticket soup
- Voice matters: from week one, contribute, improve something, and shape how we build
- Senior peers, no ego: collaborate in a high-trust, async-friendly environment
- Bold problems, cool tech: work on complex challenges that actually move the needle
- Fun is part of it: we move fast, but we also celebrate wins and laugh together
✅ What We’re Looking For
- 5-7 years of experience building production-grade ML, data, or AI systems.
- Strong grasp of prompt engineering, context construction, and retrieval design.
- Comfortable working in LangChain and building agents.
- Experience with PySpark and Databricks to handle real-world data scale.
- Ability to write testable, maintainable Python with clear structure.
- Understanding of model evaluation, observability, and feedback loops.
- Excited to push from prototype → production → iteration.
- Familiarity with the Databricks Data Intelligence Platform, which unifies data warehousing and AI use cases.
- Knowledge of Unity Catalog for open and unified governance of data, analytics, and AI on the lakehouse.
- Understanding of data security concerns related to AI and how to mitigate them using the Databricks AI Security Framework (DASF).
- Confident English skills to collaborate clearly and effectively with teammates.
🔥 Bonus If You:
- Have built scalable agent-like workflows on the Databricks platform.
- Have worked on semantic chunking, vector search, or hybrid retrieval strategies.
- Can walk us through a real-world prompt failure and how you fixed it.
- Have contributed to OSS tools or internal AI platforms.
- Think of yourself as both an engineer and a systems designer.
- Are familiar with data lakehouse architecture.
📍 Location & Compensation
- Must be based in San Francisco, Las Vegas, or Tel Aviv
- Full-time role with competitive comp
- Flexible hours, async-friendly culture, engineering-led environment