We are looking for a Solutions Engineer with 3–7 years of experience to own end-to-end data curation and delivery for Protege's fast-growing media vertical. You'll be a technically sharp, customer-comfortable engineer who thrives in ambiguous environments, working directly with unstructured multimodal data (audio, video, speech) to get high-quality datasets into the hands of the world's leading AI labs. This is a forward-deployed, high-impact role at a Series A company already partnered with most of the Mag 7.
Curate and deliver media datasets (audio, video, speech) end-to-end — from ingesting raw partner data to QA-ing and shipping final packages to customers
Translate customer AI data requirements into concrete curation strategies, working hands-on with messy, unstructured, real-world data
Build scripts, lightweight tooling, and repeatable QA workflows to improve delivery speed and consistency
Serve as the internal catalog expert — tracking content coverage, metadata quality, and gaps relative to customer demand
Collaborate cross-functionally with Sales, Product, and Engineering to inform platform roadmap and reduce bespoke delivery work over time
Strong SQL proficiency — must be comfortable querying large, messy, real-world datasets
Hands-on experience with data pipelines, ETL/ELT, or data engineering (e.g., Snowflake, BigQuery, Databricks, Airflow, dbt)
Comfortable in customer-facing or cross-functional environments — able to communicate technical work clearly to external stakeholders
Experience working at an early-stage startup (Series A–C); backgrounds spent entirely at big-tech companies are not a fit
Comfort operating with unstructured, imperfect, evolving data — this role requires thriving in ambiguity, not waiting for clean inputs
Protege addresses data access challenges in AI, focusing on governance, intellectual property, and security implications to streamline training-data procurement. By reducing negotiation time and effort, it positions itself as a facilitator in the AI data landscape.