Full-time
Posted 4d ago
~40 hrs/week
Responsibilities
Architect and scale data pipelines to transform raw clinical data into de-identified, research-ready products. Manage hybrid data landscapes using BigQuery and Snowflake while ensuring HIPAA compliance and data integrity.
Requirements
Requires 3-5+ years of data engineering experience with expertise in Python, SQL, GCP, and dbt. Familiarity with healthcare data standards like ICD-10 or FHIR and experience with containerized orchestration via Argo is highly desired.
Full job description
The Mission
At OMNY Health, we are bridging the gap between clinical complexity and life-saving research. We are looking for a Data Engineer II who is passionate about the intersection of healthcare and technology. Your primary mission will be to architect and scale the pipelines that transform raw, messy clinical data into a high-fidelity, de-identified, research-ready data product. You will be a key player in managing our hybrid data landscape, extracting insights from our data sources, and curating them within our BigQuery and Snowflake environments. This role is about ensuring privacy at scale while maintaining the scientific utility of the data powering the next generation of medical breakthroughs.
Key Responsibilities
Pipeline Development: Design, build, and maintain robust ETL/ELT pipelines to ingest structured and unstructured healthcare data into our BigQuery and Snowflake warehouses.
Modern Transformations: Lead the development of modular, high-performance transformations using stored procedures and dbt (data build tool) to map raw clinical data to standardized research schemas in our Common Data Model (CDM).
Cloud-Native Orchestration: Deploy and manage complex workflows using Argo, ensuring high availability and fault tolerance within our GCP ecosystem.
Automated Data Quality: Implement "trust-but-verify" frameworks using SODA to monitor clinical data integrity, ensuring every record in our research product is validated and compliant.
De-identification & Privacy: Implement and automate sophisticated de-identification protocols (Safe Harbor or Expert Determination methods) to ensure HIPAA compliance while preserving data longitudinality.
Data Modeling: Architect scalable data models, such as our Common Data Model, that allow researchers to query complex patient journeys with ease.
Infrastructure: Collaborate with DevOps to manage cloud-native data infrastructure, ensuring high availability and rigorous security controls.
Technical Qualifications
Experience: 3-5+ years in Data Engineering, with a focus on building production-grade healthcare pipelines.
GCP & Storage: Hands-on experience with Google Cloud Platform, specifically CloudSQL and BigQuery.
Warehousing: Deep expertise in BigQuery and Snowflake architectures, including performance tuning and secure data sharing.
Code & Orchestration: Expert-level Python and SQL.
Proven experience with Argo Workflows/Events for containerized orchestration.
Mastery of dbt for maintaining the transformation layer.
Quality Assurance: Experience using SODA (or Great Expectations) to define and enforce data contracts.
Healthcare Domain: Familiarity with healthcare-specific data challenges (ICD-10, FHIR, or provider-specific MS SQL schemas) is a significant plus.
Security Mindset: Understanding of HIPAA regulations and encryption standards.
Core Competencies
The "Curator" Mindset: You don't just move data; you care about its meaning. You understand that a "null" in a lab result is a clinical signal, not just a missing string.
Adaptability: You thrive in the "zero-to-one" phase where documentation might be thin, but the impact is massive.
Collaborative Spirit: You can speak "Data" to engineers and "Insight" to clinical researchers.
Impact: Your work directly enables researchers to find cures and improve patient outcomes.
Innovation: We are tackling the hardest problem in health tech: making data usable without sacrificing privacy.
Growth: As an early hire, you will have a front-row seat (and a steering wheel) in building our engineering culture.
Note on De-identification: Because this role focuses on a research-ready product, the candidate should be familiar with the trade-offs between data utility and privacy, specifically how to handle dates, zip codes, and unique identifiers in a way that satisfies both statisticians and compliance officers.
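The utility/privacy trade-off described in the note above can be sketched in Python. This is a minimal illustration under stated assumptions, not OMNY Health's pipeline: the `PatientRecord` shape, the function names, the secret key, and the restricted ZIP3 set are all hypothetical. `safe_harbor_view` applies the HIPAA Safe Harbor rules for dates (year only), ages over 89 (collapsed to "90+"), and ZIP codes (first three digits, or "000" for low-population prefixes), and drops the record number. `shifted_date` shows a common utility-preserving alternative, a consistent per-patient date shift, which keeps the intervals between a patient's encounters but retains full dates, so it would have to be justified under Expert Determination rather than Safe Harbor.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import date, timedelta


# Hypothetical record shape for illustration only; not an actual OMNY schema.
@dataclass
class PatientRecord:
    mrn: str          # direct identifier (medical record number)
    birth_date: date
    visit_date: date
    zip_code: str     # 5-digit US ZIP code


def generalize_zip(zip_code: str, restricted_prefixes: set[str]) -> str:
    """Keep the first three ZIP digits unless that prefix is restricted.

    Under Safe Harbor, ZIP3 areas with 20,000 or fewer people must become
    "000". The caller supplies that set (assumed to come from Census data).
    """
    prefix = zip_code[:3]
    return "000" if prefix in restricted_prefixes else prefix


def age_in_years(birth: date, on: date) -> int:
    """Whole years between birth and the given date."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


def safe_harbor_view(rec: PatientRecord, restricted_prefixes: set[str]) -> dict:
    """Safe Harbor style view: years only, ages over 89 collapsed, ZIP3, no MRN."""
    age = age_in_years(rec.birth_date, rec.visit_date)
    over_89 = age > 89
    return {
        # A birth year would reveal an age over 89, so it is suppressed too.
        "birth_year": None if over_89 else rec.birth_date.year,
        "visit_year": rec.visit_date.year,
        "age": "90+" if over_89 else age,
        "zip3": generalize_zip(rec.zip_code, restricted_prefixes),
    }


def shifted_date(d: date, patient_key: str, secret: bytes, max_days: int = 365) -> date:
    """Shift all of one patient's dates by the same keyed pseudo-random offset.

    The offset lies in [-max_days, max_days] and depends only on the patient
    key, so intervals between that patient's events are preserved. Full dates
    remain, so this alone is NOT Safe Harbor compliant.
    """
    digest = hmac.new(secret, patient_key.encode(), hashlib.sha256).digest()
    offset = int.from_bytes(digest[:8], "big") % (2 * max_days + 1) - max_days
    return d + timedelta(days=offset)


if __name__ == "__main__":
    restricted = {"036"}  # illustrative only; derive the real list from Census data
    rec = PatientRecord("MRN-001", date(1930, 6, 1), date(2024, 3, 15), "03601")
    print(safe_harbor_view(rec, restricted))

    key = b"demo-secret"  # hypothetical key; keep real keys in a secrets manager
    admit = shifted_date(date(2024, 3, 15), rec.mrn, key)
    discharge = shifted_date(date(2024, 3, 20), rec.mrn, key)
    print((discharge - admit).days)  # length of stay survives the shift
```

The design choice is the point of the note: the Safe Harbor view is simple to defend to a compliance officer but erases the day-level timing a statistician needs for patient journeys, while the keyed shift keeps that timing at the cost of requiring an expert's risk assessment.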
Related keywords
GCP, BigQuery, Snowflake, dbt, Argo, SODA, Python, SQL, CloudSQL, HIPAA, ICD-10, FHIR, ETL, ELT, Common Data Model, De-identification
OMNY Health™ connects patients, providers, and life sciences companies through data and insights to transform healthcare
Industry
Biotechnology Research
Company size
11-50 employees
Founded
2017
Headquarters
Atlanta, Georgia
LinkedIn followers
3,801
OMNY Health™ is a national data ecosystem connecting the world of healthcare to fuel partnerships that improve clinical outcomes and drive patient care. OMNY’s dynamic partnerships with specialty health networks, healthcare systems, academic medical centers, and integrated delivery networks span all fifty states and cover over 100 million patient lives. OMNY Health’s data ecosystem now reflects more than 8 years of historical data, encompassing more than 6.5 billion clinical notes from 770,000+ providers across 200+ specialties, and it is still growing. To learn more about how OMNY Health transforms lives and drives patient care by connecting providers and life sciences companies through data, visit www.omnyhealth.com or contact [email protected].
Offices: 75 5th St NW, Suite 2090, Atlanta, Georgia 30308, US