Job Title: Robotics Data Pipeline Engineer – Multimodal Data
Department: Software
Reports To: Teleoperations Lead
Employment Type: Full-Time
Location: Houston, TX or Pensacola, FL
Who We Are
Persona AI is building humanoid robots for the most demanding environments in heavy industry — shipyards, steel mills, fabrication facilities, and offshore platforms — performing welding, grinding, maintenance, inspection, and material-handling work that is dangerous, physically demanding, and increasingly difficult to staff.
We are backed by leading strategic and financial investors and engaged with global industrial leaders across Korea, Japan, the United States, and Singapore. Korea is the center of gravity for our early commercial strategy, anchored by relationships with the world’s leading shipbuilders and steelmakers. Our work spans both the robot platform itself and the systems, partners, and playbooks required to deploy it at scale.
Why Join Persona AI?
We offer competitive compensation, a performance-based bonus, 99% employer-covered medical benefits, early-stage equity, competitive PTO, and a company-wide paid winter break between December 24th and January 2nd.
You’ll shape technology that’s redefining the possibilities of robotics and human interaction.
Work alongside passionate teammates who value creativity and continuous learning.
Enjoy full access to advanced tools,
About the Role
As a Data Pipeline Engineer, you will architect and scale the data infrastructure that feeds our foundation models. Your primary mission is to extract, augment, and align human dexterous manipulation data from massive, complex multi-sensor and egocentric video datasets. Crucially, you will build advanced post-processing algorithms to perform deep force analysis and infer hidden states from raw data. Examples include processing direct force-torque outputs to quantify grasp dynamics, estimating contact forces from visual cues, extrapolating heavily occluded hand positions, and deriving 3D geometry from 2D frames. You will use spatial, temporal, and cross-modal data augmentation to multiply the value of every minute of data our teleoperation team collects.
What You Will Be Doing
Multimodal Data Pipelines: Architect highly efficient, scalable pipelines to ingest, decode, and synchronously process thousands of hours of high-resolution egocentric video alongside rich sensor streams (IMUs, force-torque sensors, tactile pads, and joint proprioception).
Force Analysis & Hidden State Inference: Develop sophisticated post-processing algorithms to analyze force interactions and infer unobservable or missing states from raw data. This includes calibrating and cleaning direct force-aware data collections, estimating contact forces from object deformation, tracking occluded objects during complex manipulation, and applying inverse kinematics to fill in missing joint trajectories.
Kinematic Retargeting & Alignment: Develop algorithms to translate 3D human hand tracking, wrist motion, and pose estimation into the specific 6DoF/joint-space coordinates of our humanoid’s end-effectors, relying on sensor fusion to ensure absolute precision.
Advanced Data Augmentation: Implement robust data augmentation strategies (spatial transformations, temporal scaling, synthetic viewpoints, and sensor noise injection) to expand expert trajectories and improve the robustness of our learning models.
Teleoperation Synchronization: Work closely with the Hardware Teleoperation Team (UMI & Console operators) to perfectly align human-robot play-data (haptics, force profiles, video, audio, telemetry) with large-scale pre-training datasets.
What We Are Looking For
Education: B.S., M.S., or Ph.D. in Computer Science, Data Engineering, Machine Learning, Robotics, or a related field.
Programming & ML Frameworks: Deep expertise in Python and extensive experience with PyTorch, specifically in handling custom dataloaders for multimodal datasets.
Force & Time-Series Data Processing: Experience analyzing and processing complex time-series data from force-torque (F/T) sensors, load cells, or tactile arrays, ensuring pristine alignment with visual frames.
Video Processing Expertise: Mastery of video processing pipelines and libraries (OpenCV, FFmpeg, Decord), and experience managing the I/O bottlenecks of terabyte-scale video datasets.
Computer Vision / Pose Estimation: Hands-on experience with 3D hand tracking, human pose estimation (e.g., MediaPipe), and spatial geometry calculations.
Embodied AI Familiarity: Strong understanding of modern imitation learning paradigms, VLA architectures, and frameworks focused on human-to-robot transfer (e.g., EgoScale, EgoMimic, or OpenVLA).
Data Augmentation: Proven ability to implement programmatic and generative data augmentation techniques for computer vision and time-series data.
Bonus Skills
Experience with NVIDIA’s robotics software stack (Isaac, Cosmos, or components of the GR00T framework).
Familiarity with distributed data processing systems (Ray, Apache Spark) for cluster computing.
Background in generating or utilizing synthetic robotic data via simulation (Omniverse, MuJoCo).
Experience integrating spatial awareness or tactile data representations (e.g., Fourier encoding) into visual pipelines.
Persona AI is an Equal Opportunity Employer.
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, age, disability, veteran status, or any other characteristic protected by applicable federal, state, or local law.