Akshansh .

Senior Data Engineer | Scalable Data Infrastructure | Streaming Pipelines | ML-Driven Systems | AWS, Kafka, Data Lake, Python, SQL

United States
500+ connections
Updated 7 months ago

11+ Years Experience · 7 Roles · 56 Skills · 2 Education

About

Lead Data Engineer | Scalable ETL Pipelines | Real-Time Streaming | ML-Driven Systems | LLM Ops | AWS, Kafka, Python, SQL

Results-driven Lead Data and AI Platform engineer with expertise in scalable data architectures, real-time ML pipelines, and processing structured and unstructured data (geospatial, image, video). Proven ability to collaborate cross-functionally, leveraging expertise in Kafka, Spark, and cloud technologies to enhance data-driven decision-making. Eager to engage in cutting-edge projects that foster innovation and excellence in technology.

Key Expertise & Achievements

Scalable ETL & Data Transformation

Built high-throughput ETL pipelines using AWS Glue, PySpark, and Airflow, reducing batch processing times by 40%. Designed metadata-driven ingestion frameworks to handle structured, semi-structured, and unstructured data at scale. Automated data validation, deduplication, and schema evolution, ensuring data quality and accuracy.

Real-Time Streaming & Analytics

Developed low-latency streaming pipelines using Apache Flink, Kafka Streams, and Kinesis, reducing event processing lag by 50%. Implemented real-time aggregations and transformations using KSQL and Flink SQL, enabling sub-second insights. Optimized Kafka partitioning, compression, and retention strategies, reducing storage costs by 25%.

Data Lake Architecture & Governance

Built multi-layered data lakes using Delta Lake, Iceberg, and Hudi, ensuring efficient storage and ACID compliance. Implemented data lineage tracking with AWS Glue Catalog and Apache Atlas for better discoverability and governance. Developed fine-grained access control policies using AWS Lake Formation and IAM, ensuring data security and compliance.

Machine Learning Data Pipelines

Designed feature engineering pipelines for predictive analytics, improving forecasting model accuracy by 20%. Integrated ML workflows with Spark ML and SageMaker, optimizing model training and inference at scale. Developed recommendation systems using graph-based ML, increasing user engagement.

Performance Optimization & Cost Reduction

Reduced ETL processing time by 30% with distributed computing and optimized SQL transformations. Implemented data partitioning, clustering, and indexing, cutting query execution times by 50%. Reduced compute and storage costs by 20% through efficient resource allocation and query tuning.

I am passionate about building future-ready data ecosystems, enabling real-time insights, machine learning, and high-scale analytics to drive business impact.

Experience (7 roles)

Appen · Full-time

Senior Data Engineer

Current

May 2022 - Present · 3 yrs 4 mos · San Francisco Bay Area · Remote

Led the design and implementation of the Unified PubSub Client (PSC) to optimize data pipeline efficiency. The PubSub systems (e.g., Kafka) improved scalability, reliability, and developer velocity, reducing dependencies between client applications and PubSub services. Built a robust and secure paym...

2 yrs

3 roles · Jun 2020 - May 2022

Data Platform Engineer

Jan 2021 - May 2022 · 1 yr 5 mos

Built and scaled real-time data aggregation pipelines using Apache Flink and Apache Kafka, efficiently supporting a 10x growth in data volume while ensuring low-latency and high-throughput data processing. Designed and implemented machine learning forecasting models using Python, enabling accurate p...

Data Engineer

Aug 2020 - Jan 2021 · 6 mos

Designed and implemented a real-time aggregation framework for datasets, including user levels, organizational hierarchies, roles, channels, and memberships, improving data processing efficiency and scalability. Enhanced intermediate data processing by leveraging Kafka tools like KSQL and KStreams, ...

Data Engineering Intern

Jun 2020 - Aug 2020 · 3 mos

Developed a forecasting framework to predict key organizational metrics, such as message volume and active users, for future time periods (months or years). Built models to forecast executive metrics and enable strategic planning for organizational performance. Designed a classification system to ca...

USC Viterbi School of Engineering

Graduate Research Assistant

Feb 2019 - Jun 2020 · 1 yr 5 mos · Los Angeles, California, United States

Specialized in spatial-visual indexing of multimedia data, developing efficient geo-spatial data structures for faster and scalable search functionality. Designed and implemented a geo-spatial indexing system using R-trees* to optimize multimedia data storage and retrieval. Extracted visual and spat...

Education (2)

University of Southern California

2019 - 2020

Skills: Databases · Apache Spark Streaming

Maharaja Agrasen Institute Of Technology, Delhi

2012 - 2016

Skills (56)

Backend: C++

Data Engineering: dbt

DevOps: Terraform · Docker

MLOps: Airflow

Other: REST APIs · Computer Vision · AWS SageMaker · Relational Databases · Project Management · Data Visualization · Amazon Aurora · Data Science · Customer Relationship Management (CRM) · MLOps · SharePoint · Docker · Django · Apache Kafka · Python · AWS · SageMaker · Apache Flink · Apache Spark · Kafka · Elasticsearch · Terraform · Docker · Java · Airflow · MySQL · dbt · Delta Lake · Snowflake · Cloud · S3 · Lambda · ECS · Amazon Athena · Heroku · YOLO · Indexing · Kubernetes · PostgreSQL · Apache Airflow · Apache Spark · AWS · Kafka · Python · MySQL · Cloud · Apache Flink · Delta Lake · Apache Atlas · IAM · SageMaker

Certifications (8)

Generative AI Application Development

Databricks

R Programming

Coursera Course Certificates

Programming for Everybody (Python)

Coursera

Introduction to R

DataCamp

Kaggle R Tutorial on Machine Learning

DataCamp

Coursera Mentor Community and Training Course

Coursera Course Certificates

ML Operations

Databricks

Data Analysis and Statistical Inference

DataCamp

Publications (6)

Spatial Aggregation of Visual Features for Big Image Data Search

IEEE BigMM 2019 · Jan 1, 2019

Boomerang: Rebounding the Consequences of Reputation Feedback on Crowdsourcing Platforms

ACM UIST 2016

Classification and Fraud Detection in Finance Industry

IJCA (International Journal of Computer Applications) · Oct 18, 2017

Crowd Guilds: Worker-led Reputation and Feedback on Crowdsourcing Platforms

CSCW · Feb 25, 2017

Investigating the "Wisdom of Crowds" at Scale

ACM UIST · Nov 1, 2015

The Daemo Crowdsourcing Marketplace

CSCW 2017

Languages (2)

Hindi (Full professional proficiency) · English (Full professional proficiency)

Volunteer Experience (3)

Student Mentor

Current

USC Viterbi School of Engineering

Jun 2019 - Present · 6 yrs 3 mos

Mentor

Coursera

Jun 2016 - Jul 2018 · 2 yrs 2 mos

Volunteer

Child Rights and You

Jun 2015 - Jun 2018 · 3 yrs 1 mo

Frequently Asked Questions

What is Akshansh's current role?
Akshansh currently works as a Senior Data Engineer at Appen (full-time).
Where did Akshansh study?
Akshansh studied for a Master's degree in Computer Science at the University of Southern California. The profile lists 2 education entries.
What skills does Akshansh have?
Akshansh's top skills include REST APIs, Computer Vision, dbt, Terraform, and AWS SageMaker. The profile lists 56 skills.
Where is Akshansh based?
Akshansh is based in the United States.

This profile is based on publicly available information. Akshansh is not affiliated with or endorsed by Clera.