Data Engineer & Analytics Engineer experienced in architecting end-to-end cloud data platforms across Azure, AWS, and Databricks.
I specialize in ETL/ELT pipeline automation, PySpark optimization, and data warehouse and lakehouse architecture using ADF, Snowflake, AWS Glue, dbt, Airflow, and CI/CD.
I turn raw, unstructured data into governed, analytics-ready assets for enterprise BI and data science.
As a Big Data Engineer, I designed and deployed 15+ end-to-end ETL pipelines in Azure Data Factory and Databricks, processing millions of records daily for beverage and e-commerce clients. I optimized Spark cluster resource allocation, achieving a 35% cost reduction, and led the migration of 200+ Hive Metastore tables to Databricks Unity Catalog.
I architected scalable AWS Glue data platforms that consolidated data from SAP systems and on-prem sources into Amazon Redshift, improving reporting turnaround by 40%. I also automated the conversion of 500+ Snowflake SQL scripts to Databricks, creating reusable migration frameworks that reduced future migration effort by over 40%. My work focused on automation-first approaches, CI/CD pipeline implementation, and data governance frameworks across multi-cloud environments.
Explore my work across different domains
Containerized data pipeline integrating the PeeringDB API with Airflow, PostgreSQL, dbt, and Power BI, revealing insights into 90% network underutilization.
Streaming data pipeline processing continuous e-commerce orders through Apache Beam with 1-minute window aggregations for revenue metrics.
Cloud-native ELT pipeline automating extraction and transformation with Airbyte, BigQuery, dbt, and Dagster, achieving a 95% reduction in manual handling.
Interactive dashboard analyzing nine years of World Happiness data with KPI cards, correlation visuals, and cross-country GDP/health insights.
Analytical dashboard for claims utilization and high-cost patient profiling with PMPM analysis and cost deviation metrics for financial transparency.
Core tools and platforms I use to build production data systems.