Hi, I'm Don Richardson Bayya

Data Engineer & Analytics Professional

I'm a Data Engineer and Analytics Engineer who architects end-to-end cloud data platforms across Azure, AWS, and Databricks.
I specialize in ETL/ELT pipeline automation, PySpark optimization, and data warehouse and lakehouse architecture using ADF, Snowflake, AWS Glue, dbt, Airflow, and CI/CD.
I turn raw, unstructured data into governed, analytics-ready assets for enterprise BI and data science.

  • Azure
  • AWS
  • Databricks
  • Snowflake
  • dbt
  • Airflow

About Me

Professional Journey

Celebal Technologies (2021-2024)

As a Big Data Engineer, I designed and deployed 15+ end-to-end ETL pipelines in Azure Data Factory and Databricks, processing millions of records daily for beverage and e-commerce clients. I optimized Spark cluster resource allocation, achieving a 35% cost reduction, and led the migration of 200+ Hive Metastore tables to Databricks Unity Catalog.

I architected scalable AWS Glue data platforms that consolidated data from SAP systems and on-prem sources into Amazon Redshift, improving reporting turnaround by 40%. I also automated the conversion of 500+ Snowflake SQL scripts to Databricks and built reusable migration frameworks that reduced future migration effort by over 40%. Throughout, I took an automation-first approach, implementing CI/CD pipelines and establishing data governance frameworks across multi-cloud environments.

3+ Years Experience

10+ Projects Completed

15+ Technologies

My Projects

Explore my work across different domains

PeeringDB Analytics Pipeline

Containerized data pipeline integrating the PeeringDB API with Airflow, PostgreSQL, dbt, and Power BI, revealing insights into 90% network underutilization.

Airflow, dbt, PostgreSQL, Power BI, Docker, Python

Real-Time Order Analytics Pipeline

Streaming data pipeline that processes a continuous flow of e-commerce orders with Apache Beam, aggregating revenue metrics over 1-minute windows.
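As a rough illustration of what a 1-minute window aggregation does, the sketch below buckets orders into fixed 60-second windows and sums revenue per window. It is plain Python rather than Apache Beam, so it only mirrors the idea. The field names `event_time` and `amount`, the function names, and the sample orders are assumptions made for this example, not the project's actual schema or code.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 60  # fixed (tumbling) 1-minute windows


def window_start(ts: datetime) -> datetime:
    """Round a timezone-aware timestamp down to the start of its window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def revenue_per_window(orders):
    """Sum order amounts per 1-minute window, keyed by window start time."""
    totals = defaultdict(float)
    for order in orders:
        totals[window_start(order["event_time"])] += order["amount"]
    return dict(sorted(totals.items()))


# Hypothetical orders: two fall in the 12:00 window, one in the 12:01 window.
sample_orders = [
    {"event_time": datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc), "amount": 10.0},
    {"event_time": datetime(2024, 1, 1, 12, 0, 50, tzinfo=timezone.utc), "amount": 5.5},
    {"event_time": datetime(2024, 1, 1, 12, 1, 10, tzinfo=timezone.utc), "amount": 20.0},
]
totals = revenue_per_window(sample_orders)
```

In Apache Beam, this kind of grouping is typically expressed by applying `beam.WindowInto` with `FixedWindows(60)` ahead of a sum combiner, which also lets the runner handle late and out-of-order events that this sketch ignores.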

Apache Beam, PostgreSQL, Python, Docker

Modern ELT Stack

Cloud-native ELT pipeline automating extraction and transformation with Airbyte, BigQuery, dbt, and Dagster, achieving a 95% reduction in manual handling.

Airbyte, Dagster, dbt, BigQuery, Python, SQL

Global Happiness Power BI Dashboard

Interactive dashboard analyzing nine years of World Happiness data with KPI cards, correlation visuals, and cross-country GDP/health insights.

Power BI, MySQL, DAX, Power Query, SQL

Healthcare Claims Analytics Dashboard

Analytical dashboard for claims utilization and high-cost patient profiling, with per-member-per-month (PMPM) analysis and cost deviation metrics for financial transparency.

Power BI, MySQL, DAX, Power Query, SQL
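For readers unfamiliar with PMPM, the arithmetic is total paid claims divided by total member months. The sketch below shows that calculation in Python. The dashboard itself is built in Power BI with DAX, so this is not its actual logic. The function names, the sample figures, and the definition of cost deviation as percent difference from a benchmark PMPM are assumptions for illustration.

```python
def member_months(months_enrolled_by_member: dict) -> int:
    """Total member months: the sum of months each member was enrolled."""
    return sum(months_enrolled_by_member.values())


def pmpm(total_paid: float, total_member_months: int) -> float:
    """Per-member-per-month cost: total paid claims / total member months."""
    if total_member_months <= 0:
        raise ValueError("member months must be positive")
    return total_paid / total_member_months


def cost_deviation_pct(actual_pmpm: float, benchmark_pmpm: float) -> float:
    """Assumed definition: percent difference of actual PMPM from a benchmark."""
    return (actual_pmpm - benchmark_pmpm) * 100.0 / benchmark_pmpm


# Hypothetical plan: two members enrolled all year, one for six months.
enrollment = {"A": 12, "B": 12, "C": 6}
months = member_months(enrollment)                 # 30 member months
actual = pmpm(9000.0, months)                      # 300.0 per member per month
deviation = cost_deviation_pct(actual, 250.0)      # 20.0 percent above benchmark
```

Using member months rather than a simple member count keeps the metric comparable across groups whose members enroll or leave partway through the period.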

My Technical Skills

Core tools and platforms I use to build production data systems.

Programming Languages
Python, SQL, PySpark, DAX
Data Engineering & Orchestration
Databricks, Apache Spark, Apache Beam, Azure Data Factory, AWS Glue, Airflow, Dagster, dbt, Airbyte
Cloud & Infrastructure
Azure, AWS, GCP, Hadoop, Hive, Snowflake, CI/CD, IaC
Databases & Warehouses
PostgreSQL, MySQL, MongoDB, Snowflake, Redshift, BigQuery
Data Modeling & Architecture
Dimensional Modeling, Star Schema, Data Warehouse, Lakehouse Design, ETL/ELT Pipelines
Analytics & Visualization
Power BI, Tableau, Grafana
DevOps & Containers
Docker, Docker Compose, Azure DevOps, Git, GitHub Actions
Machine Learning & Libraries
scikit-learn, TensorFlow, XGBoost, Keras, NLP
Tools & Governance
REST APIs, Data Quality, Testing, Monitoring, Orchestration, Data Governance

Contact Me

Let's connect and discuss opportunities

Email

Phone

LinkedIn

GitHub
