
Hi, my name is

Gnana Aravindh Elavazhagan.

I build large-scale data platforms.

I'm a Senior Data Engineer with 8+ years of experience designing and building scalable, reliable data infrastructure. I specialise in batch and streaming systems that power analytics and machine learning at scale.

About Me

Hello! I'm Gnana, a Senior Data Engineer based in Chennai, India. I enjoy creating things that live in the cloud and process massive amounts of data. My interest in data engineering started back in 2017 when I realised the power of turning raw data into actionable insights.

Fast-forward to today, and I've had the privilege of working on complex data platforms at companies like PayPal, Ford Motors, and Tredence Analytics — processing terabytes of data daily. My main focus is building reliable, scalable data infrastructure and mentoring engineers on best practices.

Here are a few technologies I've been working with recently:

  • Apache Spark
  • SQL / BigQuery
  • Apache Kafka
  • GCP (Dataflow, GCS)
  • Apache Airflow
  • Python
  • Snowflake
  • Apache Beam

Skills & Technologies

Languages

  • Python
  • SQL
  • Shell / Bash

Big Data

  • Apache Spark
  • Apache Kafka
  • Hadoop
  • Hive
  • Apache Beam

Cloud — GCP

  • BigQuery
  • Dataflow
  • Cloud Storage (GCS)
  • Cloud Functions
  • Pub/Sub
  • Cloud Monitoring

Databases

  • PostgreSQL
  • Snowflake
  • Oracle
  • SQL Server

Orchestration & DevOps

  • Apache Airflow
  • Terraform
  • Docker
  • Kubernetes
  • Git
  • CI/CD
  • Tekton

Other Tools

  • Great Expectations
  • Datadog
  • Informatica PowerCenter

Where I've Worked

Senior Data Engineer @ Tredence Analytics

December 2024 – Present

  • Architected and scaled GCP-based data platforms using BigQuery, building 100+ pipelines for data ingestion.
  • Built high-performance batch and real-time pipelines with Dataflow, integrating Oracle, Snowflake, Kafka, and GCS — processing millions of events per day.
  • Orchestrated 200+ production workflows using Airflow, achieving 99.9% pipeline reliability with robust monitoring and alerting.
  • Led technical delivery enabling 30–40% faster reporting, improved data quality, and business-ready insights for leadership and operations teams.
  • Mentored 5+ junior engineers on data engineering best practices and code review standards.
  • Drove data governance, performance optimisation, and stakeholder alignment to enable real-time, insight-driven retail decisions.

Staff Data Engineer @ Altimetrik – PayPal

June 2024 – December 2024

  • Spearheaded end-to-end data engineering for 3 key PayPal/Venmo products (Add Funds, Cash Deposits, Plus Customers).
  • Designed scalable data pipelines enabling KPI tracking that contributed to a 20% increase in transaction and deposit volumes.
  • Built and maintained 50+ ETL pipelines supporting analytics teams and business intelligence reporting.
  • Developed and maintained experimentation frameworks for A/B testing — enabling real-time data collection and analysis for UX experiments.
  • Engineered a robust experimentation data infrastructure, driving a 15% increase in conversion rates through data-backed UX decisions.
  • Optimised BigQuery SQL queries and warehouse performance, reducing report generation time by 60%.
  • Implemented CI/CD pipelines using DataLM and Git for reliable, scalable, production-ready data deployments.
  • Collaborated with cross-functional teams to define data requirements and implement solutions.
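The heart of an A/B experimentation framework like the one above is deterministic variant assignment. As a minimal sketch (the function and parameter names here are illustrative, not the actual PayPal implementation), a user can be hashed into a bucket so the same user always sees the same variant of a given experiment:

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants: tuple = ("control", "treatment")) -> str:
    """Deterministically assign a user to an experiment variant.

    Hashing (experiment, user_id) together means assignment is stable
    across sessions and independent across experiments, so exposure
    logs stay consistent without any lookup state.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]
```

Because assignment is a pure function of the inputs, downstream pipelines can re-derive the variant for any event without joining against an assignment table.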

Data Engineer @ Ford Motors

June 2021 – June 2024

  • Led enterprise-scale migration of historical data from Hive to GCP BigQuery using Hadoop DistCp and GCS connectors.
  • Built scalable real-time and batch ingestion pipelines using Apache Kafka, Dataflow (Apache Beam), and BigQuery for advanced analytics.
  • Developed streaming Dataflow jobs for cleansing, transformation, and real-time enrichment before warehouse storage.
  • Optimised BigQuery performance using partitioning, clustering, ARRAY<STRUCT> modelling, and fixed slot reservations to reduce cost.
  • Implemented deduplication using BigQuery scheduled queries and key-based logic to improve data accuracy.
  • Led data governance initiatives ensuring GDPR, HIPAA, and CCPA compliance — implementing PII tagging, views, and access control policies.
  • Automated infrastructure using Terraform, Git version control, and Tekton-based CI/CD pipelines for scalable GCP deployments.
  • Established monitoring and alerting using Cloud Monitoring & Logging for reliable real-time data processing.
  • Developed an LLM-powered chatbot using LangChain, Python, and Flask to enhance customer query automation and support operations.
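The key-based deduplication mentioned above ran as BigQuery scheduled queries; the same idea in plain Python (a sketch with illustrative field names, not the production SQL) keeps only the most recent record per key:

```python
from typing import Iterable, List

def deduplicate(records: Iterable[dict],
                key: str = "event_id",
                ts: str = "event_ts") -> List[dict]:
    """Keep the most recent record per key.

    Mirrors a key-based dedup pass a scheduled warehouse query would
    apply: for each event_id, retain the row with the latest timestamp.
    """
    latest = {}
    for rec in records:
        k = rec[key]
        if k not in latest or rec[ts] > latest[k][ts]:
            latest[k] = rec
    return list(latest.values())
```

In the warehouse itself, the equivalent logic is typically a `ROW_NUMBER() OVER (PARTITION BY key ORDER BY ts DESC)` filter or a `MERGE` into the target table.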

ETL Developer @ Tata Consultancy Services

October 2017 – June 2021

  • Designed and developed 50+ ETL mappings and workflows in Informatica PowerCenter to extract, transform, and load data from Oracle, SQL Server, and flat files into the enterprise data warehouse.
  • Implemented Slowly Changing Dimensions (SCD Type 1 & 2) for tracking historical data changes across customer and product dimensions.
  • Built complex PL/SQL stored procedures and shell scripts for data transformation, cleansing, and reconciliation.
  • Optimised Oracle SQL queries and ETL workflows, improving overall pipeline performance by 35% and reducing batch processing windows.
  • Migrated legacy batch jobs to Informatica PowerCenter, eliminating manual processing steps and reducing errors by 70%.
  • Worked with Hadoop ecosystem (HDFS, Hive) for batch analytics and large-scale data processing.
  • Provided production support including on-call coverage, incident root cause analysis, and SLA adherence for critical data pipelines.
  • Collaborated with business analysts and DBAs to translate data requirements into technical specifications and physical data models.
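The SCD Type 2 pattern above tracks history by closing out the current dimension row and appending a new one whenever a tracked attribute changes. A minimal sketch of that logic (column and key names are illustrative, not the actual warehouse schema):

```python
from datetime import date
from typing import List, Optional

def apply_scd2(dimension: List[dict], incoming: dict,
               business_key: str = "customer_id",
               tracked: tuple = ("address",),
               today: Optional[date] = None) -> List[dict]:
    """Apply one SCD Type 2 change to a dimension.

    If a tracked attribute changed for the business key, the current row
    is expired (is_current=False, end_date set) and a new current row is
    appended; brand-new keys simply get a first current row.
    """
    today = today or date.today()
    out, changed = [], False
    for row in dimension:
        if (row[business_key] == incoming[business_key] and row["is_current"]
                and any(row[c] != incoming.get(c) for c in tracked)):
            row = {**row, "is_current": False, "end_date": today}  # expire old version
            changed = True
        out.append(row)
    key_exists = any(r[business_key] == incoming[business_key] for r in dimension)
    if changed or not key_exists:
        out.append({**incoming, "start_date": today, "end_date": None,
                    "is_current": True})
    return out
```

In Informatica PowerCenter the same effect is achieved with a lookup on the business key plus an update strategy transformation routing rows to insert or update.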

Some Things I've Built

Featured Project

Real-Time Streaming Data Platform

An end-to-end streaming pipeline ingesting events from Kafka, transforming them via Cloud Dataflow (Apache Beam), and landing into BigQuery with exactly-once semantics. Handles 100K+ events/sec with autoscaling and sub-second latency.

  • Apache Kafka
  • Cloud Dataflow
  • Streaming
  • BigQuery
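Exactly-once semantics in a pipeline like this rest on idempotent writes: if Kafka redelivers an event, the sink must not produce a second row. In the real pipeline that guarantee comes from Dataflow together with BigQuery's write path; the toy sketch below (class and field names are illustrative) shows only the idempotency idea in isolation:

```python
class IdempotentSink:
    """Toy sink illustrating the idempotent-write half of exactly-once
    delivery: a redelivered event with the same event_id is ignored,
    so retries never create duplicate rows."""

    def __init__(self):
        self.rows = {}

    def write(self, event: dict) -> bool:
        eid = event["event_id"]
        if eid in self.rows:
            return False  # duplicate delivery, safely ignored
        self.rows[eid] = event
        return True
```

Combined with at-least-once delivery from the broker, an idempotent sink yields effectively-exactly-once end-to-end processing.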

Other Noteworthy Projects

Oracle to BigQuery Batch Pipeline

Enterprise batch ingestion pipeline extracting data from Oracle, transforming via Dataflow, landing in GCS, and loading into BigQuery for analytics-ready datasets.

  • Oracle
  • Dataflow
  • GCS
  • BigQuery

Snowflake to BigQuery Data Pipeline

Scheduled data sync pipeline moving curated Snowflake datasets through Dataflow into GCS staging and BigQuery, enabling unified reporting across cloud platforms.

  • Snowflake
  • Dataflow
  • GCS
  • BigQuery

Writing & Insights

I write about data engineering, system design, lessons learned from production systems, and best practices I've discovered along the way.

What's Next?

Get In Touch

I'm currently open to new opportunities and interesting data engineering challenges. Whether you have a question, want to discuss a project, or just want to say hi — my inbox is always open!

Say Hello