Alekhya Akki

Summary

Over 7 years of experience as a Data Engineer, specializing in designing and implementing complex data solutions. Proficient in Big Data technologies: Hadoop, Spark, HDFS, Hive, and more. Skilled in programming languages: Python, SQL, Scala, PowerShell, and JavaScript. Extensive experience with databases: MySQL, SQL Server, Oracle, Teradata, Snowflake. Strong background in data modeling and ETL processes using SSIS and DBT transformations. Managed data workflows, ensuring data quality and compliance. Developed and deployed high-performance machine learning models in production. Designed and implemented end-to-end ETL pipelines in Azure Data Factory (ADF). Leveraged DBT for storing transformed data and creating SQL models. Utilized Azure Databricks for efficient data processing and transformation. Implemented data ingestion strategies into Snowflake using Snowpipe and bulk loading. Led the migration of on-premises databases to cloud platforms, minimizing downtime. Applied Spark Streaming for real-time data processing and transformation. Integrated Power BI with Snowflake for in-depth reporting and analysis. Developed Spark applications using PySpark and Spark-SQL for data extraction and transformation. Created Directed Acyclic Graphs (DAGs) in Airflow for workflow management. Automated regular AWS tasks using Python scripts for efficient operations.

Overview

7
years of professional experience

Work History

Data Engineer

Helix
San Mateo, CA
08.2022 - Current
  • Worked with Azure cloud services such as Azure Data Factory (ADF), Azure monitoring services, Blob storage containers, Cosmos DB, Azure Databricks, Azure SQL databases, and Azure Synapse Analytics
  • Extensive expertise in ADF for data orchestration, transformation, and comprehensive data monitoring and management
  • Designed end-to-end ETL pipelines in Azure Data Factory (ADF) to integrate data from company source databases and implemented efficient transformations using DBT (Data Build Tool)
  • Proficient in pulling data from sources such as SharePoint, flat files, and APIs
  • Leveraged DBT (Data Build Tool) to store transformed data and created SQL models based on project requirements
  • Employed Common Table Expressions (CTEs) within DBT SQL models to enhance code readability and facilitate the maintenance of intricate queries
  • Scheduled and triggered DBT (Data Build Tool) models using ADF for seamless ETL pipeline execution
  • Monitored pipelines using Azure Monitoring Services to ensure continuous data arrival and promptly addressed any pipeline failures
  • Recorded pipeline status in an Excel sheet for efficient tracking and analysis
  • Proactively identified and resolved pipeline issues to maintain the integrity of data flow
  • Utilized Azure Databricks for efficient data processing and transformation
  • Utilized Azure Databricks as a Data Warehouse, testing and validating transformed data through the execution of SQL queries and leveraging PySpark for data manipulation
  • Capable of performing data modeling and designing databases to ensure efficient storage and retrieval of data
  • Utilized Databricks Unity Catalog for managing metadata and ensuring data quality
  • Created different layers of Datamarts using DBT and ingested data into Delta tables for optimized storage and retrieval
  • Generated YAML (.yml) files for each SQL model to maintain structured and versioned configurations, ensuring consistent and reproducible data storage formats
  • Utilized Power BI for in-depth reporting and analysis and to address any discrepancies in the data
  • Developed insightful visualizations to aid in decision-making processes
  • Created, referred to, and contributed to Confluence pages for comprehensive documentation of various pipelines and onboarding processes within the project
  • Ensured that documentation was up-to-date and easily accessible for team members
  • Managed version control using GitHub, facilitating collaborative development, and ensuring an organized codebase.

Data Engineer

Tata Consultancy Services
India
09.2019 - 12.2021
  • Developed Spark applications using PySpark and Spark-SQL to efficiently extract, transform, and aggregate data from various file formats
  • Leveraged Spark RDD, DataFrame API, Dataset API, Data Source API, Spark SQL, and Spark Streaming for a comprehensive approach to data processing
  • Extensively worked with AWS cloud services such as EC2, S3, EMR, Redshift, Lambda, and Glue
  • Employed AWS Glue and Lambda to build data pipelines (ELT/ETL scripts), extracting data from diverse sources (MySQL, AWS S3 files) and loading it into the data warehouse (AWS Redshift)
  • Wrote ETL/ELT scripts to extract data from different sources, transform it, and load it into AWS Redshift; combined Spark SQL, PySpark, AWS Athena, and AWS Glue to create robust ETL processes for seamless data movement
  • Contributed to the development of a serverless querying environment by writing to the Glue metadata catalog, enabling refined data querying from AWS Athena
  • Employed Spark Streaming APIs for on-the-fly transformations and actions to build a common learner data model
  • Utilized Spark Streaming to consume XML messages from Kafka, processing UI updates in real-time
  • Ingested raw data into AWS S3 from Kinesis Firehose for initial processing
  • Triggered AWS Lambda functions upon raw data ingestion to process and load refined data into another S3 bucket and to write to an SQS queue as Aurora topics
  • Applied Spark DataFrames for preprocessing jobs, flattening JSON documents into flat files
  • Loaded DStream data into Spark RDDs and performed in-memory computation to generate output responses
  • Worked extensively with AWS databases, including RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis)
  • Enhanced data accessibility and readability, paving the way for subsequent analysis and transformations
  • Enhanced the efficiency of data processing workflows, ensuring they aligned with project requirements
  • Orchestrated the integration of various components, ensuring a seamless data flow from extraction to transformation, storage, and real-time processing
  • Leveraged SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN to continuously improve data processing algorithms and workflows
  • Incorporated Power BI seamlessly into the data processing pipeline, providing stakeholders with dynamic dashboards and real-time analytics, ultimately enhancing decision-making capabilities with visually impactful insights across diverse data processing stages.

Hadoop Developer

Micron
India
07.2018 - 08.2019
  • Leveraged Sqoop and Flume for efficient data ingestion into the Hadoop environment
  • Applied Sqoop for structured data transfers, ensuring compatibility with the Big Data environment
  • Employed Flume to capture and transport clickstream data from front-facing application logs
  • Implemented effective error handling mechanisms to maintain data integrity
  • Monitored and optimized Flume configurations to enhance data transfer efficiency
  • Implemented Kafka functionalities for distributed messaging within the architecture
  • Leveraged Kafka's distribution and partitioning features for efficient data processing
  • Established a replicated commit log service in Kafka to maintain consistent data feeds
  • Used Kafka as a messaging system to implement real-time streaming solutions
  • Integrated Spark Streaming with Kafka for seamless processing of real-time data
  • Ensured low-latency data processing, making it suitable for time-sensitive applications
  • Delivered real-time streaming solutions that contributed to immediate insights and actionable intelligence
  • Created Spark applications using Scala and Java to handle diverse data processing tasks
  • Leveraged Spark's processing power to efficiently analyze and transform large datasets
  • Developed Sqoop scripts to facilitate the migration of data from Oracle to the Big Data environment
  • Ensured smooth and efficient transfer of data, considering data volume and complexity
  • Handled incremental loading of customer and transaction data based on date
  • Automated the data migration process, reducing manual intervention and enhancing reliability
  • Utilized Python scripting, PySpark, and Spark SQL for in-depth analysis of large datasets
  • Applied complex SQL queries on various source systems, including Oracle and SQL Server
  • Identified inconsistencies in data collected from diverse sources for data quality improvement
  • Provided analysis that yielded valuable insights for informed decision-making and enhanced data quality
  • Actively participated in designing and developing data ingestion processes in the Hadoop environment
  • Ensured seamless data flow between various stages of the Hadoop processing pipeline
  • Led the design of object models, data models, tables, and constraints for the Oracle Database
  • Collaborated with the project team to ensure database design alignment with project goals
  • Ensured the creation of efficient database structures supporting data processing requirements
  • Developed necessary stored procedures, functions, triggers, and packages for database functionality
  • Automated regular AWS tasks, including snapshot creation, using Python scripts
  • Created Directed Acyclic Graphs (DAGs) in Airflow for efficient workflow management
  • Ensured seamless integration between Python scripts, Airflow, and AWS services
  • Incorporated Tableau seamlessly into the Spark and AWS-driven data processing pipeline, providing stakeholders with advanced data visualization tools and interactive dashboards, ultimately enhancing the decision-making capabilities with visually impactful insights.

Data Engineer

UBS
India
07.2017 - 06.2018
  • Gathered business requirements, interacted with users and SMEs to get a better understanding of the data, performed data entry and data auditing, created data reports, and monitored all data for accuracy
  • Performed data discovery and built a stream that automatically retrieves data from a multitude of sources (SQL databases, external data such as social network data, user reviews) to generate KPIs using Tableau
  • Wrote ETL scripts in Python/SQL for extracting and validating data
  • Created data models in Python to store data from various sources
  • Worked primarily on SQL Server, creating Stored Procedures, Functions, Triggers, Indexes, and Views
  • Leveraged Tableau to create interactive and visually appealing dashboards, providing a comprehensive view of key performance indicators (KPIs) and operational metrics across global programs
  • Utilized Excel's Analysis ToolPak to perform in-depth analysis on large datasets, including statistical and econometric modeling
  • Created and modified reports using Tableau, providing detailed insights into the gathered business requirements and supporting data-driven decision-making
  • Worked with large data sets, automated data extraction, and built monitoring/reporting dashboards and high-value, automated Business Intelligence solutions (data warehousing and visualization).

Skills

  • Big Data Technologies: Hadoop, MapReduce, Spark, HDFS, Sqoop, YARN, Oozie, Hive, Impala, Zookeeper, Apache Flume, Apache Airflow, Cloudera, HBase
  • Programming Languages: Python, SQL, NoSQL, T-SQL, Scala, PowerShell Scripting, JavaScript
  • Cloud Services: Azure Data Lake Storage Gen 2, Azure Data Factory, Blob storage, Azure SQL DB, Databricks, Azure Event Hubs, AWS RDS, Amazon SQS, Amazon S3, AWS EMR, Lambda, AWS SNS
  • Databases: MySQL, SQL Server, Oracle, MS Access, Teradata, Snowflake
  • NoSQL Databases: MongoDB, Cassandra DB
  • Development Strategies: Agile, Lean Agile, Pair Programming, Waterfall, Test-Driven Development
  • Visualization & ETL tools: Power BI, Tableau, Informatica, Talend, SSIS, SSRS
  • Version Control & Containerization tools: Jenkins, GitHub
  • Monitoring tools

Active Certifications

AWS Certified Developer – Associate

Current Role

Data Engineer

Timeline

Data Engineer

Helix
08.2022 - Current

Data Engineer

Tata Consultancy Services
09.2019 - 12.2021

Hadoop Developer

Micron
07.2018 - 08.2019

Data Engineer

UBS
07.2017 - 06.2018