
Alekhya Akki

Summary

Over 7 years of experience as a Data Engineer, specializing in designing and implementing complex data solutions
Proficient in Big Data technologies: Hadoop, Spark, HDFS, Hive, and more
Skilled in programming languages: Python, SQL, Scala, PowerShell, and JavaScript
Extensive experience with databases: MySQL, SQL Server, Oracle, Teradata, Snowflake
Strong background in data modeling and ETL processes using SSIS and DBT transformations
Managed data workflows, ensuring data quality and compliance
Developed and deployed high-performance machine learning models in production
Designed and implemented end-to-end ETL pipelines in Azure Data Factory (ADF)
Leveraged DBT for storing transformed data and creating SQL models
Utilized Azure Databricks for efficient data processing and transformation
Implemented data ingestion strategies into Snowflake using Snowpipe and bulk loading
Led the migration of on-premises databases to cloud platforms, minimizing downtime
Applied Spark Streaming for real-time data processing and transformation
Integrated Power BI with Snowflake for in-depth reporting and analysis
Developed Spark applications using PySpark and Spark-SQL for data extraction and transformation
Created Directed Acyclic Graphs (DAGs) in Airflow for workflow management
Automated regular AWS tasks using Python scripts for efficient operations

Overview

7+ years of professional experience

Work History

Data Engineer

Helix

San Mateo, CA

08.2022 - Current

Worked with Azure cloud services such as Azure Data Factory (ADF), Azure monitoring services, Blob storage containers, Cosmos DB, Azure Databricks, Azure SQL databases, and Azure Synapse Analytics
Extensive expertise in ADF for data orchestration, transformation, and comprehensive data monitoring and management
Designed end-to-end ETL pipelines in Azure Data Factory (ADF) to integrate data from company source databases and implemented efficient transformations using DBT (Data Build Tool)
Proficient in pulling data from sources such as SharePoint, flat files, and APIs
Leveraged DBT (Data Build Tool) to store transformed data and created SQL models based on project requirements
Employed Common Table Expressions (CTEs) within DBT SQL models to improve code readability and ease the maintenance of complex queries
Scheduled and triggered DBT (Data Build Tool) models using ADF for seamless ETL pipeline execution
Monitored pipelines using Azure Monitoring Services to ensure continuous data arrival and promptly addressed any pipeline failures
Recorded pipeline status in an Excel sheet for efficient tracking and analysis
Proactively identified and resolved pipeline issues to maintain the integrity of data flow
Utilized Azure Databricks for efficient data processing and transformation
Utilized Azure Databricks as a Data Warehouse, testing and validating transformed data through the execution of SQL queries and leveraging PySpark for data manipulation
Capable of performing data modeling and designing databases to ensure efficient storage and retrieval of data
Utilized Databricks Unity Catalog with Delta Lake tables for managing metadata and ensuring data quality
Created different layers of Datamarts using DBT and ingested data into Delta tables for optimized storage and retrieval
Generated YAML (.yml) files for each SQL model to maintain structured and versioned configurations, ensuring consistent and reproducible data storage formats
Utilized Power BI for in-depth reporting and analysis, and addressed any discrepancies in the data
Developed insightful visualizations to aid in decision-making processes
Created, referred to, and contributed to Confluence pages for comprehensive documentation of various pipelines and onboarding processes within the project
Ensured that documentation was up-to-date and easily accessible for team members
Managed version control using GitHub, facilitating collaborative development and ensuring an organized codebase.

Data Engineer

Tata Consultancy Services

India

09.2019 - 12.2021

Developed Spark applications using PySpark and Spark-SQL to efficiently extract, transform, and aggregate data from various file formats
Leveraged Spark RDD, the DataFrame API, Dataset API, Data Source API, Spark SQL, and Spark Streaming for a comprehensive approach to data processing
Extensively worked with AWS cloud services such as EC2, S3, EMR, Redshift, Lambda, and Glue
Employed AWS Glue and Lambda to build data pipelines (ELT/ETL scripts), extracting data from diverse sources (MySQL, AWS S3 files) and loading it into the data warehouse (AWS Redshift)
Wrote ETL/ELT scripts to extract data from different sources, transform it, and load it into AWS Redshift; combined Spark SQL, PySpark, AWS Athena, and AWS Glue to create robust ETL processes for seamless data movement
Contributed to the development of a serverless querying environment by writing to the Glue metadata catalog, enabling refined data querying from AWS Athena
Employed Spark Streaming APIs for on-the-fly transformations and actions to build a common learner data model
Utilized Spark Streaming to consume XML messages from Kafka, processing UI updates in real-time
Ingested raw data into AWS S3 from Kinesis Firehose for initial processing
Triggered AWS Lambda functions upon raw data ingestion to process and load refined data into another S3 bucket and write to an SQS queue as Aurora topics
Applied Spark DataFrames for preprocessing jobs, flattening JSON documents into flat files
Loaded DStream data into Spark RDDs and performed in-memory computation to generate output responses
Worked heavily with AWS databases, including RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis)
Enhanced data accessibility and readability, paving the way for subsequent analysis and transformations
Enhanced the efficiency of our data processing workflows, ensuring they aligned precisely with project requirements
Successfully orchestrated the integration of various components, ensuring a seamless dataflow from extraction to transformation, storage, and real-time processing
Leveraged SparkContext, Spark-SQL, Spark MLlib, DataFrames, Pair RDDs, and Spark on YARN to continuously improve data processing algorithms and workflows
Incorporated Power BI seamlessly into the data processing pipeline, providing stakeholders with dynamic dashboards and real-time analytics, ultimately enhancing decision-making capabilities with visually impactful insights across diverse data processing stages.

Hadoop Developer

Micron

India

07.2018 - 08.2019

Leveraged Sqoop and Flume for efficient data ingestion into the Hadoop environment
Applied Sqoop for structured data transfers, ensuring compatibility with the Big Data environment
Employed Flume to capture and transport clickstream data from front-facing application logs
Implemented effective error handling mechanisms to maintain data integrity
Monitored and optimized Flume configurations to enhance data transfer efficiency
Implemented Kafka functionalities for distributed messaging within the architecture
Leveraged Kafka's distribution and partitioning features for efficient data processing
Established a replicated commit log service in Kafka to maintain consistent data feeds
Used Kafka as a messaging system to implement real-time streaming solutions
Integrated Spark Streaming with Kafka for seamless processing of real-time data
Ensured low-latency data processing, making it suitable for time-sensitive applications
Real-time streaming solutions contributed to immediate insights and actionable intelligence
Created Spark applications using Scala and Java to handle diverse data processing tasks
Leveraged Spark's processing power to efficiently analyze and transform large datasets
Developed Sqoop scripts to facilitate the migration of data from Oracle to the Big Data environment
Ensured smooth and efficient transfer of data, considering data volume and complexity
Handled incremental loading of customer and transaction data based on date
Automated the data migration process, reducing manual intervention and enhancing reliability
Utilized Python scripting, PySpark, and Spark SQL for in-depth analysis of large datasets
Applied complex SQL queries on various source systems, including Oracle and SQL Server
Identified inconsistencies in data collected from diverse sources for data quality improvement
The analysis provided valuable insights for informed decision-making and enhanced data quality
Actively participated in designing and developing data ingestion processes in the Hadoop environment
Ensured seamless data flow between various stages of the Hadoop processing pipeline
Led the design of object models, data models, tables, and constraints for the Oracle Database
Collaborated with the project team to ensure database design alignment with project goals
Ensured the creation of efficient database structures supporting data processing requirements
Developed necessary stored procedures, functions, triggers, and packages for database functionality
Automated regular AWS tasks, including snapshot creation, using Python scripts
Created Directed Acyclic Graphs (DAGs) in Airflow for efficient workflow management
Ensured seamless integration between Python scripts, Airflow, and AWS services
Incorporated Tableau seamlessly into the Spark and AWS-driven data processing pipeline, providing stakeholders with advanced data visualization tools and interactive dashboards, ultimately enhancing the decision-making capabilities with visually impactful insights.

Data Engineer

UBS

India

07.2017 - 06.2018

Gathered business requirements and interacted with users and SMEs to better understand the data; performed data entry and data auditing, created data reports, and monitored all data for accuracy
Performed data discovery and built a stream that automatically retrieves data from a multitude of sources (SQL databases and external data such as social network data and user reviews) to generate KPIs using Tableau
Wrote ETL scripts in Python/SQL for extracting and validating data
Created data models in Python to store data from various sources
Worked primarily on SQL Server, creating stored procedures, functions, triggers, indexes, and views
Leveraged Tableau to create interactive and visually appealing dashboards, providing a comprehensive view of key performance indicators (KPIs) and operational metrics across global programs
Utilized Excel's Analysis ToolPak to perform in-depth analysis on large datasets, including statistical and econometric modeling
Created and modified reports using Tableau, providing detailed insights into the gathered business requirements and supporting data-driven decision-making
Worked with large data sets, automated data extraction, and built monitoring/reporting dashboards and high-value, automated Business Intelligence solutions (data warehousing and visualization).

Skills

Big Data Technologies: Hadoop, MapReduce, Spark, HDFS, Sqoop, YARN, Oozie, Hive, Impala, Zookeeper, Apache Flume, Apache Airflow, Cloudera, HBase
Programming Languages: Python, SQL, NoSQL, T-SQL, Scala, PowerShell Scripting, JavaScript
Cloud Services: Azure Data Lake Storage Gen 2, Azure Data Factory, Blob storage, Azure SQL DB, Databricks, Azure Event Hubs, AWS RDS, Amazon SQS, Amazon S3, AWS EMR, Lambda, AWS SNS
Databases: MySQL, SQL Server, Oracle, MS Access, Teradata, Snowflake
NoSQL Databases: MongoDB, Cassandra DB
Development Strategies: Agile, Lean Agile, Pair Programming, Waterfall, Test-Driven Development
Visualization & ETL tools: Power BI, Tableau, Informatica, Talend, SSIS, SSRS
Version Control & Containerization tools: Jenkins, GitHub
Monitoring tools

Active Certifications

AWS Certified Developer – Associate

Current Role

Data Engineer

