
SRILEKHA NAGU

Denton, TX

Professional Summary

Dedicated Data Engineer with over 6 years of hands-on experience, specializing in the design, implementation, and optimization of end-to-end data pipelines. Proficient in harnessing cutting-edge technologies to transform raw data into actionable insights, driving well-informed decision-making.

Key Skills:

  • Extensive experience in designing, developing, and executing data pipelines and data lake requirements using the Big Data Technology stack, Python, PL/SQL, SQL, REST APIs, and the Azure cloud platform.
  • Proficiency with key Big Data tools, including HDFS, Kafka, MapReduce, Spark, Pig, Hive, Sqoop, HBase, Flume, and Zookeeper, for designing and deploying comprehensive big data ecosystems.
  • Expertise in Spark Data Frame Operations for critical data validation and analytics on Hive data within Cloudera infrastructure.
  • Skilled in developing advanced MapReduce systems to process various file types, including Text, Sequence, XML, and JSON.
  • Successfully migrated on-premises applications to leverage Azure cloud databases and storage.
  • Hands-on experience with Azure services, including SQL Database, SQL Data Warehouse, Analysis Services, HDInsight, Data Lake, and Data Factory.
  • Proficient in building CI/CD pipelines on AWS using CodeCommit, CodeBuild, CodeDeploy, and CodePipeline, as well as AWS CloudFormation, API Gateway, and AWS Lambda for automation and infrastructure security.
  • Expertise in Azure data solutions, including storage account provisioning, Azure Data Factory, SQL Server, SQL databases, SQL Data Warehouse, Azure Databricks, and Azure Cosmos DB.
  • Strong understanding of Spark Architecture with Databricks and Structured Streaming.
  • Practical experience with Python and Apache Airflow to create, schedule, and monitor workflows (a minimal DAG sketch follows this list).
  • Knowledge of data analytics services such as QuickSight, Glue Data Catalog, and Athena.
  • Proficiency in working with Apache Kafka and Confluent environments, including KTables, Global KTables, and KStreams for Kafka streaming.
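
Illustrative example: the Airflow bullet above mentions creating, scheduling, and monitoring workflows. The following is a minimal sketch of such a DAG, assuming Airflow 2.x; the DAG and task names (orders_daily, extract_orders, load_orders) are hypothetical placeholders, not taken from this resume.

    # Hypothetical daily Airflow DAG with a simple extract -> load dependency.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_orders():
        # Placeholder extract step; in practice this would call a source API or database.
        print("extracting orders")

    def load_orders():
        # Placeholder load step; in practice this would write to the warehouse.
        print("loading orders")

    with DAG(
        dag_id="orders_daily",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",   # run once per day
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
        load = PythonOperator(task_id="load_orders", python_callable=load_orders)
        extract >> load               # load runs only after extract succeeds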

Overview

6 years of professional experience
1 certification

Work History

Azure Data Engineer

Affirmative Insurance Holdings
01.2022 - Current
  • Led data migration from on-premises SQL servers to Azure cloud databases, including Azure Synapse Analytics and Azure SQL DB.
  • Utilized Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics for ETL processes and data ingestion.
  • Performed data processing in Azure Databricks.
  • Worked with Kafka streaming for subscriber-side data processing, integrating messages into databases.
  • Leveraged Apache Spark for real-time data processing.
  • Designed a reusable Python pattern for Synapse integration covering aggregations, change data capture, deduplication, and high-watermark implementation (a minimal PySpark sketch follows this list).
  • Accelerated development and promoted standardization across teams.
  • Integrated Kubernetes with cloud-native services, such as AWS EKS and GCP GKE, to enhance scalability.
  • Led the migration of data to Snowflake and AWS from legacy data warehouses.
  • Contributed to the Data and Reporting team, creating actionable insights and visualizations for informed decision-making.
  • Extracted and analyzed data from various sources, implementing data wrangling and cleanup using Python-Pandas.
  • Demonstrated proficiency with common Python data engineering packages, including pandas, NumPy, PyArrow, pytest, scikit-learn, and boto3.
  • Created and maintained CI/CD pipelines, applying automation to environments and applications.
  • Utilized Python for data manipulation and wrote data into JSON files for testing Django websites.
  • Developed and maintained Docker container clusters managed by Kubernetes.
  • Managed infrastructure as code using Terraform templates for AWS.
  • Configured Jenkins pipelines to execute various steps, including unit testing, integration testing, and static analysis tools.
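
Illustrative example: the reusable Synapse integration pattern above mentions change data capture, deduplication, and a high watermark. Below is a minimal PySpark sketch of that idea, assuming a timestamp column and a business key; the table and column names (source_table, target_table, id, updated_at) are hypothetical, not taken from this project.

    # Hypothetical high-watermark incremental load with deduplication in PySpark.
    from pyspark.sql import SparkSession, functions as F, Window

    spark = SparkSession.builder.appName("incremental_load").getOrCreate()

    # Last successfully loaded watermark; in practice read from a control table.
    last_watermark = "2024-01-01 00:00:00"

    # Pull only rows changed since the last watermark.
    changes = (
        spark.table("source_table")
        .filter(F.col("updated_at") > F.lit(last_watermark))
    )

    # Keep only the latest version of each key (deduplication of captured changes).
    w = Window.partitionBy("id").orderBy(F.col("updated_at").desc())
    latest = (
        changes.withColumn("rn", F.row_number().over(w))
        .filter(F.col("rn") == 1)
        .drop("rn")
    )

    # New watermark to persist after a successful load, then append to the target.
    new_watermark = latest.agg(F.max("updated_at")).collect()[0][0]
    latest.write.mode("append").saveAsTable("target_table")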

Azure Data Engineer

DXC Technology
02.2020 - 11.2021
  • Implemented data quality checks, validations, and monitoring processes to ensure the accuracy and integrity of data in the pharmaceutical company.
  • Architected and implemented medium to large-scale Business Intelligence (BI) solutions on Azure using Azure Data Platform services, including Azure Data Lake, Data Factory, Data Lake Analytics, and Stream Analytics.
  • Utilized Azure Data Lake, Azure Data Factory, and Azure Databricks to efficiently move and transform on-premises data to the cloud, meeting the analytical needs of the organization.
  • Analyzed data using SQL, Python, and Apache Spark, creating and presenting analytical reports for management and technical teams.
  • Deployed models as Python packages, APIs for backend integration, and microservices within a Kubernetes orchestration layer for Docker containers.
  • Created pipelines in Azure Data Factory for data extraction, transformation, and loading (ETL) from diverse sources, including Azure SQL, Blob storage, and Azure SQL Data Warehouse.
  • Developed and implemented data acquisition jobs using Scala, Sqoop, Hive, and Pig, optimizing MapReduce jobs for efficient Hadoop Distributed File System (HDFS) usage.
  • Enhanced data processing efficiency by converting and parsing data formats with PySpark DataFrames, reducing conversion and parsing time (a minimal sketch follows this list).
  • Established and maintained continuous integration and deployment (CI/CD) pipelines, applying automation to environments and applications.
  • Proficient in automation tools such as Git, Terraform, and Ansible.
  • Implemented Python automation for Capital Analysis and Review, utilizing Pandas and NumPy modules for data manipulation and analysis, ensuring accurate reporting and streamlined decision-making.
  • Led the migration to AWS, utilizing Amazon Redshift for data warehousing and HiveQL for reporting, resulting in a 30% reduction in data retrieval and processing time.
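
Illustrative example: the PySpark format-conversion bullet above describes parsing raw data into more efficient formats. The following is a minimal sketch, assuming JSON input converted to partitioned Parquet; the paths and column names (event_time, event_id, event_date) are hypothetical.

    # Hypothetical conversion of semi-structured JSON into partitioned Parquet with PySpark.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("format_conversion").getOrCreate()

    # Parse raw JSON records into a typed DataFrame.
    raw = spark.read.json("/data/raw/events/*.json")

    # Light cleanup: cast the timestamp, derive a partition column, drop bad rows.
    parsed = (
        raw.withColumn("event_time", F.to_timestamp("event_time"))
           .withColumn("event_date", F.to_date("event_time"))
           .filter(F.col("event_id").isNotNull())
    )

    # Columnar, partitioned Parquet is far cheaper to scan than raw JSON.
    parsed.write.mode("overwrite").partitionBy("event_date").parquet("/data/curated/events")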


Data Engineer

HDFC Bank
10.2018 - 01.2020
  • Spearheaded various stages of the Software Development Lifecycle (SDLC), encompassing requirement gathering, design, development, deployment, and application analysis.
  • Proficiently managed data import from diverse sources, executing transformations with Hive and MapReduce, and loading data into HDFS. Also, efficiently extracted data from SQL into HDFS using Sqoop.
  • Developed advanced analytical components utilizing Scala, Spark, Apache Mesos, and Spark Stream.
  • Installed and configured Hadoop, MapReduce, and HDFS, leading to the creation of multiple MapReduce jobs in PIG and Hive for data cleansing and pre-processing.
  • Expertly facilitated Big Data integration and analytics, leveraging technologies such as Hadoop, Solr, Spark, Kafka, Storm, and webMethods.
  • Collaborated with the DevOps Team, utilizing CI/CD tools like Jenkins and Docker to establish end-to-end application processes, encompassing deployment in lower environments and delivery.
  • Designed and implemented Python code to collect data from HBase (Cornerstone) and devised a PySpark-based solution for data processing (a minimal sketch follows this list).
  • Engineered a Java API (Commerce API) for seamless connection to Cassandra via Java services.
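
Illustrative example: the HBase (Cornerstone) bullet above describes collecting rows with Python and processing them in PySpark. A minimal sketch follows, assuming an HBase Thrift gateway reachable via happybase; the host, table, and column family names are hypothetical placeholders.

    # Hypothetical pull of rows from HBase via happybase, handed off to PySpark.
    import happybase
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hbase_ingest").getOrCreate()

    # Scan a slice of the HBase table through the Thrift gateway.
    connection = happybase.Connection("hbase-thrift-host")
    table = connection.table("customer_events")
    rows = [
        (key.decode(), data.get(b"cf:amount", b"0").decode())
        for key, data in table.scan(limit=10000)
    ]
    connection.close()

    # Continue processing in Spark once the rows are extracted.
    df = spark.createDataFrame(rows, ["row_key", "amount"])
    df.groupBy("row_key").count().show()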

Application Developer/ Data Engineer

Care Health Insurance
07.2017 - 09.2018
  • Created and managed workflows using Oozie, orchestrating MapReduce jobs and Hive Queries.
  • Developed session beans and controller servlets to manage HTTP requests originating from Talend.
  • Executed data visualization work, including the design of interactive Tableau dashboards, and generated complex reports with charts, summaries, and graphs to convey insights to the team and stakeholders.
  • Provided support for the development of web portals, completed data modeling in PostgreSQL, and contributed to front-end development using HTML/CSS and jQuery.
  • Engineered Python code to collect and process data from HBase (Cornerstone) and formulated a PySpark-based solution for implementation.
  • Designed and implemented a Java API (Commerce API) to enable seamless connectivity to Cassandra through Java services.

Education

Master of Science - Data Science

University of North Texas
Denton, TX
05.2023

Bachelor of Technology

Kamala Institute of Technology And Science
2020

Skills

  • Microsoft Azure Cloud: Azure Databricks, Azure Data Factory, Azure Synapse Analytics/SQL Data Warehouse, Logic Apps, Azure Data Lake, Azure Analysis Services, Azure Key Vault
  • Databases: Snowflake, MySQL, Teradata, Oracle, MS SQL Server, PostgreSQL, DB2
  • Big Data Ecosystem: Hadoop MapReduce, Impala, HDFS, Hive, Pig, HBase, Flume, Storm, Sqoop, Oozie, Kafka, Spark, Zookeeper
  • AWS: EC2, Amazon S3, ECS, Amazon RDS, VPC, IAM, Elastic Load Balancing, CloudFront, Auto Scaling, CloudWatch, Redshift
  • Continuous Integration/Containerization: Jenkins, Docker, Kubernetes
  • Version Control: Git, SVN, Bitbucket
  • Operating Systems: Windows, Linux, macOS
  • IDE and BI Tools: Eclipse, PyCharm, Jupyter, Power BI, Tableau
  • Environment: Python 3, Django, REST, Apache Tomcat, Azure, MS Team Foundation Server, jQuery, HTML, CSS, JSON, XML, JIRA, PyCharm

Certification

AZ-900 Azure Fundamentals

DP-203 Data Engineering on Microsoft Azure

Timeline

Azure Data Engineer

Affirmative Insurance Holdings
01.2022 - Current

Azure Data Engineer

DXC Technology
02.2020 - 11.2021

Data Engineer

HDFC Bank
10.2018 - 01.2020

Application Developer/ Data Engineer

Care Health Insurance
07.2017 - 09.2018

Master of Science - Data Science

University of North Texas

Bachelor of Technology

Kamala Institute of Technology And Science