SAIMANIKANTA KONANKI

Chester, US

Summary

With 2 years of experience in IT, I excel in extracting, transforming, and loading data from source systems to Azure Data Lake Storage using Spark streaming jobs with Azure Event Hub. I have developed ETL pipelines with Azure Data Factory and managed data transformations through Python and Scala scripts within Azure Databricks. Additionally, I have implemented numerous Delta Live Tables (DLT) and built Power BI dashboards in response to user requests. Innovative change agent with a unique mix of high-level technology direction and deep technical expertise.

Overview

2 years of professional experience

Work History

Data Engineer

TCS
03.2021 - 08.2022

• Implemented Extract, Transform, and Load (ETL) processes to transfer data from source systems to Azure Data Storage services utilizing a blend of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Data was ingested into one or more Azure services, including Azure Data Lake, Azure Storage, Azure SQL, and Azure DW, with subsequent data processing conducted in Azure Databricks.

• Configured Azure Data Factory (ADF) to ingest data from diverse sources, both relational and non-relational databases, tailored to meet specific business functional requirements.

• Configured Spark streaming for real-time data reception from Azure Event Hub or Apache Flume, with Scala utilized to store the streaming data in an Azure table. Data Lake served as the repository for processing various data types, with the creation of Spark DataFrames.

• Leveraged various aggregation techniques offered by the Spark framework within the transformation layer, employing Apache Spark RDDs, Data Frame APIs, and Spark SQL.

• Applied expertise in optimizing Spark applications, adjusting parameters such as batch interval time, level of parallelism, and memory allocation to enhance processing speed and efficiency.

• Implemented migration of data from existing applications to Azure DW and Databricks through the creation of PySpark notebooks.

• Designed and executed end-to-end data solutions encompassing storage, integration, processing, and visualization components within the Azure environment.

• Managed batch processing of data sources utilizing Apache Spark.

• Prepared comprehensive ETL design documents detailing database structure, Change Data Capture mechanisms, error handling procedures, and strategies for restart and data refresh.

• Developed Power BI visualizations and dashboards to facilitate data analysis and interpretation.

• Engaged in unit testing and resolution of various bottlenecks encountered throughout the data engineering process.

• Demonstrated proficiency in applied statistics, exploratory data analysis (EDA), and visualization techniques using Power BI, Tableau, and Matplotlib.

Intern

ABC Supply Co
08.2020 - 12.2021

• Built scalable distributed data solutions in the EMR cluster environment with Amazon EMR.

• Utilized Spark-Streaming APIs to transform data from Kafka, persisting it into HDFS.

• Designed and developed data integration programs in a Hadoop environment, integrating with NoSQL data store Cassandra for analysis.

• Developed Spark SQL scripts in Python for accelerated data processing.

• Processed large datasets stored in AWS S3 buckets, performing preprocessing in Glue using Spark data frames.

• Updated Python scripts to align training data with our AWS Cloud Search database for document classification.

• Developed Spark workflows in Scala to extract and transform data from AWS.

• Created MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS.

• Managed job creation, debugging, scheduling, and monitoring using Airflow and Oozie.

• Developed Spark Applications in Python, implementing Apache Spark data processing projects for RDBMS and streaming services.

• Migrated MapReduce programs to Spark transformations using Spark and Scala.

• Worked with RDS databases like MySQL and NoSQL databases like MongoDB and HBase.

• Developed Tableau visualizations and dashboards using Tableau Desktop.

• Handled day-to-day issues and fine-tuned applications for optimal performance.

• Collaborated with team members and stakeholders in designing and developing the data environment.

Education

Master of Science - Computer and Information Science

New England College
12.2023

Bachelor of Science - Computer Science and Engineering

Saveetha School of Engineering
08.2021

Skills

Languages : Python, PySpark, SQL

Big Data Services : Azure Databricks, Azure Synapse Analytics

Apache Tools : Apache-Kafka, Airflow, Spark

Databases : Databricks, MySQL

Reporting Tools : Power BI, Tableau

Cloud Services : Azure, AWS

Timeline

Data Engineer

TCS
03.2021 - 08.2022

Intern

ABC Supply Co
08.2020 - 12.2021

Master of Science - Computer and Information Science

New England College

Bachelor of Science - Computer Science and Engineering

Saveetha School of Engineering