SHIVAPRASAD SANDRALA

Data Engineer
Powell, OH

Summary

Highly skilled and dedicated Data Engineer with over 6 years of experience in software analysis, design, development, and implementation of cloud and big data solutions. Proficient in leveraging technologies such as BigQuery, Spark, Scala, Hadoop, and Oracle Database to build and maintain robust data pipelines. Hands-on experience in normalization and denormalization techniques for optimum performance in relational and dimensional database environments. Skilled in leveraging innovative technologies and approaches to renovate, extend, and transform core data assets, including SQL-based, NoSQL-based, and cloud-based data platforms. Extensive expertise in developing data models and pipeline architectures and in providing ETL solutions for project models. Managed end-to-end operations of ETL data pipelines using Matillion on AWS Cloud Services and Azure Data Factory on Azure Cloud Services, ensuring seamless data ingestion, data processing and transformation, and data curation. Proven ability to design and specify Informatica ETL processes, optimizing schema loading and performance. Skilled in ETL architecture design and implementation, consistently delivering high-performance solutions. Certified in software engineering concepts, with hands-on experience in system design, application development, testing, and operational stability. Proficient in coding using modern programming languages and database querying languages, ensuring efficient and maintainable code. Utilized JIRA as a project management tool to effectively track and prioritize data engineering tasks, ensuring timely delivery of projects and seamless collaboration with cross-functional teams. Experienced in working with Agile methodologies, facilitating iterative and incremental development cycles, and promoting efficient communication and collaboration within the team. Knowledge and skills in secondary tools such as Microsoft Azure, SQL Data Warehouse, PolyBase, and Visual Studio.
Proficient in SQL and relational databases. Experienced in integrating Power BI reports into other applications using embedded analytics (Power BI service or API automation) and developing custom visuals for Power BI. Proficient in utilizing Python for data manipulation, analysis, and scripting, enabling efficient data processing and transformation. Skilled in PySpark, leveraging the power of Apache Spark for distributed data processing, machine learning, and real-time analytics. Strong troubleshooting and problem-solving skills, capable of identifying and resolving.

Overview

6 years of professional experience
5 years of post-secondary education

Work History

Data Engineer

New York Life Insurance
Powell, OH
08.2023 - Current
  • Design and set up an enterprise data lake to support various use cases, including analytics, processing, storage, and reporting of voluminous, rapidly changing data.
  • Responsible for maintaining transactional data at the source through cleaning and transformation, and for ensuring integrity in a relational environment, working closely with stakeholders and the solution architect.
  • Creating SQL*Plus scripts and packages to generate comprehensive reports.
  • Developing and automating Shell scripts to streamline processes and eliminate manual tasks.
  • Adapting existing logic or developing new logic to meet evolving customer requirements.
  • Managing monthly data transfers from mainframe systems to Oracle databases.
  • Collaborated with business stakeholders to analyze requirements and develop customized SQL logic, ensuring system alignment with evolving business needs.
  • Proficient in Databricks data streaming tech architecture, with a strong understanding of building Analysis Services reporting models.
  • Experience in importing and exporting data using Sqoop from HDFS to relational database management systems (RDBMS) and from RDBMS to HDFS.
  • Experienced in connecting to various data sources in Databricks, importing data, and transforming it for Business Intelligence purposes.
  • Leverage Azure Databricks to migrate on-premises data to the cloud, optimizing data processing and analytics capabilities.
  • Develop and execute data pipelines using Azure Databricks to transform and load data into cloud-based data warehouses or data lakes.

Cloud Data Engineer

Star Infra IT solutions
Naperville, IL
08.2022 - 08.2023
  • Leveraged SparkContext, Spark SQL, DataFrames, and pair RDDs to perform large-scale data processing and analysis, ensuring efficient data manipulation and transformation.
  • Proficient in Spark Streaming and Kafka integration, enabling real-time data processing and analysis for streaming applications, and implementing data pipelines for continuous data ingestion.
  • Utilizing HBase as a NoSQL database, ensuring high-speed data storage and retrieval for real-time and big data applications.
  • Used Agile (SCRUM) methodologies for Software Development.
  • Extensively working with Spark Streaming APIs to develop and deploy real-time data processing applications, enabling timely insights and decision-making.
  • Experienced in using Alteryx and Matillion for data integration and ETL processes, ensuring seamless data flow and transformation across various data sources and systems.
  • Implementing data warehousing solutions using Amazon Redshift and Snowflake data warehouse, providing scalable and high-performance storage and retrieval capabilities for analytics and reporting.
  • Proficient in Python and PySpark for data engineering tasks, including data processing, data cleaning, and feature engineering, ensuring efficient and scalable data analysis.
  • Expertise in developing and optimizing SQL queries, ensuring efficient data retrieval and manipulation for various data engineering tasks.
  • Experienced in working with BigQuery, AWS S3, and Azure Blob Storage for data storage and retrieval, enabling seamless integration and accessibility of data across different cloud platforms.
  • Working with release management technologies such as Jenkins, GitHub, GitLab, and Ansible.
  • Testing Python applications with GitLab CI/CD, using gitlab-runner on Windows.
  • Proficient in Scala programming language for developing Spark applications, providing strong functional programming capabilities for distributed data processing.
  • Skilled in migrating data from Oracle to Snowflake, ensuring smooth data transfer and maintaining data integrity throughout the process.
  • Experienced in Continuous Integration and Continuous Deployment (CI/CD) practices, ensuring streamlined and automated deployment of data engineering solutions.
  • Passionate about staying up to date with the latest advancements in data engineering and continuously expanding knowledge and skill set in Python, PySpark, and Scala.
  • Salesforce

Data Engineer

DXC Technology
Hyderabad
08.2018 - 01.2021
  • Conducted thorough analysis of business requirements, documenting and translating them into actionable insights.
  • Created comprehensive process and system flow charts to visualize implementation plan.
  • Worked on End-to-End Software Development Life Cycle process in Agile Environment using SCRUM methodologies.
  • Designed and implemented a cutting-edge Big Data Analysis system, leveraging Tableau as primary dashboarding tool.
  • Architected system to efficiently handle large volumes of data, ensuring optimal performance and data integrity.
  • Led redevelopment of centralized enterprise data warehouse by reverse engineering existing reports.
  • Streamlined data storage and retrieval processes, enhancing overall system efficiency and reliability.
  • Developed multiple complex Extract, Transform, Load (ETL) processes and Cubes to extract data from diverse sources using tools such as SSIS, SSAS, and .Net.
  • Ensured seamless data integration and maintained data consistency throughout system.
  • Spearheaded design and implementation of end-to-end data solutions on Azure platform, leveraging services such as Azure Data Factory, Azure Databricks, and Azure Synapse Analytics.
  • Developed scalable and efficient data pipelines using Azure Data Factory, ensuring smooth and reliable movement of data from various sources to target systems.
  • Designed and implemented data ingestion processes, including data extraction, transformation, and loading (ETL), using Azure Data Factory and Azure Databricks.
  • Utilized Azure Synapse Analytics (formerly SQL Data Warehouse) to build high-performance data warehousing solutions, enabling advanced analytics and reporting capabilities for organization.
  • Successfully developed and optimized data pipelines using Python, PySpark, and Scala, ensuring seamless data integration and transformation across various sources and formats.
  • Leveraged Azure Data Lake Storage and Azure Blob Storage to efficiently store and manage large volumes of structured and unstructured data, enabling seamless data access and retrieval.
  • Developed and maintained data pipelines using Azure ecosystem, including Azure Databricks, Azure Data Lake Storage, and Azure Synapse Analytics, for seamless and scalable data processing and analysis.
  • Demonstrated strong problem-solving skills and ability to troubleshoot complex data issues, ensuring stability and reliability of Azure data solutions.
  • Actively kept up to date with latest developments in Azure data engineering, continuously expanding knowledge and skillset through training and certifications.

Education

Master of Science - Computer Science

Chicago State University
Chicago, IL
01.2021 - 05.2022

Bachelor of Technology - Electronics and Communication Engineering

Mahatma Gandhi University
08.2014 - 05.2018

Skills

  • Python
  • PySpark
  • Shell Scripting
  • Apache Spark
  • Hadoop
  • HDFS
  • MapReduce
  • Hive
  • HBase
  • MySQL
  • Oracle
  • NoSQL
  • Amazon Redshift
  • Microsoft Azure
  • Snowflake
  • Amazon DynamoDB
  • Amazon S3
  • Teradata
  • Amazon RDS
  • Tableau
  • Power BI
  • Informatica
  • Apache Airflow
  • SSIS
  • AWS Glue
  • Azure Data Factory
  • Matillion
  • Alteryx
  • Terraform
  • ETL
  • GitHub
  • JIRA
  • Databricks
  • Palantir Foundry
  • Windows
  • Linux/macOS

Timeline

Data Engineer

New York Life Insurance
08.2023 - Current

Cloud Data Engineer

Star Infra IT solutions
08.2022 - 08.2023

Master of Science - Computer Science

Chicago State University
01.2021 - 05.2022

Data Engineer

DXC Technology
08.2018 - 01.2021

Bachelor of Technology - Electronics and Communication Engineering

Mahatma Gandhi University
08.2014 - 05.2018