
Sainath Alampally

Summary

Data Engineer with 3+ years of experience in data engineering, data pipeline design, development, and implementation as a Sr. Data Engineer/Data Developer and Data Modeler. Seeking a full-time position that offers professional challenges and draws on strong interpersonal, time-management, and problem-solving skills.

Overview

6 years of professional experience
1 certification

Work History

Data Engineer

Wells Fargo
08.2022 - Current
  • Proficient in Spark and Scala for building scalable data pipelines and performing complex transformations, optimizing data processing performance through distributed computing capabilities
  • Experienced with big data tools such as Hadoop Distributed File System (HDFS) and Apache Hive for storing, querying, and analyzing large volumes of data, optimizing performance and scalability
  • Implemented performance optimization techniques in Hive including distributed cache, partitioning, and bucketing, enhancing query performance for large-scale data analysis
  • Translated Hive/SQL queries into efficient Spark transformations using RDDs and Scala, improving processing performance through distributed execution (a sketch of this pattern follows this list)
  • Developed data quality scripts using SQL and Hive to ensure successful data load and maintain high data integrity standards
  • Implemented robust data loading mechanisms into relational databases such as SQL, NoSQL databases like MongoDB, and enterprise data warehouses like Teradata, ensuring data integrity and optimal performance
  • Designed and implemented extraction modules within a framework to seamlessly retrieve data from diverse sources including files and tables from databases, applying framework functionalities for efficient ETL processes
  • Skilled in utilizing Autosys for job scheduling and automation, ensuring timely execution of data pipelines and maintenance of data workflows
  • Proficient in utilizing JFrog Artifactory for managing and storing artifacts within the CI/CD pipeline, ensuring reliable artifact management and version control across development, testing, and production environments
  • Proficient in shell scripting for the execution and management of various frameworks and technologies, enhancing efficiency and reliability in software deployment and operations
  • Contributed to the optimization of data processing workflows by implementing parallel processing techniques and leveraging cluster computing frameworks in Spark, resulting in significant improvements in job execution time and resource utilization
  • Used the ScalaTest framework for comprehensive unit and integration testing, covering methods and functionality thoroughly to uphold code reliability standards (see the test sketch after this list)
  • Implemented SonarQube for test coverage analysis and reporting, ensuring code quality and adherence to best practices throughout the development lifecycle
  • Collaborated with cross-functional teams to gather requirements, troubleshoot issues, and optimize data workflows, contributing to overall project success and business objectives.
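
A minimal sketch of the Hive-to-Spark pattern referenced above, assuming a hypothetical Hive table txns with account_id and amount columns, and shown with the DataFrame API (rather than raw RDDs) plus a partitioned write:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object HiveToSparkSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-to-spark-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Equivalent of: SELECT account_id, SUM(amount) AS total_amount FROM txns GROUP BY account_id,
        // expressed as DataFrame transformations so Catalyst can plan the distributed shuffle.
        val totals = spark.table("txns") // hypothetical Hive table
          .groupBy("account_id")
          .agg(sum("amount").as("total_amount"))

        // Partition the output by load date so downstream Hive queries can prune partitions.
        totals
          .withColumn("load_dt", current_date())
          .write
          .mode("overwrite")
          .partitionBy("load_dt")
          .saveAsTable("txn_totals")

        spark.stop()
      }
    }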
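
And a minimal ScalaTest sketch of the unit-testing approach mentioned above; the transformation under test is a hypothetical stand-in for real pipeline logic:

    import org.scalatest.funsuite.AnyFunSuite

    // Hypothetical transformation standing in for real pipeline logic.
    object Normalizer {
      def normalize(amount: Double, rate: Double): Double = amount * rate
    }

    class NormalizerSpec extends AnyFunSuite {
      test("normalize applies the conversion rate") {
        assert(Normalizer.normalize(100.0, 0.5) == 50.0)
      }
    }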

Data Engineer

Truist Financial
05.2020 - 07.2021
  • Created notebooks in Azure Databricks to move data from the raw zone to staging and then to curated zones (see the Delta sketch after this list)
  • Hands-on experience working with Delta tables
  • Built pipelines in Azure Data Factory to copy data from source to destination
  • Built the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of sources using SQL and big data technologies such as Hadoop Hive and Azure Data Lake Storage
  • Developed Spark scripts using Python on Azure HDInsight for data aggregation and validation, and verified their performance against equivalent MapReduce jobs
  • Extensively worked with Spark-SQL context to create data frames and datasets to pre-process the model data
  • Wrote PySpark and Spark SQL transformations in Azure Databricks to implement complex business rules
  • Implemented Spark Kafka streaming to pick up data from Kafka and feed it into the Spark pipeline (a streaming sketch follows this list)
  • Performance-tuned Spark applications for faster SLAs and cost savings
  • Created architectural solutions that leverage the most suitable Azure analytics tools for specific business use cases
  • Developed Impala scripts for extraction, transformation, and loading of data into the data warehouse
  • Created and maintained optimal data pipeline architecture in Microsoft Azure using Data Factory and Azure Databricks
  • Ingested data into HDFS using Sqoop for full loads and Flume for incremental loads from a variety of sources such as web servers, RDBMS, and data APIs
  • Migrated tables from RDBMS into Hive using Sqoop and later generated visualizations with Tableau
  • Used Kafka features such as partitioning, replication, and the distributed commit log for messaging systems, maintaining feeds and creating applications that monitor consumer lag within Apache Kafka clusters
  • Worked on unit testing, code reviews, and production deployment with offshore/onshore resources.
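
A minimal notebook-style sketch of the raw-to-curated Delta movement mentioned above; Databricks notebooks provide spark implicitly, and the mount paths and column name are hypothetical:

    import org.apache.spark.sql.functions._

    // Read raw JSON landed in the raw zone (hypothetical mount point).
    val raw = spark.read.json("/mnt/raw/events")

    // Minimal business rule: drop records with no event type before curation.
    val curated = raw.filter(col("event_type").isNotNull)

    // Append to the curated zone as a Delta table.
    curated.write
      .format("delta")
      .mode("append")
      .save("/mnt/curated/events")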
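
And a minimal Structured Streaming sketch of the Kafka-to-Spark pickup referenced above, assuming the spark-sql-kafka connector is on the classpath; the broker address and topic name are placeholders:

    import org.apache.spark.sql.SparkSession

    object KafkaIngestSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("kafka-ingest-sketch")
          .getOrCreate()

        // Subscribe to a Kafka topic as a streaming DataFrame.
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load()

        // Kafka delivers key/value as binary; cast the payload to string for parsing.
        val events = raw.selectExpr("CAST(value AS STRING) AS payload")

        // Console sink for illustration; a real job would target Delta or HDFS.
        events.writeStream
          .format("console")
          .outputMode("append")
          .start()
          .awaitTermination()
      }
    }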

Data Analyst Intern

Sutherland Global Services
01.2019 - 04.2020
  • Documented the complete process flow, describing program development, logic, testing, implementation, application integration, and coding, and recommended structural changes and enhancements to systems and databases
  • Coordinated with business users to design new reporting needs in an appropriate, effective, and efficient way within the existing functionality
  • Worked closely with the business team on automating metrics dashboard generation for the project lifecycle using Excel VBA and Tableau
  • Worked extensively on creating tables, views, and SQL queries in MS SQL Server
  • Worked with internal architects, assisting in the development of current and target-state data architectures
  • Analyzed data by performing Hive queries and running Pig scripts to understand user behavior
  • Used Tableau and SAS analytics to provide end users analytical reports
  • Remained knowledgeable in all areas of business operations to identify system needs and requirements.

Education

Master of Science - Computer Science

University of Central Missouri
Warrensburg, MO
12.2022

Bachelor of Technology - Electronics and Communication Engineering

SRM Institute of Science And Technology
Tamil Nadu
05.2020

Skills

  • Databases: MySQL, Oracle, Snowflake, Teradata, PostgreSQL, DB2, MS SQL Server, HBase
  • Big Data Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Kafka, Flume, Cassandra, Impala, Oozie
  • Big Data Tools: Databricks, Data Lake, HDFS, Hive, Sqoop, MapReduce, Oozie, Linux, PuTTY, Bash shell, Unix, Tableau
  • Cloud Technologies: Microsoft Azure, AWS
  • Programming/Query Languages: SQL, Python, NoSQL, Spark, SAS, PL/SQL, Linux shell scripts
  • Other: ETL development, data warehousing, data modeling, data migration, data governance, NoSQL databases, SQL and databases, data analysis

Certification

Microsoft Azure
