
Srinath Potlapalli

Portland, OR

Summary

Big Data Engineer with 7+ years of experience designing, developing, and maintaining enterprise analytical solutions using big data technologies. Proven data and business expertise in the Retail, Finance, and Healthcare domains. Proficient in choosing and evaluating the right technologies for building data pipelines across ingestion, curation, and consumption for both batch and streaming use cases in cloud and on-prem environments, including 5+ years of experience as a developer using big data technologies such as AWS, Databricks/Spark, Azure, and the Hadoop ecosystem.

Overview

9 years of professional experience
1 Certification

Work History

Senior Data Engineer

Nike
Beaverton, OR
02.2022 - Current
  • Worked with the business team and SMEs across departments to gather requirements and help them with analytical solutions
  • Played a vital role in the design and development of a common architecture for retail data across geographies
  • Created ETL workflows to extract data from different sources, transform it into a standardized format, and load it into a data warehouse for downstream analysis
  • Developed a common framework using Spark to ingest data from different sources (Teradata to S3, S3 to Snowflake, etc.)
  • Developed reusable Spark scripts and functions for data processing that can be leveraged across data pipelines
  • Built ETL pipelines on Snowflake; the resulting data products are used by stakeholders for querying and serve as backend objects for visualizations
  • Worked on performance tuning of Spark jobs by adjusting memory parameters and cluster configuration
  • Implemented data ingestion processes from various data sources such as databases, APIs, and streaming platforms into Amazon S3
  • Migrated data from on-premises Teradata systems to AWS S3 buckets
  • Involved in a Databricks migration project, with a good understanding of Spark architecture on Databricks
  • Designed end-to-end ETL workflows within Databricks, integrating various data sources and destinations
  • Responsible for estimating the cluster size, monitoring, and troubleshooting the Spark jobs using the Databricks cluster
  • Integrated Databricks with cloud storage services such as AWS S3 or Azure Blob Storage for data storage and retrieval
  • Worked on transforming data in the Databricks platform to Parquet format for efficient storage
  • Provisioned and configured Databricks clusters with the appropriate instance types, node configurations, and worker counts to optimize performance
  • Installed and managed necessary libraries and dependencies for Python and Spark development within the Databricks environment
  • Utilized Databricks notebooks to explore and analyze large-scale datasets using Pyspark and Python
  • Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns
  • Facilitated collaborative development by reviewing and merging code changes using the Databricks collaboration feature
  • Implemented Databricks Delta Lake for data versioning, ACID transactions, and incremental data processing
  • Used Airflow to trigger Databricks jobs
  • Conducted training sessions for team members to familiarize them with Databricks features and best practices
  • Used Sqoop to ingest data from an Oracle database and store it on S3
  • Ingested data from JSON and CSV files using Spark on EMR and stored the output in Parquet format on S3
  • Excellent knowledge of AWS services (S3, EMR, Athena, EC2), Snowflake, and big data technologies
  • Exported data into Snowflake by creating staging tables to load files of different formats from Amazon S3
  • Worked on ingesting real-time data using Kafka
  • Created data pipelines used for business reports and processed streaming data on an on-premises Kafka cluster
  • Created separate Kafka topics for reading the data
  • Used Spark Streaming APIs to perform the necessary transformations and actions on data received from Kafka, as in the sketch following this list
  • Processed data from Kafka topics and surfaced the real-time streams in dashboards
  • Designed and developed Azure Analysis Services (AAS) and SSAS cubes for data visualization
  • Used visualization tools such as Power View for Excel and Tableau for visualizing and generating reports
  • Used Airflow for scheduling and orchestration of the data pipelines
  • Developed BMX Jenkins pipelines to deploy the source code
  • Provided knowledge transition to the support team.
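
A minimal PySpark sketch of the Kafka-to-dashboard flow described above, shown here with Structured Streaming rather than the older DStream API; the broker address, topic name, event schema, and S3 paths are placeholders, since the actual configuration is not given in this resume:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("retail-stream-sketch").getOrCreate()

    # Hypothetical schema for the retail events; the real schema is not shown here.
    event_schema = StructType([
        StructField("store_id", StringType()),
        StructField("sku", StringType()),
        StructField("amount", DoubleType()),
    ])

    # Read raw events from a Kafka topic (broker and topic are placeholders).
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "retail-events")
           .load())

    # Parse the JSON payload and keep only the parsed columns.
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), event_schema).alias("e"))
              .select("e.*"))

    # Write the stream out as Parquet on S3 for downstream dashboards.
    (events.writeStream
           .format("parquet")
           .option("path", "s3://example-bucket/retail/events/")
           .option("checkpointLocation", "s3://example-bucket/retail/_checkpoints/")
           .start())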

Data Engineer

Capital One
Richmond, VA
06.2021 - 02.2022
  • Collaborated with business analysts and SMEs across departments to gather business requirements and identify workable items for further development
  • Interacted with business analysts to translate new business requirements into technical specifications
  • Worked closely with the business team in the UAT process and assisted them with all data-related questions
  • Designed and implemented data storage solutions using Azure services such as Azure SQL Database, Azure Cosmos DB, and Azure Data Lake Storage
  • Developed and maintained data pipelines using Azure Data Factory and Azure Databricks
  • Ingested data from different databases to Azure Data Storage (Azure Data Lake, Azure Storage, Azure SQL) and processed the data in Azure Databricks
  • Created ETL pipelines moving data from Azure Data Storage to Snowflake using Azure Data Factory
  • Developed ETL jobs using PySpark, with both the DataFrame API and the Spark SQL API
  • Performed various Spark transformations and actions in Azure Databricks; the resulting data is saved back to ADLS and from there loaded into the target Snowflake database (a minimal sketch follows this list)
  • Optimized data processing and storage for performance and cost efficiency
  • Automated monthly data purge processes through Azure Data Lake and Azure Data Factory, resulting in decreased storage costs
  • Designed Azure cloud architecture and implementation plans for hosting complex application workloads on Microsoft Azure
  • Created monitors, alarms, and notifications for EC2 hosts using CloudWatch
  • Updated CloudFormation templates with IAM roles for S3 bucket access, security groups, subnet IDs, EC2 instance types, ports, and AWS tags
  • Worked with various AWS Components such as EC2, S3, IAM, VPC, RDS, Route 53, SQS, and SNS
  • Designed several DAGs (Directed Acyclic Graphs) for automating ETL pipelines.
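
A minimal Azure Databricks ETL sketch of the ADLS transformation flow above, using both the DataFrame API and Spark SQL; the storage account, container, and column names are placeholders, and the Databricks-provided SparkSession is assumed to be available as spark. The Snowflake load happens downstream (e.g., via Azure Data Factory):

    from pyspark.sql import functions as F

    # Placeholder ADLS paths (abfss://<container>@<account>.dfs.core.windows.net/...).
    src = "abfss://raw@examplestorage.dfs.core.windows.net/transactions/"
    dst = "abfss://curated@examplestorage.dfs.core.windows.net/transactions_daily/"

    # DataFrame API: read, filter out bad rows, and derive a date column.
    txn = (spark.read.parquet(src)
           .where(F.col("amount").isNotNull())
           .withColumn("txn_date", F.to_date("txn_ts")))

    # Spark SQL API: aggregate daily totals from a temporary view.
    txn.createOrReplaceTempView("txn")
    daily = spark.sql("""
        SELECT txn_date, account_id, SUM(amount) AS total_amount
        FROM txn
        GROUP BY txn_date, account_id
    """)

    # Persist the curated output back to ADLS for the downstream Snowflake load.
    daily.write.mode("overwrite").partitionBy("txn_date").parquet(dst)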

Data Engineer

Nike
Beaverton, OR
08.2018 - 05.2021
  • Worked on designing and developing five different data flows: point of sale, store traffic, labor, customer survey, and audit data
  • Worked on implementing complex transformations within Spark and Hive for forecasting Nike’s future demand plan using multiple datasets (sales, planning, product, etc.) within the data lake
  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration
  • Troubleshot many cloud-related issues such as DataNode outages, network failures, and missing data blocks
  • Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment, with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores, for data access and analysis
  • Ran Hadoop streaming jobs to process terabytes of XML-format data
  • Tuned Spark applications, including setting the right batch interval time
  • Worked with various AWS Components such as EC2, S3, IAM, VPC, RDS, Route 53, SQS, and SNS
  • Created EC2 instances and managed the volumes using EBS
  • Used the Perforce version control system and the Python Tools for Visual Studio (PTVS) IDE
  • Updated CloudFormation templates with IAM roles for S3 bucket access, security groups, subnet IDs, EC2 instance types, ports, and AWS tags
  • Created monitors, alarms, and notifications for EC2 hosts using CloudWatch
  • Reviewed and updated security group firewall settings on AWS
  • Worked on multiple instances, managing the Elastic Load Balancer, Auto Scaling, and security groups to design a fault-tolerant and highly available system
  • Loaded data from large data files into Hive tables and HBase NoSQL databases, as in the sketch following this list
  • Proficient with Snowflake architecture and concepts
  • Hands-on experience developing front-end web applications
  • Established front-end web application structure and UI layout using HTML, CSS, and JavaScript
  • Created S3 bucket policies and IAM role-based policies and customized the JSON templates
  • Wrote shell scripts and successfully migrated data from on-prem systems to AWS EMR (S3).
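
A minimal PySpark sketch of loading large data files into a Hive table, as referenced above; the bucket, database, table, and partition column names are placeholders, and the HBase load is omitted:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-load-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Read the landing-zone CSV files (an explicit schema is preferable in production).
    pos = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("s3://example-bucket/landing/pos/"))

    # Write into a partitioned, Hive-managed table for downstream Hive queries.
    (pos.write
        .mode("append")
        .partitionBy("business_date")
        .saveAsTable("retail_db.pos_transactions"))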

Data Engineer

Capital Group
Los Angeles, CA
12.2017 - 07.2018
  • Extensively involved in business and functional requirement analysis, understanding source systems thoroughly and creating the design process flow used for the standard big data implementation
  • Participated in code/design analysis, strategy development, and project planning
  • Provided technical support for Data Scientists and Data Analysts
  • Coordinated with the offshore team for deployments of the code in lower environments and production
  • Developed Spark scripts using DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation, queries, and writing data back into the OLTP system
  • Worked on ingesting, parsing, and loading data from CSV and JSON files using Hive and Spark
  • Worked on Docker to build Docker images
  • Facilitated deployment of a multi-clustered environment using AWS EC2 and EMR apart from deploying Dockers for cross-functional deployment
  • Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS
  • Developed Spark jobs using Python (PySpark) for faster real-time analytics and used Spark SQL for querying
  • Used Python 3.x (NumPy, SciPy, pandas, scikit-learn, seaborn) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes, as in the sketch following this list
  • Worked on web crawling to ingest data from different sources and save it in Azure Data Lake
  • Worked with Hadoop infrastructure to store data in HDFS and used Spark/Hive SQL to migrate the underlying SQL codebase to AWS
  • Used Autosys for scheduling and orchestrating the workflows of different data pipelines
  • Experienced with Agile and Scrum methodologies
  • Involved in designing, creating, and managing Continuous Build and Integration Environments.
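
A minimal scikit-learn sketch of the kind of model development referenced above; the input file, feature columns, and target column are hypothetical:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Hypothetical feature extract produced by the upstream Spark pipelines.
    df = pd.read_csv("learner_features.csv")
    X = df[["sessions", "avg_spend", "tenure_days"]]
    y = df["churned"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Simple scaled logistic-regression pipeline evaluated on a holdout set.
    model = Pipeline([
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    model.fit(X_train, y_train)
    print("holdout accuracy:", accuracy_score(y_test, model.predict(X_test)))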

Hadoop Developer

Optum
Minnesota, MN
08.2017 - 12.2017
  • Reviewed source-to-transformation mapping (STM) documents to understand the functionality and requirements, and was involved in gathering business requirements
  • Gained hands-on experience designing, developing, and maintaining software solutions in the Hadoop cluster
  • Worked on analyzing the Hadoop cluster and different big data analytics tools, including Hive and Sqoop
  • Developed a data pipeline using Sqoop and MapReduce to ingest current and historical data into the data staging area
  • Responsible for defining data flow in the Hadoop ecosystem to different teams
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and Spark on YARN
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python, as in the sketch following this list
  • Involved in using Sqoop for importing and exporting data between RDBMS and HDFS
  • Used Hive to analyze the Partitioned and Bucketed data and compute various metrics for reporting
  • Generated custom SQL to verify dependencies for the daily, weekly, and monthly jobs
  • Tested different data pipelines to ensure the highest data quality
  • Worked on building aggregate tables using Hive and Spark that feed dashboards to satisfy different business KPIs
  • Used visualization tools such as Power View for Excel to generate reports
  • Unit tested the deliverables and prepared the test result summary
  • Used Autosys and Oozie for scheduling Hive, Pig, Spark, and MapReduce jobs
  • Used the Autosys job scheduler for end-to-end data processing pipelines and workflow scheduling.
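
A minimal sketch of converting a Hive/SQL query into the equivalent Spark DataFrame transformations, as referenced above; the database, table, and column names are placeholders:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-to-spark-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Original Hive/SQL formulation of the metric.
    claims_sql = spark.sql("""
        SELECT claim_month, provider_id, COUNT(*) AS claim_count
        FROM claims_db.claims
        WHERE claim_status = 'APPROVED'
        GROUP BY claim_month, provider_id
    """)

    # Equivalent DataFrame-API formulation of the same aggregation.
    claims_df = (spark.table("claims_db.claims")
                 .where(F.col("claim_status") == "APPROVED")
                 .groupBy("claim_month", "provider_id")
                 .agg(F.count("*").alias("claim_count")))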

ETL Developer

Net Technologies
India
06.2015 - 06.2016
  • Involved in requirement analysis, design, development, testing, and documentation
  • Assisted in building the ETL source to Target specification documents by understanding the business requirements
  • Created Technical design documents to list the extract, transform, and load techniques and business rules
  • Created tables, indexes, views, sequences, synonyms, tablespaces, nested tables, and database links using SQL and PL/SQL
  • Built a data warehouse by integrating data marts using complex queries
  • Gained hands-on experience with ETL tools (SSIS, Informatica, DataStage)
  • Using Informatica Designer, developed mappings that populated data into the target
  • Wrote several Teradata SQL Queries using Teradata SQL Assistant for Ad Hoc Data Pull requests
  • Extensively worked on converting Oracle scripts into Teradata scripts
  • Used GitHub for version control with colleagues
  • Automated the Informatica jobs using UNIX shell scripting.

Education

Bachelor of Technology - Mechanical Engineering

Vellore Institute of Technology

Master of Science - Technology

University of Central Missouri

Skills

  • Hadoop
  • AWS
  • Azure
  • Databricks
  • Spark
  • Kafka
  • Cloudera
  • Shell
  • Python
  • Snowflake
  • Power BI
  • Tableau
  • Azure Cubes
  • Data Modeling
  • API Development
  • Data Warehousing
  • Machine Learning
  • NoSQL Databases
  • Big Data Technologies
  • SQL and Databases
  • Data Analysis
  • Relational Databases

Personal Information

Title: Sr. Data Engineer

Certification

  • AWS Associate certification
  • AWS Professional certification

Timeline

Senior Data Engineer

Nike
02.2022 - Current

Data Engineer

Capital One
06.2021 - 02.2022

Data Engineer

Nike
08.2018 - 05.2021

Data Engineer

Capital Group
12.2017 - 07.2018

Hadoop Developer

Optum
08.2017 - 12.2017

ETL Developer

Net Technologies
06.2015 - 06.2016

Bachelor of Technology - Mechanical Engineering

Vellore Institute of Technology

Master of Science - Technology

University of Central Missouri