
Srinath Potlapalli

Portland, OR

Summary

Big Data Engineer with 7+ years of experience designing, developing, and maintaining enterprise analytical solutions using big data technologies. Proven data and business expertise in the Retail, Finance, and Healthcare domains. Proficient in choosing and evaluating the right technologies for building data pipelines across ingestion, curation, and consumption for both batch and streaming use cases in cloud and on-prem environments, including 5+ years of experience as a developer using big data technologies such as AWS, Databricks/Spark, Azure, and the Hadoop ecosystem.

Overview

9 years of professional experience
1 Certification

Work History

Senior Data Engineer

Nike
Beaverton, OR
02.2022 - Current
  • Worked with the business team and SMEs across departments to gather requirements and help them with analytical solutions
  • Played a vital role in the design and development of a common architecture for retail data across geographies
  • Created ETL workflows to extract data from different sources, transform it into a standardized format, and load it into a data warehouse for downstream analysis
  • Developed a common framework using Spark to ingest data from different sources (Teradata to S3, S3 to Snowflake, etc.)
  • Developed reusable Spark scripts and functions for data processing that can be leveraged across data pipelines
  • Built ETL pipelines on Snowflake; the resulting data products are used by stakeholders for querying and serve as backend objects for visualizations
  • Worked on performance tuning of Spark jobs by adjusting memory parameters and cluster configuration
  • Implemented data ingestion processes from various data sources such as databases, APIs, and streaming platforms into Amazon S3
  • Migrated data from on-premises Teradata systems to AWS S3 buckets
  • Involved in a Databricks migration project, with a good understanding of Spark architecture on Databricks
  • Designed end-to-end ETL workflows within Databricks, integrating various data sources and destinations
  • Responsible for estimating the cluster size, monitoring, and troubleshooting the Spark jobs using the Databricks cluster
  • Integrated Databricks with cloud storage services such as AWS S3 or Azure Blob Storage for data storage and retrieval
  • Worked on transforming data in the Databricks platform to Parquet format for efficient storage
  • Provisioned and configured Databricks clusters with the appropriate instance types, node configurations, and worker counts to optimize performance
  • Installed and managed necessary libraries and dependencies for Python and Spark development within the Databricks environment
  • Utilized Databricks notebooks to explore and analyze large-scale datasets using Pyspark and Python
  • Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns
  • Facilitated collaborative development by reviewing and merging code changes using the Databricks collaboration feature
  • Implemented Databricks Delta Lake for data versioning, ACID transactions, and incremental data processing
  • Used Airflow to trigger Databricks jobs
  • Conducted training sessions for team members to familiarize them with Databricks features and best practices
  • Used Sqoop to ingest data from an Oracle database and store it on S3
  • Ingested data from JSON and CSV files using Spark on EMR and stored the output in Parquet format on S3
  • Excellent knowledge of AWS services (S3, EMR, Athena, EC2), Snowflake, and big data technologies
  • Exported data into Snowflake by creating staging tables to load files of different formats from Amazon S3
  • Worked on ingesting real-time data using Kafka
  • Created data pipelines used for business reports and processed streaming data on an on-premises Kafka cluster
  • Created separate Kafka topics for reading the data
  • Used Spark Streaming APIs to perform the necessary transformations and actions on data received from Kafka, as in the sketch following this list
  • Processed data from Kafka topics and surfaced the real-time streams in dashboards
  • Designed and developed Azure Analysis Services (AAS) and SSAS cubes for data visualization
  • Used visualization tools such as Power View for Excel and Tableau for visualizing and generating reports
  • Used Airflow for scheduling and orchestration of the data pipelines
  • Developed BMX Jenkins pipelines to deploy the source code
  • Provided knowledge transition to the support team.
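
A minimal PySpark sketch of the Kafka-to-dashboard flow described above, shown here with Structured Streaming rather than the older DStream API; the broker address, topic name, event schema, and S3 paths are placeholders, since the actual configuration is not given in this resume:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("retail-stream-sketch").getOrCreate()

    # Hypothetical schema for the retail events; the real schema is not shown here.
    event_schema = StructType([
        StructField("store_id", StringType()),
        StructField("sku", StringType()),
        StructField("amount", DoubleType()),
    ])

    # Read raw events from a Kafka topic (broker and topic are placeholders).
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "retail-events")
           .load())

    # Parse the JSON payload and keep only the parsed columns.
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), event_schema).alias("e"))
              .select("e.*"))

    # Write the stream out as Parquet on S3 for downstream dashboards.
    (events.writeStream
           .format("parquet")
           .option("path", "s3://example-bucket/retail/events/")
           .option("checkpointLocation", "s3://example-bucket/retail/_checkpoints/")
           .start())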

Data Engineer

Capital One
Richmond, VA
06.2021 - 02.2022
  • Collaborated with business analysts and SMEs across departments to gather business requirements and identify workable items for further development
  • Interacted with business analysts to translate new business requirements into technical specifications
  • Worked closely with the business team in the UAT process and assisted them with all data-related questions
  • Designed and implemented data storage solutions using Azure services such as Azure SQL Database, Azure Cosmos DB, and Azure Data Lake Storage
  • Developed and maintained data pipelines using Azure Data Factory and Azure Databricks
  • Ingested data from different databases to Azure Data Storage (Azure Data Lake, Azure Storage, Azure SQL) and processed the data in Azure Databricks
  • Created ETL pipelines moving data from Azure Data Storage to Snowflake using Azure Data Factory
  • Developed ETL jobs using PySpark, with both the DataFrame API and the Spark SQL API
  • Performed various Spark transformations and actions in Azure Databricks; the resulting data is saved back to ADLS and from there loaded into the target Snowflake database (a minimal sketch follows this list)
  • Optimized data processing and storage for performance and cost efficiency
  • Automated monthly data purge processes through Azure Data Lake and Azure Data Factory, resulting in decreased storage costs
  • Designed Azure cloud architecture and implementation plans for hosting complex application workloads on Microsoft Azure
  • Created monitors, alarms, and notifications for EC2 hosts using CloudWatch
  • Updated CloudFormation templates with IAM roles for S3 bucket access, security groups, subnet IDs, EC2 instance types, ports, and AWS tags
  • Worked with various AWS Components such as EC2, S3, IAM, VPC, RDS, Route 53, SQS, and SNS
  • Designed several DAGs (Directed Acyclic Graphs) for automating ETL pipelines.
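
A minimal Azure Databricks ETL sketch of the ADLS transformation flow above, using both the DataFrame API and Spark SQL; the storage account, container, and column names are placeholders, and the Databricks-provided SparkSession is assumed to be available as spark. The Snowflake load happens downstream (e.g., via Azure Data Factory):

    from pyspark.sql import functions as F

    # Placeholder ADLS paths (abfss://<container>@<account>.dfs.core.windows.net/...).
    src = "abfss://raw@examplestorage.dfs.core.windows.net/transactions/"
    dst = "abfss://curated@examplestorage.dfs.core.windows.net/transactions_daily/"

    # DataFrame API: read, filter out bad rows, and derive a date column.
    txn = (spark.read.parquet(src)
           .where(F.col("amount").isNotNull())
           .withColumn("txn_date", F.to_date("txn_ts")))

    # Spark SQL API: aggregate daily totals from a temporary view.
    txn.createOrReplaceTempView("txn")
    daily = spark.sql("""
        SELECT txn_date, account_id, SUM(amount) AS total_amount
        FROM txn
        GROUP BY txn_date, account_id
    """)

    # Persist the curated output back to ADLS for the downstream Snowflake load.
    daily.write.mode("overwrite").partitionBy("txn_date").parquet(dst)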

Data Engineer

Nike
Beaverton, OR
08.2018 - 05.2021
  • Worked on designing and developing five different data flows: point of sale, store traffic, labor, customer survey, and audit data
  • Worked on implementing complex transformations within Spark and Hive for forecasting Nike’s future demand plan using multiple datasets (sales, planning, product, etc.) within the data lake
  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration
  • Troubleshot many cloud-related issues such as DataNode outages, network failures, and missing data blocks
  • Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment, with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores, for data access and analysis
  • Ran Hadoop streaming jobs to process terabytes of XML-format data
  • Tuned Spark applications, including setting the right batch interval time
  • Worked with various AWS Components such as EC2, S3, IAM, VPC, RDS, Route 53, SQS, and SNS
  • Created EC2 instances and managed the volumes using EBS
  • Used the Perforce version control system and the Python Tools for Visual Studio (PTVS) IDE
  • Updated CloudFormation templates with IAM roles for S3 bucket access, security groups, subnet IDs, EC2 instance types, ports, and AWS tags
  • Created monitors, alarms, and notifications for EC2 hosts using CloudWatch
  • Reviewed and updated security group firewall settings on AWS
  • Worked on multiple instances, managing the Elastic Load Balancer, Auto Scaling, and security groups to design a fault-tolerant and highly available system
  • Loaded data from large data files into Hive tables and HBase NoSQL databases, as in the sketch following this list
  • Proficient with Snowflake architecture and concepts
  • Hands-on experience developing front-end web applications
  • Established front-end web application structure and UI layout using HTML, CSS, and JavaScript
  • Created S3 bucket policies and IAM role-based policies and customized the JSON templates
  • Wrote shell scripts and successfully migrated data from on-prem systems to AWS EMR (S3).
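
A minimal PySpark sketch of loading large data files into a Hive table, as referenced above; the bucket, database, table, and partition column names are placeholders, and the HBase load is omitted:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-load-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Read the landing-zone CSV files (an explicit schema is preferable in production).
    pos = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("s3://example-bucket/landing/pos/"))

    # Write into a partitioned, Hive-managed table for downstream Hive queries.
    (pos.write
        .mode("append")
        .partitionBy("business_date")
        .saveAsTable("retail_db.pos_transactions"))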

Data Engineer

Capital Group
Los Angeles, CA
12.2017 - 07.2018
  • Extensively involved in business and functional requirement analysis, understanding source systems thoroughly and creating the design process flow used for the standard big data implementation
  • Participated in code/design analysis, strategy development, and project planning
  • Provided technical support for Data Scientists and Data Analysts
  • Coordinated with the offshore team for deployments of the code in lower environments and production
  • Developed Spark scripts using DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation, queries, and writing data back into the OLTP system
  • Worked on ingesting, parsing, and loading data from CSV and JSON files using Hive and Spark
  • Worked on Docker to build Docker images
  • Facilitated deployment of a multi-clustered environment using AWS EC2 and EMR apart from deploying Dockers for cross-functional deployment
  • Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS
  • Developed Spark jobs using Python (PySpark) for faster real-time analytics and used Spark SQL for querying
  • Used Python 3.x (NumPy, SciPy, pandas, scikit-learn, seaborn) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes, as in the sketch following this list
  • Worked on web crawling to ingest data from different sources and save it in Azure Data Lake
  • Worked with Hadoop infrastructure to store data in HDFS and used Spark/Hive SQL to migrate the underlying SQL codebase to AWS
  • Used Autosys for scheduling and orchestrating the workflows of different data pipelines
  • Experienced with Agile and Scrum methodologies
  • Involved in designing, creating, and managing Continuous Build and Integration Environments.
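
A minimal scikit-learn sketch of the kind of model development referenced above; the input file, feature columns, and target column are hypothetical:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Hypothetical feature extract produced by the upstream Spark pipelines.
    df = pd.read_csv("learner_features.csv")
    X = df[["sessions", "avg_spend", "tenure_days"]]
    y = df["churned"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Simple scaled logistic-regression pipeline evaluated on a holdout set.
    model = Pipeline([
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    model.fit(X_train, y_train)
    print("holdout accuracy:", accuracy_score(y_test, model.predict(X_test)))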

Hadoop Developer

Optum
Minnesota, MN
08.2017 - 12.2017
  • Reviewed source-to-transformation mapping (STM) documents to understand the functionality and requirements, and was involved in gathering business requirements
  • Gained hands-on experience designing, developing, and maintaining software solutions in the Hadoop cluster
  • Worked on analyzing the Hadoop cluster and different big data analytics tools, including Hive and Sqoop
  • Developed a data pipeline using Sqoop and MapReduce to ingest current and historical data into the data staging area
  • Responsible for defining data flow in the Hadoop ecosystem to different teams
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and Spark on YARN
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python, as in the sketch following this list
  • Involved in using Sqoop for importing and exporting data between RDBMS and HDFS
  • Used Hive to analyze the Partitioned and Bucketed data and compute various metrics for reporting
  • Generated custom SQL to verify dependencies for the daily, weekly, and monthly jobs
  • Tested different data pipelines to ensure the highest data quality
  • Worked on building aggregate tables using Hive and Spark that feed dashboards to satisfy different business KPIs
  • Used visualization tools such as Power View for Excel to generate reports
  • Unit tested the deliverables and prepared the test result summary
  • Used Autosys and Oozie for scheduling Hive, Pig, Spark, and MapReduce jobs
  • Used the Autosys job scheduler for end-to-end data processing pipelines and workflow scheduling.
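
A minimal sketch of converting a Hive/SQL query into the equivalent Spark DataFrame transformations, as referenced above; the database, table, and column names are placeholders:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-to-spark-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Original Hive/SQL formulation of the metric.
    claims_sql = spark.sql("""
        SELECT claim_month, provider_id, COUNT(*) AS claim_count
        FROM claims_db.claims
        WHERE claim_status = 'APPROVED'
        GROUP BY claim_month, provider_id
    """)

    # Equivalent DataFrame-API formulation of the same aggregation.
    claims_df = (spark.table("claims_db.claims")
                 .where(F.col("claim_status") == "APPROVED")
                 .groupBy("claim_month", "provider_id")
                 .agg(F.count("*").alias("claim_count")))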

ETL Developer

Net Technologies
India
06.2015 - 06.2016
  • Involved in requirement analysis, design, development, testing, and documentation
  • Assisted in building the ETL source to Target specification documents by understanding the business requirements
  • Created Technical design documents to list the extract, transform, and load techniques and business rules
  • Created tables, indexes, views, sequences, synonyms, tablespaces, nested tables, and database links using SQL and PL/SQL
  • Built a data warehouse by integrating data marts using complex queries
  • Gained hands-on experience with ETL tools (SSIS, Informatica, DataStage)
  • Using Informatica Designer, developed mappings that populated data into the target
  • Wrote several Teradata SQL Queries using Teradata SQL Assistant for Ad Hoc Data Pull requests
  • Extensively worked on converting Oracle scripts into Teradata scripts
  • Used GitHub for version control with colleagues
  • Automated the Informatica jobs using UNIX shell scripting.

Education

Bachelor of Technology - Mechanical Engineering

Vellore Institute of Technology

Master of Science - Technology

University of Central Missouri

Skills

  • Hadoop
  • AWS
  • Azure
  • Databricks
  • Spark
  • Kafka
  • Cloudera
  • Shell
  • Python
  • Snowflake
  • Power BI
  • Tableau
  • Azure Cubes
  • Data Modeling
  • API Development
  • Data Warehousing
  • Machine Learning
  • NoSQL Databases
  • Big Data Technologies
  • SQL and Databases
  • Data Analysis
  • Relational Databases

Personal Information

Title: Sr. Data Engineer

Certification

  • AWS Associate certification
  • AWS Professional certification

Timeline

Senior Data Engineer

Nike
02.2022 - Current

Data Engineer

Capital One
06.2021 - 02.2022

Data Engineer

Nike
08.2018 - 05.2021

Data Engineer

Capital Group
12.2017 - 07.2018

Hadoop Developer

Optum
08.2017 - 12.2017

ETL Developer

Net Technologies
06.2015 - 06.2016

Bachelor of Technology - Mechanical Engineering

Vellore Institute of Technology

Master of Science - Technology

University of Central Missouri