
Sai Nadipineni

Cloud Data Engineer
Dallas, TX

Summary

Experienced Senior Data Engineer with 10+ years in Big Data, Cloud, and distributed systems. Proven expertise in building scalable ETL pipelines, real-time data processing, and cloud-native solutions using Spark, Databricks, Snowflake, and Kafka. Worked with leading clients across retail, healthcare, telecom, and automotive industries, including Nike and General Motors. Adept at leading cross-functional teams and delivering enterprise-grade data solutions.

Overview

11 years of professional experience

Work History

Senior Data Engineer

ELXR Technologies
10.2024 - Current

Client: Nike, Portland, OR

  • Design and build scalable data pipelines using Databricks and Apache Spark.
  • Develop and maintain data warehousing solutions using Snowflake.
  • Utilize Apache Airflow for orchestrating and scheduling ETL workflows.
  • Create interactive data visualizations and dashboards using Tableau.
  • Optimize Python scripts using DataFrames, SQL, DataSets, and RDD/MapReduce in Spark.
  • Enhance data pipelines and ETL processes with Hive, Spark SQL, and PySpark.
  • Configure and performance-tune Amazon EMR clusters for ETL processes.
  • Implement data quality checks and ensure compliance with governance protocols.
  • Integrate data from various sources into data warehouses and data lakes.
  • Monitor data pipeline performance and troubleshoot issues to maintain operations.
  • Automate repetitive tasks using scripting languages to improve efficiency.
  • Ensure data security, privacy, and compliance with relevant regulations.
  • Collaborate with cross-functional teams to deliver effective data solutions.
  • Document data workflows and maintain pipeline configurations for scalability.
  • Environment: Spark, Databricks, Snowflake, Airflow, Scala, Kafka, GIT, Map Reduce, HDFS, SparkSQL, Mac OS, Linux, Jenkins, Jira, Agile.

Data Engineer II

ImageVision.ai
05.2021 - 10.2024

Client: Nike, Portland, OR

  • Design, architect, and develop innovative data solutions to meet the unique needs of the business.
  • Comprehend business requirements and make crucial decisions to design and implement solutions in collaboration with team members.
  • Collaborate closely with external teams and business clients to address issues, devising solutions that amplify customer experience and contribute to the organization’s growth.
  • Contribute to requirement-gathering efforts and translate requirements into meticulous design documents, laying the foundation for successful data solutions.
  • Take responsibility for understanding system requirements and translating user stories into code using the most relevant frameworks and technologies.
  • Ensure the seamless deployment of crafted solutions onto servers, rigorously subjecting them to unit and regression tests, and promptly resolving any issues along the way.
  • Develop technical and user documentation for applications, as well as product documentation and training materials for different functionalities.
  • Lead and mentor a team of junior data engineers, overseeing their projects from inception to completion, ensuring quality design and implementation.
  • Engage in the full spectrum of tasks encompassing design, development, testing, deployment, and day-to-day operational monitoring, all vital for maintaining a robust Data Engineering platform.
  • Implement and enforce data governance policies, including data lineage tracking, auditing, and access controls. Ensure that data engineering solutions comply with regulatory requirements and industry standards to maintain data integrity and security.
  • Proactively monitor system performance and identify optimization opportunities. Apply techniques like query optimization, data partitioning, and resource allocation adjustments to enhance the efficiency and speed of data processing workflows.
  • Environment: Spark, Databricks, Snowflake, Airflow, Scala, Kafka, GIT, Map Reduce, HDFS, SparkSQL, Mac OS, Linux, Jenkins, Jira, Agile.

Technical Manager

Advansoft International Ltd.
12.2018 - 04.2021
  • Guided the full lifecycle of a Big Data (Hadoop) solution, including requirements analysis, technical architecture design, application design and development, testing, and deployment.
  • Strategized Hadoop cluster maintenance activities, including commissioning and decommissioning nodes and performing upgrades.
  • Contributed to the team's effort in analyzing complex data sets and troubleshooting failed jobs.
  • Involved in data validations and quality control checks.
  • Adhered to the Agile process using the Scrum methodology.
  • Developed and implemented real-time data pipelines with Spark Streaming, Kafka, and Cassandra to replace existing lambda architecture without losing the fault-tolerant capabilities of the existing architecture.
  • Performed various benchmarking steps to optimize the performance of Spark jobs and thus improve the overall processing.
  • Developed documentation for new and existing programs and designed specific enhancements to applications.
  • Collaborated with the business/IT stakeholders and other involved teams to understand requirements and build data pipelines.
  • Coordinated with the testing team for bug fixes and created documentation for recorded data, agent usage and release cycle notes.
  • Environment: Hadoop, Spark, Scala, Kafka, GIT, Map Reduce, HDFS, SparkSQL, Mac OS, Linux, Jenkins, Jira, Agile.

Software Developer

Vintech Solutions Inc.
07.2014 - 10.2018

Client: 84.51° – Cincinnati, OH

  • Designing and developing data loading strategies and data transformations that enable the business to analyze large datasets in various formats.
  • Analyzing requirements and user needs to design, develop, and test the required software.
  • Optimizing the interface between traditional and custom databases and complex analytics so it runs efficiently over vast quantities of data.
  • Creating real-time workflows in Apache NiFi that pull JSON data from various sources into Hadoop through the Kafka consumer API, as well as batch workflows that pull data from an SFTP server into HDFS.
  • Implementing Spark Structured Streaming jobs in Scala that consume data from Kafka in real time, apply the necessary transformations and data models, and persist the results to HDFS.
  • Discovering, creating, designing and leading the implementations of new algorithms using functional and object-oriented programming languages.
  • Working closely with cloud infrastructure teams to troubleshoot complex issues.
  • Developing creative methods to address new and complex problems.
  • Performing data analysis as required by client projects using Hive and SparkSQL.
  • Integrating Kerberos and OpenSSL certificates with the Hadoop cluster to harden it against unauthorized access.
  • Implementing project using agile software development methodology with the help of JIRA ticketing tool and Scrum meetings.
  • Strategizing about storing, retrieving, analyzing, and manipulating huge customer datasets for the development and execution of software per customer requirements.
  • Familiar with real-time data streaming and analytics tools such as Kafka, NiFi, and Spark.
  • Developing and directing software framework testing, validation, and security procedures.
  • Providing technical input and recommending software upgrades for clients' existing projects and frameworks.
  • Environment: Spark, PySpark, Spark-structured-streaming, Apache NiFi, Shell Scripting, SparkSQL, Kafka, Git, Jenkins, Google cloud.


Client: IQVIA – Durham, NC

  • Developed end-to-end scalable distributed data pipelines with Apache Spark using Scala, from receiving data via the Kafka distributed messaging system through persisting the data into HDFS.
  • Used Spark Structured Streaming to consume data from Kafka in real time, apply the necessary transformations and data models, and persist the results into HDFS.
  • Used Spark SQL to process huge amounts of structured data and implemented Spark RDD transformations, actions, DataFrames, and case classes for the required input data.
  • Developed Kafka consumer APIs in Scala for consuming data from Kafka topics.
  • Implemented partitioning, dynamic partitioning, and bucketing in Hive.
  • Created a real-time workflow using Apache NiFi that pulls JSON data from Twitter into Hadoop through the Kafka consumer API.
  • Also created batch data ingestion workflows, such as pulling data from an SFTP server into HDFS using Apache NiFi.
  • Experienced in using different file formats such as Avro, Parquet, and ORC.
  • Set up and maintained the continuous integration server and live production server using Jenkins.
  • Developed various shell scripts.
  • Strong problem-solving experience testing, identifying, debugging, and resolving technical issues that affect the integrity of the application.
  • Worked collaboratively with all levels of business stakeholders to architect, implement, and test Big Data analytical solutions from disparate sources.
  • Experience with CDH5 distribution and Cloudera Manager to manage and monitor Hadoop clusters.
  • Experience in managing and reviewing Hadoop log files.
  • Integrated Kerberos with the Hadoop cluster to harden it against unauthorized access.
  • Experience using Jira for bug tracking and Bitbucket to check in and check out code changes.
  • Implemented the project using Agile methodology and attended daily Scrum meetings.
  • Administration knowledge of creating NiFi and Kafka clusters and securing them using Cloudera Manager and Java OpenSSL certificates.
  • Environment: CDH5, HDFS, MapReduce, Hive, Spark, Spark-structured-streaming, Apache NiFi, Shell Scripting, SparkSQL, Kafka, Bit-bucket, Jenkins.


Client: Charter Communications – Denver, CO

  • Developed a real-time streaming data pipeline using Kafka and ZooKeeper that arranges events into topics based on meta-rules.
  • Developed Apache Spark pipelines in Scala for data processing, data validation and data integrity.
  • Worked on Spark streaming using Scala based API to collect streaming data of live Kafka events.
  • Persisted Spark output to HDFS in Parquet format, where it is stored for queries from downstream reporting systems using Spark SQL (RDDs and DataFrames).
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Developed Hive user-defined functions in Java, compiled them into JARs, added them to HDFS, and executed them with Hive queries.
  • Used Hive for transformations, event joins, and pre-aggregations before storing the data in HDFS.
  • Worked closely with AWS EC2 infrastructure teams to troubleshoot complex issues.
  • Developed custom mappers in Python scripts and Hive UDFs and UDAFs based on the given requirements.
  • Used Maven and Jenkins for compiling, building, and packaging the applications.
  • Created ETL/Talend jobs, both design and code, to process data into target databases.
  • Used Azkaban to automate job flows and ZooKeeper for coordination in the cluster.
  • Processed ad-hoc business requests to load data into the production database using Talend jobs.
  • Environment: Hadoop, Map Reduce, HDFS, Mesos, Spark, Scala, GIT, Hive, Druid, SparkSQL, CentOS, Mac OS, Linux, Jenkins, Maven, puppet.


Client: Cerner Corporation – Kansas City, MO

  • Good experience with the NoSQL databases HBase and MongoDB.
  • Strong experience in developing, debugging, and tuning MapReduce jobs in a Hadoop environment.
  • Experience in writing ETL jobs using Pig Latin and HiveQL.
  • Responsible for guiding the full lifecycle of a Big Data (Hadoop) solution, including requirements analysis, technical architecture design, application design and development, testing, and deployment.
  • Experienced in analyzing the Cassandra database and comparing it with other open-source NoSQL databases such as MongoDB and HBase to determine which best suits the current requirements.
  • Wrote, tested, and ran MapReduce pipelines using Apache Crunch.
  • Created MapReduce pipelines composed of many user-defined functions using Apache Crunch.
  • Performed joins and data aggregation using Apache Crunch.
  • Developed multi-core CPU pipeline applications to analyze large data sets.
  • Created custom MapReduce programs using Hadoop for big data processing.
  • Environment: Hadoop, Map Reduce, HDFS, HBase, Crunch, GIT, Hive, Solr, REST, Cloudera CDH4, Mac OS, Linux, Jenkins, Maven, Chef.


Client: General Motors (On-Star) – Detroit, MI

  • As a Big Data Developer, implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, MapReduce Frameworks, HBase, Hive, Oozie and Sqoop on cluster scaling from 4 nodes in development environment to 12 nodes in pre-production (pilot) stage and up to 24 nodes in production.
  • Involved in complete Implementation lifecycle, specialized in writing custom MapReduce, Pig and Hive programs.
  • Designed and Implemented real-time Big Data processing to enable real-time analytics, event detection and notification for Data-in-Motion using IBM Infosphere Streams and Flume.
  • Hands-on experience with IBM Big Data product offerings such as IBM InfoSphere BigInsights, IBM InfoSphere Streams, IBM BigSQL.
  • Developed software to process, cleanse, and report on vehicle data utilizing various analytics and REST APIs, using languages such as Java (Spring), Scala, and Akka (an asynchronous programming framework).
  • Involved in developing an Asset Tracking project that collects real-time vehicle location data from a JMS queue using IBM Streams and processes it for vehicle tracking using ESRI GIS mapping software, Scala, and the Akka actor model.
  • Involved in developing web-services using REST, HBase Native API and BigSQL Client to query data from HBase.
  • Experienced in developing Hive queries in the BigSQL client for various use cases.
  • Involved in developing a few shell scripts and automating them using the cron job scheduler.
  • Experienced in analyzing the Cassandra database and comparing it with other open-source NoSQL databases such as MongoDB and HBase to determine which best suits the current requirements.
  • Environment: Hadoop 1x, Hive 0.10, Pig 0.11, Sqoop, HBase, UNIX Shell Scripting, Scala, Akka, IBM InfoSphere BigInsights, IBM InfoSphere Streams, IBM BigSQL, Java.



Education

Software Engineering

University of Houston – Clear Lake
Houston, TX
05.2014

Electronics and Communication Engineering

Koneru Lakshmaiah College of Engineering
05.2012

Skills

Big Data/Hadoop Ecosystem: Spark, Databricks, Kafka, SparkSQL, NiFi, HDFS, Airflow, Snowflake, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Oozie

Java / J2EE Technologies: Core Java, JDBC, XML, REST, SOAP

Programming Languages: C, Java, Scala, SQL, PL/SQL, Linux shell scripts

NoSQL Databases: Cassandra, HBase

Database: Oracle 11g/10g, DB2

Web Technologies: HTML, XML, JDBC, REST

Frameworks: MVC, Hibernate 3, Spring 3

Tools Used: Eclipse, IntelliJ, Git, PuTTY, WinSCP

Operating System: Ubuntu (Linux), Windows, Mac OS

Testing: Hadoop Testing, Hive Testing, Quality Center (QC)

Monitoring and Reporting tools: Ganglia, Nagios, Custom Shell scripts

Publications

“Speech Enhancement Using a Recursive Filter,” International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, www.ijera.com, Vol. 2, Issue 4, June-July 2012, pp. 631-635.

Timeline

Senior Data Engineer

ELXR Technologies
10.2024 - Current

Data Engineer II

ImageVision.ai
05.2021 - 10.2024

Technical Manager

Advansoft International Ltd.
12.2018 - 04.2021

Software Developer

Vintech Solutions Inc.
07.2014 - 10.2018

Software Engineering

University of Houston – Clear Lake

Electronics and Communication Engineering

Koneru Lakshmaiah College of Engineering