Summary

Overview

Work History

Education

Skills

Timeline

SAI RAM BOLUGODDU

Irving,TX

Summary

8+ years of strong experience in Application Development using Pyspark,Java, Python, Scala and R & in depth understanding of Distributed Systems Architecture and Parallel Processing Frameworks. Strong experience using pyspark,HDFS, MapReduce, Hive, Pig, Spark, Sqoop, Oozie, and HBase. Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance. Experienced working with various Hadoop Distributions (Cloudera, Hortonworks, MapR, Amazon EMR) to fully implement and leverage new Hadoop features. Experience in developing Spark Applications using Spark RDD, Spark-SQL and Dataframe APIs. Worked with real-time data processing and streaming techniques using Spark streaming and Kafka. Experience in moving data into and out of the HDFS and Relational Database Systems (RDBMS) using Apache Sqoop. Expertise in working with HIVE data warehouse infrastructure-creating tables, data distribution by implementing Partitioning and Bucketing, developing and tuning the HQL queries. Significant experience writing custom UDFs in Hive and custom Input Formats in MapReduce. Involved in creating Hive tables, loading with data and writing Hive ad-hoc queries that will run internally in MapReduce and TEZ, Replaced existing MR jobs and Hive scripts with Spark SQL & Spark data transformations for efficient data processing, Experience developing Kafka producers and Kafka Consumers for streaming millions of events per second on streaming data. Strong understanding of real time streaming technologies Spark and Kafka. Knowledge of job work flow management and coordinating tools like Oozie. Strong experience building end to end data pipelines on Hadoop platform. Experience working with NoSQL database technologies, including MongoDB, Cassandra and HBase. Strong understanding of Logical and Physical database models and entity-relationship modeling. Experience with Software development tools such as JIRA, Play, GIT. Good understanding of the Data modeling (Dimensional & Relational) concepts like Star-Schema Modeling, Snowflake Schema Modeling, Fact and Dimension tables. Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data. Strong understanding of Java Virtual Machines and multi-threading process. Experience in writing complex SQL queries, creating reports and dashboards. Proficient in using Unix based Command Line Interface, Expertise in handling ETL tools like Informatica. Excellent analytical, communication and interpersonal skills. Experienced in using Agile methodologies including extreme programming, SCRUM and Test-Driven Development (TDD).

Overview

years of professional experience

Work History

Senior Data Engineer & Data Scientist

Citi Bank

12.2022 - Current

Citi Bank is a financial service that enables growth and economic progress
Citi, the global bank an institution connecting millions of people across hundreds of countries and cities
Citibank is the consumer division of the financial services multinational Citigroup
Citibank provides credit cards, mortgages, personal loans, commercial loans, and lines of credit
I work for an Analytical team to analyze call routing data through Third party companies on the network and cost charged in total
My job is to develop reports and analyze costs related to LNP, LCR using Pyspark , mongo db, mySQL and Azure cloud HDInsight
I worked on improve performance of the existing jobs
Develop a framework to organize the files and maintain logs
Create jobs and run with jenkins
Work with pySpark to scripts on Azure
Designed and implemented an ETL framework to load data from multiple sources into Azure HDinsight
Work with azure data flow to ingest streaming data
Utilized Databricks through web and CLI
Utilize Databricks File System API’s for implementing data ingestion pipelines
Worked on Batch data of different granularity ranging from hourly, Daily to weekly and monthly
Handled Hadoop cluster installations in various environments such as Unix, Linux and Windows Create and monitor the performance of Azure clusters on databricks
Write SQLs scripts and sparkSQL scripts TDCH scripts for full and incremental refresh of Hadoop tables
Optimizing Hive queries by parallelizing with portioning and bucketing
Worked on various data formats like AVRO, Sequence File, JSON, Map File, Parquet and ORC
Worked extensively on Teradata, Hadoop-Hive, Spark, SQLs, PLSQLs Designed and published visually rich and intuitive Stream sets pipelines to migrate data Experienced in working with SQL, T-SQL, PL/SQL scripts, views, indexes, stored procedures, and other components of database applications Experienced in working with Hadoop from Horton works Data Platform and running services through Cloudera manager Used Agile Scrum methodology/ Scrum Alliance for development Environment: Hadoop, HDFS, Azure, Databricks, Python, Kafka, MapReduce, YARN, Spark, Hive, Scala, MySQL.

Senior Data Engineer

Capital Group

02.2021 - 12.2022

Capital Group is an American financial services company and manage equities through three investment groups that make investment and proxy voting decisions independently has been singularly focused on delivering superior results for long-term investors using high-conviction portfolios, rigorous research, and individual accountability
Today, Capital Group manages more than $1.7 trillion in equity and fixed income assets for millions of individual and institutional investors
Develop Shell script that reads Json files and apply it to Sqoop and Hive
Work with pySpark to migrate Fixed width, ORC, csv etc files
Designed and implemented an ETL framework to load data from multiple sources into Hive and from Hive into Teradata
Utilized SQOOP, ETL and Hadoop File System API’s for implementing data ingestion pipelines
Worked on Batch data of different granularity ranging from hourly, Daily to weekly and monthly
Handled Hadoop cluster installations in various environments such as Unix, Linux and Windows Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Ambari, spark and Work with stream sets and develop pipelines using streamsets
Developing and writing SQLs and stored procedures in Teradata
Loading data into snow flake and writing Snow SQLs scripts TDCH scripts for full and incremental refresh of Hadoop tables
Optimizing Hive queries by parallelizing with portioning and bucketing
Worked on various data formats like AVRO, Sequence File, JSON, Map File, Parquet and ORC
Worked extensively on Teradata, Hadoop-Hive, Spark, SQLs, PLSQLs Designed and published visually rich and intuitive Stream sets pipelines to migrate data Experienced in working with SQL, T-SQL, PL/SQL scripts, views, indexes, stored procedures, and other components of database applications Experienced in working with Hadoop from Horton works Data Platform and running services through Cloudera manager Used Agile Scrum methodology/ Scrum Alliance for development Environment: Hadoop, HDFS, AWS, Vertica, Scala, Kafka, MapReduce, YARN, Spark, Hive, Scala, MySQL, Kerberos, Maven, Stream sets.

Sr. Hadoop/Data Engineer

Energy Transfer

02.2019 - 01.2021

Energy Transfer owns and operates one of the largest and most diversified portfolios of energy assets in North America, with a strategic footprint in all the major U.S
Production basins
Energy Transfer is a publicly traded limited partnership with core operations that include complementary natural gas midstream, intrastate and interstate transportation and storage assets; crude oil, natural gas liquids (NGL) and refined product transportation and terminal ling assets; and NGL fractionation
Setting up Data Lake in google cloud using Google cloud storage, Big Query, and Big Table
Planning and design of data warehouse in STAR schema
Designing structure of tables and documenting it
Developing scripts in Big Query and connecting it to reporting tools
Designed and implemented end to end big data platform on Teradata Appliance Performing ETL from multiple sources such as Kafka, NIFI, Teradata, DB2 Using Hadoop spark
Involvement developing architecture solution of the project to migrate data
Developed Python, Bash scripts to automate and provide Control flow
Moving data from Teradata to a Hadoop cluster Using TDCH/Fast export and Apache NIFI
Work with Pyspark to perform ETL and generate reports
Writing regression SQL to merge the validated Data into Prod environment
Develop Python, PySpark , Bash scripts logs to Transform, and Load data across on premise and cloud platform
Write UDFs in Hadoop Pyspark to perform transformations and loads
Use NIFI to load data into HDFS as ORC files
Writing TDCH scripts and apache NIFI to load data from Mainframes DB2 to Hadoop cluster
Working with ORC, AVRO and Json file formats
And create external tables and query on top of these files Using Big Query
Working with google cloud storage
Research and development of strategies to minimize the cost in google cloud
Using Apache solar for search operations on Brazil Walmart data
Apache slor to find legal documents and text search so that viewers can find related data Creating indexes to findf the documents
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala
Working with multiple sources
Migrating tables from Teradata and DB2 to Hadoop cluster
Migrating processed ready tables from Hadoop and Google Cloud Storage using Aorta Framework developed by Walmart store Inc
Source Analysis, Tracing back the sources of the data and finding its roots though Teradata, DB2 etc
Identifying the jobs that load the source tables and documenting it
Being an active part of Agile Scrum process with Sprints of 2 weeks
Working With Jira, Microsoft planner to track the progress of the project.

Senior Data Engineer

Wipro limited

01.2016 - 02.2019

Wipro Limited (NYSE: WIT, BSE: 507685, NSE: WIPRO) is a leading technology services and consulting company focused on building innovative solutions that address clients’ most complex digital transformation needs
Leveraging our holistic portfolio of capabilities in consulting, design, engineering, and operations, we help clients realize their boldest ambitions and build future-ready, sustainable businesses
With over 240,000 employees and business partners across 66 countries, we deliver on the promise of helping our customers, colleagues, and communities, to thrive in an ever-changing world
Developed Hive, Bash scripts for source data validation and transformation
Automated data loading into HDFS and Hive for pre-processing the data using One Automation
Gather data from Data warehouses in Teradata and Snowflake Developed Spark/Scala, Python for regular expression project in the Hadoop/Hive environment
Designed and implemented an ETL framework to load data from multiple sources into Hive and from Hive into Teradata
Generate reports using Tableau
Experience at building Big Data applications using Cassandra and Hadoop Utilized SQOOP, ETL and Hadoop Filesystem APIs for implementing data ingestion pipelines Worked on Batch data of different granularity ranging from hourly, daily to weekly and monthly
Hands on experience in Hadoop administration and support activities for installations and configuring Apache Big Data Tools and Hadoop clusters using Cloudera Manager Handled Hadoop cluster installations in various environments such as Unix, Linux and Windows Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Ambari, PIG, and Hive
Developing and writing SQLs and stored procedures in Teradata
Loading data into snow flake and writing Snow SQLs scripts TDCH scripts for full and incremental refresh of Hadoop tables
Optimizing Hive queries by parallelizing with partitioning and bucketing
Worked on various data formats like AVRO, Sequence File, JSON, Map File, Parquet and ORC
Worked extensively on Teradata, Hadoop-Hive, Spark, SQLs, PLSQLs, Snow SQLs Designed and published visually rich and intuitive Tableau dashboards and crystal reports for executive decision making Experienced in working with SQL, T-SQL, PL/SQL scripts, views, indexes, stored procedures, and other components of database applications Experienced in working with Hadoop from Horton works Data Platform and running services through Cloudera manager Used Agile Scrum methodology/ Scrum Alliance for development Environment: Hadoop, HDFS, AWS, Vertica, Bash, Scala, Kafka, MapReduce, YARN, Drill, Spark, Pig, Hive, Scala, Python, Java, NiFi, HBase, MySQL, Kerberos, Maven, Shell Scripting, SQL

Education

Skills

C
Scala
Core Java
J2EE (SERVLETS, JSP, JDBC, JAVA BEANS, EJB)
Bugzilla
QuickTestPro (QTP) 92
Selenium
Quality Center
Test Link
TWS
SPSS
SAS
Documentum
Tableau
Mahout
Linux (Ubuntu, CentOS)
Windows
Mac OS
MVC
Struts
Spring
Hibernate
Agile

UML
Design Patterns
Oracle 11g
MS-Access
MySQL
SQL-Server 2000/2005/2008/2012
Teradata
Eclipse
Visual Studio
IDLE
IntelliJ
HTML
CSS
XML
JavaScript
Maven
Java Script
UNIX
Python
R Language
Restful
SOAP

Timeline

Senior Data Engineer & Data Scientist

Citi Bank

12.2022 - Current

Senior Data Engineer

Capital Group

02.2021 - 12.2022

Sr. Hadoop/Data Engineer

Energy Transfer

02.2019 - 01.2021

Senior Data Engineer

Wipro limited

01.2016 - 02.2019

SAI RAM BOLUGODDU

Summary

Overview

Work History

Senior Data Engineer & Data Scientist

Senior Data Engineer

Sr. Hadoop/Data Engineer

Senior Data Engineer

Education

Skills

Timeline

Senior Data Engineer & Data Scientist

Senior Data Engineer

Sr. Hadoop/Data Engineer

Senior Data Engineer

Similar Profiles

Senthilkumar SSenthilkumar S

null null

Briana SaenzBriana Saenz

Terri A. ThorntonTerri A. Thornton

Elizabeth WhiteElizabeth White