Summary
Overview
Work History
Education
Skills
Timeline

SAI RAM BOLUGODDU

Irving, TX

Summary

  • 8+ years of strong experience in application development using PySpark, Java, Python, Scala, and R, with an in-depth understanding of distributed systems architecture and parallel processing frameworks.
  • Strong experience using PySpark, HDFS, MapReduce, Hive, Pig, Spark, Sqoop, Oozie, and HBase.
  • Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance.
  • Experienced with various Hadoop distributions (Cloudera, Hortonworks, MapR, Amazon EMR) to fully implement and leverage new Hadoop features.
  • Experience in developing Spark applications using the Spark RDD, Spark SQL, and DataFrame APIs.
  • Worked with real-time data processing and streaming using Spark Streaming and Kafka (a sketch follows this summary).
  • Experience in moving data into and out of HDFS and relational database systems (RDBMS) using Apache Sqoop.
  • Expertise in working with the Hive data warehouse infrastructure: creating tables, distributing data through partitioning and bucketing, and developing and tuning HQL queries.
  • Significant experience writing custom UDFs in Hive and custom InputFormats in MapReduce.
  • Involved in creating Hive tables, loading them with data, and writing ad-hoc Hive queries that run internally in MapReduce and Tez.
  • Replaced existing MapReduce jobs and Hive scripts with Spark SQL and Spark data transformations for efficient data processing.
  • Experience developing Kafka producers and consumers for streaming millions of events per second.
  • Strong understanding of the real-time streaming technologies Spark and Kafka.
  • Knowledge of job workflow management and coordination tools such as Oozie.
  • Strong experience building end-to-end data pipelines on the Hadoop platform.
  • Experience working with NoSQL database technologies, including MongoDB, Cassandra, and HBase.
  • Strong understanding of logical and physical database models and entity-relationship modeling.
  • Experience with software development tools such as JIRA, Play, and Git.
  • Good understanding of data modeling concepts (dimensional and relational), including star-schema modeling, snowflake schema modeling, and fact and dimension tables.
  • Experience in manipulating and analyzing large datasets and finding patterns and insights within structured and unstructured data.
  • Strong understanding of Java Virtual Machines and multi-threading.
  • Experience in writing complex SQL queries and creating reports and dashboards.
  • Proficient with Unix-based command-line interfaces; expertise in handling ETL tools such as Informatica.
  • Excellent analytical, communication, and interpersonal skills.
  • Experienced in Agile methodologies, including Extreme Programming, Scrum, and Test-Driven Development (TDD).
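
As an illustration of the Spark Streaming and Kafka experience noted above, here is a minimal PySpark Structured Streaming sketch; the broker address, topic name, and HDFS paths are hypothetical placeholders, not details from any specific engagement.

```python
# Minimal sketch: consume events from Kafka with Spark Structured Streaming
# and land them on HDFS as Parquet. Broker, topic, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-events-ingest").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
    .option("subscribe", "call-events")                  # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
    .select(col("key").cast("string"),
            col("value").cast("string"),
            col("timestamp"))
)

query = (
    events.writeStream.format("parquet")
    .option("path", "hdfs:///data/raw/call_events")       # hypothetical path
    .option("checkpointLocation", "hdfs:///chk/call_events")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```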

Overview

8 years of professional experience

Work History

Senior Data Engineer & Data Scientist

Citi Bank
12.2022 - Current
  • Citibank is a financial services company that enables growth and economic progress
  • Citi is a global bank, an institution connecting millions of people across hundreds of countries and cities
  • Citibank is the consumer division of the financial services multinational Citigroup
  • Citibank provides credit cards, mortgages, personal loans, commercial loans, and lines of credit
  • Work on an analytics team analyzing call routing data carried through third-party companies on the network and the total cost charged
  • Develop reports and analyze costs related to LNP and LCR using PySpark, MongoDB, MySQL, and Azure HDInsight
  • Improved the performance of existing jobs
  • Developed a framework to organize files and maintain logs
  • Create jobs and run them with Jenkins
  • Write PySpark scripts on Azure
  • Designed and implemented an ETL framework to load data from multiple sources into Azure HDInsight
  • Work with Azure data flows to ingest streaming data
  • Utilized Databricks through the web UI and the CLI
  • Utilize the Databricks File System APIs for implementing data ingestion pipelines
  • Worked on batch data at different granularities, ranging from hourly and daily to weekly and monthly
  • Handled Hadoop cluster installations in various environments such as Unix, Linux, and Windows
  • Create and monitor the performance of Azure clusters on Databricks
  • Write SQL, Spark SQL, and TDCH scripts for full and incremental refreshes of Hadoop tables (a sketch follows this list)
  • Optimize Hive queries by parallelizing with partitioning and bucketing
  • Worked on various data formats such as Avro, SequenceFile, JSON, MapFile, Parquet, and ORC
  • Worked extensively on Teradata, Hadoop/Hive, Spark, SQL, and PL/SQL
  • Designed and published visually rich and intuitive StreamSets pipelines to migrate data
  • Experienced in working with SQL, T-SQL, and PL/SQL scripts, views, indexes, stored procedures, and other components of database applications
  • Experienced in working with Hadoop from the Hortonworks Data Platform and running services through Cloudera Manager
  • Used the Agile Scrum methodology (Scrum Alliance) for development
  • Environment: Hadoop, HDFS, Azure, Databricks, Python, Kafka, MapReduce, YARN, Spark, Hive, Scala, MySQL.
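
The incremental-refresh work referenced above can be pictured with the minimal PySpark sketch below; this is not the production framework, and the landing path, table name, key column, and run date are hypothetical placeholders.

```python
# Minimal sketch of an incremental refresh of a Hive table from files landed
# in DBFS. Paths, table names, and the key/partition columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

run_date = "2024-01-15"  # normally supplied by the scheduler (e.g. Jenkins)

# Read only the newly landed slice of raw files for this run.
incoming = spark.read.parquet(f"dbfs:/mnt/landing/lnp_costs/dt={run_date}")

# Light cleanup before loading.
cleaned = (
    incoming
    .dropDuplicates(["call_id"])        # hypothetical key column
    .withColumn("load_dt", lit(run_date))
)

# Append the new slice; a full refresh would use mode("overwrite") instead.
(cleaned.write
    .mode("append")
    .format("parquet")
    .partitionBy("load_dt")
    .saveAsTable("analytics.lnp_costs"))  # hypothetical target table
```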

Senior Data Engineer

Capital Group
02.2021 - 12.2022
  • Capital Group is an American financial services company that manages equities through three investment groups, each making investment and proxy voting decisions independently, and has been singularly focused on delivering superior results for long-term investors using high-conviction portfolios, rigorous research, and individual accountability
  • Today, Capital Group manages more than $1.7 trillion in equity and fixed income assets for millions of individual and institutional investors
  • Developed shell scripts that read JSON files and apply them to Sqoop and Hive
  • Worked with PySpark to migrate fixed-width, ORC, CSV, and other file formats (a sketch follows this list)
  • Designed and implemented an ETL framework to load data from multiple sources into Hive and from Hive into Teradata
  • Utilized Sqoop, ETL, and the Hadoop FileSystem APIs for implementing data ingestion pipelines
  • Worked on batch data at different granularities, ranging from hourly and daily to weekly and monthly
  • Handled Hadoop cluster installations in various environments such as Unix, Linux, and Windows
  • Assisted in upgrading, configuring, and maintaining Hadoop infrastructure components such as Ambari and Spark
  • Worked with StreamSets and developed pipelines using it
  • Developed and wrote SQL and stored procedures in Teradata
  • Loaded data into Snowflake and wrote SnowSQL scripts, as well as TDCH scripts for full and incremental refreshes of Hadoop tables
  • Optimized Hive queries by parallelizing with partitioning and bucketing
  • Worked on various data formats such as Avro, SequenceFile, JSON, MapFile, Parquet, and ORC
  • Worked extensively on Teradata, Hadoop/Hive, Spark, SQL, and PL/SQL
  • Designed and published visually rich and intuitive StreamSets pipelines to migrate data
  • Experienced in working with SQL, T-SQL, and PL/SQL scripts, views, indexes, stored procedures, and other components of database applications
  • Experienced in working with Hadoop from the Hortonworks Data Platform and running services through Cloudera Manager
  • Used the Agile Scrum methodology (Scrum Alliance) for development
  • Environment: Hadoop, HDFS, AWS, Vertica, Scala, Kafka, MapReduce, YARN, Spark, Hive, MySQL, Kerberos, Maven, StreamSets.
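
As an illustration of the fixed-width file migration mentioned above, here is a minimal PySpark sketch; the column layout, offsets, and paths are hypothetical placeholders rather than the actual migration framework.

```python
# Minimal sketch: parse a fixed-width file into columns and write it out as
# ORC for Hive. Column names, offsets/widths, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, substring, trim

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Each line of the input is a single fixed-width record.
raw = spark.read.text("hdfs:///staging/positions.dat")   # hypothetical path

# (column name, 1-based start position, width) -- hypothetical layout
layout = [("account_id", 1, 10), ("trade_date", 11, 8), ("amount", 19, 12)]

parsed = raw.select(
    *[trim(substring(col("value"), start, width)).alias(name)
      for name, start, width in layout]
)

# Write as ORC so Hive (and downstream Teradata loads) can consume it.
(parsed.write
    .mode("overwrite")
    .format("orc")
    .save("hdfs:///warehouse/positions_orc"))   # hypothetical target path
```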

Sr. Hadoop/Data Engineer

Energy Transfer
02.2019 - 01.2021
  • Energy Transfer owns and operates one of the largest and most diversified portfolios of energy assets in North America, with a strategic footprint in all the major U.S. production basins
  • Energy Transfer is a publicly traded limited partnership with core operations that include complementary natural gas midstream, intrastate and interstate transportation and storage assets; crude oil, natural gas liquids (NGL) and refined product transportation and terminalling assets; and NGL fractionation
  • Set up a data lake in Google Cloud using Google Cloud Storage, BigQuery, and Bigtable
  • Planned and designed a data warehouse in a star schema
  • Designed table structures and documented them
  • Developed scripts in BigQuery and connected them to reporting tools
  • Designed and implemented an end-to-end big data platform on a Teradata appliance, performing ETL from multiple sources such as Kafka, NiFi, Teradata, and DB2 using Hadoop and Spark
  • Involved in developing the architecture of the project's data migration solution
  • Developed Python and Bash scripts to automate and provide control flow
  • Moved data from Teradata to a Hadoop cluster using TDCH/FastExport and Apache NiFi
  • Worked with PySpark to perform ETL and generate reports
  • Wrote regression SQL to merge validated data into the production environment
  • Developed Python, PySpark, and Bash scripts to transform and load data across on-premise and cloud platforms
  • Wrote UDFs in PySpark to perform transformations and loads (a sketch follows this list)
  • Used NiFi to load data into HDFS as ORC files
  • Wrote TDCH scripts and Apache NiFi flows to load data from mainframe DB2 into the Hadoop cluster
  • Worked with ORC, Avro, and JSON file formats, creating external tables and querying on top of these files using BigQuery
  • Worked with Google Cloud Storage
  • Researched and developed strategies to minimize cost in Google Cloud
  • Used Apache Solr for search operations on Brazil Walmart data
  • Used Apache Solr for legal document and text search so that users can find related data, creating indexes to find the documents
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala
  • Worked with multiple sources, migrating tables from Teradata and DB2 to the Hadoop cluster
  • Migrated processed, ready tables between Hadoop and Google Cloud Storage using the Aorta framework developed by Walmart Stores Inc
  • Performed source analysis, tracing the data back to its sources and finding its roots through Teradata, DB2, etc.
  • Identified the jobs that load the source tables and documented them
  • Active participant in the Agile Scrum process with two-week sprints
  • Worked with Jira and Microsoft Planner to track project progress.
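
The PySpark UDF work called out above can be illustrated with the minimal sketch below; the normalization rule, table, and column names are hypothetical examples, not the project's actual logic.

```python
# Minimal sketch of a PySpark UDF applied during a transform-and-load step.
# The normalization rule, table, and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

@udf(returnType=StringType())
def normalize_meter_id(raw_id):
    """Strip separators and upper-case a meter identifier."""
    if raw_id is None:
        return None
    return raw_id.replace("-", "").replace(" ", "").upper()

readings = spark.table("staging.meter_readings")   # hypothetical source table

transformed = readings.withColumn(
    "meter_id", normalize_meter_id(col("meter_id"))
)

# Load the transformed data into the curated zone as ORC.
(transformed.write
    .mode("overwrite")
    .format("orc")
    .save("hdfs:///curated/meter_readings"))        # hypothetical target path
```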

Senior Data Engineer

Wipro Limited
01.2016 - 02.2019
  • Wipro Limited (NYSE: WIT, BSE: 507685, NSE: WIPRO) is a leading technology services and consulting company focused on building innovative solutions that address clients’ most complex digital transformation needs
  • Leveraging our holistic portfolio of capabilities in consulting, design, engineering, and operations, we help clients realize their boldest ambitions and build future-ready, sustainable businesses
  • With over 240,000 employees and business partners across 66 countries, we deliver on the promise of helping our customers, colleagues, and communities thrive in an ever-changing world
  • Developed Hive, Bash scripts for source data validation and transformation
  • Automated data loading into HDFS and Hive for pre-processing the data using One Automation
  • Gathered data from data warehouses in Teradata and Snowflake
  • Developed Spark/Scala and Python code for a regular-expression project in the Hadoop/Hive environment
  • Designed and implemented an ETL framework to load data from multiple sources into Hive and from Hive into Teradata
  • Generated reports using Tableau
  • Experience building big data applications using Cassandra and Hadoop
  • Utilized Sqoop, ETL, and the Hadoop FileSystem APIs for implementing data ingestion pipelines
  • Worked on batch data at different granularities, ranging from hourly and daily to weekly and monthly
  • Hands-on experience in Hadoop administration and support activities, installing and configuring Apache big data tools and Hadoop clusters using Cloudera Manager
  • Handled Hadoop cluster installations in various environments such as Unix, Linux, and Windows
  • Assisted in upgrading, configuring, and maintaining Hadoop infrastructure components such as Ambari, Pig, and Hive
  • Developed and wrote SQL and stored procedures in Teradata
  • Loaded data into Snowflake and wrote SnowSQL scripts, as well as TDCH scripts for full and incremental refreshes of Hadoop tables
  • Optimized Hive queries by parallelizing with partitioning and bucketing (a sketch follows this list)
  • Worked on various data formats such as Avro, SequenceFile, JSON, MapFile, Parquet, and ORC
  • Worked extensively on Teradata, Hadoop/Hive, Spark, SQL, PL/SQL, and SnowSQL
  • Designed and published visually rich and intuitive Tableau dashboards and Crystal Reports for executive decision-making
  • Experienced in working with SQL, T-SQL, and PL/SQL scripts, views, indexes, stored procedures, and other components of database applications
  • Experienced in working with Hadoop from the Hortonworks Data Platform and running services through Cloudera Manager
  • Used the Agile Scrum methodology (Scrum Alliance) for development
  • Environment: Hadoop, HDFS, AWS, Vertica, Bash, Scala, Kafka, MapReduce, YARN, Drill, Spark, Pig, Hive, Python, Java, NiFi, HBase, MySQL, Kerberos, Maven, Shell Scripting, SQL
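
As a rough picture of the partitioning-and-bucketing optimization noted above, here is a minimal sketch using the Spark DataFrame writer; the database, table, and column names are hypothetical, and the actual tuning was done against the project's own Hive tables.

```python
# Minimal sketch: lay a table out with partitions and buckets so that date
# filters prune whole partitions and joins on the bucketed key avoid full
# shuffles. Database, table, and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

orders = spark.table("staging.orders")   # hypothetical staging table

(orders.write
    .mode("overwrite")
    .partitionBy("order_date")           # one directory per day for pruning
    .bucketBy(32, "customer_id")         # 32 buckets on the join key
    .sortBy("customer_id")
    .saveAsTable("dw.orders_part"))      # hypothetical target table
```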

Education

Skills

  • C
  • Scala
  • Core Java
  • J2EE (Servlets, JSP, JDBC, JavaBeans, EJB)
  • Bugzilla
  • QuickTest Pro (QTP) 9.2
  • Selenium
  • Quality Center
  • Test Link
  • TWS
  • SPSS
  • SAS
  • Documentum
  • Tableau
  • Mahout
  • Linux (Ubuntu, CentOS)
  • Windows
  • Mac OS
  • MVC
  • Struts
  • Spring
  • Hibernate
  • Agile
  • UML
  • Design Patterns
  • Oracle 11g
  • MS-Access
  • MySQL
  • SQL Server 2000/2005/2008/2012
  • Teradata
  • Eclipse
  • Visual Studio
  • IDLE
  • IntelliJ
  • HTML
  • CSS
  • XML
  • JavaScript
  • Maven
  • UNIX
  • Python
  • R Language
  • Restful
  • SOAP

Timeline

Senior Data Engineer & Data Scientist

Citi Bank
12.2022 - Current

Senior Data Engineer

Capital Group
02.2021 - 12.2022

Sr. Hadoop/Data Engineer

Energy Transfer
02.2019 - 01.2021

Senior Data Engineer

Wipro Limited
01.2016 - 02.2019
