
Gowtham Reddy

Summary

  • 10+ years of software development experience in application development/architecture and data analytics, specializing in web, client-server and big data applications built with Java and big data technologies; expertise in Java, Scala, Python, Spark, PySpark and Hadoop MapReduce across industries including banking, insurance, education and cloud, plus 3+ years of experience with AWS, Kafka, Elasticsearch, DevOps and Linux administration.
  • Strong big data analytics experience using Hadoop ecosystem tools: MapReduce, HDFS, Apache Spark, JupyterLab, AWS EMR, Jira and Glue.
  • Site Reliability Engineering responsibilities for a Kafka platform that scales to 2 GB/sec and 20 million messages/sec.
  • Hands-on cloud platform experience with a broad range of AWS services, including EMR, Glue, Redshift, SWF, S3, IAM, SQS, SNS, CloudFormation, CloudWatch and DynamoDB.
  • Experienced with Spark, improving the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Developed big data applications using Apache Spark and Apache Hadoop, handling 200 million records at the smallest scale.
  • Hands-on experience developing Spark applications using RDD transformations, Spark Core, Spark DataFrames, Spark set operations and Spark SQL.
  • Excellent programming skills at a high level of abstraction in Java and Python.
  • Experience using DStreams, accumulators, broadcast variables and RDD caching for Spark, and tools such as JIRA, Rally and Remedy.
  • Working knowledge of Amazon Elastic Compute Cloud (EC2) for computational tasks and Simple Storage Service (S3) for storage.

Overview

11 years of professional experience
2 years of post-secondary education

Work History

Sr. Data Engineer

Best Buy
Philadelphia, Pennsylvania
01.2024 - Current
  • Leveraged the microservices architecture to build large and complex projects
  • Wrote Spark jobs using Java APIs to read data from S3 and perform MapReduce operations on those datasets for different use cases on the AWS EBS snapshot data (a PySpark sketch of this read-and-aggregate pattern follows this section)
  • Uploaded and processed more than 10 terabytes of data from various structured and unstructured sources into HDFS
  • Integrated multiple AWS services (AWS EMR and EMR migrations, Glue, Redshift, SWF, S3) using Apache Spark and PySpark, coding in Java, to deliver a final product that solves complex business use cases
  • Led the migration of two AWS EMR services, reducing operating costs by $2.3M per service
  • Leveraged AWS Glue to execute serverless ad-hoc jobs that analyze snapshot data, and monitored the corresponding logs using CloudWatch
  • Designed and implemented topics in a new Kafka cluster across all environments
  • Secured the Kafka cluster with Kerberos and implemented Kafka security features using SSL for clusters without Kerberos
  • Used Entity Framework, ADO.NET and LINQ for data access against SQL Server
  • For finer-grained security, set up Kerberos users and groups to enable more advanced security features
  • Aggregated data from multiple file formats including Parquet, Avro, XML, JSON and CSV, and compressed formats using codecs like Zip, Snappy and Deflate
  • Worked with ADO.NET components (SQL Connection, SQL Command, Data Reader, Data Adapter, Data Set and Data View) to communicate with the database
  • Migrated EMR jobs from Hadoop to Spark, reducing runtime by around 25% on average across all region scales and thereby reducing the instances required to meet the target SLA
  • Explored storage optimization opportunities for data stored in S3 and implemented solutions that reduced the storage footprint by around 3.4 exabytes (a 27% reduction) annually
  • Led efforts on various security campaigns to ensure access to critical resources is logged and secured
  • Designed and provisioned the platform architecture to execute Hadoop and machine learning use cases on cloud infrastructure (AWS, EMR migrations and S3)
  • Wrote queries to perform COPY and UNLOAD operations on AWS Redshift data
  • Built self-service data pipelines using AWS services such as SNS, Step Functions, Lambda, Glue, EMR, EC2, Athena, QuickSight and Redshift
  • Presented design ideas to solve complex business problems in an innovative way
  • Mentored multiple interns in designing and developing prototypes to solve some major problems related to handling data at scales growing exponentially
  • Developed and deployed various Lambda functions in AWS with in-built AWS Lambda Libraries
  • Expert understanding of AWS DNS services through Route 53
  • Understanding of Simple, Weighted, Latency, Failover and Geolocation routing policies
  • Prepared projects, dashboards, reports and questions for all JIRA-related services
  • Responsible for day-to-day metrics generation and monitoring, handling scaling issues and code bugs to keep system availability as high as possible
  • Well versed with Rally and Jira
  • Owned multiple services, with responsibilities ranging from builds to deployment through pipelines across beta, gamma and prod environments
  • Implemented various performance and durability improvements in services that manage storage and cost-calculation operations for billions of snapshots, making them more correct, faster and more durable
  • Performed 24x7 on-call duties to detect issues in the services and resolved them quickly and proactively, minimizing impact and containing the blast radius
  • Technologies Used: Apache Hadoop, HDFS, PySpark, Spark, MapReduce, Java, Python, JSON, AWS EMR, migration, Glue, Kafka, Redshift, Jira, Step Functions, Aurora, ADO.NET, DynamoDB, CloudFormation, CloudWatch, SNS, Lambda, SQS, ETL, VPC, Subnets, Pipelines, Shell scripting, JupyterHub, SSH, Security Initiatives
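
The following is a minimal, illustrative PySpark sketch of the S3 read-and-aggregate pattern referenced above; the bucket names, paths and column names (account_id, size_bytes, snapshot_id) are hypothetical placeholders rather than actual project resources.

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  # Spark session; on EMR the s3:// scheme is resolved through EMRFS.
  spark = SparkSession.builder.appName("snapshot-usage-aggregation").getOrCreate()

  # Read a day's worth of snapshot records stored as Parquet in S3 (placeholder bucket).
  snapshots = spark.read.parquet("s3://example-snapshot-bucket/snapshots/dt=2024-01-01/")

  # Aggregate stored bytes and snapshot counts per account as a stand-in use case.
  usage = (
      snapshots.groupBy("account_id")
      .agg(
          F.sum("size_bytes").alias("total_bytes"),
          F.count("snapshot_id").alias("snapshot_count"),
      )
  )

  # Write the aggregated result back to S3 for downstream reporting (placeholder bucket).
  usage.write.mode("overwrite").parquet("s3://example-output-bucket/usage/dt=2024-01-01/")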

Data Engineer

Walmart
Arkansas
01.2022 - 12.2023
  • Understood requirements, built code and guided other developers during development activities to deliver high-standard, stable code within the limits of Confidential and client processes, standards and guidelines
  • Developed Informatica mappings based on client requirements and the needs of the analytics team
  • Performed end-to-end system integration testing
  • Involved in functional testing and regression testing
  • Reviewed and wrote SQL scripts to verify data from source systems to targets
  • Worked on transformations to prepare the data required by the analytics team for visualization and business decisions
  • Reviewed plans and provided feedback on gaps, timelines and execution feasibility as required by the project
  • Participated in knowledge transfer (KT) sessions with the customer and other business teams and provided feedback on requirements
  • Involved in migrating the client data warehouse architecture from on-premises to the Azure cloud
  • Created pipelines in ADF using linked services to extract, transform and load data from multiple sources such as Azure SQL, Blob Storage and Azure SQL Data Warehouse
  • Created storage accounts as part of the end-to-end environment for running jobs
  • Implemented Azure Data Factory operations and deployments for moving data from on-premises into the cloud
  • Designed data auditing and data masking for security purposes
  • Monitored end-to-end integration using Azure Monitor
  • Utilized Tableau capabilities such as data extracts, data blending, forecasting, dashboard actions and table calculations
  • Implemented data movement from on-premises to the cloud in Azure
  • Developed batch processing solutions using Data Factory and Azure Databricks (a minimal PySpark sketch follows this section)
  • Implemented Azure Databricks clusters, notebooks, jobs and autoscaling
  • Designed data encryption for data at rest and in transit
  • Designed relational and non-relational data stores on Azure
  • Prepared ETL test strategies, designs and test plans to execute test cases for ETL and BI systems
  • Designed and developed ETL workflows and datasets in Alteryx
  • Created ETL test scenarios, test cases and plans to execute test cases
  • Interacted with business users to understand their requirements
  • Good understanding of data warehouse concepts
  • Good exposure and understanding of Hadoop Ecosystem
  • Proficient in SQL and other relational databases
  • Good exposure to Microsoft Power BI
  • Managed data privacy and security in Power BI
  • Extensively involved in designing and developing the Power BI data model, using multiple DAX expressions to build calculated columns and calculated measures
  • Good understanding and working knowledge of Python language
  • Technologies Used: SQL Database, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Synapse workspace, Synapse SQL pool, Power BI, Python
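
A minimal sketch of the kind of Databricks batch transform described above, assuming existing ADLS mounts at /mnt/raw and /mnt/curated; the dataset, paths and column names are hypothetical.

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  # On Databricks a SparkSession already exists; getOrCreate() simply reuses it.
  spark = SparkSession.builder.appName("daily-batch-transform").getOrCreate()

  # Read raw CSV files landed in the data lake (placeholder mount point and path).
  raw = (
      spark.read.option("header", "true")
      .option("inferSchema", "true")
      .csv("/mnt/raw/sales/2023-12-01/")
  )

  # Basic cleansing plus a simple aggregate for the downstream reporting layer.
  cleaned = raw.dropDuplicates(["order_id"]).filter(F.col("amount") > 0)
  daily_totals = cleaned.groupBy("store_id").agg(F.sum("amount").alias("daily_total"))

  # Persist the curated output back to the lake (placeholder path).
  daily_totals.write.mode("overwrite").parquet("/mnt/curated/sales_daily/2023-12-01/")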

Data Engineer

Connecticut
09.2017 - 12.2021
  • Designed a service for calculating customer costs of AWS Snapshot usage in an incremental way at various scales
  • Developed prototypes using newer technologies like GraphX and GraphFrames and techniques like sharding to solve scaling issues in traditional systems
  • Installed a Kerberos-secured Kafka cluster (without encryption) in the Dev and Prod environments
  • Set up Kafka ACLs on the cluster
  • Worked on the Kafka backup index, minimized logs via the Log4j appender, and pointed Ambari server logs to NAS storage
  • Set up an unauthenticated Kafka listener in parallel with the Kerberos (SASL) listener and tested anonymous users alongside Kerberos users (a client-side configuration sketch follows this section)
  • Implemented a solution using AWS EMR, Scala, Apache Spark and PySpark handling millions of S3 objects, finishing the job within the required SLA of 15-20 hrs daily
  • Experience in managing and reviewing Hadoop log files
  • Installed Ranger in all environments as a second level of security for the Kafka broker
  • Responsible for building scalable distributed data solutions using Hadoop, Scala, Pyspark and Spark
  • Implemented workflows using SWF to orchestrate the scheduling of daily jobs run as a set of activities
  • Created external tables with proper partitions for efficiency and loaded the structured data produced by MapReduce jobs into HDFS
  • Wrote activities that unload data from the Redshift data warehouse and spawn EMR jobs to meter customer snapshot usage
  • Installed Kafka Manager for monitoring consumer lag and Kafka metrics; also used it for adding topics, partitions, etc.
  • Contributed to code documentation with all the experimental data, research and design workflow for the end-to-end system
  • Technologies Used: Apache GraphX, GraphFrames, RDD, Dataset, Scala, PySpark, Spark, Kafka, migration, Hadoop, AWS EMR, SWF (Simple Workflow Service), Redshift, Java, Parquet, Deflate
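
A client-side sketch of connecting to the two listeners mentioned above, using the confluent-kafka Python client; the broker addresses, ports, principal and keytab path are placeholders, and the broker-side listener configuration is not shown.

  from confluent_kafka import Producer

  # Kerberos (SASL/GSSAPI) secured listener; all values below are placeholders.
  kerberos_conf = {
      "bootstrap.servers": "broker1:9093,broker2:9093",
      "security.protocol": "SASL_PLAINTEXT",
      "sasl.mechanism": "GSSAPI",
      "sasl.kerberos.service.name": "kafka",
      "sasl.kerberos.principal": "appuser@EXAMPLE.COM",
      "sasl.kerberos.keytab": "/etc/security/keytabs/appuser.keytab",
  }

  # The parallel unauthenticated listener only needs the plaintext port.
  anonymous_conf = {"bootstrap.servers": "broker1:9092,broker2:9092"}

  # Produce a test message through the Kerberos listener.
  producer = Producer(kerberos_conf)
  producer.produce("test-topic", value=b"hello from the Kerberos listener")
  producer.flush()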

Data Engineer

Aetna
06.2015 - 11.2016
  • Examined transaction data, identified outliers, inconsistencies and manipulated data to ensure data quality and integration
  • Developed data pipelines using ETL, Spark, PySpark and Hive to ingest, transform and analyze operational data for Aetna's health care insurance management system (a Spark SQL sketch of this transform step follows this section)
  • Used SparkSQL with Scala for creating data frames and performed transformations on data frames
  • Responsible for leading scrum calls and gathering and understanding client requirements
  • Handled daily deployments of changes into the pipeline to reflect changes into the prod environment
  • Developed Oracle PL/SQL stored procedures, functions, packages, PySpark and SQL scripts
  • Loaded and transformed large sets of structured and semi-structured data
  • Implemented solutions using Hadoop
  • Wrote complex SQL queries and stored procedures, to extract relevant customer data, that was to be displayed on the portal in a structured format
  • Worked directly with the client to understand business needs and provide a convenient portal for customers to file an insurance claim, using ASP.NET MVC along with PySpark, JavaScript, jQuery and Ajax, coding in C#
  • Designed, developed, tested, deployed, maintained and improved data integration pipeline objects built with Apache Spark / PySpark / Python
  • Developed an admin portal to support Aetna's internal operations, such as adding content to web pages and presenting new data to users
  • Worked directly with system end users to understand the manual operations and pain points in their day-to-day finance operations
  • Supported the extraction, transformation and load (ETL) process for a data warehouse from legacy systems using Informatica
  • Designed and delivered a desktop application using WPF to automate the tedious, time-consuming and error-prone manual processing of user data in Excel
  • Ultimately helped the client save around 10 hours of employee work per week
  • Technologies Used: Apache Spark, Hadoop, MapReduce, Spark SQL, JavaScript, Python, PL/SQL, Stored Procedures, WPF, C#, PySpark, REST API, ETL/BI, MS SQL Server, Oracle 10g, Excel, Shell scripting, Build & Deployment, Scrum
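
A minimal Spark SQL sketch of the DataFrame-based transform step in the pipeline above, assuming Hive-backed staging tables; the table and column names (staging.claims_raw, member_id, approved_amount, claim_status) are hypothetical.

  from pyspark.sql import SparkSession

  # enableHiveSupport lets Spark SQL read the Hive tables the pipeline ingests into.
  spark = (
      SparkSession.builder.appName("claims-transform")
      .enableHiveSupport()
      .getOrCreate()
  )

  # Expose the raw claims table (placeholder name) as a temporary view.
  claims = spark.table("staging.claims_raw")
  claims.createOrReplaceTempView("claims")

  # Example transformation: total approved amount per member, returned as a DataFrame.
  member_totals = spark.sql("""
      SELECT member_id,
             SUM(approved_amount) AS total_approved
      FROM claims
      WHERE claim_status = 'APPROVED'
      GROUP BY member_id
  """)

  # Persist the result for the analytics team (placeholder table name).
  member_totals.write.mode("overwrite").saveAsTable("analytics.member_claim_totals")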

Data Engineer

Win It Solutions
06.2014 - 05.2015
  • Worked as Data Analyst for requirements gathering, business analysis and project coordination
  • Worked with other data analysis teams to gather data profiling information
  • Responsible for the analysis of business requirements and the design and implementation of the business solution
  • Using SQL, performed data analysis and data validation from multiple data sources of T-Mobile and Orange customer databases to build a single optimized process for combined customer base
  • Wrote complex SQL, PL/SQL, Procedures, Functions, and Packages to validate data and testing process
  • Worked with multiple development teams to integrate, upgrade, and monitor multiple Orange tools to streamline porting process
  • Analyzed, improved and created documentation for multiple processes for customers' account changes, which improved the customer loss rate by 7%
  • Increased customer insights and conversations by 36% by developing root-cause analysis reports from previous customer conversations and associated CRM data
  • Performed A/B testing on recently developed and integrated tools using training and testing datasets
  • Experience with Agile and Waterfall methodologies
  • Set up and ran effective recurring status meetings with the product, portfolio and delivery managers
  • Identified and remediated vulnerabilities in specific applications used by various teams

Skills

  • TECHNICAL SKILLS:
  • Programming Languages / Frameworks: Java, Scala, Python, C, C#, JavaScript, .NET
  • Apache: Spark (RDD, DataFrames, Spark SQL, GraphX, GraphFrames, Spark Streaming), Hive, Hadoop, HDFS, Solr (indexing, search), Kafka
  • AWS and Azure: EMR, Glue, EC2, S3, SWF, IAM, CloudWatch, Redshift, Aurora, Jira, SNS, SQS, Lambda, CloudFormation, DynamoDB, Route 53, VPC, Subnets; Amazon Web Services (AWS), Amazon Redshift, MS Azure, Azure Blob Storage, Azure Data Factory, Azure Synapse; Google Cloud Platform (BigQuery, Bigtable, Dataproc)
  • Databases: PL/SQL, MS SQL Server, DynamoDB, Oracle 10g, ADO.NET, Entity Framework, RDS, MySQL Workbench
  • Web Technologies: JavaScript, React, Firebase, REST API, Node.js, HTML, CSS, Ajax, ASP.NET MVC
  • Development / Build Tools: Back end - Eclipse, IntelliJ, Visual Studio; Front end - VS Code; Misc - Jupyter Notebook, Google Colab, Maven, JUnit, Mockito, Log4j
  • Operating Systems: Linux/Unix, macOS, Windows
  • Methodologies: Agile Scrum, Waterfall Model
  • Machine Learning: Regression (Linear, Logistic), Classification, Scikit-learn, Neural Networks (Keras), Spark MLlib, Clustering (K-Neighbors)
  • Miscellaneous: Shell scripting (bash/zsh), Git, TFS, ETL, Load Balancer, Spring, Spring MVC, Pipelines, WPF, ASP.NET MVC, Automation
  • Environment: SQL, Databricks, Excel, Power BI, Azure SQL Data Warehouse, Azure Data Lake, Microsoft PowerPoint

  • Kafka streaming
  • Big data processing
  • ETL development
  • Python programming
  • Data pipeline design
  • Data modeling
  • Hadoop ecosystem
  • Data warehousing
  • Data security
  • Scala programming
  • Data integration
  • Java development
  • SQL and databases
  • Database design
  • SQL programming
  • RDBMS
  • Data migration
  • Advanced analytics
  • Relational databases
  • Storage virtualization
  • Risk analysis
  • Data analysis

Accomplishments

  • Experience administering various AWS services using the Amazon AWS Console and CLI in Linux and Windows environments, and via the Amazon API in Java and Python
  • Queried petabytes of customer data using S3 Select with Hive to extract the required subset of relevant analytical information
  • Wrote multiple MapReduce jobs using the Java API for data extraction, transformation and aggregation of large-scale data stored in the AWS Redshift data warehouse
  • Experience in validating and cleansing the raw input data into desirable format so as to further process and gather insights on it
  • Worked with different types of SQL and NoSQL databases like MySQL, PL/SQL, Oracle 10g, RDS, DynamoDB, Firebase, Aurora
  • Aggregated data from multiple file formats including Parquet, Avro, XML, JSON and CSV, and compressed formats using codecs like Zip, Snappy and Deflate
  • Triggered Lambda functions for S3 upload events through an SQS message queue (a minimal handler sketch follows this list)
  • Created slack bots using AWS Chatbot and SNS pub/sub model to notify intended users
  • Good knowledge of build tools like Maven and Log4j, as well as internal tools
  • Proficient in developing, deploying and managing the Apache Solr Search engine from development to production
  • Used various Project Management services like JIRA for tracking issues, GitHub for various code reviews and worked on various version control tools like CVS, GIT, and SVN
  • Hands-on knowledge of core Java concepts such as exceptions, collections, data structures, I/O, multi-threading, and serialization/deserialization for streaming applications
  • Experience in the design, development and implementation of client/server web-based applications using JavaScript, React.js, Node.js and Firebase
  • Experience in moving data between GCP and Azure using Azure Data Factory
  • Great experience optimizing MapReduce algorithms using mappers, reducers, combiners and partitioning to deliver the best results for large datasets
  • Proven ability to manage all stages of project development; strong problem-solving and analytical skills and the ability to make balanced, independent decisions
  • Performed clustering, regression and classification using machine learning libraries like Spark MLlib, as well as neural network frameworks like Keras
  • Experience with various ML Python libraries like NumPy, pandas and scikit-learn
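
A minimal sketch of the Lambda handler for S3 upload events delivered through SQS, as mentioned in the list above; the handler only logs the uploaded object, and any real downstream processing step is omitted as a placeholder.

  import json

  def handler(event, context):
      """Process S3 upload notifications delivered to Lambda through an SQS queue."""
      for sqs_record in event.get("Records", []):
          # Each SQS record body carries the original S3 event notification as JSON.
          s3_event = json.loads(sqs_record["body"])
          for s3_record in s3_event.get("Records", []):
              bucket = s3_record["s3"]["bucket"]["name"]
              key = s3_record["s3"]["object"]["key"]
              print(f"New object uploaded: s3://{bucket}/{key}")
              # Real processing (e.g., starting a Glue job) would go here.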

Education

Master of Science - Computer Science

Northern Arizona University
Flagstaff, AZ
12.2016 - 08.2018

Timeline

Sr. Data Engineer

Best Buy
01.2024 - Current

Data Engineer

Walmart
01.2022 - 12.2023

Data Engineer

09.2017 - 12.2021

Master of Science - Computer Science

Northern Arizona University
12.2016 - 08.2018

Data Engineer

Aetna
06.2015 - 11.2016

Data Engineer

Win It Solutions
06.2014 - 05.2015