
Gowtham Reddy

Summary

  • 10+ years of software development experience in application development/architecture and data analytics, specializing in web, client-server and big data applications built with Java and big data technologies; expertise in Java, Scala, Python, Spark, PySpark and Hadoop MapReduce across industries including banking, insurance, education and cloud, plus 3+ years of experience with AWS, Kafka, Elasticsearch, DevOps and Linux administration.
  • Strong big data analytics experience using Hadoop ecosystem tools: MapReduce, HDFS, Apache Spark, JupyterLab, AWS EMR, Jira and Glue.
  • Site Reliability Engineering responsibilities for a Kafka platform that scales to 2 GB/sec and 20 million messages/sec.
  • Hands-on cloud platform experience with a broad range of AWS services, including EMR, Glue, Redshift, SWF, S3, IAM, SQS, SNS, CloudFormation, CloudWatch and DynamoDB.
  • Experienced with Spark, improving the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Developed big data applications using Apache Spark and Apache Hadoop, handling 200 million records at the smallest scale.
  • Hands-on experience developing Spark applications using RDD transformations, Spark Core, Spark DataFrames, Spark set operations and Spark SQL.
  • Excellent programming skills at a high level of abstraction in Java and Python.
  • Experience using DStreams, accumulators, broadcast variables and RDD caching for Spark, and tools such as JIRA, Rally and Remedy.
  • Working knowledge of Amazon Elastic Compute Cloud (EC2) for computational tasks and Simple Storage Service (S3) for storage.

Overview

11 years of professional experience
2 years of post-secondary education

Work History

Sr. Data Engineer

Best Buy
Philadelphia, Pennsylvania
01.2024 - Current
  • Leveraged the microservices architecture to build large and complex projects
  • Wrote Spark jobs using Java APIs to read data from S3 and perform MapReduce operations on those datasets for different use cases on the AWS EBS snapshot data (a PySpark sketch of this read-and-aggregate pattern follows this section)
  • Uploaded and processed more than 10 terabytes of data from various structured and unstructured sources into HDFS
  • Integrated multiple AWS services (AWS EMR and EMR migrations, Glue, Redshift, SWF, S3) using Apache Spark and PySpark, coding in Java, to deliver a final product that solves complex business use cases
  • Led the migration of two AWS EMR services, reducing operating costs by $2.3M per service
  • Leveraged AWS Glue to execute serverless ad-hoc jobs that analyze snapshot data, and monitored the corresponding logs using CloudWatch
  • Designed and implemented topics in a new Kafka cluster across all environments
  • Secured the Kafka cluster with Kerberos and implemented Kafka security features using SSL for clusters without Kerberos
  • Used Entity Framework, ADO.NET and LINQ for data access against SQL Server
  • For finer-grained security, set up Kerberos users and groups to enable more advanced security features
  • Aggregated data from multiple file formats including Parquet, Avro, XML, JSON and CSV, and compressed formats using codecs like Zip, Snappy and Deflate
  • Worked with ADO.NET components (SQL Connection, SQL Command, Data Reader, Data Adapter, Data Set and Data View) to communicate with the database
  • Migrated EMR jobs from Hadoop to Spark, reducing runtime by around 25% on average across all region scales and thereby reducing the instances required to meet the target SLA
  • Explored storage optimization opportunities for data stored in S3 and implemented solutions that reduced the storage footprint by around 3.4 exabytes (a 27% reduction) annually
  • Led efforts on various security campaigns to ensure access to critical resources is logged and secured
  • Designed and provisioned the platform architecture to execute Hadoop and machine learning use cases on cloud infrastructure (AWS, EMR migrations and S3)
  • Wrote queries to perform COPY and UNLOAD operations on AWS Redshift data
  • Built self-service data pipelines using AWS services such as SNS, Step Functions, Lambda, Glue, EMR, EC2, Athena, QuickSight and Redshift
  • Presented design ideas to solve complex business problems in an innovative way
  • Mentored multiple interns in designing and developing prototypes to solve some major problems related to handling data at scales growing exponentially
  • Developed and deployed various Lambda functions in AWS with in-built AWS Lambda Libraries
  • Expert understanding of AWS DNS services through Route 53
  • Understanding of Simple, Weighted, Latency, Failover and Geolocation routing policies
  • Prepared projects, dashboards, reports and questions for all JIRA-related services
  • Responsible for day-to-day metrics generation and monitoring, handling scaling issues and code bugs to keep system availability as high as possible
  • Well versed with Rally and Jira
  • Owned multiple services, with responsibilities ranging from builds to deployment through pipelines across beta, gamma and prod environments
  • Implemented various performance and durability improvements in services that manage storage and cost-calculation operations for billions of snapshots, making them more correct, faster and more durable
  • Performed 24x7 on-call duties to detect issues in the services and resolved them quickly and proactively, minimizing impact and containing the blast radius
  • Technologies Used: Apache Hadoop, HDFS, PySpark, Spark, MapReduce, Java, Python, JSON, AWS EMR, migration, Glue, Kafka, Redshift, Jira, Step Functions, Aurora, ADO.NET, DynamoDB, CloudFormation, CloudWatch, SNS, Lambda, SQS, ETL, VPC, Subnets, Pipelines, Shell scripting, JupyterHub, SSH, Security Initiatives
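
The following is a minimal, illustrative PySpark sketch of the S3 read-and-aggregate pattern referenced above; the bucket names, paths and column names (account_id, size_bytes, snapshot_id) are hypothetical placeholders rather than actual project resources.

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  # Spark session; on EMR the s3:// scheme is resolved through EMRFS.
  spark = SparkSession.builder.appName("snapshot-usage-aggregation").getOrCreate()

  # Read a day's worth of snapshot records stored as Parquet in S3 (placeholder bucket).
  snapshots = spark.read.parquet("s3://example-snapshot-bucket/snapshots/dt=2024-01-01/")

  # Aggregate stored bytes and snapshot counts per account as a stand-in use case.
  usage = (
      snapshots.groupBy("account_id")
      .agg(
          F.sum("size_bytes").alias("total_bytes"),
          F.count("snapshot_id").alias("snapshot_count"),
      )
  )

  # Write the aggregated result back to S3 for downstream reporting (placeholder bucket).
  usage.write.mode("overwrite").parquet("s3://example-output-bucket/usage/dt=2024-01-01/")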

Data Engineer

Walmart
Arkansas
01.2022 - 12.2023
  • Understood requirements, built code and guided other developers during development activities to deliver high-standard, stable code within the limits of Confidential and client processes, standards and guidelines
  • Developed Informatica mappings based on client requirements and the needs of the analytics team
  • Performed end-to-end system integration testing
  • Involved in functional testing and regression testing
  • Reviewed and wrote SQL scripts to verify data from source systems to targets
  • Worked on transformations to prepare the data required by the analytics team for visualization and business decisions
  • Reviewed plans and provided feedback on gaps, timelines and execution feasibility as required by the project
  • Participated in knowledge transfer (KT) sessions with the customer and other business teams and provided feedback on requirements
  • Involved in migrating the client data warehouse architecture from on-premises to the Azure cloud
  • Created pipelines in ADF using linked services to extract, transform and load data from multiple sources such as Azure SQL, Blob Storage and Azure SQL Data Warehouse
  • Created storage accounts as part of the end-to-end environment for running jobs
  • Implemented Azure Data Factory operations and deployments for moving data from on-premises into the cloud
  • Designed data auditing and data masking for security purposes
  • Monitored end-to-end integration using Azure Monitor
  • Utilized Tableau capabilities such as data extracts, data blending, forecasting, dashboard actions and table calculations
  • Implemented data movement from on-premises to the cloud in Azure
  • Developed batch processing solutions using Data Factory and Azure Databricks (a minimal PySpark sketch follows this section)
  • Implemented Azure Databricks clusters, notebooks, jobs and autoscaling
  • Designed data encryption for data at rest and in transit
  • Designed relational and non-relational data stores on Azure
  • Prepared ETL test strategies, designs and test plans to execute test cases for ETL and BI systems
  • Designed and developed ETL workflows and datasets in Alteryx
  • Created ETL test scenarios, test cases and plans to execute test cases
  • Interacted with business users to understand their requirements
  • Good understanding of data warehouse concepts
  • Good exposure and understanding of Hadoop Ecosystem
  • Proficient in SQL and other relational databases
  • Good exposure to Microsoft Power BI
  • Managed data privacy and security in Power BI
  • Extensively involved in designing and developing the Power BI data model, using multiple DAX expressions to build calculated columns and calculated measures
  • Good understanding and working knowledge of Python language
  • Technologies Used: SQL Database, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Synapse workspace, Synapse SQL pool, Power BI, Python
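
A minimal sketch of the kind of Databricks batch transform described above, assuming existing ADLS mounts at /mnt/raw and /mnt/curated; the dataset, paths and column names are hypothetical.

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  # On Databricks a SparkSession already exists; getOrCreate() simply reuses it.
  spark = SparkSession.builder.appName("daily-batch-transform").getOrCreate()

  # Read raw CSV files landed in the data lake (placeholder mount point and path).
  raw = (
      spark.read.option("header", "true")
      .option("inferSchema", "true")
      .csv("/mnt/raw/sales/2023-12-01/")
  )

  # Basic cleansing plus a simple aggregate for the downstream reporting layer.
  cleaned = raw.dropDuplicates(["order_id"]).filter(F.col("amount") > 0)
  daily_totals = cleaned.groupBy("store_id").agg(F.sum("amount").alias("daily_total"))

  # Persist the curated output back to the lake (placeholder path).
  daily_totals.write.mode("overwrite").parquet("/mnt/curated/sales_daily/2023-12-01/")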

Data Engineer

Connecticut
09.2017 - 12.2021
  • Designed a service for calculating customer costs of AWS Snapshot usage in an incremental way at various scales
  • Developed prototypes using newer technologies like GraphX and GraphFrames and techniques like sharding to solve scaling issues in traditional systems
  • Installed a Kerberos-secured Kafka cluster (without encryption) in the Dev and Prod environments
  • Set up Kafka ACLs on the cluster
  • Worked on the Kafka backup index, minimized logs via the Log4j appender, and pointed Ambari server logs to NAS storage
  • Set up an unauthenticated Kafka listener in parallel with the Kerberos (SASL) listener and tested anonymous users alongside Kerberos users (a client-side configuration sketch follows this section)
  • Implemented a solution using AWS EMR, Scala, Apache Spark and PySpark handling millions of S3 objects, finishing the job within the required SLA of 15-20 hrs daily
  • Experience in managing and reviewing Hadoop log files
  • Installed Ranger in all environments as a second level of security for the Kafka broker
  • Responsible for building scalable distributed data solutions using Hadoop, Scala, Pyspark and Spark
  • Implemented workflows using SWF to orchestrate the scheduling of daily jobs run as a set of activities
  • Created external tables with proper partitions for efficiency and loaded the structured data produced by MapReduce jobs into HDFS
  • Wrote activities that unload data from the Redshift data warehouse and spawn EMR jobs to meter customer snapshot usage
  • Installed Kafka Manager for monitoring consumer lag and Kafka metrics; also used it for adding topics, partitions, etc.
  • Contributed to code documentation with all the experimental data, research and design workflow for the end-to-end system
  • Technologies Used: Apache GraphX, GraphFrames, RDD, Dataset, Scala, PySpark, Spark, Kafka, migration, Hadoop, AWS EMR, SWF (Simple Workflow Service), Redshift, Java, Parquet, Deflate
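
A client-side sketch of connecting to the two listeners mentioned above, using the confluent-kafka Python client; the broker addresses, ports, principal and keytab path are placeholders, and the broker-side listener configuration is not shown.

  from confluent_kafka import Producer

  # Kerberos (SASL/GSSAPI) secured listener; all values below are placeholders.
  kerberos_conf = {
      "bootstrap.servers": "broker1:9093,broker2:9093",
      "security.protocol": "SASL_PLAINTEXT",
      "sasl.mechanism": "GSSAPI",
      "sasl.kerberos.service.name": "kafka",
      "sasl.kerberos.principal": "appuser@EXAMPLE.COM",
      "sasl.kerberos.keytab": "/etc/security/keytabs/appuser.keytab",
  }

  # The parallel unauthenticated listener only needs the plaintext port.
  anonymous_conf = {"bootstrap.servers": "broker1:9092,broker2:9092"}

  # Produce a test message through the Kerberos listener.
  producer = Producer(kerberos_conf)
  producer.produce("test-topic", value=b"hello from the Kerberos listener")
  producer.flush()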

Data Engineer

Aetna
06.2015 - 11.2016
  • Examined transaction data, identified outliers, inconsistencies and manipulated data to ensure data quality and integration
  • Developed data pipelines using ETL, Spark, PySpark and Hive to ingest, transform and analyze operational data for Aetna's health care insurance management system (a Spark SQL sketch of this transform step follows this section)
  • Used SparkSQL with Scala for creating data frames and performed transformations on data frames
  • Responsible for leading scrum calls and gathering and understanding client requirements
  • Handled daily deployments of changes into the pipeline to reflect changes into the prod environment
  • Developed Oracle PL/SQL stored procedures, functions, packages, PySpark and SQL scripts
  • Loaded and transformed large sets of structured and semi-structured data
  • Implemented solutions using Hadoop
  • Wrote complex SQL queries and stored procedures, to extract relevant customer data, that was to be displayed on the portal in a structured format
  • Worked directly with the client to understand business needs and provide a convenient portal for customers to file an insurance claim, using ASP.NET MVC along with PySpark, JavaScript, jQuery and Ajax, coding in C#
  • Designed, developed, tested, deployed, maintained and improved data integration pipeline objects built with Apache Spark / PySpark / Python
  • Developed an admin portal to support Aetna's internal operations, such as adding content to web pages and presenting new data to users
  • Worked directly with system end users to understand the manual operations and pain points in their day-to-day finance operations
  • Supported the extraction, transformation and load (ETL) process for a data warehouse from legacy systems using Informatica
  • Designed and delivered a desktop application using WPF to automate the tedious, time-consuming and error-prone manual processing of user data in Excel
  • Ultimately helped the client save around 10 hours of employee work per week
  • Technologies Used: Apache Spark, Hadoop, MapReduce, Spark SQL, JavaScript, Python, PL/SQL, Stored Procedures, WPF, C#, PySpark, REST API, ETL/BI, MS SQL Server, Oracle 10g, Excel, Shell scripting, Build & Deployment, Scrum
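
A minimal Spark SQL sketch of the DataFrame-based transform step in the pipeline above, assuming Hive-backed staging tables; the table and column names (staging.claims_raw, member_id, approved_amount, claim_status) are hypothetical.

  from pyspark.sql import SparkSession

  # enableHiveSupport lets Spark SQL read the Hive tables the pipeline ingests into.
  spark = (
      SparkSession.builder.appName("claims-transform")
      .enableHiveSupport()
      .getOrCreate()
  )

  # Expose the raw claims table (placeholder name) as a temporary view.
  claims = spark.table("staging.claims_raw")
  claims.createOrReplaceTempView("claims")

  # Example transformation: total approved amount per member, returned as a DataFrame.
  member_totals = spark.sql("""
      SELECT member_id,
             SUM(approved_amount) AS total_approved
      FROM claims
      WHERE claim_status = 'APPROVED'
      GROUP BY member_id
  """)

  # Persist the result for the analytics team (placeholder table name).
  member_totals.write.mode("overwrite").saveAsTable("analytics.member_claim_totals")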

Data Engineer

Win It Solutions
06.2014 - 05.2015
  • Worked as Data Analyst for requirements gathering, business analysis and project coordination
  • Worked with other data analysis teams to gather data profiling information
  • Responsible for the analysis of business requirements and the design and implementation of the business solution
  • Using SQL, performed data analysis and data validation from multiple data sources of T-Mobile and Orange customer databases to build a single optimized process for combined customer base
  • Wrote complex SQL, PL/SQL, Procedures, Functions, and Packages to validate data and testing process
  • Worked with multiple development teams to integrate, upgrade, and monitor multiple Orange tools to streamline porting process
  • Analyzed, improved and created documentation for multiple processes for customers' account changes, which improved the customer loss rate by 7%
  • Increased customer insights and conversations by 36% by developing root-cause analysis reports from previous customer conversations and associated CRM data
  • Performed A/B testing on recently developed and integrated tools using training and testing datasets
  • Experience with Agile and Waterfall methodologies
  • Set up and ran effective recurring status meetings with the product, portfolio and delivery managers
  • Identified and remediated vulnerabilities in specific applications used by various teams

Skills

  • TECHNICAL SKILLS:
  • Programming Languages / Frameworks: Java, Scala, Python, C, C#, JavaScript, .NET
  • Apache: Spark (RDD, DataFrames, Spark SQL, GraphX, GraphFrames, Spark Streaming), Hive, Hadoop, HDFS, Solr (indexing, search), Kafka
  • AWS and Azure: EMR, Glue, EC2, S3, SWF, IAM, CloudWatch, Redshift, Aurora, Jira, SNS, SQS, Lambda, CloudFormation, DynamoDB, Route 53, VPC, Subnets; Amazon Web Services (AWS), Amazon Redshift, MS Azure, Azure Blob Storage, Azure Data Factory, Azure Synapse; Google Cloud Platform (BigQuery, Bigtable, Dataproc)
  • Databases: PL/SQL, MS SQL Server, DynamoDB, Oracle 10g, ADO.NET, Entity Framework, RDS, MySQL Workbench
  • Web Technologies: JavaScript, React, Firebase, REST API, Node.js, HTML, CSS, Ajax, ASP.NET MVC
  • Development / Build Tools: Back end - Eclipse, IntelliJ, Visual Studio; Front end - VS Code; Misc - Jupyter Notebook, Google Colab, Maven, JUnit, Mockito, Log4j
  • Operating Systems: Linux/Unix, macOS, Windows
  • Methodologies: Agile Scrum, Waterfall Model
  • Machine Learning: Regression (Linear, Logistic), Classification, Scikit-learn, Neural Networks (Keras), Spark MLlib, Clustering (K-Neighbors)
  • Miscellaneous: Shell scripting (bash/zsh), Git, TFS, ETL, Load Balancer, Spring, Spring MVC, Pipelines, WPF, ASP.NET MVC, Automation
  • Environment: SQL, Databricks, Excel, Power BI, Azure SQL Data Warehouse, Azure Data Lake, Microsoft PowerPoint

  • Kafka streaming
  • Big data processing
  • ETL development
  • Python programming
  • Data pipeline design
  • Data modeling
  • Hadoop ecosystem
  • Data warehousing
  • Data security
  • Scala programming
  • Data integration
  • Java development
  • SQL and databases
  • Database design
  • SQL programming
  • RDBMS
  • Data migration
  • Advanced analytics
  • Relational databases
  • Storage virtualization
  • Risk analysis
  • Data analysis

Accomplishments

  • Experience administering various AWS services using the Amazon AWS Console and CLI in Linux and Windows environments, and via the Amazon API in Java and Python
  • Queried petabytes of customer data using S3 Select with Hive to extract the required subset of relevant analytical information
  • Wrote multiple MapReduce jobs using the Java API for data extraction, transformation and aggregation of large-scale data stored in the AWS Redshift data warehouse
  • Experience in validating and cleansing the raw input data into desirable format so as to further process and gather insights on it
  • Worked with different types of SQL and NoSQL databases like MySQL, PL/SQL, Oracle 10g, RDS, DynamoDB, Firebase, Aurora
  • Aggregated data from multiple file formats including Parquet, Avro, XML, JSON and CSV, and compressed formats using codecs like Zip, Snappy and Deflate
  • Triggered Lambda functions for S3 upload events through an SQS message queue (a minimal handler sketch follows this list)
  • Created slack bots using AWS Chatbot and SNS pub/sub model to notify intended users
  • Good knowledge of build tools like Maven and Log4j, as well as internal tools
  • Proficient in developing, deploying and managing the Apache Solr Search engine from development to production
  • Used various Project Management services like JIRA for tracking issues, GitHub for various code reviews and worked on various version control tools like CVS, GIT, and SVN
  • Hands-on knowledge of core Java concepts such as exceptions, collections, data structures, I/O, multi-threading, and serialization/deserialization for streaming applications
  • Experience in the design, development and implementation of client/server web-based applications using JavaScript, React.js, Node.js and Firebase
  • Experience in moving data between GCP and Azure using Azure Data Factory
  • Great experience optimizing MapReduce algorithms using mappers, reducers, combiners and partitioning to deliver the best results for large datasets
  • Proven ability to manage all stages of project development; strong problem-solving and analytical skills and the ability to make balanced, independent decisions
  • Performed clustering, regression and classification using machine learning libraries like Spark MLlib, as well as neural network frameworks like Keras
  • Experience with various ML Python libraries like NumPy, pandas and scikit-learn
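
A minimal sketch of the Lambda handler for S3 upload events delivered through SQS, as mentioned in the list above; the handler only logs the uploaded object, and any real downstream processing step is omitted as a placeholder.

  import json

  def handler(event, context):
      """Process S3 upload notifications delivered to Lambda through an SQS queue."""
      for sqs_record in event.get("Records", []):
          # Each SQS record body carries the original S3 event notification as JSON.
          s3_event = json.loads(sqs_record["body"])
          for s3_record in s3_event.get("Records", []):
              bucket = s3_record["s3"]["bucket"]["name"]
              key = s3_record["s3"]["object"]["key"]
              print(f"New object uploaded: s3://{bucket}/{key}")
              # Real processing (e.g., starting a Glue job) would go here.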

Education

Master of Science - Computer Science

Northern Arizona University
Flagstaff, AZ
12.2016 - 08.2018

Timeline

Sr. Data Engineer

Best Buy
01.2024 - Current

Data Engineer

Walmart
01.2022 - 12.2023

Data Engineer

09.2017 - 12.2021

Master of Science - Computer Science

Northern Arizona University
12.2016 - 08.2018

Data Engineer

Aetna
06.2015 - 11.2016

Data Engineer

Win It Solutions
06.2014 - 05.2015