
Sahithi V

Westborough, MA

Summary

" Experienced data engineer with a strong foundation as a .NET full stack developer, bringing five years of proven expertise in software development and a demonstrated trajectory of career growth. Eager to apply my technical proficiency in data engineering, ETL processes, and data warehousing to architect and optimize data solutions that empower businesses to extract actionable insights. Committed to leveraging my diverse background to bridge the gap between software development and data engineering, while continuously expanding my knowledge and contributing to innovative data-driven projects." Summary of Experience: Over 5 Years of strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification, and Testing as per Cycle in both Waterfall and Agile methodologies. Strong Experience with Amazon Web Services (AWS) Cloud Platform which includes services like EC2, S3, EMR, IAM, DynamoDB, Cloud Front, Cloud Watch, Route 53, Auto Scaling, and Security Groups. Experience in Microsoft Azure/Cloud Services like SQL Data Warehouse, Azure SQL Server, Azure Databricks, Azure Data Lake, Azure Blob Storage, and Azure Data Factory. Hands-on experience on Google Cloud Platform (GCP) in all the big data products BigQuery, Cloud Data Proc, Google Cloud Storage, and Composer (Air Flow as a service). Strong experience in using major components of Hadoop ecosystem components like HDFS, YARN, MapReduce, Hive, Impala, Pig, Sqoop, HBase, Spark, Spark SQL, Kafka, Spark Streaming, Flume, Oozie, Zookeeper, Hue. Excellent programming skills with experience in Java, PL/SQL, SQL, Scala, and Python Programming. Hands-on experience in writing Map Reduce programs using Java to handle data sets using Map and Reduce tasks. Extensive knowledge in writing Hadoop jobs for data analysis as per the business requirements using Hive and working on HiveQL queries for required data extraction, joining operations, writing custom UDFs as required, and having good experience in optimizing Hive Queries. Experience with ETL concepts using Informatica Power Center, AB Initio. Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems and vice-versa and loading into Hive tables, which are partitioned. Developed custom Kafka producer and consumer for different publishing and subscribing to Kafka topics. Involved in converting Hive/SQL queries into Spark transformations using Spark Data frames and Scala. Experience in using Kafka and Kafka brokers to initiate spark context and processing live streaming.

Overview

5 years of professional experience

Work History

Big Data Engineer

Wipro
11.2019 - 09.2021
  • Worked on big data on AWS cloud services, i.e., EC2, S3, EMR, and DynamoDB
  • Migrated the on-premises database structure to the Confidential Redshift data warehouse
  • Was responsible for ETL and data validation using SQL Server Integration Services
  • Defined and deployed monitoring, metrics, and logging systems on AWS
  • Connected to Amazon Redshift through Tableau to extract live data for real-time analysis
  • Used Hive SQL, Presto SQL, and Spark SQL for ETL jobs, choosing the right technology for each job
  • Created Entity Relationship Diagrams (ERD), Functional diagrams, Data flow diagrams and enforced referential integrity constraints, and created logical and physical models using Erwin
  • Measured Efficiency of the Hadoop/Hive environment ensuring SLA is met
  • Created ad hoc queries and reports to support business decisions using SQL Server Reporting Services (SSRS)
  • Defined facts and dimensions and designed the data marts using Ralph Kimball's dimensional data mart modeling methodology in Erwin
  • Published interactive data visualization dashboards, reports, and workbooks on Tableau and SAS Visual Analytics
  • Advanced knowledge of Confidential Redshift and MPP database concepts
  • Designed and built multi-terabyte, end-to-end data warehouse infrastructure from the ground up on Confidential Redshift, handling millions of records every day at large scale
  • Designed and implemented big data ingestion pipelines to ingest multi-terabyte data from various data sources using Kafka and Spark Streaming, including data quality checks, transformation, and storage in efficient file formats (see the sketch after this role)
  • Performed data wrangling on multi-terabyte datasets from various data sources for a variety of downstream purposes, such as analytics, using PySpark
  • Wrote build/integration/installation scripts in Python and bash as needed
  • Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics
  • Implemented exception handling in Python to add logging to the application
  • Analyzed the system for new enhancements/functionalities and performed impact analysis of the application for implementing ETL changes
  • Migrated ETL solutions from Redshift to run on the Snowflake database
  • Hands-on experience implementing row-level security in Snowflake
  • Managed security groups on AWS, focusing on high availability, fault tolerance, and auto-scaling using Terraform templates, along with continuous integration and continuous deployment with AWS Lambda and AWS CodePipeline
  • Compiled data from various sources to perform complex analysis for actionable results
  • Built performant, scalable ETL processes to load, cleanse and validate data
  • Analyzed the existing application programs and tuned SQL queries using execution plans, Query Analyzer, SQL Profiler, and Database Engine Tuning Advisor to enhance performance
  • Involved in migrating the on-prem Hadoop system to GCP (Google Cloud Platform)
  • Wrote various data normalization jobs for new data ingested into Redshift
  • Collaborated with team members and stakeholders on the design and development of the data environment
  • Prepared associated documentation for specifications, requirements, and testing
  • Developed and deployed the outcome using Spark and Scala code in the Hadoop cluster running on GCP
  • Developed Unix shell scripts to load large numbers of files into HDFS from Linux File System
  • Participated in the full software development lifecycle with requirements, solution design, development, QA implementation, and product support using Scrum and other Agile methodologies
  • Environment: Oracle, Kafka, Python, Redshift, Informatica, AWS, EC2, S3, SQL Server, Erwin, RDS, NoSQL, Snowflake Schema, MySQL, Bash, DynamoDB, PostgreSQL, Tableau, GitHub, Linux/Unix.
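Illustrative sketch (not project code): a minimal PySpark Structured Streaming job reflecting the Kafka-to-Spark ingestion pattern described above, with a simple data quality filter and Parquet output; the broker address, topic, schema, and storage paths are hypothetical placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    # Hypothetical event schema; a real pipeline would derive this from the upstream contract.
    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_time", TimestampType()),
        StructField("payload", StringType()),
    ])

    # Assumes the spark-sql-kafka connector package is available on the cluster.
    spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

    # Read the live event stream from Kafka (placeholder broker and topic).
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "events_topic")
           .load())

    # Parse JSON payloads and apply a basic data quality check (drop records without an id).
    events = (raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
              .select("e.*")
              .filter(col("event_id").isNotNull()))

    # Store in an efficient columnar format for downstream analytics.
    query = (events.writeStream
             .format("parquet")
             .option("path", "s3a://example-bucket/curated/events/")
             .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
             .start())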

Data Engineer

Care Health Insurance Ltd
04.2018 - 11.2019
  • Developed data pipeline using Sqoop to ingest customer behavioral data and purchase histories into HDFS for analysis
  • Delivered de-normalized data from the Data Lake layer to Power BI consumers for modeling and visualization
  • Wrote a Kafka REST API to collect events from the front end
  • Involved in creating an HDInsight cluster in the Microsoft Azure Portal; also created Event Hubs and Azure SQL Databases
  • Hands-on experience working with Azure Data Factory (ADF); used ADF to move data from the on-prem SQL Server to a Blob Storage container
  • Worked with ADF to submit the jobs to Databricks and Snowflake clusters
  • Tested Apache TEZ, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs
  • Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables (see the sketch after this role)
  • Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers
  • Wrote installation scripts in Python and Bash as needed
  • Exposed transformed data on the Azure Databricks Spark platform in Parquet format for efficient data storage
  • Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework
  • Involved in running Hive scripts through Hive on Spark and some through Spark SQL
  • Involved in importing real-time data to Hadoop using Kafka and implemented the Oozie job for daily imports
  • Created and maintained optimal data pipeline architecture in the Microsoft Azure cloud using Data Factory and Azure Databricks
  • Involved in the complete big data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS
  • Troubleshot Azure development, configuration, and performance issues
  • Interacted with multiple teams responsible for the Azure platform to fix Azure platform bugs
  • Used Jira for bug tracking and Bitbucket to check in and check out code changes
  • Environment: Scala, Azure, HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, Kafka, Impala, Bash, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting.
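Illustrative sketch (not project code): a minimal PySpark job showing the Avro-to-Parquet conversion pattern described above, writing a partitioned, bucketed Hive table with Snappy compression; the database, table, and column names are hypothetical.

    from pyspark.sql import SparkSession

    # Hive-enabled session; assumes the cluster is configured with a Hive metastore.
    spark = (SparkSession.builder
             .appName("avro-to-parquet-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Write Parquet files with Snappy compression.
    spark.conf.set("spark.sql.parquet.compression.codec", "snappy")

    # Read the existing Avro-backed Hive table (placeholder name).
    df = spark.table("raw_db.events_avro")

    # Rewrite it as a partitioned, bucketed Parquet Hive table.
    (df.write
     .format("parquet")
     .partitionBy("event_date")
     .bucketBy(8, "customer_id")
     .sortBy("customer_id")
     .mode("overwrite")
     .saveAsTable("curated_db.events_parquet"))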

Full Stack Developer

L & T Finance Ltd
06.2016 - 04.2018
  • Developed Windows pages using WPF, custom controls, and user controls in C#
  • Created event-driven Web Forms using ASP.NET and implemented form validation using validation controls; both custom validation and JavaScript were written for client-side validation, using the new validation controls in Visual Studio
  • Developed XML Web Services and WCF services common for various applications using .NET Framework
  • Used Microsoft Teams and AWS CodeCommit for version control and source code maintenance
  • Used a microservice architecture, with services interacting through REST, and leveraged AWS to build, test, and deploy identity microservices
  • Involved in the complete application stack in Amazon Web Services (AWS), including EC2 and S3 buckets
  • Responsible for maintaining and expanding AWS S3 infrastructure using AWS SNS and SQS
  • Created and developed new features for the Auto loop/Service Book/ASR3 products
  • Developed a cross-browser-compatible, customer-facing online application based on an n-tier architecture
  • Hands-on experience using the Visual Studio .NET IDE to design forms and to develop and debug the application
  • Involved in building a rich web experience using JavaScript
  • Used the AJAX toolkit, MultiViews, regular expressions, regular expression validators, and user search controls
  • Extensively used ADO.NET and XML to get high performance from the web controls
  • Created new database objects such as procedures, functions, packages, triggers, indexes, and views using PL/SQL in development and production environments
  • Worked on Web API to connect the Angular app to the business logic layer of MVC
  • Worked on creating the database and developed multiple T-SQL procedures, functions, and SQL queries to handle the business rules
  • Utilized Data Access Components, Web Services and Business Layers
  • Implemented Forms-based Authentication in ASP.NET to authenticate the users
  • Extensively worked with Datasets to reduce hits to the database server
  • Developed test cases and performed unit testing
  • Performed unit testing and integration testing after completing module coding
  • Developed SQL queries, T-SQL procedures, functions, and triggers to handle business rules, data integrity, and various data transactions
  • Performed extensive unit testing using NUnit and developed test plans and test cases
  • Team Foundation Server (TFS) was used to maintain the version and source control.

Education

Master’s - Data Science

UMBC

Bachelor’s - ECE

JNTUK

Skills

  • Hadoop Ecosystem: Hadoop, MapReduce, Spark, HDFS, Sqoop, YARN, Oozie, Hive, Impala, Apache Airflow, HBase
  • Programming Languages: PL/SQL, SQL, Python, Scala, Java
  • Databases: MySQL, SQL Server, Oracle, MS Access, Snowflake
  • NoSQL Databases: Cassandra, HBase, DynamoDB
  • Workflow Management Tools: Oozie, Autosys, Apache Airflow, Jira
  • Visualization & ETL Tools: Tableau, Power BI, Informatica, Talend
  • Cloud Technologies: Azure, AWS, GCP
  • IDEs: Eclipse, Jupyter Notebook, Spyder, PyCharm, IntelliJ
  • .NET Environment: C#.NET, ASP.NET, ADO.NET, Visual Basic, WPF, CSS, SQL Server 2008 R2 and 2012, XML, Packages, Visual Studio 2013, TFS, JavaScript, SSRS, SSIS

Accomplishments

  • Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL Solutions and Data Warehouse tools for reporting and data analysis
  • Used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data and used DataFrame operations to perform required data validations
  • Good working experience with Spark (Spark Streaming, Spark SQL), Scala, and Kafka
  • Worked on reading multiple data formats on HDFS using Scala
  • Good understanding and knowledge of NoSQL databases like MongoDB, PostgreSQL, HBase, and Cassandra
  • Worked with various formats of files like delimited text files, click stream log files, Apache log files, Avro files, JSON files, and XML Files
  • Mastered the use of different columnar file formats such as RC, ORC, and Parquet
  • Good understanding of various compression techniques used in Hadoop processing, such as Gzip, Snappy, and LZO
  • Experience extracting data from MongoDB through Sqoop, placing it in HDFS, and processing it
  • Extensive working experience with RDBMS such as Oracle, Microsoft SQL Server, and MySQL, and with NoSQL databases like HBase, DynamoDB, and Cassandra
  • Extensive experience working on various databases and database script development using SQL and PL/SQL
  • Worked with various programming languages using IDEs such as Eclipse, NetBeans, and IntelliJ, along with tools such as PuTTY and Git
  • Very strong interpersonal skills and the ability to work independently and in a group; quick to learn and easily adaptable to the working environment.
