
Sahithi V

Westborough, MA

Summary

" Experienced data engineer with a strong foundation as a .NET full stack developer, bringing five years of proven expertise in software development and a demonstrated trajectory of career growth. Eager to apply my technical proficiency in data engineering, ETL processes, and data warehousing to architect and optimize data solutions that empower businesses to extract actionable insights. Committed to leveraging my diverse background to bridge the gap between software development and data engineering, while continuously expanding my knowledge and contributing to innovative data-driven projects." Summary of Experience: Over 5 Years of strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification, and Testing as per Cycle in both Waterfall and Agile methodologies. Strong Experience with Amazon Web Services (AWS) Cloud Platform which includes services like EC2, S3, EMR, IAM, DynamoDB, Cloud Front, Cloud Watch, Route 53, Auto Scaling, and Security Groups. Experience in Microsoft Azure/Cloud Services like SQL Data Warehouse, Azure SQL Server, Azure Databricks, Azure Data Lake, Azure Blob Storage, and Azure Data Factory. Hands-on experience on Google Cloud Platform (GCP) in all the big data products BigQuery, Cloud Data Proc, Google Cloud Storage, and Composer (Air Flow as a service). Strong experience in using major components of Hadoop ecosystem components like HDFS, YARN, MapReduce, Hive, Impala, Pig, Sqoop, HBase, Spark, Spark SQL, Kafka, Spark Streaming, Flume, Oozie, Zookeeper, Hue. Excellent programming skills with experience in Java, PL/SQL, SQL, Scala, and Python Programming. Hands-on experience in writing Map Reduce programs using Java to handle data sets using Map and Reduce tasks. Extensive knowledge in writing Hadoop jobs for data analysis as per the business requirements using Hive and working on HiveQL queries for required data extraction, joining operations, writing custom UDFs as required, and having good experience in optimizing Hive Queries. Experience with ETL concepts using Informatica Power Center, AB Initio. Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems and vice-versa and loading into Hive tables, which are partitioned. Developed custom Kafka producer and consumer for different publishing and subscribing to Kafka topics. Involved in converting Hive/SQL queries into Spark transformations using Spark Data frames and Scala. Experience in using Kafka and Kafka brokers to initiate spark context and processing live streaming.

Overview

5 years of professional experience

Work History

Big Data Engineer

Wipro
11.2019 - 09.2021
  • Worked on big data on AWS cloud services, i.e., EC2, S3, EMR, and DynamoDB
  • Migrated the on-premises database structure to the Confidential Redshift data warehouse
  • Was responsible for ETL and data validation using SQL Server Integration Services
  • Defined and deployed monitoring, metrics, and logging systems on AWS
  • Connected to Amazon Redshift through Tableau to extract live data for real-time analysis
  • Used Hive SQL, Presto SQL, and Spark SQL for ETL jobs, choosing the right technology for each job
  • Created Entity Relationship Diagrams (ERD), Functional diagrams, Data flow diagrams and enforced referential integrity constraints, and created logical and physical models using Erwin
  • Measured Efficiency of the Hadoop/Hive environment ensuring SLA is met
  • Created ad hoc queries and reports to support business decisions using SQL Server Reporting Services (SSRS)
  • Defined facts and dimensions and designed the data marts using Ralph Kimball's dimensional data mart modeling methodology in Erwin
  • Published interactive data visualization dashboards, reports, and workbooks on Tableau and SAS Visual Analytics
  • Advanced knowledge of Confidential Redshift and MPP database concepts
  • Designed and built multi-terabyte, end-to-end data warehouse infrastructure from the ground up on Confidential Redshift, handling millions of records every day at large scale
  • Designed and implemented big data ingestion pipelines to ingest multi-terabyte data from various data sources using Kafka and Spark Streaming, including data quality checks, transformation, and storage in efficient file formats (see the sketch after this role)
  • Performed data wrangling on multi-terabyte datasets from various data sources for a variety of downstream purposes, such as analytics, using PySpark
  • Wrote build/integration/installation scripts in Python and bash as needed
  • Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics
  • Implemented exception handling in Python to add logging to the application
  • Analyzed the system for new enhancements/functionalities and performed impact analysis of the application for implementing ETL changes
  • Migrated ETL solutions from Redshift to run on the Snowflake database
  • Hands-on experience implementing row-level security in Snowflake
  • Managed security groups on AWS, focusing on high availability, fault tolerance, and auto-scaling using Terraform templates, along with continuous integration and continuous deployment with AWS Lambda and AWS CodePipeline
  • Compiled data from various sources to perform complex analysis for actionable results
  • Built performant, scalable ETL processes to load, cleanse and validate data
  • Analyzed the existing application programs and tuned SQL queries using execution plans, Query Analyzer, SQL Profiler, and Database Engine Tuning Advisor to enhance performance
  • Involved in migrating the on-prem Hadoop system to GCP (Google Cloud Platform)
  • Wrote various data normalization jobs for new data ingested into Redshift
  • Collaborated with team members and stakeholders on the design and development of the data environment
  • Prepared associated documentation for specifications, requirements, and testing
  • Developed and deployed the outcome using Spark and Scala code in the Hadoop cluster running on GCP
  • Developed Unix shell scripts to load large numbers of files into HDFS from Linux File System
  • Participated in the full software development lifecycle with requirements, solution design, development, QA implementation, and product support using Scrum and other Agile methodologies
  • Environment: Oracle, Kafka, Python, Redshift, Informatica, AWS, EC2, S3, SQL Server, Erwin, RDS, NoSQL, Snowflake Schema, MySQL, Bash, DynamoDB, PostgreSQL, Tableau, GitHub, Linux/Unix.
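Illustrative sketch (not project code): a minimal PySpark Structured Streaming job reflecting the Kafka-to-Spark ingestion pattern described above, with a simple data quality filter and Parquet output; the broker address, topic, schema, and storage paths are hypothetical placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    # Hypothetical event schema; a real pipeline would derive this from the upstream contract.
    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_time", TimestampType()),
        StructField("payload", StringType()),
    ])

    # Assumes the spark-sql-kafka connector package is available on the cluster.
    spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

    # Read the live event stream from Kafka (placeholder broker and topic).
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "events_topic")
           .load())

    # Parse JSON payloads and apply a basic data quality check (drop records without an id).
    events = (raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
              .select("e.*")
              .filter(col("event_id").isNotNull()))

    # Store in an efficient columnar format for downstream analytics.
    query = (events.writeStream
             .format("parquet")
             .option("path", "s3a://example-bucket/curated/events/")
             .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
             .start())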

Data Engineer

Care Health Insurance Ltd
04.2018 - 11.2019
  • Developed data pipeline using Sqoop to ingest customer behavioral data and purchase histories into HDFS for analysis
  • Delivered de-normalized data from the Data Lake layer to Power BI consumers for modeling and visualization
  • Wrote a Kafka REST API to collect events from the front end
  • Involved in creating an HDInsight cluster in the Microsoft Azure Portal; also created Event Hubs and Azure SQL Databases
  • Hands-on experience working with Azure Data Factory (ADF); used ADF to move data from the on-prem SQL Server to a Blob Storage container
  • Worked with ADF to submit the jobs to Databricks and Snowflake clusters
  • Tested Apache TEZ, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs
  • Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables (see the sketch after this role)
  • Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers
  • Wrote installation scripts in Python and Bash as needed
  • Exposed transformed data on the Azure Databricks Spark platform in Parquet format for efficient data storage
  • Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework
  • Involved in running Hive scripts through Hive on Spark and some through Spark SQL
  • Involved in importing real-time data to Hadoop using Kafka and implemented the Oozie job for daily imports
  • Created and maintained optimal data pipeline architecture in the Microsoft Azure cloud using Data Factory and Azure Databricks
  • Involved in the complete big data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS
  • Troubleshot Azure development, configuration, and performance issues
  • Interacted with multiple teams responsible for the Azure platform to fix Azure platform bugs
  • Used Jira for bug tracking and Bitbucket to check in and check out code changes
  • Environment: Scala, Azure, HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, Kafka, Impala, Bash, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting.
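Illustrative sketch (not project code): a minimal PySpark job showing the Avro-to-Parquet conversion pattern described above, writing a partitioned, bucketed Hive table with Snappy compression; the database, table, and column names are hypothetical.

    from pyspark.sql import SparkSession

    # Hive-enabled session; assumes the cluster is configured with a Hive metastore.
    spark = (SparkSession.builder
             .appName("avro-to-parquet-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Write Parquet files with Snappy compression.
    spark.conf.set("spark.sql.parquet.compression.codec", "snappy")

    # Read the existing Avro-backed Hive table (placeholder name).
    df = spark.table("raw_db.events_avro")

    # Rewrite it as a partitioned, bucketed Parquet Hive table.
    (df.write
     .format("parquet")
     .partitionBy("event_date")
     .bucketBy(8, "customer_id")
     .sortBy("customer_id")
     .mode("overwrite")
     .saveAsTable("curated_db.events_parquet"))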

Full Stack Developer

L & T Finance Ltd
06.2016 - 04.2018
  • Developed Windows pages using WPF, custom controls, and user controls in C#
  • Created event-driven Web Forms using ASP.NET and implemented form validation using validation controls; both custom validation and JavaScript were written for client-side validation, using the new validation controls in Visual Studio
  • Developed XML Web Services and WCF services common for various applications using .NET Framework
  • Used Microsoft Teams and AWS CodeCommit for version control and source code maintenance
  • Used a microservice architecture, with services interacting through REST, and leveraged AWS to build, test, and deploy identity microservices
  • Involved in the complete application stack in Amazon Web Services (AWS), including EC2 and S3 buckets
  • Responsible for maintaining and expanding AWS S3 infrastructure using AWS SNS and SQS
  • Created and developed new features for the Auto loop/Service Book/ASR3 products
  • Developed a cross-browser-compatible, customer-facing online application based on an n-tier architecture
  • Hands-on experience using the Visual Studio .NET IDE to design forms and to develop and debug the application
  • Involved in building a rich web experience using JavaScript
  • Used the AJAX toolkit, MultiViews, regular expressions, regular expression validators, and user search controls
  • Extensively used ADO.NET and XML to get high performance from the web controls
  • Created new database objects such as procedures, functions, packages, triggers, indexes, and views using PL/SQL in development and production environments
  • Worked on Web API to connect the Angular app to the business logic layer of MVC
  • Worked on creating the database and developed multiple T-SQL procedures, functions, and SQL queries to handle the business rules
  • Utilized Data Access Components, Web Services and Business Layers
  • Implemented Forms-based Authentication in ASP.NET to authenticate the users
  • Extensively worked with Datasets to reduce hits to the database server
  • Developed test cases and performed unit testing
  • Performed unit testing and integration testing after completing module coding
  • Developed SQL queries, T-SQL procedures, functions, and triggers to handle business rules, data integrity, and various data transactions
  • Performed extensive unit testing using NUnit and developed test plans and test cases
  • Team Foundation Server (TFS) was used to maintain the version and source control.

Education

Master’s - Data Science

UMBC

Bachelor’s - ECE

JNTUK

Skills

  • Hadoop Ecosystem: Hadoop, MapReduce, Spark, HDFS, Sqoop, YARN, Oozie, Hive, Impala, Apache Airflow, HBase
  • Programming Languages: PL/SQL, SQL, Python, Scala, Java
  • Databases: MySQL, SQL Server, Oracle, MS Access, Snowflake
  • NoSQL Databases: Cassandra, HBase, DynamoDB
  • Workflow Management Tools: Oozie, Autosys, Apache Airflow, Jira
  • Visualization & ETL Tools: Tableau, Power BI, Informatica, Talend
  • Cloud Technologies: Azure, AWS, GCP
  • IDEs: Eclipse, Jupyter Notebook, Spyder, PyCharm, IntelliJ
  • .NET Environment: C#.NET, ASP.NET, ADO.NET, Visual Basic, WPF, CSS, SQL Server 2008 R2 and 2012, XML, Packages, Visual Studio 2013, TFS, JavaScript, SSRS, SSIS

Accomplishments

  • Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL Solutions and Data Warehouse tools for reporting and data analysis
  • Used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data and used DataFrame operations to perform required data validations
  • Good working experience with Spark (Spark Streaming, Spark SQL), Scala, and Kafka
  • Worked on reading multiple data formats on HDFS using Scala
  • Good understanding and knowledge of NoSQL databases like MongoDB, PostgreSQL, HBase, and Cassandra
  • Worked with various formats of files like delimited text files, click stream log files, Apache log files, Avro files, JSON files, and XML Files
  • Mastered the use of different columnar file formats such as RC, ORC, and Parquet
  • Good understanding of various compression techniques used in Hadoop processing, such as Gzip, Snappy, and LZO
  • Experience extracting data from MongoDB through Sqoop, placing it in HDFS, and processing it
  • Extensive working experience with RDBMS such as Oracle, Microsoft SQL Server, and MySQL, and with NoSQL databases like HBase, DynamoDB, and Cassandra
  • Extensive experience working on various databases and database script development using SQL and PL/SQL
  • Worked with various programming languages using IDEs such as Eclipse, NetBeans, and IntelliJ, along with tools such as PuTTY and Git
  • Very strong interpersonal skills and the ability to work independently and in a group; quick to learn and easily adaptable to the working environment.
