Dynamic Sr Data/ETL Engineer with extensive experience at Vanguard Group, specializing in AWS services and data lake architecture. Proven track record of optimizing data ingestion processes, including a 50% reduction in ingestion costs. Adept at leading teams and implementing innovative solutions, leveraging strong analytical skills and expertise in Informatica PowerCenter.
Overview
20 years of professional experience
Work History
Sr Data/ETL Engineer
Vanguard Group
Plano
11.2020 - Current
Leading the build of the Retail Marketing data lake using AWS S3, Redshift, and AWS Glue ETL; the lake is consumed by downstream machine learning models for targeted client marketing and campaigning in Unica.
Worked on the solution architecture for various data ingestion pipelines that pull/push data from campaign marts into AWS, exposing it for querying through Hive external tables registered in the Glue Data Catalog (see the sketch below).
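As one illustration of that query path, external tables registered in the Glue Data Catalog can be queried from Python through Athena. This is a minimal sketch; the database, table, and S3 locations are placeholders, not the actual environment:

    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    # Run a query against a Glue-cataloged external table over S3 data;
    # Athena writes the result set to the designated S3 output location.
    resp = athena.start_query_execution(
        QueryString="SELECT campaign_id, COUNT(*) AS events "
                    "FROM campaign_mart.events GROUP BY campaign_id",
        QueryExecutionContext={"Database": "campaign_mart"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )
    print(resp["QueryExecutionId"])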
Leading data products for vendor data ingestion into Vanguard systems, both cloud-to-cloud and on-prem-to-cloud, using Dell Boomi, AWS DataSync, EMR, ECS, and AWS Glue ETL services.
Performing data ingestion and analytics on Vanguard's data lake on the AWS platform, using PySpark on EMR.
Reduced ingestion cost by 50% by monitoring performance through Ganglia and right-sizing clusters; strong understanding of EMR instance fleets.
Created a Spark application to load data into dynamically partitioned Hive tables, and created Oozie jobs to orchestrate the PySpark and shell-script workflow (sketched below).
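A minimal PySpark sketch of that dynamic-partition load pattern; the table, column names, and S3 path are illustrative, not the actual schema:

    from pyspark.sql import SparkSession

    # Hive support and dynamic partitioning must be enabled explicitly.
    spark = (SparkSession.builder
             .appName("hive-dynamic-partition-load")
             .enableHiveSupport()
             .config("hive.exec.dynamic.partition", "true")
             .config("hive.exec.dynamic.partition.mode", "nonstrict")
             .getOrCreate())

    df = spark.read.parquet("s3://example-bucket/landing/transactions/")  # placeholder path

    # The partition column (txn_date) goes last; Hive routes each row to
    # its partition at write time instead of needing a static clause.
    (df.select("txn_id", "amount", "txn_date")
       .write.mode("append")
       .insertInto("marketing.transactions_by_date"))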
Developed code modules in Git and deployed them to production through CI/CD pipelines built with tools like Bamboo.
Standardized the end-to-end flow of data from source to target tables using Spark SQL and Python, simplifying the process and cutting extraction, transformation, and load time by 21%.
Automated production data quality checks by writing Python and shell scripts scheduled through LSF, decreasing manual effort by 30%; a sketch of one such check appears below.
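A sketch of what one such automated check could look like; the row-count reconciliation rule and the table names are assumptions for illustration:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("dq-rowcount-check")
             .enableHiveSupport()
             .getOrCreate())

    # Reconcile source and target row counts for today's load; a mismatch
    # exits non-zero so the LSF scheduler flags the run for follow-up.
    src = spark.sql("SELECT COUNT(*) AS c FROM staging.orders "
                    "WHERE load_dt = current_date()").first().c
    tgt = spark.sql("SELECT COUNT(*) AS c FROM warehouse.orders "
                    "WHERE load_dt = current_date()").first().c

    if src != tgt:
        raise SystemExit(f"DQ check failed: staging={src}, warehouse={tgt}")
    print(f"DQ check passed: {src} rows reconciled")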
Environment: Informatica PowerCenter 10.1, AWS (S3, Redshift, EC2, ECS, Glue ETL, Glue Crawler, Athena, Lambda, DataSync), Tableau, Control-M.
Lead Data Engineer
Davidson Kempner Capital Management
Philadelphia
12.2019 - 10.2020
Started the data analyst team as its lead, hiring a couple of people and sharing five employees from another team; now responsible for all data analysis and profiling for the DK databases.
Responsible for managing and leading the Redshift platform for Davidson Kempner: a 12-node cluster with 2.5 TB per node, serving a user base of 200.
Worked on migrating data from on-prem SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
Successfully implemented AWS Redshift Spectrum, converting all existing Redshift loads to external tables queried in place through Spectrum (a sketch follows).
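A minimal sketch of the Spectrum pattern driven from Python; the cluster endpoint, IAM role, and Glue database names are placeholders:

    import psycopg2

    conn = psycopg2.connect(host="example-cluster.redshift.amazonaws.com",
                            port=5439, dbname="analytics",
                            user="etl_user", password="...")
    conn.autocommit = True  # run DDL outside an explicit transaction
    cur = conn.cursor()

    # Map a Glue database as an external schema, then query S3-resident
    # data through Spectrum without loading it into the cluster.
    cur.execute("""
        CREATE EXTERNAL SCHEMA IF NOT EXISTS vendor_ext
        FROM DATA CATALOG DATABASE 'vendor_catalog'
        IAM_ROLE 'arn:aws:iam::123456789012:role/SpectrumRole'
    """)

    cur.execute("SELECT vendor_id, COUNT(*) FROM vendor_ext.positions GROUP BY vendor_id")
    print(cur.fetchall())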
Currently responsible for loading external vendor data (close to 85 vendors) into Redshift.
Responsible for daily, weekly, monthly, quarterly, and yearly data loads, automating the majority of the scripts with a Python framework that pulls data from multiple source locations and loads it into SQL Server and Redshift (outlined below).
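A hedged outline of such a pull-and-load framework; the source registry and connection strings are invented for the example (pandas reads s3:// paths when s3fs is installed):

    import pandas as pd
    from sqlalchemy import create_engine

    # Config-driven registry of vendor feeds to pull (placeholder paths).
    SOURCES = {
        "positions": "s3://example-bucket/vendor/positions.csv",
        "trades": "s3://example-bucket/vendor/trades.csv",
    }

    sql_server = create_engine(
        "mssql+pyodbc://user:pwd@dwhost/ods?driver=ODBC+Driver+17+for+SQL+Server")
    redshift = create_engine(
        "postgresql+psycopg2://user:pwd@example-cluster:5439/analytics")

    # Pull each feed once, then land it in both targets' staging schemas.
    for table, path in SOURCES.items():
        df = pd.read_csv(path)
        df.to_sql(table, sql_server, schema="staging", if_exists="replace", index=False)
        df.to_sql(table, redshift, schema="staging", if_exists="replace", index=False)
        print(f"loaded {len(df)} rows into staging.{table}")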
Environment: Python, AWS (S3, Redshift, EC2, ECS, Glue ETL, Athena, Redshift Spectrum, Lambda, DataSync), Tableau.
Lead Data Engineer
Comcast Cable Communications
West Chester
10.2015 - 11.2019
Managed and delivered complex ETL solutions using Teradata, Informatica, big data technologies, and Business Objects, handling huge data volumes (billions of rows, up to 16 billion in a single table).
Managed and supported the onboarding of products onto the Next Gen Arcadia platform, following its established agile methodology principles and the Next Gen Arcadia portfolio framework.
Implemented widespread data quality and data improvement efforts across the National Data Warehouse, including process redesign, data management policy changes, and large-scale data clean-up.
Designed and developed Hive queries to transform data for downstream processing across the 12 subject areas of AMDOCS billing (one illustrative pattern below).
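One illustrative transform of that shape, expressed here through Spark's Hive support; the billing tables and columns are hypothetical:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("billing-subject-area-transform")
             .enableHiveSupport()
             .getOrCreate())

    # Aggregate raw billing events into a daily summary table that the
    # downstream subject-area marts consume.
    spark.sql("""
        INSERT OVERWRITE TABLE billing.daily_charge_summary
        SELECT account_id,
               charge_date,
               SUM(charge_amount) AS total_charges
        FROM billing.raw_charge_events
        GROUP BY account_id, charge_date
    """)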
Migrated an existing on-premises application to AWS, using services like EC2 and S3 for small-data-set processing and storage, and maintained the Hadoop cluster on AWS EMR.
Used a combination of Teradata BTEQ and the Informatica suite (PowerCenter and IDQ Developer) to load the entire Comcast customer history, supporting analysis of future trends in customer behavior.
Environment: Informatica PowerCenter 10.1/9.6.1, AWS (S3, EC2, EMR, Glue, Redshift), Teradata, UC4, DIFA, Shell Scripting.
Sr ETL Developer
CareFirst BCBS
Washington, D.C.
10.2014 - 10.2015
Managed and successfully deployed the Facets data model design and Facets upgrades from version 4.8 to 5.4, leading the design and development teams for data flow from Facets to the ODS and downstream systems.
Implemented various Hive queries for analysis and called them from a Java client engine to run on different nodes.
Worked on the DGS automation project, designing and developing end-to-end process automation of source-to-target data validation for multiple source systems.
Defined the target load order plan for loading data into target tables and created a heartbeat table to capture run records for the daily loads.
Used SQL MINUS queries to identify data differences between source and target systems for validation (example below).
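A minimal example of that validation pattern, run here from Python; the DSN and claim tables are hypothetical, and since MINUS is Oracle syntax, SQL Server's equivalent operator EXCEPT is used:

    import pyodbc

    conn = pyodbc.connect("DSN=ods_dsn")  # placeholder DSN

    # Rows returned exist in the source but are missing from the target;
    # an empty result means the two systems reconcile.
    rows = conn.execute("""
        SELECT member_id, claim_id, paid_amount FROM src.claims
        EXCEPT
        SELECT member_id, claim_id, paid_amount FROM tgt.claims
    """).fetchall()

    if rows:
        print(f"validation failed: {len(rows)} source rows missing from target")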
Performed Unit and Integration testing to validate reports and mapping functionality.
Implemented SVN for migrating Informatica code from one environment to another and worked closely with the Informatica admins to automate future Facets deployments.
Environment: Informatica PowerCenter 9.5.1, Hive, Facets 5.2/5.1, Informatica PowerExchange 9.5.1, Business Objects 4, Shell Scripting, MS SQL Server 2005/2000, Hummingbird, Toad, PuTTY.
Sr. Data Consultant
Tenet HealthCare
Dallas
01.2014 - 10.2014
Initiated, managed, and led the Cerner Power Insight EDW project from scratch, guiding and coordinating multiple Cerner associates, and successfully installed the Informatica PowerCenter ETL jobs and implemented them in production.
Participated in the design and implementation of logical and physical technical architecture for all tiers of the Data Warehouse developed for Meaningful Use.
Documented existing and future-state requirements, extract and load configuration steps, validation and troubleshooting guides, universe import steps, and post-package-installation steps.
Environment: Informatica PowerCenter 9.1.1, Oracle 11g, Business Objects XI 3.1/3.0, Discern Visual Developer, Shell Scripting (Bourne and Korn Shell), SSIS, MS SQL Server, Toad.
Sr. ETL Consultant
Catholic Health Initiatives
Englewood
06.2013 - 01.2014
Extensively used Informatica PowerCenter for extracting, transforming, and loading data from relational and non-relational sources.
Expert in project management methodology and IS standards, change control, and quality and system performance methods and metrics.
Developed and maintained appropriate documentation for all design solutions across various IS applications.
Attended meetings with stakeholders to define system data and reporting needs, identifying and resolving gaps in existing systems.
Identified, communicated, and resolved data quality and data reconciliation issues.
Created test cases, test plans, and documented steps for data validation of the EDW against Cerner Millennium; validated EDW data by pulling reports using EDW universes and SQL queries.
Environment: AIX 6.1, Informatica PowerCenter 9.1.1, Oracle 11g, Business Objects XI 3.1/3.0, Discern Visual Developer, Shell Scripting (Bourne and Korn Shell), SSIS, MS SQL Server, Toad.
Sr. ETL Application Engineer
Baystate Health
Springfield
12.2010 - 06.2013
Initiated, managed, and led the Cerner Power Insight EDW project from scratch, guiding and coordinating multiple Cerner associates, and successfully implemented it in production.
Governed the EDW team, facilitated Cerner's PIEDW design sessions, and organized knowledge-sharing sessions to bring the team up to speed on the PIEDW solution.
Used the EDW_DM_INFO_CONFIG tool and edw_config.exe to configure ETL (extract, transform, and load) processes for extracts and loads.
Created test cases, test plans, and documented steps for data validation of the EDW against Cerner Millennium; validated EDW data by pulling reports using EDW universes and SQL queries.
Used Informatica to configure ETL processes, jobs, and workflows, and designed shell scripts to execute the daily, weekly, and monthly loads.
Environment: Informatica PowerCenter 9.0.1, Business Objects 4, XI 3.1/3.0, Shell Scripting (Bourne and Korn Shell), SSIS, MS SQL Server 2005/2000, Oracle 11g/10g, Hummingbird, Toad.
ETL Analyst
Pfizer Inc.
Groton
02.2010 - 12.2010
Gathered user requirements and converted them into functional and technical specifications.
Led the major implementation of two change requests covering the restructured domains and organizations, new jobs, and revised security roles in Power2Learn (P2L, the Pfizer LMS).
Worked on stored procedures, fixing bugs and modifying PL/SQL code for small enhancements.
ETL Datawarehouse Analyst
AES Corporation Inc.
Arlington
01.2009 - 01.2010
Involved in the detailed design and development of mappings using Informatica PowerCenter.
Analyzed sources and targets and designed complex mappings and transformations.
Worked with pre- and post-session tasks and extracted data from different vendor systems into the staging area.
Set up batches and sessions to schedule loads at the required frequency using the PowerCenter Workflow Manager.
Created reports using Informatica for different marketing groups to evaluate fund performance.
Environment: Informatica PowerCenter 8.5, Oracle 8i/9i, MS SQL Server 2005/2000, Shell Scripting (Bourne and Korn), Sun Solaris 2.6, Windows NT, Autosys.
Data Analyst
TIAA-CREF
Charlotte
11.2007 - 12.2008
Worked in the ITPM (IT Product Management) and Individual Client Services teams, managing existing products and designing and building new products.
Translated Business requirements into Functional Specifications.
Designed mappings to pull daily, weekly, monthly, and yearly data and write it to the data warehouse.
Managed a team of two developers, helping them understand TDD and build mappings.
Designed the data warehouse to store VUL (Variable Universal Life) and VA (Variable Annuity) data, including policyholder information and daily transactions from TIAA-CREF funds and annuities.
Software Engineer
Infosys
Bangalore
07.2005 - 11.2006
Followed a rigorous SDLC and project management best practices. Developed ETL designs.
Summary
12+ years of experience with Teradata, Oracle, MS SQL Server, and PL/SQL scripting.
12+ years of data warehousing ETL experience using Informatica PowerCenter across multiple databases, extracting, transforming, cleansing, and loading data in batch and real time.
Led multiple data lake builds and served as Sr/Lead Data Analyst for several AWS data products, leading teams that moved data between on-prem systems and the AWS cloud.
Experienced in designing and developing automation scripts and batch jobs that create data pipelines across multiple data sources, Amazon S3, and Redshift.
Experienced in designing and developing Type 1 and Type 2 (slowly changing dimension) ETL ingestion frameworks for data marts; a Type 2 sketch follows.
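As a hedged illustration of the Type 2 pattern, a minimal PySpark sketch that expires changed rows and appends fresh versions; all table and column names are invented for the example:

    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("scd2-sketch")
             .enableHiveSupport()
             .getOrCreate())

    incoming = spark.table("staging.customer")  # today's snapshot
    current = spark.table("mart.customer_dim").filter("is_current = true")

    # Detect attribute changes against the current dimension rows.
    changed = (incoming.alias("i")
               .join(current.alias("c"),
                     F.col("i.customer_id") == F.col("c.customer_id"))
               .filter(F.col("i.address") != F.col("c.address")))

    # Close out the superseded versions...
    expired = (changed.select("c.*")
               .withColumn("is_current", F.lit(False))
               .withColumn("end_dt", F.current_date()))

    # ...and open new current versions effective today.
    new_rows = (changed.select("i.*")
                .withColumn("is_current", F.lit(True))
                .withColumn("start_dt", F.current_date())
                .withColumn("end_dt", F.lit(None).cast("date")))

    (expired.unionByName(new_rows, allowMissingColumns=True)
     .write.mode("append").saveAsTable("mart.customer_dim_updates"))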
Strong understanding of data warehousing principles, including dimensional data modeling (logical and physical) and star/snowflake schema design.
Perform design/architecture reviews, data model reviews, and code reviews; help solution delivery teams maintain quality standards; and consult, educate, and mentor developers in Teradata DDL/DML best practices.
Experienced in leading technical teams, including onshore/offshore project teams, managing client interactions, and handling multiple roles.