· Overall 12+ years of hands-on experience building productionized data ingestion and processing ETL pipelines using Spark, Scala, Python, PL/SQL, Informatica, IICS, Azure Data Factory, AWS Glue, and Snowflake, along with experience designing and implementing production-grade data warehousing solutions for large-scale data organizations.
· Experience with Snowflake technology, building Snowflake Multi-Cluster Warehouses using AWS S3 and Azure Blob Storage to integrate data from multiple source systems, including JSON data formats.
· Hands-on experience with Snowflake utilities, SnowSQL, Snowpipe, and data modeling techniques using Python.
· Hands-on experience in Azure development, including Azure App Services, Azure Storage, Azure SQL Database, Virtual Machines, Azure AD, Azure Functions, Notification Hubs, and AKS.
· Set up Jenkins as the team's CI/CD tool.
· Expertise in RDBMS, including Oracle SQL and MS SQL Server, with thorough knowledge of writing SQL queries, stored procedures, views, functions, packages, triggers, exception handlers, cursors, tables, and object types.
· Extensively worked on developing ETL programs supporting data extraction, transformation, and loading using Informatica Power Center and Informatica Intelligent Cloud Services (IICS).
· Extensive experience designing and implementing continuous integration, continuous delivery, and continuous deployment through Jenkins.
· Extensively used GitHub for version control to deploy code and follow the SDLC.
· Hands-on experience in Python, PySpark, Scala, and Bash scripting.
· Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Azure Synapse, Azure Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Storage using Azure Data Factory.
· Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization; assessed the current production state of applications and determined the impact of new implementations on existing business processes.
· Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into Azure services and processed it in Azure Databricks.
· Strong Experience and Knowledge of real-time data analytics using Spark Streaming and Kafka.
· Experience working with the IICS tool, using it effectively for data integration and data migration from multiple source systems.
· Expertise in Informatica Power Center 10.2/10.1/9.5.1/9.0.1/8.6.
· Performed performance monitoring and index optimization tasks using Performance Monitor, SQL Profiler, Database Tuning Advisor, and Index Tuning Wizard.
· Worked on determining various strategies related to data security.
· Created common reusable objects for the ETL team and reviewed coding standards.
· Involved in fixing various issues related to data quality, data availability and data stability.
· Participated in design meetings for data model creation and provided guidance on architecture best practices.
· Good knowledge of the various SDLC phases (requirement analysis, design, development, and testing) across various development and enhancement projects.
· Experience in various methodologies like Agile and Waterfall.
· Led team members through challenges faced during the project development process and drove problem solving.
· Interacted with end customers and gathered requirements for designing and developing a common architecture for storing clinical data within the enterprise and building a data lake in the Azure cloud.
· Developed applications using PySpark to integrate data from sources such as FTP and CSV files, processed it in Azure Databricks, and wrote the results into Snowflake.
· Developed Spark Applications for data extraction, transformation and aggregation from multiple systems and stored on Azure Data Lake Storage using Azure Databricks notebooks.
· Wrote unzip and decode functions using Spark and Scala and parsed XML files into Azure Blob Storage.
· Analyzed SQL scripts and designed the solution using Scala.
· Developed analytical components using Scala, Spark, and Spark Streaming.
· Developed PySpark scripts to ingest data from source systems such as Azure Event Hubs into Delta tables in Databricks in reload, append, and merge modes (an illustrative merge sketch follows this role's environment line).
· Created Pipelines in ADF to copy parquet files from ADLS Gen2 location to Azure Synapse Analytics Data Warehouse.
· Implemented Azure resources such as AKS, storage accounts, VMs, databases, Functions, Event Hubs, Apps, Key Vaults, firewalls, and resource alerts using Terraform scripts.
· Developed Python scripts for transforming data, streaming data, writing APIs, and sending emails; connected to Snowflake using Python and loaded data into Snowflake.
· Worked on creating SnowSQL scripts, Snowpipes, Snowflake procedures, DDL, and SQL to implement ETL processes within the Snowflake environment.
· Created Snowpipe for continuous data load.
· Used COPY to bulk load the data.
· Evaluated Snowflake Design considerations for any change in the application.
· Experience with source code management tools such as Git.
· Set up continuous integration with Jenkins and used its wide range of plugins to build smooth, developer-friendly workflows.
· Set up full CI/CD pipelines so that each developer commit goes through the standard software lifecycle process and is thoroughly tested before it can be deployed to production.
· Built ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, and wrote SQL queries against Snowflake (an illustrative Python load sketch follows this role's environment line).
· Created ETL processes to extract files for the external vendors and coordinated with different teams for that effort.
· Wrote Python scripts for extracting useful data from the design database.
· Created the ETL process to build the data warehouse.
· Built SQL in Teradata to create extract files to be loaded into Azure Blob Storage.
· Created procedures to build ETL jobs that load dimension and fact tables.
· Created data warehouses, databases, and tables in Snowflake.
· Analyzed the cost of Snowflake queries and storage and optimized the cost of the Snowflake subscription.
Environment: Azure Data Factory (ADF), Scala, PySpark, Spark, SQL, Databricks, Azure Synapse, Azure Blob Storage, ADLS Gen2, Snowflake, Azure Storage services, Python, Terraform, Jenkins, GitHub, Teradata, Airflow, Shell scripting.
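A minimal sketch of the merge-mode Delta ingestion referenced above, assuming a Databricks notebook context (where spark is predefined) and Delta Lake's Python API; the landing path, the table name raw.events, and the key column event_id are hypothetical placeholders rather than the project's actual objects.

from delta.tables import DeltaTable

# Hypothetical landing location for data staged from Azure Event Hubs.
landing_df = (
    spark.read.format("parquet")
         .load("abfss://landing@<storageaccount>.dfs.core.windows.net/events/")
)

# Hypothetical target Delta table; merge = upsert on a business key.
target = DeltaTable.forName(spark, "raw.events")
(
    target.alias("t")
          .merge(landing_df.alias("s"), "t.event_id = s.event_id")
          .whenMatchedUpdateAll()
          .whenNotMatchedInsertAll()
          .execute()
)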
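A minimal sketch of loading data into Snowflake from Python with a bulk COPY, using the snowflake-connector-python package; the connection parameters, the stage @etl_stage, and the table STG_ORDERS are hypothetical placeholders, not the project's actual configuration.

import snowflake.connector

# Connection details are placeholders; real pipelines would read these from config or a vault.
conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)
try:
    cur = conn.cursor()
    # Bulk load files already landed in a stage.
    cur.execute("""
        COPY INTO STG_ORDERS
        FROM @etl_stage/orders/
        FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
    """)
finally:
    conn.close()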
· Understood the business rules and sourced data from multiple source systems using IICS.
· Designed and customized data models for the data warehouse supporting data from multiple sources.
· Implemented Type 1, Type 2, CDC, and incremental load strategies.
· Extracted data from flat files, Excel files and transformed the data based on user requirement using Informatica Power Center.
· Developed Data Integration Platform components/processes using Informatica Cloud Platform, Azure SQL Datawarehouse, Azure Data lake Store and Azure Blob Storage Techniques.
· Created IICS connections using various cloud connectors in the IICS Administrator.
· Extensively used performance techniques while loading data into Azure Synapse using IICS.
· Extensively used the Pushdown Optimization option to push processing into Azure Synapse and leverage its compute power.
· Extracted data from Snowflake and pushed it into the Azure data warehouse instance to support reporting requirements.
· Performed loads into Snowflake using Snowflake connector in IICS to support data analytics and insight use case for sales team.
· Created Python scripts to run on-demand cloud mapping tasks using the Informatica REST API (an illustrative sketch follows this role's environment line).
· Developed a CDC load process for moving data from PeopleSoft to SQL Data Warehouse using Informatica Cloud CDC for Oracle.
· Developed complex parallel Informatica Cloud taskflows with multiple mapping tasks and taskflows.
· Developed mass ingestion (file ingestion) tasks to ingest large datasets from on-premises systems to Azure Data Lake Store.
Environment: Informatica Intelligent Cloud Services (IICS), Informatica Power Center 10.2.0, Snowflake, Azure Synapse, Azure Data Lake Store, Teradata, Python, AWS, Oracle, SQL Server 2014, PowerShell scripting, Tableau.
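A minimal sketch of triggering an on-demand IICS mapping task from Python over REST. The login and job endpoints follow the commonly documented IICS v2 REST API, but the pod URL, credentials, and task id are placeholders and should be read as assumptions rather than the project's actual configuration.

import requests

# Placeholder pod URL and credentials.
LOGIN_URL = "https://dm-us.informaticacloud.com/ma/api/v2/user/login"
resp = requests.post(
    LOGIN_URL,
    json={"@type": "login", "username": "<user>", "password": "<password>"},
)
resp.raise_for_status()
session = resp.json()
server_url = session["serverUrl"]
headers = {"icSessionId": session["icSessionId"], "Content-Type": "application/json"}

# Start a mapping task (taskType "MTT") on demand; taskId is a placeholder.
job = requests.post(
    f"{server_url}/api/v2/job",
    headers=headers,
    json={"@type": "job", "taskId": "<mapping_task_id>", "taskType": "MTT"},
)
job.raise_for_status()
print(job.json())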
· Analyzing business requirements with business analysts to develop ETL procedures that are consistent across all systems.
· Involved in data modelling for SQL Server databases to store data retrieved from flat files.
· Normalizing and de-normalizing the flat files and loading them in SQL server.
· Creating Informatica mappings/sessions and workflows.
· Used various Informatica transformations such as Lookup, Sorter, Aggregator, Joiner, Union, Transaction Control, SQL, XML, Stored Procedure, Router, Normalizer, Rank, Expression, and Sequence transformations.
· Worked on creating workflows using tasks such as Session, Assignment, Command, Decision, Email, Control, Timer, Event-Raise, and Event-Wait.
· Analyzing the Interfaces requirements for designing & implementing the changes.
· Involved in designing the procedures for getting data from all source systems to the data warehousing systems.
· Working closely with the business on a day-to-day basis.
· Profiling the data and comparing it with the same data in the Unite US system, the old LOS slated for decommissioning next year.
· Used Informatica HTTP Transformation to make API calls.
· Guiding the team on the code build process and resolving technical challenges.
· Used Informatica, SQL Server for ETL Development.
· Running end to end impact analysis for different scope changes.
· Creating Unit test case documents.
· Working closely with the Interfaces teams.
· Researching and resolving issues, risks, and action items.
· Executing and monitoring Autosys jobs.
· Researching defects.
· Help with performance test plan and implementation.
· Responsible for Release / Go-live plan.
· Making sure all release documentation is accurate and approved.
Environment: Informatica 10.2.0, SQL Server 2008, Oracle, PowerShell scripting, Power BI.
Responsibilities:
· Analyzing data and requirements for designing and implementing changes involving Spark jobs and Oracle.
· Working closely with the business on a day-to-day basis.
· Understanding the requirements and in turn working with offshore teams.
· Profiling raw data integrated from different sources.
· Used Informatica, Spark and Oracle for incremental and history fixes.
· Worked on Informatica mappings/workflows creation.
· Worked on Several Informatica transformations and different input file types.
· Created Unix shell scripts for transforming and analyzing flat file data.
· Extracted data from various heterogeneous sources such as Oracle, SQL Server, flat files, and mainframe files.
· Working with data residing in Hive and HBase on a day-to-day basis (an illustrative Hive-to-Oracle sketch follows this role's environment line).
· Attending multiple workstream meetings (Core, Compensation, and Recruiting) to understand requirements and drive solutions.
· Running end to end impact analysis for different scope changes.
· Creating Unit test case documents.
· Working closely with the business, consumers and technical team.
· Researching and resolving issues, risks, and action items.
· Participating in data model changes.
· Supporting weekly/monthly status reporting to key stakeholders.
· Influencing senior leadership to adopt new ideas, products, and approaches.
· Owning the SDLC workbook and process for all phases of the project.
· Researching defects.
· Help with performance test plan and implementation.
· Responsible for Release / Go-live plan.
· Helping the business team by providing technical support and assisting them with signoffs.
· Participating in project closure input.
· Onshore-offshore coordination.
Environment: Oracle, Unix, Spark, Hive, HBase, TWS, Oozie, Enterprise Architect and IHR Interface.
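A minimal sketch of the Hive-to-Oracle pattern used for incremental and history fixes, assuming a Spark cluster with Hive support and the Oracle JDBC driver on the classpath; the table names, filter column, and connection details are hypothetical placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("incremental_fix").enableHiveSupport().getOrCreate()

# Hypothetical Hive source and incremental predicate.
df = spark.table("hr.employee_events").filter(F.col("load_date") >= "2020-01-01")

# Hypothetical Oracle target written over JDBC.
(
    df.write.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")
      .option("dbtable", "HR_STG.EMPLOYEE_EVENTS_FIX")
      .option("user", "<user>")
      .option("password", "<password>")
      .option("driver", "oracle.jdbc.OracleDriver")
      .mode("append")
      .save()
)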
Cloud Ecosystem: Snowflake Data Cloud, Azure (Databricks, Data Factory, Synapse, ADLS Gen2), and AWS (EC2, EMR, Lambda, Athena, Glue, Redshift, S3)
ETL: Informatica Power Center 10.x, Informatica Power Exchange 8.x, Informatica Intelligent Cloud Services (IICS)
Databases: Snowflake, Oracle 12c, MS SQL Server, Teradata, Hive, DB2 UDB, MS Access
Operating Systems: Windows 10/XP/NT 4.0/2000, Unix
Scheduling Tools: Airflow, Autosys, Control-M, TWS
Programming Languages: Python, PySpark, Scala, SQL, PL/SQL, Perl, Shell Scripting
Continuous Integration: Jenkins
Version Control: SVN, GIT, GitHub
Monitoring: Splunk
Big Data Technologies: Spark, HBase, Hive, Sqoop, Pig, and Oozie
Streaming: Spark Streaming and Kafka
Visualization/Reporting: Power BI, Tableau
Won several STAR performance awards for consistent quality performance.
Won SPOT awards for on-the-spot fixing of production defects.