Data enthusiast with 8+ years of experience in Data Engineering and Business Intelligence who loves to harness the power of data, translating business problems into solutions through code and statistical and quantitative analysis to surface the insights behind key business decisions. Working in different positions with different clients has built the ability to identify and translate business requirements: paying attention to the specifics of a dataset, recognizing the business issue, knowing exactly what to draw from the data, and gathering all the information needed for analysis. Quick to grasp domain knowledge regardless of industry through effective research, learning, and communication skills.
Overview
10 years of professional experience
Work History
Senior Data Quality Engineer
Salesforce
San Francisco, CA
06.2022 - Current
Performed data profiling and analysis of various objects in Salesforce.com (SFDC) and MS Access database tables for an in-depth understanding of source entities, attributes, relationships, domains, source data quality, and hidden and potential data issues.
Worked with peers to implement a full Agile methodology to manage projects, reducing downtime by 28% in the first four months.
Led the complete lifecycle of visual analytical applications, from designing mockups and storyboards to developing and deploying a complete production-ready application using Ataccama.
Led data operations with a thorough knowledge and understanding of relational databases and data structures, and the ability to cleanse, analyze, and connect data from multiple sources.
Implemented the Bulk API to extract huge volumes of data from the Salesforce APIs and addressed all performance issues encountered (see the sketch at the end of this section).
Managed all functions of the product catalog, including initial upload, periodic updates, and troubleshooting any data quality issues.
Performed analysis, design, development, and configuration to establish the Ataccama ONE data management tool and processes for the organization.
Performed data profiling, complex sampling, statistical testing, and data reliability testing.
Identified incomplete data, improved data quality, and integrated data from several data sources.
Interacted with users to gather requirements and functional specifications to develop ETL procedures that are consistent across applications and systems.
Expertise in Salesforce-Informatica integration; experience in Salesforce CRM configuration, customization, and testing of applications.
Designed and developed Informatica ETL interfaces to load data incrementally from SFDC, MS Access databases, and flat files into the staging schema.
Designed and developed Informatica ETL/SCD interfaces to load data from the staging schema into the Customer Reporting dimension tables.
Designed and developed Informatica ETL interfaces to load data from the staging schema into the Customer Reporting DW fact tables.
Scheduled ETL jobs for the entire DW load process using Tidal Scheduler.
Developed source-to-target map documents to describe the relationship between source and target data.
Developed the ETL specification design document containing detailed information on ETL processing, mapping/workflow specifications, the exception handling process, staging and data warehouse schemas, etc.
Configured the connections in Informatica to extract data from SFDC APIs.
Extracted data using Informatica PowerCenter and data loaders by connecting to the SFDC cloud.
Tuned the synchronization and replication jobs to reduce job runtimes and eliminate data contention in bulk data jobs.
Worked on Snowflake modeling; highly proficient in data warehousing techniques for data cleansing, the Slowly Changing Dimension phenomenon, surrogate key assignment, and change data capture.
Consulted on Snowflake Data Platform solution architecture, design, development, and deployment, focused on bringing a data-driven culture across the enterprise.
Developed Talend MDM jobs to populate claims data into the data warehouse (star schema, snowflake schema, and hybrid schema).
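The Bulk API bullet above refers to pulling large Salesforce objects in bulk rather than through the standard query endpoint. A minimal sketch of that approach, assuming the simple_salesforce Python client; the credentials, object, and field names are placeholders:

# Sketch only: credentials, object, and field names are placeholders.
from simple_salesforce import Salesforce

sf = Salesforce(username="user@example.com",
                password="********",
                security_token="token")

# A bulk query sidesteps the limits that make large REST extracts slow or fail.
records = sf.bulk.Account.query(
    "SELECT Id, Name, LastModifiedDate FROM Account WHERE IsDeleted = false"
)
print(f"Fetched {len(records)} Account rows via the Bulk API")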
Lead Data Engineer - Data Analytics & Engineering
Collective Health
Chicago, IL
05.2021 - 06.2022
Engineered and orchestrated data flows and pipelines in a cloud environment using a progressive tech stack.
Organized system operating procedures to strengthen controls.
Identified needed business improvements and determined the appropriate systems required to implement solutions.
Ingested and integrated data from a large number of disparate data sources.
Constructed and optimized data warehouses and data pipelines.
Cleansed data toward a normal distribution by applying techniques such as missing value treatment, outlier treatment, and hypothesis testing.
Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
Created several types of data visualizations using Python and Looker.
Installed and configured Apache Airflow for workflow management and created workflows in Python.
Developed Python code for tasks, dependencies, an SLA watcher, and a time sensor for each job, for workflow management and automation with Airflow (see the sketch at the end of this section).
Implemented data load pipeline algorithms in Python and SQL.
Used Python and SQL daily to perform transformations that apply business logic.
Created test scripts for regression tests, smoke tests, and unit tests.
Integrated Airflow to perform recurring ETL batch jobs.
Developed ETL pipelines for structured and unstructured data using pandas and Featuretools.
Led and implemented a master data management system covering data governance, a data dictionary, and data categories.
Used AWS S3 buckets to store files, ingested files into Databricks using Auto Loader, and ran deltas using data pipelines.
Set up scripts to create new snapshots and delete old snapshots in S3 using S3 CLI tools.
Created PySpark scripts to improve application performance.
Generated ad hoc SQL queries using joins, database connections, and transformation rules to fetch data from legacy DB2 and SQL Server database systems.
Established a single source of truth for various data to increase availability, accessibility, and scalability.
Created and maintained the Logical Data Model (LDM) for the project, including documentation of all entities, attributes, data relationships, primary and foreign key structures, allowed values, codes, business rules, glossary terms, etc.
Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
Worked closely with the business, development, and quality assurance teams to ensure that the desired functionality would be achieved by the application.
Skilled at designing and implementing SQL queries using joins (inner and outer), unions, nested selects, ORDER BY, GROUP BY, and aggregate functions to extract data from different data sources.
Actively involved in walkthroughs and meetings with the project team to discuss related business and project issues.
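The Airflow bullet above describes jobs built from tasks with dependencies, a per-task SLA, and a time sensor. A minimal sketch of that pattern, assuming Airflow 2.x; the DAG, task, and function names are placeholders:

# Sketch only: DAG, task, and function names are placeholders.
from datetime import datetime, time, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.sensors.time_sensor import TimeSensor

def extract():
    pass  # placeholder: pull source data

def load():
    pass  # placeholder: write to the warehouse

with DAG(
    dag_id="example_daily_load",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Time sensor: hold the daily run until upstream files are expected.
    wait_for_window = TimeSensor(task_id="wait_for_window", target_time=time(hour=6))

    # SLA watcher: Airflow records an SLA miss if this task runs past 30 minutes.
    extract_task = PythonOperator(task_id="extract", python_callable=extract,
                                  sla=timedelta(minutes=30))
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Dependencies: the sensor gates extract, extract gates load.
    wait_for_window >> extract_task >> load_task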
ETL Developer
Cisco
Charlotte, NC
01.2019 - 04.2021
Involved as a key team member in requirement analysis during the design phase and in interactions with business users.
Adhered to timelines to meet quality assurance targets.
Performed technical analysis, ETL design, development, and deployment on data per business requirements.
Developed Talend Big Data jobs to load heavy volumes of data into the S3 data lake and then into the Snowflake data warehouse.
ETL development using EMR/Hive/Spark, Lambda, Scala, DynamoDB Streams, Amazon Kinesis Firehose, Redshift, and S3.
Developed DDL scripts for around 185 tables in Oracle and developed ETL to bring data into the Oracle environment from Hadoop.
Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
Developed Scala scripts using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
Wrote various data normalization jobs for new data ingested into Redshift.
Designed the dimensional model, data lake architecture, and Data Vault 2.0 on Snowflake, and used the Snowflake logical data warehouse for compute.
Extracted, aggregated, and consolidated Adobe data within AWS Glue using PySpark.
Worked with the AWS cloud platform and its features, including EC2, IAM, EBS, CloudWatch, and AWS S3.
Deployed applications using AWS EC2 standard deployment techniques and worked on AWS infrastructure and automation. Worked in a CI/CD environment deploying applications on Docker containers.
Used AWS S3 buckets to store files, ingested files into Snowflake tables using Snowpipe, and ran deltas using data pipelines.
Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift (see the sketch at the end of this section).
Experience writing SQL queries in SQL Server Management Studio to validate data integrity after Extract, Transform, and Load (ETL) processes in the Enterprise Data Warehouse (EDW). Knowledge of Informatica PowerCenter Data Quality; exposure to other functionality such as metadata reporting, advanced transformations, and partitioning.
Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
Used Spark Streaming APIs to perform the necessary transformations and actions on the fly, building a common learner data model that receives data from Kafka in near real time and persists it into Cassandra.
Implemented the strategy to migrate Netezza-based analytical systems to Snowflake on AWS.
Worked with the architect on the final approach and streamlined the Informatica-Snowflake integration.
Created various reports using Tableau and QlikView with the BI team, based on requirements.
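The Glue bullet above refers to moving campaign files landed in S3 into Redshift. A minimal sketch of that ETL pattern, assuming a Glue PySpark job; the bucket, connection, and table names are placeholders:

# Sketch only: bucket, connection, and table names are placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read campaign files (Parquet here) landed in S3 into a DynamicFrame.
campaigns = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/campaigns/"]},
    format="parquet",
)

# Write to Redshift through a catalogued JDBC connection, staging via S3.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=campaigns,
    catalog_connection="example-redshift-conn",
    connection_options={"dbtable": "stg.campaigns", "database": "edw"},
    redshift_tmp_dir="s3://example-bucket/tmp/",
)
job.commit()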
ETL Developer
Bank Of America
Dallas, TX
01.2017 - 07.2018
Designed integration tools to combine data from multiple, varied data sources such as RDBMS, SQL, and big data.
Designed and created ETL code installations, aiding transitions from one data warehouse to another.
Collaborated with business intelligence staff at customer facilities to produce customized ETL solutions for specific goals.
Interpreted data models for conversion into ETL diagrams and code.
Performed data extraction, transformation, loading, and integration in the data warehouse, operational data stores, and master data management.
Involved in designing, developing, testing, and documenting an application to combine personal loan, credit card, and mortgage data from different countries and load the data from a Hive database into a Sybase database for reporting insights.
Developed an architecture to move the project from Ab Initio to PySpark and Scala Spark.
Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed data in Azure Databricks.
Built scalable distributed data systems using Hadoop.
Used Sqoop to load data from HDFS, Hive, MySQL, and many other sources on a daily basis.
Used Delta Lake for time travel, since data versioning enables rollbacks, full historical audit trails, and reproducible machine learning experiments (see the sketch at the end of this section).
Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming data to uncover insights into customer usage patterns.
Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between different sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse, including the write-back tool and the reverse direction.
Used the enterprise data lake to support various use cases, including analytics, storage, and reporting on voluminous, structured and unstructured, rapidly changing data.
Exported analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
Converted data load pipeline algorithms written in Python and SQL to Scala Spark and PySpark.
Mentored and supported other team members (both onshore and offshore) to assist in completing tasks and meeting objectives.
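The Delta Lake bullet above refers to reading earlier versions of a table for rollbacks and audits. A minimal sketch of that time travel pattern, assuming PySpark with Delta Lake configured; the table path is a placeholder:

# Sketch only: the table path is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-time-travel-sketch").getOrCreate()

path = "s3://example-bucket/delta/loans"

# Current state of the table.
current_df = spark.read.format("delta").load(path)

# Time travel: read the table as it existed at an earlier version,
# which supports rollbacks, audit trails, and reproducible experiments.
v0_df = spark.read.format("delta").option("versionAsOf", 0).load(path)

# Or read by timestamp instead of version number.
snapshot_df = (spark.read.format("delta")
               .option("timestampAsOf", "2022-01-01")
               .load(path))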
Big Data Developer
DaVita Dialysis Corporation
Nashville, TN
01.2015 - 07.2016
Wrote software that scaled to petabytes of data and supported millions of transactions per second.
Worked in a hybrid environment where legacy and data warehouse applications co-existed with new big data applications.
Partnered with infrastructure engineers and system administrators in designing big data infrastructures.
Engaged with business representatives, business analysts, and developers and delivered comprehensive business-facing analytics solutions.
Involved in building a scalable distributed data lake system for Confidential's real-time and batch analytical needs.
Involved in designing, reviewing, and optimizing data transformation processes using Apache Storm.
Experienced in job management using fair scheduling; developed job processing scripts using Control-M workflows.
Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
Experienced in tuning Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
Loaded data into Spark RDDs and performed in-memory data computation to generate output responses (see the sketch at the end of this section).
Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
Responsible for ingesting data from various source systems (RDBMS, flat files, big data) into Azure Blob Storage using a framework model.
Hands-on experience using Azure Data Factory (ADF) to perform data ingestion into Azure Data Lake Storage (ADLS).
Created Spark clusters and configured high concurrency clusters using Azure Databricks to speed up preparation of high-quality data.
Primarily involved in the data migration process using SQL, Azure SQL, SQL Azure DW, Azure Storage, and Azure Data Factory (ADF) for Azure subscribers and customers.
Implemented Custom Azure Data Factory (ADF) pipeline Activities and SCOPE scripts.
Primarily responsible for creating new Azure subscriptions, data factories, virtual machines, SQL Azure instances, SQL Azure DW instances, and HDInsight clusters, and installing DMGs on VMs to connect to on-premises servers.
Imported data from a Kafka consumer into HBase using Spark Streaming.
Experienced in using ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
Experienced in handling large datasets during the ingestion process itself using partitions, Spark in-memory capabilities, broadcasts in Spark, effective and efficient joins, and transformations.
Worked on migrating legacy MapReduce programs into Spark transformations using Spark and Scala.
Worked on a POC comparing processing times of Impala and Apache Hive for batch applications, in order to implement the former in the project.
Worked extensively with Sqoop for importing metadata from Oracle.
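The bullets above on in-memory computation and broadcast joins describe aggregating a large dataset after joining it to a small dimension without a full shuffle. A minimal sketch of that handling, written in PySpark rather than the Scala used on the project; the paths and column names are hypothetical:

# Sketch only: paths and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("aggregation-sketch").getOrCreate()

claims = spark.read.parquet("hdfs:///data/claims")          # large fact data
facilities = spark.read.parquet("hdfs:///data/facilities")  # small dimension

# Broadcast the small dimension so the join avoids a full shuffle,
# then cache the joined set for repeated in-memory computation.
joined = claims.join(broadcast(facilities), "facility_id").cache()

daily_totals = (joined
                .groupBy("facility_id", "service_date")
                .agg(F.sum("amount").alias("total_amount"),
                     F.count("*").alias("claim_count")))

daily_totals.write.mode("overwrite").parquet("hdfs:///data/daily_totals")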
Data Analyst
Tech Mahindra
Bangalore, India
05.2013 - 12.2014
Created various Excel documents to assist with pulling metrics data and presenting information to stakeholders, concisely explaining the best placement for needed resources.
Produced monthly reports using advanced Excel spreadsheet functions.
Documented effective and replicable methods for extracting data and organizing data sources.
Analyzed transactions to build a logical business intelligence model for real-time reporting needs.
Worked with business intelligence software and various reports to glean insights into trends and prospects.
Performed data analysis, data modeling, data migration, and data profiling using complex SQL on various source systems, including Oracle and Teradata (see the sketch at the end of this section).
Experienced in building applications based on large datasets in MarkLogic.
Translated business requirements into working logical and physical data models for the data warehouse, data marts, and OLAP applications.
Analyzed data lineage processes to identify vulnerable data points, control gaps, data quality issues, and an overall lack of data governance.
Worked on data cleansing and standardization using cleanse functions in Informatica MDM.
Designed Star and Snowflake data models for the Enterprise Data Warehouse using ERwin.
Validated and updated the appropriate LDMs against process mappings, screen designs, use cases, the business object model, and the system object model as they evolved and changed.
Maintained the data model and synchronized it with changes to the database.
Designed and developed use cases, activity diagrams, and sequence diagrams using UML.
Extensively involved in modeling and development of the reporting data warehousing system.
Designed database tables and created table- and column-level constraints using the suggested naming conventions for constraint keys.
Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
Used the ETL tool BODS to extract, transform, and load data into data warehouses from various sources such as relational databases, application systems, temp tables, and flat files.
Wrote packages, procedures, functions, and exceptions using PL/SQL.
Reviewed database programming for triggers, exceptions, functions, packages, and procedures.
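The data profiling bullet above refers to column-level checks (row counts, null rates, distinct counts, value ranges) run with SQL against the source systems. A minimal sketch of one such check, wrapped in Python for consistency with the other sketches and assuming the cx_Oracle driver; the connection string, table, and column names are placeholders:

# Sketch only: connection string, table, and column names are placeholders.
import cx_Oracle  # assumes the Oracle client driver is installed

PROFILE_SQL = """
    SELECT COUNT(*)                    AS row_count,
           COUNT(customer_id)          AS non_null_ids,
           COUNT(DISTINCT customer_id) AS distinct_ids,
           MIN(created_date)           AS min_created,
           MAX(created_date)           AS max_created
      FROM source_schema.customers
"""

conn = cx_Oracle.connect("user/password@host:1521/service")
try:
    cur = conn.cursor()
    cur.execute(PROFILE_SQL)
    row_count, non_null, distinct, min_dt, max_dt = cur.fetchone()
    print(f"rows={row_count} non_null_ids={non_null} distinct_ids={distinct}")
finally:
    conn.close()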
Skills
SQL
Work Availability
Monday through Sunday: morning, afternoon, and evening
Quote
Judge a man by his questions rather than his answers.