To work in a demanding workplace by showcasing my efficiency, intellect, and software engineering talent. Adept IT professional with 7+ years of professional IT experience in Data Warehousing/Big Data, including Big Data ecosystem technologies such as Hadoop, MapReduce, Pig, Hive, and Spark, along with data visualization, reporting, and data quality solutions
Overview
7 years of professional experience
1 Certification
Work History
Sr Data Engineer
Insight
Chandler, AZ
08.2021 - Current
Consulted with leadership and stakeholders to share design recommendations, identify product and technical requirements, resolve technical problems, and propose Big Data-based analytical solutions
Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB)
Implemented Azure Data Lake, Azure Data Factory, and Azure Databricks to move and conform data from on-premises systems to the cloud to serve the company's analytical needs
Developed Spark applications using Scala and Spark SQL to extract, transform, and aggregate data from multiple file formats, uncovering insights into customer usage patterns; responsible for estimating cluster size and for monitoring and troubleshooting Spark Databricks clusters, and applied the Spark DataFrame API for data manipulation within Spark sessions
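For illustration, a minimal PySpark sketch of this kind of multi-format extraction and aggregation (the production jobs described above were written in Scala); the paths, schema, and column names below are hypothetical placeholders:

```python
# Hypothetical sketch: read two source formats, align them, and aggregate
# usage per customer. Paths and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage-patterns").getOrCreate()

# Read two hypothetical source formats and align them on a common schema
json_events = spark.read.json("/data/raw/events/*.json")
csv_events = spark.read.option("header", "true").csv("/data/raw/events/*.csv")
events = json_events.select("customer_id", "event_type", "event_ts") \
    .unionByName(csv_events.select("customer_id", "event_type", "event_ts"))

# Aggregate usage per customer and event type
usage = (events
         .groupBy("customer_id", "event_type")
         .agg(F.count("*").alias("event_count")))

usage.write.mode("overwrite").parquet("/data/curated/usage_patterns")
```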
Worked on Spark architecture and performance tuning across Spark Core, Spark SQL, DataFrames, and Spark Streaming, covering driver and worker nodes, stages, executors and tasks, deployment modes, the execution hierarchy, fault tolerance, and collection
Created Azure Blob and Data Lake storage and loaded data into Azure Synapse Analytics (SQL DW)
Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data between sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse, including write-back scenarios
Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the SQL activity, and created UNIX shell scripts for database connectivity and parallel query execution
Collected and aggregated large amounts of weblog data from different sources such as web servers, mobile and network devices using Apache Flume and stored data into HDFS for analysis
Implemented Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases (Oracle) for product-level forecast
Extracted data from Teradata into HDFS using Sqoop
Controlled and granted database access and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory
Worked with the Kafka REST API to collect and load data onto the Hadoop file system and used Sqoop to load data from relational databases; extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the results in Parquet format on HDFS
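A hedged sketch of this Kafka-to-Parquet flow using PySpark Structured Streaming (the original pipeline used the RDD/DStream API; the broker address, topic, schema, and paths below are placeholders):

```python
# Illustrative sketch; requires the spark-sql-kafka package on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType, TimestampType

schema = (StructType()
          .add("device_id", StringType())
          .add("event", StringType())
          .add("ts", TimestampType()))

spark = SparkSession.builder.appName("kafka-to-parquet").getOrCreate()

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
       .option("subscribe", "realtime-feed")                 # placeholder topic
       .load())

# Kafka delivers the value as bytes; parse the JSON payload into columns
parsed = (raw.select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
             .select("r.*"))

query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/realtime/parquet")
         .option("checkpointLocation", "hdfs:///checkpoints/realtime")
         .start())
```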
Analyzed existing systems and proposed process and system improvements through modern scheduling tools like Airflow, migrating legacy systems into an enterprise data lake built on Azure cloud
Instantiated, created, and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications
Worked with automation tools such as Git, Terraform, and Ansible
Created a data pipeline package to move data from Blob Storage to a MySQL database and executed MySQL stored procedures, triggered by events, to load data into tables
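A rough sketch of that Blob-to-MySQL load, assuming the azure-storage-blob and mysql-connector-python packages; the container, file, table, and stored-procedure names are placeholders:

```python
import csv
import io
import mysql.connector
from azure.storage.blob import BlobServiceClient

# Download a hypothetical CSV extract from Blob Storage
service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(container="staging", blob="daily_extract.csv")
rows = list(csv.reader(io.StringIO(blob.download_blob().readall().decode("utf-8"))))

# Load into a staging table and invoke a (hypothetical) stored procedure
conn = mysql.connector.connect(host="mysql-host", user="etl",
                               password="***", database="analytics")
cur = conn.cursor()
cur.executemany("INSERT INTO staging_daily (col_a, col_b, col_c) VALUES (%s, %s, %s)",
                rows[1:])          # skip the header row
cur.callproc("load_daily_tables")  # hypothetical procedure that loads final tables
conn.commit()
cur.close()
conn.close()
```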
Environment: Databricks, Azure Synapse, Cosmos DB, ADF, SSRS, Power BI, Azure Data Lake, ARM, Azure HDInsight, Blob storage, Apache Spark, Azure ADF V2, ADLS, Spark SQL, Python/Scala, Ansible scripts, Azure SQL DW (Synapse), Azure SQL DB
Prepared documentation and analytic reports effectively and efficiently delivering summarized results, analysis, and conclusions to stakeholders.
Sr Data Engineer
Lessen
Scottsdale, AZ
01.2021 - 08.2021
Designed and set up an enterprise data lake to support various use cases including analytics, processing, storage, and reporting of voluminous, rapidly changing data
Responsible for maintaining quality reference data at the source by performing operations such as cleansing and transformation and by ensuring integrity in a relational environment, working closely with stakeholders and the solution architect
Constructed AWS data pipelines in which AWS API Gateway receives requests and invokes an AWS Lambda function that retrieves data from Snowflake and converts the response into JSON, backed by Snowflake, DynamoDB, AWS Lambda, and AWS S3
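A hypothetical Lambda handler illustrating this API Gateway → Lambda → Snowflake pattern; the environment variables, warehouse, database, and query are placeholders, and the snowflake-connector-python package is assumed to be bundled with the deployment:

```python
import json
import os
import snowflake.connector

def lambda_handler(event, context):
    # Connection parameters are placeholders supplied via environment variables
    conn = snowflake.connector.connect(
        account=os.environ["SF_ACCOUNT"],
        user=os.environ["SF_USER"],
        password=os.environ["SF_PASSWORD"],
        warehouse="ANALYTICS_WH",
        database="SALES_DB",
    )
    try:
        cur = conn.cursor()
        cur.execute("SELECT order_id, amount FROM orders LIMIT 100")  # placeholder query
        cols = [c[0] for c in cur.description]
        rows = [dict(zip(cols, r)) for r in cur.fetchall()]
    finally:
        conn.close()
    # API Gateway proxies this JSON response back to the caller
    return {"statusCode": 200, "body": json.dumps(rows, default=str)}
```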
Developed and implemented data-acquisition jobs in Scala using Sqoop, Hive, and Pig, optimizing MapReduce jobs to use HDFS efficiently through various compression mechanisms and orchestrating them with Oozie workflows
Performed end-to-end Architecture & implementation assessment of various AWS services like Amazon EMR, Redshift
Designed and developed Spark workflows in Scala that pulled data from AWS S3 buckets and Snowflake and applied transformations to the data
Migrated existing on-premises application to AWS
Used AWS services such as EC2 and S3 for small-dataset processing and storage, and maintained Hadoop clusters on AWS EMR
Analyzed large and critical datasets using Cloudera, HDFS, MapReduce, Hive, Hive UDF, Pig, Sqoop, and Spark
Used Git version control to manage source code and integrated Git with Jenkins to support build automation and integrated with Jira to monitor the commits
Wrote Terraform scripts to automate AWS services including ELB, CloudFront distributions, RDS, EC2, database security groups, Route 53, VPCs, subnets, security groups, and S3 buckets, and converted existing AWS infrastructure to AWS Lambda deployed via Terraform and AWS CloudFormation
Worked on Snowflake schemas and data warehousing and processed batch and streaming data-load pipelines using Snowpipe and Matillion from the Confidential data lake on an AWS S3 bucket
Responsible for the design, development, and administration of complex T-SQL queries (DDL/DML), stored procedures, views, and functions for transactional and analytical data structures
Developed Hive queries for analysts by loading and transforming large sets of structured and semi-structured data using Hive
Designed data models to be used in data-intensive AWS Lambda applications which are aimed to do complex analysis creating analytical reports for end-to-end traceability, lineage, and definition of key business elements from Aurora
Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generated data visualizations using Tableau
Collaborated with data engineers and the operations team to implement the ETL process and Snowflake models; wrote and optimized SQL queries to extract data to fit analytical requirements
Implemented AWS EC2, Key Pairs, Security Groups, Auto Scaling, ELB, SQS, and SNS using AWS API and exposed as Restful Web services
Involved in converting MapReduce programs into Spark transformations using Spark RDDs on Scala
Interfaced with business customers, gathered requirements, and created datasets to be used by business users for visualization
Developed Kibana dashboards based on Logstash data and integrated different source and target systems into Elasticsearch for near real-time log analysis, monitoring end-to-end transactions
Implemented AWS Step Functions to automate and orchestrate Amazon SageMaker tasks such as publishing data to S3, training the ML model, and deploying it for prediction
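As a small illustration, a boto3 call that kicks off a Step Functions execution of the kind of SageMaker training-and-deployment workflow described above; the state-machine ARN and input payload are hypothetical:

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Start an execution of a pre-registered state machine (placeholder ARN) that
# wraps the data-publishing, training, and deployment steps.
response = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:train-and-deploy",
    input=json.dumps({"training_data": "s3://my-bucket/train/",
                      "model_name": "churn-model"}),
)
print(response["executionArn"])
```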
Involved in various phases of Software Development Lifecycle (SDLC) of application, like gathering requirements, design, development, deployment, and analysis of the application
Worked on creating MapReduce programs to parse data for claim report generation and running Jars in Hadoop
Coordinated with the Java team in creating MapReduce programs
Defined, designed, and developed Java applications on Hadoop MapReduce, leveraging frameworks such as Cascading and Hive
Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from SQL into HDFS using Sqoop
Developed analytical components using Scala, Spark, Apache Mesos, and Spark Streaming; installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing
Worked on Big Data integration and analytics based on Hadoop, SOLR, Spark, Kafka, Storm, and webMethods
Worked with CI/CD tools such as Jenkins and Docker on the DevOps team, setting up the end-to-end application process with continuous deployment for lower environments and continuous delivery, gated by approvals, for higher environments
Integrated Hadoop with Oracle to load and then cleanse raw unstructured data in the Hadoop ecosystem to make it suitable for processing in Oracle using stored procedures and functions
Developed workflow using Oozie for running MapReduce jobs and Hive Queries
Responsible for loading the data from BDW Oracle database, Teradata into HDFS using Sqoop
Implemented AJAX, JSON, and JavaScript to create interactive web screens
Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as MongoDB
Involved in loading and transforming large sets of Structured, Semi-Structured, and Unstructured data and analyzed them by running Hive queries
Processed image data through Hadoop distributed system by using Map and Reduce then stored into HDFS
Created Session Beans and controller Servlets for handling HTTP requests from Talend
Performed data visualization and designed dashboards with Tableau, generating complex reports including charts, summaries, and graphs to interpret the findings for the team and stakeholders
Developed end to end ETL batch and streaming data integration into Hadoop (MapR), transforming data
Developed custom UDFs in Pig Latin using Python scripts to extract the data from sensor devices' output files to load into HDFS
Developed Python code to gather data from HBase (Cornerstone) and designed a solution to implement it using PySpark
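An illustrative sketch of that HBase-to-PySpark handoff, assuming the happybase client; the host, table, column family, and column names stand in for the Cornerstone source:

```python
import happybase
from pyspark.sql import SparkSession

# Scan a slice of a hypothetical HBase table and keep a few columns of interest
connection = happybase.Connection("hbase-host")       # placeholder host
table = connection.table("cornerstone_events")        # placeholder table
records = [
    (key.decode(), data[b"cf:event_type"].decode(), data[b"cf:amount"].decode())
    for key, data in table.scan(limit=10000)
]
connection.close()

# Hand the rows to PySpark for further processing
spark = SparkSession.builder.appName("hbase-to-spark").getOrCreate()
df = spark.createDataFrame(records, ["row_key", "event_type", "amount"])
df.groupBy("event_type").count().show()
```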
Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services
Successfully designed and developed a Java multi-threading-based collector, parser, and distributor process for a requirement to collect, parse, and distribute data arriving at thousands of messages per second
Used Pig as ETL tool to do Transformations with joins and pre-aggregations before storing data onto HDFS and assisted Manager by providing automation strategies, Selenium/Cucumber Automation and JIRA reports
Worked with the Java Message Service (JMS) API to develop a message-oriented middleware (MOM) layer for handling various asynchronous requests
Performed Data Engineering including Glue Sync of Semantic Layers, Data Cleansing, Data joins and calculations based on the User Stories defined
Implemented Google BigQuery as a data layer between Google Analytics and Power BI; large volumes of web-behavior data tracked in Google Analytics needed to be pulled into the BI system for better reporting, because the native Power BI connector for GA returned sampled data that did not give accurate results
Consulted with application development business analysts to translate business requirements into data design requirements used to drive innovative data designs that meet business objectives
Involved in information-gathering meetings and JAD sessions to gather business requirements, deliver business requirements document and preliminary logical data model
Exported data into Snowflake by creating staging tables to load data from different file types in Amazon S3
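A minimal sketch of that S3-to-Snowflake staging load using snowflake-connector-python; the external stage (@s3_landing), table, and columns are hypothetical:

```python
import snowflake.connector

conn = snowflake.connector.connect(account="<account>", user="<user>", password="***",
                                   warehouse="LOAD_WH", database="EDW", schema="STAGING")
cur = conn.cursor()

# Hypothetical staging table for files landed in S3
cur.execute("CREATE TABLE IF NOT EXISTS stg_orders "
            "(order_id STRING, amount NUMBER, order_ts TIMESTAMP)")

# @s3_landing is assumed to be an external stage pointing at the S3 bucket
cur.execute("""
    COPY INTO stg_orders
    FROM @s3_landing/orders/
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
    ON_ERROR = 'CONTINUE'
""")
conn.close()
```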
Compared data at the leaf level across various databases whenever data transformation or data loading took place, and analyzed and investigated data quality after these types of loads
As part of the data migration, wrote many SQL scripts to detect data mismatches and worked on loading history data from Teradata SQL to Snowflake
Developed SQL scripts to Upload, Retrieve, Manipulate, and handle sensitive data in Teradata, SQL Server Management Studio and Snowflake Databases for the Project
Used Git, GitHub, and Amazon EC2 with deployment via Heroku; used the extracted data for analysis and carried out various mathematical operations with the Python libraries NumPy and SciPy
Incorporated predictive modelling (rule engine) to evaluate the Customer/Seller health score using python scripts, performed computations, and integrated with the Tableau viz
Analyzed marketing campaigns from various perspectives including CTR, conversion rates, seasonal/geographical trends, search queries, landing pages, conversion funnel, quality score, competitors, and distribution channels to achieve maximum ROI for clients
Cleansed, mapped, and transformed data; created the job stream and added or deleted components in the job stream on the data manager based on requirements
Developed Teradata SQL scripts using RANK functions to improve the query performance while pulling the data from large tables
Used Normalization methods up to 3NF and De-normalization techniques for effective performance in OLTP and OLAP systems
Generated DDL scripts using Forward Engineering technique to create objects and deploy them into the database
Used Star Schema methodologies in building and designing the logical data model into Dimensional Models extensively
Designed and deployed reports with Drill Down, Drill Through and Drop-down menu option and Parameterized and Linked reports using Tableau
Worked with data compliance teams, Data governance team to maintain data models, Metadata, Data Dictionaries; define source fields and its definitions
Conducted Statistical Analysis to validate data and interpretations using Python and R, as well as presented Research findings, status reports and assisted with collecting user feedback to improve the processes and tools
Applied concepts of probability, distributions, and statistical inference on the given dataset to unearth interesting findings through the use of comparisons, t-tests, F-tests, R-squared, p-values, etc
Reported and created dashboards for Global Services & Technical Services using SSRS, Oracle BI, and Excel
Deployed Excel VLOOKUP, PivotTable, and Access Query functionalities to research data issues
Environment: Informatica PowerCenter v8.6.1, PowerExchange, IBM Rational Data Architect, MS SQL Server, Teradata, PL/SQL, IBM Control Center, TOAD, Microsoft Project Plan, Repository Manager, Workflow Manager, ERwin 3.0, Oracle 10g/9i, UNIX, and shell scripting
Python Developer
Asian Technology Solutions
Hyderabad, India
05.2015 - 04.2016
Worked on a Windows Server VM to run reporting scripts each day and make the reports available via HTTP using an IIS virtual directory
Created on-demand daily and weekly reports by invoking scripts with custom arguments and parameters to provide actionable data to the test owners
Managed datasets using pandas DataFrames and MySQL database queries from Python, using the pyodbc connector package to retrieve information
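A small sketch of this pandas-plus-pyodbc reporting pattern; the ODBC driver name, database, table, and columns are placeholders:

```python
import pandas as pd
import pyodbc

# Placeholder DSN-less connection string for a MySQL ODBC driver
conn = pyodbc.connect(
    "DRIVER={MySQL ODBC 8.0 Unicode Driver};SERVER=report-db;"
    "DATABASE=qa_results;UID=report;PWD=***"
)

# Pull the latest test-run results into a DataFrame and summarize per owner
df = pd.read_sql("SELECT owner, status, duration_sec FROM test_runs "
                 "WHERE run_date = CURDATE()", conn)
summary = df.groupby(["owner", "status"]).agg(runs=("status", "count"),
                                              avg_duration=("duration_sec", "mean"))
summary.to_csv("daily_report.csv")
conn.close()
```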
Developed a portal to manage entities in a content management system using Flask and added several options to the application to choose the algorithm for data and address generation
Used AJAX and jQuery for transmitting JSON data objects between the frontend and controllers
Developed the required XML schema documents and implemented the framework for parsing XML documents; built application and database servers using AWS EC2, created AMIs, and used RDS for Oracle DB
Developed web applications in Django's Model-View-Controller (MVC) architecture, using custom tags to simplify template code
Designed UI screens using templates, AJAX, HTML, and JSON
Used JavaScript for client-side validation
Involved in the analysis, design, development, and testing phases of the software development life cycle (SDLC)
Scheduled tasks on Windows Task Scheduler to run the Python scripts that generate reports at frequent intervals and send email alerts
Actively participated in Agile development process, including daily stand-ups, sprint planning, and retrospectives
Built a RESTful API using Flask (Python) on top of MySQL/MariaDB (SQL) for integrating with third party services
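A hedged sketch of a small Flask endpoint backed by MySQL/MariaDB in the spirit of that API; the table, columns, and credentials are hypothetical:

```python
from flask import Flask, jsonify
import mysql.connector

app = Flask(__name__)

def get_conn():
    # Placeholder connection details
    return mysql.connector.connect(host="db-host", user="api",
                                   password="***", database="appdb")

@app.route("/api/customers/<int:customer_id>", methods=["GET"])
def get_customer(customer_id):
    conn = get_conn()
    cur = conn.cursor(dictionary=True)
    cur.execute("SELECT id, name, email FROM customers WHERE id = %s", (customer_id,))
    row = cur.fetchone()
    conn.close()
    if row is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(row)

if __name__ == "__main__":
    app.run(port=5000)
```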
Collaborated with DevOps team to deploy new version of app in production environment (~1M monthly users)
Implemented unit tests using pytest and mock objects for improved code quality and maintainability.
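A brief pytest/mock sketch of that style of unit test; the module and functions under test (report_service.fetch_rows, report_service.build_report) are hypothetical:

```python
from unittest import mock
import report_service  # hypothetical module under test

def test_build_report_queries_database_once():
    fake_rows = [{"owner": "qa", "status": "pass"}]
    # Patch the database call so the test runs without a real connection
    with mock.patch("report_service.fetch_rows", return_value=fake_rows) as fetch:
        report = report_service.build_report(date="2016-01-15")
    fetch.assert_called_once_with(date="2016-01-15")
    assert report["total"] == 1  # assumes build_report summarizes the fetched rows
```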
RAD, JAD, UML, System Development Life Cycle (SDLC), Jira, Confluence, Agile, Waterfall Model
Accomplishments
Experience in Big Data analytics and data manipulation using Hadoop ecosystem tools: MapReduce, YARN/MRv2, Pig, Hive, HDFS, HBase, Spark, Kafka, Flume, Sqoop, Oozie, Avro, AWS, Spark integration with Cassandra, and Zookeeper.
Experience in installation, configuration, support, and management of Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse, as well as controlling and granting database access and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
Experience in implementing Azure data solutions: provisioning storage accounts, Azure Data Factory, SQL Server, SQL databases, SQL Data Warehouse, Azure Databricks, and Azure Cosmos DB.
Good understanding of Spark Architecture with Databricks, Structured Streaming. Setting Up AWS and Microsoft Azure with Databricks, Databricks Workspace for Business Analytics, Managing Clusters in Databricks, Managing the Machine Learning Lifecycle.
Experience in data extraction (extracts, schemas, corrupt-record handling, and parallelized code), transformations and loads (user-defined functions, join optimizations), and production (optimizing and automating Extract, Transform, and Load).
Have good experience in designing cloud-based solutions in Azure by creating Azure SQL database, setting up Elastic pool jobs and design tabular models in Azure analysis services.
Experience with Snowflake cloud data warehouse and AWS S3 bucket for integrating data from multiple source system which include loading nested JSON formatted data into snowflake table.
Experience in using Snowflake Clone, Time Travel and building snow pipe.
Worked with Matillion, which leverages Snowflake's separate compute and storage resources for rapid transformation and gets the most from Snowflake-specific features such as Alter Warehouse and flattening Variant, Object, and Array types.
Experience working with the Rivery ELT platform, which performs data integration, data orchestration, data cleansing, and other vital data functions.
Extensive experience in IT data analytics projects; hands-on experience migrating on-premises ETL workloads to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Composer.
Experienced with Dimensional modelling, Data migration, Data cleansing, Data profiling, and ETL Processes features for data warehouses.
Expertise in building CI/CD on AWS using AWS CodeCommit, CodeBuild, CodeDeploy, and CodePipeline, and experience using AWS CloudFormation, API Gateway, and AWS Lambda to automate and secure infrastructure on AWS.
Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, Hive, Pig, Sqoop, Job Tracker, Task Tracker, Name Node, and Data Node.
Expert in designing Parallel jobs using various stages like Join, Merge, Lookup, remove duplicates, Filter, Dataset, Lookup file set, Complex flat file, Modify, Aggregator, XML.
Good knowledge in Database Creation and maintenance of physical data models with Oracle, Teradata, Netezza, DB2, MongoDB, HBase, and SQL Server databases.
Have experience in installing, configuring, and administrating Hadoop clusters for major Hadoop distributions like CDH4, and CDH5.
Involved with the Design and Development of ETL process related to benefits and offers data into the data warehouse from different sources.
Possess strong Documentation skills and knowledge sharing among Team, conducted data modeling sessions for different user groups, facilitated common data models between different applications, participated in requirement sessions to identify logical entities.
Extensive experience in relational Data modeling, Dimensional data modeling, logical/Physical Design, ER Diagrams, and OLTP and OLAP System Study and Analysis.
Proficient with Spark Core, Spark SQL, Spark MLlib, Spark GraphX, and Spark Streaming for processing and transforming complex data using in-memory computing capabilities written in Scala. Worked with Spark to improve the efficiency of existing algorithms using Spark Context, Spark SQL, Spark MLlib, Data Frame, Pair RDD, and Spark YARN.
Extensive knowledge and experience in producing tables, reports, graphs, and listings using various procedures and handling large databases to perform complex data manipulations.
Excellent knowledge in preparing required project documentation and tracking and reporting regularly on the status of projects to all project stakeholders.
Experience in UNIX shell scripting for processing large volumes of data from varied sources and loading them into databases like Teradata.
Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka, and Flume.
Proficient in Data Modeling Techniques using Star Schema, Snowflake Schema, Fact and Dimension tables, RDBMS, Physical and Logical data modeling for Data Warehouse and Data Mart.
Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables and OLAP reporting.
Highly skilled in using visualization tools such as Tableau, ggplot2, Dash, Power BI, and Flask for creating dashboards.
Built a SQL reference mapper using regular expressions that successfully mapped over one hundred thousand SQL references inside SQL object source code, SSRS reports, and DT packages.
Good experience in developing web applications and implementing Model View Control (MVC) architecture using server-side frameworks like Django, Flask, and Pyramid.
Experience in application development using Java, RDBMS, TALEND and Linux shell scripting and DB2.
Experienced in Software Development Lifecycle (SDLC) using SCRUM, Agile methodologies.
Certification
Microsoft Certified: Azure Data Engineer Associate
Certification number: I194-4422