Praveen Kumar

Austin

Summary

  • 9 years of experience as a Data Engineer and Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
  • Hands-on experience installing, configuring, and using Apache Hadoop ecosystem components such as Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, ZooKeeper, Sqoop, Kafka, and Storm.
  • Extensive experience developing T-SQL and Oracle PL/SQL scripts, stored procedures, and triggers for business logic implementation.
  • Hands-on working knowledge of Linux, Unix, and Windows operating systems and of AWS, Azure, and Google Cloud Platform, used to create and manage databases in the cloud and analyze data sets for machine learning applications.
  • Experience deploying major software solutions for high-end clients, meeting business requirements such as data processing, ingestion, analytics, and cloud migration from on-premises systems to Azure.
  • Hands-on experience configuring Hadoop clusters in enterprise environments, on VMware, and on Amazon Web Services (AWS) using EC2 instances.
  • Experienced with Splunk and the ELK stack (Elasticsearch, Logstash, and Kibana) for centralized logging; stored logs and metrics in S3 buckets using Lambda functions and used AWS Lambda to run code in AWS without managing servers.
  • Expertise in SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS) tools.
  • Expertise in synthesizing Machine learning, Predictive Analytics and Big data technologies into integrated solutions.
  • Good understanding and knowledge of Microsoft Azure services such as HDInsight clusters, Blob storage, ADLS, and Data Factory.
  • Involved in writing SQL queries and PL/SQL programming; created new packages and procedures, and modified and tuned existing procedures and queries.
  • Hands on experience in ELK (Elasticsearch, Logstash, and Kibana) stack.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and from RDBMS to HDFS.
  • Experience in designing, building, and implementing a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, and MongoDB.
  • Experience with various Teradata utilities such as FastLoad, MultiLoad, BTEQ, and Teradata SQL Assistant.
  • Extensively worked with the Teradata utilities BTEQ, FastExport, and MultiLoad to export and load data to and from different source systems, including flat files.
  • Experience with HDI Spark, HDI Hadoop, HDI Hive Interactive, and HDI Kafka.
  • Designed, built, and managed ELK (Elasticsearch, Logstash, and Kibana) clusters.
  • SQL/NoSQL databases: Oracle, Teradata, MongoDB, DynamoDB.
  • Good at system analysis, ER and dimensional modeling, database design, and implementing RDBMS-specific features.
  • Experience working with NoSQL databases (HBase, Cassandra, and MongoDB), database performance tuning, and data modeling.
  • Experience in designing the Conceptual, Logical and Physical data modeling using Erwin and E/R Studio Data modeling tools.
  • Strong experience in writing SQL, PL/SQL, and Transact-SQL programs for stored procedures, triggers, and functions.
  • Expertise in designing complex mappings and in performance tuning of slowly changing dimension tables and fact tables.
  • Experienced in data scrubbing/cleansing, data quality, data mapping, data profiling, and data validation in ETL.
  • Used various Talend Hadoop components such as Hive, Pig, and Sqoop.
  • Implemented a log producer in Scala that watches application logs, transforms incremental log records, and sends them to a Kafka- and ZooKeeper-based log collection platform.
  • Good understanding of NoSQL databases like Cassandra and HBase.
  • Good experience with continuous Integration of application using Jenkins.
  • Extensive experience in SQL Server 2016/2012/2008 Business Intelligence tools - SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS).
  • Experienced in Agile methodologies, including attending daily scrums, maintaining user stories and burndown charts, backlog grooming, and retrospectives.

Overview

10 years of professional experience

Work History

Sr. Data Engineer

Wayside Publishing Ltd
01.2023 - Current
  • As a Data Engineer, provided technical expertise on Hadoop technologies as they relate to the development of analytics
  • Responsible for building scalable distributed data solutions using Big Data technologies such as Apache Hadoop, MapReduce, shell scripting, and Hive
  • Led end-to-end migration project, successfully transferring terabytes of data from on-premises databases to Amazon Redshift, optimizing data storage and query performance
  • Designed and implemented ETL pipelines using AWS Glue and Apache Spark, extracting data from various sources, transforming and cleaning it, and loading it into Redshift for analysis (see the sketch at the end of this role)
  • Developed optimized schema design for Redshift tables, leveraging columnar storage and compression techniques, resulting in improved query execution times
  • Created dimension and fact tables following Kimball methodology, ensuring accurate representation of business data and facilitating reporting
  • Automated data extraction and transformation processes using AWS Data Pipeline, reducing manual intervention and ensuring consistent and timely data updates
  • Designed and implemented AWS CloudFormation templates to provision and manage infrastructure resources for data processing pipelines, resulting in a more efficient and consistent deployment process
  • Developed CI/CD pipelines using Jenkins and YAML-based Jenkinsfiles to automate the deployment of CloudFormation stacks, ensuring reliable and repeatable infrastructure changes
  • Developed custom Python scripts to orchestrate ETL workflows, enhancing data reliability and scalability
  • Collaborated with cross-functional teams including data scientists and analysts to define data requirements, ensuring the delivered solutions met their analytical needs
  • Worked closely with stakeholders to gather business requirements, translating them into technical solutions that aligned with the overall data strategy
  • Created comprehensive technical documentation for ETL processes, ensuring knowledge transfer and providing a resource for troubleshooting and maintenance
  • Set up monitoring and alerting mechanisms using Amazon CloudWatch and AWS CloudTrail, proactively identifying and resolving issues to minimize downtime and data loss
  • Implemented data security measures such as encryption at rest and in transit for sensitive data, ensuring compliance with industry standards and regulations
  • Stayed updated with the latest AWS services and best practices through online courses, webinars, and conferences, applying new knowledge to improve data engineering processes
  • Conducted a thorough analysis of client requirements and existing case studies to design and implement tailored predictive models using GCP Vertex AI and AI/ML skills
  • Led the development of data pipelines using GCP Data Pipeline and GCP Glue, ensuring seamless ETL processes for efficient data processing and model training
  • Worked in an Agile project environment, participating in daily stand-ups, sprint planning, and sprint retrospective meetings to align with project goals and timelines
  • Independently switched technical skills based on project needs, demonstrating adaptability and versatility in handling diverse tasks
  • Ensured the successful deployment of software packages into a fully automated environment, utilizing expertise in software packaging and deployment
  • Collaborated with the DevSecOps team to implement CI/CD pipelines, Blue-Green deployments, and feature toggles using tools such as Git, Jenkins, and uDeploy
  • Documented and designed proposals for the predictive analytics implementation, providing clear communication to both technical and non-technical stakeholders
  • Environment: Hadoop 3.0, MapReduce, Hive 2.3, Agile, HDInsight, Apache Kafka, Azure, AWS, Oozie 5.1, Pig 0.17, HDFS, Spark 2.4, Python, HBase 1.2, OLAP, OLTP, Scala, SSIS, SSRS
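
A minimal, illustrative PySpark sketch of the Spark-to-Redshift ETL pattern described in this role; the bucket path, JDBC URL, credentials, and table names are hypothetical placeholders, and the production pipelines ran on AWS Glue rather than a standalone SparkSession.

    from pyspark.sql import SparkSession, functions as F

    # Placeholder connection details -- not the actual project configuration.
    SOURCE_PATH = "s3://example-bucket/raw/orders/"
    REDSHIFT_URL = "jdbc:redshift://example-cluster:5439/analytics"

    spark = SparkSession.builder.appName("orders-etl-sketch").getOrCreate()

    # Extract: read raw CSV files landed in S3.
    raw = spark.read.csv(SOURCE_PATH, header=True, inferSchema=True)

    # Transform: basic cleansing plus a load timestamp for auditing.
    clean = (
        raw.dropDuplicates(["order_id"])
           .na.drop(subset=["order_id", "customer_id"])
           .withColumn("order_amount", F.col("order_amount").cast("decimal(18,2)"))
           .withColumn("load_ts", F.current_timestamp())
    )

    # Load: append into a Redshift staging table over JDBC
    # (requires the Redshift JDBC driver on the Spark classpath).
    (clean.write.format("jdbc")
          .option("url", REDSHIFT_URL)
          .option("dbtable", "staging.orders")
          .option("user", "etl_user")
          .option("password", "***")
          .mode("append")
          .save())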

Sr. Data Engineer

CVS
06.2021 - 01.2023
  • As a Data Engineer, provided technical expertise on Hadoop technologies as they relate to the development of analytics
  • Responsible for building scalable distributed data solutions using Big Data technologies such as Apache Hadoop, MapReduce, shell scripting, and Hive
  • Configured Azure SQL Database with Azure Storage Explorer and with SQL Server
  • Involved in developing data ingestion pipelines on Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL
  • Led the development and optimization of data pipelines on Azure Databricks
  • Involved in setting up and configuring Azure Databricks clusters
  • Contributed to creating and maintaining comprehensive documentation for known issues and solutions, enabling support teams to resolve issues more efficiently
  • Set up and managed monitoring and alerting systems to promptly identify and respond to issues, ensuring minimal downtime and improved system reliability
  • Provided prompt and courteous support to end-users, addressing access issues, permissions, and basic inquiries, and effectively guided them through common usage challenges
  • Developed automation scripts to streamline support tasks, including log analysis, cluster scaling, and error detection, reducing manual effort and improving efficiency
  • Designed and maintained data lake architecture on Azure for storage and retrieval of large datasets
  • Implemented data security measures and access controls for sensitive data within Azure Databricks
  • Created automated data ingestion and transformation processes, reducing manual effort
  • Familiarity with DevOps practices and CI/CD pipelines for deploying Databricks solutions
  • Knowledge of best practices in data warehousing, data modeling, and data governance
  • Implemented various Azure platforms such as Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake and Data Factory
  • Developed ADF pipelines to load data from on-premises sources to Azure cloud storage and databases
  • Created data integration and technical solutions for Azure Data Lake Analytics, Azure Data Lake Storage, Azure Data Factory, Azure SQL databases and Azure SQL Data Warehouse for providing analytics
  • Worked on multiple tools like HDI Kafka, HDI Hive, Apache NiFi and Spark to create the flow
  • Worked in Azure environment for development and deployment of Custom Hadoop Applications
  • Used Agile (SCRUM) methodologies for Software Development
  • Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS
  • Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight/Databricks (see the Spark SQL sketch at the end of this role)
  • Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization
  • Worked on machine learning over large-scale data using Spark and MapReduce
  • Primarily responsible for creating new Azure Subscriptions, data factories, Virtual Machines, SQL Azure Instances, SQL Azure DW instances, HDInsight clusters
  • Designed and implemented scalable Cloud Data and Analytical architecture solutions for various public and private cloud platforms using Azure
  • Designed and developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig
  • Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data
  • Developed Spark scripts using Python and Bash shell commands as per requirements
  • Worked with NoSQL databases such as HBase, creating tables to load large sets of semi-structured data coming from source systems
  • Implemented Security in Web Applications using Azure and deployed Web Applications to Azure
  • Responsible for translating business and data requirements into logical data models in support of enterprise data models, ODS, OLAP, OLTP, and operational data structures
  • Created SSIS packages to migrate data from heterogeneous sources such as MS Excel, flat files, and CSV files
  • Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala
  • Applied various machine learning algorithms and statistical modeling like decision tree, logistic regression, Gradient Boosting Machine to build predictive model using scikit-learn package in Python
  • Created Data Pipeline using Processor Groups and multiple processors using Apache Nifi for Flat File, RDBMS as part of a POC using Amazon EC2
  • Worked with Microsoft Azure cloud services, storage accounts, Azure data storage, and Azure Data Factory
  • Worked closely with the SSIS and SSRS developers to explain complex data transformation logic
  • Designed Data Marts by following Star Schema and Snowflake Schema Methodology, using industry leading Data modeling tools
  • Worked on Spark Streaming and Apache Kafka to fetch live stream data
  • Architected Big Data solutions for projects and proposals using Hadoop, Spark, the ELK Stack, Kafka, and TensorFlow
  • Responsible for importing and exporting data from different sources such as MySQL and Teradata databases into HDFS using Sqoop, saving data in Avro, JSON, and ORC file formats
  • Administered and configured the ELK Stack (Elasticsearch, Logstash, Kibana) on AWS and performed log analysis
  • Environment: Hadoop 3.0, MapReduce, Hive 2.3, Agile, HDInsight, Apache Kafka, Azure, Oozie 5.1, Pig 0.17, HDFS, Spark 2.4, Python, HBase 1.2, OLAP, OLTP, Scala, SSIS, SSRS
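
A minimal, illustrative PySpark/Spark SQL sketch of the Databricks-style ingestion and ad-hoc aggregation pattern referenced in this role; the ADLS path, column names, and table names are hypothetical placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("adls-ingest-sketch").getOrCreate()

    # Hypothetical ADLS Gen2 location; on Databricks the storage account must
    # already be configured for access (e.g., via a service principal).
    SOURCE = "abfss://raw@examplestorage.dfs.core.windows.net/claims/2023/"

    claims = spark.read.parquet(SOURCE)
    claims.createOrReplaceTempView("claims_raw")

    # Ad-hoc aggregation with Spark SQL.
    daily = spark.sql("""
        SELECT claim_date,
               plan_code,
               COUNT(*)         AS claim_count,
               SUM(paid_amount) AS total_paid
        FROM claims_raw
        GROUP BY claim_date, plan_code
    """)

    # Persist as a partitioned table for downstream reporting
    # (assumes the 'analytics' database already exists).
    (daily.write.mode("overwrite")
          .partitionBy("claim_date")
          .saveAsTable("analytics.daily_claims_summary"))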

Sr. Data Engineer

Johnson & Johnson
04.2020 - 05.2021
  • Worked as a Data Engineer to review business requirements and compose source-to-target data mapping documents
  • Involved in all phases of SDLC using Agile and participated in daily scrum meetings with cross teams
  • Developed Big Data solutions focused on pattern matching and predictive modeling
  • Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2
  • Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load the data into the HDFS system
  • Responsible for loading, extracting and validation of client data
  • Used Python programs for data manipulation and to automate the generation of reports and dashboards from multiple data sources
  • Deployed and configured Elasticsearch, Logstash, and Kibana (ELK) for log analytics, full-text search, and application monitoring, integrated with AWS Lambda and CloudWatch
  • Worked on NoSQL databases including Cassandra
  • Implemented multi-data center and multi-rack Cassandra cluster
  • Coordinated with Data Architects on provisioning AWS EC2 infrastructure and deploying applications behind Elastic Load Balancing
  • Used PySpark to create, load, and transform data using SparkContext, RDDs, and DataFrames
  • Designed and developed data pipelines and ETL integration patterns using PySpark (Python on Spark) and Databricks
  • Created sheet selector to accommodate multiple chart types (Pie, Bar, Line) in a single dashboard by using parameters
  • Performed Reverse Engineering of the current application using Erwin, and developed Logical and Physical data models for Central Model consolidation
  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules
  • Developed data pipelines to consume data from Enterprise Data Lake (MapR Hadoop distribution - Hive tables/HDFS) for analytics solution
  • Created Hive External tables to stage data and then move the data from Staging to main tables
  • Developed incremental and full-load Python processes to ingest data into Elasticsearch from Hive
  • Created data models for AWS Redshift and Hive from dimensional data models
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations
  • Created Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability
  • Developed REST services with Python Flask to write data into Elasticsearch indexes (see the sketch at the end of this role)
  • Worked in data management performing data analysis, gap analysis, and data mapping
  • Environment: Agile, AWS, Hadoop 3.0, Hive 2.3, HDFS, Python, PySpark, Cassandra 3.11, NoSQL
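
A minimal, illustrative Flask sketch of a REST endpoint that writes documents into an Elasticsearch index, in the spirit of the REST-service bullet above; the host, index name, and payload fields are hypothetical, and the example assumes the 8.x elasticsearch Python client.

    from flask import Flask, request, jsonify
    from elasticsearch import Elasticsearch

    app = Flask(__name__)

    # Hypothetical cluster endpoint; real deployments would load this from config.
    es = Elasticsearch("http://localhost:9200")

    @app.route("/events", methods=["POST"])
    def index_event():
        doc = request.get_json(force=True)
        if not doc:
            return jsonify({"error": "empty payload"}), 400
        # Write the JSON document into the 'events' index.
        result = es.index(index="events", document=doc)
        return jsonify({"id": result["_id"], "result": result["result"]}), 201

    if __name__ == "__main__":
        app.run(port=5000)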

Data Analyst/ Data Modeler

Fission Labs
09.2016 - 12.2019
  • Created Oozie workflow jobs for query scheduling and actions and automated the entire CI/CD using Scripts, Git and Jenkins
  • Wrote Hive queries to transform data for downstream processing
  • Created schemas in Hive with performance optimization using indexing, bucketing and partitioning
  • Implemented the big data pipeline with real-time processing using Python, PySpark, and the Hadoop ecosystem
  • Developed Spark code in Python using Spark SQL and DataFrames (see the sketch at the end of this role)
  • Worked on dimensional modeling and maintained dimensions like Product, Customer, and Region as a part of Snowflake schema and loaded the data in the member fact tables
  • Used Sqoop to insert and retrieve data from various RDBMS like Oracle and SQL Server
  • Managed full SDLC processes involving requirements management, workflow analysis, source data analysis, data mapping, metadata management, data quality, testing strategy and maintenance of the model
  • Used SDLC (System Development Life Cycle) methodologies such as RUP and Waterfall
  • Comprehensive knowledge and experience in Software Development Life Cycle (SDLC) with business process models: Waterfall & Agile (Scrum) methodologies, Scaled Agile Framework (SAFe)
  • Responsibilities included gathering business requirements, developing strategies for data cleansing and data migration, writing functional and technical specifications, creating source-to-target mappings, designing data profiling and data validation jobs in DataStage, and creating ETL jobs in DataStage
  • Performed analysis on enterprise data/report integration & provided functional specification to development team to build Enterprise Reporting Systems
  • Collected business requirements to set rules for proper data transfer from Data Source to Data Target in Data Mapping
  • Created data mapping documents mapping Logical Data Elements to Physical Data Elements and Source Data Elements to Destination Data Elements
  • Responsible for various data mapping activities from source systems to Teradata and assisted in overseeing compliance with enterprise data standards
  • Worked with data investigation, discovery and mapping tools to scan every single data record from many sources
  • Performed data mining on claims data using very complex SQL queries and discovered claims patterns
  • Wrote complex SQL queries to validate the data against different kinds of reports generated by Business Objects XI R2
  • Worked on importing and cleansing high-volume data from various sources such as Teradata, Oracle, flat files, and SQL Server 2005
  • Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and Teradata
  • Performed data management projects and fulfilled ad-hoc requests according to user specifications using data management software and tools such as Perl, Toad, MS Access, Excel, and SQL
  • Delivered a foundation of comprehensive requirements & high-level design to transition a paper-based legacy mortgage foreclosure system into a modernized efficient application as part of project
  • Good understanding of the AS-IS and TO-BE business processes (GAP analysis) and experience in converting these requirements into technical specifications for preparing test plans
  • Extracted data from multiple structured and unstructured data sources, transformed, processed and loaded into relational and non-relational databases
  • Performed data analysis, outlier detection, anomaly detection, data profiling, trend analysis, financial/statistical analysis & reporting for new and legacy data sources
  • Automated reconciliation process between 7 different systems to validate the flow of data between them using Shell scripting, Python and Databases
  • Developed Informatica solutions for complex and large volumes of data using SQL Server, Oracle, and fixed-width and delimited files
  • Developed complex SQL code using partitioning, materialized views, stored procedures, functions, cursors, and arrays
  • Re-platformed data from RDBMS systems to the Hadoop ecosystem using Python and native Hadoop tools such as Sqoop, Hive, MapReduce, and YARN
  • Developed data pipeline using Flume, Sqoop and Spark to ingest customer behavioral data and purchase histories into HDFS for analysis
  • Responsible for importing and exporting data from different sources such as MySQL and Teradata databases into HDFS using Sqoop, saving data in Avro, JSON, and ORC file formats
  • Extensively used MS Access to pull the data from various databases and integrate the data
  • Involved in HDFS maintenance and loading of structured and unstructured data
  • Developed and ran MapReduce jobs on multi-petabyte YARN/Hadoop clusters that process billions of events every day to generate daily and monthly reports per user needs
  • Weekly meetings with technical collaborators and active participation in code review sessions with senior and junior developers
  • Managed and reviewed Hadoop log files and was responsible for managing data coming from different sources
  • Performed data processing like collecting, aggregating, moving data from various sources using Apache Flume and Kafka
  • Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats
  • Delivered tuned, efficient, and error-free code for new Big Data requirements using strong technical knowledge of Hadoop and its ecosystem
  • Environment: Hadoop, MapReduce, SDLC, HDFS, Hive, Pig, Sqoop, Java, Red Hat Linux, AWS, XML, MySQL, Eclipse, Kafka, ETL, Python, PySpark, SQL, Teradata.
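
A minimal, illustrative PySpark sketch of the partitioned and bucketed Hive table pattern mentioned in this role; the HDFS path, column names, and table name are hypothetical placeholders.

    from pyspark.sql import SparkSession, functions as F

    # Hive support is required for bucketed, partitioned managed tables.
    spark = (SparkSession.builder
             .appName("hive-bucketing-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Hypothetical source: customer purchase history already landed in HDFS.
    purchases = spark.read.parquet("hdfs:///data/raw/purchases/")

    enriched = purchases.withColumn("purchase_year", F.year("purchase_ts"))

    # Partition by year for pruning; bucket by customer_id to speed up joins.
    (enriched.write.mode("overwrite")
             .partitionBy("purchase_year")
             .bucketBy(16, "customer_id")
             .sortBy("customer_id")
             .saveAsTable("purchases_bucketed"))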

Data Analyst

Efftronics Systems Pvt. Ltd
06.2014 - 08.2016
  • Worked with Data Analyst for requirements gathering, business analysis and project coordination
  • Responsible for the analysis of business requirements and design implementation of the business solution
  • Translated business concepts into XML vocabularies by designing XML Schemas with UML
  • Gathered business requirements through interviews, surveys with users and Business analysts
  • Involved in designing and developing SQL server objects such as Tables, Views, Indexes (Clustered and Non-Clustered), Stored Procedures and Functions in Transact-SQL
  • Performed data analysis of existing databases to understand the data flow and business rules applied to different databases using SQL
  • Performed data analysis and data profiling using complex SQL on various source systems and answered complex business questions by providing data to business users
  • Used MS Visio and Rational Rose to represent system under development in a graphical form by defining use case diagrams, activity and workflow diagrams
  • Wrote complex SQL, PL/SQL, procedures, functions, and packages to validate data and the testing process
  • Worked in generating and documenting Metadata while designing OLTP and OLAP systems environment
  • Established a business analysis methodology around the RUP (Rational Unified Process)
  • Developed stored procedures in SQL Server to standardize DML transactions such as insert, update and delete from the database
  • Created SSIS package to load data from Flat files, Excel and Access to SQL server using connection manager
  • Developed all the required stored procedures, user defined functions and triggers using T-SQL and SQL
  • Produced various types of reports using SQL Server Reporting Services (SSRS)
  • Environment: XML, T-SQL, SQL, PL/SQL, OLTP, OLAP, SSIS, SSRS

Education

Master of Science - Data Science

Stevens Institute of Technology

Bachelor of Technology - Computer Science

SRM University

Skills

  • Advanced SQL
  • Data Warehousing
  • Hadoop Ecosystem
  • Scala Programming
  • Agile Methodologies
  • Git Version Control
  • NoSQL Databases
  • Machine Learning
  • Spark Development
  • Python Programming
  • ETL Development
  • Performance Tuning
  • Kafka Streaming
  • Continuous Integration
  • Data Visualization
  • Big Data Processing
  • Data Pipeline Design
  • Advanced Analytics
  • Data Science Research Methods
  • AWS (Redshift, SageMaker, S3, EC2 instances)
  • Azure
  • GCP
  • Databricks
