Data Engineer with around 4 years of professional experience in Data Extraction, Data Modelling, Data Mining, and Data Visualization, and 2+ years of research-oriented experience in Data Analysis on real-time state government datasets.
Extensive experience with Informatica (ETL tool) for Data Extraction, Transformation, and Loading. Skilled in importing and exporting data between HDFS and RDBMS using Sqoop, and in adapting the process to client requirements.
Experienced in developing data marts and warehousing with advanced transformation for ETL (Extract, Transform & Load Process) using SQL, PostgreSQL, HiveQL, SAS, Python (Pandas, NumPy) and PySpark.
Extensively used Azure Databricks for data validation and analysis on Cosmos structured streams.
Knowledge of Big Data tools and Hadoop ecosystem components such as MapReduce, HDFS, Hive, Sqoop, Apache Spark, and Kafka.
Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis.
Overview
4 years of professional experience
2 years of research experience
Work History
Data Engineer
APSIS Technologies Pvt, LTD.
Bangalore, India
07.2019 - 06.2021
Produced and maintained Tableau data sources and data extracts, improving data accuracy by 15% and reducing data processing time by 20%
Developed Sqoop scripts to move data between Pig and a MySQL database
Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting
Worked with Sqoop import and export functionality to transfer large data sets between a DB2 database and HDFS
Handled operations and support for the big data analytics platform, Splunk, and Tableau visualization
Built predictive models, including ensemble models, using machine learning algorithms such as Logistic Regression, Random Forests, and KNN to predict customer churn
Developed, deployed, and maintained ETL pipelines in Azure Data Factory, ensuring timely and accurate data availability across the organization
Utilized Azure Databricks to perform data analytics, employing languages such as SQL and Python to derive insights and facilitate data-driven decision-making
Leveraged Informatica Intelligent Cloud Services (IICS) to design and implement scalable and efficient cloud-based integration solutions, enabling seamless data synchronization, transformation, and connectivity between diverse cloud and on-premises systems
Developed and deployed ETL workflows in IICS, optimizing data integration processes and ensuring data accuracy and integrity across multiple platforms
Collaborated with cross-functional teams to gather requirements and provide IICS-based solutions, enhancing overall system efficiency and business productivity
Created and implemented data warehousing solutions using Python and data warehousing technologies, optimizing data storage and retrieval for complex analytical queries
Designed Jenkins jobs for continuous integration and ran CI/CD pipelines using Jenkins
Used Snowflake functions to parse semi-structured data entirely with SQL statements
Performed code releases from one environment to another using release management in Azure DevOps
SE - Intern
Cybermatic Systems Pvt Ltd
India
05.2017 - 04.2019
Performed unit testing and integration testing to ensure quality of the product before releasing it to customers.
Conducted research on the latest trends in software engineering best practices.
Resolved customer issues by establishing workarounds and solutions to debug and create defect fixes.
Produced supporting reports and documentation to help development team members complete project work.
Created technical documentation such as user manuals, flowcharts, and diagrams.
Education
Master of Science - Computer Science
Southern Illinois University, Carbondale, IL
05.2023
3.6 GPA
Research Graduate Assistant:
Developed and implemented Python scripts for data collection, cleaning, and analysis using libraries like pandas and NumPy.
Utilized machine learning techniques to analyze large datasets and build predictive models, achieving an accuracy of 97%.
Assisted with research papers and presented findings at conferences, demonstrating strong communication and presentation skills.
Collaborated effectively with a research team to design and execute complex research projects, demonstrating teamwork and problem-solving ability.
Skills
MS SQL Server
SQL, PostgreSQL, MySQL
MongoDB
Jupyter Notebook, PyCharm
Scala, SSIS
Python & Libraries
Power BI, Tableau
Kafka
PySpark
Informatica Cloud (IICS)
AWS
Azure (Azure DevOps pipelines, Databricks)