ABHIJITH

Summary

· 7+ years of experience as a Machine Learning Engineer, specializing in developing and implementing advanced algorithms and models.

· Proficient in Python and R programming languages, with expertise in machine learning frameworks like TensorFlow and PyTorch.

· Skilled in designing and deploying scalable machine learning solutions across diverse industries, including finance, healthcare, and e-commerce.

· Strong background in data analysis, pattern recognition, and deep learning techniques.

· Demonstrated ability to preprocess and transform raw data, perform feature engineering, and select relevant features for improved model performance.

· Experienced in evaluating and validating models using various metrics, cross-validation, and hyperparameter tuning techniques.

· Knowledgeable in big data technologies such as Apache Spark for processing large-scale datasets and building scalable machine learning pipelines.

· Familiarity with cloud platforms like GCP and Azure for deploying machine learning models in production environments.

· Proven track record of collaborating effectively with cross-functional teams and stakeholders to deliver successful projects.

· Excellent problem-solving skills and a passion for staying updated on the latest advancements in the machine learning field.

· Expertise in building and deploying end-to-end data pipelines on Google Cloud Platform (GCP) and Azure.

· Proficient in utilizing GCP services such as BigQuery, Dataflow, and Cloud Storage for data ingestion, transformation, and loading. Experienced in working with structured and semi-structured datasets, employing Hive queries for analysis.

· Strong background in metadata management using Google Cloud Data Catalog, including custom Python program development and adherence to CI/CD guidelines.

· Experienced in constructing end-to-end data pipelines for batch processing, utilizing Spark, Scala, and Hadoop clusters on GCP. Proficient in creating pipelines using Flume, Kafka, and Spark Streaming for real-time data ingestion.

· Skilled in utilizing PySpark's Spark SQL API for data import, extraction, and SQL querying. Proficient in implementing data encryption using hashing algorithms and developing efficient ETL processes using Apache Spark and Python.

· Demonstrated proficiency in Sqoop for incremental and batch data ingestion from various databases. Skilled in data lake processing, utilizing DistCp for data loading and Scala, Spark, Spark SQL, Hive, and Impala for data processing and ML algorithm integration.

· Strong expertise in developing BigQuery authorized views for data exposure and security. Skilled in using Cloud Shell for various tasks and service deployment on GCP.

· Proficient with Azure components such as HDInsight, Databricks, Data Lake, and Blob storage. Competent in real-time data processing with Azure Synapse Analytics and in business solution deployment using Azure Analysis Services.

· Expertise in building end-to-end data pipelines for data input, transformation, and loading utilizing Azure Data Factory, Azure Databricks, and Azure Storage.

· Administered Azure Data Lake Storage, Databricks, and Data Lake components, and delivered structured data to Azure Blob Storage via Synapse Pipelines.

· Expertise in using Apache Spark pools to clean, convert, and analyze streaming data, as well as integrate it with structured data from operational databases or data warehouses.

· Contributed to the creation of Power BI reports and data visualizations to provide insights to stakeholders.

· Skilled in conducting data analysis and exploratory data analysis (EDA) to uncover patterns, trends, and relationships.

· Proficient in applying statistical techniques for insights generation, correlation identification, and hypothesis testing to support business decision-making.

· Experienced in developing predictive models using machine learning algorithms, optimizing model performance through feature engineering and selection.

· Worked closely with cross-functional teams and presented analysis findings using data visualization tools.

· Experienced with ETL development using Informatica PowerCenter and IBM DataStage for designing and creating end-to-end ETL solutions.

· Skilled in data extraction, transformation, and loading, collaborating with business stakeholders to gather requirements and translate them into technical specifications.

· Proficient in developing ETL mappings, processes, and sessions to ensure data integrity and performance.

· Capable of writing complex SQL queries, stored procedures, and scripts for data extraction, transformation, and loading.

· Skilled in data cleansing, validation, and error handling techniques to maintain data quality.

· Strong analytical skills for data profiling, analysis, and documentation of ETL processes, mappings, and transformations.

· Key Technologies:

· Programming Languages: Python, R

· Machine Learning Frameworks: TensorFlow, PyTorch, Keras, scikit-learn, XGBoost

· Deep Learning: Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs)

· Natural Language Processing (NLP): Text classification, Named Entity Recognition (NER), Sentiment Analysis

· Natural Language Understanding (NLU): Intent recognition, Language modeling

· Data Preprocessing and Feature Engineering: Data cleaning, feature selection, dimensionality reduction

· Model Evaluation and Validation: Cross-validation, hyperparameter optimization, evaluation metrics (accuracy, precision, recall, F1 score)

· Big Data Technologies: Apache Spark

· Software Development: Git, Agile methodologies

· Cloud Platforms: GCP, Azure

Overview

years of professional experience

Work History

MACHINE LEARNING ENGINEER

MASTERCARD

03.2021 - Current

Designed and implemented data pipelines for batch and real-time data processing, enhancing data efficiency by 30%
Developed machine learning models using Python and Scala, contributing to a 15% increase in data-driven insights
Developed and deployed complete data pipelines on GCP, utilizing BigQuery, Dataflow, and Cloud Storage for data intake, transformation, and loading
Designed and wrote well-engineered, easy-to-maintain, reliable, and bug-free code for various company applications in collaboration with other AI engineers, data scientists, programmers, and software personnel
Continuously improved existing applications by working with the existing code base and enhancing its different phases
Communicated all technical and development issues and risks identified during the process to the team of 10+ AI engineers to formulate and initiate appropriate courses of action, ensuring timely completion of the version upgrade
Designed, tested, and deployed new artificial intelligence functionalities for major company projects
Authored sophisticated, optimized code to boost the reusability of standard modules
Developed image-to-text conversion algorithms to automatically process applications and load the results into a database for further use
Developed a chatbot for customer interaction using cutting-edge technologies
Applied natural language processing (NLP) text summarization techniques to summarize recurring customer complaints for further analysis by the sales team
Spearheaded the development of natural language processing (NLP) models for large-scale datasets, resulting in a 25% improvement in text summarization accuracy
Applied machine learning techniques to analyze user-generated content, enhancing the recommendation system and increasing user engagement by 20%
Engineered and fine-tuned NLU models for intent recognition, entity extraction, and language understanding, resulting in 10% improvement in accuracy
Worked closely with clients to understand their business goals and tailored NLU models to meet specific industry requirements
Built extensive knowledge of text summarization techniques, including extractive and abstractive summarization methods and both unsupervised and supervised learning approaches
Conducted data profiling and mining, addressing issues related to data completeness and quality, resulting in improved data accuracy
Extracted data from multiple databases, including PostgreSQL, Microsoft SQL Server, and DynamoDB
Developed APIs for the ETL pipeline using FastAPI and REST API services
Supported the deployment, monitoring, and maintenance of production use cases, optimizing data efficiency
Proficiently used SQL (DB2) for database management and MongoDB for NoSQL data querying
Utilized JIRA for project management and communication, ensuring timely and efficient project delivery
Exposure to AWS cloud-based systems and API integrations, contributing to the seamless integration of data solutions
Knowledgeable in version control systems like Git for managing database code and tracking changes
Designed fact and dimension models and a process for loading them
Created a Spark job with PySpark to construct an end-to-end data pipeline for batch processing
Used a Hadoop cluster running on GCP to develop and launch the pipeline using Spark and Scala code
Used PySpark's Spark SQL API to import and extract data as well as run SQL queries
Created a PySpark script to encrypt raw data by applying hashing algorithms to client-specified columns
Developed efficient ETL processes using Apache Spark and Python, ensuring high scalability and performance
Optimized Hadoop and Spark architecture, improving data processing speed by 20%
Handled query APIs using JSON, Protocol Buffers, and XML, facilitating data access and retrieval for model training and evaluation
Created data models and performed data integration to support business intelligence and analytics initiatives
Optimized data processing and improved query performance by implementing partitioning strategies and indexing techniques
Developed BigQuery authorized views for exposing the data to other teams or securing it at the row level
Used Cloud Shell for a variety of tasks and service deployment.

MACHINE LEARNING ENGINEER

FORD

11.2019 - 03.2021

Experience with data ingestion, storage, analysis, and visualization using Azure cloud technologies, as well as the creation of data pipelines
Solid understanding of Azure cloud components (HDInsight, Databricks, Data Lake, and Blob storage)
Experience with real-time data processing with Azure Synapse Analytics and using Azure Analysis Services for governing, deploying, testing, and delivering business solutions
Developed models using logistic regression, random forest, XGBoost, and similar algorithms on large-scale data and deployed them in a cloud production environment
Worked on real-time data processing solutions, reducing data latency by 15%
Spearheaded the architecture and implementation of Azure Bot Service solutions for customer feedback, achieving a 15% improvement in user engagement
Utilized Azure services such as Azure Functions, Azure App Service, and Azure Cognitive Services to enhance the chatbot's capabilities and responsiveness
Conducted in-depth analysis of system processing flows, identifying bottlenecks and implementing performance enhancements
Created and documented technical and functional specifications, providing a clear roadmap for development teams
Assisted in selecting appropriate AI/ML technologies and tools to ensure optimal system performance
Experienced in optimizing personalization algorithms for applications with 2M+ users
Predicted sales to within 2% by applying machine learning algorithms
Developed and implemented end-to-end data pipelines using the Azure Data Factory, Azure Databricks, and Azure Storage for ingesting, transforming, and loading data
Developed efficient ETL workflows using Apache Spark, Python, and Azure Databricks notebooks, ensuring high performance and scalability
Used Python and SQL for data transformation and cleaning to prepare data for analysis
Assisted team members in locating and fixing data quality problems and in streamlining data processing workflows
Assisted in the development of Power BI reports and data visualizations to deliver insights to stakeholders
Created data models and performed data integration to support business intelligence and analytics projects
Implemented indexing and partitioning techniques to improve query performance and optimize data processing
Scheduled and orchestrated workflows using Airflow and Oozie, streamlining model training and deployment processes
Put data governance procedures in place to guarantee data accuracy and consistency
Utilized Azure services like Azure Cosmos DB and Azure SQL Database for effective data storage
Developed monitoring and notification alerts to proactively identify data pipeline issues and ensure data availability
Actively participated in code reviews and mentored junior team members in best programming practices.

ETL developer

BANK OF AMERICA

07.2018 - 05.2019

Using Informatica PowerCenter, designed and created complete ETL solutions, including procedures for data extraction, transformation, and loading
Engaged with business stakeholders to collect requirements, assess the need for data integration, and translate those requirements into technical specifications
Developed and deployed Informatica PowerCenter ETL mappings, processes, and sessions to guarantee data integrity and boost performance
Created intricate SQL queries, stored procedures, and scripts to extract, modify, and load data into the data warehouse from a variety of sources
Implemented data cleansing and validation techniques to maintain data quality and consistency across different data sources
Developed and optimized SQL queries to extract and transform data, ensuring efficient data processing and adherence to business rules
Collaborated with cross-functional teams to gather requirements, analyze data integration needs, and design appropriate data models and schemas
Implemented data validation and error handling mechanisms to identify and resolve data quality issues during the ETL process
Optimized ETL processes for performance and scalability by incorporating parallel processing, partitioning strategies, and optimizing resource utilization
Conducted data profiling and analysis to identify data patterns, anomalies, and data quality improvement opportunities
Documented ETL processes, data mappings, and transformations for easy maintenance and future reference.

Data Analyst

COGNIZANT TECHNOLOGY SOLUTIONS

06.2016 - 07.2018

Conducted data analysis and performed exploratory data analysis (EDA) to understand data patterns, trends, and relationships
Applied statistical techniques to uncover insights, identify correlations, and perform hypothesis testing to support business decision-making
Developed predictive models using machine learning algorithms to solve business problems and make data-driven recommendations
Conducted feature engineering and selection to optimize model performance and improve accuracy
Collaborated with cross-functional teams to define project objectives, gather requirements, and translate business questions into analytical approaches
Utilized data visualization tools such as Matplotlib, Seaborn, and Tableau to present analysis findings and insights to stakeholders
Worked closely with data engineers and software developers to ensure data quality, data integrity, and proper integration of ML models into production systems
Collected, cleansed, and prepared data for analysis using SQL queries, Excel, and scripting languages
Performed data mining and statistical analysis to identify trends, patterns, and anomalies
Conducted model evaluation and performance metrics analysis to assess the effectiveness and reliability of ML models
Stayed updated with the latest trends and advancements in the field of machine learning and data analysis, exploring new techniques and methodologies.

Education

Master of Science - Computer Engineering

Arizona State University

Tempe, AZ

2021

Skills

Languages: SQL, Unix shell, Python, R
Python Libraries: OpenCV, PyTorch, TensorFlow, scikit-learn, Keras, PySpark, NLTK, Sumy, matplotlib, transformers
Machine Learning Frameworks: TensorFlow, PyTorch, Keras, scikit-learn, XGBoost, CatBoost, LangChain, LangServe, LlamaIndex, Azure OpenAI, Databricks, Hugging Face
Machine Learning & Deep Learning: Support Vector Machines (SVM), k-Nearest Neighbors (kNN), Linear/Logistic Regression (regularized), Clustering techniques, Ensemble methods (Bagging, Boosting, Stacking), Time Series Forecasting, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), Generative Adversarial Networks (GANs)

Databases: SQL Server 2008/2012/2014, MySQL, Oracle 10g/11g/12c/19c
Azure Cloud Technologies: Azure Synapse Analytics, Databricks, Data Lake, Blob Storage, Data Factory, Azure Analysis Services, Azure Bot Service, Azure Cognitive Services, Azure Machine Learning, and other cloud tools
GCP Technologies: Google Cloud Storage, BigQuery, Dataproc, Dataflow, Pub/Sub, Composer, Cloud Monitoring, and Data Fusion
Dashboards: Designing real-time dashboards using Power BI and Tableau

Timeline

MACHINE LEARNING ENGINEER

MASTERCARD

03.2021 - Current

MACHINE LEARNING ENGINEER

FORD

11.2019 - 03.2021

ETL developer

BANK OF AMERICA

07.2018 - 05.2019

Data Analyst

COGNIZANT TECHNOLOGY SOLUTIONS

06.2016 - 07.2018

Master of Science - Computer Engineering

Arizona State University