ABHIJITH

Summary

· 7+ years of experience as a Machine Learning Engineer, specializing in developing and implementing advanced algorithms and models.

· Proficient in Python and R programming languages, with expertise in machine learning frameworks like TensorFlow and PyTorch.

· Skilled in designing and deploying scalable machine learning solutions across diverse industries, including finance, healthcare, and e-commerce.

· Strong background in data analysis, pattern recognition, and deep learning techniques.

· Demonstrated ability to preprocess and transform raw data, perform feature engineering, and select relevant features for improved model performance.

· Experienced in evaluating and validating models using various metrics, cross-validation, and hyperparameter tuning techniques.

· Knowledgeable in big data technologies such as Apache Spark for processing large-scale datasets and building scalable machine learning pipelines.

· Familiarity with cloud platforms like GCP and Azure for deploying machine learning models in production environments.

· Proven track record of collaborating effectively with cross-functional teams and stakeholders to deliver successful projects.

· Excellent problem-solving skills and a passion for staying updated on the latest advancements in the machine learning field.

· Expertise in building and deploying end-to-end data pipelines on Google Cloud Platform (GCP) as well as in Azure.

· Proficient in using GCP services such as BigQuery, Dataflow, and Cloud Storage for data ingestion, transformation, and loading. Experienced in working with structured and semi-structured datasets, using Hive queries for analysis.

· Strong background in metadata management using Google Cloud Data Catalog, including custom Python program development and adherence to CI/CD guidelines.

· Experienced in constructing end-to-end batch-processing data pipelines using Spark, Scala, and Hadoop clusters on GCP. Proficient in creating real-time data ingestion pipelines using Flume, Kafka, and Spark Streaming.

· Skilled in using PySpark's Spark SQL API for data import, extraction, and SQL querying. Proficient in protecting sensitive data by applying hashing algorithms to selected columns, and in developing efficient ETL processes using Apache Spark and Python.

· Demonstrated proficiency in Sqoop for incremental and batch data ingestion from various databases. Skilled in data lake processing, using DistCp for data loading and Scala, Spark, Spark SQL, Hive (including Hive tables), and Impala queries for data processing and ML algorithm integration.

· Strong expertise in developing BigQuery authorized views for data exposure and security. Skilled in using Cloud Shell for various tasks and service deployment on GCP.

· Proficient with Azure components such as HDInsight, Databricks, Data Lake, and Blob Storage. Competent in real-time data processing with Azure Synapse Analytics and in deploying business solutions using Azure Analysis Services.

· Expertise in building end-to-end data pipelines for data ingestion, transformation, and loading using Azure Data Factory, Azure Databricks, and Azure Storage.

· Administered Azure Data Lake Storage, Databricks, and related Data Lake components, and delivered structured data to Azure Blob Storage via Synapse Pipelines.

· Expertise in using Apache Spark pools to clean, convert, and analyze streaming data, as well as integrate it with structured data from operational databases or data warehouses.

· Contributed to the creation of Power BI reports and data visualizations to provide insights to stakeholders.

· Skilled in conducting data analysis and exploratory data analysis (EDA) to uncover patterns, trends, and relationships.

· Proficient in applying statistical techniques for insights generation, correlation identification, and hypothesis testing to support business decision-making.

· Experienced in developing predictive models using machine learning algorithms, optimizing model performance through feature engineering and selection.

· Worked closely with cross-functional teams and presented analysis findings using data visualization tools.

· Experienced with ETL development using Informatica PowerCenter and IBM DataStage for designing and creating end-to-end ETL solutions.

· Skilled in data extraction, transformation, and loading, collaborating with business stakeholders to gather requirements and translate them into technical specifications.

· Proficient in developing ETL mappings, processes, and sessions to ensure data integrity and performance.

· Capable of writing complex SQL queries, stored procedures, and scripts for data extraction, transformation, and loading.

· Skilled in data cleansing, validation, and error handling techniques to maintain data quality.

· Strong analytical skills for data profiling, analysis, and documentation of ETL processes, mappings, and transformations.

· Key Technologies:

· Programming Languages: Python, R

· Machine Learning Frameworks: TensorFlow, PyTorch, Keras, scikit-learn, XGBoost

· Deep Learning: Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs)

· Natural Language Processing (NLP): Text classification, Named Entity Recognition (NER), Sentiment Analysis

· Natural Language Understanding (NLU): Intent recognition, Language modeling

· Data Preprocessing and Feature Engineering: Data cleaning, feature selection, dimensionality reduction

· Model Evaluation and Validation: Cross-validation, hyperparameter optimization, evaluation metrics (accuracy, precision, recall, F1 score)

· Big Data Technologies: Apache Spark

· Software Development: Git, Agile methodologies

· Cloud Platforms: GCP, Azure

Overview

8 years of professional experience

Work History

MACHINE LEARNING ENGINEER

MASTERCARD
03.2021 - Current
  • Designed and implemented data pipelines for batch and real-time data processing, enhancing data efficiency by 30%
  • Developed machine learning models using Python and Scala, contributing to a 15% increase in data-driven insights
  • Developed and deployed complete data pipelines on GCP, using BigQuery, Dataflow, and Cloud Storage for data ingestion, transformation, and loading
  • Designed and wrote well-engineered, easy-to-maintain, reliable, and bug-free code for various company applications in collaboration with other AI engineers, data scientists, programmers, and software personnel
  • Continuously improved existing applications by working with the existing code base and enhancing its different phases
  • Coordinated all technical and developmental issues and risks identified during the process with a team of 10+ AI engineers to formulate and initiate appropriate courses of action, ensuring timely completion of the version upgrade
  • Designed, tested, and deployed new artificial intelligence functionality for major company projects, authoring sophisticated, optimized code to boost the reusability of standard modules
  • Developed image-to-text conversion algorithms to automatically process applications and load the extracted data into a database for further use
  • Developed a chatbot for customer interaction using cutting-edge technologies
  • Applied natural language processing (NLP) text summarization techniques to summarize recurring customer complaints for further analysis by the sales team
  • Spearheaded the development of natural language processing (NLP) models for large-scale datasets, resulting in a 25% improvement in text summarization accuracy
  • Applied machine learning techniques to analyze user-generated content, enhancing the recommendation system and increasing user engagement by 20%
  • Engineered and fine-tuned NLU models for intent recognition, entity extraction, and language understanding, resulting in a 10% improvement in accuracy
  • Worked closely with clients to understand their business goals and tailored NLU models to meet specific industry requirements
  • Developed extensive knowledge of text summarization techniques, including extractive and abstractive summarization methods as well as unsupervised and supervised learning approaches
  • Conducted data profiling and mining, addressing issues related to data completeness and quality, resulting in improved data accuracy
  • Extracted data from multiple databases, including PostgreSQL, Microsoft SQL Server, and Amazon DynamoDB
  • Developed APIs for the ETL pipeline using FastAPI and RESTful services
  • Supported the deployment, monitoring, and maintenance of production use cases, optimizing data efficiency
  • Proficiently used SQL (DB2) for database management and MongoDB for NoSQL data querying
  • Utilized JIRA for project management and communication, ensuring timely and efficient project delivery
  • Exposure to AWS cloud-based systems and API integrations, contributing to the seamless integration of data solutions
  • Knowledgeable in version control systems like Git for managing database code and tracking changes
  • Designed fact and dimension tables (dimensional modeling) and implemented processes to load them
  • Created a Spark job with PySpark to build an end-to-end data pipeline for batch processing
  • Used a Hadoop cluster running on GCP to develop and launch jobs written in Spark and Scala
  • Used PySpark's Spark SQL API to import and extract data as well as run SQL queries
  • Created a PySpark script that protects raw data by applying hashing algorithms to client-specified columns
  • Developed efficient ETL processes using Apache Spark and Python, ensuring high scalability and performance
  • Optimized Hadoop and Spark architecture, improving data processing speed by 20%
  • Handled query APIs using JSON, Protocol Buffers, and XML, facilitating data access and retrieval for model training and evaluation
  • Created data models and performed data integration to support business intelligence and analytics initiatives
  • Optimized data processing and improved query performance by implementing partitioning strategies and indexing techniques
  • Developed BigQuery authorized views for exposing data to other teams or securing it at the row level
  • Used Cloud Shell for a variety of tasks and service deployment.

MACHINE LEARNING ENGINEER

FORD
11.2019 - 03.2021
  • Handled data ingestion, storage, analysis, and visualization using Azure cloud technologies and created data pipelines, drawing on a solid understanding of Azure cloud components (HDInsight, Databricks, Data Lake, and Blob Storage)
  • Experience with real-time data processing with Azure Synapse Analytics and using Azure Analysis Services for governing, deploying, testing, and delivering business solutions
  • Developed models using logistic regression, random forest, XGBoost, and other algorithms on large-scale data and deployed them in a cloud production environment
  • Worked on real-time data processing solutions, reducing data latency by 15%
  • Spearheaded the architecture and implementation of Azure Bot Service solutions for customer feedback, achieving a 15% improvement in user engagement
  • Utilized Azure services such as Azure Functions, Azure App Service, and Azure Cognitive Services to enhance the chatbot's capabilities and responsiveness
  • Conducted in-depth analysis of system processing flows, identifying bottlenecks and implementing performance enhancements
  • Created and documented technical and functional specifications, providing a clear roadmap for development teams
  • Assisted in selecting appropriate AI/ML technologies and tools to ensure optimal system performance
  • Experienced in optimizing personalization algorithms for applications with 2M+ users
  • Predicted sales to within 2% by applying machine learning algorithms
  • Developed and implemented end-to-end data pipelines using Azure Data Factory, Azure Databricks, and Azure Storage for ingesting, transforming, and loading data
  • Developed effective ETL workflows using Apache Spark, Python, and Azure Databricks notebooks, ensuring high performance and scalability
  • Used Python and SQL for data transformation and cleaning tasks to prepare the data for analysis
  • Assisted team members in locating and fixing data quality problems and in streamlining data processing workflows
  • Assisted in the development of Power BI reports and data visualizations to deliver insights to stakeholders
  • Performed data integration and model creation to support business intelligence and analytics projects
  • Implemented indexing and partitioning techniques to enhance query performance and optimize data processing
  • Implemented scheduling and orchestration using Airflow and Oozie, streamlining model training and deployment processes
  • Put data governance procedures in place to guarantee data accuracy and consistency
  • Utilized Azure services like Azure Cosmos DB and Azure SQL Database for effective data storage
  • Developed monitoring and notification alerts to proactively identify data pipeline issues and ensure data availability
  • Actively participated in code reviews and mentored junior team members in best programming practices.

ETL Developer

BANK OF AMERICA
07.2018 - 05.2019
  • Using Informatica PowerCenter, designed and created complete ETL solutions, including procedures for data extraction, transformation, and loading
  • Engaged with business stakeholders to collect requirements, assess the need for data integration, and translate those requirements into technical specifications
  • Developed and deployed Informatica PowerCenter ETL mappings, processes, and sessions to guarantee data integrity and boost performance
  • Created intricate SQL queries, stored procedures, and scripts to extract, modify, and load data into the data warehouse from a variety of sources
  • Implemented data cleansing and validation techniques to maintain data quality and consistency across different data sources
  • Developed and optimized SQL queries to extract and transform data, ensuring efficient data processing and adherence to business rules
  • Collaborated with cross-functional teams to gather requirements, analyze data integration needs, and design appropriate data models and schemas
  • Implemented data validation and error handling mechanisms to identify and resolve data quality issues during the ETL process
  • Optimized ETL processes for performance and scalability by incorporating parallel processing, partitioning strategies, and optimizing resource utilization
  • Conducted data profiling and analysis to identify data patterns, anomalies, and data quality improvement opportunities
  • Documented ETL processes, data mappings, and transformations for easy maintenance and future reference.

Data Analyst

COGNIZANT TECHNOLOGY SOLUTIONS
06.2016 - 07.2018
  • Conducted data analysis and performed exploratory data analysis (EDA) to understand data patterns, trends, and relationships
  • Applied statistical techniques to uncover insights, identify correlations, and perform hypothesis testing to support business decision-making
  • Developed predictive models using machine learning algorithms to solve business problems and make data-driven recommendations
  • Conducted feature engineering and selection to optimize model performance and improve accuracy
  • Collaborated with cross-functional teams to define project objectives, gather requirements, and translate business questions into analytical approaches
  • Utilized data visualization tools such as Matplotlib, Seaborn, and Tableau to present analysis findings and insights to stakeholders
  • Worked closely with data engineers and software developers to ensure data quality, data integrity, and proper integration of ML models into production systems
  • Collected, cleansed, and prepared data for analysis using SQL queries, Excel, and scripting languages
  • Performed data mining and statistical analysis to identify trends, patterns, and anomalies
  • Conducted model evaluation and performance metrics analysis to assess the effectiveness and reliability of ML models
  • Stayed updated with the latest trends and advancements in the field of machine learning and data analysis, exploring new techniques and methodologies.

Education

Master of Science - Computer Engineering

Arizona State University
Tempe, AZ
2021

Skills

  • Languages: SQL, Unix shell, Python, R
  • Python Libraries: OpenCV, PyTorch, TensorFlow, scikit-learn, Keras, PySpark, NLTK, Sumy, Matplotlib, Transformers
  • Machine Learning Frameworks: TensorFlow, PyTorch, Keras, scikit-learn, XGBoost, CatBoost, LangChain, LangServe, LlamaIndex, Azure OpenAI, Databricks, Hugging Face
  • Machine Learning & Deep Learning: Support Vector Machines (SVM), k-Nearest Neighbors (kNN), Linear/Logistic Regression (regularized), Clustering techniques, Ensemble methods (Bagging, Boosting, Stacking), Time Series Forecasting, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTMs), Generative Adversarial Networks (GANs)
  • Databases: SQL Server 2008/2012/2014, MySQL, Oracle 10g/11g/12c/19c
  • Azure Cloud Technologies: Azure Synapse Analytics, Databricks, Data Lake, Blob Storage, Data Factory, Azure Analysis Services, Azure Bot Service, Azure Cognitive Services, Azure Machine Learning, and other cloud tools
  • GCP Technologies: Google Cloud Storage, BigQuery, Dataproc, Dataflow, Pub/Sub, Composer, Cloud Monitoring, and Data Fusion
  • Dashboards: Designing real-time dashboards using Power BI and Tableau
