
· 7+ years of experience as a Machine Learning Engineer, specializing in developing and implementing advanced algorithms and models.
· Proficient in Python and R programming languages, with expertise in machine learning frameworks like TensorFlow and PyTorch.
· Skilled in designing and deploying scalable machine learning solutions across diverse industries, including finance, healthcare, and e-commerce.
· Strong background in data analysis, pattern recognition, and deep learning techniques.
· Demonstrated ability to preprocess and transform raw data, perform feature engineering, and select relevant features for improved model performance.
· Experienced in evaluating and validating models using various metrics, cross-validation, and hyperparameter tuning techniques.
· Knowledgeable in big data technologies such as Apache Spark for processing large-scale datasets and building scalable machine learning pipelines.
· Familiar with cloud platforms like GCP and Azure for deploying machine learning models in production environments.
· Proven track record of collaborating effectively with cross-functional teams and stakeholders to deliver successful projects.
· Excellent problem-solving skills and a passion for staying updated on the latest advancements in the machine learning field.
· Expertise in building and deploying end-to-end data pipelines on Google Cloud Platform (GCP) and Azure.
· Proficient in utilizing GCP services such as BigQuery, Dataflow, and Cloud Storage for data ingestion, transformation, and loading. Experienced in working with structured and semi-structured datasets, employing Hive queries for analysis.
· Strong background in metadata management using Google Cloud Data Catalog, including custom Python program development and adherence to CI/CD guidelines.
· Experienced in constructing end-to-end batch data pipelines using Spark, Scala, and Hadoop clusters on GCP, and in creating real-time ingestion pipelines using Flume, Kafka, and Spark Streaming.
· Skilled in utilizing PySpark's Spark SQL API for data import, extraction, and SQL querying. Proficient in protecting sensitive data using hashing algorithms and developing efficient ETL processes using Apache Spark and Python.
· Demonstrated proficiency in Sqoop for incremental and batch data ingestion from various databases. Skilled in data lake processing, using DistCp for data loading and Scala, Spark, Spark SQL, Hive, and Impala for data processing and ML algorithm integration.
· Strong expertise in developing BigQuery authorized views for data exposure and security. Skilled in using Cloud Shell for various tasks and service deployment on GCP.
· Proficient with Azure components such as HDInsight, Databricks, Data Lake, and Blob Storage; competent in real-time data processing with Azure Synapse Analytics and in deploying business solutions using Azure Analysis Services.
· Expertise in building end-to-end data pipelines for data ingestion, transformation, and loading utilizing Azure Data Factory, Azure Databricks, and Azure Storage.
· Administered Azure Data Lake Storage and Databricks components, and delivered structured data to Azure Blob Storage via Synapse Pipelines.
· Expertise in using Apache Spark pools to clean, convert, and analyze streaming data, as well as integrate it with structured data from operational databases or data warehouses.
· Contributed to the creation of Power BI reports and data visualizations to provide insights to stakeholders.
· Skilled in conducting exploratory data analysis (EDA) to uncover patterns, trends, and relationships.
· Proficient in applying statistical techniques for insights generation, correlation identification, and hypothesis testing to support business decision-making.
· Experienced in developing predictive models using machine learning algorithms, optimizing model performance through feature engineering and selection.
· Worked closely with cross-functional teams and presented analysis findings using data visualization tools.
· Experienced with ETL development using Informatica PowerCenter and IBM DataStage for designing and creating end-to-end ETL solutions.
· Skilled in data extraction, transformation, and loading, collaborating with business stakeholders to gather requirements and translate them into technical specifications.
· Proficient in developing ETL mappings, processes, and sessions to ensure data integrity and performance.
· Capable of writing complex SQL queries, stored procedures, and scripts for data extraction, transformation, and loading.
· Skilled in data cleansing, validation, and error handling techniques to maintain data quality.
· Strong analytical skills for data profiling, analysis, and documentation of ETL processes, mappings, and transformations.
· Key Technologies:
· Programming Languages: Python, R
· Machine Learning Frameworks: TensorFlow, PyTorch, Keras, scikit-learn, XGBoost
· Deep Learning: Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs)
· Natural Language Processing (NLP): Text classification, Named Entity Recognition (NER), Sentiment Analysis
· Natural Language Understanding (NLU): Intent recognition, Language modeling
· Data Preprocessing and Feature Engineering: Data cleaning, feature selection, dimensionality reduction
· Model Evaluation and Validation: Cross-validation, hyperparameter optimization, evaluation metrics (accuracy, precision, recall, F1 score)
· Big Data Technologies: Apache Spark
· Software Development: Git, Agile methodologies
· Cloud Platforms: GCP, Azure