Seasoned Data Scientist with a proven track record at Infosys Technologies, enhancing data-driven decisions through advanced machine learning models and AI technologies. Expert in Python, R, SQL, statistical analysis, predictive modeling, and data visualization tools, achieving a 25% improvement in ATM cash demand forecasting. Skilled in translating complex datasets into actionable insights that drive decision-making and business strategy improvements, with strong analytical and problem-solving abilities. Previous work has led to significant enhancements in operational efficiency and revenue growth through data-driven strategies.
- Participated in the analysis, design, and implementation of business user requirements.
- Analyzed data patterns to forecast customer behavior based on past transactions, leading to the development of a recommender system that promoted debit and credit card use by offering discounts at stores matched to each customer's historical transaction data.
- Utilized Python and deep learning algorithms to create an ATM cash demand prediction model, achieving 25% greater accuracy than the previous system, which helped reduce transportation, logistics, freezing, and insurance costs.
- Developed a machine learning model for predicting health insurance needs, enabling customers to find better health plans while assisting the organization in minimizing unexpected losses and liabilities.
- Created a model to identify fraudulent credit card transactions, improving detection accuracy by 30% for both fraudulent and legitimate transactions.
- Built predictive models using Python to estimate attendance probabilities for various campaigns and events.
- Conducted exploratory data analysis (EDA) to examine features, eliminating irrelevant columns, constant values, duplicates, and highly correlated features to simplify the model.
- Employed various machine learning algorithms and statistical methods, including decision trees, regression models, neural networks, SVM, and clustering, to assess volume, using the scikit-learn library in Python as well as MATLAB.
- Leveraged Python libraries such as Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, SciPy, and NLTK to develop various machine learning models.
- Conducted data integrity checks, data cleansing, exploratory analysis, and feature engineering using Python libraries like Pandas and Matplotlib.
- Addressed missing values and identified outliers using statistical methods with Pandas and NumPy.
- Worked closely with data engineers and the operations team to establish the ETL process, crafting and refining SQL queries for data extraction to meet analytical needs.
- Conducted data analysis using Hive to pull information from the Hadoop cluster and SQL to access data from Redshift.
- Investigated and assessed customer-specific features through Spark SQL.
- Carried out univariate and multivariate analyses to uncover patterns in the data and relationships between variables.
- Applied data imputation techniques using the Scikit-learn library in Python.
- Engaged in feature engineering, including feature intersection generation, normalization, and label encoding using Scikit-learn preprocessing tools.
- Leveraged Python 3.X (including libraries like NumPy, SciPy, pandas, scikit-learn, and seaborn) and Spark 2.0 (PySpark, MLlib) to create various models and algorithms for analytics.
- Developed and implemented predictive models employing machine learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forests, K-means clustering, KNN, PCA, and regularization techniques.
- Analyzed customer consumption patterns and assessed customer value using RFM (Recency, Frequency, Monetary) analysis, applying customer segmentation through clustering methods like K-Means and Hierarchical Clustering.
- Built regression models, including Lasso, Ridge, SVR, and XGBoost, to forecast Customer Lifetime Value.
- Created classification models such as Logistic Regression, SVM, Decision Trees, and Random Forests to predict Customer Churn Rate.
- Evaluated model performance using metrics like F-Score, AUC/ROC, Confusion Matrix, MAE, and RMSE.
- Designed and implemented recommender systems utilizing collaborative filtering techniques to suggest courses to various customers, deployed on AWS EMR clusters.
- Employed natural language processing (NLP) techniques to enhance customer satisfaction.
- Developed comprehensive data visualizations to present data in an accessible format using Tableau and Matplotlib.
Languages and Frameworks: Python, PySpark, SQL, NoSQL, NumPy, Pandas, Matplotlib, NLTK, and spaCy
Database: MySQL, PostgreSQL, Microsoft SQL Server, Oracle, MongoDB, Cassandra, and Redis
Artificial Intelligence: Machine Learning Algorithms, Random Forest, Linear Regression, SVM, Decision Tree, Text Mining, NLP, ARIMA, TensorFlow, Keras, Scikit-learn, Deep Learning Algorithms, FCN, FRCNN, YOLO, ResNet50, CNN, VGG16, LSTM, GAN, LLMs, BERT, Transformers
Tools & Technology: Power BI, JIRA, AWS (RDS, EC2, S3, Redshift), Databricks, Azure Data Factory, Azure Data Lake, Apache Spark, MS Excel, Apache Kafka