· Around 5 years of experience in data science, focused on working with data using Python, R, and SQL
· Experienced Data Scientist with expertise in data mining, data cleaning, exploring and visualizing data, building and evaluating statistical models, preparing dashboards, and implementing the most suitable machine learning models to support sound decisions
· Hands-on experience with Python 3.x for data analytics and visualization using core analytical libraries such as NumPy, SciPy, pandas, and scikit-learn
· Well-versed in data wrangling, loading and working with data efficiently in SQL Server, and writing complex queries
· Strong knowledge of statistical methodologies such as hypothesis testing
· Strong ability to apply optimization techniques such as gradient descent and stochastic gradient descent to regression models
· In-depth knowledge of supervised and unsupervised machine learning algorithms, including ensemble methods, clustering algorithms, and classification algorithms, as well as time series models (AR, MA, ARMA, ARIMA)
· Cleaned and manipulated complex datasets to create the data foundation for further analysis and the development of key insights (MS SQL Server, R, Tableau, Excel)
· Applied machine learning algorithms and advanced statistical analyses, such as decision trees, regression models, SVM, and clustering, using the scikit-learn package in Python
· Pre-processed and cleaned data for feature engineering, and applied imputation techniques to missing values in the dataset using Python
· Performed Exploratory Data Analysis (EDA) using plots and graphs built with the matplotlib and seaborn libraries in Python to discover patterns in the data, examined feature correlations using heatmaps, and performed hypothesis testing to check the significance of features
· Developed analytical approaches to answer high-level questions and provided insightful recommendations
· Conducted statistical analyses such as linear regression
· Involved in extracting customer big data from various data sources (Excel, flat files, Oracle, SQL Server, MongoDB, Teradata, and server log data) into Hadoop HDFS
· Used pandas, NumPy, seaborn, SciPy, matplotlib, and scikit-learn in Python to develop various machine learning algorithms
· Assured data quality and data integrity, and optimized data collection procedures on a weekly and monthly basis
· Created data quality scripts using SQL and Hive to validate successful data loads and ensure data quality
· Worked with different data formats such as JSON, XML, CSV, and .dat, and exported the data to data visualization/ETL platforms
· Evaluated model performance using metrics such as R-squared, adjusted R-squared, confusion matrix, ROC curve and AUC, and root mean squared error (RMSE)
· Incorporated, developed, and applied metrics and prototypes that could be used to drive business decisions
· Participated in ongoing research and evaluation of new technologies and analytical solutions to optimize model performance
· Used problem-solving skills to find and correct data problems, and applied statistical methods to adjust and project results when necessary
· Worked with cross-functional teams to understand data requirements and provided detailed analytical reports to support business decisions
Machine Learning: Classification, Regression, Feature Engineering, One-Hot Encoding, Clustering, Regression Analysis, Naive Bayes, Decision Tree, Random Forest, Support Vector Machine, KNN, Ensemble Methods, K-Means Clustering, Time Series Analysis, Confidence Intervals, Principal Component Analysis and Dimensionality Reduction
Microsoft Certified: Azure Data Scientist Associate