A Temporal and Spatial Study of Crime Committed in Chicago | CS 6010 (Data Science Programming), Fall-2019, BGSU
- Data cleaning, data manipulation, visualization, checking data quality
- Feature selection using Correlation Matrix, Chi-Squared Test, PCA, Extra Tree Classifier, Recursive Feature Elimination, etc.; mapping crime hot spots
- Fitting Neural Network, Random Forest, Logistic Regression, SARIMA, and other models; using oversampling; comparing models by Confusion Matrix, ROC, AUC, RMSE, etc.; predicting the odds of arrest and future arrest counts
- Research paper review, research methodology development, report writing, poster creation, presentation
Computational Issues and Hyperparameter Optimization in LSTM | CS 7200 (Machine Learning), Spring-2020, BGSU
- Data preprocessing; dividing data into training and validation sets to optimize hyperparameters using k-fold cross-validation
- Implementing Single LSTM, Stacked LSTM, Bidirectional LSTM, and CNN-LSTM models separately for hyperparameter search
- Comparing model performance and computation time across optimizers, hidden sizes, learning rates, dropout rates, batch sizes, embedding vector sizes, etc.; paper review, report writing, poster creation, presentation
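The k-fold split behind the hyperparameter search can be sketched as below. This is a minimal illustration, not the project's code: the helper name `k_fold_indices` and the sample and fold counts are assumptions chosen for the example.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) index lists for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so fold sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


# Hypothetical sizes: 10 samples split into 3 folds.
folds = list(k_fold_indices(10, 3))
```

Each candidate hyperparameter setting would be trained on `train_idx` and scored on `val_idx`, with the score averaged over the folds.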
SARIMA Forecasting and Analysis (Chicago Crime Data) | STAT 758 (Time Series Analysis), Fall-2018, UNR
- Data collection; data preprocessing (e.g., filling missing data, visualizing arrest counts to see the seasonal dip, and trimming the data for further analysis); splitting the data into training and testing sets
- Autocorrelation (ACF) and partial autocorrelation (PACF) plots; grid search; using the Akaike information criterion (AIC) to select the orders of the forecasting models
- Fitting SARIMA, ARIMA, Simple Exponential Smoothing (SES), and other forecasting models; checking stationarity and applying further differencing when the series is non-stationary
- Short-term (one week, two weeks, etc.) and long-term (three months, six months, etc.) forecasts of arrest counts
- Assessing the forecast quality of the fitted models on the testing data using the squared Pearson correlation coefficient (r²), mean absolute error (MAE), and residual plots
- Research paper review, research methodology development, report writing, presentation
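The two numeric forecast-quality measures named above, MAE and the squared Pearson correlation, can be sketched in a few lines. This is an illustration only: the function names and the four data points are made up and are not taken from the project.

```python
import math


def mean_absolute_error(actual, forecast):
    """Average absolute difference between observed and forecast values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Made-up values for illustration (not project data).
actual = [10.0, 12.0, 14.0, 16.0]
forecast = [11.0, 12.0, 13.0, 17.0]

mae = mean_absolute_error(actual, forecast)
r_squared = pearson_r(actual, forecast) ** 2
```

A lower MAE and an r² closer to 1 on the held-out testing data both indicate a better forecast; the residual plots remain necessary to spot leftover structure that these summaries hide.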
Variational Bayesian Inference for Multivariate Normal Distribution | MATH 629 (Topics Applied Analysis), Spring-2019, UNR
- Generate random data points from a standard normal distribution with known covariance
- Generate the mean and covariance of the conjugate normal prior; obtain the true posterior mean analytically
- Employ automatic differentiation variational inference (ADVI) using the statistical software RStan
- Approximate the posterior, calculate the posterior means and their bias, and compute summary statistics for the bias
- Research paper review, research methodology development, poster presentation
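For reference, the analytic posterior used as the ground truth in such a comparison is the standard conjugate result for a normal mean with known covariance. The symbols below ($\mu_0$, $\Sigma_0$, $\Sigma$, $n$, $\bar{x}$) are notation assumed here for illustration, not taken from the project report.

```latex
% Prior:       \mu \sim \mathcal{N}(\mu_0, \Sigma_0)
% Likelihood:  x_i \mid \mu \sim \mathcal{N}(\mu, \Sigma), \; i = 1, \dots, n, \; \Sigma \text{ known}
\begin{aligned}
\mu \mid x_1, \dots, x_n &\sim \mathcal{N}(\mu_n, \Sigma_n), \\
\Sigma_n &= \left( \Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1}, \\
\mu_n &= \Sigma_n \left( \Sigma_0^{-1} \mu_0 + n \Sigma^{-1} \bar{x} \right),
\qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i .
\end{aligned}
```

Under this notation, the bias of an approximate posterior mean would be its difference from $\mu_n$.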
How do Sepal Width, Petal Length, and Petal Width explain Sepal Length in different species of iris plant? | STAT 757 (Applied Regression Analysis), Spring-2019, UNR
- Checking for and removing outliers and influential observations; assessing collinearity; residual analysis; sensitivity analysis
- Finding the model that explains the most variability, as measured by R-squared and adjusted R-squared
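Adjusted R-squared penalizes R-squared for the number of predictors, which is why it is the fairer criterion when comparing models of different sizes. A minimal sketch follows; the function name and the R-squared value of 0.85 are hypothetical, and the 150 observations and 3 predictors assume the classic iris data set with no rows removed.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof


# Hypothetical fit: R^2 = 0.85 with 3 predictors on 150 observations.
adj = adjusted_r_squared(0.85, n_obs=150, n_predictors=3)
```

Adding a predictor always raises R-squared (or leaves it unchanged), but it raises adjusted R-squared only when the gain outweighs the lost degree of freedom.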
Factors That Determine Gasoline Consumption | STAT 652 (Intro: Regression/Linear Models), Spring-2018, UNR
- Checked the assumptions of the multiple linear regression model and transformed the data where any assumption was violated
A Study on Improving Livelihood of Rural Women Through Income Generating Activities in Bangladesh | STAT H-408 (Research Methodology and Survey Project), Year-2014, DU
- Hands-on experience in collecting primary data, designing the questionnaire framework, and analyzing the data to draw valid inferences about the overall status of rural women