Unsupervised ML Algorithms
The Trade & Ahead project uses K-Means cluster analysis, together with the dimensionality-reduction techniques PCA and t-SNE, to group publicly traded companies based on financial metrics and fundamentals. The analysis reliably separated the companies into 5 distinct groups with very different business dynamics and likelihoods of providing a return on investment (ROI).
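To illustrate how K-Means groups observations, here is a minimal sketch of Lloyd's algorithm in pure Python. It assumes each company is already a tuple of standardized numeric features. The `kmeans` function and the toy data are hypothetical and are not the project's code. The PCA and t-SNE steps are not shown.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Minimal K-Means (Lloyd's algorithm) on a list of numeric tuples.

    Assumes features are already standardized so no single metric dominates
    the Euclidean distance. Returns (labels, centroids).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two obvious groups of (standardized) feature pairs.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

A real analysis would more likely use a library implementation such as scikit-learn's `KMeans` and would choose k with a diagnostic rather than fixing it in advance.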
Predictive Maintenance
The ReneWind project uses ML to predict wind turbine failures from a variety of onboard sensor data. Decision Tree, Bagging, Random Forest, Gradient Boosting, AdaBoost and XGBoost models were evaluated, and a pipeline was built for the Gradient Boosting model because it performed best.
Supervised ML Algorithms with Hyperparameter Tuning
The EasyVisa model predicts an applicant's chances of being approved for a work visa based on demographic, economic and career-related data. Bagging, Boosting, Random Forest, AdaBoost, Gradient Boosting, XGBoost and Stacking models were evaluated, with hyperparameter tuning applied to maximize performance.
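The source does not say which tuning method was used. As one common option, the sketch below shows exhaustive grid search: every combination of candidate hyperparameter values is scored and the best one is kept. The `grid_search` helper, the parameter names and the `toy_score` function are all hypothetical. In a real project the score would come from cross-validated model performance. scikit-learn's `GridSearchCV` automates this pattern.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    # product(...) enumerates the Cartesian product of all candidate values.
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical stand-in for a cross-validated score; peaks at depth 4, rate 0.1."""
    return -((params["max_depth"] - 4) ** 2) - (params["learning_rate"] - 0.1) ** 2


grid = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]}
best, score = grid_search(grid, toy_score)
print(best, score)
```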
Logistic Regression and Decision Tree
The INN Hotels project predicts the probability of a customer canceling with a Portugal-based hotel chain, given demographic, customer-history and behavioral data. It uses Logistic Regression, evaluated with ROC-AUC, to develop explainable coefficients and compares the results with a Decision Tree.
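Two of the ideas above can be sketched in a few lines of pure Python. ROC-AUC equals the probability that a randomly chosen positive case (a cancellation) receives a higher predicted score than a randomly chosen negative case. A logistic-regression coefficient becomes interpretable as an odds ratio once it is exponentiated. The function names and the sample scores below are illustrative assumptions, not the project's code.

```python
import math


def roc_auc(y_true, scores):
    """ROC-AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def odds_ratio(coef):
    """Multiplicative change in the odds of cancellation per one-unit increase in a feature."""
    return math.exp(coef)


# Hypothetical labels (1 = canceled) and predicted cancellation probabilities.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
print(odds_ratio(0.7))  # a coefficient of 0.7 roughly doubles the odds
```

The pairwise loop is O(n_pos * n_neg), which is fine for illustration. Library implementations compute the same value from the ROC curve far more efficiently.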
Linear Regression, Data Cleaning & EDA
The ReCell model uses ordinary least squares (OLS) regression to predict the resale price of used cell phones based on screen size, 5G compatibility, camera resolution, RAM, weight, age and original sale price. The regression equation shows the influence of each variable and lets us gain an edge by focusing on high-margin items.
Hypothesis Test & Statistical Analysis
The E-News Express project runs an A/B test comparing a new website with the old one, using the two-sample independent t-test, the proportions z-test, the chi-square test for independence and ANOVA. The sites were compared on users' time spent on the page, their preferred language and whether they clicked one of the links.
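As a rough illustration of two of these tests, the pure-Python sketch below computes a two-sided, two-proportion z-test, for example comparing click rates between sites. It also computes the Welch variant of the independent two-sample t statistic with Welch-Satterthwaite degrees of freedom, for example comparing time spent on the page. The function names and sample numbers are hypothetical. The t-test p-value is omitted because the standard library has no t-distribution CDF. `scipy.stats.ttest_ind` provides one.

```python
import math
import statistics


def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test of H0: p_a == p_b using the pooled proportion. Returns (z, p_value)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two independent samples."""
    va = statistics.variance(sample_a) / len(sample_a)  # squared standard error of mean A
    vb = statistics.variance(sample_b) / len(sample_b)  # squared standard error of mean B
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1))
    return t, df


# Hypothetical example: 60 of 100 users clicked on the new site vs. 40 of 100 on the old one.
print(two_proportion_ztest(60, 100, 40, 100))
print(welch_t([1, 2, 3, 4], [2, 3, 4, 5]))
```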