In this post, I explore an Austin housing dataset and predict binned housing prices. The exploratory data analysis (EDA) includes static and interactive geospatial feature maps, along with feature engineering using natural language processing (NLP). After training and tuning multi-class XGBoost models, I run batch inference to predict the binned price of Austin, TX houses. I then submit the predictions to the Kaggle competition, where they scored 0.8876 (mlogloss), which would have placed 6th in the live competition. After submission, I generate SHapley Additive exPlanations (SHAP) plots to understand how XGBoost made its predictions.
For this post, I experimented with AWS SageMaker and the AWS built-in XGBoost algorithm from within my local RStudio to predict whether a bank customer has churned. The data comes from the SLICED season 1 episode 7 Kaggle competition. SLICED is a data science competition where contestants are given a never-before-seen dataset and two hours to code a solution to a prediction challenge.
In my first blog post, I use the time-series modeling package {modeltime} and a Kaggle sales dataset to forecast 3 months of daily sales. This is a {tidymodels} approach with a {modeltime} twist.