
Problem Statement
This study targets the hospitality industry's challenge of booking cancellations, aiming to mitigate their impact on revenue and operations. By developing a predictive model using extensive hotel booking data, we seek to improve forecasting accuracy, aiding strategic planning and decision-making.
Data Exploration
Lead time and seasonality strongly influence cancellation rates: bookings with longer lead times and arrivals in the peak summer months are cancelled more often. The booking channel is also an important indicator. Guest characteristics and preferences, such as the number of adults and the chosen meal plan, also have an effect.
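The grouping behind these observations can be sketched as follows. This is a minimal illustration in base R on synthetic data, not the study's dataset: the column names (lead_time, channel, is_canceled), the bucket boundaries, and the simulated effect size are all assumptions made for the example. The same summaries could be written with dplyr's group_by() and summarise().

```r
# Synthetic stand-in for the booking data; all column names are hypothetical.
set.seed(1)
n <- 2000
bookings <- data.frame(
  lead_time = sample(0:365, n, replace = TRUE),
  channel   = sample(c("Online", "Offline", "Corporate"), n, replace = TRUE)
)

# Simulate cancellations whose probability rises with lead time (illustrative only).
bookings$is_canceled <- rbinom(n, 1, plogis(-2 + 0.01 * bookings$lead_time))

# Bucket lead time (in days), then compute the cancellation rate per bucket.
bookings$lead_bucket <- cut(
  bookings$lead_time,
  breaks = c(-Inf, 30, 90, 180, Inf),
  labels = c("0-30", "31-90", "91-180", "181+")
)
rate_by_lead    <- aggregate(is_canceled ~ lead_bucket, data = bookings, FUN = mean)
rate_by_channel <- aggregate(is_canceled ~ channel, data = bookings, FUN = mean)

print(rate_by_lead)
print(rate_by_channel)
```

On real data, the per-bucket rates would typically be plotted with ggplot2 to check whether the trend holds across booking channels.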


Model Evaluation
Moving from logistic regression to Random Forest involved refining the models and identifying key predictors, which improved accuracy, precision, recall, and AUC. Random Forest performed best at capturing complex patterns in the data and at predicting cancellations.
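A comparison of this kind can be sketched as below. The data are synthetic and the predictor names (lead_time, adults, online) are hypothetical, so the numbers it prints say nothing about the study's results; because the simulated outcome follows a logistic model, logistic regression may well match or beat Random Forest here. AUC is computed with the rank-based (Mann-Whitney) formula in base R to keep the example dependency-free, and the Random Forest step runs only if the randomForest package is installed. In the project itself, pROC would be the natural choice for AUC.

```r
set.seed(7)
n <- 1500

# Synthetic data; column names and effect sizes are hypothetical.
d <- data.frame(
  lead_time = runif(n, 0, 365),
  adults    = sample(1:3, n, replace = TRUE),
  online    = rbinom(n, 1, 0.6)
)
d$canceled <- rbinom(n, 1, plogis(-2.5 + 0.01 * d$lead_time + 0.8 * d$online))

# 70/30 train/test split.
train_idx <- sample(n, size = 0.7 * n)
train <- d[train_idx, ]
test  <- d[-train_idx, ]

# Rank-based (Mann-Whitney) AUC: probability that a random positive
# receives a higher score than a random negative.
auc <- function(score, label) {
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(rank(score)[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Baseline: logistic regression.
logit_fit  <- glm(canceled ~ lead_time + adults + online,
                  data = train, family = binomial)
logit_prob <- predict(logit_fit, newdata = test, type = "response")
results    <- c(logistic = auc(logit_prob, test$canceled))

# Random Forest, fitted only when the package is available.
predictors <- c("lead_time", "adults", "online")
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf_fit  <- randomForest::randomForest(x = train[, predictors],
                                        y = factor(train$canceled))
  rf_prob <- predict(rf_fit, newdata = test[, predictors], type = "prob")[, "1"]
  results["random_forest"] <- auc(rf_prob, test$canceled)
}

print(round(results, 3))
```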
Conclusion
We handled the precision-sensitivity trade-off by adjusting classification thresholds, balancing the detection of likely cancellations against the number of false positives.
Where missed cancellations are costly, we favored Random Forest, tuned for sensitivity. Where the risk of overbooking dominated, we preferred logistic regression for its precision and simplicity.
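The threshold adjustment described above can be illustrated with a small sweep. This is a sketch on simulated scores, not output from the study's models: the vectors actual and score, the helper metrics_at(), and the three thresholds are assumptions chosen for the example.

```r
set.seed(42)
n <- 1000

# Simulated ground truth and predicted cancellation probabilities (hypothetical).
actual <- rbinom(n, 1, 0.35)
score  <- plogis(rnorm(n, mean = ifelse(actual == 1, 1, -1)))

# Precision and recall (sensitivity) when predicting "cancel" at or above a threshold.
metrics_at <- function(threshold) {
  pred <- as.integer(score >= threshold)
  tp <- sum(pred == 1 & actual == 1)
  fp <- sum(pred == 1 & actual == 0)
  fn <- sum(pred == 0 & actual == 1)
  c(threshold = threshold,
    precision = tp / (tp + fp),
    recall    = tp / (tp + fn))
}

thresholds <- c(0.3, 0.5, 0.7)
tradeoff   <- as.data.frame(t(sapply(thresholds, metrics_at)))
print(tradeoff)
```

Lowering the threshold flags more bookings as likely cancellations, which raises recall at the expense of precision; raising it does the reverse. Which end to favor depends on whether missed cancellations or overbooking is the costlier error.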

Toolkit
Programming Language: R;
Exploration: dplyr, ggplot2;
Models: MASS, randomForest, gbm, class, glmnet;
Evaluation: caret, pROC, performanceEstimation, boot.