Predicting Hotel Reservation Cancellation

This project compares several machine learning algorithms (Logistic Regression, K-Nearest Neighbors, Random Forest, and XGBoost) on the task of predicting hotel reservation cancellations, with the goal of developing a robust predictive framework.

Problem Statement

This study addresses the hospitality industry's challenge of booking cancellations and aims to reduce their impact on revenue and operations. By building a predictive model on hotel booking data, we seek to improve forecasting accuracy and support strategic planning and decision-making.

Data Exploration

Lead time and seasonality significantly influence cancellation rates: bookings with longer lead times and arrivals in the peak summer months are canceled more often. The booking channel is also a critical indicator. Guest demographics and preferences, such as the number of adults and the chosen meal plan, also have an impact.
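
As a rough illustration of this kind of exploration, the sketch below computes cancellation rates by lead-time bucket and by arrival month. The data frame is simulated toy data, not the study's bookings, and the column names (`lead_time`, `arrival_month`, `canceled`) and bucket boundaries are assumptions made for the example. It uses base R so that it runs without extra packages; the same grouping could be written with dplyr's `group_by()` and `summarise()`.

```r
# Toy data: simulated for illustration only, NOT the study's bookings.
# Column names are hypothetical.
bookings <- data.frame(
  lead_time     = c(5, 12, 40, 75, 120, 200, 3, 95, 150, 30),  # days before arrival
  arrival_month = c(1, 2, 7, 8, 7, 8, 11, 6, 7, 3),            # 1 = January
  canceled      = c(0, 0, 0, 1, 1, 1, 0, 0, 1, 0)              # 1 = canceled
)

# Bucket lead time so cancellation rates can be compared across ranges.
# The cut points (30 and 90 days) are arbitrary choices for this sketch.
bookings$lead_bucket <- cut(
  bookings$lead_time,
  breaks = c(0, 30, 90, Inf),
  labels = c("0-30", "31-90", "91+"),
  include.lowest = TRUE
)

# The mean of a 0/1 flag is the cancellation rate within each group.
rate_by_lead  <- aggregate(canceled ~ lead_bucket, data = bookings, FUN = mean)
rate_by_month <- aggregate(canceled ~ arrival_month, data = bookings, FUN = mean)

print(rate_by_lead)
print(rate_by_month)
```

On real data, the same two tables would be the natural input for ggplot2 bar charts of cancellation rate by lead time and by month.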

Model Evaluation

Moving from Logistic Regression to Random Forest involved refining the models and identifying key predictors, which significantly improved accuracy, precision, recall, and AUC. Random Forest performed best at capturing complex patterns in the data and at predicting cancellations.
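
For reference, the four metrics named above can be computed from predicted scores as in the following minimal sketch. The labels and scores are made-up values for eight hypothetical bookings, not results from this project, and the helper names `classification_metrics` and `auc_rank` are introduced only for illustration. In practice the caret and pROC packages listed in the toolkit provide equivalent functionality; base R is used here so the example is self-contained.

```r
# Hypothetical true labels (1 = canceled) and model scores for eight bookings.
actual <- c(1, 0, 1, 1, 0, 0, 1, 0)
score  <- c(0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3)

# Accuracy, precision and recall at a fixed classification threshold.
classification_metrics <- function(actual, score, threshold = 0.5) {
  predicted <- as.integer(score >= threshold)
  tp <- sum(predicted == 1 & actual == 1)  # canceled, flagged
  fp <- sum(predicted == 1 & actual == 0)  # kept, but flagged
  tn <- sum(predicted == 0 & actual == 0)  # kept, not flagged
  fn <- sum(predicted == 0 & actual == 1)  # canceled, missed
  c(
    accuracy  = (tp + tn) / length(actual),
    precision = tp / (tp + fp),
    recall    = tp / (tp + fn)
  )
}

# AUC via the rank-sum (Mann-Whitney) identity: the probability that a
# randomly chosen canceled booking scores higher than a non-canceled one.
auc_rank <- function(actual, score) {
  r     <- rank(score)  # ties receive average ranks
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

metrics <- classification_metrics(actual, score)
auc     <- auc_rank(actual, score)

print(metrics)
print(auc)
```

Accuracy, precision, and recall depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why it is useful for comparing models before a threshold is fixed.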

Conclusion

We navigated the precision-sensitivity trade-off, adjusting thresholds to balance identifying potential cancellations against minimizing false positives.
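
The trade-off can be made concrete with a small threshold sweep. This is a sketch on made-up labels and scores, not the project's data, and the three thresholds are arbitrary: lowering the threshold flags more bookings as likely cancellations, which raises sensitivity but admits more false positives, while raising it does the opposite.

```r
# Hypothetical true labels (1 = canceled) and model scores for eight bookings.
actual <- c(1, 0, 1, 1, 0, 0, 1, 0)
score  <- c(0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3)

# Evaluate precision, sensitivity and false positives at each threshold.
thresholds <- c(0.3, 0.5, 0.7)
sweep <- do.call(rbind, lapply(thresholds, function(th) {
  predicted <- as.integer(score >= th)
  tp <- sum(predicted == 1 & actual == 1)
  fp <- sum(predicted == 1 & actual == 0)
  fn <- sum(predicted == 0 & actual == 1)
  data.frame(
    threshold       = th,
    precision       = tp / (tp + fp),
    sensitivity     = tp / (tp + fn),
    false_positives = fp
  )
}))

print(sweep)
```

In this toy example the lowest threshold catches every cancellation at the cost of two false positives, and the highest threshold produces no false positives but misses one cancellation.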

Where a missed cancellation is costly, we favored Random Forest and optimized for sensitivity, since it captured more nuance in the data. Where overbooking is the main risk, we preferred Logistic Regression for its precision and directness.

Toolkit

Programming Language: R;

Exploration: dplyr, ggplot2;

Models: MASS, randomForest, gbm, class, glmnet;

Evaluation: caret, pROC, performanceEstimation, boot.
