Predicting Long-Term Commercialisation of Airbnb Properties

A qualitative study to understand how Airbnb has been disrupting the hotel business
TITLE
Understanding and Predicting the Long-Term Commercialization of Properties Through Airbnb Using SAS Enterprise Miner
Introduction
This study examines the impact of Airbnb on the housing market, hospitality industry, economy, and neighborhoods. It highlights how Airbnb, once touted as a platform for sharing and peer-to-peer exchange, has transformed into a commercial business, intensifying gentrification and housing pressure on working-class communities. The study reveals a significant increase in short-term rentals, raising concerns about a departure from the original sharing economy concept. It also explores the threats posed to the hotel industry, including reduced hotel prices and changing guest attitudes, which have led to a decline in hotel revenue. Furthermore, the study sets the economic contributions of hotels against the adverse effects of Airbnb on job creation and housing affordability. Lastly, it addresses the negative impact on neighborhoods, such as tenant expulsion and a loss of community cohesion. These findings suggest the need for stronger regulations and government policies to address the evolving landscape of the sharing economy.
Tools and technologies
SAS Enterprise Miner
Excel

Data Understanding

This section focuses on the number of listings and hosts and their trends over time. The data show a decline in the total number of listings and hosts between December 2020 and December 2021. However, the average number of listings per host increased, indicating a rise in hosts with multiple listings. The number of entire home or apartment listings also rose significantly and makes up the majority of listings, while hotel rooms, private rooms, and shared rooms declined. Moreover, the average cost of a stay per night increased, influenced by the impact of the COVID-19 pandemic on the tourism industry. From the customer's perspective, the average number of reviews rose substantially, indicating more bookings on Airbnb following the relaxation of COVID-19 restrictions. These findings shed light on the evolving dynamics of the Airbnb marketplace and its relationship with the tourism industry.
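
The listing and host figures described above can be computed along the following lines. This is a minimal Python sketch, not the Excel or SAS workflow used in the study; the record layout and the field names `host_id` and `room_type` are assumptions made for illustration, and the sample rows are invented.

```python
from collections import Counter

# Hypothetical listing records; field names and values are illustrative only.
listings = [
    {"host_id": 1, "room_type": "Entire home/apt"},
    {"host_id": 1, "room_type": "Entire home/apt"},
    {"host_id": 2, "room_type": "Private room"},
    {"host_id": 3, "room_type": "Entire home/apt"},
]


def listings_per_host(records):
    """Average number of listings per distinct host (0.0 if there are none)."""
    hosts = Counter(r["host_id"] for r in records)
    return len(records) / len(hosts) if hosts else 0.0


def multi_listing_hosts(records):
    """Number of hosts that hold more than one listing."""
    hosts = Counter(r["host_id"] for r in records)
    return sum(1 for n in hosts.values() if n > 1)


def room_type_share(records):
    """Fraction of all listings that falls into each room type."""
    counts = Counter(r["room_type"] for r in records)
    total = len(records)
    return {room: n / total for room, n in counts.items()}


avg = listings_per_host(listings)
shares = room_type_share(listings)
```

Running the same functions on two snapshots, for example December 2020 and December 2021, would give the year-on-year comparison discussed in this section.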

Data Preparation

This section of the case study report focuses on the data preparation process conducted to analyze the commercialization of properties on Airbnb. Since there is no direct variable indicating commercialization, the researchers make assumptions based on available data. They assume that hosts renting out entire homes or apartments are likely to have a long-term business intention. To analyze the data and create clusters, several variables are created.

The report mentions that variables have been sorted and rejected after careful examination of their distribution and correlations with other variables. Some variables containing personal identification information are removed, while others are replaced with counterparts to assess their importance. Additionally, an occupancy rate threshold of 70% is applied to avoid producing biased results based on the data.

Overall, this data preparation process helps establish relevant variables to investigate the commercialization of properties on Airbnb and enables subsequent analysis and clustering.
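
As an illustration of the derived variables mentioned in this section, the sketch below builds an entire-home flag and an occupancy estimate limited to 70%. The report does not say how occupancy was estimated or exactly how the threshold was applied, so the ratio of booked to available nights, the reading of the threshold as a cap, and all function and parameter names are assumptions.

```python
OCCUPANCY_CAP = 0.70  # the 70% threshold stated in the report


def capped_occupancy(booked_nights, available_nights, cap=OCCUPANCY_CAP):
    """Estimated occupancy rate, limited to `cap`.

    Assumption: occupancy is booked nights divided by available nights.
    The report does not give its own estimator.
    """
    if available_nights <= 0:
        return 0.0
    return min(booked_nights / available_nights, cap)


def is_entire_home(room_type):
    """Proxy flag for commercial intent: the listing is an entire home or apartment."""
    return room_type.strip().lower() == "entire home/apt"


# Hypothetical listing: booked 330 of 365 nights, so the estimate is capped.
example = {
    "occupancy": capped_occupancy(330, 365),
    "entire_home": is_entire_home("Entire home/apt"),
}
```

Capping rather than dropping high-occupancy listings keeps every record in the clustering input; if the study instead filtered on the threshold, the `min` call would be replaced by a comparison.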

The Model

Unsupervised learning

This section of the study report covers the unsupervised learning analysis, which was run with two clustering configurations: centroid (standard) and average (range). The centroid method groups members so that they are more similar to others in their own cluster than to members of other clusters, producing distinct clusters whose characteristics can then be profiled. The average method merges the closest pairs of clusters, judged by the average distance between their members, which the report credits with a more accurate analysis.
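
To make the two configurations concrete, the sketch below shows the textbook definitions behind them: two ways of rescaling a variable, and two ways of measuring the distance between clusters. It assumes that "standard" and "range" refer to the scaling applied to each variable before clustering, which the report does not state explicitly. It is not a reproduction of the SAS Enterprise Miner Cluster node, which may differ in detail, for example by working with squared distances.

```python
from math import dist
from statistics import mean, pstdev


def standardize(column):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma if sigma else 0.0 for x in column]


def range_scale(column):
    """Rescale to the interval [0, 1] using the observed minimum and maximum."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in column]


def centroid(points):
    """Coordinate-wise mean of a cluster's points."""
    return tuple(mean(axis) for axis in zip(*points))


def centroid_distance(a, b):
    """Centroid method: distance between the two cluster centroids."""
    return dist(centroid(a), centroid(b))


def average_distance(a, b):
    """Average method: mean distance over all cross-cluster pairs of points."""
    return mean(dist(p, q) for p in a for q in b)


# Two small hypothetical clusters in two dimensions.
cluster_a = [(0, 0), (0, 2)]
cluster_b = [(3, 0), (3, 2)]
d_centroid = centroid_distance(cluster_a, cluster_b)
d_average = average_distance(cluster_a, cluster_b)
```

For these points the centroid distance is 3, while the average distance is larger because it also counts the diagonal pairs. Differences of this kind are one reason the two methods can produce different numbers of clusters on the same data.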

The Centroid (Standard) clustering resulted in two clusters, one larger than the other. Cluster 1 has a higher number of listings per host, higher prices, and primarily consists of entire houses and apartments with more bedrooms and beds. This suggests that cluster 1 is more commercially oriented, with hosts acquiring multiple properties and letting them out for higher profits, which runs against Airbnb's original concept of occasional letting within a sharing economy. Surprisingly, the minimum number of nights per stay is lower in cluster 1 than in cluster 2.

Using the Average (Range) clustering method, five clusters were formed. Cluster 4 is the largest and has a high number of listings per host (>50). Clusters 2 and 4 both exhibit more commercial characteristics: high listing counts per host, high occupancy, more bedrooms, high availability, high prices, and a composition made up entirely of houses and apartments. However, the average number of years of experience for hosts in these clusters is relatively low (<4.5) compared to the other three clusters. This indicates that most hosts in clusters 2 and 4 joined Airbnb recently and with a business intention.

Overall, these unsupervised learning analyses provide insights into the clustering patterns and help identify clusters that exhibit commercialization tendencies on Airbnb.
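
Cluster profiles such as those described above come from averaging each variable within a cluster. The sketch below shows one way to do this in Python; the field names and sample values are hypothetical and do not come from the study's data.

```python
from collections import defaultdict
from statistics import mean


def profile_clusters(records, variables, key="cluster"):
    """Return the size and per-variable mean of each cluster."""
    groups = defaultdict(list)
    for row in records:
        groups[row[key]].append(row)
    return {
        cluster_id: {
            "size": len(rows),
            **{v: mean(row[v] for row in rows) for v in variables},
        }
        for cluster_id, rows in groups.items()
    }


# Hypothetical clustered listings; values are invented for illustration.
clustered = [
    {"cluster": 1, "listings_per_host": 12, "price": 180},
    {"cluster": 1, "listings_per_host": 8, "price": 220},
    {"cluster": 2, "listings_per_host": 1, "price": 60},
]
profile = profile_clusters(clustered, ["listings_per_host", "price"])
```

Comparing the resulting means across clusters is what allows a cluster to be labelled as commercially oriented, for example when listings per host and price are both well above the other clusters.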

Supervised learning

This section evaluates three decision trees, each trained on target segments produced by a different clustering method, to identify Airbnb hosts with intentions of long-term commercialization.

The first decision tree, derived from Centroid (Stand.) clustering with segment 1 as the target, achieved high accuracy and highlighted the significance of long-term guest occupancy in determining hosts' plans for business-oriented property letting. The second decision tree, derived from Average (Range) clustering with segments 2 and 4 as the target, had relatively high accuracy but poorly selected defining variables, which led to low specificity and made it unreliable for identifying hosts interested in long-term commercialization. The third decision tree, derived from K-means clustering with segments 2 and 3 as the target, also had high accuracy but suffered from the same problems of poorly selected defining variables and low specificity.

Comparing the three trees, Decision Tree 3 had the better ROC curve, but the selection was based on the misclassification rate, on which Decision Tree 1 performed better. Decision Tree 1 also offered more classification variables, higher cumulative lift, and the potential for accurate targeting of hosts interested in long-term commercialization.
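
The comparison criteria named in this section (misclassification rate, specificity, and cumulative lift) have standard definitions, sketched below in Python. The labels and scores in the example are invented; they are not results from the study, and SAS Enterprise Miner computes these statistics itself.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, tn, fn) for binary labels, where 1 marks a commercial host."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn


def misclassification_rate(actual, predicted):
    """Share of cases the model labels incorrectly."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    return (fp + fn) / (tp + fp + tn + fn)


def specificity(actual, predicted):
    """Share of true negatives among all actual negatives."""
    _, fp, tn, _ = confusion_counts(actual, predicted)
    return tn / (tn + fp) if (tn + fp) else 0.0


def cumulative_lift(actual, scores, depth=0.1):
    """Positive rate in the top `depth` share of scores, relative to the overall rate."""
    ranked = sorted(zip(scores, actual), key=lambda t: t[0], reverse=True)
    n_top = max(1, round(len(ranked) * depth))
    top_rate = sum(a for _, a in ranked[:n_top]) / n_top
    base_rate = sum(actual) / len(actual)
    return top_rate / base_rate if base_rate else 0.0


# Hypothetical labels, predictions, and predicted probabilities.
actual = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.1, 0.2, 0.7, 0.6, 0.3, 0.1, 0.2, 0.1]
```

A tree can score well on accuracy while still having low specificity if it labels too many ordinary hosts as commercial, which is why the trees are compared on several measures rather than accuracy alone.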

Decision Trees

Tree 1
Tree 2
Tree 3
Conclusion
The increased number of listings and long-term letting on platforms like Airbnb have had significant impacts on the hotel industry and the UK economy, leading to rising rental prices and displacing locals. The analysis identified key characteristics of long-term commercial listings: entire house or apartment listings, higher numbers of listings per host, high prices, and more beds per listing. However, the analysis is limited by its reliance on estimated variables and assumptions about host intent, which could affect its reliability and lead to disputes with hosts. The decision tree based on the Centroid (Stand.) clustering showed promising defining variables, but it could be improved by including more factors and data to enhance accuracy and efficiency. The analysis could assist regulatory bodies in predicting commercially intended listings and in shaping regulations to control their impact, such as limiting the number of listings per host, restricting long-term property listings, or implementing price caps.