
Data & Analytics

New York City Airbnb Rentals Project

Problem Statement: Short-term rentals on platforms like Airbnb contribute to housing crises in New York, driving up rents, reducing the supply of affordable homes, and affecting traditional housing markets and neighborhood desirability. Understanding this relationship is essential for informed policymaking and effective regulation for urban sustainability.

Hypothesis

Alternative: Increased prevalence of short-term rentals on platforms like Airbnb contributes to a rise in rent prices and a reduction in housing supply in the boroughs of New York.

Null: There is no significant relationship between the prevalence of short-term rentals on platforms like Airbnb and rent prices or housing supply in the boroughs of New York.

Independent Variables

  • Location: Neighborhood

  • Neighborhood Group

Dependent Variables

  • Price

  • Number of Reviews

Step 1: Clean the Data

Deletion of Specific Rows & Columns:

• Deleted rows with IDs (20933849, 20624541, 21291569) due to zero price values, which could distort analyses.

• Removed rows (21291569, 20933849) lacking reviews and with unreliable or invalid data (e.g., minimum nights of stay).

• Removed latitude and longitude columns to streamline and simplify the dataset, as geographic information was not relevant for analysis.

Exclusion of Rows Without Crucial Information:

• Excluded rows with IDs (1615764, 2232600, 4209595, ...) due to missing name data deemed crucial for analysis.

Currency Symbol Removal:

• Removed currency symbols from the price column to facilitate numerical calculations.

Deletion of Duplicate Rows:

• Deleted 11,453 rows with duplicate "host id" values to ensure data accuracy and integrity. 

Formatting Column Names:

• Formatted column names for consistency and improved readability.

• Used proper functions to manage column names, including hiding unnecessary information (e.g., "name 2") to maintain clarity and functionality.
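The cleaning rules above can be sketched in Python. This is only an illustration under stated assumptions, not the Excel workflow itself: the column keys ("name", "price", "host id", "latitude", "longitude") are hypothetical, and keeping the first row per host is an assumed reading of the duplicate rule.

```python
import re

def clean_listings(rows):
    """Apply the cleaning rules to a list of dict rows (hypothetical column keys)."""
    cleaned = []
    seen_hosts = set()
    for row in rows:
        # Rows without a listing name are excluded.
        name = (row.get("name") or "").strip()
        if not name:
            continue
        # Strip currency symbols and thousands separators, then parse the price.
        price = float(re.sub(r"[^0-9.]", "", str(row["price"])) or 0)
        # Zero-price rows are dropped because they would distort the analysis.
        if price == 0:
            continue
        # Assumption: keep the first row seen for each host id, drop later ones.
        if row["host id"] in seen_hosts:
            continue
        seen_hosts.add(row["host id"])
        # Drop the latitude and longitude columns, keep everything else.
        kept = {k: v for k, v in row.items() if k not in ("latitude", "longitude")}
        kept["name"] = name
        kept["price"] = price
        cleaned.append(kept)
    return cleaned
```

Filtering before de-duplicating means a host whose first row is invalid can still contribute a later valid row. Reversing that order is an equally plausible choice.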

Step 2: Frequency Histogram

Independent Variables: "Neighborhood Group", "Room Type"

Dependent Variables: "Price", "Number of Reviews"

 

Data Transformation:

• Used the IFS function in Excel to convert the non-numerical independent variables into numerical codes.
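The IFS step maps each category label to a number. A rough Python equivalent is shown here. The labels and codes are hypothetical, since the workbook's actual categories and codes are not listed.

```python
# Hypothetical category-to-code tables; the real workbook codes are not stated.
NEIGHBORHOOD_GROUP_CODES = {
    "Bronx": 1,
    "Brooklyn": 2,
    "Manhattan": 3,
    "Queens": 4,
    "Staten Island": 5,
}
ROOM_TYPE_CODES = {"Entire home/apt": 1, "Private room": 2, "Shared room": 3}

def encode(value, codes):
    """Return the numeric code for a category label, or None if it is not listed."""
    return codes.get(value.strip())
```

Codes like these are only labels. Their numeric order carries no meaning, so they suit frequency counts better than arithmetic such as averaging.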

Defined Bins:

• Defined bins with specific intervals based on minimum and maximum values of the variables to determine the frequency of data points within each bin.

• Calculated relative frequency and cumulative frequency for each bin to understand distribution patterns.
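The binning described above can be sketched as follows. It assumes equal-width bins starting at the minimum value, with each bin including its lower edge and excluding its upper edge. The bin width and the function name are illustrative choices, not values from the workbook.

```python
def frequency_table(values, bin_width):
    """Return (lower, upper, count, relative, cumulative) rows for equal-width bins."""
    lo, hi = min(values), max(values)
    n_bins = int((hi - lo) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        # Each value falls in the bin whose lower edge is at or below it.
        counts[int((v - lo) // bin_width)] += 1
    total = len(values)
    table, running = [], 0.0
    for i, count in enumerate(counts):
        relative = count / total   # share of all observations in this bin
        running += relative        # cumulative share up to and including this bin
        table.append((lo + i * bin_width, lo + (i + 1) * bin_width, count, relative, running))
    return table
```

For example, `frequency_table([10, 20, 30, 40, 100], 50)` gives two bins holding four values and one value. A spreadsheet that treats the upper edge as inclusive can place boundary values differently.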

 

Histogram Creation:

• Developed a histogram for each variable to depict its frequency distribution.

Computed Descriptive Statistics:

• Mean, Standard Deviation, Sample Variance

Result:

• Concluded that the data did not exhibit a normal distribution.

• Observed high frequencies in the initial bins, indicating a right-skewed distribution rather than a normal one.
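The descriptive statistics listed above can be reproduced with Python's standard library, as a sketch. The skewness figure is an addition not among the statistics listed, included only because a clearly positive value matches the pattern of crowded initial bins. The sample prices are made up.

```python
import statistics

def describe(values):
    """Return mean, sample standard deviation, sample variance, and skewness."""
    n = len(values)
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)        # sample standard deviation (n - 1)
    variance = statistics.variance(values)  # sample variance (n - 1)
    # Adjusted Fisher-Pearson sample skewness; needs n >= 3 and stdev > 0.
    skewness = n / ((n - 1) * (n - 2)) * sum(((v - mean) / stdev) ** 3 for v in values)
    return {"mean": mean, "stdev": stdev, "variance": variance, "skewness": skewness}

# Hypothetical prices: most values are low, with one long right-tail value.
sample_prices = [40, 45, 50, 55, 60, 65, 70, 400]
summary = describe(sample_prices)
```

A skewness near zero suggests a symmetric distribution. A large positive value points to a long right tail.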

Step 3: Correlation

Dependent Variables: "Price", "Number of Reviews"
Qualitative variables were not tested.

Correlation Setup
• Ran the correlation table using "Correlation" in the Data Analysis Tool Pack.

Output
• The output showed a negative slope, indicating an inverse relationship between "Price" and "Number of Reviews". The relationship was weak, so collinearity between the two variables was low.

Trendline
• A trendline was inserted to describe the relationship between X and Y. A polynomial trendline was chosen because it fit the data best.

Results
• The R² value of our model was 0.0017, far below the R² value of 0.80 that we compared it to. In other words, the trendline explains about 0.17% of the variation, so our variables are essentially unrelated and do not measure similar things.
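To show how a correlation coefficient and an R² value relate, here is a minimal Pearson correlation sketch. For a straight-line fit, R² equals the square of r. A polynomial trendline like the one used here is fitted differently, so its R² will not match r² exactly. The function name is an illustrative choice.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def linear_r_squared(xs, ys):
    """R-squared of a straight-line fit, which is the square of Pearson's r."""
    return pearson_r(xs, ys) ** 2
```

The sign of r gives the direction of the relationship, and r² gives the share of variation a straight line accounts for. An R² of 0.0017 corresponds to an r of only about ±0.04.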

Step 4: Chi-Squared Test

Independent Variables: "Neighborhood Group", "Room Type"

PivotTable
• Created a PivotTable to find the observed and expected (predicted) counts for our variables.

Chi-Square Test
• The test returned a P-Value of 9E-153, which is much smaller than 0.05.

Results Explained
• A P-Value that is less than or equal to 0.05 is statistically significant. Our small P-Value is strong evidence to reject this test's null hypothesis (that "Neighborhood Group" and "Room Type" are independent) in favor of the alternative.
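The mechanics of the test can be sketched as below. Expected counts come from the row and column totals of the observed table, and the statistic sums the squared gaps between observed and expected. Converting the statistic to a P-Value needs the chi-square distribution (for example `scipy.stats.chi2.sf`, or Excel's CHISQ.TEST), which this stdlib-only sketch leaves out. The function name is hypothetical.

```python
def chi_square_independence(observed):
    """observed: list of rows of counts. Returns (statistic, dof, expected)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # Degrees of freedom: (rows - 1) * (columns - 1).
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected
```

In the PivotTable setting, rows would be neighborhood groups and columns room types. The larger the statistic for a given number of degrees of freedom, the smaller the P-Value.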

Step 5: Multiple Linear Regression

Independent Variables: "Calculated Host Listing Count", "Availability 365"

Dependent Variables: "Number of Reviews"

 

Model Significance:

• Both P-values and Significance F are less than 0.05, indicating the overall model is statistically significant.

Regression Statistics:

• R-Square: 0.01

• Adjusted R-Square: 0.01

Relationships Between Variables and Price:

• Number of Reviews: Weak negative relationship (-7E-05x); as reviews increase, price tends to decrease.

• Calculated Host Listing Count: Weak negative relationship (-0.0067x); as count increases, prices generally decrease.

• Availability 365: Weak positive relationship (0.0453x); as availability increases, prices tend to rise.

Process and Tests Applied:

• Conducted multiple linear regression analysis.

• Focused on key regression statistics including R-Square, Adjusted R-Square, Observations, Significance F, and P-value to assess model significance and relationships.

• Determined coefficient strength and direction through scatter plot visualization of independent variables against the dependent variable.
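For readers who want to see what the regression output summarizes, here is a minimal ordinary least squares sketch using the normal equations. It is an illustration, not the Excel tool's implementation. The function names are hypothetical, and it returns only coefficients, R-Square, and Adjusted R-Square, not Significance F or P-values.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(features, target):
    """features: rows of predictor values. Returns (coefficients, r2, adj_r2).

    coefficients[0] is the intercept; the rest follow the predictor order.
    """
    design = [[1.0] + list(row) for row in features]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, target)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in design]
    mean_y = sum(target) / len(target)
    ss_res = sum((y - f) ** 2 for y, f in zip(target, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    r2 = 1 - ss_res / ss_tot
    n, p = len(target), k - 1
    # Adjusted R-Square penalizes R-Square for the number of predictors.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return coef, r2, adj_r2
```

Predictors and target are passed in as plain lists, so the same function applies to whichever column is treated as the dependent variable. With thousands of observations and only a few predictors, R-Square and Adjusted R-Square are nearly identical, consistent with both being reported as 0.01.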

Key Insights & Results

Key Insights:
• Statistical Significance: Multiple linear regression confirms the model is statistically significant.

• Weak Relationships: Airbnb’s impact on rent prices and housing supply is weak, suggesting minor direct influence.

• Localized Effects: Potential effects in specific NYC boroughs with unique rental market dynamics.

• Influential External Factors: Economic conditions, zoning laws, and other housing demands are likely more impactful.

Hypothesis Testing Results:
• Outcome: Failed to reject the null hypothesis.
• Interpretation: Insufficient evidence that Airbnb significantly affects rent prices or housing supply.

© 2024 by Caroline Kornberg. 
