Data & Analytics

New York City Airbnb Rentals Project
Problem Statement: Short-term rentals on platforms like Airbnb contribute to New York's housing crisis: they push rents up, reduce the supply of affordable homes, and affect traditional housing markets and neighborhood desirability. Understanding this relationship is essential for informed policymaking and for regulation that supports urban sustainability.
Hypothesis
Alternative: Increased prevalence of short-term rentals on platforms like Airbnb contributes to a rise in rent prices and a reduction in housing supply in the boroughs of New York.
Null: There is no significant relationship between the prevalence of short-term rentals on platforms like Airbnb and rent prices or housing supply in the boroughs of New York.
Independent Variables:
- Location: Neighborhood
- Neighborhood Group
Dependent Variables:
- Price
- Number of Reviews
Step 1: Clean the Data
Deletion of Specific Rows & Columns:
• Deleted rows with IDs 20933849, 20624541, and 21291569 because their price was zero, which would distort the price analysis.
• Removed rows 21291569 and 20933849, which had no reviews and unreliable or invalid values (e.g., in minimum nights of stay).
• Removed the latitude and longitude columns to simplify the dataset; exact coordinates were not needed because location is analyzed at the neighborhood level.
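The project did this cleanup in a spreadsheet; as an illustration only, here is a minimal Python sketch of the zero-price deletion and the coordinate-column removal. It assumes each listing is a dict with hypothetical keys `price`, `latitude`, and `longitude`, and that prices are already numeric.

```python
# Hypothetical column names; the real export may label them differently.
DROPPED_COLUMNS = ("latitude", "longitude")


def drop_zero_price_and_coords(rows):
    """Drop listings priced at zero and strip the coordinate columns."""
    cleaned = []
    for row in rows:
        if row["price"] == 0:
            continue  # a zero price would pull down means and distort the bins
        cleaned.append({k: v for k, v in row.items() if k not in DROPPED_COLUMNS})
    return cleaned


listings = [
    {"id": 1, "price": 150, "latitude": 40.7, "longitude": -73.9},
    {"id": 2, "price": 0, "latitude": 40.8, "longitude": -73.95},
]
print(drop_zero_price_and_coords(listings))
```

Filtering on the zero price itself, rather than on a hard-coded list of IDs, catches any zero-price row the dataset might contain.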
Exclusion of Rows Without Crucial Information:
• Excluded rows (IDs 1615764, 2232600, 4209595, ...) whose listing name was missing, since the name was deemed crucial for analysis.
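A short sketch of the same exclusion rule, assuming a hypothetical `name` key per listing: a row is kept only if its name exists and is not blank.

```python
def drop_missing_names(rows):
    """Keep only listings whose name is present and not just whitespace."""
    return [row for row in rows if row.get("name") and str(row["name"]).strip()]


sample = [
    {"id": 1, "name": "Cozy loft"},
    {"id": 2, "name": None},  # missing name
    {"id": 3, "name": "   "},  # blank name
]
print(drop_missing_names(sample))
```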
Currency Symbol Removal:
• Removed currency symbols from the price column to facilitate numerical calculations.
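To make the price column numeric, the currency symbol and any thousands separators must go. A minimal sketch, assuming prices arrive as strings like `"$1,200.00"` (the exact source format is an assumption):

```python
import re


def parse_price(raw):
    """Convert a price such as "$1,200.00" to a float; numbers pass through."""
    if isinstance(raw, (int, float)):
        return float(raw)
    digits = re.sub(r"[^0-9.]", "", raw)  # drop "$", ",", spaces, etc.
    return float(digits)  # raises ValueError if nothing numeric remains


print(parse_price("$1,200.00"))
print(parse_price("$85"))
```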
Deletion of Duplicate Rows:
• Deleted 11,453 rows with duplicate "host id" values to ensure data accuracy and integrity.
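The duplicate rule above keeps one row per host ID. A sketch of that rule, keeping the first occurrence (as Excel's Remove Duplicates does) and assuming a hypothetical `host_id` key; in pandas the equivalent would be `df.drop_duplicates(subset="host_id", keep="first")`.

```python
def drop_duplicate_hosts(rows, key="host_id"):
    """Keep the first row seen for each host ID and drop later repeats."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] in seen:
            continue
        seen.add(row[key])
        kept.append(row)
    return kept


rows = [
    {"id": 1, "host_id": 100},
    {"id": 2, "host_id": 200},
    {"id": 3, "host_id": 100},  # same host as id 1, so it is dropped
]
print(drop_duplicate_hosts(rows))
```

Note that this keeps a single listing per host, so hosts with several listings contribute only their first row to later counts.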
Formatting Column Names:
• Formatted column names for consistency and improved readability.
• Managed column names with appropriate functions and hid unnecessary columns (e.g., "name 2") so the sheet stays clear while keeping those columns available.
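One way to get consistent, readable headers is to title-case each word, much as Excel's PROPER function does. The sketch below assumes raw headers may use underscores or extra spaces, and treats `"name 2"` as a hidden helper column; both are assumptions for illustration.

```python
HIDDEN_COLUMNS = {"name 2"}  # kept in the data, just not displayed


def format_column_name(name):
    """Turn 'neighbourhood_group' or ' room  type ' into title-cased words."""
    words = name.replace("_", " ").split()
    return " ".join(word.capitalize() for word in words)


def display_columns(columns):
    """Formatted names for every column that is not hidden."""
    return [format_column_name(c) for c in columns if c not in HIDDEN_COLUMNS]


print(display_columns(["id", "name", "name 2", "host_id", "neighbourhood_group"]))
```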
Step 2: Frequency Histogram
Independent Variables: "Neighborhood Group", "Room Type"
Dependent Variables: "Price", "Number of Reviews"
Data Transformation:
• Utilized the IFS function in Excel to convert non-numerical independent variables into a numerical format.
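An IFS formula tests one condition per category and returns the matching number. A dictionary lookup does the same job in Python. The numeric codes below are hypothetical, since the project's actual IFS assignments are not listed; the categories are the five boroughs and the room types found in the NYC Airbnb data.

```python
# Hypothetical codes; the project's own IFS assignments are not stated.
NEIGHBORHOOD_GROUP_CODES = {
    "Bronx": 1,
    "Brooklyn": 2,
    "Manhattan": 3,
    "Queens": 4,
    "Staten Island": 5,
}
ROOM_TYPE_CODES = {"Entire home/apt": 1, "Private room": 2, "Shared room": 3}


def encode(value, codes):
    """Return the numeric code for a category, one lookup per IFS condition."""
    try:
        return codes[value]
    except KeyError:
        # Excel's IFS returns #N/A when no condition matches; fail loudly instead.
        raise ValueError(f"no code defined for {value!r}") from None


print(encode("Manhattan", NEIGHBORHOOD_GROUP_CODES))
print(encode("Private room", ROOM_TYPE_CODES))
```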
Defined Bins:
• Defined bin intervals from each variable's minimum and maximum values, then counted the data points that fall in each bin.
• Calculated the relative and cumulative frequency of each bin to understand the distribution pattern.
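The binning steps can be sketched as follows, assuming equal-width bins between the minimum and maximum (the project's exact intervals are not given). Each bin includes its lower edge, and the maximum value is placed in the last bin.

```python
def frequency_table(values, num_bins):
    """Equal-width bins with frequency, relative frequency and cumulative frequency."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1  # avoid zero width when all values are equal
    counts = [0] * num_bins
    for v in values:
        index = min(int((v - lo) / width), num_bins - 1)  # max value -> last bin
        counts[index] += 1

    n = len(values)
    table, cumulative = [], 0
    for i, count in enumerate(counts):
        cumulative += count
        table.append({
            "lower": lo + i * width,
            "upper": lo + (i + 1) * width,
            "frequency": count,
            "relative": count / n,
            "cumulative": cumulative,
        })
    return table


for row in frequency_table([10, 20, 30, 40, 100], num_bins=3):
    print(row)
```

The last cumulative value always equals the number of data points, and the relative frequencies sum to 1, which makes a quick sanity check on the spreadsheet version.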
Histogram Creation:
• Built a histogram of each variable's frequency distribution.
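The histogram is simply the frequency column drawn as bars. A dependency-free text rendering shows the idea; the bin labels and counts in the example are made up for illustration, not taken from the dataset.

```python
def text_histogram(labels, counts, width=40):
    """Render one bar per bin, scaled so the tallest bin spans `width` characters."""
    peak = max(counts)
    lines = []
    for label, count in zip(labels, counts):
        bar_length = round(width * count / peak) if peak else 0
        lines.append(f"{label:>10} | {'#' * bar_length} {count}")
    return "\n".join(lines)


# Illustrative counts only.
print(text_histogram(["0-100", "100-200", "200-300"], [30, 12, 3], width=20))
```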
Computed Descriptive Statistics:
• Mean, Standard Deviation, Sample Variance
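These statistics map directly onto Python's `statistics` module. Because the text reports a sample variance, this sketch also uses the sample standard deviation (both divide by n - 1, like Excel's VAR.S and STDEV.S); whether the project used the sample or population standard deviation is an assumption.

```python
import statistics


def describe(values):
    """Mean, sample standard deviation and sample variance of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "sample_variance": statistics.variance(values),  # sample variance (n - 1)
    }


print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```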
Result:
• Observed high frequencies in the initial (lowest-value) bins, a right-skewed shape that departs from normal distribution assumptions.
• Concluded that the data do not follow a normal distribution.
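The conclusion above rests on the histogram's shape. As a supplementary check that the project did not report, the skew can also be quantified with the adjusted Fisher-Pearson coefficient, the formula behind Excel's SKEW function. A positive value is consistent with values piling up in the initial bins with a long tail to the right.

```python
import statistics


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (the formula Excel's SKEW uses)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


print(sample_skewness([1, 2, 3, 4, 5]))  # symmetric data
print(sample_skewness([1, 1, 1, 2, 10]))  # long right tail
```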