Real Estate Prediction in NYC

Project Overview

This project uses machine learning to predict property values in New York City from historical data published on NYC Open Data. The implementation addresses common challenges in real estate prediction, such as:

Incomplete datasets with missing values
Inconsistencies between datasets collected over different time periods
The need for robust cross-validation and hyperparameter tuning for accurate predictions
Visualization of spatial property value predictions across NYC

By combining careful data preprocessing, model comparison, and spatial visualization, the project gives stakeholders a tool for analyzing market trends and supporting decisions in the NYC real estate market.

Key Features

Data Preprocessing: Aggregates multiple datasets covering different time ranges and cleans the data through imputation of missing values, outlier removal, and feature scaling.
Machine Learning Pipeline: Compares several regression models (Linear Regression, Random Forest Regressor, and XGBRegressor) and uses GridSearchCV for hyperparameter optimization.
Visualization & Analysis: Generates scatter plots comparing actual vs. predicted property values and creates geographic heat maps to illustrate spatial trends in property valuations.
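
The preprocessing and model-comparison steps listed above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the feature matrix, the IQR rule for outliers, the median imputation strategy, and the parameter grid are all assumptions. XGBRegressor is left out so that only scikit-learn is required; it could be added to the `candidates` dictionary in the same way.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: NOT the NYC Open Data records).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
X[rng.random(X.shape) < 0.05] = np.nan  # simulate missing feature values

# Outlier removal: drop rows whose target lies outside 1.5 * IQR (assumed rule).
q1, q3 = np.percentile(y, [25, 75])
iqr = q3 - q1
keep = (y >= q1 - 1.5 * iqr) & (y <= q3 + 1.5 * iqr)
X, y = X[keep], y[keep]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def make_pipeline(model):
    """Impute missing values, scale features, then fit the given regressor."""
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("model", model),
    ])


# Each candidate pairs a regressor with an illustrative hyperparameter grid.
candidates = {
    "linear": (LinearRegression(), {}),
    "random_forest": (
        RandomForestRegressor(random_state=0),
        {"model__n_estimators": [50, 100], "model__max_depth": [None, 5]},
    ),
}

# Tune each candidate with 3-fold cross-validation, then score on held-out data.
results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(make_pipeline(model), grid, cv=3, scoring="r2")
    search.fit(X_train, y_train)
    results[name] = search.score(X_test, y_test)

for name, r2 in results.items():
    print(f"{name}: held-out R^2 = {r2:.3f}")
```

Keeping imputation and scaling inside the pipeline means they are refit on each cross-validation fold, so statistics from the validation fold do not leak into training.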

Project Implementation & Results

Libraries and Tools

The project utilizes a variety of Python libraries for data processing, analysis, and machine learning:

[Figure: Libraries and Tools]

Data Collection and Preprocessing

Two primary datasets were used in this project:

Dataset 1 (2010-2019): Contains historical property valuation data

[Figure: 2010-2019 Dataset Categories]

Dataset 2 (2021-2023): Contains more recent property valuation data

[Figure: 2021-2023 Dataset Categories]
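
As a rough sketch of how two datasets covering different periods might be combined, the example below aligns column names, concatenates the frames, and imputes a missing value. The column names and values are hypothetical placeholders, not the real schema of either dataset, and median imputation is only one possible choice.

```python
import pandas as pd

# Hypothetical miniature frames; names and values are illustrative only.
older = pd.DataFrame({
    "BORO": [1, 3],
    "YEAR": [2012, 2018],
    "VALUE": [500000.0, 750000.0],
})
newer = pd.DataFrame({
    "borough": [1, 4],
    "year": [2021, 2023],
    "market_value": [900000.0, None],  # one missing valuation
})

# Align the older schema with the newer one before combining.
older = older.rename(
    columns={"BORO": "borough", "YEAR": "year", "VALUE": "market_value"}
)

# Tag each row with its source period so later analysis can tell them apart.
combined = pd.concat(
    [older.assign(source="2010-2019"), newer.assign(source="2021-2023")],
    ignore_index=True,
)

# Fill the missing valuation with the median of the observed values.
median_value = combined["market_value"].median()
combined["market_value"] = combined["market_value"].fillna(median_value)

print(combined)
```

Recording the source period in its own column keeps the origin of each row visible after the merge, which helps when checking for shifts between the two collection periods.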