House Price Prediction


Let's start by importing the libraries we will be using.
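A typical set of imports for a study like this might look as follows. The text does not name its libraries, so the choice of NumPy, pandas, and scikit-learn here is an assumption.

```python
# Assumed libraries: the text does not list the ones it actually uses.
import numpy as np                                     # numerical arrays
import pandas as pd                                    # tabular data handling
from sklearn.model_selection import train_test_split   # train/test splitting
from sklearn.linear_model import LinearRegression      # linear model
from sklearn.tree import DecisionTreeRegressor         # tree-based model
```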

Collecting Data

For this simple case study, we use the California Housing Prices dataset from the StatLib repository, where it is available for download.
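Loading and inspecting the data could be sketched as below. The column names follow the commonly distributed CSV version of this dataset and are an assumption here, as is the file name `housing.csv`; the two inline rows are made-up stand-ins so the snippet runs without the download.

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for the downloaded file (values are made up).
sample_csv = io.StringIO(
    "longitude,latitude,housing_median_age,total_rooms,total_bedrooms,"
    "population,households,median_income,median_house_value,ocean_proximity\n"
    "-122.2,37.9,41,880,129,322,126,8.3,452600,NEAR BAY\n"
    "-118.3,34.1,30,1500,300,900,280,3.1,180000,<1H OCEAN\n"
)

# With the real file, pass its local path instead, e.g. pd.read_csv("housing.csv").
housing = pd.read_csv(sample_csv)

print(housing.head())  # first rows of the table
housing.info()         # column dtypes and non-null counts
```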

The dataset contains nine numerical variables and one categorical variable. Each row represents a district in California.

Summary Statistics

For numerical variables, Tukey's five-number summary statistics (minimum, first quartile, median, third quartile, maximum) can provide quick information about the dataset.
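A minimal sketch of the five-number summary in pandas, using a made-up column of values. Note that pandas computes quartiles by linear interpolation, which can differ slightly from Tukey's hinges on some samples.

```python
import pandas as pd

# Hypothetical values for a single numerical column.
ages = pd.Series([10, 20, 30, 40, 50], name="housing_median_age")

# Five-number summary: minimum, first quartile, median, third quartile, maximum.
five_num = ages.quantile([0.0, 0.25, 0.5, 0.75, 1.0])
print(five_num)

# describe() reports the same five values plus count, mean, and standard deviation.
print(ages.describe())
```

Calling `describe()` on a whole DataFrame produces the same summary for every numerical column at once.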

For categorical variables, we can simply check the count of each category.
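Counting categories can be sketched with `value_counts()`. The column name `ocean_proximity` and the sample labels below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample of the categorical column.
proximity = pd.Series(
    ["INLAND", "NEAR BAY", "INLAND", "<1H OCEAN", "INLAND"],
    name="ocean_proximity",
)

# Number of rows per category, most frequent first.
counts = proximity.value_counts()
print(counts)
```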

Histograms

Visualization

Data Exploration

Feature-Target Relationship

Correlation Analysis
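One common way to run this step is to compute pairwise Pearson correlations and rank the features by their correlation with the target. The sketch below assumes the target column is `median_house_value` and uses made-up numbers.

```python
import pandas as pd

# Hypothetical data: two features and the assumed target column.
df = pd.DataFrame({
    "median_income": [1.0, 2.0, 3.0, 4.0],
    "housing_median_age": [40, 30, 35, 10],
    "median_house_value": [100.0, 200.0, 300.0, 400.0],
})

# Pearson correlation of every column with the target, strongest positive first.
corr_with_target = df.corr()["median_house_value"].sort_values(ascending=False)
print(corr_with_target)
```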

Data Processing

Handling Categorical Variables
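A common option for this step is one-hot encoding, sketched here with `pd.get_dummies`. The text does not say which encoding it uses, so this choice, the column names, and the values are assumptions.

```python
import pandas as pd

# Hypothetical frame with one numerical and one categorical column.
df = pd.DataFrame({
    "median_income": [8.3, 3.1, 2.5],
    "ocean_proximity": ["NEAR BAY", "INLAND", "INLAND"],
})

# Replace the categorical column with one indicator column per category.
encoded = pd.get_dummies(df, columns=["ocean_proximity"])
print(encoded)
```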

Data Normalization
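The text does not specify a scaling method. As one possibility, min-max scaling with scikit-learn's `MinMaxScaler` maps each feature to the range [0, 1]; the feature values below are made up.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature matrix: two features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

# Rescale each column independently to the range [0, 1].
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled)
```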

Splitting Data into Train and Test Sets
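A minimal sketch of the split using scikit-learn's `train_test_split`. The 80/20 ratio and the fixed `random_state` are assumptions, and the arrays are placeholders for the real features and target.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 10 samples with 2 features each, plus a target vector.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```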

Linear Regression
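Fitting a linear model could look like the sketch below, which uses scikit-learn's `LinearRegression` on made-up data generated from y = 2x + 1 so the learned parameters are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical one-feature data following y = 2x + 1 exactly.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 2.0 * X.ravel() + 1.0

# Fit ordinary least squares and predict for a new input.
model = LinearRegression().fit(X, y)
pred = model.predict(np.array([[5.0]]))
print(model.coef_, model.intercept_, pred)
```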

Decision Trees
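A decision tree regressor can be sketched the same way with scikit-learn's `DecisionTreeRegressor`. The data and the `max_depth=2` setting are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data with two clearly separated groups of target values.
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([100.0, 100.0, 100.0, 500.0, 500.0, 500.0])

# A shallow tree is enough to separate the two groups.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
pred = tree.predict(np.array([[2.5], [10.5]]))
print(pred)
```

Unlike the linear model, the tree predicts a constant value within each leaf, so its output is piecewise constant in the features.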