Exploration and Cleaning for Real Estate Data Set with SQL

In this study, we will work on a real estate data .

Click here for data sets.

We have a data set with 56000 rows of information about the properties of a big real estate company.

Our aim in this data set;

To separate address information by street and district
Populate property address data
To fix all the values in the SoldAsVacant column as ‘YES’ or ‘NO’
Delete duplicate values
Delete unused columns

  1. Standardize date format

select CONVERT(date,SaleDate) from homesales

ALTER TABLE homesales Add SaleDateConverted Date;

Update homesales SET SaleDateConverted = CONVERT(Date,SaleDate)

2. Populate property address data

select count(*) PropertyAddress from homesales

where PropertyAddress is null

select a.ParcelID, a.PropertyAddress, b.ParcelID, b.PropertyAddress, ISNULL(a.PropertyAddress, b.PropertyAddress)

from homesales a

join homesales b on a.ParcelID = b.ParcelID and a.[UniqueID ] != b.[UniqueID ]

where a.PropertyAddress is null

— Mean of ‘’ ISNULL (a.PropertyAddress, b.PropertyAddress) ‘’ is if a.PropertyAddress has empty value fill that with b.PropertyAddress

update a set a.PropertyAddress = ISNULL(a.PropertyAddress, b.PropertyAddress) from homesales a

join homesales b on a.ParcelID = b.ParcelID and a.[UniqueID ] != b.[UniqueID ] where a.PropertyAddress is null

select count(*) PropertyAddress from homesales

where PropertyAddress is null

3. Breaking out address into individual columns (address, city, state)

select PropertyAddress from homesales

select SUBSTRING(PropertyAddress, 1, CHARINDEX(‘,’, PropertyAddress) -1) Address, SUBSTRING(PropertyAddress, CHARINDEX(‘,’, PropertyAddress) +1, LEN(PropertyAddress)) AS Suburb from homesales

alter table homesales add adress varchar(255)

update homesales set adress = SUBSTRING(PropertyAddress, 1, CHARINDEX(‘,’, PropertyAddress) -1)

alter table homesales add suburb varchar(255)

update homesales set suburb = SUBSTRING(PropertyAddress, CHARINDEX(‘,’, PropertyAddress) +1, LEN(PropertyAddress))

select adress, suburb from homesales

4. Change y and n to yes and no in “sold as vacant” field.

SELECT SoldAsVacant FROM homesales

SELECT

CASE

WHEN SoldAsVacant = ‘Y’ then ‘YES’

WHEN SoldAsVacant = ’N’ THEN ‘NO’

ELSE SoldAsVacant

END

FROM homesales

UPDATE homesales SET SoldAsVacant = CASE

WHEN SoldAsVacant = ‘Y’ then ‘YES’

WHEN SoldAsVacant = ’N’ THEN ‘NO’

ELSE SoldAsVacant

END

SELECT COUNT(*) FROM homesales WHERE SoldAsVacant = ‘Y’

SELECT COUNT(*) FROM homesales WHERE SoldAsVacant = ‘N’

5. Removing duplicates

Select *,ROW_NUMBER() OVER (

PARTITION BY ParcelID,PropertyAddress,SalePrice,SaleDate,LegalReference

ORDER BY UniqueID ) row_num From homesales order by ParcelID

select * FROM (Select *,ROW_NUMBER() OVER (

PARTITION BY ParcelID,PropertyAddress,SalePrice,SaleDate,LegalReference

ORDER BY UniqueID ) row_num From homesales)T

WHERE row_num > 1

WITH RowNumCTE AS(Select *, ROW_NUMBER() OVER ( PARTITION BY ParcelID, PropertyAddress, SalePrice, SaleDate, LegalReference

ORDER BY

UniqueID ) row_num From homesales )

DELETE

From RowNumCTE

Where row_num > 1

6. Deleting unused columns

ALTER TABLE homesales DROP COLUMN PropertyAddress, LegalReference

[1]https://github.com/AlexTheAnalyst/PortfolioProjects/blob/main/Nashville%20Housing%20Data%20for%20Data%20Cleaning.xlsx

[2]https://github.com/AlexTheAnalyst/PortfolioProjects/blob/main/Nashville%20Housing%20Data%20for%20Data%20Cleaning.xlsx

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store