Exploration and Cleaning for Real Estate Data Set with SQL

  1. Standardize date format
  • As seen following section data format is ‘2013–02–08 00:00:00.000’
  • To convert this data format to ‘yyyy-mm-dd’ format;
  • After getting correct format to update our data set;
  • First lets see how many empty values there are in PropertyAddress.
  • To fill these with who has same parcelID rows;
  • To update the data set;
  • To check again;
  • As seen there is no null value anymore.
  • As seen above number , street name and suburb names are together. We will separate them as new columns.
  • After completing method to update data;
  • ‘’ CHARINDEX(,’, PropertyAddress) ‘’ The first element in it indicates which character we are looking for, and the second element in which variable we are looking for. The result is the number of the searched character from the left in the cell.
  • To check
  • As seen above PropertyAddress has been separated as address and suburb.
  • When we examine the data in detail, we see that some values in SoldAsVacant column are entered as YES, NO or Y, N. We will convert all these values to YES and NO.
  • To check the accuracy of our transaction;
  • When we examine data detailed we will see some rows have same value. If ParcelID, PropertyAddress, SalePrice, SaleDate and LegalReference values are same we will accept those values as duplicate.
  • With the code block below, a column that says 1 has been added next to the repeated lines, next to the non-repeating ones.
  • Now we will use the CTE method to sort by the ROW number column
  • As seen above we had just row_number = 2 which is duplicates.
  • To delete these ones;

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store