Nashville Housing Data

The Nashville Housing Data provides valuable insights into the real estate market of the city, which can be a valuable resource for buyers, sellers, and investors. However, the data can be overwhelming and difficult to navigate without proper cleaning and organization. In order to make the data more user-friendly and accessible, I utilized Microsoft SQL Server to clean and restructure the data. By removing duplicates, filling in missing values, and standardizing data formats, I was able to create a more comprehensive and organized database. The resulting data set is now easier to analyze and understand, making it a valuable tool for anyone interested in the Nashville housing market.

I cleaned data using ms sql server. To make it easy and more helpful.

Data source: Link

nashvilla.bmp

How it works:

I used MS SQL to clean the data.

CSV raw data file was imported to SQL and the process of the cleaning can be found in the querys below.

1.change date format :

I used convert() function to change data type . I used update statement to modify the existing records in a table but, it didn’t work. So,I used alter table.

**-- Standardize Date Format**

SELECT
  SaleDate, 
  CONVERT(Date, SaleDate) as date 
FROM
  [Nashville Housing] 
UPDATE
  [Nashville Housing] 
SET 
  SaleDate = CONVERT(Date, SaleDate)

--It didn't update then i used alter
ALTER TABLE [Nashville Housing]
ADD SaleDateConverted DATE;

update [Nashville Housing]
SET
 SaleDateConverted = CONVERT(Date,SaleDate)
--remove SaleDate column
ALTER TABLE [Nashville Housing]
DROP COLUMN SaleDate;

2. Populate Property Address data:

First I wrote a query to see where PropertyAddress is null to find if there is any missing data. Then I removed the filter to see if some PropertyAddress are duplicated or if there is any information to help get the missing addresses. I added a self join when ParcelID is the same and UniqueID is not, as UniqueID is an unrepeatable data. Then I found the missing data and used ISNULL() function to fill null addresses. Finally I used an updated table with the new completed data.

**-- Populate Property Address data**

SELECT PropertyAddress
FROM [Nashville Housing]
--where PropertyAddress is NULL
ORDER BY ParcelID
--Self join 
SELECT 
  A.ParcelID, 
  A.PropertyAddress, 
  B.ParcelID, 
  B.PropertyAddress, 
  ISNULL(
    A.PropertyAddress, B.PropertyAddress
  ) 
FROM [Nashville Housing] A 
  JOIN [Nashville Housing] B ON A.ParcelID = B.ParcelID 
  AND A.[UniqueID ] <> B.[UniqueID ] 
wHERE 
  A.PropertyAddress IS NULL

UPDATE 
  A 
SET 
  PropertyAddress = ISNULL(
    A.PropertyAddress, B.PropertyAddress
  ) 
FROM[Nashville Housing] A 
  JOIN [Nashville Housing] B ON A.ParcelID = B.ParcelID 
  AND A.[UniqueID ] <> B.[UniqueID ] 
WHERE 
  A.PropertyAddress IS NULL

3.Breaking out Address into Individual Columns (Address, City, State):

split coulmn to two columns with SUBSTRING() function. I used the comma ( , ) as the separator, 1 to start and CHARINDEX() function to get character until ‘,’. Then I used -1 to avoid comma, second column used SUBSTRING() function and CHARINDEX() function to get when start split the column but, +1 to start after ‘,’ and used LEN() function to get where to stop . Then I added the new columns and values (PropertySplitAddress ,PropertySplitCity ).

-- Breaking out Address into Individual Columns (Address, City, State)

SELECT PropertyAddress,
SUBSTRING(PropertyAddress, 1, CHARINDEX(',', PropertyAddress) -1) AS Address,
SUBSTRING(PropertyAddress,  CHARINDEX(',', PropertyAddress) +1, lEN(PropertyAddress)) AS city
FROM [Nashville Housing] 

ALTER TABLE [Nashville Housing] 
Add PropertySplitAddress Nvarchar(255);

UPDATE [Nashville Housing] 
SET 
PropertySplitAddress = SUBSTRING(PropertyAddress, 1, CHARINDEX(',', PropertyAddress) -1)

ALTER TABLE [Nashville Housing] 
Add PropertySplitCity Nvarchar(255);

UPDATE [Nashville Housing] 
SET 
PropertySplitCity = SUBSTRING(PropertyAddress,  CHARINDEX(',', PropertyAddress) +1, lEN(PropertyAddress))

4.Breaking out Owner address into Individual Columns (Address, City, State):

I used PARSENAME() function to split column ,but first I had to change the comma into period by using replace() function and finally I added the new columns and values (OwnerSplitAddress , OwnerSplitCity , OwnerSplitState ).

**-- Owner address**
SELECT
PARSENAME(REPLACE (OwnerAddress, ',','.'), 3),
PARSENAME(REPLACE (OwnerAddress, ',','.'), 2),
PARSENAME(REPLACE (OwnerAddress, ',','.'), 1)
FROM [Nashville Housing]

ALTER TABLE [Nashville Housing]
Add OwnerSplitAddress Nvarchar(255);

UPDATE [Nashville Housing]
SET
 OwnerSplitAddress = PARSENAME(REPLACE(OwnerAddress, ',', '.') , 3)

ALTER TABLE [Nashville Housing]
Add OwnerSplitCity Nvarchar(255);

UPDATE [Nashville Housing]
SET 
OwnerSplitCity = PARSENAME(REPLACE(OwnerAddress, ',', '.') , 2)

ALTER TABLE [Nashville Housing]
Add OwnerSplitState Nvarchar(255);

UPDATE [Nashville Housing]
SET
 OwnerSplitState = PARSENAME(REPLACE(OwnerAddress, ',', '.') , 1)