Believing in your Data
To predict house prices with linear regression, I first needed to make sure my source was solid. My source in this case was a dataset of house sales from King County.
This dataset contained quite a few columns:
Although the columns were distinct, many of them were very similar. For example, six of them described the square footage of a house. After looking at the dataset, I began to ask which columns were important. These questions come up in any data analysis, but because I was trying to predict King County house sale prices, I needed to focus on how each column would affect my models.
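One way to compare overlapping columns like these is to pull out everything related to square footage and check how strongly each one correlates with price and with the others. The sketch below is only an illustration: the rows are invented toy values, and the column names (`sqft_living`, `sqft_lot`, and so on) are my assumption about how the square footage columns are labeled, so adjust them to match the actual file.

```python
import pandas as pd

# Toy stand-in for the King County data. Every value here is invented
# for illustration, and the column names are assumed, not confirmed.
df = pd.DataFrame({
    "price":         [250000, 540000, 180000, 600000, 510000, 320000],
    "sqft_living":   [1200, 2600, 800, 2000, 1700, 1400],
    "sqft_lot":      [5600, 7200, 10000, 5000, 8000, 6000],
    "sqft_above":    [1200, 2200, 800, 1100, 1700, 1000],
    "sqft_basement": [0, 400, 0, 900, 0, 400],
    "sqft_living15": [1300, 1700, 2700, 1400, 1800, 1500],
    "sqft_lot15":    [5600, 7600, 8000, 5000, 7500, 6200],
})

# Collect every column whose name starts with "sqft".
sqft_cols = [c for c in df.columns if c.startswith("sqft")]

# Correlation of each square footage column with price.
corr_with_price = df[sqft_cols + ["price"]].corr()["price"].drop("price")

# Correlation of the square footage columns with each other.
# Pairs with values near 1 carry mostly the same information.
sqft_corr = df[sqft_cols].corr()

print(corr_with_price.sort_values(ascending=False))
print(sqft_corr.round(2))
```

Columns that are highly correlated with each other are candidates for dropping, since keeping all of them adds little new information to a linear regression.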
After cleaning the data a bit, I started looking at individual columns to understand the housing sales data. How would each one impact my statistics and models later on? How could I set myself up to understand what came next?
I started by plotting these columns against each other. I used the Seaborn pairplot to see them all at once, then dialed in on price vs. square footage of the lot for the following image:
It is important to know how I learn in order to get the most out of my lessons and projects. As a visual learner, I could see from the plots how my dataset was going to work out for me.
This is also where I started asking more questions and checking out price vs. bedrooms, price vs. zip code, price vs. square footage, and price vs. the condition of the houses.
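For anyone who wants to try the same kind of plot, here is a minimal sketch of a Seaborn pairplot restricted to price against a few other columns. It is not the exact code from my notebook: the data is a small invented sample, and the column names (`bedrooms`, `sqft_living`, `sqft_lot`, `condition`) are assumptions about how the dataset labels them.

```python
import pandas as pd
import seaborn as sns

# Invented sample rows for illustration only; column names are assumed.
df = pd.DataFrame({
    "price":       [250000, 540000, 180000, 600000, 510000, 320000],
    "bedrooms":    [3, 4, 2, 4, 3, 3],
    "sqft_living": [1200, 2600, 800, 2000, 1700, 1400],
    "sqft_lot":    [5600, 7200, 10000, 5000, 8000, 6000],
    "condition":   [3, 3, 4, 5, 3, 4],
})

# Columns to compare against price, one scatter plot per column.
features = ["bedrooms", "sqft_living", "sqft_lot", "condition"]

# x_vars / y_vars limit the grid to a single row: price on the y-axis,
# each feature on the x-axis, instead of every column against every other.
grid = sns.pairplot(df, x_vars=features, y_vars=["price"])

# In a notebook the figure is displayed automatically; in a script,
# call matplotlib.pyplot.show() or grid.savefig("some_file.png").
```

Calling `sns.pairplot(df)` with no extra arguments draws every column against every other, which is useful for a first look but gets crowded quickly on a dataset with many columns.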
These subjects are definitely new to me, and the most challenging aspect is relating to the information. The best way for me to understand it is to connect it to something in my current life. I believe this is true of any project I want to stay focused on. Finding the common ground is what leads to success.