I wrote this article for my Typepad Blog but wanted to share it with my readers here as well.
This is the sort of question I have pondered for quite some time, as
you see thousand dollar millionaires driving around in luxury automobiles in
South Florida and Southern California. A friend might ask you, when propped up at
a bar on a Friday night, if this is true. It’s the sort of unanswerable question
that had me intrigued when I first thought of it. It came to me as part of my daily
reverie. My line of thinking was that if you spend more time out in the sun and
generally have better weather, you’re going to want to spend more time outside,
driving to places and going to the beach. A car is more like a home to someone
who drives a lot, and so they are prepared to pay more for it.
Like a lot of theories, the difficulty lies in how you go about
proving it. It was all very well for Newton to have an apple fall on his head.
He could prove the idea of gravity from that (not that I am comparing my
theories to Newton’s, you understand).
First define the question.
Before attempting any analysis, I need to set the scope. So to be
clear, the actual question I am going to attempt to answer is:
Do workers across a spectrum of employments from South Florida and upscale
communities on the West Coast of the United States spend more money on
purchasing cars in their price range compared to the same workers in, say, the
Midwest and Mid-Atlantic? We can assume there may be a different variety of
cars purchased depending on location. For example, there will be more SUVs and
All Wheel Drive vehicles in mountain towns that receive snow compared to
Seattle, where there is precipitation but very little snow.
Where to start on this journey then?
Well, first off, I have to choose some counties to compare. I am not
going to be able to compare data from every county, although I would be forever
grateful and willing to pay somebody to help me expand this data to encompass
purchasing data from all available U.S. zipcodes. Until then, I chose zipcodes
from the following states, cities, and regions, including America’s most famous
zipcode, 90210:
California
90210 (Beverly Hills, CA)
92651 (Laguna Beach, CA)
Florida
33401 (Palm Beach, Florida)
33134 (Coral Gables, Miami)
Maryland
20847 (Rockville, MD)
Missouri
63005 (Chesterfield, MO)
Which cars should I use for my thesis?
Like the states, the choice of cars is bewildering. I decided to choose
a small selection of popular models and group them by price.
Originally, I really wanted to find the actual prices for cars sold,
but finding the data proved to be like finding a needle in a haystack. Whilst
there are automotive economic indicators, there seem to be no historical car
sale records for this sort of data. It is possible that the big credit
agencies have some additional data that may be helpful for future case
studies. I really thought the
project was doomed without that data though. As it’s been a theory I’ve been
talking about researching for the last four years, I was glad that I was finally
moving forward and working with the data I had at this point.
It then occurred to me that I could use for-sale prices instead. There
are a plethora of sites which offer car prices by county, so all I had to do
was run queries by zipcode and car on these databases and hey presto, the data
was available (and for free!).
Of course, I would have preferred to use real prices, but seeing as I
am applying the same technique to find prices for each zipcode and car type,
the approach holds up.
Not everyone will buy a new car.
Well, yes, there is that. Additionally, cars depreciate at differing rates.
For the sake of argument, I could have assumed that everyone buys a new car,
but this really doesn’t represent reality. I have decided therefore to include
prices for cars from the years 2013-2015.
The following models were used to give us an understanding of different vehicle classes amongst different locations:
Entry Level (new price less than 25k)
Mazda Miata
Nissan Rogue
Nissan Altima
Honda Accord
Mid Range (new price less than 40k)
Toyota 4Runner
Ford F-150
Honda Odyssey
Kia Sedona
Prestige (new price greater than 40k)
Mercedes S-600
Audi R-8
Chevy Corvette
So who is coming with me on my journey?
Like all the best road trip stories (Goldilocks, the Birth of Jesus,
and the Blind Mice), three was considered the appropriate number.
Similar to the car selection, I separated the occupations into
partially skilled, skilled, and highly skilled professions, and decided that one
profession from each would be enough. However, I had to ensure that these
professions represented a large proportion of the working population for each
region. (For the sticklers amongst you, the word skilled refers to the amount
of formal education required to perform these jobs. Even someone working in retail
will require basic math skills taught at school.)
Partially Skilled
Retail Sales Representative
Skilled
Nurse
Highly Skilled
Physician
You can be pretty certain that if you walk into any major town, you
will find a good proportion of the population employed by these sectors. Given
the growth of the aging baby-boomers, the number employed in these sorts of
jobs will only increase with time.
How will I find out about salaries for each of these professions in each zip code?
In the same way I was concerned about finding proper car prices, I
didn’t think it would be possible to get average salary information by zip code
for each profession.
Luckily for me, Google really is your friend and the site Indeed
appears to calculate the average salary for each zip code by profession too.
The results are shown below.
Job | Zip code (location) | Salary ($)
Physician | 33401 (Palm Beach, Florida) | 86000
Physician | 33134 (Coral Gables, Miami) | 82000
Physician | 90210 (Beverly Hills, CA) | 97000
Physician | 92651 (Laguna Beach, CA) | 83000
Physician | 63005 (Chesterfield, MO) | 86000
Physician | 20847 (Rockville, MD) | 108000
Nurse | 33401 (Palm Beach, Florida) | 65000
Nurse | 33134 (Coral Gables, Miami) | 61000
Nurse | 90210 (Beverly Hills, CA) | 72000
Nurse | 92651 (Laguna Beach, CA) | 62000
Nurse | 63005 (Chesterfield, MO) | 64000
Nurse | 20847 (Rockville, MD) | 80000
Retail Salesperson | 33401 (Palm Beach, Florida) | 52000
Retail Salesperson | 33134 (Coral Gables, Miami) | 49000
Retail Salesperson | 90210 (Beverly Hills, CA) | 57000
Retail Salesperson | 92651 (Laguna Beach, CA) | 49000
Retail Salesperson | 63005 (Chesterfield, MO) | 51000
Retail Salesperson | 20847 (Rockville, MD) | 64000
I have to admit I was a little disappointed with the physician pay. I
really thought that given the high profile physicians have, their pay would be
significantly higher than the other two. However, this is not the case. Had I
chosen a specialist doctor such as an oncologist or plastic surgeon, the salary
would have been significantly higher.
On reflection, however, I felt that it would be “cheating” to do this.
Added to which, there are far fewer specialists than generalists such as
physicians.
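To put a number on how close the three pay levels are, here is a minimal Python sketch that divides the physician salary by the nurse and retail salaries for each zip code. The salary figures are copied from the table above; the function name `pay_ratios` and the data layout are my own choices for illustration, not part of the original analysis.

```python
# Salaries ($) copied from the table above, per zip code,
# in the order (retail salesperson, nurse, physician).
SALARIES = {
    "33401 (Palm Beach, Florida)": (52000, 65000, 86000),
    "33134 (Coral Gables, Miami)": (49000, 61000, 82000),
    "90210 (Beverly Hills, CA)": (57000, 72000, 97000),
    "92651 (Laguna Beach, CA)": (49000, 62000, 83000),
    "63005 (Chesterfield, MO)": (51000, 64000, 86000),
    "20847 (Rockville, MD)": (64000, 80000, 108000),
}


def pay_ratios(salaries):
    """Return {place: (physician/nurse, physician/retail)}, rounded to 2 places."""
    ratios = {}
    for place, (retail, nurse, physician) in salaries.items():
        ratios[place] = (round(physician / nurse, 2), round(physician / retail, 2))
    return ratios


for place, (vs_nurse, vs_retail) in pay_ratios(SALARIES).items():
    print(f"{place}: {vs_nurse}x nurse pay, {vs_retail}x retail pay")
```

In every zip code the physician figure works out to roughly 1.3 times the nurse figure and roughly 1.7 times the retail figure, which is the modest gap described above.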
Assumptions about the data
I have ignored any other type of cost variable such as
housing or food. There may well be regional differences in these costs which
would affect a person’s disposable income, and so what they can spend on a car,
but those considerations are not included in this data analysis.
Now I have the data, how do I figure out the answer?
As with any analysis, it is the construction of the dataset which takes
the longest. The steps I need to perform are:
1) Filter all of the car data to only include cars from the years 2013-2015
2) Find the average price for each car group by postcode
3) Link the car group average price per postcode to the salary of the worker by postcode. *Here I will assume that
a. a physician will buy a prestige car
b. a nurse will buy a mid-range car
c. a retail worker will buy an entry level car
4) **Percentage cost = Car price / (salary * 5)
5) Produce a bar graph of percentage costs paid per worker and postcode
* Of course, some doctors will buy an entry level car and a retail
salesperson may fork out for a premium vehicle, but this exercise is only to show
the theoretical cost of a car to a worker.
** The salary is multiplied by 5 to simulate paying back the car over 5 years,
which is the current norm.
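Steps 1 to 4 above can be sketched in a few lines of Python. This is only an illustration of the calculation, not the Trifacta workflow I actually used. The salaries are taken from the table earlier in the article, but the listing prices in `LISTINGS` are made-up placeholder numbers, and all of the function and variable names are hypothetical. The result is multiplied by 100 so that it reads as a percentage.

```python
from collections import defaultdict
from statistics import mean

# Salaries ($) from the table earlier in the article, keyed by (job, zip code).
# Only one zip code is shown to keep the sketch short.
SALARIES = {
    ("Retail Salesperson", "33401"): 52000,
    ("Nurse", "33401"): 65000,
    ("Physician", "33401"): 86000,
}

# Step 3 assumption: the car group each worker is assumed to buy.
CAR_GROUP_FOR_JOB = {
    "Retail Salesperson": "Entry Level",
    "Nurse": "Mid Range",
    "Physician": "Prestige",
}

LOAN_YEARS = 5  # the car is assumed to be paid back over 5 years

# HYPOTHETICAL listings, invented for illustration only:
# (zip code, car group, model year, asking price in $).
LISTINGS = [
    ("33401", "Entry Level", 2014, 18000),
    ("33401", "Entry Level", 2015, 20000),
    ("33401", "Entry Level", 2011, 9000),  # outside 2013-2015, so filtered out
    ("33401", "Prestige", 2013, 80000),
    ("33401", "Prestige", 2015, 92000),
]


def average_price_by_group(listings, first_year=2013, last_year=2015):
    """Steps 1 and 2: filter by year, then average per (zip code, car group)."""
    prices = defaultdict(list)
    for zip_code, group, year, price in listings:
        if first_year <= year <= last_year:
            prices[(zip_code, group)].append(price)
    return {key: mean(values) for key, values in prices.items()}


def percentage_cost(car_price, salary, years=LOAN_YEARS):
    """Step 4: car price as a percentage of salary earned over the loan period."""
    return 100 * car_price / (salary * years)


def cost_table(listings, salaries):
    """Step 3: link each worker's salary to the average price of their car group."""
    averages = average_price_by_group(listings)
    table = {}
    for (job, zip_code), salary in salaries.items():
        key = (zip_code, CAR_GROUP_FOR_JOB[job])
        if key in averages:  # skip workers with no listings for their car group
            table[(job, zip_code)] = percentage_cost(averages[key], salary)
    return table


for (job, zip_code), pct in cost_table(LISTINGS, SALARIES).items():
    print(f"{job} in {zip_code}: {pct:.1f}% of five years' salary")
```

With these placeholder listings, the physician in 33401 faces an average prestige price of $86,000 against $430,000 earned over five years, which is 20%. The nurse is skipped because the sample contains no mid-range listings.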
By merging the data in Trifacta and then putting all this data into
Tableau, I produced the following results.
[Bar graphs of percentage cost per postcode, one each for: Retail Salesperson, Nurse, Physician]
Conclusive results
From all of the above comparisons, the Mid-Atlantic is the cheapest
area to purchase a car irrespective of your job. For the entry level and
mid-range cars, it seems that the Midwest is the most expensive place to
purchase your car if you work there.
However, the most interesting finding was that the prestige cars
appear to be the most expensive in California and Florida, with the peculiar
exception of the Mercedes in the Midwest. This runs counter to my
original thinking: I believed that a much higher percentage of
income was spent on luxury cars in these locations, and that the cars were most
likely cheaper to purchase there because more of them were sold in those regions.
Inconclusive results
What I originally set out to prove, namely that people in South Florida
and Southern California spend a higher percentage of
their income on their automobiles, does not appear to be true when the data is
analyzed.
Possible explanations about car pricing
Given the higher wealth that is generally associated with southern
coastal states, perhaps the margins for premium vehicles are higher, simply
because they are more affordable.
However, for the mid-level vehicles, it is possible that people value
space and comfort above status and that, knowing this, sellers price the
mid-level cars higher.