If you are employed, unless you work from home, you probably have to commute to work. Many commute from the suburbs to the nearest urban center, and for some, those urban centers are in different states than their residences. This map highlights that phenomenon, showing the percentage of workers in each county who commute to a different state for work.
Not surprisingly, the counties with the highest percentages are adjacent to state borders. In nine counties, more than 50% of workers commute to a different state. In most cases, the flow is unidirectional. For example, people leave Phenix City, AL, to work in Columbus, GA, or they drive from Vancouver, WA, to Portland, OR. This may be for tax purposes, or simply because they prefer the lifestyle or affordability of a certain suburb.
Data source: http://factfinder.census.gov/ (ACS 5-yr, Table S0801)
The federal government provides quality ratings for every Medicare and Medicaid-certified nursing home in the US. Each nursing home is given a score of 1 (worst) to 5 (best) for health inspections, staffing, and quality measures. These are then combined into an overall rating that uses the same scale.
For this post, I’ve mapped the average overall scores by state (plus D.C.), and graphed the distributions. The best and worst locations are listed on the map. When the time comes, you may want to move to Hawaii instead of Texas.
Data source: http://www.medicare.gov/nursinghomecompare/search.html
Through the first 86 Academy Awards ceremonies, 85 films have received 10 or more Oscar nominations. For this year’s show (the 87th), none hit the mark; “Birdman” and “The Grand Budapest Hotel” each earned nine nominations.
For this post, I’ve graphed the number of nominations against the number of wins for each of those 85 films. I labeled some of the highlights (winningest, flops, most nominated, etc.). Happy Oscar watching!
Data source: http://awardsdatabase.oscars.org/ampas_awards/help/helpMain.jsp?helpContentURL=statistics/indexStats.html
It’s no secret that the disparity in wealth is growing. This inequality manifests itself geographically; in most cities, the rich tend to cluster in opulent neighborhoods while poorer families live in more affordable areas. Here, I’ve mapped the median household income by census tract for nine counties (labeled by the largest city in the county). I selected cities that highlight the remarkable degree of economic segregation that can occur.
Data source: http://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml (ACS 2013, 5-yr, B19013)
Terraforming other planets is a common theme of sci-fi books and movies; humans need to leave Earth and inhabit new worlds, but first they have to make them livable. These maps are a play on that idea. I used digital elevation models of Mars (where a large northern ocean may actually have existed if/when the planet had a thicker atmosphere) and the Moon, and simply filled them with water to three elevations. I will note that many people have made far prettier versions of terraformed maps (e.g., Mars, Moon). The goal here was to show how changes in sea level relative to datum affect the land-sea balance.
Part of the reason I elected to show multiple sea levels was that the notion of datum on other planetary bodies is somewhat arbitrary. Without an actual ocean, we have to choose what will be considered zero elevation. For rocky planets and moons where we have a global DEM, we tend to use the average equatorial radius of the equipotential surface. So I flooded Mars and the Moon to their respective datums, then added and subtracted a kilometer to the sea level on each.
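As a rough illustration of the flooding step, the sketch below takes a grid of elevations (in meters relative to datum) and reports the fraction of cells that end up under water at a given sea level. The tiny grid and the function name `flooded_fraction` are made up for illustration; a real global DEM would also need area weighting by latitude, which is ignored here.

```python
def flooded_fraction(dem, sea_level_m=0.0):
    """Return the fraction of grid cells whose elevation is below sea level."""
    cells = [z for row in dem for z in row]
    if not cells:
        raise ValueError("empty DEM")
    return sum(1 for z in cells if z < sea_level_m) / len(cells)


# Hypothetical 3x4 elevation grid in meters relative to datum (not real data).
toy_dem = [
    [-2500, -1200, -300, 400],
    [-800, 150, 900, 2100],
    [-50, 600, 1500, 3200],
]

# Datum minus 1 km, datum, and datum plus 1 km, as in the maps.
for level in (-1000, 0, 1000):
    print(level, flooded_fraction(toy_dem, level))
```

Raising or lowering `sea_level_m` by a kilometer is all it takes to shift the land-sea balance, which is why the choice of datum matters so much on a body with no real ocean.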
Without tectonics, neither really has distinct continents; the topography and dichotomies result primarily from volcanism and cratering. I imagine that having only one large continent with a lot of crater lakes/oceans would be pretty bad. Fortunately, it’s not something we need to consider seriously…for now…
Data source: ftp://pdsimage2.wr.usgs.gov/pub/pigpen/
As I mentioned in a previous post, legalized recreational marijuana is coming to Oregon this summer. As data on the topic are easy to access, I decided to make a graph on what the local economy offers.
I scraped strain and price data for 850 items from 22 dispensaries in Portland, OR. The graph plots tetrahydrocannabinol (THC) content against cannabidiol (CBD) content. Prices for these strains range from $4/gram to $14/gram (the width of each bubble is proportional to price). Plotting either of the compounds against price yields no significant correlation (r^2<0.1). I’ve included data on the same strain sold at multiple stores; in some cases, the price varies, or the percent of the compounds is different.
The second graph is identical to the first, but I used a log scale on the y-axis so that you can see the details within the blob of high-THC, low-CBD strains.
Data source: https://www.leafly.com/
Many people believe that they can beat the stock market. Chances are, they’re wrong. Fortunately, in the long run, the Dow trends upward, so if you have a diversified portfolio, your equity will generally grow. The approach of a day trader is far more complex because price fluctuations over the course of a day are highly unpredictable. They are not, however, random. With this in mind, I figured it would be interesting to identify which dates are more likely to see gains, and which are more prone to losses.
Using approximately 100 years of data (July 30th, 1914 – January 26th, 2015), I calculated the percentage change in closing price for the Dow Jones Industrial Average (DJIA) from day to day. I then ran summary statistics binning by day of the year, and selected two metrics to plot as heat maps.
For both maps, I’ve used a diverging color scheme centered on a neutral market (light gray). Blue cells signify market gains, while orange cells correspond to losses. White boxes with “N/A” are either dates that don’t exist, or fixed-date holidays when the market is closed (i.e., New Year’s Day, Independence Day, and Christmas).
The left map shows the percentage of time a given day sees an increase in the DJIA. For example, out of the 72 instances that the market was open on April 15th (Tax Day!), the Dow increased on 53 of them, so it improved 53/72=73.6% of the time. One of the worst days occurs just over a week earlier, on April 7th; out of the 73 days it was open, the DJIA rose on only 26, or 35.6% of the time.
The right map reveals the median percent change in the DJIA for each date. (Because the market can change significantly on a given day, the average and sum were both strongly influenced by outliers.) This generally shows the same pattern as the left map, but with different relative magnitudes. For example, while April 15th is the day that has most frequently seen gains, the median gain is not as large as that of January 2nd, when the median change in the DJIA is +0.43%.
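For anyone who wants to try the two metrics themselves, here is a minimal sketch of the calculation: day-to-day percent change in the close, binned by calendar date, then summarized as the share of up days and the median change. The function names and the toy closing prices are made up for illustration; this is not the original analysis code.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def daily_pct_changes(closes):
    """closes: list of (date, closing price), sorted by date.
    Returns (date, percent change from the previous close) pairs."""
    return [
        (day, 100.0 * (cur - prev) / prev)
        for (_, prev), (day, cur) in zip(closes, closes[1:])
    ]


def stats_by_calendar_day(changes):
    """Bin changes by (month, day).
    Returns {(month, day): (percent of days up, median change, count)}."""
    bins = defaultdict(list)
    for day, pct in changes:
        bins[(day.month, day.day)].append(pct)
    return {
        key: (100.0 * sum(p > 0 for p in vals) / len(vals), median(vals), len(vals))
        for key, vals in bins.items()
    }


# Hypothetical closes. The list skips most trading days, so only the
# April 15 rows (each preceded by the prior session) are meaningful.
closes = [
    (date(2013, 4, 12), 100.0),
    (date(2013, 4, 15), 101.0),  # +1.0%
    (date(2013, 4, 16), 100.0),
    (date(2014, 4, 14), 200.0),
    (date(2014, 4, 15), 203.0),  # +1.5%
    (date(2015, 4, 14), 50.0),
    (date(2015, 4, 15), 49.0),   # -2.0%
]

stats = stats_by_calendar_day(daily_pct_changes(closes))
print(stats[(4, 15)])
```

The median is used rather than the mean for the reason given above: a single extreme day would otherwise dominate the value for its date.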
The four best and worst days are listed under each map. April 15th is among the top four on each, and April 7th is in the bottom four on each.
Finally, note that this graph is intended for entertainment only – it should not be used for investment strategy!
Data source: http://measuringworth.com/DJA/
Gene therapy might just be the medical world’s future secret sauce for preventing/treating diseases. While it’s still considered an experimental approach, more than 2,000 clinical trials around the world are making significant progress on the topic. This treemap breaks down the focus (i.e., indication or disease) of the trials by their geographical distribution.
For example, more than 64% of gene therapy clinical trials are aimed at treating cancer. Of those, close to 68% are North American trials. So of all gene therapy trials, about 44% are North American trials focused on cancer. The study of healthy volunteers is the only category in which North America does not conduct the plurality/majority of trials; Europe leads the way on this subject.
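The way a treemap cell relates to its parent is just a product of two shares. A tiny sketch, using the rounded figures quoted above (the function name `joint_share` is my own):

```python
def joint_share(parent_pct, child_pct_of_parent):
    """Percent of the whole that falls in a child category, given the
    parent's share of the whole and the child's share of the parent."""
    return parent_pct * child_pct_of_parent / 100.0


# Rounded inputs from the text: ~64% cancer, ~68% of those in North America.
print(joint_share(64, 68))
```

With these rounded inputs the product lands a little under 44%; the figure in the text presumably comes from the unrounded shares.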
Data source: http://www.abedia.com/wiley/index.html
The Motion Picture Association of America (MPAA) has been rating movies for almost 50 years. The ratings we are most familiar with today (G, PG, PG-13, R, and NC-17) provide general guidelines for the minimum ages of viewers (though many would disagree with ratings for particular films). In addition to the letter rating, the MPAA writes a brief, sometimes comical, justification for ratings above G, identifying what in the film would not be suitable for all children.
Using data from IMDB, I ran a word frequency analysis on the ratings of 15,715 movies, binned by rating. The histogram shows the usage frequency of the 21 words that occur in >10% of explanations for at least one rating category. For example, the word “language” appears in the rating description of 62.5% of PG, 47.8% of PG-13, 80.9% of R, and 3.0% of NC-17 movies. Common words (e.g., articles, prepositions, conjunctions) were excluded.
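A minimal sketch of this kind of analysis is below. It counts, for each rating, the percentage of explanations that contain a given word at least once. The stopword list, the sample explanations, and the function name are assumptions for illustration only; the real IMDB files need their own parsing.

```python
import re
from collections import Counter, defaultdict

# Assumed stopword list; the list used for the graph is not known.
STOPWORDS = {"a", "an", "the", "and", "or", "for", "of", "in", "to", "with"}


def word_share_by_rating(records, stopwords=STOPWORDS):
    """records: iterable of (rating, explanation text).
    Returns {rating: {word: percent of that rating's explanations using it}}."""
    doc_counts = defaultdict(Counter)
    totals = Counter()
    for rating, text in records:
        totals[rating] += 1
        # A set, so each word counts once per explanation.
        words = set(re.findall(r"[a-z]+", text.lower())) - stopwords
        doc_counts[rating].update(words)
    return {
        rating: {w: 100.0 * c / totals[rating] for w, c in counts.items()}
        for rating, counts in doc_counts.items()
    }


# Hypothetical explanations, written in the style of MPAA rating reasons.
records = [
    ("PG", "Rated PG for mild language"),
    ("PG", "Rated PG for some rude humor"),
    ("R", "Rated R for language and violence"),
    ("R", "Rated R for strong violence"),
]

shares = word_share_by_rating(records)
print(shares["PG"]["language"], shares["R"]["violence"])
```

In practice, boilerplate tokens such as "rated" and the rating letters themselves would also need to be filtered out before picking the most common words.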
While I find word clouds to be completely useless as a tool for conveying statistical significance, I wanted to provide a larger selection of common words in ratings. As word clouds are more attractive than lists, I’ve generated one for each rating. The NC-17 cloud is smaller because there are few films in this category, and thus there was a smaller pool of common words.
A few fun facts:
- The word “explicit” appears in 58.2% of NC-17 ratings, but only 0.1% of R ratings. Conversely, “mild” is used in 53.2% of PG ratings, but only 0.2% of PG-13 ratings.
- While “sexual” is common in PG-13 (29.7%), R (26.2%), and NC-17 (41.8%) rating descriptions, “sexuality” frequently occurs only in R (32.4%) and NC-17 (34.3%) films; in PG-13, “sexuality” is observed only 7.5% of the time.
- “Violence” is the only word to be used in >20% of each of the four ratings, peaking at 56.8% in R-rated movies.
Data source: http://www.imdb.com/interfaces (plain text data files)
Using a list of the 52,131 active medallion taxi drivers in New York City, I’ve graphed the frequencies of the 20 most common first and last names. Middle names were not considered. I did not bin alternate spellings of the same name. As such, the five most common first names are Md, Mohammad, Mohammed, Muhammad, and Mohamed.
The heat map below the bar graphs shows the frequency of first and last name pairings. This hints at the diversity of first or last names in some countries compared to others. For example, Singh is the most common last name, but none of the most common first names are paired with it; it is traditionally a last name in India, but due to the diversity of first names in India, none of them are in the top 20. The most common first name paired with Singh is Balwinder, which is the 52nd most common first name. Conversely, Jean (the French version of John) is the sixth most common first name, but is not associated with any of the top 20 last names. Francois, the most common last name paired with Jean, is not even among the 100 most common last names.
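The counting behind both the bar graphs and the heat map can be sketched in a few lines. The driver list here is a made-up handful of rows and the function names are my own; names are matched exactly, so alternate spellings stay separate, as in the graphs.

```python
from collections import Counter


def name_counts(drivers):
    """drivers: iterable of (first name, last name) tuples.
    Returns Counters for first names, last names, and full pairings."""
    drivers = list(drivers)
    firsts = Counter(first for first, _ in drivers)
    lasts = Counter(last for _, last in drivers)
    pairs = Counter(drivers)
    return firsts, lasts, pairs


def pairing_grid(firsts, lasts, pairs, top_n=20):
    """Heat-map matrix: rows are the top first names, columns the top last names."""
    top_first = [name for name, _ in firsts.most_common(top_n)]
    top_last = [name for name, _ in lasts.most_common(top_n)]
    grid = [[pairs[(f, l)] for l in top_last] for f in top_first]
    return top_first, top_last, grid


# Hypothetical rows, not taken from the actual dataset.
drivers = [
    ("Balwinder", "Singh"),
    ("Gurpreet", "Singh"),
    ("Harjit", "Singh"),
    ("Jean", "Francois"),
    ("Jean", "Francois"),
    ("Jean", "Pierre"),
]

firsts, lasts, pairs = name_counts(drivers)
print(pairing_grid(firsts, lasts, pairs, top_n=1))
```

Even in this toy list, the most common first name and the most common last name never appear together, which is the same pattern the Singh and Jean examples show in the real heat map.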
Data source: https://data.cityofnewyork.us/Transportation/Medallion-Drivers-Active/jb3k-j3gp
For most people living in the US, Social Security benefits will come through the Old-Age, Survivors, and Disability Insurance (OASDI) program. The beneficiaries can be retired workers (plus their spouses and children); widows, widowers, parents, or children of the deceased; or disabled workers (plus their spouses and children).
This set of maps shows some county-level statistics related to OASDI payments. The top left map identifies the percentage of a county’s population that receives OASDI benefits. The top right map reveals which group tends to be the beneficiary; retired people are the largest group to receive benefits, but as stated in the previous paragraph, there are others who are also supported by the program. The white region in the Southeastern US is due to the large population of disabled workers receiving OASDI payments in those counties. The bottom maps are concerned with the amount of money paid out. On the left is the average monthly payment across all beneficiaries. On the right, I identified the difference in payment for men and women (subtracting the average women’s payment from the men’s). This metric only considers seniors (ages 65+), but includes people receiving OASDI benefits for any reason. There were no counties where women had a higher average payment than men.
Data sources: http://www.ssa.gov/policy/docs/statcomps/oasdi_sc/index.html
http://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml (for county populations)
2015 is just a day away. A lot of folks make resolutions…and they search for them on Google. I’ve selected six popular health-focused resolutions. Each of them is more commonly searched for in the first week of the year than any other time. That’s not to say the resolution isn’t kept though; people are just less likely to look up the terms or phrases on Google. The data are for US searches only, and the time series are averages of 11 years of data (2004-2014). Good luck with your resolutions!
Data source: http://www.google.com/trends/
The recent series of high-profile killings of black persons by police officers (e.g., Tamir Rice, Eric Garner, Michael Brown, Ezell Ford, and John Crawford III) is, to say the least, saddening and troubling. Without getting too deep into the issues at hand, I thought I’d just show the data for what they are. This graphic depicts the relationship between the police presence in a city (officers per 1k population) and the black and/or African American representation (% of city population). The data include 100 of the largest US cities – all have at least 200k people. The strong correlation, though not surprising, is certainly disappointing. Maps are provided for viewing the geographic distribution of each variable independently.
Data sources:
http://factfinder2.census.gov/ (ACS 2013, 1-yr, B02001)
The job title of analyst can mean a lot of things depending on the company and city; as positions go, it’s among the most vague. That said, it has also become remarkably common among recent college graduates. If I had a nickel for every time I asked someone what s/he did and the response was, “Oh, I’m an analyst,” I’d have a few bucks.
So what does an analyst make these days? I’ve graphed the average analyst (and senior analyst) salaries in 16 major US cities, as well as the national average. I also adjusted the salaries for cost of living in the graph on the right. For New York, where C2ER separates cost of living for Manhattan, Brooklyn, and Queens, I used a population-weighted average. Note that, while analysts tend to make slightly more in NY and SF, after adjusting for cost of living, these are actually two of the least affordable cities to live in on an analyst’s salary.
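Here is a small sketch of the two calculations involved: a population-weighted average of several cost-of-living indices, and a salary scaled by that index. It assumes an index where 100 represents the national average, and every number below is invented for illustration rather than taken from C2ER or Glassdoor.

```python
def weighted_average(values, weights):
    """Average of values, weighted by weights (e.g. borough populations)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def adjust_for_cost_of_living(salary, col_index, baseline=100.0):
    """Scale a salary to national-average purchasing power.
    Assumes col_index uses baseline (100) as the national average."""
    return salary * baseline / col_index


# Hypothetical cost-of-living indices and populations (millions) for three areas.
indices = [220, 170, 150]
populations = [2, 3, 3]

city_index = weighted_average(indices, populations)
print(city_index, adjust_for_cost_of_living(70000, city_index))
```

A city with an index well above 100 can turn a higher nominal salary into a lower adjusted one, which is the effect described for NY and SF.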
Data sources:
http://www.glassdoor.com/Salaries/
In November, Oregonians approved Measure 91, which will legalize recreational marijuana use in the state for people aged 21 and older. Here, I’ve compared support for the initiative to the current distribution of medical marijuana users.
The map of registered patients only includes Oregon residents. Gilliam, Sherman, and Wheeler Counties are combined when reporting patient registrants to protect confidentiality, so a single percentage is used for all three.
While there is a general geographic trend, with the more populated western counties having higher percentages on both maps, a linear regression of the values shows a weak correlation (R^2 = 0.067). The three highest and lowest percentages are labeled on each map. Morrow County is the only one to be an extremum on both maps (bottom three).
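For reference, the R^2 of a simple linear regression (one predictor plus an intercept) equals the squared Pearson correlation, so it can be computed without fitting the line explicitly. A minimal sketch, with made-up numbers rather than the county data:

```python
def r_squared(xs, ys):
    """R^2 of a simple linear regression of ys on xs
    (equal to the squared Pearson correlation)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return sxy ** 2 / (sxx * syy)


# Hypothetical values: percent support vs. percent registered patients.
print(r_squared([1, 2, 3, 4], [1, 3, 2, 4]))
```

A value like 0.067 means the line explains under 7% of the variance in one variable from the other, so the visual west-versus-east pattern does not translate into a tight county-by-county relationship.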
Data sources: