For our second story, I’ve been wanting to work with data from the Adoption and Foster Care Analysis and Reporting System (AFCARS), a federally mandated data collection system that provides information on all children involved in foster care or adoption cases. The National Data Archive on Child Abuse and Neglect distributes two data files for each fiscal year; one file contains adoption data and the other foster care data. Each adoption data file contains 37 elements that provide information on the adopted child’s gender, race, birth date, ethnicity and prior relationship with the adoptive parents.
Looks like a rich source of stories and trends, right?
The data is free, but obtaining it entails a lengthy ordering process with the National Data Archive on Child Abuse and Neglect. Earlier today I checked in with someone there on the status of my request, and got this message:
On Monday Nov. 5th, our funding agency within the U.S. Department of Health and Human Services directed the Archive to suspend all NCANDS (child-level) and AFCARS data shipments. No further details are available.
The Archive is working with the Children’s Bureau on this matter in order to resume shipments as soon as possible, but no time frame for release has been provided. We apologize for the inconvenience, and ask you to bear with us.
This seems like an opportune moment for a FOIA request. What other suggestions do you have for dealing with data gatekeepers who say “No”?
I have been collecting some tools for data visualizations. Click on the pictures to go to the collections.
With Veterans Day less than a month away, our team (Lothar Krause, Ezra Eeman and Lindsey McCormack) will examine the distribution of homeless veterans across the United States.
Under the Obama Administration, the Department of Veterans Affairs (VA) has declared the goal of ending veteran homelessness by 2015. However, with over 65,000 homeless veterans in 2011 and an increasing number of young homeless veterans from the Iraq and Afghanistan conflicts, this goal remains elusive.
Veterans have long been overrepresented among the homeless. Nationwide they make up about 10 percent of the general population, but almost 14 percent of the homeless adult population. Five states (California, New York, Florida, Texas and Georgia) account for more than half of the country’s homeless veterans.
However, when you look at homeless veterans as a percentage of the total homeless population of each state, a different picture emerges. From this point of view, rural prairie states have the most urgent problem. In 2010, one out of every three homeless people in Kansas was a veteran; that share fell to 19 percent in 2011, but the state still leads the country in the percentage of veterans among its homeless population.
Other states in which the concentration of homeless veterans exceeds the national average include Montana, North Dakota, South Dakota, Wyoming and Alaska.
We want to find out why this is so. Is it simply because these prairie states send a higher proportion of people to the military? Is there a link with high rates of unemployment in these states, or the difficulty of accessing services in rural areas?
One caveat: the numbers we are working with are gathered by the US Department of Housing and Urban Development (HUD), which coordinates an annual point-in-time survey of the nation’s homeless population. In 2011 HUD changed its methodology for counting homeless veterans. Before 2011 the point-in-time estimate counted only veterans living in homeless shelters; since then, HUD has mandated that states also collect data on “unsheltered” veterans (i.e. those living on the streets), as well as those living in VA residential programs.
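The state-by-state comparison described above comes down to simple arithmetic: divide each state's homeless veterans by its total homeless count, then compare that share with the national share. Here is a minimal sketch in Python. The state names and counts are made-up placeholders, not HUD or VA figures, and the function names are our own; the real point-in-time counts would be swapped in. Given the methodology change, shares from 2010 and 2011 should be computed separately rather than pooled.

```python
# Illustrative placeholder counts only; these are NOT real HUD/VA numbers.
counts = {
    "State A": {"homeless_total": 1000, "homeless_veterans": 190},
    "State B": {"homeless_total": 20000, "homeless_veterans": 2400},
    "State C": {"homeless_total": 500, "homeless_veterans": 60},
}


def veteran_share(row):
    """Share of a state's homeless population who are veterans."""
    return row["homeless_veterans"] / row["homeless_total"]


def national_share(counts):
    """Veteran share across all states combined."""
    total = sum(row["homeless_total"] for row in counts.values())
    veterans = sum(row["homeless_veterans"] for row in counts.values())
    return veterans / total


def states_above_national(counts):
    """States whose veteran share exceeds the combined share, highest first."""
    benchmark = national_share(counts)
    shares = {state: veteran_share(row) for state, row in counts.items()}
    above = [state for state, share in shares.items() if share > benchmark]
    return sorted(above, key=lambda state: shares[state], reverse=True)


if __name__ == "__main__":
    for state in states_above_national(counts):
        print(f"{state}: {veteran_share(counts[state]):.0%} of homeless are veterans")
```

Note that a large state can dominate the raw totals (as State B does here) while a small state has the higher share, which is the contrast between the two ways of looking at the numbers.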
Our data sources include:
For homeless vets: the VA’s 2010 and 2011 Point-In-Time Census of Homeless Veterans
For overall veteran population: the VA’s VetPop2007 data set, based on a 2007 census with population estimates up to the year 2036.
Dennis Culhane, Director of Research for the National Center on Homelessness among Veterans at the United States Department of Veterans Affairs
Paul Rieckhoff, Chief Executive Officer and Founder, Iraq and Afghanistan Veterans of America
Shelley MacDermid Wadsworth, Director of the Military Family Research Institute (MFRI) at Purdue University
Kevin R. Convey
Data-Driven Interactive Journalism
Russell Chun/Amanda Hickman
24 Oct. 2012
Redesigning “Classify This”:
The Atlantic’s “Classify This” visualization looks as if it were designed to look cool rather than to make its data intelligible.
Using two different means of charting pages classified and pages declassified (line vs. bar) makes it nearly impossible to compare the two over time, which one presumes is the point of the graphic. Rendering this in three dimensions only intensifies the headache.
My first redesign is simple: render both kinds of pages in adjacent bars, black for classified, white for declassified. Background panels can be used to designate the sitting president during the time period. This presents the information clearly and without frills.
Slightly less successful but still interesting is the idea of removing the three-dimensional aspect but retaining the use of a line for classified pages. Although not as accurate or immediately clear as number one, it does have the virtue of presenting classification as a trend rather than a point in time.
Another interesting way to look at the data is number three, a horizon chart that allows the reader to see immediately, starkly and symbolically the ratio of pages classified to declassified. It isn’t without problems, though, since some may have trouble seeing what is below the axis as anything but a negative number, which of course it is not. Still, it is dramatic and conveys all of the original information, save the trend line, in a much clearer way.
Classify This 1
Classify This 2
Classify This 3
Besides Timeline.js, which Ezra used to create a great presidential debate timeline, there are other timeline tools as well. Some offer an easy-to-use interface with no coding:
Others offer more configuration and require some coding:
If you can’t tell the difference between the mean and the median with 100% assurance, Robert Niles is your friend. His “Statistics Every Writer Should Know” is just that. There are plenty of resources to bring your math and statistics up to speed, though:
+ Really basic newsroom math (from ASU Cronkite Professor Steve Doig)
+ Statistics Hell is bizarre and unnerving, but includes tons of handouts and lessons on statistical methods. If animated gifs of flame engulfed brains aren’t your thing, maybe look elsewhere.
+ R is a statistical computation language. Take a look at its Documentation and Contributed Documentation.
+ Open courses at Carnegie Mellon
+ Probabilistic Graphical Models (at Stanford)
+ Head First Statistics is supposed to be a good way to get started. There’s a Data Analysis book in the series, too.
+ Windows users might like R through Excel.
+ Think Stats (html, bound) and Numbers in the Newsroom (which IRE sells) are both supposed to be very good.
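For a quick feel for the mean/median distinction mentioned above, here is a small sketch using Python's standard `statistics` module. The salary figures are invented for illustration only; the point is that a single extreme value drags the mean far from the typical case while the median stays put.

```python
import statistics

# Hypothetical salaries: four modest ones and one very large outlier.
salaries = [30000, 32000, 35000, 38000, 1000000]

# Mean: add everything up and divide by the number of values.
mean_salary = statistics.mean(salaries)

# Median: the middle value once the list is sorted.
median_salary = statistics.median(salaries)

if __name__ == "__main__":
    print("mean:", mean_salary)
    print("median:", median_salary)
```

Here the mean is 227,000 while the median is 35,000, so "the average salary" would badly misdescribe what most people in this made-up group earn.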