Wednesday, November 30, 2011

Being a More Versatile Journalist: Data Journalism Veteran Steve Doig Wants Journalists to Know Statistics

Aerial photograph of the devastation from Hurricane Andrew in 1992. Steve Doig, who was a reporter for the Miami Herald at the time, used his data journalism chops to survey the damage and write a Pulitzer Prize-winning exposé on construction malpractice. Earlier this year, I asked him what aspiring data journalists should be learning.

I cringe when bloggers begin a post by apologizing to readers for a lack of updates. This is partly because most people do, or should, understand that the gig doesn’t pay. But mostly, every word you waste on explaining your absence is one more chance for a reader to lose interest and go somewhere else. So I’ll just say it’s been an eventful couple of months, and tell you why it’s actually relevant to this blog.

Having just finished a master’s in journalism at the University of Illinois, I was extremely lucky to find a position with a National Science Foundation grant that trains better K-12 science teachers.

At the grant, we do this by teaching lessons in entrepreneurial leadership to science teachers. That translates into experiences like students constructing their own spectrophotometers, or high school students manufacturing their own biofuel, or even collaborations where high school students set up demonstrations on electricity for grade school students to work through.

It’s a radical but practical approach that aims to improve the nation’s competitiveness in the sciences. In January, results from the National Assessment of Educational Progress report card showed that 47 percent of the country’s high school seniors are deficient in the sciences.

Why would an NSF grant want a journalist? For one, I understood their language. As a former mechanical engineering undergraduate, I had taken chemistry, physics, calculus, and statistics courses. Second, they wanted someone experienced in conducting interviews (i.e., collecting data) and translating the information into an easily digestible form (i.e., not only helping write reports for the NSF but also writing for public dissemination).

That was all they were looking for initially, until I mentioned I had worked with NodeXL, a template that turns Microsoft Excel into a tool for analyzing social networks. I was introduced to the program by Brant Houston in his investigative reporting class at the university. The Excel plug-in comes in handy during an investigation when you need to do things like plot the flow of money or political influence within organizations or among groups of people. As it turns out, the grant was conducting a first-of-its-kind analysis of teaching networks and needed someone with my expertise.
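To give a flavor of what that kind of network analysis involves, here is a minimal Python sketch of the same basic idea NodeXL packages inside Excel: counting how connected each player in a money-flow network is. The organizations and dollar amounts below are invented for illustration; a real investigation would load these edges from records such as campaign finance filings.

```python
from collections import Counter

# Hypothetical money-flow edges (donor, recipient, dollars).
# The names are made up for this example.
transfers = [
    ("PAC_A", "Candidate_X", 5000),
    ("PAC_A", "Candidate_Y", 2500),
    ("Firm_B", "PAC_A", 10000),
    ("Firm_B", "Candidate_X", 1000),
]

# Count connections per entity -- the degree measure network tools chart --
# to spot the best-connected players in the network.
degree = Counter()
for donor, recipient, _amount in transfers:
    degree[donor] += 1
    degree[recipient] += 1

for name, links in degree.most_common():
    print(name, links)
```

Even this toy version surfaces the useful fact: the entity that both receives and distributes money sits at the center of the network, which is exactly the kind of lead a reporter would chase.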

The moral of this story could be that if you develop skills beyond traditional journalism in undergraduate/graduate school, it’s easier to parlay your skills into a new career when the journalism jobs market tanks. But the fact is I’m still practicing journalism, albeit during my off-hours.

I recently submitted an investigation of a local church with more than $100,000 in tax liens to a Knight Foundation-funded community news website. The investigation required digging up and combing through nonprofit tax records, federal tax liens, city ordinances, and even credit union call reports. It stemmed from a legal notice I stumbled upon in the aforementioned investigative reporting class.

Rather, this is why a journalist should learn data journalism: to become a more versatile investigator.

When I was teaching introductory journalism classes to freshman and sophomore university students, I wanted them to know exactly why it’s useful to have computer and data journalism skills. So I put together a presentation on data journalism for a lecture of about 100 students, and asked data journalism veteran Steve Doig, who is currently the Knight Chair at the Walter Cronkite School of Journalism, for a few bits of advice.

More on Doig later. But first, it was important to set the scene for the students.

It was 1992. Cell phones were bulky and cumbersome. Starbucks had recently gone public, but hadn’t yet begun its rapid expansion or executed its plan for world domination. The standard internet connection came in at a blistering 14.4 kbps – that is, if you had one at all.

The internet looked nothing like it does today. But it was changing rapidly, and in profound ways. The first visual browser, Mosaic, was just being developed – only a few blocks away here on the U of I campus, I pointed out to the students – at the National Center for Supercomputing Applications.

So, 1992 was a good year for the internet. But it was a horrible year for Florida, which suffered one of the worst natural disasters in recent history.

Radar image of Andrew.
At Category 5, Hurricane Andrew was the strongest class of hurricane, with winds gusting up to 177 mph. That made it damaging enough to be the costliest hurricane on record – until Hurricane Katrina. Andrew caused tens of billions of dollars in property damage. More than 90,000 homes were destroyed.

Tens of thousands of people were left homeless, and the National Guard set up tents and distributed MREs, or “Meals Ready to Eat.”

Steve Doig, who was a reporter for the Miami Herald at the time, had the roof of his home blown off. But his was a fairly new home, and he figured it should have weathered Andrew better. This made him wonder whether contractors were building houses as sturdily as they should have been.

As reporters do during natural disasters, the Miami Herald staff immediately began covering the aftermath and cleanup efforts. But some Herald reporters also started mulling over Doig’s question about house construction.

Much like other parts of the country, Dade County, Florida, was experiencing both urban sprawl and a housing boom. Developers were buying up large, cheap tracts of land outside Miami and filling them with new houses. From just a cursory drive through these subdivisions, Doig got a sense that the newest suburbs, with the newest houses, hadn’t fared as well as older ones.

If the newest houses didn’t fare as well, didn’t that mean something was wrong with the newer building codes? Were the codes being relaxed so that developers could make a quick buck? Were homeowners being taken advantage of?

Aerial photos of damage following hurricane Andrew.

These were questions Doig and others at the Herald wondered about, but couldn’t immediately answer with the resources at hand. Reporters could ask homeowners. They could ask building inspectors. But what the Herald really needed were cold, hard facts – facts about the damage to the houses, the ages of the homes in the area, and the intensity of the storm.

Herald reporters learned that the American Red Cross was studying the damage in the area, so they asked for a copy of its information. But the information wasn’t usable: it was mostly a collection of notes, nothing neat and convenient like a spreadsheet whose entries reporters could easily search, index, and compare. So the Red Cross data was out.

Doig was able to find a database built by Dade County. It covered only about 8,000 of the damaged homes – roughly 10 percent of all the damaged property – so it wasn’t a huge sample. But each record included a property identification number and whether that property was still habitable, and that proved immensely useful.

With the property identification number, Doig could do all sorts of things. He could look that number up in the county assessment records – local governments keep tabs on every house’s assessed value for property tax purposes – and learn how much the property was worth, where it was built, and (very importantly) when it was built.
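That step is, at heart, a database join on the shared property ID. Here is a minimal Python sketch of the idea, with made-up IDs and records standing in for the county’s damage survey and the assessor’s tax rolls:

```python
# Hypothetical post-storm survey: property ID -> still habitable?
damage_db = {
    "1001": False,
    "1002": True,
    "1003": False,
}

# Hypothetical assessor rolls: property ID -> value and year built.
assessor_db = {
    "1001": {"value": 120000, "year_built": 1988},
    "1002": {"value": 95000,  "year_built": 1975},
    "1003": {"value": 150000, "year_built": 1990},
}

# Joining on the shared ID links each home's fate to its age and value.
joined = []
for pid, habitable in damage_db.items():
    record = assessor_db.get(pid)
    if record:
        joined.append({"id": pid, "habitable": habitable, **record})

for row in joined:
    print(row)
```

The payoff of the join is a single table where “did this house survive?” sits next to “when was it built?” – which is the comparison the whole investigation turned on.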

Now Herald reporters knew when the houses were built, and how many of them survived and how many didn’t. But they still had to find out how big a role the hurricane’s intensity played in the damage. To do that, they needed what meteorologists call a wind contour map, which estimates the hurricane’s wind speed at different spots on the ground.

The wind contour map, as published by the Miami Herald.

Doig kept collecting information as it became available: the damage to each house, the winds it experienced, and when it was constructed. And when he overlaid these three sets of information on a map, he saw a visual pattern. The newer a house was, the more likely it was to fail under the hurricane.

Here’s what that information looks like in a different visual configuration -- a bar chart. It’s clearer in this visualization that newer homes fared worse in high winds, and that they also performed poorly at lower wind speeds.
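The numbers behind a chart like that come from a simple cross-tabulation. Here is a sketch in Python, using invented homes (year built, wind gust, destroyed or not), that tallies destruction rates by construction era:

```python
# Made-up records: (year built, peak gust in mph, destroyed?).
homes = [
    (1972, 120, False), (1975, 140, False), (1978, 160, True),
    (1985, 120, False), (1988, 140, True),  (1990, 160, True),
    (1991, 120, True),  (1991, 140, True),  (1992, 160, True),
]

def era(year):
    """Bucket construction years into two eras for comparison."""
    return "pre-1980" if year < 1980 else "1980s-90s"

# Tally (total homes, homes destroyed) per construction era.
tally = {}
for year, wind, destroyed in homes:
    total, failed = tally.get(era(year), (0, 0))
    tally[era(year)] = (total + 1, failed + destroyed)

for label, (total, failed) in tally.items():
    print(f"{label}: {failed}/{total} destroyed ({100 * failed // total}%)")
```

With these invented numbers, the older homes fail far less often than the newer ones – the same shape of finding the Herald’s real chart showed, from vastly more data.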

Doig and the Herald had the data. They had the proof. But was that where they stopped? No, that was just the start. Now they could begin asking questions.

They could say with certainty that the newer homes weren’t built as sturdily. That gave them ammunition to confront the builders with the question: “Your newer homes weren’t safe enough. How come?” It gave them an enormous advantage in the investigation, and it helped them uncover something pretty startling.

Remember Doig’s roof? As it turned out, builders had been using staples to hold the roofs of Miami houses together to save money. And so many houses were being built that building inspectors were overwhelmed with work, inspecting about four times as many homes as they were supposed to. It was gross negligence on a massive scale.

Doig worked with the databases, asked the questions, did the investigation and published a 16-page report called “What Went Wrong.” For their efforts, he and the Herald were awarded a Pulitzer Prize in 1993. And none of it would have been possible without the know-how to use the data that was right in front of them.

Andrew was a long time ago, but Doig’s Pulitzer is a historical marker for when data journalism began in earnest. Yet many mainstream media operations – and, unfortunately, many journalism schools – are still largely clueless about data journalism. This is despite all the computer networking and personal computing power at our disposal, and despite the fact that any enterprising individual can find lessons on computer programming and statistical analysis immediately, and for free.

It’s a conundrum. There’s a clear need for journalists who understand computers and aren’t afraid of numbers, and the job market for them is growing – yet journalists with these skills sometimes find themselves underutilized in an industry that hasn’t caught up with them. Because of this institutional lack of interest, to succeed at data journalism a journalist has to be proactive, self-teaching, and constantly augmenting his or her skill set. So several months ago I asked Doig if he had any advice for aspiring data journalists.

“Take a basic statistics class,” began Doig’s email. “That was something I failed to do while I was in college, and I've spent my career slowly and painfully teaching myself how to do things I could have quickly learned in a semester in college. But try to take that class in the sociology or political science department, because the examples they will use will resonate better for journalists than those in a stats class taught by the math department.”

Doig had taken a calculus class in his freshman year at Dartmouth in 1966 and earned only a “so-so” grade. But his math professor had helped co-author the BASIC programming language, and so Doig got his first brush with computer programming at the time.

It wasn’t until 15 years later that the personal computer revolution whittled the size and price of computers down to something a reporter could afford. While working as a reporter in Tallahassee in 1981, Doig purchased an Atari 800 and started putting his BASIC skills to work.

“So I got the Atari and began learning to make it do fun things,” he wrote. “But I also began to realize I could make it do work that would help me in my job covering state government in Tallahassee.”

“One thing I recall is writing a program in BASIC that would take a legislative rollcall vote and parse it out by various political demographics beyond simple Dems vs. GOPs: rural vs. urban, upstate vs downstate, race, gender, leadership vs. rank-and-file, etc. It also wrote out the ‘how they voted’ agate type that had always been a pain to type in as a sidebar to our stories.”
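A modern sketch of the kind of rollcall breakdown Doig describes might look like the following Python. The legislators, their demographics, and the votes are all invented for illustration; a real version would read them from a legislative data feed.

```python
# Hypothetical legislator attributes beyond simple party affiliation.
legislators = {
    "Smith": {"party": "Dem", "district": "urban"},
    "Jones": {"party": "GOP", "district": "rural"},
    "Lee":   {"party": "Dem", "district": "rural"},
    "Clark": {"party": "GOP", "district": "urban"},
}

# One rollcall vote: legislator -> "yea" or "nay".
rollcall = {"Smith": "yea", "Jones": "nay", "Lee": "yea", "Clark": "yea"}

def breakdown(key):
    """Cross-tabulate the vote by any demographic column, not just party."""
    counts = {}
    for name, vote in rollcall.items():
        group = legislators[name][key]
        counts.setdefault(group, {"yea": 0, "nay": 0})[vote] += 1
    return counts

print(breakdown("party"))
print(breakdown("district"))
```

The point Doig makes is exactly what the `breakdown` function captures: once the vote is in a machine-readable form, slicing it by rural vs. urban or any other attribute is one line of work instead of an afternoon of hand-tallying.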

As an aside, it’s worth noting that Doig put his experience with voting statistics to work in 2000, during the much-contested Florida recount in the race between Al Gore and George W. Bush. He concluded that had Florida conducted an error-free count of the votes in that presidential election, Al Gore would have won the state, and therefore the presidency.

“The real importance of the Atari, though, was how it opened up to me the possibilities of analyzing data for stories, data that would be too tedious to analyze by hand,” Doig wrote.

“All my later success, including my role in the Hurricane Andrew Pulitzer, is thanks to the running start I got by playing around with that Atari 800.”