Wednesday, November 30, 2011

Being a More Versatile Journalist: Data Journalism Veteran Steve Doig Wants Journalists to Know Statistics

Aerial photograph of the devastation from Hurricane Andrew in 1992. Steve Doig, who was a reporter for the Miami Herald at the time, used his data journalism chops to survey the damage and write a Pulitzer Prize-winning exposé on construction malpractice. Earlier this year, I asked him what aspiring data journalists should be learning.

I cringe when bloggers begin a post by apologizing to readers for a lack of updates. This is partly because most people do, or should, understand that the gig doesn’t pay. But mostly, every word you waste on explaining your absence is one more chance for a reader to lose interest and go somewhere else. So I’ll just say it’s been an eventful couple of months, and tell you why it’s actually relevant to this blog.

Having just finished a master’s in journalism at the University of Illinois, I was extremely lucky to find work with a National Science Foundation grant that is training better K-12 science teachers.

At the grant, we do this by teaching lessons in entrepreneurial leadership to science teachers. That translates into experiences like students constructing their own spectrophotometers, or high school students manufacturing their own biofuel, or even collaborations where high school students set up demonstrations on electricity for grade school students to work through.

It’s a radical but practical approach that hopes to improve the nation’s competitiveness in science teaching. In January, results from the National Assessment of Educational Progress report card on teaching showed that 47 percent of all high school seniors in the country are deficient in the sciences.

Why would an NSF grant want a journalist? For one, I understood their language. Being a former undergraduate student of mechanical engineering, I had taken chemistry, physics, calculus, and statistics courses. Secondly, they wanted someone experienced in the ways of conducting interviews (i.e., collecting data) and translating the information into an easily digestible form (i.e., not only help write reports for the NSF but also write for public dissemination).

That was all they were looking for initially, until I mentioned I had worked with NodeXL, a template that turns Microsoft Excel into a tool for analyzing social networks. I was introduced to the program by Brant Houston, in his investigative reporting class at the university. The Excel plug-in comes in handy during an investigation when you need to do things like plot the flow of money or political influence within organizations or among groups of people. As it turns out, the grant was conducting a first-of-its-kind analysis of teaching networks and needed someone with my expertise.

The moral of this story could be that if you develop skills beyond traditional journalism in undergraduate or graduate school, it’s easier to parlay those skills into a new career when the journalism job market tanks. But the fact is I’m still practicing journalism, albeit during my off-hours.

I recently submitted an investigation of a local church with more than $100,000 in tax liens to CU-CitizenAccess.org, a Knight Foundation-funded community news website. The investigation required digging up and looking through nonprofit tax records, federal tax liens, city ordinances, and even credit union call reports. The investigation stemmed from a legal notice I stumbled upon in the aforementioned investigative journalism class.

Rather, this is why a journalist should learn data journalism: to become a more versatile investigator.

When I was teaching introductory journalism classes to freshman and sophomore university students, I wanted them to know exactly why it’s useful to have computer and data journalism skills. So I put together a presentation on data journalism for a lecture of about 100 students, and asked data journalism veteran Steve Doig, who is currently the Knight Chair at the Walter Cronkite School of Journalism, for a few bits of advice.

Monday, September 12, 2011

What improved word clouds reveal in Obama, Bernanke jobs and economy speeches


The above is a word cloud using President Obama’s Sept. 8 address to Congress. As is customary with word clouds, the more times a word occurs in a text, the larger the font size in the cloud. Even if you weren’t aware of the nature of the speech, it’s obvious from the cloud that Obama’s address to Congress dealt with “jobs” in “America.”
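For the curious, the counting behind a word cloud is simple enough to sketch in a few lines of Python. This is only an illustration of the general idea, not the code behind the cloud above: the stop-word list, the function names, the sample sentence, and the 10-to-72-point size range are all assumptions made up for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real tools use longer ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "that", "we", "it"}

def word_counts(text):
    """Count how often each non-stop word appears, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def font_sizes(counts, min_pt=10, max_pt=72):
    """Scale counts linearly so the rarest word gets min_pt and the most common gets max_pt."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when every word is equally common
    return {w: min_pt + (c - lo) * (max_pt - min_pt) / span for w, c in counts.items()}

# Made-up sample text, not a quotation from the speech.
sample = "Jobs for America. Pass this jobs bill, and America gets jobs."
sizes = font_sizes(word_counts(sample))
for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{word}: {size:.0f}pt")
```

A real word-cloud tool also has to decide where each word goes on the page, which is the harder part and is left out here.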

But word clouds have limits. Seth Duncan, analytics director for the digital public relations firm WCG, wrote on the bynd.com blog in 2010 that the simplicity of the word cloud could contribute to a decline of reading comprehension. In his post, “Word Clouds and the Cognitive Decline of PR and Marketing,” Duncan wrote that he strongly believed “that the word cloud is the biggest enemy of deep reading and lowest form of artificial intelligence in marketing and PR.”

“You can read the content very quickly (because they don’t contain much information) and they have a unique look. I also think that word clouds can provide useful information for SEM or SEO planning. But people are fooling themselves if they think that a word cloud offers a satisfactory summary of hundreds or thousands of pages of text,” he wrote.

NYU political science PhD student Drew Conway has a related but different beef with word clouds. Conway looked at a word cloud, essentially a plot of words in three dimensions (x, y, and font size), and saw a missed opportunity. “They are meant to summarize a single statistics—word frequency—yet they use a two dimensional space to express that,” he wrote.

His solution came from his background in statistics, which oftentimes compares two sets of data. For his improved word cloud, he compared two speeches by political figures and used the x-axis to describe the similarity between two speeches. To accomplish this, he used the free, open-source statistical programming environment R, which has data-mining and graphics-plotting features, along with some custom coding.
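To make the idea concrete, here is a rough Python sketch of that kind of comparison. It is not Conway's R code, and the details are my assumptions: I keep only the words both texts share, put the difference in each word's relative frequency on the x-axis, and use the average frequency as a stand-in for font size. The function names and the two toy "speeches" are invented for the example.

```python
import re
from collections import Counter

def relative_freq(text):
    """Return each word's share of all the words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()}

def compare(text_a, text_b):
    """Build one row per shared word: x is how much more speaker B uses it than speaker A."""
    freq_a, freq_b = relative_freq(text_a), relative_freq(text_b)
    shared = freq_a.keys() & freq_b.keys()
    rows = [
        {
            "word": w,
            "x": freq_b[w] - freq_a[w],         # negative leans toward A, positive toward B
            "size": (freq_a[w] + freq_b[w]) / 2  # assumed stand-in for font size
        }
        for w in shared
    ]
    return sorted(rows, key=lambda r: r["x"])

# Toy inputs, not real speech text.
speech_a = "jobs jobs jobs economy"
speech_b = "economy economy growth jobs"
for row in compare(speech_a, speech_b):
    print(row["word"], round(row["x"], 3), round(row["size"], 3))
```

Words that land near zero on the x-axis are used about equally by both speakers, while words at either edge belong mostly to one of them. Turning the rows into an actual plot would take a graphics library and some care to keep labels from overlapping.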

But what to compare the Obama jobs speech to? That same day, bankers and business executives at the Economic Club of Minnesota waited eagerly to hear Fed Chair Ben Bernanke outline what the Fed would do to alleviate economic concerns.

Obama and Bernanke were speaking to two very different audiences, and had different objectives. Obama was speaking to a Congress hell-bent on being re-elected and an anxious, under-employed American public. Meanwhile, Bernanke was speaking to titans of industry and banking. These differences shouldn’t be an excuse not to compare the two speeches; rather, both speakers are components of the administration weighing in on essentially the same issue.

Differences in their speeches could signal a difference in opinion and discord about an appropriate response, while similarities could point to ideas with a measure of political support. If nothing else, it’s worth looking at how two high-ranking officials in an administration tailor speeches on economic issues to two different audiences.

Here’s what those two speeches look like in Conway’s “better word cloud.” Click to see the plot in a higher resolution.

Friday, September 9, 2011

Hopelessness and Hope in Pilsen - BATTLE IN THE BARRIO part 4/4


An anti-Fisk poster hung by activists in a Pilsen thrift store.
“And every morning was a requiem
or the feast day of a martyr -
the priest in black or red,
cortege of traffic, headlights
funneling through incense
under viaducts. While my surplice
settled around me like smoke
my father rode the blue spark
of a streetcar to the foundry
where, in the dark mornings,
the cracks of carbonized windows
flowed with the blood of stained glass.”


- Excerpt from “Autobiography,” a poem by Stuart Dybek, a Pilsen native and a 2007 recipient of the MacArthur “genius grant.”
NOTE: The following is the last in a series of four stories about the environmental and health impact of coal-fired power plants on densely populated, low-income Chicago communities. It's called "Battle in the Barrio: the Struggle in Chicago's Pilsen Neighborhood Against Pollution." The series is a journalistic project that culminated in a master's thesis for the University of Illinois at Urbana-Champaign.

Part One: Four Sisters, One Rare Disorder
Part Two: Old Problems, New Attention

Part Three: The People VS the Bottom Line

Part Four: Hopelessness and Hope in Pilsen

Visualization - Is there injustice in Pilsen?
Visualization - Chicago's Pilsen neighborhood struggles with pollution
South-side children have greatest exposure to lead in Chicago, health department data shows

If you have the time, Maria Torres has stories.

Since she became a community organizer a decade ago, helping gather signatures for petitions and lately rallying support for the Clean Power Ordinance, she’s collected quite a few.

Mostly, they involve people who’ve suddenly come down with asthma, respiratory illnesses, rare forms of cancer, lupus and other medical abnormalities.

“I have a family that lives right in front of the Perez school,” she said. “Her son was just diagnosed with asthma, and has to use an inhaler. And he’s real little. You feel for them, because they tell you how hard it is for her son to use the inhaler. It’s really hard for him because he’s a little kid and he doesn’t know how to. He just developed it, and didn’t have it before. I feel for them, I really feel for them. And it scares me.”

In addition to the verb “scares,” as in, “it scares me,” and “freaks,” as in “it freaks me out,” she frequently uses the adjectives “spooky” and “weird” to describe the magnitude of health problems she’s heard of while knocking on doors as a community organizer in Pilsen.

There’s the story she heard about an 80-year-old woman, who lives on Morgan between 18th and 19th streets, not far from the Fisk plant, and got a routine X-ray for breathing problems.

The doctors asked the woman’s daughter, who took her mother in to be examined, if the mother was a regular smoker.

“She’s never smoked a day in her life,” Torres said. “But her lungs were all black.”