Monday, December 19, 2011

A breakthrough in data visualization, what it means for data journalism, predicting the news



Earlier this month, the National Science Foundation announced a new system to help researchers make sense of stores of scientific papers, and potentially find the “next big thing.”

The Action Science Explorer, or ASE, developed jointly by University of Michigan and University of Maryland faculty, takes a difficult cognitive task -- backtracking through paper citations to identify a breakthrough -- and “offloads” it to the much easier task of perceiving density in network visualizations. In other words, it takes mounds of difficult-to-digest research and uses social network analysis techniques and graphing to make the patterns in it immediately recognizable.

The ASE visually represents papers and concepts as they appear over time, identifies the moment where fields branched out and flourished, and also finds moments where other research became obsolete or lost. It also identifies emerging fields of study:

“Users can quickly appreciate the strength of relationships between groups of papers and see bridging papers that bring together established fields. Even more potent for those studying emerging fields is the capacity to explore an evolutionary visualization using a temporal slider. Temporal visualizations can show the appearance of an initial paper, the gradual increase in papers that cite it, and sometimes the explosion of activity for ‘hot’ topics. Other temporal phenomena are the bridging of communities, fracturing of research topics, and sometimes the demise of a hypotheses.”
(from the ASE tech report)
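
The "temporal slider" idea in that passage can be illustrated with a few lines of Python. This is only a rough sketch of the concept, not the ASE's actual code: the citation records, the paper names and the `citations_up_to` helper are all made up for illustration.

```python
from collections import Counter

# Hypothetical citation records: (citing_paper, cited_paper, year_of_citation).
CITATIONS = [
    ("B", "A", 2001),
    ("C", "A", 2002),
    ("D", "A", 2003),
    ("D", "C", 2003),
    ("E", "A", 2003),
]

def citations_up_to(citations, year):
    """Count how many times each paper has been cited through `year`."""
    return Counter(cited for _, cited, y in citations if y <= year)

# Moving the "slider" one year at a time shows paper A gathering citations.
for year in (2001, 2002, 2003):
    print(year, dict(citations_up_to(CITATIONS, year)))
```

A paper whose count climbs quickly from one year to the next is the kind of "hot" topic the report describes; a real tool would draw this as a network rather than print counts.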

Here’s how it works:



The ASE researchers say this software has potential in the fields of linguistics, biology and sociology, writing, “Both students and educators must have access to accurate surveys of previous work, ranging from short summaries to in-depth historical notes. Government decision-makers must learn about different scientific fields to determine funding priorities.”

But suppose data journalists used similar tools to analyze legislation over time, and to forecast future bills and political alliances. Clusters would indicate where certain provisions failed, where lobbyists and special interests had influenced legislation the most, and possibly how those interests would proceed in the future. Instead of conducting reactive reporting, or relying on too-late intelligence that lets legislation slip through unnoticed, reporters could use the system to help guide questions and investigations.

In September, computer scientist Kalev Leetaru here on the University of Illinois campus did something just as remarkable. He compiled more than 100 million media reports, text-mined and crunched them in a supercomputer, and was able to chart and even predict the instability in Libya and Egypt.

Impressively, Leetaru was also able to use those news reports to estimate the location of al-Qaeda leader Osama bin Laden to within 200 km. From BBC News, which reported on Leetaru’s research:
The computer event analysis model appears to give forewarning of major events, based on deteriorating sentiment.
However, in the case of this study, its analysis is applied to things that have already happened.
According to Kalev Leetaru, such a system could easily be adapted to work in real time, giving an element of foresight.
"That's the next stage," said Mr Leetaru, who is already working on developing the technology.
"It looks like a stock ticker in many regards and you know what direction it has been heading the last few minutes and you want to know where it is heading in the next few."
“Predictive reporting” or “news forecasting” could prove invaluable to digital newsrooms, where seconds mean the difference between breaking the news and just being one of the reporting mob. And if news agencies work on integrating advances in computer and information science into the office, instead of just reporting on them, it could enhance reporting across the entire organization.

Wednesday, December 14, 2011

Founding a Professional Society of Drone Journalists


It’s been quite a month for drones. After Iranian armed forces captured one of the coveted American RQ-170 stealth drones, the very same type of stealth drone that pierced Pakistani airspace to spy on Osama bin Laden, Wired’s Spencer Ackerman released previously unpublished photos of the carnage that U.S. military drones unleashed in Waziristan.

Later, the Los Angeles Times wrote about how U.S. Customs and Border Protection lent a Predator B drone to North Dakota law enforcement. Sheriffs in Nelson County, N.D., fearing a search for missing cattle would end with a deadly firefight with a “sovereign citizen” group, spied on the group and arrested members after the drone revealed they were unarmed. The report went on to reveal that local law enforcement had used Predators stationed at the Grand Forks Air Force Base for at least two dozen surveillance flights since June, and the FBI and DEA have used Predators in their own investigations.

Salon’s Glenn Greenwald warned of the expansion of domestic drones, and the sizable lobbying power of drone contractors in Congress, writing “the escalating addition of drones — weaponized or even just surveillance — to the vast arsenal of domestic weapons that already exist is a serious, consequential development. The fact that it has happened with almost no debate and no real legal authorization is itself highly significant.”

Meanwhile, the Washington Post dedicated its December 4 front page to the Israeli military’s use of drones in Gaza. But one Post reporter asked the question that journalists like me have been wondering about for some time: What’s the potential use for drones in journalism?

Melissa Bell’s piece, “Drone journalism? The idea could fly in the U.S.,” cites my post to a drone journalism Google group, where I wrote that drone technology could help journalists “to take water or air samples or to scan for topographical data to make assessments about industrial impact on the environment.”

Bell also mentioned Matt Waite, a journalism professor at the University of Nebraska-Lincoln and developer of the Pulitzer Prize-winning PolitiFact, who just began the world’s first drone journalism lab. Waite unveiled his plan for the lab at a News Foo conference, where the immediate reaction was skepticism.

“News Foo had a number of tech people very interested in and sensitive to privacy issues and they were quite wary,” Waite told data journalist Ben Welsh. “They immediately went to TMZ+Lindsay Lohan as an example of how drones could be misused.

“So when I started thinking about this idea, I immediately thought that people would rightfully be wary of this and that the sooner we started talking about ethics and laws, the sooner we could have answers for criticisms and guidelines to balance the public’s right to know and people’s expectations of privacy.”

I was unaware of Waite’s announcement, or his drone journalism lab, until the WaPo story. But given the most spectacular breach of journalism ethics in recent history (the News of the World/NewsCorp phone hacking scandal), it was not lost on me how important it would be to establish a code of ethics for drone journalists. The code of ethics would be deliberated and drawn up by experts in the field, similar to the way the Society of Professional Journalists developed and supported its code of ethics.

To that end, I purchased Dronejournalism.org as the future home of the Professional Society of Drone Journalists (PSDJ). At the time of this post, the website is dominated by a placard that displays the mission statement of the PSDJ: “Dedicated to developing the ethical, educational and technological framework for the emerging field of drone journalism.”

I also called Waite to bounce around ideas for the first professional organization for drone journalists. One of his ideas was that the organization pursue a code of ethics via wiki-style collaboration, but that the collaboration should only involve experts and practitioners of drone journalism. He, too, realized the need for an organization to help pin down a concrete ethical framework for journalists.

“This is really cool on one side, really creepy on the other,” Waite said in the conversation. “I think you are being dishonest if you are on the cool side, not thinking there’s something creepy about [drone journalism]. There’s a significant opportunity for mayhem and privacy violations.”

On the other hand, he said, “I think you are missing the point if you don’t see the amazing things you can do with the technology.”

For example, Waite pointed out that Russian citizen journalists had employed an SLR-equipped drone to obtain aerial shots of a recent protest. The Daily Beast, one of the first news organizations to use a drone, surveyed tornado damage in Joplin, Missouri, and flood damage in Natchez, Mississippi and Minot, North Dakota.


Video captured by a citizen journalist during protests in Poland.

Waite said one of the first things he’s going to try to do with his first drones is attempt to violate his own privacy. And, of course, if the drone does violate his privacy, that would make a first case study for developing an ethical framework for drone journalism. “I could stand on a public sidewalk and see if I can’t get a drone high enough to get into my backyard with my kids with a sign that says ‘you’re violating my privacy,’” he said.

But there are two other components to the PSDJ besides ethics: education and technology. We need to teach journalists how to use the equipment safely and effectively, and we need to keep journalists at the forefront of civil drone technology.

Waite used a $1,000 grant from the company he founded to purchase an off-the-shelf drone, the AR Drone quadcopter by Parrot, to be equipped later with a GoPro HD video recorder. Out of the box, the AR Drone provides a relatively stable platform for shooting video, and is controllable by iPhone or Android smartphone. Steve Doig, a Pulitzer Prize-winning data journalist who teaches at ASU, is also experimenting with the AR Drone platform.

"You can get it at Brookstone in the mall," Waite said. "It's got an API and you can hack it. It's made of stock parts. You can control it from your smartphone. And it's cheap."

A Parrot AR Drone in flight.

The next step for me will likely be purchasing the same drone and outfitting it in the same fashion. Not too much later, I hope to be able to develop some Arduino-based, fixed-wing aircraft to shoot photos along a predetermined path, and stitch those photos together later. But Waite and I know this is just a starting point; an inexpensive, yet effective demonstration of the concept. From there, it’s experimentation and learning.

“What I would love to do, once we have these platforms, is let’s cover some news,” Waite said.  “A house fire in your city. Spring floods. There will be tornadoes, it’s as predictable as the sun coming up. Let’s cover them and write about our experiences and through those.”

Wednesday, November 30, 2011

Being a More Versatile Journalist: Data Journalism Veteran Steve Doig Wants Journalists to Know Statistics

Aerial photograph of the devastation from Hurricane Andrew in 1992. Steve Doig, who was a reporter for the Miami Herald at the time, used his data journalism chops to survey the damage and write a Pulitzer Prize-winning exposé on construction malpractice. Earlier this year, I asked him what aspiring data journalists should be learning.

I cringe when bloggers begin a post by apologizing to readers for a lack of updates. This is partly because most people do, or should, understand that the gig doesn’t pay. But mostly, every word you waste on explaining your absence is one more chance for a reader to lose interest and go somewhere else. So I’ll just say it’s been an eventful couple of months, and tell you why it’s actually relevant to this blog.

Having just finished a master’s in journalism at the University of Illinois, I was extremely lucky to find work on a National Science Foundation grant that trains better K-12 science teachers.

At the grant, we do this by teaching lessons in entrepreneurial leadership to science teachers. That translates into experiences like students constructing their own spectrophotometers, or high school students manufacturing their own biofuel, or even collaborations where high school students set up demonstrations on electricity for grade school students to work through.

It’s a radical but practical approach that aims to improve the nation’s competitiveness in science teaching. In January, results from the National Assessment of Educational Progress report card on teaching showed that 47 percent of all high school seniors in the country are deficient in the sciences.

Why would an NSF grant want a journalist? For one, I understood their language. Being a former undergraduate student of mechanical engineering, I had taken chemistry, physics, calculus, and statistics courses. Secondly, they wanted someone experienced in the ways of conducting interviews (i.e., collecting data) and translating the information into an easily digestible form (i.e., not only help write reports for the NSF but also write for public dissemination).

That was all they were looking for initially, until I mentioned I had worked with NodeXL, a template that turns Microsoft Excel into a tool for analyzing social networks. I was introduced to the program by Brant Houston, in his investigative reporting class at the university. The Excel plug-in comes in handy during an investigation when you need to do things like plot the flow of money or political influence within organizations or among groups of people. As it turns out, the grant was conducting a first-of-its-kind analysis of teaching networks and needed someone with my expertise.

The moral of this story could be that if you develop skills beyond traditional journalism in undergraduate or graduate school, it’s easier to parlay your skills into a new career when the journalism job market tanks. But the fact is I’m still practicing journalism, albeit during my off-hours.

I recently submitted an investigation of a local church with more than $100,000 in tax liens to CU-CitizenAccess.org, a Knight Foundation-funded community news website. The investigation required digging up and looking through nonprofit tax records, federal tax liens, city ordinances, and even credit union call reports. The investigation stemmed from a legal notice I stumbled upon in the aforementioned investigative journalism class.

Rather, this is why a journalist should learn data journalism: to become a more versatile investigator.

When I was teaching introductory journalism classes to freshman and sophomore university students, I wanted them to know exactly why it’s useful to have computer and data journalism skills. So I put together a presentation on data journalism for a lecture of about 100 students, and asked data journalism veteran Steve Doig, who is currently the Knight Chair at the Walter Cronkite School of Journalism, for a few bits of advice.

Monday, September 12, 2011

What improved word clouds reveal in Obama, Bernanke jobs and economy speeches


The above is a word cloud using President Obama’s Sept. 8 address to Congress. As is customary with word clouds, the more times a word occurs in a text, the larger the font size in the cloud. Even if you weren’t aware of the nature of the speech, it’s obvious from the cloud that Obama’s address to Congress dealt with “jobs” in “America.”
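
For readers curious about the mechanics, here is a minimal Python sketch of that count-to-font-size rule. It is not the code behind the cloud above; the stop-word list, the point sizes, the helper names and the sample sentence are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real tools use much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "we"}

def word_counts(text):
    """Lowercase the text, split it into words, and count the non-stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def font_sizes(counts, min_pt=10, max_pt=48):
    """Scale each word's count linearly between min_pt and max_pt."""
    top = max(counts.values())
    low = min(counts.values())
    span = (top - low) or 1  # avoid dividing by zero when all counts match
    return {w: min_pt + (c - low) * (max_pt - min_pt) / span
            for w, c in counts.items()}

# Made-up sample sentence, not a quote from the speech.
sample = "Jobs for America. Jobs and jobs, and the future of America."
sizes = font_sizes(word_counts(sample))
print(sizes)
```

The most frequent word gets the largest size and the least frequent gets the smallest, which is all a basic word cloud encodes.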

But word clouds have limits. Seth Duncan, analytics director for the digital public relations firm WCG, wrote on the bynd.com blog in 2010 that the simplicity of the word cloud could contribute to a decline of reading comprehension. In his post, “Word Clouds and the Cognitive Decline of PR and Marketing,” Duncan wrote that he strongly believed “that the word cloud is the biggest enemy of deep reading and lowest form of artificial intelligence in marketing and PR.”

“You can read the content very quickly (because they don’t contain much information) and they have a unique look. I also think that word clouds can provide useful information for SEM or SEO planning. But people are fooling themselves if they think that a word cloud offers a satisfactory summary of hundreds or thousands of pages of text,” he wrote.

NYU political science PhD student Drew Conway has a related but different beef with word clouds. Conway looked at a word cloud, essentially a plot of words in three dimensions (x, y, and font size), and saw a missed opportunity. “They are meant to summarize a single statistics—word frequency—yet they use a two dimensional space to express that,” he wrote.

His solution came from his background in statistics, which often involves comparing two sets of data. For his improved word cloud, he compared two speeches by political figures and used the x-axis to describe the similarity between the two speeches. To accomplish this, he used the free, open-source statistical programming environment R, which has data-mining and graphics-plotting features, along with some custom coding.
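
Conway worked in R, and I have not reproduced his code here. The Python sketch below only illustrates the general idea under my own assumptions: each word shared by the two speakers gets an x position equal to the difference in how often each speaker used it (as a share of that speaker's words), and a size based on the combined share. The function name and the counts are hypothetical.

```python
from collections import Counter

def compare_speeches(counts_a, counts_b):
    """For each word used by both speakers, return (x_position, size)."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    points = {}
    for word in counts_a.keys() & counts_b.keys():
        share_a = counts_a[word] / total_a
        share_b = counts_b[word] / total_b
        # x > 0: speaker A leans on the word more; x < 0: speaker B does.
        points[word] = (share_a - share_b, share_a + share_b)
    return points

# Made-up counts for illustration; not taken from either speech.
speaker_a = Counter({"jobs": 6, "tax": 3, "growth": 1})
speaker_b = Counter({"jobs": 2, "growth": 6, "inflation": 2})
points = compare_speeches(speaker_a, speaker_b)
for word, (x, size) in sorted(points.items()):
    print(f"{word}: x={x:+.2f}, size={size:.2f}")
```

Words near x = 0 are used about equally by both speakers, so the horizontal axis now carries information instead of being arbitrary.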

But what to compare the Obama jobs speech to? That same day, bankers and business executives at the Economic Club of Minnesota waited eagerly to hear Fed Chair Ben Bernanke outline what the Fed would do to alleviate economic concerns.

Obama and Bernanke were speaking to two very different audiences, and had different objectives. Obama was speaking to a Congress hell-bent on being re-elected and an anxious, under-employed American public. Meanwhile, Bernanke was speaking to titans of industry and banking. These differences shouldn’t be an excuse not to compare the two speeches; rather, both speakers are components of the administration weighing in on essentially the same issue.

Differences in their speeches could signal a difference in opinion and discord about an appropriate response, while similarities could point to ideas with a measure of political support. If nothing else, it’s worth looking at how two high-ranking officials in an administration tailor speeches on economic issues to two different audiences.

Here’s what those two speeches look like in Conway’s “better word cloud.” Click to see the plot in a higher resolution.

Friday, September 9, 2011

Hopelessness and Hope in Pilsen - BATTLE IN THE BARRIO part 4/4


An anti-Fisk poster hung by activists in a Pilsen Thrift store.
“And every morning was a requiem
or the feast day of a martyr -
the priest in black or red,
cortege of traffic, headlights
funneling through incense
under viaducts. While my surplice
settled around me like smoke
my father rode the blue spark
of a streetcar to the foundry
where, in the dark mornings,
the cracks of carbonized windows
flowed with the blood of stained glass.”


- Excerpt from “Autobiography,” a poem by Stuart Dybek, a Pilsen native and a 2007 recipient of the MacArthur “genius grant.”
NOTE: The following is the last in a series of four stories about the environmental and health impact of coal-fired power plants on densely populated, low-income Chicago communities. It's called "Battle in the Barrio: the Struggle in Chicago's Pilsen Neighborhood Against Pollution." The series is a journalistic project that culminated in a master's thesis for the University of Illinois at Urbana-Champaign.

Part One: Four Sisters, One Rare Disorder
Part Two: Old Problems, New Attention

Part Three: The People VS the Bottom Line

Part Four: Hopelessness and Hope in Pilsen

Visualization - Is there injustice in Pilsen?
Visualization - Chicago's Pilsen neighborhood struggles with pollution
South-side children have greatest exposure to lead in Chicago, health department data shows

If you have the time, Maria Torres has stories.

Since she became a community organizer a decade ago, helping gather signatures for petitions and lately rallying support for the Clean Power Ordinance, she’s collected quite a few.

Mostly, they involve people who’ve suddenly come down with asthma, respiratory illnesses, rare forms of cancer, lupus and other medical abnormalities.

“I have a family that lives right in front of the Perez school,” she said. “Her son was just diagnosed with asthma, and has to use an inhaler. And he’s real little. You feel for them, because they tell you how hard it is for her son to use the inhaler. It’s really hard for him because he’s a little kid and he doesn’t know how to. He just developed it, and didn’t have it before. I feel for them, I really feel for them. And it scares me.”

In addition to the verb “scares,” as in, “it scares me,” and “freaks,” as in “it freaks me out,” she frequently uses the adjectives “spooky” and “weird” to describe the magnitude of health problems she’s heard of while knocking on doors as a community organizer in Pilsen.
There’s the story she heard about an 80-year-old woman, who lives on Morgan between 18th and 19th streets, not far from the Fisk plant, and got a routine X-ray for breathing problems.

The doctors asked the woman’s daughter, who took her mother in to be examined, if the mother was a regular smoker.

“She’s never smoked a day in her life,” Torres said. “But her lungs were all black.”

Thursday, September 8, 2011

South-side children have greatest exposure to lead in Chicago, health department data shows


This interactive heat map, compiled using Chicago Department of Public Health data, GIS files, and Google Fusion Tables, shows where in Chicago children have the highest rates of elevated blood lead levels. Data is from 2010.


Chicago Department of Public Health data shows that children in the poorer, industrialized south of Chicago are more likely to have dangerous levels of lead in their bodies than children in more affluent neighborhoods.

The data, obtained from the health department through a FOIA request, shows the levels of lead the agency found in children 17 and under in the city of Chicago. Most children tested for lead, however, were under 6 years old.

“An EBL or elevated blood lead level, is defined… as the child’s highest venous test with a result of 6 or more micrograms lead (Pb) per deciliter blood,” the health department wrote.

According to the EPA, there is no safe level for lead in the human bloodstream. At 10 micrograms per deciliter of blood, children can develop symptoms such as “lowered intelligence, reading and learning disabilities, impaired hearing, reduced attention span, hyperactivity, and antisocial behavior.”

The most recent results are from 2010, but the file contains annual results back to 2005. They were compiled with the help of an epidemiologist in the department.

“Multiple blood lead tests were determined using an algorithm that matches children by name, date of birth and sex, while allowing for common typographical and data entry (eg, reversing first and last name) errors for blood lead tests conducted within a calendar year,” the health department wrote.
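
The department did not share its matching code, so the following Python is only a guess at the simplest version of the idea: treat two tests as belonging to the same child when name, date of birth and sex agree, even if the first and last names were entered in reverse. The field names and records are hypothetical, and this sketch ignores other typographical errors, the test type and the calendar-year restriction.

```python
def match_key(first, last, dob, sex):
    """Build a key that is identical when first and last name are swapped."""
    names = sorted(n.strip().lower() for n in (first, last))
    return (names[0], names[1], dob, sex.strip().upper())

def highest_test_per_child(tests):
    """Keep each child's highest result from a list of test records."""
    best = {}
    for t in tests:
        key = match_key(t["first"], t["last"], t["dob"], t["sex"])
        best[key] = max(best.get(key, 0.0), t["ug_dl"])
    return best

# Hypothetical records: the second row has first and last name reversed.
tests = [
    {"first": "Ana", "last": "Diaz", "dob": "2006-03-01", "sex": "F", "ug_dl": 4.0},
    {"first": "Diaz", "last": "Ana", "dob": "2006-03-01", "sex": "f", "ug_dl": 7.5},
    {"first": "Luis", "last": "Ortiz", "dob": "2007-11-20", "sex": "M", "ug_dl": 2.0},
]
best = highest_test_per_child(tests)
print(len(best), "children matched from", len(tests), "tests")
```

Sorting the two name fields before building the key is what lets the reversed entry collapse into the same child.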

The interactive heat map at the top of the post shows the rate at which children tested in each of Chicago’s 77 communities had elevated levels of lead.

The Englewood community has the highest EBL rate: 9.15 percent of the children tested for lead there had an EBL. Neighborhoods in the north end of Chicago had EBL rates between 0.8 percent and 3.31 percent.
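
For anyone who wants to check rates like these against raw test data, the arithmetic is simple. The Python below is a sketch that assumes you already have one highest result per tested child in a community; the sample values are invented and do not come from the health department's file.

```python
EBL_THRESHOLD = 6.0  # micrograms of lead per deciliter, per the department's definition

def ebl_rate(highest_results):
    """Percent of tested children whose highest result meets the threshold."""
    if not highest_results:
        return 0.0
    elevated = sum(1 for r in highest_results if r >= EBL_THRESHOLD)
    return 100.0 * elevated / len(highest_results)

# Hypothetical highest results for the children tested in one community.
sample = [2.0, 3.5, 6.0, 1.0, 8.2, 4.4, 2.9, 5.9, 3.0, 1.5]
print(f"{ebl_rate(sample):.2f} percent")
```

Note that the denominator is children tested, not all children living in the community, which matches how the rates above are described.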

Wednesday, September 7, 2011

Visualization - is there injustice in Pilsen?


This visualization was produced as part of a series about Pilsen, a Chicago neighborhood, and its struggle against pollution. Parts one, two and three of that series have been published on MentalMunition.com.