Today is Open Data Day. Here in DC, the hacking crowd is meeting at the World Bank to open up data and work on some really interesting, mostly DC-focused projects, as can be seen on this Tumblr. (I could only follow the action through Twitter, as I was late in registering. But I am glad to see these kinds of events being popular, even on a grey Saturday.)
One thing surprised me, though. Among the hacking crowd that Open Data Day invited, developers, designers, librarians, statisticians, and interested citizens were mentioned. But I think one group is missing from the equation: journalists.
And yet, this group of professionals has the most to gain from open data and from APIs that give access to data that was previously closed or difficult to evaluate.
At the 15th International Anti-Corruption Conference last year in Brasilia, Giannina Segnini, who leads the investigative unit of the Costa Rican newspaper La Nación, proposed the necessary evolution of a modern journalist.
She used the analogy of the investigative journalist as a topo (literally a mole, even though a tracking dog might be a better fit).
Starting off as a library mole, the journalist went into the archives and dug tunnels among folders, piles of papers, birth registries, account information, etc. This, of course, was not enough to get the complete picture. The journalist then had to leave the archives to put the information into context, talk to people, and thus become a topo callejero, a street mole. For the last century, this is how investigative journalism has worked.
However, in the 21st century, with the world becoming increasingly interconnected across countries and archives becoming digital and growing endlessly, just sifting through files and folders and talking to people on the streets is not enough anymore.
Now, if you want to uncover new stories, you have to dive into the sea of digital data. You have to become a hacker, a topo hacker. Or, to express it differently, a data journalist.
The investigative unit Giannina set up is impressive. It includes developers, journalists, and designers working together to investigate issues. Data is scraped from websites 24/7 to create databases of public information, stored on La Nación’s servers, that can then be used for further investigations. More detail on the process here and in this graphic, The data machine:
In Giannina’s words:
“The good thing about this is that if you combine those two worlds [journalism and hacking], the outcome is very powerful. Not only can you actually prove facts, but also … you are not relying on a source.”
This research has already led to some impressive investigations, uncovering corruption and embezzlement.
There are similar initiatives in the region, such as Argentina’s La Nación. Here’s a great post on how to build such a data journalism team.
Other examples are innovative reporting initiatives like InfoAmazonia, a platform run by Knight Fellow Gustavo Faleiros that gathers news and maps from the Amazon Rainforest. At the IACC Hackathon, he added a new layer to discover the relationship between cattle ranching and deforestation.
Data journalism initiatives have grown increasingly popular over the last year or two. The Guardian Data Blog has definitely led the pack when it comes to visualising data and raising awareness of the potential stories hidden amongst masses of data. In Germany, I like the work of the data team of Die Zeit. And there is even a Data Journalism Award.
This kind of journalism is the future in a world dominated by data. And while a lot seems to be happening, there is still a lot more to be done to make sure more journalists can bring light into the dark corners of corrupt businesses, political intrigue and crime with the help of data.
The tools: Scraping, Visualizing, Security
So, there is still an initial barrier to overcome before feeling comfortable finding, accessing and interpreting data, but more and more initiatives are geared to exactly this. Tactical Tech is preparing a School of Data, and the Kickstarter initiative For Journalism aims to develop online courses on programming and visualizing data. Paul Bradshaw is writing a book on how to use scraping in journalism. And the wiki ScraperWiki is quite accessible, I believe.
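To give a sense of what scraping involves, here is a minimal sketch in Python that uses only the standard library. It is not the pipeline of La Nación or of any of the projects mentioned here. The sample page, the `contracts` table and its columns, and the `TableScraper` and `scrape_to_db` names are all made up for illustration. The idea is simply to pull the rows out of an HTML table and store them in a database that can be queried later.

```python
import sqlite3
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Hypothetical helper: collects the text of each table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read, if any
        self._cell = None   # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_to_db(html, conn):
    """Parse one page and store its data rows; returns the number stored."""
    parser = TableScraper()
    parser.feed(html)
    _header, *records = parser.rows  # the first row is assumed to be the header
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contracts (supplier TEXT, amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO contracts VALUES (?, ?)",
        [(supplier, int(amount)) for supplier, amount in records],
    )
    conn.commit()
    return len(records)


# Invented sample data standing in for a page of public records.
SAMPLE_PAGE = """
<table>
  <tr><th>Supplier</th><th>Amount</th></tr>
  <tr><td>Acme Ltd</td><td>1200</td></tr>
  <tr><td>Globex</td><td>800</td></tr>
  <tr><td>Acme Ltd</td><td>300</td></tr>
</table>
"""

conn = sqlite3.connect(":memory:")
scrape_to_db(SAMPLE_PAGE, conn)
totals = conn.execute(
    "SELECT supplier, SUM(amount) FROM contracts "
    "GROUP BY supplier ORDER BY supplier"
).fetchall()
print(totals)
```

In a real setting the HTML would be fetched from a website (for example with `urllib.request.urlopen`) on a schedule, and the database would live in a file rather than in memory. The interesting part for a journalist is the last query: once the records are in a database, questions such as "who received the most money?" become one line.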
Now, it would be great if citizen activists, developers and journalists created a new agora to hold those in power accountable, to make sure that the data that matters is put to use to make politics more transparent, and to develop better policies.
Next year, the Open Data Day needs to happen with journalists.
Illustration by Tactical Tech