FlashHacks: Unlocking 10 Million Data Points in 10 Days
We spoke with Hera Hussain, Community Manager at OpenCorporates, about their decision to launch their new crowd-scraping programme at OKFestival, their #FlashHacks campaign to scrape 10 million data points in 10 days, and the overall collaborative spirit of the open knowledge movement. Here is what we found!
Technology and the people using it to effect change are at the heart of the open knowledge movement, and at OKFestival over 60 developers took time out of their busy schedules to sit in the scorching Berlin sun and help OpenCorporates and three NGOs liberate over 10 million data points.
OpenCorporates aims to fulfil a simple mission – have an open URL for every company in the world. Unfortunately, at the moment, most data about companies is not open; it’s locked away in PDFs and on government and company websites, making it nearly impossible to link it back to the company itself. While governments – the owners of a significant amount of corporate data – have embraced open data and have even expressed their willingness to open up company registries, for example, gaining access to this data in open formats has been slow. While we wait, citizens are deprived of essential information needed to understand the complicated corporate web. OpenCorporates is not waiting, and for the past few years their tiny team has been scraping government and company websites to populate an open database of corporate information.
Fulfilling their mission requires more people than their tiny seven-person team can provide. Over the years, people have continually reached out to OpenCorporates to find out how they could contribute to its growing database of corporate data, but there was no straightforward way for the community to do so. The team decided to harness the energy and enthusiasm generated at OKFestival, and to mobilise the abundance of civic-minded developers lurking around the Kulturbrauerei (the festival venue, a massive former brewery complex), in order to launch their new crowd-scraping platform, Missions. To kick things off, they set an ambitious challenge: liberate 10 million data points in only 10 days by empowering the community to write scrapers, computer programmes that extract data from human-readable content. According to Hera Hussain, Community Manager at OpenCorporates and the mastermind behind the initiative:
Mission Accomplished! During OKFestival alone, 6 million data points were added to the database! Those, combined with the 6 million added in the 7 days leading up to the festival, meant that the open data community didn’t just meet the challenge, they demolished it! The idea to run the #FlashHacks campaign came together months before, at another open event, Transparency Camp, organised by the Sunlight Foundation. OpenCorporates decided they’d team up with three NGOs at OKFestival (Open Knowledge Germany, Code for Africa and Sunlight Foundation) in order to scrape information about German companies, African licence data and US political finance data. Partnering with these three organisations, which need access to this data to do their work, made the whole campaign more tangible for participants, who were able to see exactly why they were spending hours in the heat writing bots.
While opening up data was a clear goal of the #FlashHacks campaign, OpenCorporates also had another mission: to celebrate and teach the art and science of writing scrapers, an essential tool of the open data and open knowledge movement. This goal – perhaps more important in the long run – was also an astonishing success! Experienced developers were paired with other developers who had never written scrapers before, and non-developers came along to help find datasets to be scraped! Hera described the amazing energy and the enthusiasm to teach, stating
She enjoyed being able to directly experience the excitement displayed by developers who had never written scrapers before as they wrote their first scraper and realised the contribution they could make to an NGO with just a few hours of their time.
When asked why she and the OpenCorporates team chose to launch this new platform and the #FlashHacks campaign at OKFestival, Hera said “There isn’t a better place to do this than OKFest, everybody who is anybody in the open data movement was there; there was an incredible buzz about it!” Despite a jam-packed programme to compete with, people turned up in droves to contribute anywhere from a few hours to a few days of their time to opening up data. OKFestival demonstrated that collaboration, sharing and participation are more than values preached by the open movement; they are the driving forces that make it possible to scrape 10 million data points in 10 days. We are awaiting our next mission.