Data Sharing: Access, Language, and Context All Matter

Investments in data – and open data – have grown during the 15 years of the MDG era. These data are collected for various audiences and purposes: project monitoring, censuses, transparency and accountability, and so on. Often, data collected for one purpose may be equally valuable in other contexts. But, due to low awareness or restricted access, these data are too rarely shared. Since the World Bank launched its open data initiative, many of these barriers have begun to recede as open data moves from niche toward default status. However, there is more progress to be made, and a few key barriers remain to be targeted.

To begin: in order for data to be effectively shared, it needs to be accessible, meaning available for use and reasonably easy to find. Once accessed, good datasets help tell stories, and stories require both language and context. In this instance, language is the format in which a dataset is available (CSV, XML, JSON – or too often, PDF). Context includes many things; but, to over-simplify, I am referring to both metadata (field definitions, time period covered, etc.) and collection methods used (survey methodology, inclusion/exclusion criteria, calculations, etc.).

As we look to data sharing and re-use as a core driver of achieving the SDGs, here are a few thoughts on where the data community can progress:

  1. Access: So many needles, so many haystacks. I am often asked to track down datasets by various colleagues and partners, because I “know where to find things.” This is a symptom of a global data system that i) is fragmented and disorganized, ii) has data portals with poor user experience design that intimidate and turn away users, and iii) too often requires emailing the right person and calling in favors for data that are not open and free. As Carl Cullinane of LSE said, “There’s a big difference between open and accessible data.” To bridge this gap, we need better ways of curating data as a community, better human-centered portal design, and more time spent thinking about how and where users of all levels of expertise will access information. Once data are accessed, clear and open licenses are needed to authorize their use.

  2. Language: Humans read data too! JSON and XML are great formats for data to be interoperable and shareable between systems. APIs fuel innovation by reducing transaction costs in establishing new portals and visuals that re-use existing data sources. But these formats are largely incomprehensible to 99% of potential data users. Civil society and government staff, who often have excellent analytical skills, are perhaps more excluded by data that are only available in JSON or XML than by data locked up in PDFs. So keep your JSON and XML – they are critical to fueling innovation in the civic tech space – but make sure CSVs are available as well. Additionally, the actual language of the data matters – publishing data in English for Francophone or Spanish-speaking countries simply won’t do.

  3. Context: The meaning of x. As datasets proliferate, it is critical to know which data can be used together, which fields are comparable, and which data are appropriate for your particular use case. Metadata have often been left behind by the open data movement – the irony being that they are typically the easiest data to provide. Data are neither shareable nor useful without clearly outlined metadata and documentation of collection methods, because those gaps limit the ways in which the data can be responsibly used. Similarly, responsible data use demands proper attention to the metadata and ReadMe files that are frequently ignored. I have often cringed reading incorrect conclusions drawn in studies or papers due to improper use or incomplete understanding of what the data mean, and of which data are present or missing. Both creators and users of data need to do better in documenting and acknowledging the rules of the road for each dataset to enable wider sharing and re-use.
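
To make points 2 and 3 a little more concrete, here is a minimal sketch in Python of what publishing a CSV alongside a JSON feed, with a basic data dictionary, might look like. It assumes the JSON is a flat list of records; the function names, field names, and sample values are all hypothetical and purely illustrative, not taken from any real portal or dataset.

```python
import csv
import io
import json


def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text.

    Assumption: every record is a flat object (no nested lists or dicts).
    """
    records = json.loads(json_text)

    # Collect column names in first-seen order so no field is silently dropped.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)  # missing values are written as empty cells
    return buffer.getvalue()


def build_data_dictionary(fieldnames, definitions):
    """Pair each column with its definition, flagging undocumented fields."""
    return {name: definitions.get(name, "UNDOCUMENTED") for name in fieldnames}


# Made-up sample records, for illustration only.
sample_json = json.dumps([
    {"country": "Country A", "year": 2014, "value": 10},
    {"country": "Country B", "year": 2014},
])

csv_text = json_records_to_csv(sample_json)
print(csv_text)

# Hypothetical definitions; "value" is deliberately left undocumented.
dictionary = build_data_dictionary(
    ["country", "year", "value"],
    {"country": "Country name", "year": "Calendar year of collection"},
)
print(dictionary)
```

The flag for undocumented fields is the part I would stress: it makes missing context visible to whoever publishes the file, rather than leaving users to guess what a column means.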

Over the next 15 years of the SDG era, we look forward to working in partnership with the many groups attempting to address these challenges to make data more comparable, shareable, and useful. To name a few, DG has recently announced its formal participation in the Joined-Up Data Alliance and the Global Partnership for Sustainable Development Data. Stay tuned for frequent updates on what we’re learning together, and what we as a community are able to accomplish.

Image from the Open Knowledge Foundation under CC license.