Posts

The Three Phases of Open Data Quality Control

By Dennis D. McDonald, Ph.D., dmcdonald@balefireglobal.com

Introduction

In my previous post about open data quality the suggested solutions relate not just to adhering to standards but also to making sure that the processes by which open data are published and maintained are efficiently and effectively managed. In this post I drill down a bit more on that point about the management processes.

Three Phases

When discussing open data it helps to look at open data projects with tasks divided into at least three related phases:

1. Assessment and planning
2. Data preparation and publishing
3. Ongoing maintenance and support

Different tools and processes are relevant to each phase. Each can have an impact on the quality of the data as well as its perceived quality.

Phase 1. Assessment and planning

Critical to data quality at this first phase of an open data project is an understanding of the "who, where, how, how much, and why" of the data. If the goals of the project include making data f...

How Important Is Open Data Quality?

By Dennis D. McDonald, Ph.D. Email: dmcdonald@balefireglobal.com

At risk?

Martin Doyle's Is Open Data at Risk from Poor Data Quality is a thoughtful piece but doesn’t address this question: Should data quality standards observed in open data programs be any different from the data quality standards observed in any other programs that produce or consume data?

My first response is to answer with a definite “No!” but I think the question is worth discussing.

Data quality is a complex issue that people have been wrestling with for a long time. I remember way back in graduate school doing a class project on measuring “error rates” in how metadata were assigned to technical documentation that originated from multiple sources. Just defining what we meant by “error” was an intellectually challenging exercise that introduced me to the complexities of defining quality as well as the impacts quality variations can have on information system cost and performance. Reading Doyle’s article rem...

ODI DATA CERTIFICATES ARE A BIG DEAL

This morning something happened that will gradually impact the way we interact with data. This morning OpenDataSoft (ODS) embedded the Open Data Institute's Data Set Certificates into each and every data set page. Will other open data platforms follow? Maybe.

Embedding certificates is something I have been advocating since the idea was just an idea at the Open Data Institute (ODI). No one until ODS ever took me up on my offer.

These data certificates show a willingness on the part of the data steward to consider the following:

The impact on individual privacy
API and format documentation to ensure a greater chance of data re-use
Metadata on where the data originates and how often it is refreshed
RDF description tags and identifiers that allow f...

How Cost Impacts Open Data Program Planning - and Vice Versa

By Dennis D. McDonald, Ph.D. dmcdonald@balefireglobal.com

Introduction

How important are costs when you are planning an open data program? Are they, as suggested by Rebecca Merrett in Addressing cost and privacy issues with open data in government, the "… elephant in the room," especially when data anonymization costs are being considered? Or are such costs just a normal consideration when planning any project where quantities of different types of data have to be manipulated and delivered?

It's difficult to make a generalization about this. Open data program costs can fall along at least three general dimensions:

1. Controlled versus uncontrolled
2. Known versus unknown
3. Startup versus ongoing

1. Controlled versus uncontrolled

Why worry about costs you can’t control? Because they can impact your program whether you control them or not. Examples of uncontrolled costs might be:

Taxes, licensing, insurance, and registration fees.
Staff salaries that can...

Three Things about Open Data Programs That Make Them Special

By Dennis D. McDonald, Ph.D., Balefire Global, dmcdonald@balefireglobal.com

During the brainstorming session at the inaugural meeting of the Open Data Enthusiasts meetup last week in Washington DC, attendee David Luria commented that we need to do a better job of understanding, defining, and communicating the objectives of open data programs if we want them to be successful.

I couldn't agree more. Program objectives need to be clearly defined and shared with stakeholders and program participants so that everyone is marching in the same direction. If we don't understand and agree on our objectives, how can we establish requirements and metrics to measure what we're trying to accomplish?

Admittedly the above principle is straight out of Project Management 101 and describes the initial steps you need to take in planning and documenting any project, not just those involving open data. Still, what I have noticed after involvement with many data related projects is that there a...

SOMETIMES THE OPEN DATA PLATFORM DOES MAKE A DIFFERENCE

In May of this year, Asian and European countries met for the regional Open Government Partnership summits to once again discuss transparency and open government. In light of the session tracks that were presented, I am evaluating some of the technologies of the past and how there has been a welcome and fundamental shift from Open Government and Open Data being ambiguously linked toward the separation of the two in more current thinking and in technology approaches. Most notably, the deprecation of Microsoft's Open Government Data Initiative platform is a positive sign of the times that the government community is becoming aware of the danger of linking open government and open data.

Harlan Yu and David G. Robinson discussed the OGP in "The New Ambiguity of Open Government" (Princeton CITP/Yale ISP Working Paper).

"The Open Government Declaration is broad approach toward 'openness,' as signatories commit to 'seeking ...

Open Data Program Managers Need Both Analytical and Structural Data Skills

By Dennis D. McDonald, Ph.D., Balefire Global, dmcdonald@balefireglobal.com

Introduction

In Management Needs Data Literacy to Run Open Data Programs I addressed the question of how much “data literacy” open data program managers need. I outlined a series of topics, corresponding to different parts of the data management lifecycle, that program managers need to be familiar with.

While I certainly don't believe it is necessary for all program managers to be “data scientists” to manage open data programs effectively, I do think that there are certain data related skills that managers do need. One of the most important is the ability to think about data from both analytical and structural perspectives.

The analytical perspective

Analytically, managers need to understand that useful data are not just random collections of numbers but represent patterns and trends that can be used to tell stories about the objects or events with which the numbers are associated. The range of tools ...