How Cost Impacts Open Data Program Planning - and Vice Versa



By Dennis D. McDonald, Ph.D. dmcdonald@balefireglobal.com


Introduction



How important are costs when you are planning an open data program? Are they, as suggested by Rebecca Merrett in Addressing cost and privacy issues with open data in government, the "… elephant in the room," especially when data anonymization costs are being considered? Or are such costs just a normal consideration when planning any project where quantities of different types of data have to be manipulated and delivered?


It's difficult to make a generalization about this. Open data program costs can fall along at least three general dimensions:


1. Controlled versus uncontrolled
2. Known versus unknown
3. Startup versus ongoing


1. Controlled versus uncontrolled



Why worry about costs you can’t control? Because uncontrolled costs can impact your program whether you manage them or not. Examples of uncontrolled costs might include:

Taxes, licensing, insurance, and registration fees.
Staff salaries that can't be reassigned to other programs or cost centers.
Maintenance and support costs for systems that will be incurred regardless of how data are managed or used.
Costs related to unanticipated changes to source data formats.

Examples of controlled costs might be:

Contractors that can be terminated or reassigned to another cost center at the end of the project.
Staffers whose chargeback amounts vary by the number of hours they record in the organization’s time tracking system.
Other costs (for example, postage, communication, printing) that are driven by how incoming requests or inquiries are processed and handled.

2. Known versus unknown


As Thufir Hawat told Paul Atreides in David Lynch’s Dune, "Remember … the first step in avoiding a trap is knowing of its existence." So it is with the data-related costs associated with open data programs.


It can be troublesome (or at least costlier and more time-consuming) to assume erroneously that data associated with a source program are sufficiently "clean" for publication, even if the data come from a system or database that has operated successfully for many years. Older systems designed for batch processing might rarely, if ever, touch records that contain errors or out-of-range values that could “choke" the intake process of the open data platform. Newer or online systems might automatically exclude such values from processing, but the values might still be passed along and displayed openly in a system designed for public scrutiny, possibly causing misunderstanding or embarrassment.


How do you avoid such "unknowns" that can lead to unexpected costs? The answer: sample, test, and start small. Understand data cleanup and standardization costs before committing to a budget and schedule. Use this information to prioritize the processing and release of files. Then continually feed the results of actual experience back into the schedule.
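The "sample and test" step above can be sketched in a few lines. This is a minimal illustration, not a production validator: the field names and validation rules are hypothetical assumptions, stand-ins for whatever rules a real source system would require.

```python
import csv
import random

# Hypothetical validation rules -- the field names and acceptable
# ranges here are illustrative assumptions, not from any real dataset.
RULES = {
    "incident_year": lambda v: v.isdigit() and 1990 <= int(v) <= 2030,
    "amount": lambda v: v.replace(".", "", 1).isdigit(),
}

def sample_errors(path, sample_size=500, seed=42):
    """Read a random sample of rows and count rule violations per field,
    giving a rough, early estimate of cleanup effort before committing
    to a publication schedule."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    random.seed(seed)
    sample = random.sample(rows, min(sample_size, len(rows)))
    errors = {field: 0 for field in RULES}
    for row in sample:
        for field, check in RULES.items():
            if not check(row.get(field, "")):
                errors[field] += 1
    return len(sample), errors
```

Running this against a sample of each candidate file, before budgeting, turns an "unknown" cleanup cost into at least a rough known one, and the per-field error counts help prioritize which files to release first.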


Be aware of the options available for anonymizing data and how they will impact data visualization. For some crime statistics, for example, it may be undesirable (or even illegal) to pinpoint exact incident locations on a neighborhood “heat map,” so locations may need to be blurred (say, to the block rather than the residence level). Such a strategy might itself lead to misunderstanding or interpretation errors. Knowing about such issues in advance will help you avoid the "trap" of unanticipated costs and schedule delays.
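One common blurring technique is snapping coordinates to a coarse grid so incidents map to a block-sized cell rather than an individual residence. The sketch below is an illustrative assumption, not a vetted anonymization method: the chosen precision (three decimal degrees, very roughly block scale) would need legal and statistical review in a real program.

```python
def blur_location(lat, lon, precision=3):
    """Round a coordinate pair to a coarse grid cell.

    precision=3 (about 0.001 degrees) is an illustrative stand-in for
    "block level"; a real program must pick a resolution with legal
    and statistical review, since too-fine blurring can still
    re-identify addresses.
    """
    return round(lat, precision), round(lon, precision)

# Every incident in the same grid cell now plots at the same point,
# so a heat map shows block-level density instead of exact addresses.
blurred = blur_location(38.889484, -77.035278)
```

Note the trade-off the text describes: the same rounding that protects residents also shifts points on the map, which is exactly the kind of interpretation risk worth documenting alongside the published data.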


3. Startup versus ongoing costs


Understanding the costs associated with starting a program (developing a strategy, building a governance group, prototyping, contracting, internal selling, website modification, etc.) and with maintaining it (keeping data updated, adding new files and services, responding to public comments and criticisms, etc.) will influence the sustainability of the open data program.


Knowing both the one-time startup costs and the recurring, ongoing costs will be important. Managing these costs, and the labor and non-labor resources associated with them, over time will require strong and consistent leadership and governance to address important questions such as:

Can improvements in operational efficiencies and standardization counteract cost increases related to additional data files being added to the program?
Can maintenance and support costs be reduced through outsourcing?
Can business processes associated with ongoing data extraction and transformation be centralized or standardized?
Does it make sense to associate ongoing open data program costs with costs incurred by other operating departments?

An example of the last item is the possible trade-off between delivering data in response to manually processed Freedom of Information requests and providing open access through the open data program. This was discussed in Does Replacing Freedom of Information Request Handling with Open Data Based Self Service Reduce Operating Costs? Whether such a shift would produce real offsets is a question answered by analyzing real cost data. (One assumption is that mechanisms actually exist for tracking such costs on a consistent basis, and that cost is, in fact, an important variable in planning and program management.)
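The offset question above reduces to simple break-even arithmetic once the cost data exist. The figures below are purely hypothetical placeholders, used only to show the shape of the calculation; no real program numbers are implied.

```python
def annual_net_savings(requests_diverted, cost_per_foia_request,
                       open_data_annual_cost):
    """Savings from FOIA requests no longer handled manually, minus
    the ongoing cost of publishing the same data openly. All inputs
    must come from real, consistently tracked cost data."""
    return requests_diverted * cost_per_foia_request - open_data_annual_cost

# Hypothetical illustration: 400 diverted requests at $250 each,
# against $60,000/year in publishing costs.
savings = annual_net_savings(400, 250, 60_000)  # 40000
```

The point of the sketch is the dependency it makes explicit: without consistent tracking of per-request handling costs and ongoing publishing costs, the offset claim cannot be tested at all.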


Conclusion


Regardless of how program costs are treated from an accounting perspective, the need for resource tracking will be strong given the “matrix organization” approach that some open data programs employ. Staff who support multiple programs, including the open data program, will need regular communication with program management to keep the program on track.
Maintaining the efficiency of such a distributed operation poses special challenges, not the least of which is cost control. A related challenge is maintaining proficiency and efficiency in how open data program tasks are performed when work is distributed across individuals who participate only infrequently.


Related reading:


Scoping Out the ‘Total Cost of Standardization’ in Federal Financial Reporting
The Limitations of Government Program Financial Transparency
Recommendations for Collaborative Management of Government Data Standardization Projects
Recouping “Big Data” Investment in One Year Mandates Serious Project Management
Data Cleanup, Big Data, Standards, and Program Transparency
