OPEN DATA PORTALS SHOULD BE API [FIRST]


As you can see from the figure, September and November are not far below October. Some readers may wonder why there is a surge in API calls starting in May of 2014. From May through October we built open source service architectures on Red Hat JBoss SwitchYard that could mine and automatically append data sets within the Open Raleigh Portal.


Open Raleigh uses a responsive web design that is friendly to most handheld devices, but the API needs a little help to push data into the portal. The portal itself releases every data set as an API endpoint; out of the box, this is a read-only API. By writing some code, we can have the Socrata portal allow us to append data sets, as sketched below.
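To make that concrete, here is a minimal sketch of reading from and appending to a Socrata data set over its SODA endpoint. The domain, dataset identifier, credentials, and field names are placeholders, and the append call assumes the publishing account has write access to the data set.

import requests

DOMAIN = "data.raleighnc.gov"          # placeholder Socrata domain
DATASET_ID = "xxxx-xxxx"               # placeholder dataset identifier
APP_TOKEN = "YOUR_APP_TOKEN"           # registered Socrata application token
AUTH = ("publisher@example.com", "password")  # account with write access (placeholder)

# Read: every Socrata data set is exposed as a JSON endpoint.
resp = requests.get(
    f"https://{DOMAIN}/resource/{DATASET_ID}.json",
    headers={"X-App-Token": APP_TOKEN},
    params={"$limit": 10},
)
resp.raise_for_status()
print(resp.json())

# Append: POSTing a JSON array of rows to the same endpoint upserts them,
# which is how a background service can push new records into the portal.
new_rows = [{"site_name": "Example Park", "visits": 42}]  # placeholder fields
resp = requests.post(
    f"https://{DOMAIN}/resource/{DATASET_ID}.json",
    headers={"X-App-Token": APP_TOKEN},
    json=new_rows,
    auth=AUTH,
)
resp.raise_for_status()
print(resp.json())

This is the kind of glue code the SwitchYard services mentioned above would wrap: read the published endpoint, enrich or mine the data, and write the result back.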


Socrata is not alone in the Web/Mobile [First] category. ESRI, CKAN and, to some extent, Junar are architected on the same principles. This is not a direct criticism or endorsement of any particular platform.


THE CONSEQUENCES OF GETTING IT WRONG

Discussing multi-nodal approaches and espousing an API [First] strategy may seem esoteric until one looks at anecdotal issues with some recent portal launches.


Minneapolis recently launched its open data portal to scathing reviews (http://blogs.citypages.com/blotter/2014/12/minneapolis_new_open_data_portal_is_a_disaster.php).


Most of the criticisms were centered on the performance of the site: latency, non-responsive design and crashing web pages. Note that these were all complaints from citizens trying to use the portal through different browsers. The city blamed ESRI, but latency and poorly designed pages that do not validate are not inherent in the platform. ESRI is not an API [First] product, and the city said it should have gone with Socrata. Whatever the city's ability to manage a rollout, it seems clear that the lack of a multi-nodal, standards-based approach was a significant, though not the sole, reason for the beta failure.


CKAN also recently announced, via a tweet, the launch of a new comprehensive open data portal on the Ebola crisis. Here is the URL:


https://data.hdx.rwlabs.org/ebola

This is not responsive design. When I look at it on my mobile device I see a tiny version of the full site and no way to meaningfully consume or re-use the data unless I switch to a larger interface.


Now let’s think of the consequences of that:


Who is the data for? If it is for field workers, this is a huge fail. The most common field devices are tablets and phones, and not having some kind of app consuming an API is an obstacle to data re-use.

How do I consume and re-use the data? Following the rabbit hole of links, I can get to geo-data about the crisis. Most of the catalog lists are CSV data sets alongside PDFs giving the context of the data. This is good in that I have metadata; bad in that I cannot query an obvious API endpoint to merge this site's data with other data for my own analysis (see the sketch below for the kind of query an API [First] portal would make obvious).
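For contrast, here is a minimal sketch of what programmatic access looks like when a catalog exposes CKAN's standard action API. Whether this particular instance exposes it in an obvious, documented way is an assumption; the site URL and search term come from the example above, and the rest is illustrative.

import requests

# CKAN's standard action API: search the catalog for Ebola-related data sets.
# Assumes the portal exposes the usual /api/3/action endpoints (an assumption
# for this particular instance).
BASE = "https://data.hdx.rwlabs.org"

resp = requests.get(
    f"{BASE}/api/3/action/package_search",
    params={"q": "ebola", "rows": 5},
)
resp.raise_for_status()
results = resp.json()["result"]["results"]

# List each data set and its downloadable resources (CSVs, PDFs, etc.)
# that a mobile app or merge script could consume directly.
for dataset in results:
    print(dataset["title"])
    for resource in dataset.get("resources", []):
        print("  ", resource.get("format"), resource.get("url"))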

CONCLUSIONS

This is only the tip of the iceberg. Aside from the technical issues around not using an API [First] strategy, we have policy issues around PII (in the case of Minneapolis) and UX issues in the CKAN instance.


So, comparing human-consumed and API-consumed data, I conclude the following:


Data consumed by humans has lower re-use value in that it is not being redistributed.

Data served on a Web/Mobile [First] platform takes more work to re-use than data served on a platform that is API [First].

Original article: https://www.linkedin.com/pulse/open-data-portals-should-api-jason-hare?trk=mp-reader-card
