Optically scanning PDFs into Open Data



Optically scanning PDFs and turning their contents into machine-readable data would be quite an accomplishment. It is happening right around the corner from the City of Raleigh, using minutes from our council meetings. I would like to see what the resulting CSV files look like.


This could have a large impact on open data by making it possible to reuse PDFs, and perhaps data locked in other proprietary formats. I am only now starting to imagine all of the potential uses for something like this.


Turning scanned documents into structured data at reporterslab.org


A new open-source program under development at Raleigh Public Record aims to pull structured data from scanned-in public records. And ahead of its release at the 2013 Computer-Assisted Reporting Conference, developers are...

See the full article here.
