PDF to Excel

URL of PDF:
Straight to Excel?
Testing links:
For programmers:
  • Blog post intro
  • Please don't hit this installation with automated scrapers yet. If you want to do that, put it on your own site.
  • This is also suitable for gluing into a crowdsourcing project.
  • Source code
  • The JSON of the raw parse is dumped into the Excel comment field of every cell that has text in it. You can just ignore the actual content.
  • If you're careful, you can try a GET with a full-size bounding box for page 0, and it might work (for some PDFs), which means you can automate this in 2 GET requests. We'll work on making this simpler. The code you get back is the MD5 hex checksum of the PDF you ask it for.
  • It requires pdftohtml to be in /usr/local/bin (the one with the -xml option; pdf2html won't do), and the following Perl modules:
  • If you like this, you may want to sign up for notifications from the LinkedGov project run by friends.
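
If you are scripting against your own copy, two of the notes above are easy to check locally: what the MD5 hex code for a PDF should be, and whether the right pdftohtml is where the tool expects it. The following is only a rough Python sketch, not part of the tool itself. The function names are made up for illustration, and it assumes the checksum is taken over the raw bytes of the PDF file, which the notes above do not spell out.

```python
import hashlib
import os
from typing import List

# Path taken from the note above; change it if your install differs.
PDFTOHTML = "/usr/local/bin/pdftohtml"


def pdf_md5_hex(pdf_bytes: bytes) -> str:
    """Return the MD5 hex checksum of the PDF's raw bytes.

    Assumption: the code the service hands back is computed over the
    unmodified file contents.
    """
    return hashlib.md5(pdf_bytes).hexdigest()


def pdftohtml_available() -> bool:
    """True if an executable file exists at the expected pdftohtml path."""
    return os.path.isfile(PDFTOHTML) and os.access(PDFTOHTML, os.X_OK)


def pdftohtml_xml_command(pdf_path: str, out_base: str) -> List[str]:
    """Build (without running) a pdftohtml call that asks for XML output.

    Hypothetical helper: the tool's actual invocation may pass more options.
    """
    return [PDFTOHTML, "-xml", pdf_path, out_base]
```

To compare against the code you get back, read the PDF in binary mode and pass the bytes to `pdf_md5_hex`. The command list from `pdftohtml_xml_command` can be handed to `subprocess.run` once `pdftohtml_available()` returns true.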
