Have you tried pdf2xml? https://sourceforge.net/projects/pdf2xml/
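
If that does not fit, here is a rough sketch of another route, not something I have run against your file. It assumes poppler's pdftotext command is installed and on your PATH, and relies on pdftotext ending each page with a form feed character. The function names (extract_text, pages_to_json) and the JSON layout are just my own illustration.

```python
import json
import subprocess


def extract_text(pdf_path):
    """Run poppler's pdftotext (assumed to be on PATH) and return its output."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],  # "-" sends the text to stdout
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def pages_to_json(raw_text):
    """Split pdftotext output on form feeds and return a JSON string."""
    pages = raw_text.split("\f")
    # pdftotext ends every page with a form feed, so drop the empty tail.
    if pages and pages[-1].strip() == "":
        pages.pop()
    records = [
        {
            "page": number,
            # Keep non-blank lines, preserving leading indentation.
            "lines": [ln.rstrip() for ln in text.splitlines() if ln.strip()],
        }
        for number, text in enumerate(pages, start=1)
    ]
    return json.dumps(records, indent=2)
```

You would call it as print(pages_to_json(extract_text("report.pdf"))), where report.pdf is a placeholder name. This only gives you one record per page with its lines of text; any real structure (headings, tables) would still need parsing on top.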

On Tue, Dec 9, 2014 at 2:12 PM, Allan O. via skunkworks <skunkworks@lists.my.co.ke> wrote:
Hi all, 

I am looking to automate text extraction from a PDF document of around 2000 pages. I think it would be easier to convert it into a structured document first, so that it can be parsed automatically. 
Is there a tried and tested tool/way to convert PDF to XML/JSON? 

Regards, 
 
