Convert PDF to a structured document (XML/JSON)

Hi all,

I am looking to automate text extraction from a PDF document of roughly 2000 pages. I think it would be easier to convert it into a structured document first and parse that automatically. Is there a tried and tested tool or method for converting PDF to XML/JSON?

Regards,

Have you tried pdf2xml? https://sourceforge.net/projects/pdf2xml/

On Tue, Dec 9, 2014 at 2:12 PM, Allan O. via skunkworks <skunkworks@lists.my.co.ke> wrote:
> Hi all,
> I am looking to automate text extraction from a PDF document of roughly 2000 pages. I think it would be easier to convert it into a structured document first and parse that automatically. Is there a tried and tested tool or method for converting PDF to XML/JSON?
> Regards,
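
For the PDF-to-XML-to-JSON route discussed in this thread, here is a minimal sketch of what the pipeline could look like. It is not the SourceForge pdf2xml project linked above; it assumes Poppler's `pdftohtml` command is installed and on the PATH, and that its `-xml` output consists of `<page>` elements containing `<text>` elements with `top` and `left` attributes. The function names `xml_to_records` and `pdf_to_json` are made up for illustration.

```python
import json
import subprocess
import xml.etree.ElementTree as ET


def xml_to_records(xml_text):
    """Turn pdftohtml-style XML into a list of per-page dicts."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.iter("page"):
        lines = []
        for node in page.iter("text"):
            # itertext() also collects text nested in inline tags such as <b> or <i>.
            content = "".join(node.itertext()).strip()
            if content:  # skip whitespace-only fragments
                lines.append({
                    "top": int(node.get("top", 0)),
                    "left": int(node.get("left", 0)),
                    "text": content,
                })
        pages.append({"number": int(page.get("number", 0)), "lines": lines})
    return pages


def pdf_to_json(pdf_path):
    """Run pdftohtml (assumed to be on PATH) and return a JSON string."""
    result = subprocess.run(
        ["pdftohtml", "-xml", "-i", "-stdout", pdf_path],
        check=True,
        capture_output=True,
    )
    return json.dumps(xml_to_records(result.stdout), indent=2)
```

Calling `pdf_to_json("report.pdf")` (with a hypothetical file name) would return one JSON object per page, each holding its text fragments with their positions. Keeping the coordinates is a deliberate choice: for a long document, the `top` and `left` values are often what lets a parser tell headings, columns, and table cells apart.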
Participants (2):
- Allan O.
- Bwana Lawi