Hi all, 

I am looking to automate text extraction from a PDF document of roughly 2000 pages. I think it would be better to first convert it into a structured document for automated parsing.
Is there a tried and tested tool or method to convert PDF to XML/JSON?

Regards,