Has anyone used OCR software to batch extract invoice data during due diligence?
We’re midway through diligence on a deal and discovered that the seller’s homegrown portal stores ~20,000 invoices as PDFs, with no built-in reporting. We may need to extract key fields such as invoice amount, vendor, and date to support the QoE analysis.
Has anyone gone through something similar and used OCR tools (e.g., Amazon Textract) to automate this? My initial thought is that we can just pull a representative sample, but I’m trying to assess the feasibility of a full extraction in case it ends up being needed.
Thanks in advance!