Waters

Cannot extract text from a PDF file using a NuGenesis SDMS extraction template - WKB247710

Article number: 247710

SYMPTOMS

  • SDMS cannot extract text from archived PDF files
  • May affect some or all of the text in the PDF

ENVIRONMENT

  • NuGenesis 9 SDMS
  • PDF file was produced with the SyncFusion library

CAUSE

The SyncFusion library structures the PDF in a way that prevents the SDMS text extractor from pulling text from the file.

FIX or WORKAROUND

  1. Enhancement request CRI-5233 was filed for this issue.
  2. Use a different code library, or a PDF virtual printer, to generate the PDF.
  3. If the SyncFusion library must be used, update it to the latest version (v20.3 at minimum).

ADDITIONAL INFORMATION

SyncFusion stores all PDF content in "form XObjects" within the PDF. The PDF text extractor in SDMS cannot extract text from form XObjects.
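
To check whether a given PDF is likely to be affected, it can help to look for form XObjects in the file before archiving it. The following is a minimal sketch using only the Python standard library. It assumes that the form XObject dictionaries are written uncompressed in the file body, where each one carries a "/Subtype /Form" entry. The function name count_form_xobjects is illustrative and is not part of SDMS or SyncFusion. Dictionaries packed into compressed object streams are not visible to this scan, so a count of zero does not prove the file is free of form XObjects, and a nonzero count does not prove that all text lives inside them.

```python
import re
import sys

# A form XObject dictionary contains the entry "/Subtype /Form".
# PDF syntax allows the whitespace between the two names to be omitted.
# The trailing \b keeps "/Subtype /FormSomething" from matching.
FORM_XOBJECT = re.compile(rb"/Subtype\s*/Form\b")


def count_form_xobjects(pdf_bytes: bytes) -> int:
    """Count '/Subtype /Form' entries visible in the raw PDF bytes.

    Heuristic only: entries inside compressed object streams are missed.
    """
    return len(FORM_XOBJECT.findall(pdf_bytes))


if __name__ == "__main__":
    # Usage: python check_pdf.py report.pdf   (file names are hypothetical)
    with open(sys.argv[1], "rb") as handle:
        found = count_form_xobjects(handle.read())
    print(f"form XObject entries found: {found}")
```

If the scan reports form XObjects in a PDF that SDMS fails to extract, regenerating the file with one of the workarounds above and rescanning it is a quick way to compare the two outputs.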
