Cannot extract text from a PDF file using a NuGenesis SDMS extraction template - WKB247710
Article number: 247710
SYMPTOMS
- SDMS cannot extract text from archived PDF files
- May affect some or all of the text in the PDF
ENVIRONMENT
- NuGenesis 9 SDMS
- PDF file was produced with the SyncFusion library
CAUSE
The SyncFusion library structures the PDF in a way that the SDMS text extractor cannot read, so the extractor cannot pull text from the affected parts of the file.
FIX or WORKAROUND
- Enhancement request CRI-5233 was filed for this issue.
- Use a different code library, or a PDF virtual printer, to generate the PDF.
- If the SyncFusion library must be used, update it to the latest version, or at minimum to v20.3.
ADDITIONAL INFORMATION
SyncFusion stores all PDF content in "form XObjects". The SDMS PDF text extractor cannot extract text from form XObjects.
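In a PDF, a form XObject is a self-contained content stream whose dictionary carries /Subtype /Form, and a page paints it with the Do operator. To help triage which archived PDFs may be affected, the Python sketch below heuristically counts form XObject dictionaries in a file. It is not part of NuGenesis SDMS or the SyncFusion library; the script and the function name count_form_xobjects are hypothetical. It scans the raw bytes and every stream that inflates with zlib (FlateDecode), which covers compressed object streams in PDF 1.5 and later.

```python
import re
import sys
import zlib

# Matches a dictionary entry "/Subtype /Form" (whitespace between the names is optional).
FORM_RE = re.compile(rb"/Subtype\s*/Form(?![A-Za-z0-9])")
# Captures the body of each "stream ... endstream" block; the lookbehind skips
# the "stream" inside the "endstream" keyword itself.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def count_form_xobjects(pdf_bytes: bytes) -> int:
    """Heuristically count form XObject dictionaries in a PDF (hypothetical helper).

    Scans the raw bytes plus every stream that inflates with zlib (FlateDecode).
    Encrypted PDFs and other stream filters are not handled.
    """
    hits = len(FORM_RE.findall(pdf_bytes))
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates trailing bytes after the compressed data.
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-compressed (or damaged): skip this stream
        hits += len(FORM_RE.findall(data))
    return hits


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            count = count_form_xobjects(fh.read())
        print(f"{path}: {count} form XObject dictionaries found")
```

Run it as `python check_forms.py file1.pdf file2.pdf` (the script name is hypothetical). Treat a nonzero count only as a signal, not proof: annotation and form-field appearance streams are also form XObjects, so many PDFs that SDMS extracts correctly will still report some.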
