[Slackbuilds-users] Copying PDF-1.7 text using -14.2

Richard Ellis rellis at dp100.com
Mon Sep 14 17:15:18 UTC 2020


On Mon, Sep 14, 2020 at 09:57:13AM -0700, Rich Shepard wrote:
>On Mon, 14 Sep 2020, Alexander Verbovetsky wrote:
>>Maybe there is no text inside, just picture?  PDF is a container, not 
>>a format.
>
>Scientific journal articles are primarily text with occasional plots 
>or other images.  And I believe that PDF stand for "Portable Document 
>Format."

The Adobe name does expand to those words, but the word "Document" in 
the name bears no resemblance to how the data inside a PDF file 
produces visual output.

The internal structure of a PDF is better described as "electronic 
paper" than as a "document".  PDF, internally, is simply a way of 
specifying where to physically position "visual things" on a virtual 
sheet of paper, so whether you have any luck extracting "text" later 
depends very much upon how the creating software generated the 
internal PDF data.
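
To make that concrete, here is a rough Python sketch.  It is a toy 
model, not a PDF parser: each tuple stands in for one "show this text at 
this position" operation (in a real content stream, something like 
"BT /F1 12 Tf 72 720 Td (Hello world) Tj ET").  The names, the 
coordinates, and the fixed 6-unit glyph width are all made up for 
illustration.  Two hypothetical producers draw the same visible line, 
but only one of them stores it in a form that extracts cleanly.

```python
# Toy model only: each tuple is (x, y, text), i.e. "paint this text at
# this spot on the page". Nothing here comes from a real PDF library.
GLYPH_WIDTH = 6  # assumed fixed-width font, purely for illustration

# Producer A: one operation for the whole line, in reading order.
friendly = [(72, 720, "Hello world")]

# Producer B: same visual result, but one glyph per operation, no space
# character at all (the gap is just a bigger jump in x), and the
# operations are emitted out of reading order.
glyphs = [(72 + GLYPH_WIDTH * i, 720, ch)
          for i, ch in enumerate("Hello world") if ch != " "]
scattered = glyphs[::2] + glyphs[1::2]


def naive_extract(ops):
    """Concatenate text in the order it appears in the stream."""
    return "".join(text for _, _, text in ops)


def positional_extract(ops, glyph_width=GLYPH_WIDTH):
    """Guess the reading order from page positions instead."""
    lines = {}
    for x, y, text in ops:
        lines.setdefault(y, []).append((x, text))
    out = []
    for y in sorted(lines, reverse=True):  # PDF y grows upward: top line first
        pieces = []
        end_x = None
        for x, text in sorted(lines[y]):   # left to right
            # A horizontal gap wider than half a glyph is guessed to be a space.
            if end_x is not None and x - end_x > glyph_width / 2:
                pieces.append(" ")
            pieces.append(text)
            end_x = x + glyph_width * len(text)
        out.append("".join(pieces))
    return "\n".join(out)


print(naive_extract(friendly))        # Hello world
print(naive_extract(scattered))       # Hloolelwrd
print(positional_extract(scattered))  # Hello world
```

Both streams would look identical on screen or on paper.  Whether a 
copy-and-paste gives you the first result or the second depends on which 
kind of stream the creating software wrote and on how much positional 
guesswork the extracting tool is willing to do.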


