Take Charge of PDF in GNU Emacs
Display popup annotation in docview
All or None
Docview is the default major mode for displaying PDFs in Emacs. It uses Ghostscript to convert each page into an image in a temporary folder (/tmp/docview<nnnn>) and displays those images. For a large PDF document, converting every page is unnecessary in most cases - especially when one is only looking at the document for quick reference.
An ideal solution would be to convert pages on demand. For this, however, one needs to know the number of pages in the document. This can be obtained either from an external tool like pdfinfo or by parsing the metadata in the document. So here's some elisp code that does the parsing.
(require 'pdf)
(pdf-get "Root.Pages.Count")
or
M-x pdf-total-pages
;; Metadata about the document
(pdf-get "Root.Metadata")
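For comparison, the external-tool route mentioned earlier can be sketched as follows. This is a minimal sketch that assumes the pdfinfo program (from poppler-utils or xpdf) is on your PATH and prints a "Pages:" line; the my/ function names are made up for illustration and are not part of docview or of the pdf library used above.

```elisp
;; Hypothetical helpers; the my/ names are not part of doc-view or any package.

(defun my/pdfinfo-parse-page-count (output)
  "Return the page count found in OUTPUT, the text printed by pdfinfo, or nil."
  (when (string-match "^Pages:[ \t]+\\([0-9]+\\)" output)
    (string-to-number (match-string 1 output))))

(defun my/pdf-page-count (file)
  "Return the number of pages in FILE according to pdfinfo.
Return nil if pdfinfo is not installed or exits with an error."
  (when (executable-find "pdfinfo")
    (with-temp-buffer
      ;; Collect pdfinfo's standard output in this temporary buffer.
      (when (eq 0 (call-process "pdfinfo" nil t nil (expand-file-name file)))
        (my/pdfinfo-parse-page-count (buffer-string))))))
```

Evaluating (my/pdf-page-count "~/some.pdf") should then give a number without converting a single page.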
So here we are - now that one can parse the metadata, what else can we do?
Form fields and Embedded scripts
M-x pdf-form
PDF supports embedded JavaScript for adding simple interactivity to PDF forms. You can list the scripts embedded in the document, and so make an informed decision before dismissing the warning dialog about running scripts.
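If you only want a quick check of whether a file carries scripts at all, a rough scan of the raw bytes can help. The sketch below is not how pdf-form works; it simply looks for the /JS and /JavaScript names and reports their byte offsets. It will miss scripts stored inside compressed object streams, and the function name is hypothetical.

```elisp
;; Hypothetical helper; a crude byte scan, not a real PDF parser.

(defun my/pdf-script-key-offsets (file)
  "Return the byte offsets in FILE where a /JS or /JavaScript name appears.
Names hidden inside compressed streams are not found."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file)
    (goto-char (point-min))
    (let ((case-fold-search nil)
          (offsets '()))
      ;; \\b keeps longer names such as /JSON from matching.
      (while (re-search-forward "/\\(?:JS\\|JavaScript\\)\\b" nil t)
        ;; Buffer positions start at 1, file offsets at 0.
        (push (1- (match-beginning 0)) offsets))
      (nreverse offsets))))
```

An empty result does not prove the document is script-free, but a non-empty one tells you where to start reading.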
There's many a slip...
Were you ever caught off guard when you uploaded a filled PDF form to a website and saw an incomplete form upon verification in the browser? Don't blame yourself - it's not your fault.
The PDF specification has many versions, interpretations and implementations. Then there are linearized (optimized for web) and flattened (best for printing but no interactivity) versions.
Initially, a cross-reference (xref) table was included at the end of the PDF for easy random access to any object in the document. With the advent of the web, the cross-reference data needed for the first page was moved to the beginning of the file to enable faster rendering on websites. These are called Linearized PDFs.
However, when a PDF form is saved, new objects are added at the end of the file together with a new cross-reference section (or stream in later versions), so the layout that linearization relies on no longer holds and viewers may disagree about what the form contains. Flattening avoids the problem by merging the field values into the page content, which is why a flattened form loses its interactivity.
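Both ends of the file can be inspected without any PDF library. The sketch below makes two assumptions taken from the PDF specification: a linearized file declares /Linearized and its total length (/L) within the first 1024 bytes, and the last startxref keyword near the end of the file gives the byte offset of the most recent cross-reference section. The my/ names are hypothetical, and file-attribute-size needs Emacs 26 or later.

```elisp
;; Hypothetical helpers for illustration; not part of doc-view or any PDF package.

(defun my/pdf-read-bytes (file beg end)
  "Return the raw bytes of FILE between offsets BEG and END as a string."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file nil beg end)
    (buffer-string)))

(defun my/pdf-linearized-length (file)
  "Return the file length declared by the /L key of a linearized FILE.
Return nil if the first 1024 bytes carry no /Linearized key."
  (let* ((size (file-attribute-size (file-attributes file)))
         (head (my/pdf-read-bytes file 0 (min size 1024)))
         (case-fold-search nil))
    (when (and (string-match "/Linearized\\b" head)
               (string-match "/L[ \t\r\n]+\\([0-9]+\\)" head))
      (string-to-number (match-string 1 head)))))

(defun my/pdf-last-startxref (file)
  "Return the offset that follows the last startxref keyword in FILE, or nil."
  (let* ((size (file-attribute-size (file-attributes file)))
         (tail (my/pdf-read-bytes file (max 0 (- size 1024)) size))
         (case-fold-search nil)
         (offset nil)
         (start 0))
    ;; Walk forward so that OFFSET ends up holding the last match.
    (while (string-match "startxref[ \t\r\n]+\\([0-9]+\\)" tail start)
      (setq offset (string-to-number (match-string 1 tail))
            start (match-end 0)))
    offset))
```

If my/pdf-linearized-length returns a number that differs from the actual file size, the file was most likely modified after it was linearized, for example by saving a filled form.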
Hence, not all PDF clients are the same and their behavior differs. In such cases, your best bet is to use the suggested software throughout, i.e. both parties use the same software for creation and verification. If you must verify with different software and the form looks incorrect, try verifying the flattened version.
Annotations
Use the following examples to extract annotations from the document. The number of Kids subscripts will differ based on the document structure. If the annotation has rich content, a capable viewer can enhance the display rather than showing plain text.
;; Annotation text
(pdf-get "Root.Pages.Kids[0].Kids[0].Kids[0].Annots[1].Contents")
;; Annotation Rich content XHTML
(pdf-get "Root.Pages.Kids[0].Kids[0].Kids[0].Annots[1].RC")
;; Popup object associated with Text annotation
;; Should have /Open true to show up
(pdf-get "Root.Pages.Kids[0].Kids[0].Kids[0].Annots[1].Popup")
A text annotation represents a “sticky note” attached to a point in the PDF document. When closed, the annotation appears as an icon; when open, it displays a pop-up window containing the text of the note in a font and size chosen by the viewer application.
Use a to toggle annotations and <tab>/<backtab> to cycle through them.
Note: If annotations are missing, ensure you're using GS for conversion and not mutool.
(setq doc-view-pdf->png-converter-function #'doc-view-pdf->png-converter-ghostscript)
Incremental update
Objects (and thereby their appearances) in a PDF document can be altered by adding an updated definition of the object at the end of the file. When you fill in a PDF form, that's how the values are saved in the document. For example, below is a sample of the PDF objects added while saving a text field with the value "form".
11 0 obj
<< /Type /Annot /Subtype /Widget /Parent 7 0 R /AP << /N 28 0 R>> /Rect [89 799 238 810] /F 4 /Border [0 0 0.72] /BS 16 0 R /MK << /BC [0.4 0.4 0.4]>> /V (form) /M (D:20221118060522)>>
endobj
28 0 obj
<< /Length 84 /Subtype /Form /Resources << /Font << /Helv 18 0 R>>>> /BBox [0 0 149 11]>> stream
/Tx BMC q 0.72 w 0.4 G 0 0 149 11 re S BT /Helv 6.66 Tf 0 g 1 0 0 1 0 0 Tm 2 3.38 Td (form) Tj ET Q EMC
endstream
endobj
Warning: If the PDF form has been updated and saved multiple times, chances are that the document contains the complete history of edits.
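A rough way to check for such history is to count how many %%EOF markers the file contains and how many times a given object is defined. The sketch below scans the raw bytes only, so objects packed into compressed object streams are not seen, and linearized files typically carry an extra %%EOF near the start even without any edits. The function names are hypothetical.

```elisp
;; Hypothetical helpers; a byte-level scan, not a real PDF parser.

(defun my/pdf-count-matches (file regexp)
  "Return how many times REGEXP matches in the raw bytes of FILE."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file)
    (let ((case-fold-search nil))
      (count-matches regexp (point-min) (point-max)))))

(defun my/pdf-eof-marker-count (file)
  "Return the number of %%EOF markers in FILE.
More than one usually means the file was saved incrementally."
  (my/pdf-count-matches file "%%EOF"))

(defun my/pdf-object-definition-count (file number)
  "Return how many times object NUMBER (generation 0) is defined in FILE."
  ;; Accept a bare carriage return as a line start as well.
  (my/pdf-count-matches file (format "\\(?:^\\|\r\\)%d 0 obj\\b" number)))
```

For the sample above, (my/pdf-object-definition-count file 11) returning 2 or more would mean that earlier values of that text field are still sitting in the file.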
Redundant Processes
In docview, conversion from PDF to PNG sometimes fails, and the failed processes are not killed automatically. You'll notice the number of pending conversion processes in the mode line. You can use K to kill those hanging processes.