dump_data
Metadata, page and bookmark info (XML-escaped or JSON)
Usage
pdftl <input> dump_data [json] [output <output>]
Details
Extracts document-level metadata and structural information from the input PDF and prints it to the console (or a specified file).
This operation is the primary way to export data for
inspection or for later use by the update_info
operation. By default, all string values in the output are
processed with XML-style escaping (e.g., < becomes
&lt;).
Alternatively, passing the json parameter will produce a
structured JSON output, which is often easier for other
programs to parse.
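A program reading the default output needs to undo the XML-style escaping before using a value. The following is a minimal sketch, assuming the escaping uses standard XML character references; the sample string is hypothetical and not taken from real pdftl output.

```python
from html import unescape
from xml.sax.saxutils import escape

# Hypothetical escaped value, as it might appear in the default output.
raw = "Q&amp;A: is 1 &lt; 2?"

# html.unescape decodes named and numeric character references.
decoded = unescape(raw)
print(decoded)  # Q&A: is 1 < 2?

# xml.sax.saxutils.escape goes the other way for &, < and >.
assert escape(decoded) == raw
```

With the json parameter, this decoding step should not be needed, since a JSON parser handles its own string escapes.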
Output Format Details (Stanza Format)
The default output is a plain text, line-based, key-value
format. It consists of both simple top-level fields and
multi-line “stanzas”. A stanza is a block of related data that
begins with a line like InfoBegin or BookmarkBegin.
The data from this command is consumed by update_info.
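The line-based layout described above can be read with a few lines of code. This is only a sketch under stated assumptions: parse_dump is a hypothetical helper (not part of pdftl), a stanza is assumed to start at any colon-free line ending in Begin, and a key is assumed to belong to the open stanza only when it starts with the stanza name (as InfoKey does for InfoBegin). The sample text and its field order are made up for illustration.

```python
def parse_dump(text):
    """Split dump_data text into top-level fields and a list of stanzas."""
    fields = {}    # top-level "Key: Value" lines
    stanzas = []   # (stanza_name, {key: value}) pairs in file order
    name, record = None, None
    for line in text.splitlines():
        line = line.strip()
        # A line such as "InfoBegin" opens a new stanza.
        if line.endswith("Begin") and ":" not in line:
            name, record = line[: -len("Begin")], {}
            stanzas.append((name, record))
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Key: Value"
        key, value = key.strip(), value.strip()
        # Assumption: stanza keys share the stanza name as a prefix.
        if record is not None and key.startswith(name):
            record[key] = value
        else:
            fields[key] = value
    return fields, stanzas


# Hypothetical sample input.
sample = """\
InfoBegin
InfoKey: Title
InfoValue: Example
PdfID0: 0123abcd
NumberOfPages: 2
BookmarkBegin
BookmarkTitle: Chapter 1: Intro
BookmarkLevel: 1
BookmarkPageNumber: 1
"""

fields, stanzas = parse_dump(sample)
print(fields)  # {'PdfID0': '0123abcd', 'NumberOfPages': '2'}
```

Values are kept as strings and are still XML-escaped at this point; converting numbers and decoding entities is left to the caller.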
Top-Level Fields
These fields appear as simple Key: Value lines.
PdfID0: <hex_string>
    The first part of the PDF’s unique file identifier. Updatable by update_info.
PdfID1: <hex_string>
    The second part of the PDF’s unique file identifier. Not updatable by update_info.
NumberOfPages: <integer>
    The total number of pages in the document.
Stanzas
These are multi-line blocks, each describing a single record.
These can all be updated by update_info.
1. Info Stanza (Document Metadata)
Each metadata entry (e.g., Title, Author) gets its own stanza.
InfoBegin
InfoKey: <key_name>
    Standard keys include Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate.
InfoValue: <value_string>
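Because every metadata entry is its own stanza, collecting them into a single mapping takes one pass over the text. The sketch below assumes InfoKey precedes InfoValue within a stanza, as in the layout above; info_dict is a hypothetical helper and the sample text is invented.

```python
def info_dict(text):
    """Collect InfoKey/InfoValue pairs from dump_data text into a dict."""
    info = {}
    key = None
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        if name == "InfoBegin":
            key = None                 # a new stanza starts
        elif name == "InfoKey" and sep:
            key = value
        elif name == "InfoValue" and sep and key is not None:
            info[key] = value
            key = None
    return info


# Hypothetical sample input.
sample = """\
InfoBegin
InfoKey: Title
InfoValue: Annual Report
InfoBegin
InfoKey: Author
InfoValue: J. Doe
"""

metadata = info_dict(sample)
print(metadata)  # {'Title': 'Annual Report', 'Author': 'J. Doe'}
```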
2. Bookmark Stanza
Represents a single bookmark (outline) item.
BookmarkBegin
BookmarkTitle: <title_string>
BookmarkLevel: <integer>
    1 is the top level.
BookmarkPageNumber: <integer>
    The 1-indexed target page number.
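Bookmark stanzas form a flat list, and the nesting of the outline is implied by BookmarkLevel. One way to rebuild the tree is with a stack of open ancestors, sketched below. It assumes stanzas appear in document order and that levels start at 1; build_outline and the sample tuples are hypothetical.

```python
def build_outline(bookmarks):
    """Nest a flat list of (title, level, page) tuples by level."""
    root = []
    stack = [(0, root)]  # (level, children list) of the open ancestors
    for title, level, page in bookmarks:
        node = {"title": title, "page": page, "children": []}
        # Close ancestors at the same or a deeper level (never the root).
        while len(stack) > 1 and stack[-1][0] >= level:
            stack.pop()
        stack[-1][1].append(node)
        stack.append((level, node["children"]))
    return root


# Hypothetical (BookmarkTitle, BookmarkLevel, BookmarkPageNumber) values.
flat = [
    ("Part I", 1, 1),
    ("Chapter 1", 2, 1),
    ("Chapter 2", 2, 5),
    ("Part II", 1, 9),
]

outline = build_outline(flat)
print([child["title"] for child in outline[0]["children"]])
# ['Chapter 1', 'Chapter 2']
```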
3. PageMedia Stanza (Page-level Boxes)
Describes geometry boxes for a page. Coordinates are in PDF points,
space-separated (e.g., 0 0 595 842).
PageMediaBegin
PageMediaNumber: <integer>
    The 1-indexed page number.
PageMediaRotation: <0|90|180|270>
PageMediaRect: <x1> <y1> <x2> <y2>
    (MediaBox) Always present.
PageMediaDimensions: <width> <height>
PageMediaCropRect: <x1> <y1> <x2> <y2>
    Omitted if identical to PageMediaRect.
PageMediaBleedRect: <x1> <y1> <x2> <y2>
    Omitted if identical to PageMediaCropRect (or PageMediaRect if no crop).
PageMediaTrimRect: <x1> <y1> <x2> <y2>
    Omitted if identical to PageMediaCropRect (or PageMediaRect if no crop).
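Since the crop, bleed and trim boxes are omitted when they match their fallback, a consumer has to fill them back in. The sketch below applies the omission rules stated above to one stanza held as a dict of strings; resolve_boxes, parse_rect and rect_size are hypothetical helpers, and the sample values are invented.

```python
def parse_rect(value):
    """Turn 'x1 y1 x2 y2' into a tuple of floats (PDF points)."""
    return tuple(float(part) for part in value.split())


def resolve_boxes(stanza):
    """Fill in omitted boxes using the documented fallbacks."""
    media = parse_rect(stanza["PageMediaRect"])  # always present
    crop = parse_rect(stanza["PageMediaCropRect"]) if "PageMediaCropRect" in stanza else media
    # Bleed and trim fall back to the crop box, which itself
    # falls back to the media box when no crop is given.
    bleed = parse_rect(stanza["PageMediaBleedRect"]) if "PageMediaBleedRect" in stanza else crop
    trim = parse_rect(stanza["PageMediaTrimRect"]) if "PageMediaTrimRect" in stanza else crop
    return {"media": media, "crop": crop, "bleed": bleed, "trim": trim}


def rect_size(rect):
    """Width and height of a rectangle, from its corner coordinates."""
    x1, y1, x2, y2 = rect
    return (x2 - x1, y2 - y1)


# Hypothetical stanza: an A4 page with a crop box and no bleed or trim.
stanza = {
    "PageMediaNumber": "1",
    "PageMediaRotation": "0",
    "PageMediaRect": "0 0 595 842",
    "PageMediaCropRect": "10 10 585 832",
}

boxes = resolve_boxes(stanza)
print(rect_size(boxes["media"]))  # (595.0, 842.0)
```

rect_size only subtracts corner coordinates; whether its result always matches PageMediaDimensions (for example on rotated pages) is not stated here and should be checked against real output.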
4. PageLabel Stanza (Logical Page Numbers)
Defines a page numbering style range.
PageLabelBegin
PageLabelNewIndex: <integer>
    The 1-indexed physical starting page for this numbering.
PageLabelStart: <integer>
    The starting number for this labelling (e.g., 1).
PageLabelPrefix: <string>
    String to prepend to page label (e.g., A-).
PageLabelNumStyle: <Style>
    Standard styles: DecimalArabic, UppercaseRoman, LowercaseRoman, UppercaseLetters, LowercaseLetters, NoNumber.
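To see how these fields combine, the sketch below computes the label shown for a physical page from a list of PageLabel stanzas. It is an illustration under assumptions, not pdftl's implementation: page_label, format_number and to_roman are hypothetical helpers, each range is assumed to run until the next PageLabelNewIndex, pages before the first range are assumed to fall back to their plain number, and the letter styles follow the PDF convention of A to Z, then AA to ZZ, and so on.

```python
def to_roman(n):
    """Uppercase Roman numeral for a positive integer."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def format_number(n, style):
    """Render n in one of the styles named in the stanza description."""
    if style == "DecimalArabic":
        return str(n)
    if style == "UppercaseRoman":
        return to_roman(n)
    if style == "LowercaseRoman":
        return to_roman(n).lower()
    if style in ("UppercaseLetters", "LowercaseLetters"):
        # A..Z, then AA..ZZ, then AAA..ZZZ, and so on.
        letter = chr(ord("A") + (n - 1) % 26) * ((n - 1) // 26 + 1)
        return letter if style == "UppercaseLetters" else letter.lower()
    if style == "NoNumber":
        return ""
    raise ValueError(f"unknown style: {style}")


def page_label(ranges, page):
    """Label for a 1-indexed physical page, given PageLabel stanzas."""
    active = None
    for r in ranges:
        start_page = r["PageLabelNewIndex"]
        if start_page <= page and (active is None or start_page > active["PageLabelNewIndex"]):
            active = r
    if active is None:
        return str(page)  # assumption: no range applies, use the page number
    number = active.get("PageLabelStart", 1) + page - active["PageLabelNewIndex"]
    style = active.get("PageLabelNumStyle", "DecimalArabic")
    return active.get("PageLabelPrefix", "") + format_number(number, style)


# Hypothetical ranges: roman front matter, then an appendix-style prefix.
ranges = [
    {"PageLabelNewIndex": 1, "PageLabelStart": 1, "PageLabelNumStyle": "LowercaseRoman"},
    {"PageLabelNewIndex": 5, "PageLabelStart": 1, "PageLabelPrefix": "A-",
     "PageLabelNumStyle": "DecimalArabic"},
]

print(page_label(ranges, 3))  # iii
print(page_label(ranges, 6))  # A-2
```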
Examples
Print XML-escaped metadata for in.pdf
pdftl in.pdf dump_data
Save XML-escaped metadata for in.pdf to data.txt
pdftl in.pdf dump_data output data.txt
Print metadata for in.pdf in JSON format
pdftl in.pdf dump_data json
Tags: info, metadata
Source: pdftl.operations.dump_data
Read online: https://pdftl.readthedocs.io/en/stable/operations/dump_data.html
Type: Operation