dump_fonts
Extract font metadata
Usage
pdftl <input> dump_fonts [<spec>...] [output <output>]
Details
The dump_fonts operation extracts structural and layout metadata about the
embedded and non-embedded fonts defined in the document’s page resources.
It outputs a normalized JSON object that groups fonts by their internal object IDs. Each font entry includes:
name: Raw PostScript name of the font exactly as it appears in the PDF (including subset prefix)
base_font: Cleaned PostScript name of the font (e.g., Helvetica-Bold)
subtype: The PDF font type, as given by the font’s /Subtype entry (e.g., TrueType, Type0, Type1, Type3)
is_embedded: Boolean indicating if the binary font asset stream exists inside the PDF
font_bytes: Actual compressed payload size of the embedded stream in bytes (0 if un-embedded)
is_subset: True if the font has been structurally subsetted to reduce file size
encoding: Name of the character encoding or CMap used (e.g., WinAnsiEncoding, Identity-H, Standard)
has_to_unicode: True if a /ToUnicode translation CMap exists (crucial for reliable text extraction)
traits: Dictionary of stylistic flags decoded from the font descriptor’s /Flags bitmask
metrics: Typographic metrics (such as ascent, descent, and italic angle); only keys present in the PDF font descriptor are included
obj_id: Object number of the font’s PDF indirect object
usages: A dictionary mapping the local resource name (e.g., “F1”) to an array of pages where it appears.
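To make the name, base_font, is_subset, and traits fields more concrete, here is a minimal Python sketch of how such values can be derived from a raw font name and a descriptor /Flags integer. It is not pdftl's implementation. The subset-tag pattern (six uppercase letters and a plus sign) and the flag bit positions follow the PDF specification, but the function names and the trait key names are hypothetical and may not match what dump_fonts emits.

```python
import re

# A subset tag is six uppercase letters followed by "+", e.g. "ABCDEF+Helvetica-Bold".
SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")

# Font descriptor /Flags bit positions from the PDF specification (bit 1 is the lowest bit).
# The key names are illustrative only; pdftl's own trait names may differ.
FLAG_BITS = {
    "fixed_pitch": 1,
    "serif": 2,
    "symbolic": 3,
    "script": 4,
    "nonsymbolic": 6,
    "italic": 7,
    "all_cap": 17,
    "small_cap": 18,
    "force_bold": 19,
}


def split_subset_name(name):
    """Return (base_font, is_subset) for a raw font name."""
    base = SUBSET_PREFIX.sub("", name)
    return base, base != name


def decode_flags(flags):
    """Map each trait name to True or False according to the /Flags bitmask."""
    return {trait: bool(flags & (1 << (bit - 1))) for trait, bit in FLAG_BITS.items()}


print(split_subset_name("ABCDEF+Helvetica-Bold"))
print(decode_flags(34))  # 34 has bits 2 and 6 set
```

A raw name without a subset tag is returned unchanged with is_subset set to False.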
You can optionally provide page specifications to limit inspection to specific pages.
Examples
Print font metadata for in.pdf to console
pdftl in.pdf dump_fonts
Save font metadata for in.pdf to a file
pdftl in.pdf dump_fonts output fonts.json
Print font metadata for pages 1, 2, 3, and 4
pdftl in.pdf dump_fonts 1 2-4
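Once the metadata has been saved, it can be post-processed with any JSON tool. The Python sketch below flags fonts that are not embedded or that lack a /ToUnicode CMap. It assumes the top-level JSON object maps each object ID to a font entry with the fields listed under Details; that shape is inferred from the description above rather than taken from real output, and the sample data and the audit_fonts helper are hypothetical.

```python
import json


def audit_fonts(fonts):
    """Return (obj_id, base_font, problem) tuples for fonts that may cause trouble."""
    problems = []
    for obj_id, font in fonts.items():
        name = font.get("base_font", "?")
        if not font.get("is_embedded", False):
            problems.append((obj_id, name, "not embedded"))
        if not font.get("has_to_unicode", False):
            problems.append((obj_id, name, "no ToUnicode CMap"))
    return problems


# Hand-written sample in the assumed shape; this is not real pdftl output.
sample = json.loads("""
{
  "12": {"base_font": "Helvetica-Bold", "is_embedded": false,
         "has_to_unicode": false, "usages": {"F1": [1, 2]}},
  "27": {"base_font": "DejaVuSans", "is_embedded": true,
         "has_to_unicode": true, "usages": {"F2": [3]}}
}
""")

for obj_id, name, problem in audit_fonts(sample):
    print(f"object {obj_id}: {name}: {problem}")
```

To run the check against a real document, load the file written by the output example (fonts.json) with json.load in place of the sample, and adjust the key names if the actual output differs.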
Tags: info, metadata, fonts
Source: pdftl.operations.dump_fonts
Read online: https://pdftl.readthedocs.io/en/latest/operations/dump_fonts.html
Type: Operation