`dump_fonts`

Extract font metadata

Usage

pdftl <input> dump_fonts [<spec>...] [output <output>]

Details

The dump_fonts operation reports structural and layout metadata for every font, embedded or not, that is defined in the document's page resources.

It outputs a normalized JSON object that groups fonts by their internal object IDs. Each font entry includes:

name: Raw PostScript name of the font exactly as it appears in the PDF (including any subset prefix)
base_font: Cleaned PostScript name of the font (e.g., Helvetica-Bold)
subtype: Font subtype as declared in the PDF (e.g., TrueType, Type0, Type1, Type3)
is_embedded: True if the binary font stream is stored inside the PDF
font_bytes: Stored (compressed) size of the embedded font stream in bytes (0 if un-embedded)
is_subset: True if the font has been subsetted to reduce file size
encoding: Character encoding used (e.g., WinAnsiEncoding, Identity-H, Standard)
has_to_unicode: True if a /ToUnicode CMap exists (crucial for reliable text extraction)
traits: Dictionary of stylistic flags decoded from the font descriptor's bitmask
metrics: Typographic metrics (such as ascent, descent, and italic angle), including only the keys actually present in the PDF font descriptor
obj_id: Object number of the font's PDF indirect object
usages: A dictionary mapping each local resource name (e.g., "F1") to an array of the pages where it appears
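
The fields above lend themselves to simple checks, such as spotting fonts that may cause rendering or text-extraction problems. The following is a minimal Python sketch, not part of pdftl. It assumes the top-level JSON object is keyed by font object ID and that each value carries the fields listed above; the SAMPLE data and the fonts_needing_attention helper are hypothetical and exist only for illustration.

```python
# Hypothetical sample shaped like the fields described above. The exact
# top-level layout of real dump_fonts output is an assumption.
SAMPLE = {
    "12": {
        "name": "ABCDEF+Helvetica-Bold",
        "base_font": "Helvetica-Bold",
        "subtype": "TrueType",
        "is_embedded": True,
        "font_bytes": 18432,
        "is_subset": True,
        "encoding": "WinAnsiEncoding",
        "has_to_unicode": True,
        "obj_id": 12,
        "usages": {"F1": [1, 2]},
    },
    "27": {
        "name": "Times-Roman",
        "base_font": "Times-Roman",
        "subtype": "Type1",
        "is_embedded": False,
        "font_bytes": 0,
        "is_subset": False,
        "encoding": "Standard",
        "has_to_unicode": False,
        "obj_id": 27,
        "usages": {"F2": [3]},
    },
}


def fonts_needing_attention(fonts):
    """Return (base_font, reasons) pairs for fonts that are
    un-embedded or that lack a /ToUnicode CMap."""
    flagged = []
    for record in fonts.values():
        reasons = []
        if not record.get("is_embedded", False):
            reasons.append("not embedded")
        if not record.get("has_to_unicode", False):
            reasons.append("no ToUnicode CMap")
        if reasons:
            flagged.append((record.get("base_font", "?"), reasons))
    return flagged


if __name__ == "__main__":
    for base_font, reasons in fonts_needing_attention(SAMPLE):
        print(f"{base_font}: {', '.join(reasons)}")
```

If real output nests the records differently, only the loop over fonts.values() should need adjusting.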

You can optionally provide page specifications to limit inspection to specific pages.

Examples

Print font metadata for in.pdf to console

pdftl in.pdf dump_fonts

Save font metadata for in.pdf to a file

pdftl in.pdf dump_fonts output fonts.json
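
Once the metadata is saved, it can be post-processed with any JSON-aware tool. As a rough sketch, the Python below totals the font_bytes field across all fonts in a saved file. The total_embedded_bytes helper is hypothetical, and the code assumes the file's top level is a JSON object keyed by font object ID, which may not match the real layout exactly.

```python
import json
from pathlib import Path


def total_embedded_bytes(path):
    """Sum font_bytes over every font record in a saved dump_fonts file."""
    fonts = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: the top level is an object keyed by font object ID,
    # and each value is a record with an optional font_bytes field.
    return sum(record.get("font_bytes", 0) for record in fonts.values())


if __name__ == "__main__":
    # fonts.json is the file written by the command above.
    print(total_embedded_bytes("fonts.json"))
```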

Print font metadata for pages 1, 2, 3, and 4

pdftl in.pdf dump_fonts 1 2-4

Tags: info, metadata, fonts

Source: pdftl.operations.dump_fonts

Read online: https://pdftl.readthedocs.io/en/latest/operations/dump_fonts.html

Type: Operation
