dump_fonts

Extract font metadata

Usage

pdftl <input> dump_fonts [<spec>...] [output <output>]

Details

The dump_fonts operation extracts structural and layout metadata for every font, embedded or not, that is defined in the document’s page resources.

It outputs a normalized JSON object that groups fonts by their PDF object IDs. Each font entry includes:

  • name: Raw PostScript name of the font exactly as it appears in the PDF (including subset prefix)

  • base_font: Cleaned PostScript name of the font (e.g., Helvetica-Bold)

  • subtype: The PDF font subtype (e.g., TrueType, Type0, Type1, Type3)

  • is_embedded: True if the font program is stored as a stream inside the PDF

  • font_bytes: Size in bytes of the embedded font stream as stored in the PDF, that is, compressed (0 if the font is not embedded)

  • is_subset: True if the font has been structurally subsetted to reduce file size

  • encoding: Character encoding or CMap used (e.g., WinAnsiEncoding, Identity-H, Standard)

  • has_to_unicode: True if a /ToUnicode translation CMap exists (crucial for reliable text extraction)

  • traits: Decoded stylistic metadata dictionary extracted from the font’s descriptor bitmask

  • metrics: Typographic metrics (such as ascent, descent, and italic angle); only keys present in the PDF font descriptor are included

  • obj_id: Object number of the font’s indirect object in the PDF

  • usages: A dictionary mapping the local resource name (e.g., “F1”) to an array of pages where it appears.
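
As background for the name, base_font, is_subset, and traits fields: the PDF specification marks a subsetted font by prefixing its name with six uppercase letters and a plus sign, and it stores style flags as a bitmask in the font descriptor's /Flags entry. The Python sketch below illustrates those two conventions only. It is not pdftl's implementation; the function names and the trait key names are hypothetical, so the keys pdftl actually emits may differ.

```python
import re

# Six uppercase letters followed by "+" mark a subset font (PDF convention).
SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")

# Font descriptor /Flags bits as defined by the PDF specification.
# The key names are illustrative; pdftl's "traits" keys may differ.
FLAG_BITS = {
    "fixed_pitch": 1 << 0,
    "serif": 1 << 1,
    "symbolic": 1 << 2,
    "script": 1 << 3,
    "nonsymbolic": 1 << 5,
    "italic": 1 << 6,
    "all_cap": 1 << 16,
    "small_cap": 1 << 17,
    "force_bold": 1 << 18,
}


def split_subset_prefix(name: str) -> tuple[str, bool]:
    """Return (name without subset prefix, whether a prefix was present)."""
    raw = name.lstrip("/")  # tolerate a leading slash from PDF name syntax
    cleaned = SUBSET_PREFIX.sub("", raw)
    return cleaned, cleaned != raw


def decode_flags(flags: int) -> dict[str, bool]:
    """Turn a /Flags integer into a dictionary of named booleans."""
    return {trait: bool(flags & bit) for trait, bit in FLAG_BITS.items()}


print(split_subset_prefix("ABCDEF+Helvetica-Bold"))  # ('Helvetica-Bold', True)
print(decode_flags(34))  # 34 = 32 + 2, so serif and nonsymbolic are set
```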

You can optionally provide page specifications to limit inspection to specific pages.

Examples

Print font metadata for in.pdf to the console

pdftl in.pdf dump_fonts

Save font metadata for in.pdf to a file

pdftl in.pdf dump_fonts output fonts.json
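
A saved report can then be checked by a script. The Python sketch below flags fonts that are not embedded or that lack a /ToUnicode CMap. It assumes the top-level JSON object maps each object ID to one font entry carrying the fields listed under Details; that layout is inferred from the description, not confirmed. The function find_problem_fonts and the sample data are hypothetical.

```python
import json


def find_problem_fonts(report: dict) -> list[str]:
    """Return one warning per issue that may hinder rendering or text extraction."""
    warnings = []
    for obj_id, font in report.items():
        label = font.get("base_font") or font.get("name") or obj_id
        if not font.get("is_embedded", False):
            warnings.append(f"{label} (object {obj_id}): not embedded")
        if not font.get("has_to_unicode", False):
            warnings.append(f"{label} (object {obj_id}): no /ToUnicode CMap")
    return warnings


# Hypothetical sample shaped like the assumed output; real output may differ.
sample = json.loads("""
{
  "12": {"base_font": "Helvetica-Bold", "is_embedded": false, "has_to_unicode": false},
  "15": {"base_font": "DejaVuSans", "is_embedded": true, "has_to_unicode": true}
}
""")

for line in find_problem_fonts(sample):
    print(line)
```

To check a real report, load fonts.json with json.load instead of using the inline sample.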

Print font metadata for pages 1, 2, 3, and 4

pdftl in.pdf dump_fonts 1 2-4
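
To illustrate how the two specs in that command cover pages 1 through 4, here is a minimal Python sketch that expands plain page numbers and N-M ranges. It is not pdftl's parser: expand_specs is a hypothetical helper, and any richer page-spec syntax that pdftl may accept is deliberately ignored.

```python
def expand_specs(specs: list[str]) -> list[int]:
    """Expand specs such as "1" or "2-4" into page numbers, in the order given."""
    pages = []
    for spec in specs:
        first, sep, last = spec.partition("-")
        start = int(first)
        end = int(last) if sep else start  # a bare number is a one-page range
        pages.extend(range(start, end + 1))
    return pages


print(expand_specs(["1", "2-4"]))  # [1, 2, 3, 4]
```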

Tags: info, metadata, fonts

Source: pdftl.operations.dump_fonts

Read online: https://pdftl.readthedocs.io/en/latest/operations/dump_fonts.html

Type: Operation