`dump_images`

Extract PDF embedded image metadata to JSON

Usage

pdftl <input> dump_images [<spec>...] [output <output>]

Details

The dump_images operation extracts metadata about embedded images in a PDF file.

It traverses the PDF’s content streams (including nested Form XObjects) to correctly calculate the absolute bounding boxes of all drawn images using the Current Transformation Matrix (CTM).
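The bounding-box calculation can be illustrated with a minimal Python sketch. It is not pdftl's actual implementation. It only assumes the standard PDF conventions that an image is painted into the unit square and that the `cm` operator premultiplies the CTM. The helper names `concat` and `image_bbox` are hypothetical.

```python
from typing import List, Tuple

# A PDF matrix [a b c d e f], stored as a 6-tuple (illustrative type alias).
Matrix = Tuple[float, float, float, float, float, float]


def concat(m: Matrix, ctm: Matrix) -> Matrix:
    """Return m x ctm, the new CTM after a `cm` operator with operand m."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = ctm
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def image_bbox(ctm: Matrix) -> List[float]:
    """Map the unit square through ctm and return [x_min, y_min, x_max, y_max]."""
    a, b, c, d, e, f = ctm
    corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
    xs = [a * x + c * y + e for x, y in corners]
    ys = [b * x + d * y + f for x, y in corners]
    return [min(xs), min(ys), max(xs), max(ys)]


# Page content: "1 0 0 1 72 72 cm" followed by "200 0 0 100 0 0 cm", then "Do".
ctm = concat((200, 0, 0, 100, 0, 0), (1, 0, 0, 1, 72, 72))
print(image_bbox(ctm))  # [72, 72, 272, 172]
```

Taking the minimum and maximum over all four corners keeps the result correct when the matrix rotates or flips the image. A nested Form XObject would contribute its own Matrix entry through the same `concat` step.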

The output is a JSON object containing page-level image metadata. Each image entry includes:

name: Internal PDF resource name
obj_id: PDF object number (shared across pages if the same image is reused)
bbox: Absolute bounding box coordinates [x_min, y_min, x_max, y_max] in PDF points
width_px: Native image width in pixels
height_px: Native image height in pixels
ppi_x: Horizontal resolution in pixels per inch, derived from bbox and pixel dimensions
ppi_y: Vertical resolution in pixels per inch, derived from bbox and pixel dimensions
colorspace: Resolved color space descriptor — includes family, ICC profile details, colorant names for spot colors, and alternate space where applicable
bits: Bit depth per component
stream_bytes: Compressed stream size in bytes as stored in the PDF
format: Compression filter, e.g. flatedecode (PNG-style), dctdecode (JPEG)
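As a rough illustration of how ppi_x and ppi_y relate to bbox, width_px, and height_px, the sketch below divides the pixel dimensions by the placed size in inches (1 inch = 72 PDF points). It assumes an unrotated placement and is not necessarily how pdftl computes the values. The function name `effective_ppi` is hypothetical.

```python
from typing import Sequence, Tuple


def effective_ppi(
    bbox: Sequence[float], width_px: int, height_px: int
) -> Tuple[float, float]:
    """Estimate (ppi_x, ppi_y) from a bbox in PDF points and native pixel size."""
    x_min, y_min, x_max, y_max = bbox
    width_pt = x_max - x_min
    height_pt = y_max - y_min
    if width_pt <= 0 or height_pt <= 0:
        raise ValueError("bbox must have positive width and height")
    # 72 points per inch, so pixels per inch = pixels * 72 / points.
    return width_px * 72.0 / width_pt, height_px * 72.0 / height_pt


# A 600 x 300 pixel image placed in a 200 x 100 point box.
print(effective_ppi([72, 72, 272, 172], 600, 300))  # (216.0, 216.0)
```

The same image drawn in a larger box yields a lower ppi, which is why the values belong to each placement and not to the image stream itself.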

Note: If the same image object is drawn multiple times (e.g. as a tiling pattern), it will appear once per placement with its own bbox and ppi values. The obj_id field can be used to identify duplicate placements of the same underlying stream.
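One way to find repeated placements is to group entries by obj_id. The sketch below works on a flat list of placement records that carry the fields listed above. The exact nesting of the real JSON output is not shown here, so the sample data and the function name `repeated_images` are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def repeated_images(
    placements: Iterable[Dict[str, Any]]
) -> Dict[int, List[Dict[str, Any]]]:
    """Return only the obj_ids that are drawn more than once, with their placements."""
    by_obj: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for placement in placements:
        by_obj[placement["obj_id"]].append(placement)
    return {obj_id: items for obj_id, items in by_obj.items() if len(items) > 1}


# Made-up records: object 12 is drawn twice, object 15 once.
sample = [
    {"name": "/Im0", "obj_id": 12, "bbox": [0, 0, 100, 100]},
    {"name": "/Im0", "obj_id": 12, "bbox": [100, 0, 200, 100]},
    {"name": "/Im1", "obj_id": 15, "bbox": [50, 200, 250, 300]},
]
for obj_id, items in repeated_images(sample).items():
    print(obj_id, len(items))  # 12 2
```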

You can optionally provide page specifications to limit extraction to specific pages. You can also filter by resolution by providing min_dpi=<n> or max_dpi=<n> as arguments.

Examples

Print image metadata for in.pdf to console

pdftl in.pdf dump_images

Save image metadata for in.pdf to a file

pdftl in.pdf dump_images output images.json

Save image metadata for in.pdf to a file and save a copy of in.pdf

pdftl in.pdf dump_images output images.json --- output copy.pdf

Print image metadata for pages 1, 3, 4, and 5

pdftl in.pdf dump_images 1 3-5

List only images whose resolution does not exceed 150 DPI

pdftl in.pdf dump_images max_dpi=150

Tags: info, metadata, images

Source: pdftl.operations.dump_images

Read online: https://pdftl.readthedocs.io/en/stable/operations/dump_images.html

Type: Operation
