dump_images

Extract PDF embedded image metadata to JSON

Usage

pdftl <input> dump_images [<spec>...] [output <output>]

Details

The dump_images operation extracts metadata about embedded images in a PDF file.

It traverses the PDF’s content streams (including nested Form XObjects) to correctly calculate the absolute bounding boxes of all drawn images using the Current Transformation Matrix (CTM).
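As an illustration of the bounding-box calculation described above, here is a minimal Python sketch. It is not pdftl's actual code. It relies on two facts from the PDF imaging model: an image is painted into the unit square, and a matrix [a b c d e f] maps a point (x, y) to (a*x + c*y + e, b*x + d*y + f). The function names concat and image_bbox are hypothetical.

```python
def concat(inner, outer):
    """Combine two PDF matrices [a, b, c, d, e, f]: apply inner first, then outer.

    This is how a matrix set inside a Form XObject combines with the
    matrix that was in effect when the form was drawn.
    """
    a1, b1, c1, d1, e1, f1 = inner
    a2, b2, c2, d2, e2, f2 = outer
    return [
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    ]


def image_bbox(ctm):
    """Return [x_min, y_min, x_max, y_max] of the unit square under ctm."""
    a, b, c, d, e, f = ctm
    corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
    xs = [a * x + c * y + e for x, y in corners]
    ys = [b * x + d * y + f for x, y in corners]
    return [min(xs), min(ys), max(xs), max(ys)]


# An image scaled to 200 x 100 points inside a form that is itself
# translated by (50, 60) on the page.
image_matrix = [200, 0, 0, 100, 0, 0]
form_matrix = [1, 0, 0, 1, 50, 60]
page_bbox = image_bbox(concat(image_matrix, form_matrix))
```

Taking the minimum and maximum over all four corners keeps the result axis-aligned even when the matrix rotates or flips the image.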

The operation outputs a JSON object containing page-level image metadata, including:

  • name: Internal PDF resource name

  • obj_id: PDF object number (shared across pages if the same image is reused)

  • bbox: Absolute bounding box coordinates [x_min, y_min, x_max, y_max] in PDF points

  • width_px: Native image width in pixels

  • height_px: Native image height in pixels

  • ppi_x: Horizontal resolution in pixels per inch, derived from bbox and pixel dimensions

  • ppi_y: Vertical resolution in pixels per inch, derived from bbox and pixel dimensions

  • colorspace: Colorspace family, e.g. /DeviceRGB, /DeviceCMYK, /ICCBased

  • bits: Bit depth per component

  • stream_bytes: Compressed stream size in bytes as stored in the PDF

  • format: Compression filter, e.g. flatedecode (PNG-style), dctdecode (JPEG)
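To show how ppi_x and ppi_y relate to bbox and the pixel dimensions, here is a minimal Python sketch. It assumes the straightforward derivation (pixels divided by the bbox extent in inches, at 72 PDF points per inch); the exact formula pdftl uses, for example for rotated images, is not stated here. The function name ppi_from_bbox is hypothetical.

```python
POINTS_PER_INCH = 72.0  # one PDF point is 1/72 inch


def ppi_from_bbox(bbox, width_px, height_px):
    """Return (ppi_x, ppi_y) for an image placement.

    bbox is [x_min, y_min, x_max, y_max] in PDF points.
    """
    x_min, y_min, x_max, y_max = bbox
    width_in = (x_max - x_min) / POINTS_PER_INCH
    height_in = (y_max - y_min) / POINTS_PER_INCH
    if width_in <= 0 or height_in <= 0:
        raise ValueError("bbox must have positive width and height")
    return width_px / width_in, height_px / height_in


# A 600 x 300 pixel image drawn 144 x 72 points (2 x 1 inches) large.
ppi_x, ppi_y = ppi_from_bbox([50, 60, 194, 132], 600, 300)
```

Under this assumption, drawing the same image at a larger size lowers its effective ppi, which is why each placement reports its own values.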

Note: If the same image object is drawn multiple times (e.g. as a tiling pattern), it will appear once per placement with its own bbox and ppi values. The obj_id field can be used to identify duplicate placements of the same underlying stream.
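As a sketch of using obj_id to find repeated placements, the Python snippet below groups placement records by that field. It assumes you have already collected the per-image records from the JSON output into a flat list of dicts; the layout of the surrounding JSON is not described here, so the sample records are hypothetical and carry only a few of the documented fields.

```python
from collections import defaultdict


def group_by_obj_id(placements):
    """Map each obj_id to the list of placements that draw it."""
    groups = defaultdict(list)
    for placement in placements:
        groups[placement["obj_id"]].append(placement)
    return dict(groups)


def reused_images(placements):
    """Return only the obj_ids that are drawn more than once."""
    groups = group_by_obj_id(placements)
    return {obj_id: items for obj_id, items in groups.items() if len(items) > 1}


# Hypothetical records: object 12 is drawn twice, object 15 once.
sample = [
    {"name": "/Im0", "obj_id": 12, "bbox": [0, 0, 100, 100]},
    {"name": "/Im0", "obj_id": 12, "bbox": [100, 0, 200, 100]},
    {"name": "/Im1", "obj_id": 15, "bbox": [50, 200, 250, 300]},
]
duplicates = reused_images(sample)
```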

You can optionally provide page specifications to limit extraction to specific pages.

Examples

Print image metadata for in.pdf to console

pdftl in.pdf dump_images

Save image metadata for in.pdf to a file

pdftl in.pdf dump_images output images.json

Save image metadata for in.pdf to a file and save a copy of in.pdf

pdftl in.pdf dump_images output images.json --- output copy.pdf

Print image metadata for pages 1, 3, 4, and 5

pdftl in.pdf dump_images 1 3-5

Tags: info, metadata, images

Source: pdftl.operations.dump_images

Read online: https://pdftl.readthedocs.io/en/stable/operations/dump_images.html

Type: Operation