`dump_streams`

Dump page content streams as seen by replace

Usage

pdftl <input> dump_streams [normalize=false] [recurse=false] [resources=true] [annotate] [<page_spec>...] [output <output>]

Details

The dump_streams operation outputs page content streams in the same form that the replace operation operates on: by default normalized (one PDF operator per line), with Form XObjects recursively included.

This is the primary tool for crafting a regular expression to pass to replace. Instead of reaching for mutool show or an external PDF inspector, run dump_streams to see exactly the text that replace will match against.
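
As a rough illustration of that workflow, a candidate pattern can be tried against dumped text before it is handed to replace. The sketch below uses Python's re module on a made-up normalized stream. The sample text and the pattern are hypothetical, and whether replace interprets regular expressions exactly as Python's re does is an assumption here, not something this page states.

```python
import re

# Hypothetical excerpt of normalized dump_streams output (one operator per line).
# Real output for your PDF will differ; paste your own dump here.
sample_dump = """\
BT
/F1 12 Tf
72 720 Td
(Draft copy) Tj
ET
"""

# Candidate pattern: a literal string operand followed by the Tj (show text) operator.
pattern = re.compile(r"^\(Draft copy\) Tj$", re.MULTILINE)

# Collect every line of the dump that the candidate pattern matches.
matches = [m.group(0) for m in pattern.finditer(sample_dump)]
print(matches)
```

If the list comes back empty, the pattern does not match the text as dumped, which is usually a sign that the operand or operator is laid out differently than expected.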

Options

normalize=false — output the raw, un-normalized stream bytes as stored in the PDF, instead of the normalized form. Annotation is suppressed when normalization is disabled.
recurse=false — restrict output to top-level page content streams only, skipping Form XObjects. Mirrors the same flag on replace.
resources=true — pretty-print the resource dictionary associated with each Page and Form XObject. Useful for inspecting the Font and Form XObject maps.
annotate — append a PDF-style % comment to each operator line explaining what the operator does (e.g. % show/text: Show text). Particularly useful when learning the PDF content stream format or hunting for the right operator to target with replace.

Output format

Each content stream is preceded by a labelled header block:

================
=== Page <N>
================

For Form XObjects:

============================================
=== Page <N> / XObject <name> (<obj>:<gen>)
============================================

When an XObject is shared across multiple pages, a warning appears in the header identifying the other pages that reference it.
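
For scripted post-processing, the header blocks make the dump easy to split into one section per stream. The following is a minimal sketch that assumes the two header layouts shown above; the function name split_dump and the sample text are hypothetical, and the layout of the shared-XObject warning is not specified here, so the sketch does not treat it specially.

```python
import re

# Header title lines as described above:
#   === Page <N>
#   === Page <N> / XObject <name> (<obj>:<gen>)
TITLE = re.compile(
    r"^=== Page (?P<page>\d+)"
    r"(?: / XObject (?P<name>\S+) \(\d+:\d+\))?"
)
RULE = re.compile(r"^=+$")  # the rows of '=' above and below each title


def split_dump(text):
    """Split dump text into (label, body_lines) pairs, one per stream."""
    sections = []
    for line in text.splitlines():
        m = TITLE.match(line)
        if m:
            # Start a new section; xobject is None for top-level page streams.
            label = {"page": int(m["page"]), "xobject": m["name"]}
            sections.append((label, []))
        elif RULE.match(line):
            continue  # decorative rule line, not stream content
        elif sections:
            sections[-1][1].append(line)
    return sections


# Made-up sample in the documented header format.
sample = """\
================
=== Page 1
================
BT
(Hello) Tj
ET
============================================
=== Page 1 / XObject /Fm0 (12:0)
============================================
0 0 100 50 re
f
"""

for label, body in split_dump(sample):
    print(label, len(body))
```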

Stream content follows as decoded text (latin-1). Annotation comments, when requested, use standard PDF % comment syntax so the output remains valid PDF content stream text.
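
One property that makes latin-1 a natural fit for displaying arbitrary stream bytes (a general fact about the codec, not a statement about pdftl's internals) is that it maps every byte value to exactly one code point. Decoding therefore never fails, and encoding the text again recovers the original bytes. A small check:

```python
# Every possible byte value, 0x00 through 0xFF.
raw = bytes(range(256))

# latin-1 maps byte N to code point U+00NN, so decoding cannot fail.
text = raw.decode("latin-1")

# Encoding the text again yields the original bytes unchanged.
round_tripped = text.encode("latin-1")
print(round_tripped == raw)
```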

Page specification

Standard page specs are supported (e.g. 1, 2-4, 1 3-5). Default is all pages.
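
To make the examples concrete, the sketch below expands a list of specs into page numbers. It is illustrative only: the function name expand_page_specs is hypothetical, it handles just the "N" and "N-M" forms shown above, and pdftl's own parser may accept more forms and behave differently at the edges.

```python
def expand_page_specs(specs, page_count):
    """Expand specs such as ["1", "3-5"] into a list of 1-based page numbers.

    Illustrative only: handles just the "N" and "N-M" forms.
    An empty spec list means all pages, matching the documented default.
    """
    if not specs:
        return list(range(1, page_count + 1))
    pages = []
    for spec in specs:
        # "3-5" -> first="3", last="5"; "1" -> first="1", last=""
        first, _, last = spec.partition("-")
        start = int(first)
        end = int(last) if last else start
        pages.extend(range(start, end + 1))
    return pages


print(expand_page_specs(["1", "3-5"], page_count=10))
print(expand_page_specs([], page_count=3))
```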

Relationship to `replace`

dump_streams intentionally mirrors replace’s behavior:

Behavior                      `replace`       `dump_streams`
Normalizes page streams       yes             yes (default)
Normalizes XObject streams    yes             yes (default)
Recurses into Form XObjects   yes (default)   yes (default)

Examples

Print normalized content streams for all pages to stdout

pdftl in.pdf dump_streams

Dump page content streams along with their pretty-printed resource blocks

pdftl in.pdf dump_streams resources=true

Dump normalized content streams for pages 1-3 to a file

pdftl in.pdf dump_streams 1-3 output streams.txt

Dump streams with operator annotations to help write a replace spec

pdftl in.pdf dump_streams annotate

Dump the raw (un-normalized) content stream for page 1

pdftl in.pdf dump_streams normalize=false 1

Dump only top-level page content streams, skipping Form XObjects

pdftl in.pdf dump_streams recurse=false

Tags: info, content_stream, replace

Source: pdftl.operations.dump_streams

Read online: https://pdftl.readthedocs.io/en/latest/operations/dump_streams.html

Type: Operation
