dump_streams

Dump page content streams as seen by replace

Usage

pdftl <input> dump_streams [normalize=false] [recurse=false] [resources=true] [annotate] [<page_spec>...] [output <output>]

Details

The dump_streams operation outputs page content streams in the same form that the replace operation operates on: by default normalized (one PDF operator per line), with Form XObjects recursively included.

This is the primary tool for crafting a regular expression to pass to replace. Instead of reaching for mutool show or an external PDF inspector, run dump_streams to see exactly the text that replace will match against.
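One way to work with the dump is to prototype the pattern offline against a few copied lines before handing it to replace. The sketch below is illustrative only: the sample text is hand-written rather than taken from a real file, and it assumes that replace accepts Python-compatible regular expressions and that the normalizer separates operands and operators with single spaces. Check both against your own dump_streams output.

```python
import re

# Hand-written stand-in for normalized dump_streams output
# (one operator per line). Not taken from a real PDF.
sample = "\n".join([
    "BT",
    "/F1 12 Tf",
    "72 720 Td",
    "(Draft) Tj",
    "ET",
])

# Candidate pattern: a literal-string operand followed by the Tj operator,
# anchored to a whole line. The outer parentheses are escaped because they
# delimit PDF string literals rather than regex groups.
pattern = re.compile(r"^\((Draft)\) Tj$", re.MULTILINE)

# Which lines would the pattern hit?
matches = [m.group(0) for m in pattern.finditer(sample)]
print(matches)

# Preview the effect of a substitution on the sample text.
replaced = pattern.sub("(Final) Tj", sample)
print(replaced)
```

Anchoring with ^ and $ in multi-line mode takes advantage of the one-operator-per-line layout, so the pattern cannot accidentally span two operators.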

Options

  • normalize=false — output the raw, un-normalized stream bytes as stored in the PDF, instead of the normalized form. Annotation is suppressed when normalization is disabled.

  • recurse=false — restrict output to top-level page content streams only, skipping Form XObjects. Mirrors the same flag on replace.

  • resources=true — pretty-print the resource dictionary associated with each page and Form XObject. Useful for inspecting the font and XObject name mappings that the content stream refers to.

  • annotate — append a PDF-style % comment to each operator line explaining what the operator does (e.g. % show/text: Show text). Particularly useful when learning the PDF content stream format or hunting for the right operator to target with replace.

Output format

Each content stream is preceded by a labelled header block:

================
=== Page <N>
================

For Form XObjects:

============================================
=== Page <N> / XObject <name> (<obj>:<gen>)
============================================

When an XObject is shared across multiple pages, a warning appears in the header identifying the other pages that reference it.

Stream content follows as decoded text (latin-1). Annotation comments, when requested, use standard PDF % comment syntax so the output remains valid PDF content stream text.
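Because the headers follow a fixed layout, a saved dump can be split back into per-stream sections for scripting. The following is a minimal sketch, not part of pdftl: split_sections is a hypothetical helper, the sample text is hand-written, and the parser assumes the header layout is exactly as shown above, with rule lines made only of = characters. Any extra line inside a header block (such as the shared-XObject warning, whose exact wording is not specified here) is collected under "notes".

```python
import re

# Title line of a header block, per the layout documented above.
TITLE = re.compile(
    r"^=== Page (?P<page>\d+)"
    r"(?: / XObject (?P<name>\S+) \((?P<obj>\d+):(?P<gen>\d+)\))?"
)
# Rule line: nothing but '=' characters (assumed to be at least four).
RULE = re.compile(r"^={4,}$")


def split_sections(text):
    """Split dump text into dicts with page/name/obj/gen, notes and body."""
    sections = []
    in_header = False
    for line in text.splitlines():
        if RULE.match(line):
            in_header = not in_header  # rules open and close a header block
        elif in_header:
            m = TITLE.match(line)
            if m:
                sections.append({**m.groupdict(), "notes": [], "body": []})
            elif sections:
                sections[-1]["notes"].append(line)  # e.g. a warning line
        elif sections:
            sections[-1]["body"].append(line)
    return sections


# Hand-written sample in the documented layout; not real pdftl output.
sample = "\n".join([
    "================",
    "=== Page 1",
    "================",
    "BT",
    "(Hello) Tj",
    "ET",
    "============================================",
    "=== Page 1 / XObject /Fm0 (12:0)",
    "============================================",
    "q",
    "Q",
])

sections = split_sections(sample)
for s in sections:
    print(s["page"], s["name"], len(s["body"]))
```

Page, object and generation numbers are left as strings, and name is None for a top-level page stream, so callers can tell pages and Form XObjects apart.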

Page specification

Standard page specs are supported (e.g. 1, 2-4, 1 3-5). Default is all pages.

Relationship to replace

dump_streams intentionally mirrors replace’s behavior:

Behavior                      replace          dump_streams
Normalizes page streams       yes              yes (default)
Normalizes XObject streams    yes              yes (default)
Recurses into Form XObjects   yes (default)    yes (default)

Examples

Print normalized content streams for all pages to stdout

pdftl in.pdf dump_streams

Dump page content streams along with their pretty-printed resource blocks

pdftl in.pdf dump_streams resources=true

Dump normalized content streams for pages 1-3 to a file

pdftl in.pdf dump_streams 1-3 output streams.txt

Dump streams with operator annotations to help write a replace spec

pdftl in.pdf dump_streams annotate

Dump the raw (un-normalized) content stream for page 1

pdftl in.pdf dump_streams normalize=false 1

Dump only top-level page content streams, skipping Form XObjects

pdftl in.pdf dump_streams recurse=false

Tags: info, content_stream, replace

Source: pdftl.operations.dump_streams

Read online: https://pdftl.readthedocs.io/en/latest/operations/dump_streams.html

Type: Operation