dump_streams
Dump page content streams as seen by replace
Usage
pdftl <input> dump_streams [normalize=false] [recurse=false] [resources=true] [annotate] [<page_spec>...] [output <output>]
Details
The dump_streams operation outputs page content streams in the same form
that the replace operation operates on: by default normalized (one PDF
operator per line), with Form XObjects recursively included.
This is the primary tool for crafting a regular expression to pass to
replace. Instead of reaching for mutool show or an external PDF
inspector, run dump_streams to see exactly the text that replace will
match against.
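As a rough illustration of this workflow, the sketch below tries a candidate pattern against dumped stream text before handing it to replace. The sample stream is made up for illustration, and the sketch assumes replace accepts Python-style regular expressions, which this page does not confirm.

```python
import re

# Hypothetical normalized output (one PDF operator per line), as might be
# saved with: pdftl in.pdf dump_streams 1 output streams.txt
sample_dump = """\
BT
/F1 12 Tf
72 720 Td
(Hello World) Tj
ET
"""

# Candidate pattern: a literal-string text-show operator on its own line.
# Assumption: the string contains no nested or escaped parentheses.
pattern = re.compile(r"^\((?P<text>[^()]*)\) Tj$", re.MULTILINE)

# Collect the text of every match so the pattern can be checked by eye.
matches = [m.group("text") for m in pattern.finditer(sample_dump)]
print(matches)
```

If the pattern matches more or less than intended, adjust it against the dump and rerun until it is right.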
Options
normalize=false: output the raw, un-normalized stream bytes as stored in the PDF, instead of the normalized form. Annotation is suppressed when normalization is disabled.
recurse=false: restrict output to top-level page content streams only, skipping Form XObjects. Mirrors the same flag on replace.
resources=true: pretty-print the resource dictionary associated with each page and Form XObject. Useful for inspecting the Font and Form XObject maps.
annotate: append a PDF-style % comment to each operator line explaining what the operator does (e.g. % show/text: Show text). Particularly useful when learning the PDF content stream format or hunting for the right operator to target with replace.
Output format
Each content stream is preceded by a labelled header block:
================
=== Page <N>
================
For Form XObjects:
============================================
=== Page <N> / XObject <name> (<obj>:<gen>)
============================================
When an XObject is shared across multiple pages, a warning appears in the header identifying the other pages that reference it.
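The header blocks above make a saved dump easy to split into per-stream sections. The following is a minimal sketch, not part of pdftl: the names `split_sections` and `Section` are hypothetical, the sample text is invented, and the XObject name is assumed to contain no spaces. Because the format of the shared-XObject warning is not given here, any extra header text would end up in the section body.

```python
import re
from typing import List, NamedTuple, Optional

# Matches "=== Page <N>" with an optional " / XObject <name> (<obj>:<gen>)".
HEADER = re.compile(
    r"^=== Page (?P<page>\d+)"
    r"(?: / XObject (?P<name>\S+) \((?P<obj>\d+):(?P<gen>\d+)\))?"
)
# Matches the rule lines made only of "=" characters.
RULE = re.compile(r"^={4,}\s*$")


class Section(NamedTuple):
    page: int
    xobject: Optional[str]  # None for a top-level page stream
    lines: List[str]


def split_sections(dump_text: str) -> List[Section]:
    """Group dumped lines under the header that precedes them."""
    sections: List[Section] = []
    for line in dump_text.splitlines():
        if RULE.match(line):
            continue  # decorative rule above or below a header
        m = HEADER.match(line)
        if m:
            sections.append(Section(int(m["page"]), m["name"], []))
        elif sections:
            sections[-1].lines.append(line)
    return sections


# Invented sample in the documented header layout.
sample = """\
================
=== Page 1
================
BT
(Hi) Tj
ET
============================================
=== Page 1 / XObject /Fm0 (12:0)
============================================
q
Q
"""

for section in split_sections(sample):
    print(section.page, section.xobject, len(section.lines))
```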
Stream content follows as decoded text (latin-1). Annotation comments,
when requested, use standard PDF % comment syntax so the output
remains valid PDF content stream text.
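A practical consequence of latin-1 decoding is that every byte value maps to exactly one character, so stream bytes survive a decode and re-encode round trip unchanged. The short check below demonstrates that property of the codec itself. It does not show pdftl internals, and whether this is why latin-1 was chosen is not stated on this page.

```python
# Every possible byte value, 0x00 through 0xFF.
raw = bytes(range(256))

# latin-1 maps each byte to the code point with the same number,
# so decoding never fails and re-encoding restores the input.
text = raw.decode("latin-1")
round_trip = text.encode("latin-1")
print(round_trip == raw)

# By contrast, arbitrary bytes are generally not valid UTF-8.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)
```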
Page specification
Standard page specs are supported (e.g. 1, 2-4, 1 3-5).
Default is all pages.
Relationship to replace
dump_streams intentionally mirrors replace’s behavior:
| Behavior | replace | dump_streams |
|---|---|---|
| Normalizes page streams | yes | yes (default) |
| Normalizes XObject streams | yes | yes (default) |
| Recurses into Form XObjects | yes (default) | yes (default) |
Examples
Print normalized content streams for all pages to stdout
pdftl in.pdf dump_streams
Dump page content streams along with their pretty-printed resource blocks
pdftl in.pdf dump_streams resources=true
Dump normalized content streams for pages 1-3 to a file
pdftl in.pdf dump_streams 1-3 output streams.txt
Dump streams with operator annotations to help write a replace spec
pdftl in.pdf dump_streams annotate
Dump the raw (un-normalized) content stream for page 1
pdftl in.pdf dump_streams normalize=false 1
Dump only top-level page content streams, skipping Form XObjects
pdftl in.pdf dump_streams recurse=false
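When scripting these commands, it can help to assemble the argument list programmatically. The helper below is a hypothetical sketch (`dump_streams_argv` is not part of pdftl) that only combines the options documented on this page, in the order given under Usage. It builds the list without running anything.

```python
from typing import List, Optional, Sequence


def dump_streams_argv(
    input_pdf: str,
    pages: Sequence[str] = (),
    normalize: bool = True,
    recurse: bool = True,
    resources: bool = False,
    annotate: bool = False,
    output: Optional[str] = None,
) -> List[str]:
    """Build a pdftl dump_streams command line from documented options."""
    argv = ["pdftl", input_pdf, "dump_streams"]
    if not normalize:
        argv.append("normalize=false")
    if not recurse:
        argv.append("recurse=false")
    if resources:
        argv.append("resources=true")
    if annotate:
        argv.append("annotate")
    argv.extend(pages)  # page specs such as "1" or "2-4"
    if output is not None:
        argv += ["output", output]
    return argv


# Mirrors the "pages 1-3 to a file" example above.
print(dump_streams_argv("in.pdf", pages=["1-3"], output="streams.txt"))
```

The resulting list can be passed to `subprocess.run(argv, check=True)`, assuming pdftl is installed and on the PATH.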
Tags: info, content_stream, replace
Source: pdftl.operations.dump_streams
Read online: https://pdftl.readthedocs.io/en/latest/operations/dump_streams.html
Type: Operation