pdftl

pdftl (“PDF tackle”) is a command-line interface for PDF manipulation written in Python. It provides a pdftk-compatible interface for standard operations while adding extended capabilities for text extraction, image modification, and document analysis.

The core of pdftl relies on pikepdf (a Python binding for qpdf).

Online documentation is available on Read the Docs.

Installation

pdftl requires Python 3.10 or later and runs on Windows, macOS and Linux.

Because pdftl is a command-line tool, the recommended installation method is via pipx, which installs the application into an isolated environment so its dependencies don’t conflict with your system Python.

To install the core package (covers standard pdftk functionality):

pipx install pdftl

Many of the extended features require additional Python dependencies. To install the tool with all optional features enabled:

pipx install "pdftl[full]"

Some features also require system software such as Java.
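The requirements above can be checked before installing. The following is a minimal sketch using only the Python standard library; the helper names `python_ok` and `find_tools` are hypothetical and not part of pdftl, and the executable names `pdftl` and `java` are assumed from the commands and requirements mentioned in this README.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this README


def python_ok(version_info=sys.version_info):
    """Return True if the interpreter meets the stated minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def find_tools(names=("pdftl", "java")):
    """Map each executable name to its location on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

A missing `java` only matters for the features that need it; the core package does not depend on it according to the text above.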

Alternative installation methods

You can also use standard pip if you prefer to manage your own virtual environments.
If you only need specific features, you can name them explicitly, e.g. pipx install "pdftl[signing,optimize-images]". The available feature names are listed under [project.optional-dependencies] in pyproject.toml.

Feature overview

Standard operations

Combine and organize: create, cat, shuffle, insert, and move.
Split: burst, delete, or delete_blank.
Metadata: dump_data, update_info, set.
Attachments: attach_files, unpack_files, dump_files, delete_attachments.
Bookmarks and links: dump_bookmarks, add_bookmarks, delete_bookmarks, update_bookmarks, and dump_dests.
Watermarking: stamp / background, multistamp / multibackground.

Geometry and splitting

Geometry: rotate or zoom.
Clip and crop: crop to margins or standard paper sizes, or clip content outside a given region.
Chop: chop pages into grids or rows.
Layout: Shift, scale, and spin content with place, or automatically deskew.
Montage: montage multiple pages onto a grid layout.
Booklet: Create a printable booklet.

Text, forms and annotations

Search and comparison: grep and diff_text.
Extraction: dump_text, dump_tables, and dump_fonts.
Forms: fill_form, generate_fdf, dump_data_fields and stamp_fields.
Annotations: modify_annots, delete_annots, dump_annots, dump_data_annots, and highlight.
Actions and scripts: dump_actions and delete_actions.
Accessibility and structure: tag for auto-tagging, and dump_tags to inspect the structure tree.

Security

Decryption: input_pw.
Encryption: owner_pw, user_pw, encrypt_aes256, and allow. Inspect with dump_encryption.
Signatures: sign_key, sign_cert, and dump_signatures.

Images and vectors

Information: dump_images and dump_colorspaces.
Conversion: render pages as images.
Optimization: optimize_images, delete_images, resample_images.
Editing: recolor_images, modify_images, and add_images; or export_images, edit externally, and then import_images.
Vectors: simplify_vectors and recolor_vectors.
Barcodes: Generate and place on pages with barcode.

Advanced

Content streams: replace parts of content streams, inject PDF operators, or inspect with dump_streams, edit externally, and re-import with import_streams.
Fonts: export_fonts, edit as needed, and then import_fonts; or embed missing fonts with embed_fonts.
Dynamic text: add_text for Bates stamping, page numbers, etc., and style_text to change appearance.
Cleanup: normalize and linearize.
Layers (OCGs): dump_layers and modify_layers.
Presentations: Remove slide transition frames with unpause.
Plugins: Write custom operations in Python or use mutate_content.

Examples

For additional examples, run pdftl help examples.

Concatenation

# Merge two files
pdftl in1.pdf in2.pdf cat output combined.pdf

# Now with in2.pdf zoomed in
pdftl A=in1.pdf B=in2.pdf cat A Bz1 output combined2.pdf

Geometry

# Take pages 1-5, rotate them 90 degrees East, and crop to A4
pdftl in.pdf cat 1-5east --- crop A4 output out.pdf

Pipelining

You can chain operations without intermediate files using ---:

# Burst a file, but rotate and stamp every page first
pdftl in.pdf rotate south \
  --- stamp watermark.pdf \
  --- burst output page_%04d.pdf

# Merge, crop to letter paper, rotate the last page, and output with encryption
pdftl A=a.pdf B=b.pdf cat A1-5 B2-end \
  --- crop '4-8,12(letter)' \
  --- rotate endright \
  output out.pdf owner_pw foo user_pw bar encrypt_aes256
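When such chains are generated from a script, it can help to assemble the argument list programmatically. The sketch below is a hypothetical helper (`build_command` is not part of pdftl) that joins operation stages with the `---` separator described above and reproduces the second example; it only builds the argument list and does not run anything.

```python
import shlex

STAGE_SEPARATOR = "---"  # separator between chained operations, as described above


def build_command(inputs, stages, output_args):
    """Build an argv list for a chained pdftl invocation.

    inputs: input specs such as ["A=a.pdf", "B=b.pdf"]
    stages: list of operations, each a list of tokens, e.g. ["rotate", "endright"]
    output_args: trailing tokens, e.g. ["output", "out.pdf"]
    """
    argv = ["pdftl", *inputs]
    for index, stage in enumerate(stages):
        if index > 0:
            argv.append(STAGE_SEPARATOR)
        argv.extend(stage)
    return argv + list(output_args)


command = build_command(
    ["A=a.pdf", "B=b.pdf"],
    [["cat", "A1-5", "B2-end"], ["crop", "4-8,12(letter)"], ["rotate", "endright"]],
    ["output", "out.pdf", "owner_pw", "foo", "user_pw", "bar", "encrypt_aes256"],
)
# shlex.join produces a shell-quoted string suitable for logging.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` avoids shell quoting issues with arguments such as `4-8,12(letter)`.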

Forms and metadata

# Fill a form and flatten it (make it non-editable)
pdftl form.pdf fill_form data.fdf flatten output signed.pdf

Modify annotations

# Change all Highlight annotations on odd pages to red
pdftl docs.pdf modify_annots "odd/Highlight(C=[1 0 0])" output red_notes.pdf
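The value `[1 0 0]` has the shape of a PDF annotation color array, in which three numbers between 0 and 1 are read as red, green, and blue. Assuming pdftl passes the array through unchanged, the sketch below converts a familiar hex color into that notation; `hex_to_pdf_color` is a hypothetical helper, not part of pdftl.

```python
def hex_to_pdf_color(hex_color):
    """Convert '#RRGGBB' to a PDF-style color array string such as '[1 0 0]'."""
    digits = hex_color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected six hex digits, got {hex_color!r}")
    # Scale each 0-255 channel to the 0-1 range used by PDF color arrays.
    components = [int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Keep at most three decimals and drop trailing zeros: 1.0 -> "1".
    formatted = [f"{c:.3f}".rstrip("0").rstrip(".") for c in components]
    return "[" + " ".join(formatted) + "]"


# Rebuild the selector string used in the example above.
selector = f"odd/Highlight(C={hex_to_pdf_color('#ff0000')})"
print(selector)
```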

Modify content

# Add a watermark
pdftl in.pdf stamp watermark.pdf output marked1.pdf

# Add a semi-transparent red watermark on odd pages
pdftl in.pdf add_text 'odd/YOUR AD HERE/(position=mid-center, font=Helvetica-Bold, size=72, rotate=45, color=1 0 0 0.5)' output with_ads.pdf

# Add Bates numbering starting at 000121
# Result: DEF-000121, DEF-000122, ...
pdftl in.pdf \
  add_text "/DEF-{page+120:06d}/(position=bottom-center, offset-y=10)" \
  output bates.pdf

# Content stream replacement with regular expressions
# Change black to red
pdftl in.pdf replace '/0 0 0 (RG|rg)/1 0 0 \1/' output redder.pdf
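Two of the examples above rely on notation that can be tried out in plain Python. The sketch below assumes that the `{page+120:06d}` placeholder follows Python's format-spec rules (which the stated result DEF-000121 suggests) and that the `replace` pattern behaves like a Python regular expression; the sample content stream is made up for illustration and pdftl itself is not called.

```python
import re

# Bates labels: page 1 maps to 121, zero-padded to six digits.
labels = [f"DEF-{page + 120:06d}" for page in (1, 2, 3)]
print(labels)  # ['DEF-000121', 'DEF-000122', 'DEF-000123']

# Pattern and replacement from the replace example, applied to a
# made-up fragment that sets black stroke (RG) and fill (rg) colors.
sample_stream = "0 0 0 RG 0 0 0 rg 10 10 100 50 re f"
recolored = re.sub(r"0 0 0 (RG|rg)", r"1 0 0 \1", sample_stream)
print(recolored)  # 1 0 0 RG 1 0 0 rg 10 10 100 50 re f
```

Note that the pattern is unanchored: in this sketch it would also match the tail of `1.0 0 0 rg` and turn it into `1.1 0 0 rg`. A lookbehind such as `(?<![\d.])` in front of the pattern is one way to guard against that, assuming the regex dialect used by `replace` supports it.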

Python API

While pdftl is primarily a CLI tool, it also exposes a Python API for integrating PDF workflows into your scripts. It supports both a functional interface (similar to the CLI) and a fluent interface (for method chaining). The example below uses the fluent interface.

from pdftl import pipeline

# Chain operations fluently without saving intermediate files
(
    pipeline("input.pdf")
    .rotate("right")
    .stamp("watermark.pdf")
    .save("output.pdf")
)
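The same chain can be applied to a whole directory. The sketch below uses only the calls shown in the example above (`pipeline`, `rotate`, `stamp`, `save`) and assumes that they accept plain string paths as shown there; `output_path_for` and `stamp_all` are hypothetical helper names, not part of pdftl.

```python
from pathlib import Path


def output_path_for(source, out_dir):
    """Return the output path for a source PDF: same file name, inside out_dir."""
    return Path(out_dir) / Path(source).name


def stamp_all(in_dir, out_dir, watermark="watermark.pdf"):
    """Rotate and stamp every PDF in in_dir, saving the results into out_dir."""
    # Imported lazily so the path helper above works without pdftl installed.
    from pdftl import pipeline

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for source in sorted(Path(in_dir).glob("*.pdf")):
        target = output_path_for(source, out_dir)
        # One independent pipeline per input file, as in the example above.
        pipeline(str(source)).rotate("right").stamp(watermark).save(str(target))
```

Calling `stamp_all("incoming", "stamped")` would then process each file in turn without intermediate files.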

See the API Tutorial for more details.

A simple server interface to the API is also provided; try it here.

Operations and options

Operation	Description
`add_bookmarks`	Add top-level bookmarks
`add_images`	Stamp user-specified images onto PDF pages
`add_text`	Add user-specified text strings to PDF pages
`attach_files`	Attach files to the output PDF
`background`	Use a 1-page PDF as the background for each page
`barcode`	Generate and add a barcode to pages
`booklet`	Impose pages into printable booklet signatures
`burst`	Split a single PDF into multiple files
`cat`	Concatenate pages from input PDFs into a new PDF
`chop`	Chop pages into multiple smaller pieces
`clip`	Clip page content to a rectangle
`create`	Create a new PDF
`crop`	Crop pages to a rectangle
`delete`	Delete pages from an input PDF
`delete_actions`	Delete action info
`delete_annots`	Delete annotation info
`delete_attachments`	Delete file attachments based on criteria
`delete_blank`	Delete blank or near-blank pages
`delete_bookmarks`	Delete bookmarks
`delete_images`	Delete images
`deskew`	Automatically detect and correct document skew
`diff_text`	Diff the text content of two PDFs and output bounding boxes
`dump_actions`	Dump action info
`dump_annots`	Dump annotation info
`dump_bookmarks`	Extract PDF bookmarks into YAML or JSON
`dump_colorspaces`	Report color spaces used
`dump_data`	Metadata, page and bookmark info (XML-escaped)
`dump_data_annots`	Dump annotation info in pdftk style
`dump_data_fields`	Print PDF form field data with XML-style escaping
`dump_data_fields_utf8`	Print PDF form field data in UTF-8
`dump_data_utf8`	Metadata, page and bookmark info (in UTF-8)
`dump_dests`	Print PDF named destinations data to the console
`dump_encryption`	Print PDF encryption details and permissions
`dump_files`	List file attachments
`dump_fonts`	Extract font metadata
`dump_images`	Extract PDF embedded image metadata to JSON
`dump_layers`	Dump layer info (JSON)
`dump_signatures`	List and validate digital signatures
`dump_streams`	Dump page content streams as seen by `replace`
`dump_tables`	Extract tables to JSON, CSV, or Markdown
`dump_tags`	Inspect the PDF structure tree and reading order
`dump_text`	Print PDF text data to the console or a file
`embed_fonts`	Automatically locate and embed missing system fonts
`export_fonts`	Export fonts and a JSON manifest for external editing
`export_images`	Export images and a JSON manifest for external editing
`fill_form`	Fill a PDF form
`filter`	Do nothing (the default if `<operation>` is absent)
`generate_fdf`	Generate an FDF file containing PDF form data
`grep`	Match text patterns and get bounding boxes
`highlight`	Highlight text matching a regex pattern
`import_fonts`	Import edited fonts from a directory using a JSON manifest
`import_images`	Import edited images from a directory using a JSON manifest
`import_streams`	Import and apply modified content streams
`inject`	Inject code at start or end of page content streams
`insert`	Insert blank pages
`modify_annots`	Modify properties of existing annotations
`modify_images`	Apply in-place image pixel modifications and effects
`modify_layers`	Merge or strip specific layers
`montage`	Impose pages onto a grid layout
`move`	Move pages to a new location
`multibackground`	Use multiple pages as backgrounds
`multistamp`	Stamp multiple pages onto an input PDF
`mutate_content`	Mutate page content streams using a user-supplied Python script
`normalize`	Reformat page content streams
`optimize_images`	Optimize images
`place`	Shift, scale, and spin page content
`recolor_images`	Convert images to grayscale
`recolor_vectors`	Make non-image page content gray
`render`	Render PDF pages as images
`replace`	Regex replacement on page content streams
`resample_images`	Resample images
`rotate`	Rotate pages in a PDF
`server`	Start the pdftl API server
`set`	Set document properties, viewer preferences, and page labels
`shuffle`	Interleave pages from multiple input PDFs
`simplify_vectors`	Reduce vector path complexity
`stamp`	Stamp a 1-page PDF onto each page of an input PDF
`stamp_fields`	Stamp PDF content into form fields
`style_text`	Change appearance of text
`tag`	Auto-tag a PDF for accessibility using OpenDataLoader
`unpack_files`	Unpack file attachments
`unpause`	Remove ‘pause’ frames from a slide deck
`update_bookmarks`	Replace PDF bookmarks from a YAML or JSON file
`update_info`	Update PDF metadata from dump_data instructions
`update_info_utf8`	Update PDF metadata from dump_data_utf8 instructions
`zoom`	Rescale entire pages

Option	Description
`allow <perm>`	Specify permissions for encrypted files
`compress`	Compress output file streams (default)
`drop_info`	Discard document-level info metadata
`drop_xfa`	Discard form XFA data if present
`drop_xmp`	Discard document-level XMP metadata
`encrypt_128bit`	Use 128 bit encryption (obsolete, maybe insecure)
`encrypt_40bit`	Use 40 bit encryption (obsolete, highly insecure)
`encrypt_aes128`	Use 128 bit AES encryption (maybe obsolete)
`encrypt_aes256`	Use 256 bit AES encryption
`fast`	Skip stream recompression for faster saving
`flatten`	Flatten all annotations
`keep_final_id`	Copy final input PDF’s ID metadata to output
`keep_first_id`	Copy first input PDF’s ID metadata to output
`linearize`	Linearize output file(s)
`need_appearances`	Set a form rendering flag in the output PDF
`no_encrypt_metadata`	Leave metadata unencrypted
`output <file>`	The output file path, or a template for burst
`owner_pw <pw>`	Set owner password and encrypt output
`replacement_font <file>`	Replace the font used for all form fields with a TTF file
`sign_cert <file>`	Path to certificate PEM
`sign_field <name>`	Signature field name (default: Signature1)
`sign_key <file>`	Path to private key PEM
`sign_pass_env <var>`	Environment variable with sign_cert passphrase
`sign_pass_prompt`	Prompt for sign_cert passphrase
`uncompress`	Disable compression of output file streams
`user_pw <pw>`	Set user password and encrypt output
`verbose`	Turn on verbose output

Links

License: This project is licensed under the Mozilla Public License 2.0.
Changelog: CHANGELOG.md.
Online documentation: pdftl on Read the Docs.