pdftl


pdftl (“PDF tackle”) is a CLI tool for PDF manipulation written in Python. It is intended to be a command-line compatible extension of the venerable pdftk.

Built on pikepdf (qpdf) and other modern libraries, it adds capabilities such as cropping, chopping, regex text replacement, text overlays, and arbitrary content stream injection.

Quick start

pipx install "pdftl[full]"

# Merge, crop to letter paper, rotate the last page, and encrypt the output, all in one command
pdftl A=a.pdf B=b.pdf cat A1-5 B2-end \
    --- crop '4-8,12(letter)' \
    --- rotate endright \
    output out.pdf owner_pw foo user_pw bar encrypt_aes256

Key features and pdftk compatibility

  • Familiar syntax: Command-line compatible with pdftk. Verified against Mike Haertl’s php-pdftk test suite and the pdftk-java test suite logic, so s/pdftk/pdftl/ should result in working scripts.

  • Pipelining: Chain multiple operations in a single command using ---.

  • Performant: In informal benchmarks, pdftl is faster than pdftk-java for many operations. The likely reason is that pdftl mostly drives pikepdf, which in turn drives qpdf, a fast C++ library.

  • Extended operations: zoom pages, merge while preserving links and outlines, crop or chop pages, extract text, and optimize images.

  • Modern security: Supports AES-256 encryption and modern permission flags out of the box.

  • Content editing: Find & replace text via regular expressions, inject raw PDF operators, or overlay dynamic text.

pdftl maintains command-line compatibility with pdftk while introducing features required for modern PDF workflows.

| Feature | pdftk (Legacy) | pdftl (Modern) |
|---|---|---|
| Pipelining | ❌ (Requires temp files) | ✅ Native (Chain ops with ---) |
| Encryption | ⚠️ (Obsolete RC4) | ✅ AES-256 Support |
| Syntax | Standard | ✅ Compatible Extension |
| Page Geometry | ❌ | ✅ Crop to fit, Zoom, & Chop |
| Pipelined Logic | ❌ | ✅ Rotate + Stamp in one command |
| Plugins | ❌ | ✅ Custom operations/mutation scripts written in Python |
| Installation | Often complex binary | ✅ Simple pipx install pdftl |
| Performance | Variable | ✅ Powered by pikepdf/qpdf |
| Link Integrity | ⚠️ Often breaks TOC/Links | ✅ Preserves internal cross-refs |
| Shell Completion | ⚠️ zsh | ✅ bash, zsh and powershell |
| Help | ⚠️ Basic (manpage) | ✅ Self-documenting: pdftl help <operation/option/topic/tag> |

Installation

Install pipx, and then:

pipx install "pdftl[full]"

A plain pip install "pdftl[full]" is also supported.

Note: The [full] install includes ocrmypdf for image optimization, reportlab for text generation, pypdfium2 for text extraction and robust flattening, and pyHanko for cryptographic signature functionality. Omit [full] to skip those features and their dependencies.
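To see which of the optional features are usable in a given environment, a small check along these lines may help. It is only a sketch: the import names (ocrmypdf, reportlab, pypdfium2, pyhanko) are assumed to match the packages named in the note, the helper name is hypothetical, and the script has to run inside the same environment pdftl was installed into (for pipx, that is pipx's private virtual environment, not the system Python).

```python
from importlib.util import find_spec

# Assumed import names for the packages the note above lists for [full].
OPTIONAL_FEATURES = {
    "ocrmypdf": "image optimization",
    "reportlab": "text generation",
    "pypdfium2": "text extraction and flattening",
    "pyhanko": "cryptographic signatures",
}


def available_extras():
    """Map each optional module name to True if it is importable here."""
    return {name: find_spec(name) is not None for name in OPTIONAL_FEATURES}


if __name__ == "__main__":
    for name, present in available_extras().items():
        status = "available" if present else "missing"
        print(f"{name:10} {status:9} ({OPTIONAL_FEATURES[name]})")
```

This does not inspect pdftl itself; it only reports whether each module can be found on the current import path.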

Key features

📄 Standard operations

✂️ Geometry & splitting

  • Whole-page geometry: rotate pages (absolute or relative) or zoom pages

  • Clip and Crop: crop pages to margins or standard paper sizes (e.g., “A4”), or keep pages unchanged and clip to hide content outside a given region.

  • Chop: chop pages into grids or rows (e.g., split a scanned spread into two pages).

  • Shift, scale and spin page content inside the page boundaries using place.

  • Montage: montage multiple pages onto a grid layout for contact sheets and N-up handouts.

  • Booklet: create a print-ready booklet with optional RTL support and signature splitting.

πŸ“ Forms & annotations

πŸ” Security

πŸ› οΈ Advanced

  • Text replacement: replace text in content streams using regular expressions (experimental).

  • Code injection: inject raw PDF operators at the head/tail of content streams.

  • Images: optimize_images (smart compression via OCRmyPDF), delete_images, dump_images or render PDF to images.

  • Dynamic text: add_text supports Bates stamping and can add page numbers, filenames, timestamps, etc.

  • Cleanup: normalize content streams, linearize for web viewing.

  • Layers (aka OCGs): list, strip or merge PDF layers with dump_layers and modify_layers.

  • Plugins: write your own custom operation in Python, save it to ~/.config/pdftl/operations (*nix) or %APPDATA%\pdftl\config (Windows), and use it in pdftl just like the built-in operations. You can also mutate_content using simple Python scripts.
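The text replacement item above is marked experimental, and a quick look at what a regex does to a raw content stream shows why. The sketch below uses Python's re module on a made-up stream fragment; whether pdftl applies the same regex dialect is an assumption, and the function names and sample data are purely illustrative. It turns black RGB colour operators (0 0 0 rg / 0 0 0 RG) red, first naively and then with a guard against matching the tail of a longer number.

```python
import re

# Naive pattern: black RGB fill (rg) or stroke (RG) colour operator.
NAIVE = re.compile(r"0 0 0 (RG|rg)")
# Stricter variant: refuse to start in the middle of a number such as "10".
STRICT = re.compile(r"(?<![\d.])0 0 0 (RG|rg)\b")
REPLACEMENT = r"1 0 0 \1"  # red, keeping whichever operator was matched


def redden_naive(content: str) -> str:
    return NAIVE.sub(REPLACEMENT, content)


def redden_strict(content: str) -> str:
    return STRICT.sub(REPLACEMENT, content)


# Hypothetical content stream fragment, for illustration only.
sample = "0 0 0 rg\n10 10 100 50 re f\n0 0 0 RG\n0 0 m 50 50 l S\n"
print(redden_strict(sample))

# The naive pattern also rewrites part of an unrelated colour value.
print(redden_naive("10 0 0 rg"))   # 11 0 0 rg
print(redden_strict("10 0 0 rg"))  # 10 0 0 rg
```

Content streams are not plain text in general (colours can be set in other colour spaces, and text is often encoded per font), so patterns that look safe on one file may not carry over to another.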

Examples

For more than 100 further examples, run pdftl help examples.

Concatenation

# Merge two files
pdftl in1.pdf in2.pdf cat output combined.pdf

# Now with in2.pdf zoomed in
pdftl A=in1.pdf B=in2.pdf cat A Bz1 output combined2.pdf

Geometry

# Take pages 1-5, rotate them 90 degrees East, and crop to A4
pdftl in.pdf cat 1-5east --- crop "(a4)" output out.pdf

Pipelining

You can chain operations without intermediate files using ---:

# Burst a file, but rotate and stamp every page first
pdftl in.pdf rotate south \
  --- stamp watermark.pdf \
  --- burst output page_%04d.pdf
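When a chained command like the one above is assembled from a script, it can be easier to build the argument list programmatically and join the stages with ---. The following is a minimal sketch, not part of pdftl: build_pipeline is a hypothetical helper, and it assumes only the command-line shape shown in this README (inputs, then stages separated by ---, then output).

```python
import shlex

SEPARATOR = "---"  # stage separator, as used in the examples above


def build_pipeline(inputs, stages, output):
    """Return an argv list: pdftl <inputs> <stage> --- <stage> ... output <file>."""
    if not stages:
        raise ValueError("at least one stage is required")
    argv = ["pdftl", *inputs]
    for index, stage in enumerate(stages):
        if index:  # separator goes between stages, not before the first
            argv.append(SEPARATOR)
        argv.extend(stage)
    argv.extend(["output", output])
    return argv


argv = build_pipeline(
    inputs=["in.pdf"],
    stages=[["rotate", "south"], ["stamp", "watermark.pdf"], ["burst"]],
    output="page_%04d.pdf",
)
print(shlex.join(argv))
# To execute it (requires pdftl on PATH):
#   import subprocess; subprocess.run(argv, check=True)
```

Passing a list to subprocess.run avoids shell quoting issues with arguments such as page ranges or text specifications that contain spaces or parentheses.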

Forms and metadata

# Fill a form and flatten it (make it non-editable)
pdftl form.pdf fill_form data.fdf flatten output signed.pdf

Modify annotations

# Change all Highlight annotations on odd pages to Red
pdftl docs.pdf modify_annots "odd/Highlight(C=[1 0 0])" output red_notes.pdf

Modify content

# Add a watermark, the pdftk way
pdftl in.pdf stamp watermark.pdf output marked1.pdf
# Add an obnoxious semi-transparent red watermark on odd pages only
pdftl in.pdf add_text 'odd/YOUR AD HERE/(position=mid-center, font=Helvetica-Bold, size=72, rotate=45, color=1 0 0 0.5)' output with_ads.pdf
# Add Bates numbering starting at 000121
# Result: DEF-000121, DEF-000122, ...
pdftl in.pdf \
  add_text "/DEF-{page+120:06d}/(position=bottom-center, offset-y=10)" \
  output bates.pdf
# Content stream replacement with regular expressions (YMMV)
# Change black to red
pdftl in.pdf replace '/0 0 0 (RG|rg)/1 0 0 \1/' output redder.pdf
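The Bates template {page+120:06d} above combines an offset with zero padding. The arithmetic can be reproduced in plain Python as a sanity check before stamping a large document. This is a sketch under two assumptions: that page is the 1-based page number, and that the :06d part follows Python's format-spec rules, which is consistent with the documented result DEF-000121, DEF-000122. The function bates_label is hypothetical and not part of pdftl.

```python
def bates_label(page: int, prefix: str = "DEF-", offset: int = 120, width: int = 6) -> str:
    """Mimic {page+120:06d}: add the offset, then zero-pad to a fixed width."""
    return f"{prefix}{page + offset:0{width}d}"


labels = [bates_label(page) for page in (1, 2, 3)]
print(labels)  # ['DEF-000121', 'DEF-000122', 'DEF-000123']
```

To start numbering at some value N on page 1, the offset is N - 1, which is why the example uses 120 to begin at 000121.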

Python API

While pdftl is primarily a CLI tool, it also exposes a robust Python API for integrating PDF workflows into your scripts. It supports both a Functional interface (similar to the CLI) and a Fluent interface (for method chaining).

from pdftl import pipeline

# Chain operations fluently without saving intermediate files
(
    pipeline("input.pdf")
    .rotate("right")
    .stamp("watermark.pdf")
    .save("output.pdf")
)

See the API Tutorial for more details.

Operations and options

| Operation | Description |
|---|---|
| add_bookmarks | Add top-level bookmarks |
| add_text | Add user-specified text strings to PDF pages |
| attach_files | Attach files to the output PDF |
| background | Use a 1-page PDF as the background for each page |
| booklet | Impose pages into printable booklet signatures |
| burst | Split a single PDF into multiple files |
| cat | Concatenate pages from input PDFs into a new PDF |
| chop | Chop pages into multiple smaller pieces |
| clip | Clip page content to a rectangle |
| crop | Crop pages to a rectangle |
| delete | Delete pages from an input PDF |
| delete_annots | Delete annotation info |
| delete_attachments | Delete file attachments based on criteria |
| delete_bookmarks | Delete bookmarks |
| delete_blank | Delete blank or near-blank pages |
| delete_images | Delete images |
| dump_annots | Dump annotation info |
| dump_bookmarks | Extract PDF bookmarks into YAML or JSON |
| dump_colorspaces | Report color spaces used |
| dump_data | Metadata, page and bookmark info (XML-escaped) |
| dump_data_annots | Dump annotation info in pdftk style |
| dump_data_fields | Print PDF form field data with XML-style escaping |
| dump_data_fields_utf8 | Print PDF form field data in UTF-8 |
| dump_data_utf8 | Metadata, page and bookmark info (in UTF-8) |
| dump_dests | Print PDF named destinations data to the console |
| dump_encryption | Print PDF encryption details and permissions |
| dump_files | List file attachments |
| dump_images | Extract PDF embedded image metadata to JSON |
| dump_layers | Dump layer info (JSON) |
| dump_signatures | List and validate digital signatures |
| dump_text | Print PDF text data to the console or a file |
| fill_form | Fill a PDF form |
| filter | Do nothing (the default if <operation> is absent) |
| generate_fdf | Generate an FDF file containing PDF form data |
| highlight | Highlight text matching a regex pattern |
| inject | Inject code at start or end of page content streams |
| insert | Insert blank pages |
| modify_annots | Modify properties of existing annotations |
| modify_layers | Merge or strip specific layers |
| montage | Impose pages onto a grid layout |
| move | Move pages to a new location |
| multibackground | Use multiple pages as backgrounds |
| multistamp | Stamp multiple pages onto an input PDF |
| mutate_content | Mutate page content streams using a user-supplied Python script |
| normalize | Reformat page content streams |
| optimize_images | Optimize images |
| place | Shift, scale, and spin page content |
| replace | Regex replacement on page content streams |
| render | Render PDF pages as images |
| rotate | Rotate pages in a PDF |
| set | Set document properties, viewer preferences, and page labels |
| shuffle | Interleave pages from multiple input PDFs |
| stamp | Stamp a 1-page PDF onto each page of an input PDF |
| unpack_files | Unpack file attachments |
| unpause | Remove ‘pause’ frames from a slide deck |
| update_bookmarks | Replace PDF bookmarks from a YAML or JSON file |
| update_info | Update PDF metadata from dump_data instructions |
| update_info_utf8 | Update PDF metadata from dump_data_utf8 instructions |
| zoom | Rescale entire pages |

| Option | Description |
|---|---|
| allow <perm> | Specify permissions for encrypted files |
| compress | Compress output file streams (default) |
| drop_info | Discard document-level info metadata |
| drop_xfa | Discard form XFA data if present |
| drop_xmp | Discard document-level XMP metadata |
| encrypt_128bit | Use 128 bit encryption (obsolete, maybe insecure) |
| encrypt_40bit | Use 40 bit encryption (obsolete, highly insecure) |
| encrypt_aes128 | Use 128 bit AES encryption (maybe obsolete) |
| encrypt_aes256 | Use 256 bit AES encryption |
| fast | Skip stream recompression for faster saving |
| flatten | Flatten all annotations |
| keep_final_id | Copy final input PDF’s ID metadata to output |
| keep_first_id | Copy first input PDF’s ID metadata to output |
| linearize | Linearize output file(s) |
| no_encrypt_metadata | Leave metadata unencrypted |
| need_appearances | Set a form rendering flag in the output PDF |
| output <file> | The output file path, or a template for burst |
| owner_pw <pw> | Set owner password and encrypt output |
| replacement_font <file> | Replace the font used for all form fields with a TTF file |
| sign_cert <file> | Path to certificate PEM |
| sign_field <name> | Signature field name (default: Signature1) |
| sign_key <file> | Path to private key PEM |
| sign_pass_env <var> | Environment variable with sign_cert passphrase |
| sign_pass_prompt | Prompt for sign_cert passphrase |
| uncompress | Disable compression of output file streams |
| user_pw <pw> | Set user password and encrypt output |
| verbose | Turn on verbose output |