Python API Guide
pdftl offers a robust Python API that allows you to integrate PDF manipulation capabilities directly into your applications. Unlike the CLI, which works with strings and file paths, the API is designed to work with Python objects and structured data.
Two Ways to Play
There are two primary ways to interact with pdftl: the Fluent Interface (recommended for pipelines) and the Functional Interface (best for single operations).
1. The Fluent Interface (Recommended)
The Fluent API uses method chaining to build readable, multi-step processing pipelines. It automatically manages the passing of PDF objects between steps.
from pdftl import pipeline
# 1. Open a PDF
# 2. Rotate all pages right 90 degrees
# 3. Crop pages 1-5 to a 10pt margin
# 4. Save the result
(
    pipeline("input.pdf")
    .rotate("right")
    .crop("1-5(10pt)")
    .save("output.pdf")
)
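To make the chaining mechanism concrete, here is a minimal sketch of how a fluent interface of this kind can be built. `ToyPipeline` is a hypothetical stand-in written for this illustration, not pdftl's implementation; it only records the requested steps instead of touching a PDF.

```python
class ToyPipeline:
    """Hypothetical stand-in for a fluent pipeline (not pdftl's code)."""

    def __init__(self, source):
        self.source = source
        self.steps = []  # recorded (operation, argument) pairs

    def rotate(self, direction):
        self.steps.append(("rotate", direction))
        return self  # returning self is what makes chaining possible

    def crop(self, spec):
        self.steps.append(("crop", spec))
        return self

    def save(self, path):
        self.steps.append(("save", path))
        return self


result = (
    ToyPipeline("input.pdf")
    .rotate("right")
    .crop("1-5(10pt)")
    .save("output.pdf")
)
print(result.steps)
```

Because every method returns the pipeline object itself, each call can be followed directly by the next, and the state (here, the list of steps) travels along the chain.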
You can also work with existing pikepdf objects:
import pikepdf
from pdftl import pipeline
pdf = pikepdf.open("input.pdf")
# Apply transformations and save to a new file
# Note: The pipeline allows chaining, effectively handing the modified
# PDF object to the next step.
pipeline(pdf).add_text("1/Watermark/").save("watermarked.pdf")
Best Practices: Explicit Arguments
While pdftl allows flexible argument passing, it is best practice to use keyword arguments for complex operations like concatenating files. This keeps your pipelines unambiguous.
For operations that accept multiple inputs (like cat), use the inputs keyword. For simpler operations (like stamp), positional arguments are standard.
# Unambiguous and robust
(
    pipeline("chapter1.pdf")
    .cat(inputs=["chapter2.pdf", "chapter3.pdf"])
    .stamp("watermark.pdf")
    .save("full_book.pdf")
)
2. The Functional Interface
For a single, specific action, especially one that returns data (such as dump_data or dump_annots), the functional interface is often simpler.
Functions are available directly under the pdftl namespace.
import pdftl
# Dump metadata
info = pdftl.dump_data(inputs=["report.pdf"])
print(f"Page Count: {info.pages}")
print(f"Metadata: {info.doc_info}")
Return Values
By default, the API returns the result of the operation directly (unwrapping the internal OpResult container):
- Modification commands (like rotate, crop) return the modified pikepdf.Pdf object.
- Extraction commands (like dump_data, dump_annots) return the extracted data (dictionaries, lists, strings).
If an operation fails, it raises an exception (e.g., pdftl.exceptions.OperationError).
Advanced: Full Result Objects
If you need access to the execution summary or want to check success flags explicitly without exceptions, you can request the full result object using full_result=True.
| Attribute | Type | Description |
|---|---|---|
| | | |
| | | Structured data (dict, list, string) returned by info/dump commands. |
| | | The processed PDF object (for manipulation commands). |
| | | A human-readable summary of what happened. |
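The difference between the default unwrapped return value and a full result object can be illustrated with a small stand-alone sketch. Everything here is hypothetical: `ToyResult`, its field names (`ok`, `payload`, `note`), `ToyOperationError`, and `finish` are placeholders chosen for this example and are not pdftl's actual names. Only the `full_result` flag mirrors the argument described above.

```python
from dataclasses import dataclass
from typing import Any


class ToyOperationError(Exception):
    """Placeholder for a library-specific failure exception."""


@dataclass
class ToyResult:
    # Field names are placeholders chosen for this sketch only.
    ok: bool
    payload: Any = None
    note: str = ""


def finish(result: ToyResult, full_result: bool = False):
    """Return the whole container on request, otherwise unwrap or raise."""
    if full_result:
        return result  # caller inspects the flag; no exception is raised
    if not result.ok:
        raise ToyOperationError(result.note)
    return result.payload


good = ToyResult(ok=True, payload={"pages": 3}, note="dumped data")
bad = ToyResult(ok=False, note="file not found")

print(finish(good))                      # unwrapped payload
print(finish(bad, full_result=True).ok)  # failure is visible as a flag
```

In this pattern the default path suits short scripts, where an exception is the natural failure signal, while the full container suits callers that want to log the summary or branch on success without a try/except.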
Example: Handling Return Values
import pdftl
# Returns a list of dictionaries directly
annotations = pdftl.dump_annots("input.pdf")
for annot in annotations:
    subtype = annot["Properties"].get("Subtype", "Unknown")
    print(f"Found annotation on page {annot['Page']}: {subtype}")
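Since the extracted data is ordinary Python, it can be summarised with the standard library. The sketch below works on a hand-written sample that assumes the same shape as above (a list of dictionaries with `Page` and `Properties` keys); the sample values and the helper names `count_subtypes` and `pages_with_annotations` are made up for illustration.

```python
from collections import Counter

# Hypothetical sample shaped like the list of dictionaries described above.
annotations = [
    {"Page": 1, "Properties": {"Subtype": "/Link"}},
    {"Page": 1, "Properties": {"Subtype": "/Highlight"}},
    {"Page": 3, "Properties": {}},
]


def count_subtypes(annots):
    """Tally annotations by subtype, using 'Unknown' when none is given."""
    return Counter(a["Properties"].get("Subtype", "Unknown") for a in annots)


def pages_with_annotations(annots):
    """Return the sorted, de-duplicated page numbers that carry annotations."""
    return sorted({a["Page"] for a in annots})


subtype_counts = count_subtypes(annotations)
pages = pages_with_annotations(annotations)
print(subtype_counts)
print(pages)
```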
Advanced Features
Simulating CLI Behavior (Hooks)
Sometimes you want the side effects of the CLI (like printing a formatted report to stdout or writing a file to disk) without writing the logic yourself. You can force this using the run_cli_hook=True argument.
# This will write the formatted FDF file to disk,
# just like running 'pdftl form.pdf generate_fdf output data.fdf'
pdftl.generate_fdf(
    inputs=["form.pdf"],
    output="data.fdf",
    run_cli_hook=True,
)
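The idea of an opt-in side effect can be sketched without pdftl. In the toy function below, the data is always returned, and the file is written only when the hook flag is set. `toy_export` and its `name=value` output format are hypothetical; only the `run_cli_hook` and `output` argument names are taken from the example above.

```python
import tempfile
from pathlib import Path


def toy_export(fields, output=None, run_cli_hook=False):
    """Always return the text; write it to `output` only when the hook is on."""
    text = "\n".join(f"{name}={value}" for name, value in fields.items())
    if run_cli_hook and output is not None:
        Path(output).write_text(text, encoding="utf-8")
    return text


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data.txt"

    # Without the hook: data comes back, nothing is written.
    toy_export({"name": "Ada"}, output=target)
    written_without_hook = target.exists()

    # With the hook: the same data is also written to disk.
    toy_export({"name": "Ada"}, output=target, run_cli_hook=True)
    written_with_hook = target.read_text(encoding="utf-8")
```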
Mixing Inputs
The API accepts several kinds of input. You can pass file paths, open pikepdf.Pdf objects, or a mix of both.
import pikepdf
import pdftl
cover_page = pikepdf.open("cover.pdf")
# Merge an open PDF object with a file on disk
# Using 'inputs' keyword argument is recommended for clarity
pdftl.cat(
    inputs=[cover_page, "chapter1.pdf"],
    output="book.pdf",
)
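One common way to support mixed inputs is to normalise them up front: open anything path-like and pass already-open objects through untouched. The sketch below shows that pattern with stand-ins; `ToyDocument`, `toy_open`, and `normalize_inputs` are hypothetical names for this illustration and say nothing about how pdftl handles inputs internally.

```python
from pathlib import Path


class ToyDocument:
    """Hypothetical stand-in for an open PDF object."""

    def __init__(self, origin):
        self.origin = origin


def toy_open(path):
    # Placeholder opener; a real one would read the file from disk.
    return ToyDocument(origin=str(path))


def normalize_inputs(inputs, opener=toy_open):
    """Open path-like entries; pass already-open objects through unchanged."""
    normalized = []
    for item in inputs:
        if isinstance(item, (str, Path)):
            normalized.append(opener(item))
        else:
            normalized.append(item)
    return normalized


cover = ToyDocument(origin="<in memory>")
docs = normalize_inputs([cover, "chapter1.pdf"])
print([d.origin for d in docs])
```

After normalisation, the rest of the operation only ever deals with one kind of object, which keeps the order of the inputs intact and the downstream code simple.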