# `dump_tables`

Extract tables to JSON, CSV, or Markdown

## Usage

> pdftl `<input>` `dump_tables` `[csv|markdown]` `[...]` `[output <file>]`

## Details

The `dump_tables` operation extracts tabular data from a PDF file and outputs it as structured JSON by default; CSV and Markdown output are also available. It uses the `tablers` library for table detection and extraction. Tables are identified by their line/rectangle borders (lattice-style detection).

**Note:** This operation works only with native text-based PDFs. Scanned PDFs, or PDFs where tables are rendered as images, will not yield results.

### Filtering

* `min_rows=N`: exclude tables with fewer than N rows (e.g. `min_rows=2`)
* `min_cols=N`: exclude tables with fewer than N columns
* `min_area=N`: exclude tables whose bounding box area is less than N square points
* `no_empty`: exclude tables where every cell is empty

### Output Schema

The output JSON contains a `tables` list. Each entry corresponds to a detected table and includes:

* **page**: The 1-indexed page number containing the table.
* **table_index**: The 0-indexed position of this table among all tables on that page.
* **bbox**: Bounding box of the table `[x1, y1, x2, y2]` in PDF points.
* **rows**: Number of rows detected.
* **cols**: Number of columns detected.
* **data**: A list of rows, each a list of cell objects with:
  * **text**: The cell's text content, or `null` for merged continuation slots.
  * **merged_left**: `true` if this slot continues a cell from the left.
  * **merged_top**: `true` if this slot continues a cell from above.

### Output Formats

By default, output is JSON. Pass `csv` to output each table as a CSV block, with blocks separated by a `---` delimiter line. Pass `markdown` to output tables in Markdown format.

### Dependency note

Table extraction requires the `tablers` library.
Install it with:

    pip install pdftl[dump-tables]

or directly:

    pip install tablers

## Examples

> Print tables from in.pdf as JSON to stdout

```
pdftl in.pdf dump_tables
```

> Save table data from in.pdf to tables.json

```
pdftl in.pdf dump_tables output tables.json
```

> Save tables from in.pdf as CSV

```
pdftl in.pdf dump_tables csv output tables.csv
```

> Print tables from in.pdf as Markdown

```
pdftl in.pdf dump_tables markdown
```

> Extract tables from pages 1, 3, 4, and 5

```
pdftl in.pdf dump_tables 1 3-5
```

> Skip likely-spurious tables

```
pdftl in.pdf dump_tables min_rows=2 min_cols=2 no_empty
```

**Tags**: info, tables, text

*Source: pdftl.operations.dump_tables*

*Read online: [https://pdftl.readthedocs.io/en/latest/operations/dump_tables.html](https://pdftl.readthedocs.io/en/latest/operations/dump_tables.html)*

*Type: Operation*
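
As a rough illustration of consuming the JSON described under Output Schema, the Python sketch below flattens a table's `data` into plain rows of strings. It is not part of pdftl: the helper name `table_to_rows`, the `fill_merged` option, and the `SAMPLE` document are hypothetical, and the sample was written by hand to follow the documented field names rather than produced by a real run. Treating `null` text as an empty string (or copying the neighbouring value) is one possible convention, not behaviour defined by the tool.

```python
import json

# Hand-written sample shaped like the documented schema (hypothetical data,
# not real pdftl output). The header cell "Name" spans two columns.
SAMPLE = {
    "tables": [
        {
            "page": 1,
            "table_index": 0,
            "bbox": [72.0, 100.0, 300.0, 160.0],
            "rows": 2,
            "cols": 3,
            "data": [
                [
                    {"text": "Name", "merged_left": False, "merged_top": False},
                    {"text": None, "merged_left": True, "merged_top": False},
                    {"text": "Qty", "merged_left": False, "merged_top": False},
                ],
                [
                    {"text": "Bolt", "merged_left": False, "merged_top": False},
                    {"text": "M6", "merged_left": False, "merged_top": False},
                    {"text": "4", "merged_left": False, "merged_top": False},
                ],
            ],
        }
    ]
}


def table_to_rows(table, fill_merged=False):
    """Flatten one `tables` entry into a list of rows of strings.

    Continuation slots (text is null) become "" by default. With
    fill_merged=True they copy the value of the cell they continue.
    """
    rows = []
    for r, row in enumerate(table["data"]):
        out = []
        for c, cell in enumerate(row):
            text = cell.get("text")
            if text is None:
                text = ""
                if fill_merged:
                    if cell.get("merged_left") and c > 0:
                        text = out[c - 1]
                    elif cell.get("merged_top") and r > 0 and c < len(rows[r - 1]):
                        text = rows[r - 1][c]
            out.append(text)
        rows.append(out)
    return rows


if __name__ == "__main__":
    # In practice the document would come from a saved file, for example
    # json.load(open("tables.json", encoding="utf-8")).
    doc = json.loads(json.dumps(SAMPLE))
    for table in doc["tables"]:
        print(f"page {table['page']}, table {table['table_index']}")
        for row in table_to_rows(table, fill_merged=True):
            print(" | ".join(row))
```

To run this against real output, replace `SAMPLE` with the parsed contents of a file such as the `tables.json` written in the examples above.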