# `dump_tables`

Extract tables to JSON, CSV, or Markdown.

## Usage
> pdftl `<input>` `dump_tables` `[csv|markdown]` `[<page_spec>...]` `[output` `<output>]`

## Details
The `dump_tables` operation extracts tabular data from a PDF file and
outputs it as structured JSON by default, or as CSV or Markdown on request.

It uses the `tablers` library for table detection and extraction. Tables
are identified by their line/rectangle borders (lattice-style detection).

**Note:** This operation works only with native text-based PDFs. Scanned
PDFs or PDFs where tables are rendered as images will not yield results.

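### Filtering

Optional filter arguments exclude tables that are likely to be spurious.
Several filters can be combined: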

* `min_rows=N` — exclude tables with fewer than N rows (e.g. `min_rows=2`)
* `min_cols=N` — exclude tables with fewer than N columns
* `min_area=N` — exclude tables whose bounding box area is less than N square points
* `no_empty` — exclude tables where every cell is empty
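
To make the filter semantics concrete, here is a minimal Python sketch that applies the same four rules to entries of the JSON output, using the fields described under Output Schema. It is a post-processing illustration, not pdftl's own implementation. The helper names `bbox_area` and `keep_table` are hypothetical, and treating whitespace-only text as empty is an assumption.

```python
def bbox_area(bbox):
    """Area of an [x1, y1, x2, y2] box in square points."""
    x1, y1, x2, y2 = bbox
    return abs(x2 - x1) * abs(y2 - y1)


def keep_table(table, min_rows=0, min_cols=0, min_area=0.0, no_empty=False):
    """Return True if the table passes every requested filter."""
    if table["rows"] < min_rows or table["cols"] < min_cols:
        return False
    if bbox_area(table["bbox"]) < min_area:
        return False
    if no_empty:
        # Assumption: null and whitespace-only text both count as empty.
        texts = (cell["text"] for row in table["data"] for cell in row)
        if not any((t or "").strip() for t in texts):
            return False
    return True


# Hypothetical entries shaped like the output schema (not real pdftl output).
tables = [
    {"rows": 1, "cols": 1, "bbox": [0, 0, 10, 10],
     "data": [[{"text": ""}]]},
    {"rows": 2, "cols": 2, "bbox": [72, 100, 300, 160],
     "data": [[{"text": "a"}, {"text": "b"}],
              [{"text": "1"}, {"text": None}]]},
]

kept = [t for t in tables if keep_table(t, min_rows=2, min_cols=2, no_empty=True)]
print(len(kept))
```

A table is dropped as soon as it fails any one filter, so combining filters only narrows the result.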

### Output Schema

The output JSON contains a `tables` list. Each entry corresponds to a
detected table and includes:

* **page**: The 1-indexed page number containing the table.
* **table_index**: The 0-indexed position of this table among all tables
  on that page.
* **bbox**: Bounding box of the table `[x1, y1, x2, y2]` in PDF points.
* **rows**: Number of rows detected.
* **cols**: Number of columns detected.
* **data**: A list of rows, each a list of cell objects with:
    * **text**: The cell's text content, or `null` for merged continuation
      slots.
    * **merged_left**: `true` if this slot continues a cell from the left.
    * **merged_top**: `true` if this slot continues a cell from above.
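
As a rough illustration of consuming this schema, the Python sketch below loads a JSON document and copies text into merged continuation slots from the cell they continue. The sample document and the helper name `fill_merged` are hypothetical, made up for this example. Only the field names come from the schema above.

```python
import json

# Hypothetical sample shaped like the schema above (not real pdftl output).
SAMPLE = """
{
  "tables": [
    {
      "page": 1,
      "table_index": 0,
      "bbox": [72.0, 100.0, 300.0, 160.0],
      "rows": 2,
      "cols": 3,
      "data": [
        [
          {"text": "Name", "merged_left": false, "merged_top": false},
          {"text": "Score", "merged_left": false, "merged_top": false},
          {"text": null, "merged_left": true, "merged_top": false}
        ],
        [
          {"text": null, "merged_left": false, "merged_top": true},
          {"text": "7", "merged_left": false, "merged_top": false},
          {"text": "9", "merged_left": false, "merged_top": false}
        ]
      ]
    }
  ]
}
"""


def fill_merged(table):
    """Return a grid of strings, copying text into merged continuation slots."""
    grid = []
    for r, row in enumerate(table["data"]):
        out = []
        for c, cell in enumerate(row):
            text = cell["text"]
            if text is None:
                if cell.get("merged_left") and c > 0:
                    text = out[c - 1]        # continue the cell to the left
                elif cell.get("merged_top") and r > 0:
                    text = grid[r - 1][c]    # continue the cell above
                else:
                    text = ""                # null without a merge flag
            out.append(text)
        grid.append(out)
    return grid


doc = json.loads(SAMPLE)
for table in doc["tables"]:
    print(f"page {table['page']}, table {table['table_index']}")
    for row in fill_merged(table):
        print(row)
```

Whether to repeat the text in continuation slots or leave them blank depends on the downstream consumer. Repeating is convenient for flat exports, while blanks preserve the original merge structure.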

### Output Formats

By default, output is JSON. Pass `csv` to output each table as a CSV block,
with consecutive blocks separated by a `---` delimiter line. Pass `markdown`
to output tables in Markdown format.
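
If the CSV output needs to be split back into individual tables, something like the following Python sketch may help. It assumes the delimiter appears on a line by itself and that no cell contains a line consisting solely of `---`. The sample text and the helper name `split_csv_tables` are hypothetical.

```python
import csv
import io

# Hypothetical CSV output containing two tables (not real pdftl output).
CSV_SAMPLE = "a,b\n1,2\n---\nx,y,z\n3,4,5\n"


def split_csv_tables(text):
    """Split text on lines that are exactly '---' and parse each block as CSV."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip() == "---":
            blocks.append(current)
            current = []
        else:
            current.append(line)
    if current:
        blocks.append(current)
    # Skip empty blocks, e.g. from a leading or trailing delimiter line.
    return [list(csv.reader(io.StringIO("\n".join(b)))) for b in blocks if b]


for i, rows in enumerate(split_csv_tables(CSV_SAMPLE)):
    print(f"table {i}: {len(rows)} rows")
    for row in rows:
        print(row)
```

For anything beyond quick inspection, the default JSON output is likely the safer input, since it carries page numbers, bounding boxes, and merge flags that the CSV blocks do not.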

### Dependency note

Table extraction requires the `tablers` library. Install it with:

    pip install pdftl[dump-tables]

or directly:

    pip install tablers

## Examples

> Print tables from in.pdf as JSON to stdout
```
pdftl in.pdf dump_tables
```

> Save table data from in.pdf to tables.json
```
pdftl in.pdf dump_tables output tables.json
```

> Save tables from in.pdf as CSV
```
pdftl in.pdf dump_tables csv output tables.csv
```

> Print tables from in.pdf as Markdown
```
pdftl in.pdf dump_tables markdown
```

> Extract tables from pages 1, 3, 4, and 5
```
pdftl in.pdf dump_tables 1 3-5
```

> Skip likely-spurious tables
```
pdftl in.pdf dump_tables min_rows=2 min_cols=2 no_empty
```


**Tags**: info, tables, text

*Source: pdftl.operations.dump_tables*

*Read online: [https://pdftl.readthedocs.io/en/latest/operations/dump_tables.html](https://pdftl.readthedocs.io/en/latest/operations/dump_tables.html)*

*Type: Operation*