(qpdf_json)=

# qpdf JSON

In addition to reading and writing PDF files, qpdf can represent an entire PDF as
JSON, and reconstruct a PDF from that JSON. pikepdf exposes this *qpdf JSON*
format through a small set of methods on {class}`pikepdf.Pdf`. It is a niche
feature -- most users never need it -- but it is valuable when you want to
inspect, diff, or transform a PDF with tools that understand JSON rather than
PDF's binary object syntax.

:::{important}
This is qpdf's own JSON representation of a *whole PDF file*, not a general PDF
interchange standard. The format is specific to qpdf and is documented in full in
the qpdf manual under [qpdf JSON](https://qpdf.readthedocs.io/en/stable/json.html).
pikepdf reads and writes **version 2** of the format (`qpdf --json-output` /
`--json-input`), introduced in qpdf 11.

Do not confuse it with the *QPDFJob JSON format*, which describes an *operation*
to perform (see {ref}`jobs`), nor with {meth}`pikepdf.Object.to_json`, which
serializes a *single object*. qpdf JSON serializes the complete document --
every object, the trailer, and optionally the stream data.
:::

## Writing qpdf JSON

{meth}`pikepdf.Pdf.write_qpdf_json` serializes the open PDF to a filename or
writable binary stream:

```python
import pikepdf

with pikepdf.open('input.pdf') as pdf:
    pdf.write_qpdf_json('input.json')
```

The result is a JSON document whose top level describes the qpdf and PDF
versions, the trailer, and a dictionary of every object in the file keyed by its
object/generation number. This is the same output as `qpdf --json-output
input.pdf input.json`.

### Controlling stream data

Stream objects (image data, content streams, fonts, and so on) hold arbitrary
binary data that does not fit naturally into JSON. {class}`pikepdf.JSONStreamData`
controls how that data is represented:

- {attr}`~pikepdf.JSONStreamData.inline` (the default) embeds each stream's data
  directly in the JSON as a base64 string. The JSON is self-contained but can be
  large.
- {attr}`~pikepdf.JSONStreamData.file` writes each stream to a separate sidecar
  file named `{file_prefix}-{object_number}`, keeping the JSON itself small and
  text-friendly.
- {attr}`~pikepdf.JSONStreamData.none` omits stream data entirely -- useful when
  you only care about the document's structure.

```python
from pikepdf import JSONStreamData

with pikepdf.open('input.pdf') as pdf:
    # Structure only, no binary blobs
    pdf.write_qpdf_json('structure.json', json_stream_data=JSONStreamData.none)

    # Streams written alongside as input-1, input-2, ...
    pdf.write_qpdf_json(
        'input.json',
        json_stream_data=JSONStreamData.file,
        file_prefix='input',
    )
```

The `decode_level` argument controls how much qpdf uncompresses stream data
before encoding it. The default, {attr}`pikepdf.StreamDecodeLevel.generalized`,
makes streams more readable; pass {attr}`pikepdf.StreamDecodeLevel.none` to
preserve the stored bytes exactly.

## Reading qpdf JSON

{meth}`pikepdf.Pdf.from_qpdf_json` builds a brand-new {class}`pikepdf.Pdf` from a
complete qpdf JSON document:

```python
pdf = pikepdf.Pdf.from_qpdf_json('input.json')
pdf.save('roundtrip.pdf')
```

The JSON must be a complete representation of a PDF. If you have edited only part
of a document's JSON and want to apply those changes back onto an existing PDF,
use {meth}`pikepdf.Pdf.update_from_qpdf_json` instead. It overlays the objects
present in the JSON onto the open `Pdf`, leaving objects that the JSON does not
mention unchanged:

```python
with pikepdf.open('input.pdf') as pdf:
    pdf.update_from_qpdf_json('patch.json')
    pdf.save('patched.pdf')
```

This update workflow is the intended way to make targeted edits through JSON:
export the PDF (or part of it), modify the JSON with any tooling you like, and
merge the changes back.

## When to use it

qpdf JSON is worth reaching for when you want to:

- **Inspect** a PDF's structure with `jq`, a text editor, or any JSON-aware tool.
- **Diff** two PDFs meaningfully -- comparing JSON object trees is far more
  legible than comparing binary files.
- **Edit** a document with a pipeline that manipulates JSON, then round-trips
  back to PDF via `update_from_qpdf_json`.
- **Interoperate** with other software that already speaks qpdf's JSON format.

For most programmatic editing, working directly with the {class}`pikepdf.Pdf`
object model is more convenient and avoids a serialization round trip. Reach for
qpdf JSON when the JSON representation itself is the thing you want to work with.
For the precise schema, see the qpdf manual's
[qpdf JSON](https://qpdf.readthedocs.io/en/stable/json.html) chapter.