qpdf JSON

In addition to reading and writing PDF files, qpdf can represent an entire PDF as JSON, and reconstruct a PDF from that JSON. pikepdf exposes this qpdf JSON format through a small set of methods on pikepdf.Pdf. It is a niche feature – most users never need it – but it is valuable when you want to inspect, diff, or transform a PDF with tools that understand JSON rather than PDF’s binary object syntax.

Important

This is qpdf’s own JSON representation of a whole PDF file, not a general PDF interchange standard. The format is specific to qpdf and is documented in full in the qpdf manual under qpdf JSON. pikepdf reads and writes version 2 of the format (qpdf --json-output / --json-input), introduced in qpdf 11.

Do not confuse it with the QPDFJob JSON format, which describes an operation to perform (see Batch operations with JobBuilder), nor with pikepdf.Object.to_json(), which serializes a single object. qpdf JSON serializes the complete document – every object, the trailer, and optionally the stream data.

Writing qpdf JSON

pikepdf.Pdf.write_qpdf_json() serializes the open PDF to a filename or writable binary stream:

import pikepdf

with pikepdf.open('input.pdf') as pdf:
    pdf.write_qpdf_json('input.json')

The result is a JSON document whose top level describes the qpdf and PDF versions, the trailer, and a dictionary of every object in the file keyed by its object/generation number. This is the same output as qpdf --json-output input.pdf input.json.

Controlling stream data

Stream objects (image data, content streams, fonts, and so on) hold arbitrary binary data that does not fit naturally into JSON. pikepdf.JSONStreamData controls how that data is represented:

  • inline (the default) embeds each stream’s data directly in the JSON as a base64 string. The JSON is self-contained but can be large.

  • file writes each stream to a separate sidecar file named {file_prefix}-{object_number}, keeping the JSON itself small and text-friendly.

  • none omits stream data entirely – useful when you only care about the document’s structure.

from pikepdf import JSONStreamData

with pikepdf.open('input.pdf') as pdf:
    # Structure only, no binary blobs
    pdf.write_qpdf_json('structure.json', json_stream_data=JSONStreamData.none)

    # Streams written alongside as input-1, input-2, ...
    pdf.write_qpdf_json(
        'input.json',
        json_stream_data=JSONStreamData.file,
        file_prefix='input',
    )

The decode_level argument controls how much qpdf uncompresses stream data before encoding it. The default, pikepdf.StreamDecodeLevel.generalized, makes streams more readable; pass pikepdf.StreamDecodeLevel.none to preserve the stored bytes exactly.

Reading qpdf JSON

pikepdf.Pdf.from_qpdf_json() builds a brand-new pikepdf.Pdf from a complete qpdf JSON document:

pdf = pikepdf.Pdf.from_qpdf_json('input.json')
pdf.save('roundtrip.pdf')

The JSON must be a complete representation of a PDF. If you have edited only part of a document’s JSON and want to apply those changes back onto an existing PDF, use pikepdf.Pdf.update_from_qpdf_json() instead. It overlays the objects present in the JSON onto the open Pdf, leaving objects that the JSON does not mention unchanged:

with pikepdf.open('input.pdf') as pdf:
    pdf.update_from_qpdf_json('patch.json')
    pdf.save('patched.pdf')

This update workflow is the intended way to make targeted edits through JSON: export the PDF (or part of it), modify the JSON with any tooling you like, and merge the changes back.

When to use it

qpdf JSON is worth reaching for when you want to:

  • Inspect a PDF’s structure with jq, a text editor, or any JSON-aware tool.

  • Diff two PDFs meaningfully – comparing JSON object trees is far more legible than comparing binary files.

  • Edit a document with a pipeline that manipulates JSON, then round-trips back to PDF via update_from_qpdf_json.

  • Interoperate with other software that already speaks qpdf’s JSON format.

For most programmatic editing, working directly with the pikepdf.Pdf object model is more convenient and avoids a serialization round trip. Reach for qpdf JSON when the JSON representation itself is the thing you want to work with. For the precise schema, see the qpdf manual’s qpdf JSON chapter.