qpdf JSON
In addition to reading and writing PDF files, qpdf can represent an entire PDF as
JSON, and reconstruct a PDF from that JSON. pikepdf exposes this qpdf JSON
format through a small set of methods on pikepdf.Pdf. It is a niche
feature – most users never need it – but it is valuable when you want to
inspect, diff, or transform a PDF with tools that understand JSON rather than
PDF’s binary object syntax.
Important
This is qpdf’s own JSON representation of a whole PDF file, not a general PDF
interchange standard. The format is specific to qpdf and is documented in full in
the qpdf manual under qpdf JSON.
pikepdf reads and writes version 2 of the format (qpdf --json-output /
--json-input), introduced in qpdf 11.
Do not confuse it with the QPDFJob JSON format, which describes an operation
to perform (see Batch operations with JobBuilder), nor with pikepdf.Object.to_json(), which
serializes a single object. qpdf JSON serializes the complete document –
every object, the trailer, and optionally the stream data.
Writing qpdf JSON
pikepdf.Pdf.write_qpdf_json() serializes the open PDF to a filename or
writable binary stream:
import pikepdf
with pikepdf.open('input.pdf') as pdf:
pdf.write_qpdf_json('input.json')
The result is a JSON document whose top level describes the qpdf and PDF
versions, the trailer, and a dictionary of every object in the file keyed by its
object/generation number. This is the same output as qpdf --json-output input.pdf input.json.
Controlling stream data
Stream objects (image data, content streams, fonts, and so on) hold arbitrary
binary data that does not fit naturally into JSON. pikepdf.JSONStreamData
controls how that data is represented:
inline(the default) embeds each stream’s data directly in the JSON as a base64 string. The JSON is self-contained but can be large.filewrites each stream to a separate sidecar file named{file_prefix}-{object_number}, keeping the JSON itself small and text-friendly.noneomits stream data entirely – useful when you only care about the document’s structure.
from pikepdf import JSONStreamData
with pikepdf.open('input.pdf') as pdf:
# Structure only, no binary blobs
pdf.write_qpdf_json('structure.json', json_stream_data=JSONStreamData.none)
# Streams written alongside as input-1, input-2, ...
pdf.write_qpdf_json(
'input.json',
json_stream_data=JSONStreamData.file,
file_prefix='input',
)
The decode_level argument controls how much qpdf uncompresses stream data
before encoding it. The default, pikepdf.StreamDecodeLevel.generalized,
makes streams more readable; pass pikepdf.StreamDecodeLevel.none to
preserve the stored bytes exactly.
Reading qpdf JSON
pikepdf.Pdf.from_qpdf_json() builds a brand-new pikepdf.Pdf from a
complete qpdf JSON document:
pdf = pikepdf.Pdf.from_qpdf_json('input.json')
pdf.save('roundtrip.pdf')
The JSON must be a complete representation of a PDF. If you have edited only part
of a document’s JSON and want to apply those changes back onto an existing PDF,
use pikepdf.Pdf.update_from_qpdf_json() instead. It overlays the objects
present in the JSON onto the open Pdf, leaving objects that the JSON does not
mention unchanged:
with pikepdf.open('input.pdf') as pdf:
pdf.update_from_qpdf_json('patch.json')
pdf.save('patched.pdf')
This update workflow is the intended way to make targeted edits through JSON: export the PDF (or part of it), modify the JSON with any tooling you like, and merge the changes back.
When to use it
qpdf JSON is worth reaching for when you want to:
Inspect a PDF’s structure with
jq, a text editor, or any JSON-aware tool.Diff two PDFs meaningfully – comparing JSON object trees is far more legible than comparing binary files.
Edit a document with a pipeline that manipulates JSON, then round-trips back to PDF via
update_from_qpdf_json.Interoperate with other software that already speaks qpdf’s JSON format.
For most programmatic editing, working directly with the pikepdf.Pdf
object model is more convenient and avoids a serialization round trip. Reach for
qpdf JSON when the JSON representation itself is the thing you want to work with.
For the precise schema, see the qpdf manual’s
qpdf JSON chapter.