v10

Breaking changes for v10.x

Dropped Python 3.9 compatibility, since it is end of life. Python 3.10 through 3.14 are supported.
Dropped macOS 13 support, since it is end of life.
Dropped macOS 14 Intel wheels, because GitHub doesn’t provide a way to build them - macOS 15 Intel works fine.
Dropped deprecated method Pdf.check() (use .check_pdf_syntax()).

pikepdf supports free-threaded (no-GIL) CPython. Starting with v10.8.0, pikepdf publishes free-threaded CPython 3.14 (cp314t) wheels to PyPI; before v10.8.0, free-threaded use required building from source. As always, coordinating concurrent modification of the same object across threads requires a lock – see the architecture notes on thread safety.
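
The locking advice above can be illustrated with a minimal sketch. It deliberately uses a plain dict as a stand-in for a shared pikepdf object so that it runs without pikepdf installed; the names shared, lock and bump are hypothetical and not part of pikepdf.

```python
import threading

# Stand-in for a shared, mutable object (hypothetical; a plain dict is used
# here instead of a pikepdf object so the sketch is self-contained).
shared = {"count": 0}
lock = threading.Lock()


def bump(times: int) -> None:
    """Increment the shared counter, holding the lock for each update."""
    for _ in range(times):
        # The read-modify-write below touches the same object from several
        # threads, so the whole step is performed under the lock.
        with lock:
            shared["count"] = shared["count"] + 1


threads = [threading.Thread(target=bump, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The same shape applies when the shared object is a PDF object: one lock per shared object (or per document) is usually the simplest arrangement.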

v10.10.0

Behavior change

pikepdf.PdfImage.as_pil_image() and pikepdf.PdfImage.extract_to() now apply an image’s soft mask (/SMask) or explicit/colour-key mask (/Mask) by default, returning an image with an alpha channel (LA or RGBA) and writing a transparency-capable format (.png). Previously the mask was silently ignored and the opaque base image was returned. Images without a mask are unaffected. Pass apply_mask=False to recover the old behavior and obtain only the opaque base image.

New features

Image masking is now honored when extracting images. /SMask soft masks, /Mask stencil (explicit) masks, and /Mask colour-key masks are composited into an alpha channel. Soft-mask images whose resolution differs from the base image are resampled to match (ISO 32000-2 §8.9.6).
Added support for /CalRGB, /CalGray and /CalCMYK images, which previously raised NotImplementedError. The samples are decoded as their device equivalents, and for CalRGB/CalGray an ICC profile synthesized from the colour space’s WhitePoint/Gamma/Matrix is attached to the extracted image so the calibration is preserved for colour-managed consumers.
Added support for the /Lab colour space, extracted as a Pillow LAB image (saved as TIFF) with the PDF L*/a*/b* ranges remapped to Pillow’s conventions.
Added support for 16-bit-per-component images. 16-bit grayscale is extracted losslessly as Pillow I;16; 16-bit RGB and CMYK are reduced to 8-bit (with a warning) because Pillow has no higher-bit-depth raw mode for them.
Extended colour-space handling to several cases that previously raised NotImplementedError: ICCBased CMYK images and indexed images now produce the correct default /Decode array, and inline images that name their colour space now resolve it from the in-scope /Resources when obtained via pikepdf.parse_content_stream().
JPEG (/DCTDecode) images with a non-default /ColorTransform – a YCCK CMYK or a non-YCbCr RGB JPEG – are now decoded via Pillow (which honours the JPEG’s own markers) and transcoded, instead of failing to extract.
Filter chains that wrap a single terminal image codec (/DCTDecode, /CCITTFaxDecode, /JPXDecode, /JBIG2Decode) in any number of generalized/specialized filters (Flate, LZW, ASCII85/Hex, RunLength) are now peeled and extracted seamlessly.
Added pikepdf.PdfImage.MAX_IMAGE_PIXELS, a settable class-level limit on the number of pixels pikepdf will decode from a single image. Until set, it defaults to max(500_000_000, PIL.Image.MAX_IMAGE_PIXELS) – a floor suited to high-DPI scanned PDFs – and tracks Pillow’s setting; once assigned it becomes independent of Pillow. Set it to None to disable the check.
pikepdf.Array now implements the standard Python list interface: slicing (including del on slices and slice assignment), and the clear(), count(), index(), insert(), pop(), remove(), and reverse() methods.
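
As a rough illustration of the colour-key masking mentioned in the first item of this list, the sketch below turns a /Mask-style range array into an alpha channel. It is a pure-Python model of the rule in the PDF specification (a pixel is masked out when every component lies within its range), not pikepdf's implementation; the function name colour_key_alpha is hypothetical.

```python
def colour_key_alpha(pixels, key_ranges):
    """Return one alpha value (0 or 255) per pixel.

    pixels: list of tuples of 8-bit components, e.g. (r, g, b).
    key_ranges: flat /Mask-style list [min1, max1, ..., minN, maxN].
    """
    pairs = list(zip(key_ranges[0::2], key_ranges[1::2]))
    alpha = []
    for px in pixels:
        if len(px) != len(pairs):
            raise ValueError("one (min, max) pair is needed per component")
        # Masked out only if every component falls inside its own range.
        masked = all(lo <= c <= hi for c, (lo, hi) in zip(px, pairs))
        alpha.append(0 if masked else 255)
    return alpha


pixels = [(255, 255, 255), (250, 251, 252), (10, 20, 30)]
alpha = colour_key_alpha(pixels, [250, 255, 250, 255, 250, 255])
# Attach the alpha value to each pixel to get RGBA tuples.
rgba = [px + (a,) for px, a in zip(pixels, alpha)]
```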

Limitations

/SMaskInData (alpha encoded inside a JPEG 2000 stream) is applied only when Pillow’s JPEG 2000 decoder surfaces the alpha channel itself; a pre-multiplied (SMaskInData 2) result is not un-premultiplied. A /Matte entry on a soft mask is not undone (a warning is emitted). When an image has both an /SMask and a /Mask, the soft mask takes precedence. Colour-key masking is applied only to 8-bit L/RGB/CMYK images.
A filter chain containing two or more terminal image codecs (for example [/DCTDecode /CCITTFaxDecode]) cannot be decoded by any reader and now raises UnsupportedImageTypeError rather than NotImplementedError.
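
The filter-chain rules described above (any number of simple filters around at most one image codec) can be sketched as follows. This is a simplified model, not pikepdf's code: UnsupportedChainError is a hypothetical stand-in for UnsupportedImageTypeError, and rejecting a codec that is not the last filter is an assumption of the sketch.

```python
SIMPLE_FILTERS = {
    "/FlateDecode", "/LZWDecode", "/ASCII85Decode",
    "/ASCIIHexDecode", "/RunLengthDecode",
}
IMAGE_CODECS = {"/DCTDecode", "/CCITTFaxDecode", "/JPXDecode", "/JBIG2Decode"}


class UnsupportedChainError(Exception):
    """Hypothetical stand-in for pikepdf's UnsupportedImageTypeError."""


def split_filter_chain(filters):
    """Return (wrappers, codec): simple filters to undo, then the codec or None."""
    codecs_found = [f for f in filters if f in IMAGE_CODECS]
    if len(codecs_found) >= 2:
        raise UnsupportedChainError("two or more image codecs in one chain")
    if codecs_found and filters[-1] not in IMAGE_CODECS:
        # Assumption: the image codec must be the final (terminal) filter.
        raise UnsupportedChainError("image codec must be the last filter")
    wrappers = [f for f in filters if f in SIMPLE_FILTERS]
    if len(wrappers) + len(codecs_found) != len(filters):
        raise UnsupportedChainError("unknown filter in chain")
    return wrappers, (codecs_found[0] if codecs_found else None)


wrappers, codec = split_filter_chain(
    ["/ASCII85Decode", "/FlateDecode", "/DCTDecode"]
)
try:
    split_filter_chain(["/DCTDecode", "/CCITTFaxDecode"])
    two_codecs_rejected = False
except UnsupportedChainError:
    two_codecs_rejected = True
```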

Security

Hardened image extraction against decompression-bomb (memory exhaustion) attacks. A malicious PDF could declare an image with enormous /Width and /Height so that pikepdf.PdfImage.as_pil_image() attempted to allocate many gigabytes before reading the (tiny) image stream. pikepdf now enforces a configurable pixel limit, pikepdf.PdfImage.MAX_IMAGE_PIXELS (analogous to PIL.Image.MAX_IMAGE_PIXELS), across every image-decode path – including the 2/4-bit transcoding path, 1-bit and 8-bit images, and the Pillow-decoded JPEG/JPEG2000/CCITT path. Oversized images raise pikepdf.DecompressionBombError and borderline images emit pikepdf.DecompressionBombWarning (both subclass Pillow’s equivalents). (#733)
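
The essential idea of the hardening above is to check the declared dimensions before allocating anything. The sketch below models that check in pure Python. The class and function names are hypothetical, and the thresholds (warn above the limit, fail above twice the limit) are borrowed from Pillow's behavior as an assumption; pikepdf's exact thresholds may differ.

```python
import warnings


class PixelLimitError(Exception):
    """Hypothetical stand-in for a decompression-bomb error."""


class PixelLimitWarning(RuntimeWarning):
    """Hypothetical stand-in for a decompression-bomb warning."""


def check_pixel_budget(width, height, max_pixels):
    """Validate declared dimensions before any pixel buffer is allocated."""
    if max_pixels is None:
        return  # a limit of None disables the check
    pixels = width * height
    if pixels > 2 * max_pixels:  # assumed threshold, modeled on Pillow
        raise PixelLimitError(f"{pixels} pixels exceeds the limit")
    if pixels > max_pixels:
        warnings.warn(f"{pixels} pixels is near the limit", PixelLimitWarning)


# A borderline image (9e8 pixels against a 5e8 limit) only warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_pixel_budget(30_000, 30_000, max_pixels=500_000_000)

# An absurd declared size is rejected without reading any image data.
try:
    check_pixel_budget(1_000_000, 1_000_000, max_pixels=500_000_000)
    rejected = False
except PixelLimitError:
    rejected = True
```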

Fixed

A CCITT fax image preceded by a stripped simple filter (e.g. [/FlateDecode /CCITTFaxDecode]) now builds its TIFF header from the /CCITTFaxDecode filter’s own /DecodeParms rather than the leading filter’s, which previously produced a corrupt extraction.
Corrected the documentation of pikepdf.StreamDecodeLevel: the specialized and all levels were each described with the other’s behavior.
Fixed a crash (SIGABRT via std::terminate) that could occur when a file-backed pikepdf.Pdf was deallocated while a Python exception was already propagating – for example when pikepdf.open(filename) appears as a transient element of a list/tuple literal whose later element raises. Opening from a filename closes the file in the input source destructor, which calls back into Python; with an exception already in flight that call raised an error that escaped the destructor. The in-flight exception is now preserved and propagates normally. (#732) Added a guard for a likely non-reproducible related case with DecimalPrecision.

v10.9.0

New features

Added pikepdf.JobBuilder, a fluent, Pythonic builder for qpdf jobs. It assembles a job specification with chained, snake_case methods (input, output, encrypt, add_pages, split_pages, linearize, compress, add_attachment, add_overlay, limits, …) and runs it via the existing pikepdf.Job, without hand-writing qpdf’s camelCase job JSON. Encryption permissions are expressed with the familiar pikepdf.Permissions/pikepdf.Encryption models, and a .set(**kwargs) escape hatch reaches any other job option. Additional methods cover image optimization (optimize_images, externalize_inline_images), page/content transforms (flatten_annotations, flatten_rotation, generate_appearances, coalesce_contents, normalize_content), content removal (remove_metadata, remove_info, remove_acroform, remove_structure, remove_page_labels), page labels (set_page_labels), version control (min_version, force_version), and reproducible/inspection helpers (deterministic_id, static_id, check).
Exposed several pieces of qpdf functionality that pikepdf had not previously bound:
- Whole-document qpdf JSON: pikepdf.Pdf.write_qpdf_json(), pikepdf.Pdf.from_qpdf_json() and pikepdf.Pdf.update_from_qpdf_json() serialize and reconstruct an entire PDF as qpdf JSON (the qpdf --json-output/--json-input format, version 2). This complements the existing object-level pikepdf.Object.to_json(). Added pikepdf.JSONStreamData to control how stream data is represented.
- pikepdf.Pdf.get_xref_table() returns the cross-reference table as structured data (pikepdf.XrefEntry), complementing the print-only pikepdf.Pdf.show_xref_table().
- pikepdf.Pdf.fix_dangling_references() repairs references to objects that are not present in the file.
- pikepdf.Page.flatten_rotation() bakes a page’s /Rotate value into its content stream.
- pikepdf.Page.copy_annotations() copies annotations (and associated form fields) from another page, applying a transformation matrix.
- pikepdf.Page.get_matrix_for_transformations() and pikepdf.Page.get_matrix_for_form_xobject_placement() expose qpdf’s page/form-XObject placement matrices.
- pikepdf.AcroForm.validate(), pikepdf.AcroForm.invalidate_cache() and pikepdf.AcroForm.transform_annotations() for working with interactive forms after manual structural edits.
Added pikepdf.Page.get_images(), which by default recurses into nested form XObjects to find images. The pikepdf.Page.images property is now deprecated: it only reports images referenced directly by the page and silently omits images drawn through form XObjects, which made it appear as if a page “has no images” when it clearly did. Use get_images() instead, or get_images(recursive=False) for the old behavior.
Added pikepdf.Page.rotation, a property that reports a page’s effective clockwise rotation normalized to [0, 360). Unlike the raw page.Rotate attribute, it resolves a /Rotate value inherited from the page tree and reports 0 when no rotation is set, instead of raising. Assigning to it sets the absolute rotation. This addresses the long-standing confusion between the page.Rotate attribute and the page.rotate() method (#467).
pikepdf.Page.rotate() now defaults relative to False, so page.rotate(90) sets an absolute rotation. Passing relative as a positional argument is deprecated and emits a DeprecationWarning; pass it as a keyword argument instead, e.g. page.rotate(90, relative=True). Positional support will be removed in pikepdf 11.
Added pikepdf.Pdf.add_pages_from() to copy pages between documents while preserving interactive AcroForm form fields, returning a pikepdf.PageCopyResult. Naive pages.extend() across documents and save() of documents with orphaned form widgets now emit pikepdf.PageCopyWarning. (#670, #207)
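
The rotation semantics described in this list (an effective rotation normalized to [0, 360), and absolute versus relative rotate) can be modeled in a few lines. This is a pure-Python sketch with hypothetical function names; it does not call pikepdf, and the multiple-of-90 check reflects the PDF requirement for /Rotate rather than any documented pikepdf error.

```python
def effective_rotation(own_rotate=None, inherited_rotate=None):
    """Own /Rotate if present, else the inherited value, else 0; in [0, 360)."""
    value = own_rotate if own_rotate is not None else inherited_rotate
    if value is None:
        return 0
    return int(value) % 360


def rotate(current, angle, *, relative=False):
    """Return the new rotation; absolute by default, relative on request."""
    if angle % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    if relative:
        return (current + angle) % 360
    return angle % 360


inherited = effective_rotation(None, 450)   # value comes from the page tree
negative = effective_rotation(-90)          # normalized into [0, 360)
absolute = rotate(90, 90)                   # sets the rotation to 90
relative = rotate(90, 90, relative=True)    # adds 90 to the current rotation
```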

When copying pages, named destinations referenced by the copied pages’ annotations (e.g. table-of-contents links) are now carried into the destination document — both the PDF 1.2 Names.Dests name tree and the legacy PDF 1.1 Root.Dests dictionary — so internal links keep working regardless of merge order. Name collisions are renamed and reported via pikepdf.PageCopyResult (named_dests_added, renamed_dests, dropped_dests). Naive pages.extend() now also warns when copied pages reference named destinations. (#148)

Fixes

Fixed image extraction ignoring the /Decode array, which caused colors to be inverted (or otherwise mismapped) when a PDF specified a non-default /Decode such as [1, 0]. pikepdf.PdfImage.as_pil_image() and pikepdf.PdfImage.extract_to() now apply /Decode as a linear per-channel mapping for grayscale, RGB and CMYK raster images, matching how a PDF viewer renders the image. Previously /Decode was honored only for CCITTFax-encoded images. Thanks to Mark-Joy for the report. (#650) Both methods gained an apply_decode_array parameter (default True). Pass apply_decode_array=False to retrieve the raw stored sample values with the least processing – useful for forensic inspection of the underlying image data. Some image types are intentionally not affected: Indexed-colorspace images (where /Decode remaps palette indices rather than colors – a non-identity /Decode there now emits a warning), and DCT (JPEG) / JPX (JPEG 2000) images, whose codecs carry their own color semantics (such as the Adobe APP14 marker for inverted CMYK) that Pillow already honors; re-applying /Decode would double-invert them.
Fixed pikepdf.Pdf.save() decompressing streams when called with compress_streams=False and no explicit stream_decode_level. qpdf 11.10 changed its default stream decode level to generalized, which caused such saves to decompress (without recompressing) streams and balloon the output file. pikepdf now pins the decode level to none in this case, restoring the documented behavior that compress_streams=False alone does not trigger decompression. Fixes #676.
The minimum required qpdf version is now 12.3.2. The new pikepdf.AcroForm.validate() binding calls qpdf’s QPDFAcroFormDocumentHelper::validate, which was added in qpdf 12.3.0, so pikepdf no longer builds against older qpdf releases.
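
The linear per-channel /Decode mapping from the first fix above follows the formula in the PDF specification: value = Dmin + sample × (Dmax − Dmin) / (2^bits − 1). The sketch below applies it in pure Python and rescales the result back to the sample range; the function name apply_decode is hypothetical and this is not pikepdf's implementation.

```python
def apply_decode(samples, decode, bits_per_component=8):
    """Map interleaved raw samples through a /Decode array.

    samples: flat list of raw integer samples, channels interleaved.
    decode: [Dmin1, Dmax1, ..., DminN, DmaxN], one pair per channel.
    Returns integers rescaled to 0..(2**bits_per_component - 1).
    """
    max_sample = (1 << bits_per_component) - 1
    pairs = list(zip(decode[0::2], decode[1::2]))
    out = []
    for i, sample in enumerate(samples):
        dmin, dmax = pairs[i % len(pairs)]  # pick the pair for this channel
        value = dmin + sample * (dmax - dmin) / max_sample
        out.append(round(value * max_sample))
    return out


inverted_gray = apply_decode([0, 64, 255], [1, 0])
identity_gray = apply_decode([0, 64, 255], [0, 1])
# Only the green channel is inverted in this RGB example.
mixed_rgb = apply_decode([10, 20, 30], [0, 1, 1, 0, 0, 1])
```

With /Decode [1, 0] every grayscale sample is inverted, which is why ignoring the array produced negative-looking images.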

Documentation

Documented a long-standing page-deletion pitfall: deleting a page unlinks it from the page tree, but a page that is still referenced by an outline (bookmark), link annotation, or named destination remains in the saved file. The Deleting pages topic now explains the behavior and gives workarounds. Thanks to m-holger. Closes #196.
Documented how to copy metadata between documents, in a new Copying metadata between documents topic, including why blindly copying all fields (or the raw XMP stream) can import false conformance claims and identifiers. Closes #188.

v10.8.0

Added pikepdf.ReferenceCycleError (a subclass of pikepdf.PdfError), raised when an operation would create a cycle of direct (non-indirect) objects – a direct object may not contain itself, directly or indirectly. Use pikepdf.Pdf.make_indirect() to create a reference cycle instead. This requires a build of qpdf that prevents direct-object cycle construction; on older qpdf the offending operation is permitted as before.
Added a new pikepdf.sanitize module with curated, low-risk helpers for removing active or auxiliary content from untrusted PDFs: remove_javascript, remove_attachments, remove_external_access, remove_thumbnails, remove_search_index, remove_multimedia (Rendition/Movie/Sound/RichMedia/3D content), remove_web_capture (/SpiderInfo), remove_private_app_data (page-piece dictionaries), and remove_collection (PDF portfolio view), plus a pikepdf.sanitize.Sanitizer builder for chaining these operations. The action-based removals now also traverse the document outline (bookmarks) and treat embedded go-to (/GoToE) as external access, and remove_attachments now sweeps /AF associated-file references from every object (XObjects, structure elements, DParts, etc.). Also added a new Sanitizing PDFs topic discussing PDF sanitization, threat models, and the limits of programmatic redaction. Fixes #673.
pikepdf now publishes free-threaded CPython 3.14 (cp314t) binary wheels to PyPI for Linux (manylinux and musllinux, x86-64 and aarch64), macOS (x86-64 and Apple Silicon) and Windows (x86-64). Previously these wheels were not published and free-threaded users had to build pikepdf from source.
Updated the PyPI “Free Threading” trove classifier from “1 - Unstable” to “3 - Supported”.
Some of pikepdf’s dependencies (such as lxml and Pillow) publish their own free-threaded wheels; on less common platforms or when older versions are involved, free-threading might require source builds of those dependencies.
Reimplemented Page’s attribute, item and get accessors in C++ instead of Python. These delegate to the underlying page dictionary and were previously implemented as Python augmentations; moving them to C++ removes extra Python call frames on these hot paths. Behavior is unchanged.
Object construction (Name, Array, Dictionary, the Name.Attr shorthand, the scalar types Integer/Boolean/Real, and NamePath) is now implemented in C++ for improved performance. Behavior is unchanged.

v10.7.3

Upgraded to cibuildwheel 3.4.1 and refreshed pinned GitHub Actions (actions/checkout@v6, actions/upload-artifact@v7, actions/download-artifact@v8, codecov/codecov-action@v6). Dropped the CPython 3.15 prerelease test job, since cibuildwheel has not yet shipped a stable release with 3.15 support.
Fixed Windows wheels bundling a fixed-version, un-mangled copy of the Microsoft Visual C++ runtime (msvcp140*.dll, vcruntime140*.dll, concrt140.dll) inside the package directory. These were inadvertently copied from qpdf’s prebuilt release alongside qpdf30.dll. Shipping them caused a second, conflicting copy of the C++ runtime to load in the same process, which could corrupt CPython’s per-thread state and produce a fatal PyInterpreterState_Get ... the GIL is released (the current Python thread state is NULL) error or an ImportError for pikepdf._core, typically only in some launch environments (e.g. a terminal but not IDLE) or when another extension was imported first. The Windows wheel now copies only qpdf’s own DLLs; delvewheel vendors a name-mangled copy of the C++ standard library runtime and uses CPython’s own vcruntime140, so the wheel remains self-contained without an un-mangled runtime in the package directory that could collide with the system copy. Fixes :issue:718.
Improved the error message raised when pikepdf._core fails to import: it now reports the active interpreter, version, free-threading status, and (on Windows) a hint about the Visual C++ Redistributable and interpreter mismatches.
Fixed a segmentation fault when comparing two direct (non-indirect) Dictionary or Array objects that form a cyclic reference graph, for example a['/Kids'] = [b]; b['/Kids'] = [a]; a == b. The cycle detector previously keyed its bookkeeping on unparseBinary(), which itself recurses through the whole graph and overflowed the C stack for direct cyclic objects. Equality now detects cycles by object identity instead, so such comparisons terminate. Fixes :issue:731.
Fixed an AttributeError when reading a document outline (“bookmarks”) whose items are missing the required /Title field. By default, Pdf.open_outline() now quietly treats a missing /Title as an empty string; passing strict=True raises OutlineStructureError instead. Fixes :issue:730.
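
The identity-based cycle detection described in the equality fix above can be sketched with plain Python dicts and lists standing in for direct Dictionary and Array objects. The function name deep_equal is hypothetical, and treating a pair that is already being compared as equal is an assumption of this sketch, not a statement about pikepdf's internals.

```python
def deep_equal(a, b, _active=None):
    """Structural equality for nested dicts/lists that may contain cycles."""
    if _active is None:
        _active = set()
    both_dicts = isinstance(a, dict) and isinstance(b, dict)
    both_lists = isinstance(a, list) and isinstance(b, list)
    if not (both_dicts or both_lists):
        return a == b  # scalars or mismatched container types
    pair = (id(a), id(b))  # identity, not a serialized form of the graph
    if pair in _active:
        return True  # this pair is already being compared higher up
    if both_dicts and a.keys() != b.keys():
        return False
    if both_lists and len(a) != len(b):
        return False
    _active.add(pair)
    try:
        if both_dicts:
            return all(deep_equal(a[k], b[k], _active) for k in a)
        return all(deep_equal(x, y, _active) for x, y in zip(a, b))
    finally:
        _active.discard(pair)


# Two containers that refer to each other, as in the example above.
a = {"Kids": []}
b = {"Kids": []}
a["Kids"].append(b)
b["Kids"].append(a)
cyclic_result = deep_equal(a, b)
```

Keying the bookkeeping on object identity means the detector never has to walk the graph just to build a key, which is what overflowed the stack previously.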

v10.7.2

Fixed a segmentation fault when an object that is not an Encryption, dict, bool, or None (for example a list or unittest.mock.MagicMock) was passed to the encryption argument of Pdf.save(). A TypeError is now raised instead. Fixes :issue:727.
Fixed a possible segmentation fault in Page.add_content_token_filter() if the user had previously assigned a non-list value to the private Pdf._token_filter_refs attribute. The attribute is now reset before use.
Suppressed nanobind’s leaked instances/types/functions report at interpreter shutdown. Module-scope Python state (e.g. pytest.mark.parametrize arguments) commonly holds pikepdf objects until the interpreter exits; nanobind reports these as leaks even though they are not bugs. Set the environment variable PIKEPDF_NANOBIND_LEAK_WARNINGS=1 before importing pikepdf to re-enable the report for debugging. Fixes :issue:728.
Fixed Array.append(None) raising TypeError instead of inserting a PDF null object. This was a nanobind migration regression vs. v10.5. Fixes :issue:725.
Fixed Dictionary.__setattr__(name, None) (i.e. d.Key = None) raising TypeError instead of the documented ValueError advising to use del to remove the key. Same nanobind migration regression as Array.append(None).
Fixed macro redefinition warnings on Fedora rawhide (Python 3.14 + glibc 2.42) by ensuring Python.h is included before any standard library headers in all translation units. Fixes :issue:724.
Moved the 2-bit and 4-bit subbyte pixel unpack inner loops from Python into C++, eliminating per-byte interpreter overhead when decoding low-bit-depth images.
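
For reference, the sub-byte unpacking that the last item moved into C++ looks roughly like this in pure Python. It is a sketch of the general technique (samples packed most significant bits first, each row padded to a byte boundary), with a hypothetical function name; it is not the code that was replaced.

```python
def unpack_subbyte(row: bytes, bits: int, width: int) -> list[int]:
    """Unpack `width` samples of `bits` bits each from one image row."""
    if bits not in (1, 2, 4):
        raise ValueError("bits must be 1, 2 or 4")
    per_byte = 8 // bits
    mask = (1 << bits) - 1
    out = []
    for byte in row:
        for slot in range(per_byte):
            # The first sample occupies the highest-order bits of the byte.
            shift = 8 - bits * (slot + 1)
            out.append((byte >> shift) & mask)
    # Drop padding samples at the end of the row.
    return out[:width]


two_bit = unpack_subbyte(bytes([0b00011011]), bits=2, width=4)
four_bit = unpack_subbyte(bytes([0xAB, 0xC0]), bits=4, width=3)
```

The inner loop runs once per sample, which is the per-byte interpreter overhead the C++ version avoids.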

v10.7.1

Fixed the build so that it continues to generate Python-version-specific wheels for 3.12 and 3.13, because of an open issue in nanobind. Fixes :issue:723. Thanks @mgorny for reporting.
Improved the CI build to perform more detailed tests using python3-dbg (a debug build), which has more assertions and would have uncovered this issue.

v10.7.0

Yanked release from PyPI due to segfaults on Python 3.12 and 3.13; fixed in 10.7.1.
Wheels for Python 3.12+ are now built against abi3 (the Stable ABI). Earlier versions and free-threaded builds continue to be built against specific Python versions.
Removed the manual hack to generate docs/requirements.txt for readthedocs.org.

v10.6.0

Released v10.6.0 with version bump only.

v10.6.0rc2

Fixed a regression introduced during the nanobind migration, in which the exception hierarchy was unintentionally changed.

v10.6.0rc1

Replaced pybind11 with nanobind and added full freethreading support. Thanks to nanobind, the pikepdf binary is now about 20% smaller and pikepdf is about 10% faster.

v10.5.1

Updated the lockfile to avoid a PyJWT CVE. We only use PyJWT via pygithub for developer release tooling, not in pikepdf itself, so this is inconsequential for pikepdf users, but it does silence automated security advisories.
Suppressed GCC -Wpsabi note about C++17 ABI change for std::pair in pybind11 headers.

v10.5.0

Fixed logger in ctm module using __file__ instead of __name__, which produced unhelpful log names. :issue:712
Modernized README.
All README code blocks are now tested, instead of just one.

v10.4.0

Enums are now proper Python enum.Enum/enum.IntFlag types (PEP 435 compliant), migrated from pybind11’s deprecated py::enum_ to py::native_enum.
Reimplemented the PDFDocEncoding codec in pure Python using the standard library charmap pattern, removing the C++ dependency on qpdf for encoding.
Upgraded to qpdf 12.3.2.
Fixed incorrect docstrings for StreamDecodeLevel. :issue:708
Fixed type stubs: added PEP 570 positional-only markers, and corrected index() signature.
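
The standard library charmap pattern mentioned above can be sketched as follows. The table here is illustrative only: it starts from Latin-1 and overrides a single slot (byte 0x80, which is U+2022 BULLET in PDFDocEncoding), so it is not the full PDFDocEncoding table and not pikepdf's codec. The helper names decode and encode are hypothetical.

```python
import codecs

# Illustrative 256-entry table: Latin-1 with one PDFDocEncoding-style override.
_table = [chr(i) for i in range(256)]
_table[0x80] = "\u2022"  # BULLET; the rest of the real table is omitted
decoding_table = "".join(_table)

# The stdlib builds the reverse (str -> byte) map from the decoding table.
encoding_table = codecs.charmap_build(decoding_table)


def decode(data: bytes) -> str:
    """Decode bytes through the table; returns only the decoded text."""
    return codecs.charmap_decode(data, "strict", decoding_table)[0]


def encode(text: str) -> bytes:
    """Encode text through the reverse map; unmappable characters raise."""
    return codecs.charmap_encode(text, "strict", encoding_table)[0]


decoded = decode(b"\x80 item")
encoded = encode("\u2022 item")
```

This is the same shape used by the table-driven codecs in Python's own encodings package, which is why no C++ support is needed.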

v10.3.0

Fixed UnicodeDecodeError when listing keys of a dictionary containing invalid UTF-8. Thanks @qooxzuub. :issue:696
Fixed an issue where opening a PDF with duplicate form field names would cause a crash. Accessing a duplicate field by name now returns a proxy list of all matching fields. Thanks @qooxzuub. :issue:697
Added .values() accessor to Object for iterating over dictionary values. Thanks @qooxzuub. :issue:699, :issue:697
Added .copy() and .update() methods to Dictionary. Thanks @qooxzuub. :issue:700
Improved Object.copy implementation and added type stubs. Thanks @qooxzuub. :issue:702
Fixed missing return in SimpleFont._encode_diffmap(). Thanks @lachlan.charlick :issue:706
Improved error messages for invalid dictionary access. Thanks @qooxzuub. :issue:701
lxml and Pillow are now lazily loaded to improve import time. Thanks @qooxzuub. :issue:704
Improved atomic_overwrite robustness for restricted directories and special files. :issue:695

v10.2.0

Fixed unparse_content_stream() not preserving literal strings when given raw Python tuples. :issue:689
The pikepdf.explicit_conversion() context manager is now thread-local and takes precedence over the global setting from pikepdf.set_object_conversion_mode(). Nested context managers are supported via a depth counter.
Moved explicit conversion functions to their own module for better code organization.
Improved C++ test coverage: line coverage rose from 96.4% to 97.5%, and function coverage from 94.9% to 95.1%.
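
The thread-local, nestable behavior described above for pikepdf.explicit_conversion() can be modeled with threading.local and a depth counter. The sketch below is a stand-alone illustration, not pikepdf's implementation; the mode strings "implicit" and "explicit" and the helper current_mode are placeholders chosen for the example.

```python
import threading
from contextlib import contextmanager

_global_mode = "implicit"    # stands in for the process-wide setting
_local = threading.local()   # holds a per-thread nesting depth


def current_mode() -> str:
    """The thread-local context, when active, wins over the global mode."""
    if getattr(_local, "depth", 0) > 0:
        return "explicit"
    return _global_mode


@contextmanager
def explicit_conversion():
    """Enable explicit mode for this thread; safe to nest."""
    _local.depth = getattr(_local, "depth", 0) + 1
    try:
        yield
    finally:
        _local.depth -= 1


seen = []
with explicit_conversion():
    with explicit_conversion():
        seen.append(current_mode())
    seen.append(current_mode())  # still inside the outer block
    # Another thread is unaffected by this thread's context.
    worker = threading.Thread(target=lambda: seen.append(current_mode()))
    worker.start()
    worker.join()
seen.append(current_mode())  # back to the global setting
```

A counter rather than a boolean is what lets the inner block exit without switching the outer block off.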

v10.1.0

Added pikepdf.NamePath for ergonomic access to deeply nested PDF structures. NamePath provides a single-operation traversal with helpful error messages showing exactly where traversal failed. See Accessing nested objects with NamePath for details.
Added explicit scalar types: pikepdf.Integer, pikepdf.Boolean, and pikepdf.Real. When explicit conversion mode is enabled, these types are returned instead of Python native types (int, bool, Decimal), enabling better type safety and static type checking.
Added pikepdf.set_object_conversion_mode() and pikepdf.get_object_conversion_mode() to control conversion behavior globally.
Added pikepdf.explicit_conversion() context manager for temporarily enabling explicit conversion mode.
Added safe accessor methods to pikepdf.Object: as_int(), as_bool(), as_float(), and as_decimal() with optional default parameters for type-safe access to scalar values.
pikepdf.Integer and pikepdf.Real now support full arithmetic operations with both int and float operands, including true division (/).

v10.0.3

Fixed an issue where PdfImage.as_pil_image() would create additional unused objects in the PDF that called it.
Fixed a shutdown segfault in the alpha release of Python 3.15.
Fixed Pdf.show_xref_table() not actually showing its output.
Pinned the test dependency python-xmp-toolkit to < 2.1.0. python-xmp-toolkit 2.1.0 is effectively a breaking change, requiring a new version of libexempi that is not available on some cibuildwheel builders. As a workaround, we have pinned the older version. We only use python-xmp-toolkit in tests to confirm correctness; pikepdf has its own XML-based implementation of XMP.

v10.0.2

Fixed presentation of strings in the output of unparse_content_stream: if a string can be represented using PDFDocEncoding, it is rendered that way for ease of reading. :issue:682
Reformatted C++ source.

v10.0.1

Fixed issue with performing equality test on dictionaries with cyclic subgraphs. :issue:677

v10.0.0

See breaking changes for v10.x above.