Release notes

[Image: Releasing a pike.]

pikepdf releases use the semantic versioning policy.

The pikepdf API (as provided by import pikepdf) is stable and is in production use. Note that the C++ extension module pikepdf._qpdf is a private interface within pikepdf; applications should not access it directly, nor any other module whose name begins with an underscore.

Python 3.6 reaches end of life on December 23, 2021. pikepdf 3.x will continue to support Python 3.6 until then.

v4.0.1

  • Made the documentation build reproducible. (Thanks to Chris Lamb and Sean Whitton.)

  • Fixed issue where file attachments not located in the current working directory would be created with a directory name.

  • Removed some references to Python 3.6.

  • Added some fixes to typing hints from @cherryblossom000.

v4.0.0

Breaking changes

  • Python 3.10 is supported.

  • Dropped support for Python 3.6, since it is reaching end of life soon. We will backport critical fixes to pikepdf 3.x until Python 3.6 reaches end of life in December 2021.

  • We now require C++17 and generate wheels for manylinux2014 Linux targets. We had to drop support for manylinux2010, our previous target, since some of our dependencies like Pillow are no longer supporting manylinux2010.

v3.2.0

  • Fixed support for outline items that have PDF 1.1-style named destinations. #258, #261

  • We now issue a warning if an unnecessary password was provided when opening an unencrypted PDF.

v3.1.1

  • Fixed errors that occurred on import pikepdf for an extension module built with pybind11 2.8.0.

v3.1.0

  • Extraction of common inline image file formats is now supported.

  • Some refactoring and documentation improvements.

v3.0.0

Breaking changes

  • libqpdf 10.3.1 is now required and other requirements were adjusted.

  • pybind11 2.7.1 is now required.

  • Improved page API. Pdf.pages now returns Page instead of page object dictionaries, so it is no longer necessary to wrap page objects as in the previous idiom page = Page(pdf.pages[0]). In most cases, if you use the Dictionary object API on a page, it will automatically do the right thing to the underlying dictionary.

  • Improved content stream API. parse_content_stream now returns a list of pikepdf.ContentStreamInstruction or pikepdf.ContentStreamInlineImage. These are “duck type”-compatible with the previous data structure but may affect code that strongly depended on the return types. unparse_content_stream still accepts the same inputs.

  • TokenType.name and ObjectType.name were renamed to TokenType.name_ and ObjectType.name_, respectively. Unfortunately, Python’s Enum class (of which these are both a subclass) uses the .name attribute in a special way that interfered.

  • Deprecated or private functions were removed:

    • Object.page_contents_* (use Page.contents_*)

    • Object.images (use Page.images)

    • Page._attach (use the new attachment API)

    • Stream(obj=) (deprecated obj parameter removed)

    • Pdf.root (use Pdf.Root)

    • Pdf._process (use Pdf.open(BytesIO(...)) instead)

  • pikepdf.Page.calc_form_xobject_placement() previously returned str when it should have returned bytes. It now returns the correct type.

  • pikepdf.open() and pikepdf.save(), and their counterparts in pikepdf.Pdf, now expect keyword arguments for all except the first parameter.

  • Some other functions have stricter typing, required keyword arguments, etc., for clarity.

  • When calculating the repr() of a page, we now describe a reference to that page rather than printing the page’s representation. This makes the output of repr(obj) more useful when examining data structures that reference many pages, such as /Outlines.

  • Build scripts and wheel building updated.

  • We now internally use a different API call to close a PDF in libqpdf. This may change the behavior of attempts to manipulate a PDF after it has been closed. In any case, accessing a closed file was never supported.
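
The TokenType.name_ and ObjectType.name_ rename in this release follows from Python's Enum reserving .name for each member's own identifier. The sketch below uses a hypothetical TokenKind enum (a stand-in, not pikepdf's TokenType) to show the reserved attribute and how a trailing underscore stays clear of it.

```python
from enum import Enum


class TokenKind(Enum):
    """Hypothetical stand-in enum; not pikepdf's actual TokenType."""

    integer = 1
    name_ = 2  # trailing underscore keeps clear of Enum's reserved .name


# Every Enum member carries a built-in .name holding its own identifier.
identifier_of_integer = TokenKind.integer.name
identifier_of_name_ = TokenKind.name_.name

print(identifier_of_integer)
print(identifier_of_name_)
```

Code that previously wrote TokenType.name to mean "the name token" would now spell it TokenType.name_, while .name on any member keeps its usual Enum meaning.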

New functionality

  • Added pikepdf.NameTree. We now bind to QPDF’s Name Tree API, for manipulating these complex and important data structures.

  • We now support adding and removing PDF attachments. #209

  • Improved support for PDF images that use special printer colorspaces such as DeviceN and Separation, and support extracting more types of images. #237

  • Improved error message when Pdf.save() is called on PDFs without a known source file.

  • Many documentation fixes to StreamParser, return types, PdfImage.

  • x in pikepdf.Array() is now supported; previously this construct raised a TypeError. #232

  • It is now possible to test our cibuildwheel configuration on a local machine.

Fixes

  • repr(pikepdf.Stream(...)) now returns syntax matching what the constructor expects.

  • Fixed certain wrong exception types that occurred when attempting to extract special printer colorspace images.

  • Lots of typing fixes.

v2.16.1

  • unparse_content_stream is now less strict about whether elements are lists or tuples, matching its v2.15.1 behavior.

v2.16.0

  • Performance improvement for unparse_content_stream.

  • Fixed some linter warnings.

  • Tightened pybind11 dependencies so we don’t accept new minor revisions automatically.

  • Updated docs on FreeBSD.

v2.15.1

  • Fixed compatibility with pybind11 2.7.0 - some tests fail when previous versions of pikepdf are compiled with that version.

  • Fixed a coverage code exclusion.

  • Added a missing “version added” note to the documentation.

  • Fixed license string not appearing in metadata - thanks @mara004.

v2.15.0

  • Improved our pdfdoc codec to raise UnicodeEncodeError identifying the problem, instead of a less specific ValueError. Thanks to @regebro. #218

  • We now implement stream reader/writer and incremental encoder/decoder for our pdfdoc codec, making it useful in more places.

  • Fixed an issue with extracting JBIG2 images on Windows, due to Windows temporary file behavior. Thanks to @kraptor. #219
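
Because UnicodeEncodeError is a subclass of ValueError in Python, handlers written as except ValueError should keep working after the pdfdoc change, while newer code can inspect where encoding failed. A minimal sketch follows; it uses the standard "ascii" codec as a stand-in so that it runs without pikepdf, and describe_encode_failure is a hypothetical helper.

```python
def describe_encode_failure(text, codec="ascii"):
    """Return encoded bytes, or (start, end, offending_text) on failure.

    The default "ascii" codec is only a stand-in for illustration; passing
    "pdfdoc" after importing pikepdf is assumed, not tested here.
    """
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        # The exception pinpoints the slice that could not be encoded.
        return exc.start, exc.end, exc.object[exc.start:exc.end]


result_ok = describe_encode_failure("abc")
result_bad = describe_encode_failure("caf\u00e9")
still_a_value_error = issubclass(UnicodeEncodeError, ValueError)
```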

v2.14.2

  • Fixed a syntax error in type hints.

v2.14.1

  • Fixed the ReadTheDocs documentation build, which had broken after the setup.cfg changes in v2.13.0.

  • Amended the Makefile with steps for building Apple Silicon wheels.

  • No manual Apple Silicon release since there are no functional changes.

v2.14.0

  • Implemented a major new feature: overlays (watermarks, page composition). This makes it easier to solve many common tasks that involve copying content from pages to other pages, applying watermarks, headers/footers, etc. #42

  • Added pikepdf.Object.with_same_owner_as() to simplify creating objects that have the same owner as another object.

  • Many improvements to type hints for classes implemented in C++. #213, #214

v2.13.0

  • Build system modernized to use setup.cfg instead of setup.py as much as reasonable.

  • The requirements/*.txt files are now deprecated. Instead use pip install pikepdf[test,docs] to install optional extras.

  • Extended test coverage for a few tests that affect global state, using pytest-forked to isolate them.

  • All C++ autoformatted with clang-format.

  • We now imbue all C++ stringstreams with the C locale, to avoid formatting output incorrectly if another Python extension written in C++ happens to change the global std::locale.

v2.12.2

  • Rebuild wheels against libqpdf 10.3.2.

  • Enabled building Linux PyPy x86_64 wheels.

  • Fixed a minor issue where the inline images would have their abbreviations expanded when unparsed. While unlikely to be problematic, inline images usually use abbreviations in their metadata and should be kept that way.

  • Added notes to documentation about loading PDFs through Python file streams and cases that can lead to poor performance.

v2.12.1

  • Fixed documentation typo and updated precommit settings.

  • Ongoing improvements to code coverage: now related to image handling.

v2.12.0

  • Complete bindings for pikepdf.Annotation (useful for interpreting PDF form widgets, comments, etc.)

  • Ongoing improvements to code coverage: minor bug fixes, unreachable code removal, more coverage.

v2.11.4

  • Fix #160, ‘Tried to call pure virtual function “TokenFilter::handle_token”’; this was a Python/C++ reference counting problem.

v2.11.3

  • Check for versions of jbig2dec that are too old to be supported (lacking the necessary command line arguments to extract an image from a PDF).

  • Fix setup.py typo: cmd_class changed to cmdclass.

v2.11.2

  • Added missing documentation for Pdf.is_encrypted.

  • Added some documentation annotations about when certain APIs were added or changed, going back to 2.0.

v2.11.1

  • Fixed an issue with Object.emplace() not retaining the original object’s /Parent.

  • Code coverage improvements.

v2.11.0

  • Add new functions: Pdf.generate_appearance_streams and Pdf.flatten_annotations, to support common work with PDF forms.

  • Fixed an issue with pip install on platforms that lack proper multiprocessing support.

  • Additional documentation improvements from @m-holger - thanks again!

v2.10.0

  • Fixed an XML External Entity (XXE) processing vulnerability in PDF XMP metadata parsing. (Reported by Eric Therond of Sonarsource.) All users should upgrade to get this security update. CVE-2021-29421 was assigned to this issue.

  • Bind new functions to check, when a PDF is opened, whether the password used to open the PDF matched the owner password, user password, or both: Pdf.user_password_matched and Pdf.owner_password_matched.

v2.9.2

  • Further expansion of test coverage of several functions, and minor bug fixes along the way.

  • Improve parameter validation for some outline-related functions.

  • Fixed overloaded __repr__ functions in _methods.py not being applied.

  • Some proofreading of the documentation by @m-holger - thanks!

v2.9.1

  • Further expansion of test coverage.

  • Fixed function signatures for _repr_mimebundle_ functions to match IPython’s spec.

  • Fixed some error messages regarding attempts to do strange things with pikepdf.Name, like pikepdf.Name.Foo = 3.

  • Eliminated code to handle an exception that provably does not occur.

  • Test suite is now better at closing open file handles.

  • Ensure that any demo code in README.md is valid and works.

  • Embedded QPDF version in pikepdf Python wheels increased to 10.3.1.

v2.9.0

  • We now issue a warning when attempting to use pikepdf.open on a bytes object where it could be either a PDF loaded into memory or a filename.

  • pikepdf.Page.label will now return the “ordinary” page number if no special rules for pages are defined.

  • Many improvements to tests and test coverage. Code coverage for both Python and C++ is now automatically published to codecov.io; previously coverage was only checked on the developer’s machine.

  • An obsolete private function Object._roundtrip was removed.

v2.8.0

  • Fixed an issue with extracting data from images that had their DecodeParms structured as a list of dictionaries.

  • Fixed an issue where a dangling stream object is created if we fail to create the requested stream dictionary.

  • Calling Dictionary() and Array() on objects which are already of that type returns a shallow copy rather than throwing an exception, in keeping with Python semantics.

  • v2.8.0.post1: The CI system was changed from Azure Pipelines to GitHub Actions, a transition we made to support generating binary wheels for more platforms. This post-release was the first release made with GitHub Actions. It ought to be functionally identical, but could differ in some subtle way, for example because parts of it may have been built with different compiler versions.

  • v2.8.0.post2: The previous .post1 release caused binary wheels for Linux to grow much larger, causing problems for AWS Lambda, which requires small file sizes. This change strips debug symbols from the binaries and also mitigates a rare PyPy test failure.

  • Unfortunately, it appears that the transition from Azure Pipelines to GitHub Actions broke compatibility with macOS 10.13 and older. macOS 10.13 and older are considered end of life by Apple. No version of pikepdf v2.x ever promised support for macOS 10.13; 10.14+ has always been an explicit requirement. It just so happens that for some time, pikepdf did actually work on 10.13.
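
The Dictionary() and Array() change in this release mirrors how Python's built-in dict() and list() behave when handed an object of the same type. The sketch below shows those built-in semantics only; it does not exercise pikepdf, and the key names are arbitrary.

```python
inner = [1, 2]
original = {"Kids": inner}

# dict(existing_dict) builds a new, shallow copy instead of raising.
duplicate = dict(original)

is_new_container = duplicate is not original
shares_nested_values = duplicate["Kids"] is inner

# Mutating a nested value is visible through both containers...
inner.append(3)
# ...but adding a key to the copy leaves the original untouched.
duplicate["Count"] = 3
```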

v2.7.0

  • Added an option to tell Pdf.save to recompress flate streams, and a global option to set the flate compression level. This option can be used to force the recompression of flate streams if they are not well compressed.

  • Fixed “TypeError: only pages can be inserted” when attempting to insert an unowned page using QPDF 10.2.0 or later.

v2.6.0

  • Rebuild wheels against QPDF 10.2.0.

v2.5.2

  • Fixed support for PyPy 3.7 on macOS.

v2.5.1

  • Rebuild wheels against recently released pybind11 v2.6.2.

  • Improved support for building against PyPy 3.6/7.3.1.

v2.5.0

  • PyPy3 is now supported.

  • Improved test coverage for some metadata issues.

v2.4.0

  • The DocumentInfo dictionary can now be deleted with del pdf.docinfo.

  • Fixed issues with updating the dc:creator XMP metadata entry.

  • Improved error messages on attempting to encode strings containing Unicode surrogates.

  • Fixed a rare random test failure related to strings containing Unicode surrogates.

v2.3.0

  • Fixed two tests that failed with libqpdf 10.1.0.

  • Add new function pikepdf.Page.add_resource which helps with adding a new object to the /Resources dictionary.

  • Binary wheels now provide libqpdf 10.1.0.

v2.2.5

  • Changed how one C++ function is called to support libqpdf 10.1.0.

v2.2.4

  • Fixed another case where pikepdf should not be warning about metadata updates.

v2.2.3

  • Fixed a warning that was incorrectly issued in v2.2.2 when pikepdf updates XMP metadata on the user’s behalf.

  • Fixed a rare test suite failure that occurred if two test files were generated with a different timestamp, due to timing of the tests.

  • Hopefully fixed build on Cygwin (not tested, based on user report).

v2.2.2

  • Fixed #150, adding author metadata breaks PDF/A conformance. We now log an error when this metadata is set incorrectly.

  • Improve type checking in pikepdf.models.metadata module.

  • Improve documentation for custom builds.

v2.2.1

  • Fixed #143, PDF/A validation with veraPDF failing due to missing prefix on DocumentInfo dates.
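
For context on the date prefix mentioned above: the release note does not name it, but PDF date strings conventionally begin with "D:". The helper below is a hypothetical sketch of that format using only the standard library. to_pdf_date is not a pikepdf function, and the trailing apostrophe after the minutes offset follows the older PDF 1.x convention, which is an assumption about what a given validator expects.

```python
from datetime import datetime, timedelta, timezone


def to_pdf_date(moment):
    """Format a datetime as a PDF date string such as D:20210102030405Z."""
    text = moment.strftime("D:%Y%m%d%H%M%S")
    offset = moment.utcoffset()
    if offset is None:
        return text  # naive datetime: no timezone suffix
    total_minutes = int(offset.total_seconds() // 60)
    if total_minutes == 0:
        return text + "Z"
    sign = "+" if total_minutes > 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{text}{sign}{hours:02d}'{minutes:02d}'"


utc_date = to_pdf_date(datetime(2021, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
india = timezone(timedelta(hours=5, minutes=30))
offset_date = to_pdf_date(datetime(2021, 1, 2, 3, 4, 5, tzinfo=india))
```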

v2.2.0

  • Added features to look up the index of a page in the document and its page label.

  • Enabled parallel compiling (again).

  • Make it easier to create a pikepdf.Stream with a dictionary or from an existing dictionary.

  • Converted most .format() strings to f-strings.

  • Fixed incorrect behavior when assigning Object.stream_dict; this used to create a dictionary in the wrong place instead of overriding a stream’s dictionary.

v2.1.2

  • Fixed an issue where the XMP metadata would not have a timezone set when updated. According to the XMP specification, the timezone should be included. Note that pikepdf will include the local machine timezone, unless explicitly directed otherwise.
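
The difference between a timestamp with and without a timezone can be seen with the standard library alone. This is a sketch of the concept, not pikepdf's metadata code; the date is arbitrary.

```python
from datetime import datetime, timezone

naive = datetime(2021, 1, 2, 3, 4, 5)           # no timezone attached
aware_utc = naive.replace(tzinfo=timezone.utc)  # explicit UTC
aware_local = naive.astimezone()                # treated as local time, local tz attached

naive_text = naive.isoformat()
utc_text = aware_utc.isoformat()

print(naive_text)
print(utc_text)
```

Passing an explicitly timezone-aware datetime is one way to avoid depending on the local machine's timezone.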

v2.1.1

  • The previous release inadvertently changed the type of exception in certain situations, notably throwing ForeignObjectError when this was not the correct error to throw. This release fixes that.

v2.1.0

  • Improved error messages and documentation around Pdf.copy_foreign.

  • Opt-in to mypy typing.

v2.0.0

This description includes changes in v2.0 beta releases.

Breaking changes

  • We now require at least these versions or newer:

    • Python 3.6

    • pybind11 2.6.0

    • QPDF 10.0.3

    • For macOS users, macOS 10.14 (Mojave)

  • Attempting to modify Stream.Length will raise an exception instead of a warning. pikepdf automatically calculates the length of the stream when a PDF is saved, so there is never a reason to modify this.

  • pikepdf.Stream() can no longer parse content streams. That never made sense, since this class supports streams in general, and many streams are not content streams. Use pikepdf.parse_content_stream to parse a content stream.

  • pikepdf.Permissions is now represented as a NamedTuple. Probably not a concern unless some user made strong assumptions about this class and its superclass.

  • Fixed the behavior of the __eq__ on several classes to return NotImplemented for uncomparable objects, instead of False.

  • The instance variable PdfJpxImage.pil is now a private variable.
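
The __eq__ change above follows Python's comparison protocol: returning NotImplemented lets the interpreter try the other operand before falling back to False. The classes below are hypothetical and only illustrate the protocol; they are not pikepdf types.

```python
class Length:
    """Hypothetical value class; illustrates the protocol only."""

    def __init__(self, points):
        self.points = points

    def __eq__(self, other):
        if not isinstance(other, Length):
            # Let Python try the other operand's __eq__ before giving up.
            return NotImplemented
        return self.points == other.points

    def __hash__(self):
        return hash(self.points)


class AlwaysEqual:
    """Hypothetical type whose own __eq__ accepts anything."""

    def __eq__(self, other):
        return True


same_type = Length(72) == Length(72)
unrelated_type = Length(72) == "72"      # both sides decline, so False
reflected = Length(72) == AlwaysEqual()  # the right operand gets a say
```

Had Length.__eq__ returned False for foreign types, the last comparison would have been False without AlwaysEqual ever being consulted.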

New features

  • Python 3.9 is supported.

  • Significantly improved type hinting, including hints for functions written in C++.

  • Documentation updates

Deprecations

  • Pdf.root is deprecated. Use Pdf.Root.

v2.0.0b2

  • We now require QPDF 10.0.3.

v2.0.0b1

Breaking changes

  • We now require at least these versions or newer:

    • Python 3.6

    • pybind11 2.6.0

    • QPDF 10.0.1

    • For macOS users, macOS 10.14 (Mojave)

  • Attempting to modify Stream.Length will raise an exception instead of a warning.

  • pikepdf.Stream() can no longer parse content streams. That never made sense, since this class supports streams in general, and many streams are not content streams. Use pikepdf.parse_content_stream to parse a content stream.

  • pikepdf.Permissions is now represented as a NamedTuple. Probably not a concern unless some user made strong assumptions about this class and its superclass.

  • Fixed the behavior of the __eq__ on several classes to return NotImplemented for uncomparable objects, instead of False.

New features

  • Python 3.9 is supported.

  • Significantly improved type hinting, including hints for functions written in C++.

v1.19.4

  • Modify project settings to declare no support for Python 3.9 in pikepdf 1.x. pybind11 upstream has indicated there are stability problems when pybind11 2.5 (used by pikepdf 1.x) is used with Python 3.9. As such, we are marking Python 3.9 as unsupported by pikepdf 1.x. Python 3.9 users should switch to pikepdf 2.x.

v1.19.3

  • Fixed an exception that occurred when building the documentation, introduced in the previous release.

v1.19.2

  • Fixed an exception with setting metadata objects to unsupported RDF types. Instead we make a best effort to convert to an appropriate type.

  • Prevent creating certain illegal dictionary key names.

  • Document procedure to remove an image.

v1.19.1

  • Fixed an issue with unparse_content_stream: we now assume the second item of each step in the content stream is an Operator.

  • Fixed an issue with unparsing inline images.

v1.19.0

  • Learned how to export CCITT images from PDFs that have ICC profiles attached.

  • Cherry-picked a workaround to a possible use-after-free caused by pybind11 (pybind11 PR 2223).

  • Improved test coverage of code that handles inline images.

v1.18.0

  • You can now use pikepdf.open(..., allow_overwriting_input=True) to allow overwriting the input file, which was previously forbidden because it can corrupt data. This is accomplished safely by loading the entire PDF into memory at the time it is opened rather than loading content as needed. The option is disabled by default, to avoid a performance hit.

  • Prevent setup.py from creating junk temporary files (finally!)

v1.17.3

  • Fixed crash when pikepdf.Pdf objects are used inside generators (#114) and not freed or closed before the generator exits.

v1.17.2

  • Fixed a “seek of closed file” error where JBIG2 image data could not be accessed (only metadata could be) when a JBIG2 image was extracted from a PDF.

v1.17.1

  • Fixed building against the oldest supported version of QPDF (8.4.2), and configured CI to test against the oldest version. (#109)

v1.17.0

  • Fixed a failure to extract PDF images, where the image had both a palette and colorspace set to an ICC profile. The image is now extracted with the profile embedded. (#108)

  • Added opt-in support for memory-mapped file access, using pikepdf.open(..., access_mode=pikepdf.AccessMode.mmap). Memory mapping improves file access performance considerably, but may make application exception handling more difficult.
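
As background on memory-mapped access in general, the sketch below maps a throwaway file with Python's standard mmap module. It does not call pikepdf, and the file contents are arbitrary sample bytes rather than a valid PDF.

```python
import mmap
import os
import tempfile

# Write a small throwaway file to map (contents are arbitrary).
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"%PDF-1.7 sample bytes")
    path = tmp.name

try:
    with open(path, "rb") as handle:
        # Length 0 maps the whole file; the OS loads pages on demand.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            header = view[:8]
            size = len(view)
finally:
    os.remove(path)
```

With a mapping, I/O problems can surface when bytes are touched rather than when the file is opened, which is one plausible reading of the exception-handling caveat above.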

v1.16.1

  • Fixed an issue with JBIG2 extraction, where the version number of the jbig2dec software may be written to standard output as a side effect. This could interfere with test cases or software that expects pikepdf to be stdout-clean.

  • Fixed an error that occurred when updating DocumentInfo to match XMP metadata, when XMP metadata had unexpected empty tags.

  • Fixed setup.py to better support Python 3.8 and 3.9.

  • Documentation updates.

v1.16.0

  • Added support for extracting JBIG2 images with the image API. JBIG2 images are converted to PIL.Image. Requires a JBIG2 decoder such as jbig2dec.

  • Python 3.5 support is deprecated and will end when Python 3.5 itself reaches end of life, in September 2020. At the moment, some tests are skipped on Python 3.5 because they depend on Python 3.6.

  • Python 3.9beta is supported and is known to work on Fedora 33.

v1.15.1

  • Fixed a regression - Pdf.save(filename) may hold file handles open after the file is fully written.

  • Documentation updates.

v1.15.0

  • Fixed an issue where Decimal objects of precision exceeding the PDF specification could be written to output files, causing some PDF viewers, notably Acrobat, to parse the file incorrectly. We now limit precision to 15 digits, which ought to be enough to prevent rounding error and parsing errors.

  • We now refuse to create pikepdf objects from float or Decimal that are NaN or ±Infinity. These concepts have no equivalent in PDF.

  • pikepdf.Array objects now implement .append() and .extend() with familiar Python list semantics, making them easier to edit.
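
A rough sketch of the two numeric rules in this release, using only the standard library. to_pdf_decimal is hypothetical and is not how pikepdf implements the check; it also assumes that "15 digits" means 15 significant digits.

```python
from decimal import Context, Decimal


def to_pdf_decimal(value):
    """Reject non-finite numbers and round to 15 significant digits."""
    number = value if isinstance(value, Decimal) else Decimal(value)
    if not number.is_finite():
        raise ValueError("NaN and Infinity have no equivalent in PDF")
    # Context.plus() applies the context's precision to the operand.
    return Context(prec=15).plus(number)


rounded = to_pdf_decimal(Decimal("1.234567890123456789"))

try:
    to_pdf_decimal(float("nan"))
    nan_rejected = False
except ValueError:
    nan_rejected = True
```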

v1.14.0

  • Allowed use of .keys(), .items() on pikepdf.Stream objects.

  • We now warn on attempts to modify pikepdf.Stream.Length, which pikepdf will manage on its own when the stream is serialized. In the future attempting to change it will become an error.

  • Clarified documentation in some areas about behavior of pikepdf.Stream.

v1.13.0

  • Added support for editing PDF Outlines (also known as bookmarks or the table of contents). Many thanks to Matthias Erll for this contribution.

  • Added support for decoding run length encoded images.

  • Object.read_bytes() and Object.get_stream_buffer() can now request decoding of uncommon PDF filters.

  • Fixed test suite warnings related to pytest and hypothesis.

  • Fixed build on Cygwin. Thanks to @jhgarrison for report and testing.

v1.12.0

  • Microsoft Visual C++ Runtime libraries are now included in the pikepdf Windows wheel, to improve ease of use on Windows.

  • Defensive code added to prevent using .emplace() on objects from a foreign PDF without first copying the object. Previously, this would raise an exception when the file was saved.

v1.11.2

  • Fix “error caused by missing str function of Array” (#100, #101).

  • Lots of delinting and minor fixes.

v1.11.1

  • We now avoid creating an empty XMP metadata entry when files are saved.

  • Updated documentation to describe how to delete the document information dictionary.

v1.11.0

  • Prevent creation of dictionaries with invalid names (not beginning with /).

  • Allow pikepdf’s build to specify a qpdf source tree, allowing one to compile pikepdf against an unreleased/modified version of qpdf.

  • Improved behavior of pages.p() and pages.remove() when invalid parameters were given.

  • Fixed compatibility with libqpdf version 10.0.1, and build official wheels against this version.

  • Fixed compatibility with pytest 5.x.

  • Fixed the documentation build.

  • Fixed an issue with running tests in a non-Unicode locale.

  • Fixed a test that randomly failed due to a “deadline error”.

  • Removed a possibly nonfree test file.

v1.10.4

  • Rebuild Python wheels with newer version of libqpdf. Fixes problems with opening certain password-protected files (#87).

v1.10.3

  • Fixed isinstance(obj, pikepdf.Operator) not working. (#86)

  • Documentation updates.

v1.10.2

  • Fixed an issue where pages added from a foreign PDF were added as references rather than copies. (#80)

  • Documentation updates.

v1.10.1

  • Fixed build reproducibility (thanks to @lamby)

  • Fixed a broken link in documentation (thanks to @maxwell-k)

v1.10.0

  • Further attempts to recover malformed XMP packets.

  • Added missing functionality to extract 1-bit palette images from PDFs.

v1.9.0

  • Improved a few cases of malformed XMP recovery.

  • Added an unparse_content_stream API to assist with converting the previously parsed content streams back to binary.

v1.8.3

  • If the XMP metadata packet is not well-formed and we are confident that it is essentially empty apart from XML fluff, we fix the problem instead of raising an exception.

v1.8.2

  • Fixed an issue where QPDF 8.4.2 would report different errors from QPDF 9.0.0, causing a test to fail. (#71)

v1.8.1

  • Fixed an issue where files opened by name may not be closed correctly. Regression from v1.8.0.

  • Fixed a test for readable/seekable streams that always evaluated to true.

v1.8.0

  • Added API/property to iterate all objects in a PDF: pikepdf.Pdf.objects.

  • Added pikepdf.Pdf.check(), to check for problems in the PDF and return a text description of these problems, similar to qpdf --check.

  • Improved internal method for opening files so that the code is smaller and more portable.

  • Added missing licenses to account for other binaries that may be included in Python wheels.

  • Minor internal fixes and improvements to the continuous integration scripts.

v1.7.1

  • This release was incorrectly marked as a patch-level release when it actually introduced one minor new feature. It includes the API change to support pikepdf.Pdf.objects.

v1.7.0

  • Shallow object copy with copy.copy(pikepdf.Object) is now supported. (Deep copy is not yet supported.)

  • Support for building on C++11 has been removed. A C++14 compiler is now required.

  • pikepdf now generates manylinux2010 wheels on Linux.

  • Build and deploy infrastructure migrated to Azure Pipelines.

  • All wheels are now available for Python 3.5 through 3.8.

v1.6.5

  • Fixed build settings to support Python 3.8 on macOS and Linux. Windows support for Python 3.8 is not currently tested since continuous integration providers have not updated to Python 3.8 yet.

  • pybind11 2.4.3 is now required, to support Python 3.8.

v1.6.4

  • When images were encoded with CCITTFaxDecode, type G4, with the /EncodedByteAlign set to true (not default), the image extracted by pikepdf would be a corrupted form of the original, usually appearing as a small speckling of black pixels at the top of the page. Saving an image with pikepdf was not affected; this problem only occurred when attempting to extract images. We now refuse to extract images with these parameters, as there is not sufficient documentation to determine how to extract them. This image format is relatively rare.

v1.6.3

  • Fixed compatibility with libqpdf 9.0.0.

    • A new method introduced in libqpdf 9.0.0 overloaded an older method, making a reference to this method in pikepdf ambiguous.

    • A test relied on libqpdf raising an exception when a pikepdf user called Pdf.save(..., min_version='invalid'). libqpdf no longer raises an exception in this situation, but ignores the invalid version. In the interest of supporting both versions, we defer to libqpdf. The failing test is removed, and documentation updated.

  • Several warnings, most specific to the Visual C++ compiler, were fixed.

  • The Windows CI scripts were adjusted for the change in libqpdf ABI version.

  • Wheels are now built against libqpdf 9.0.0.

  • libqpdf 8.4.2 and 9.0.0 are both supported.

v1.6.2

  • Fixed another build problem on Alpine Linux - musl-libc defines struct FILE as an incomplete type, which breaks pybind11 metaprogramming that attempts to reason about the type.

  • Documentation improved to mention FreeBSD port.

v1.6.1

  • Dropped our one usage of QPDF’s C API so that we use only C++.

  • Documentation improvements.

v1.6.0

  • Added bindings for QPDF’s page object helpers and token filters. These enable: filtering content streams, capturing pages as Form XObjects, more convenient manipulation of page boxes.

  • Fixed a logic error on attempting to save a PDF created in memory in a way that overwrites an existing file.

  • Fixed Pdf.get_warnings() failing with an exception when attempting to return a warning or exception.

  • Improved manylinux1 binary wheels to compile all dependencies from source rather than using older versions.

  • More tests and more coverage.

  • libqpdf 8.4.2 is required.

v1.5.0

  • Improved interpretation of images within PDFs that use an ICC colorspace. Where possible we embed the ICC profile when extracting the image, and provide access to the ICC profile.

  • Fixed saving PDFs with their existing encryption.

  • Fixed documentation to reflect the fact that saving a PDF without specifying encryption settings will remove encryption.

  • Added a test to prevent overwriting the input PDF since overwriting corrupts lazy loading.

  • Object.write(filters=, decode_parms=) now detects invalid parameters instead of writing invalid values to Filters and DecodeParms.

  • We can now extract some images that had stacked compression, provided it is /FlateDecode.

  • Add convenience function Object.wrap_in_array().

v1.4.0

  • Added support for saving encrypted PDFs. (Reading them has been supported for a long time.)

  • Added support for setting the PDF extension level as well as version.

  • Added support for converting strings to and from PDFDocEncoding, by registering a "pdfdoc" codec.
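
Registering a codec makes a new encoding name usable with str.encode() and bytes.decode(). The sketch below registers a deliberately silly, hypothetical "toyupper" codec with the standard codecs module to show the mechanism; it is not the pdfdoc codec and does not use pikepdf.

```python
import codecs


def _toy_search(name):
    """Codec search function; returns None for names it does not own."""
    if name != "toyupper":
        return None

    def encode(text, errors="strict"):
        # Return the encoded bytes and the number of characters consumed.
        return text.upper().encode("ascii", errors), len(text)

    def decode(data, errors="strict"):
        raw = bytes(data)
        # Return the decoded text and the number of bytes consumed.
        return raw.decode("ascii", errors).lower(), len(raw)

    return codecs.CodecInfo(encode=encode, decode=decode, name="toyupper")


codecs.register(_toy_search)

encoded = "hello".encode("toyupper")
decoded = b"HELLO".decode("toyupper")
```

By the same mechanism, once pikepdf has been imported, a call such as "text".encode("pdfdoc") would be the analogous usage.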

v1.3.1

  • Updated pybind11 to v2.3.0, fixing a possible GIL deadlock when pikepdf objects were shared across threads. (#27)

  • Fixed an issue where PDFs with valid XMP metadata but missing an element that is usually present would be rejected as malformed XMP.

v1.3.0

  • Remove dependency on defusedxml.lxml, because this library is deprecated. In the absence of other options for XML hardening we have reverted to standard lxml.

  • Fixed an issue where PdfImage.extract_to() would write a file in the wrong directory.

  • Eliminated an intermediate buffer that was used when saving to an IO stream (as opposed to a filename). We would previously write the entire output to a memory buffer and then write to the output buffer; we now write directly to the stream.

  • Added Object.emplace() as a workaround for when one wants to update a page without generating a new page object so that links/table of contents entries to the original page are preserved.

  • Improved documentation. Eliminated all arg0 placeholder variable names, which appeared when the documentation generator could not read a C++ variable name.

  • Added PageList.remove(p=1), so that it is possible to remove pages using counting numbers.
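
The p= keyword counts pages from 1, unlike ordinary zero-based indexing. The class below is a hypothetical stand-in that sketches that convention on a plain Python list; it is not pikepdf's PageList.

```python
class PageListSketch:
    """Hypothetical list wrapper; not pikepdf's PageList."""

    def __init__(self, items):
        self._items = list(items)

    def p(self, number):
        """Return the item at 1-based position `number`."""
        if number < 1:
            raise IndexError("page numbers start at 1")
        return self._items[number - 1]

    def remove(self, *, p):
        """Remove the item at 1-based position `p`."""
        if p < 1:
            raise IndexError("page numbers start at 1")
        del self._items[p - 1]

    def __len__(self):
        return len(self._items)


pages = PageListSketch(["cover", "contents", "chapter 1"])
first = pages.p(1)
pages.remove(p=1)
new_first = pages.p(1)
```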

v1.2.0

  • Implemented Pdf.close() and with-block context manager, to allow Pdf objects to be closed without relying on del.

  • PdfImage.extract_to() has a new keyword argument fileprefix=, which specifies a file path where an image should be extracted, with pikepdf setting the appropriate file suffix. This simplifies the API for the most common case of extracting images to files.

  • Fixed an internal test that should have suppressed the extraction of JPEGs with a nonstandard ColorTransform parameter set. Without the proper color transform applied, the extracted JPEGs will typically look very pink. Now, these images should fail to extract as was intended.

  • Fixed that Pdf.save(object_stream_mode=...) was ignored if the default fix_metadata_version=True was also set.

  • Data from one Pdf is now copied to other Pdf objects immediately, instead of creating a reference that required source PDFs to remain available. Pdf objects no longer reference each other.

  • libqpdf 8.4.0 is now required

  • Various documentation improvements
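
The combination of an explicit close() and a with-block, as added to Pdf in this release, can be sketched generically. Resource below is a hypothetical stand-in, not pikepdf's Pdf class; it shows only the protocol that makes the with statement call close() automatically.

```python
class Resource:
    """Hypothetical stand-in for an object with an explicit close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Safe to call more than once.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions


with Resource() as res:
    open_inside_block = not res.closed
closed_after_block = res.closed
```

The design choice is that cleanup happens at a predictable point, when the block exits, rather than whenever the object happens to be garbage collected.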

v1.1.0

  • Added workaround for macOS/clang build problem of the wrong exception type being thrown in some cases.

  • Improved translation of certain system errors to their Python equivalents.

  • Fixed issues resulting from platform differences in datetime.strftime. (#25)

  • Added Pdf.new, Pdf.add_blank_page and Pdf.make_stream convenience methods for creating new PDFs from scratch.

  • Added binding for new QPDF JSON feature: Object.to_json.

  • We now automatically update the XMP PDFVersion metadata field to be consistent with the PDF’s declared version, if the field is present.

  • Made our Python-augmented C++ classes easier for Python code inspectors to understand.

  • Eliminated use of the imghdr library.

  • Autoformatted Python code with black.

  • Fixed handling of XMP metadata that omits the standard <x:xmpmeta> wrapper.

v1.0.5

  • Fixed an issue where an invalid date in XMP metadata would cause an exception when updating DocumentInfo. For now, we warn that some DocumentInfo is not convertible. (In the future, we should also check if the XMP date is valid, because it probably is not.)

  • Rebuilt the binary wheels with libqpdf 8.3.0. libqpdf 8.2.1 is still supported.

v1.0.4

  • Updates to tests/resources (provenance of one test file, replaced another test file with a synthetic one)

v1.0.3

  • Fixed regression on negative indexing of pages.

v1.0.2

  • Fixed an issue where invalid values such as out of range years (e.g. 1) in DocumentInfo would raise exceptions when using DocumentInfo to populate XMP metadata with .load_from_docinfo.

v1.0.1

  • Fixed an exception with handling metadata that contains the invalid XML entity &#0; (an escaped NUL)

v1.0.0

  • Changed version to 1.0.

v0.10.2

Fixes

  • Fixed a segfault on Linux when overwriting the PDF file that pikepdf currently has open.

  • Fixed removal of an attribute metadata value when values were present on the same node.

v0.10.1

Fixes

  • Avoid canonical XML since it is apparently too strict for XMP.

v0.10.0

Fixes

  • Fixed several issues related to generating XMP metadata that passed veraPDF validation.

  • Fixed a random test suite failure for very large negative integers.

  • The lxml library is now required.

v0.9.2

Fixes

  • Added all of the commonly used XML namespaces to XMP metadata handling, so we are less likely to name something ‘ns1’, etc.

  • Skip a test that fails on Windows.

  • Fixed build errors in documentation.

v0.9.1

Fixes

  • Fixed Object.write() accepting positional arguments it wouldn’t use

  • Fixed handling of XMP data with timezones (or missing timezone information) in a few cases

  • Fixed generation of XMP with invalid XML characters if the invalid characters were inside a non-scalar object

v0.9.0

Updates

  • New API to access and edit PDF metadata and make consistent edits to the new and old style of PDF metadata.

  • 32-bit binary wheels are now available for Windows

  • PDFs can now be saved in QPDF’s “qdf” mode

  • The Python package defusedxml is now required

  • The Python package python-xmp-toolkit and its dependency libexempi are suggested for testing, but not required

Fixes

  • Fixed handling of filenames that contain multibyte characters on non-UTF-8 systems

Breaking

  • The Pdf.metadata property was removed, and replaced with the new metadata API

  • Pdf.attach() has been removed, because the interface as implemented had no way to deal with existing attachments.

v0.3.7

  • Added an API for inline images to unparse themselves

v0.3.6

  • Performance of reading files from memory improved to avoid unnecessary copies.

  • It is finally possible to use for key in pdfobj to iterate over the contents of PDF Dictionary, Stream and Array objects. In general, these objects now behave more like Python containers.

  • Package API declared beta.

v0.3.5

Breaking

  • Pdf.save(...stream_data_mode=...) has been dropped in favor of the newer compress_streams= and stream_decode_level parameters.

Fixes

  • A use-after-free memory error that caused occasional segfaults and “QPDFFakeName” errors when opening from stream objects has been resolved.

v0.3.4

Updates

  • pybind11 vendoring has ended now that v2.2.4 has been released

v0.3.3

Breaking

  • libqpdf 8.2.1 is now required

Updates

  • Improved support for working with JPEG2000 images in PDFs

  • Added progress callback for saving files, Pdf.save(..., progress=)

  • Updated pybind11 subtree

Fixes

  • del obj.AttributeName was not implemented. The attribute interface is now consistent

  • Deleting named attributes now defers to the attribute dictionary for Stream objects, as get/set do

  • Fixed handling of JPEG2000 images where metadata must be retrieved from the file

v0.3.2

Updates

  • Added support for direct image extraction of CMYK and grayscale JPEGs, where previously only RGB (internally YUV) was supported

  • Array() now creates an empty array properly

  • The syntax Name.Foo in Dictionary(), e.g. Name.XObject in page.Resources, now works

v0.3.1

Breaking

  • pikepdf.open now validates its keyword arguments properly, potentially breaking code that passed invalid arguments

  • libqpdf 8.1.0 is now required; its API is now used for creating Unicode strings

  • If a non-existent file is opened with pikepdf.open, a FileNotFoundError is raised instead of a generic error

  • We are now temporarily vendoring a copy of pybind11 since its master branch contains unreleased and important fixes for Python 3.7.

Updates

  • The syntax Name.Thing (e.g. Name.DecodeParms) is now supported as equivalent to Name('/Thing') and is the recommended way to refer to names within a PDF

  • New API Pdf.remove_unneeded_resources() which removes objects from each page’s resource dictionary that are not used in the page. This can be used to create smaller files.

Fixes

  • Fixed an error parsing inline images that have masks

  • Fixed several instances of catching C++ exceptions by value instead of by reference

v0.3.0

Breaking

  • Modified Object.write method signature to require filter and decode_parms as keyword arguments

  • Implemented automatic type conversion from the PDF Null type to None

  • Removed Object.unparse_resolved in favor of Object.unparse(resolved=True)

  • libqpdf 8.0.2 is now required at minimum

Updates

  • Improved IPython/Jupyter interface to directly export temporary PDFs

  • Updated to qpdf 8.1.0 in wheels

  • Added Python 3.7 support for Windows

  • Added a number of missing options from QPDF to Pdf.open and Pdf.save

  • Added ability to delete a slice of pages

  • Began using Jupyter notebooks for documentation

v0.2.2

  • Added Python 3.7 support to build and test (not yet available for Windows, due to lack of availability on Appveyor)

  • Removed setter API from PdfImage because it never worked anyway

  • Improved handling of PdfImage with trivial palettes

v0.2.1

  • Object.check_owner renamed to Object.is_owned_by

  • Object.objgen and Object.get_object_id are now public functions

  • Major internal reorganization with pikepdf.models becoming the submodule that holds support code to ease access to PDF objects as opposed to wrapping QPDF.

v0.2.0

  • Implemented automatic type conversion for int, bool and Decimal, eliminating the pikepdf.{Integer,Boolean,Real} types. Removed a lot of associated numerical code.

Everything before v0.2.0 can be considered too old to document.