Main objects

class pikepdf.Pdf

In-memory representation of a PDF


Root

The /Root object of the PDF.

add_blank_page(*, page_size=(612, 792))

Add a blank page to this PDF. If pages already exist, the page will be added to the end. Pages may be reordered using Pdf.pages.

The caller may add content to the page by modifying its objects after creating it.

Parameters:page_size (tuple) – The size of the page in PDF units (1/72 inch or 0.35mm). Default size is set to a US Letter 8.5” x 11” page.
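
As a rough sketch of the unit conversion described above, the helpers below convert millimetres to PDF units and pass the result to add_blank_page(). The names mm_to_pdf_units and add_a4_page are illustrative and not part of pikepdf; pdf is assumed to be an open pikepdf.Pdf.

```python
def mm_to_pdf_units(mm):
    """Convert millimetres to PDF units (1 unit = 1/72 inch, 1 inch = 25.4 mm)."""
    return mm * 72 / 25.4


def add_a4_page(pdf):
    """Append a blank A4 (210 mm x 297 mm) page to an open pikepdf.Pdf."""
    size = (mm_to_pdf_units(210), mm_to_pdf_units(297))
    pdf.add_blank_page(page_size=size)
    return size
```

Keeping the conversion in its own function makes it easy to reuse for other paper sizes.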

Report permissions associated with this PDF.

By default these permissions will be replicated when the PDF is saved. Permissions may also only be changed when a PDF is being saved, and are only available for encrypted PDFs. If a PDF is not encrypted, all operations are reported as allowed.

pikepdf has no way of enforcing permissions.


Check if PDF is well-formed. Similar to qpdf --check.

Returns:list of strings describing errors or warnings in the PDF
check_linearization(self: pikepdf.Pdf, stream: object = sys.stderr) → None

Reports information on the PDF’s linearization

Parameters:stream – A stream to write this information to; must implement .write() and .flush() methods. Defaults to sys.stderr.

close()

Close a Pdf object and release resources acquired by pikepdf.

If pikepdf opened the file handle it will close it (e.g. when opened with a file path). If the caller opened the file for pikepdf, the caller must close the file.

pikepdf lazily loads data from PDFs, so some pikepdf.Object may implicitly depend on the pikepdf.Pdf being open. This is always the case for pikepdf.Stream but can be true for any object. Do not close the Pdf object if you might still be accessing content from it.

When an Object is copied from one Pdf to another, the Object is copied into the destination Pdf immediately, so after accessing all desired information from the source Pdf it may be closed.


Closing the Pdf is currently implemented by resetting it to an empty sentinel. It is currently possible to edit the sentinel as if it were a live object. This behavior should not be relied on and is subject to change.

copy_foreign(self: pikepdf.Pdf, h: pikepdf.Object) → pikepdf.Object

Copy object from foreign PDF to this one.


Access the (deprecated) document information dictionary.

The document information dictionary is a brief metadata record that can store some information about the origin of a PDF. It is deprecated and removed in the PDF 2.0 specification. Use the .open_metadata() API instead, which will edit the modern (and unfortunately, more complicated) XMP metadata object and synchronize changes to the document information dictionary.

This property simplifies access to the actual document information dictionary and ensures that it is created correctly if it needs to be created. A new dictionary will be created if this property is accessed and the dictionary does not exist. To delete the dictionary use del pdf.trailer.Info.


Report encryption information for this PDF.

Encryption settings may only be changed when a PDF is saved.

Returns: pikepdf.models.EncryptionInfo


The source filename of an existing PDF, when available.

get_object(*args, **kwargs)

Overloaded function.

  1. get_object(self: pikepdf.Pdf, objgen: Tuple[int, int]) -> pikepdf.Object

    Look up an object by ID and generation number

    Return type: pikepdf.Object


  2. get_object(self: pikepdf.Pdf, objid: int, gen: int) -> pikepdf.Object

    Look up an object by ID and generation number

    Return type: pikepdf.Object


get_warnings(self: pikepdf.Pdf) → list

is_linearized

Returns True if the PDF is linearized.

Specifically returns True iff the file starts with a linearization parameter dictionary. Does no additional validation.

make_indirect(*args, **kwargs)

Overloaded function.

  1. make_indirect(self: pikepdf.Pdf, h: pikepdf.Object) -> pikepdf.Object

    Attach an object to the Pdf as an indirect object

    Direct objects appear inline in the binary encoding of the PDF. Indirect objects appear inline as references (in English, “look up object 4 generation 0”) and are then read from another location in the file. The PDF specification requires that certain objects are indirect; consult the PDF specification to confirm.

    Generally a resource that is shared should be attached as an indirect object. pikepdf.Stream objects are always indirect, and creating one automatically attaches it to the Pdf.


    Return type: pikepdf.Object


  2. make_indirect(self: pikepdf.Pdf, obj: object) -> pikepdf.Object

    Encode a Python object and attach to this Pdf as an indirect object

    Return type: pikepdf.Object



Create a new pikepdf.Stream object that is attached to this PDF.

Parameters:data (bytes) – Binary data for the stream object
static new() → pikepdf.Pdf

Create a new empty PDF from scratch.


Return an iterable list of all objects in the PDF.

After deleting content from a PDF such as pages, objects related to that page, such as images on the page, may still be present.

static open(filename_or_stream: object, password: str = '', hex_password: bool = False, ignore_xref_streams: bool = False, suppress_warnings: bool = True, attempt_recovery: bool = True, inherit_page_attributes: bool = True) → pikepdf.Pdf

Open an existing file at filename_or_stream.

If filename_or_stream is path-like, the file will be opened for reading. The file should not be modified by another process while it is open in pikepdf, or undefined behavior may occur. This is because the file may be lazily loaded. Despite this restriction, pikepdf does not try to use any OS services to obtain an exclusive lock on the file. Some applications may want to attempt this or copy the file to a temporary location before editing.

Any changes to the file must be persisted by using .save().

If filename_or_stream has .read() and .seek() methods, the file will be accessed as a readable binary stream. pikepdf will read the entire stream into a private buffer.

.open() may be used in a with-block; .close() will be called when the block exits.


>>> with Pdf.open("test.pdf") as pdf:
        ...
>>> pdf = Pdf.open("test.pdf", password="rosebud")
  • filename_or_stream (os.PathLike) – Filename of PDF to open
  • password (str or bytes) – User or owner password to open an encrypted PDF. If the type of this parameter is str it will be encoded as UTF-8. If the type is bytes it will be saved verbatim. Passwords are always padded or truncated to 32 bytes internally. Use ASCII passwords for maximum compatibility.
  • hex_password (bool) – If True, interpret the password as a hex-encoded version of the exact encryption key to use, without performing the normal key computation. Useful in forensics.
  • ignore_xref_streams (bool) – If True, ignore cross-reference streams. See qpdf documentation.
  • suppress_warnings (bool) – If True (default), warnings are not printed to stderr. Use pikepdf.Pdf.get_warnings() to retrieve warnings.
  • attempt_recovery (bool) – If True (default), attempt to recover from PDF parsing errors.
  • inherit_page_attributes (bool) – If True (default), push attributes set on a group of pages to individual pages
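
The following is a minimal sketch of the open, modify, save cycle described above, written to a different file because the input may be lazily loaded. The helper names edited_path and add_page_and_save, and the "_edited" suffix, are assumptions made for illustration only.

```python
from pathlib import Path


def edited_path(src):
    """Return a sibling path such as 'report_edited.pdf' for 'report.pdf'."""
    src = Path(src)
    return src.with_name(src.stem + "_edited" + src.suffix)


def add_page_and_save(src):
    """Open src, append a blank page, and save the result next to it."""
    import pikepdf  # imported here so edited_path() works without pikepdf

    dst = edited_path(src)
    with pikepdf.open(src) as pdf:  # .close() is called when the block exits
        pdf.add_blank_page()
        pdf.save(str(dst))  # changes exist only in memory until saved
    return dst
```

Saving inside the with-block matters here, since the Pdf is closed as soon as the block exits.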
open_metadata(set_pikepdf_as_editor=True, update_docinfo=True, strict=False)

Open the PDF’s XMP metadata for editing.

There is no .close() function on the metadata object, since this is intended to be used inside a with block only.

For historical reasons, certain parts of PDF metadata are stored in two different locations and formats. This feature coordinates edits so that both types of metadata are updated consistently and “atomically” (assuming single threaded access). It operates on the Pdf in memory, not any file on disk. To persist metadata changes, you must still use .save().


>>> with pdf.open_metadata() as meta:
        meta['dc:title'] = 'Set the Dublin Core Title'
        meta['dc:description'] = 'Put the Abstract here'
  • set_pikepdf_as_editor (bool) – Update the metadata to show that this version of pikepdf is the most recent software to modify the metadata. Recommended, except for testing.
  • update_docinfo (bool) – Update the standard fields of DocumentInfo (the old PDF metadata dictionary) to match the corresponding XMP fields. The mapping is described in PdfMetadata.DOCINFO_MAPPING. Nonstandard DocumentInfo fields and XMP metadata fields with no DocumentInfo equivalent are ignored.
  • strict (bool) – If False (the default), we aggressively attempt to recover from any parse errors in XMP, and if that fails we overwrite the XMP with an empty XMP record. If True, raise errors when the metadata bytes are not valid, well-formed XMP (and thus XML). Some trivial cases that are equivalent to empty or incomplete “XMP skeletons” are never treated as errors, and always replaced with a proper empty XMP block. Certain errors may be logged.


open_outline(max_depth=15, strict=False)

Open the PDF outline (“bookmarks”) for editing.

Recommended for use in a with block. Changes are committed to the PDF when the block exits. (The Pdf must still be open.)


>>> with pdf.open_outline() as outline:
        outline.root.insert(0, OutlineItem('Intro', 0))
  • max_depth (int) – Maximum recursion depth of the outline to be imported and re-written to the document. 0 means only considering the root level, 1 the first-level sub-outline of each root element, and so on. Items beyond this depth will be silently ignored. Default is 15.
  • strict (bool) – With the default behavior (set to False), structural errors (e.g. reference loops) in the PDF document will only cancel processing further nodes on that particular level, recovering the valid parts of the document outline without raising an exception. When set to True, any such error will raise an OutlineStructureError, leaving the invalid parts in place. Similarly, outline objects that have been accidentally duplicated in the Outline container will be silently fixed (i.e. reproduced as new objects) or raise an OutlineStructureError.



pages

Returns the list of pages.

Return type:

The version of the PDF specification used for this file, such as ‘1.7’.

remove_unreferenced_resources(self: pikepdf.Pdf) → None

Remove from /Resources of each page any object not referenced in page’s contents

PDF pages may share resource dictionaries with other pages. If pikepdf is used for page splitting, pages may reference resources in their /Resources dictionary that are not actually required. This purges all unnecessary resource entries.

Suggested before saving.
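
A hedged sketch of the page-splitting case mentioned above: each page is placed in its own Pdf, and unneeded resource entries are purged before the caller saves. The function name split_into_single_pages and the new_pdf parameter are illustrative; new_pdf is assumed to be a zero-argument callable such as pikepdf.Pdf.new.

```python
def split_into_single_pages(pdf, new_pdf):
    """Return one new Pdf per page of pdf.

    new_pdf is a zero-argument callable that creates an empty Pdf,
    for example pikepdf.Pdf.new.
    """
    outputs = []
    for page in pdf.pages:
        out = new_pdf()
        out.pages.append(page)
        # Drop /Resources entries that this single page does not use.
        out.remove_unreferenced_resources()
        outputs.append(out)
    return outputs
```

The source Pdf should stay open until every output has been saved.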


Alias for .Root, the /Root object of the PDF.

save(self: pikepdf.Pdf, filename: object, static_id: bool = False, preserve_pdfa: bool = True, min_version: object = '', force_version: object = '', fix_metadata_version: bool = True, compress_streams: bool = True, stream_decode_level: object = None, object_stream_mode: pikepdf._qpdf.ObjectStreamMode = ObjectStreamMode.preserve, normalize_content: bool = False, linearize: bool = False, qdf: bool = False, progress: object = None, encryption: object = None) → None

Save all modifications to this pikepdf.Pdf.

  • filename (str or stream) – Where to write the output. If a file exists in this location it will be overwritten. The file should not be the same as the input file, because data from the input file may be lazily loaded; as such overwriting in place will null-out objects.
  • static_id (bool) – Indicates that the /ID metadata, normally calculated as a hash of certain PDF contents and metadata including the current time, should instead be generated deterministically. Normally for debugging.
  • preserve_pdfa (bool) – Ensures that the file is generated in a manner compliant with PDF/A and other stricter variants. This should be True, the default, in most cases.
  • min_version (str or tuple) – Sets the minimum version of PDF specification that should be required. If left alone QPDF will decide. If a tuple, the second element is an integer, the extension level. If the version number is not a valid format, QPDF will decide what to do.
  • force_version (str or tuple) – Override the version recommended by QPDF, potentially creating an invalid file that does not display in old versions. See QPDF manual for details. If a tuple, the second element is an integer, the extension level.
  • fix_metadata_version (bool) – If True (default) and the XMP metadata contains the optional PDF version field, ensure the version in metadata is correct. If the XMP metadata does not contain a PDF version field, none will be added. To ensure that the field is added, edit the metadata and insert a placeholder value in pdf:PDFVersion. If XMP metadata does not exist, it will not be created regardless of the value of this argument.
  • object_stream_mode (pikepdf.ObjectStreamMode) – disable prevents the use of object streams. preserve keeps object streams from the input file. generate uses object streams wherever possible, creating the smallest files but requiring PDF 1.5+.
  • compress_streams (bool) – Enables or disables the compression of stream objects in the PDF. Metadata is never compressed. By default this is set to True, and should be except for debugging.
  • stream_decode_level (pikepdf.StreamDecodeLevel) – Specifies how to encode stream objects. See documentation for StreamDecodeLevel.
  • normalize_content (bool) – Enables parsing and reformatting the content stream within PDFs. This may make debugging PDFs easier.
  • linearize (bool) – Enables creating linearized or “fast web view” files, where the file’s contents are organized sequentially so that a viewer can begin rendering before it has the whole file. As a drawback, it tends to make files larger.
  • qdf (bool) – Save output in QDF mode. QDF mode is a special output mode in QPDF to allow editing of PDFs in a text editor. Use the program fix-qdf to convert back to a standard PDF.
  • progress (callable) – Specify a callback function that is called as the PDF is written. The function will be called with an integer between 0-100 as the sole parameter, the progress percentage. This function may not access or modify the PDF while it is being written, or data corruption will almost certainly occur.
  • encryption (pikepdf.models.Encryption or bool) – If False or omitted, existing encryption will be removed. If True encryption settings are copied from the originating PDF. Alternately, an Encryption object may be provided that sets the parameters for new encryption.
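
To illustrate the progress parameter, here is a small sketch of a callback that only records the reported percentages and never touches the Pdf. The function name save_with_progress is hypothetical; pdf is assumed to be an open pikepdf.Pdf, and linearize=True is included only as an example of combining options.

```python
def save_with_progress(pdf, filename):
    """Save pdf to filename and return the progress percentages reported."""
    seen = []

    def on_progress(percent):
        # Only record the value; the callback must not access the Pdf.
        seen.append(percent)

    pdf.save(filename, linearize=True, progress=on_progress)
    return seen
```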

You may call .save() multiple times with different parameters to generate different versions of a file, and you may continue to modify the file after saving it. .save() does not modify the Pdf object in memory, except possibly by updating the XMP metadata version with fix_metadata_version.


Calling pikepdf.Pdf.remove_unreferenced_resources() before saving may eliminate unnecessary resources from the output file, and is therefore recommended. This is not done automatically because .save() is intended to be idempotent.


pikepdf can read PDFs with incremental updates, but always coalesces any incremental updates into a single non-incremental PDF file when saving.

show_xref_table(self: pikepdf.Pdf) → None

Pretty-print the Pdf’s xref (cross-reference table)


trailer

Provides access to the PDF trailer object.

See section 7.5.5 of the PDF reference manual. Generally speaking, the trailer should not be modified with pikepdf, and modifying it may not work. Some of the values in the trailer are automatically changed when a file is saved.

pikepdf.open(*args, **kwargs)

Alias for pikepdf.Pdf.open(). Open a PDF.

pikepdf.new(*args, **kwargs)

Alias for pikepdf.Pdf.new(). Create a new empty PDF.

class pikepdf.ObjectStreamMode

Options for saving object streams within PDFs, which are a more compact way of saving certain types of data and were added in PDF 1.5. All modern PDF viewers support object streams, but some third-party tools and libraries cannot read them.


disable

Disable the use of object streams. If any object streams exist in the file, remove them when the file is saved.

preserve

Preserve any existing object streams in the original file. This is the default behavior.

generate

Generate object streams.

class pikepdf.StreamDecodeLevel

Options for decoding streams within PDFs.


none

Do not attempt to apply any filters. Streams remain as they appear in the original file. Note that uncompressed streams may still be compressed on output. You can disable that by saving with .save(..., compress_streams=False).

generalized

This is the default. libqpdf will apply LZWDecode, ASCII85Decode, ASCIIHexDecode, and FlateDecode filters on the input. When saved with compress_streams=True, the default, the effect of this is that streams filtered with these older and less efficient filters will be recompressed with the Flate filter. As a special case, if a stream is already compressed with FlateDecode and compress_streams=True, the original compressed data will be preserved.

specialized

In addition to uncompressing the generalized compression formats, supported non-lossy compression will also be decoded. At present, this includes the RunLengthDecode filter.

all

In addition to generalized and non-lossy specialized filters, supported lossy compression filters will be applied. At present, this includes DCTDecode (JPEG) compression. Note that compressing the resulting data with DCTDecode again will accumulate loss, so avoid multiple compression and decompression cycles. This is mostly useful for (low-level) retrieving image data; see pikepdf.PdfImage for the preferred method.

class pikepdf.Encryption(*, owner, user, R=6, allow=Permissions(accessibility=True, extract=True, modify_annotation=True, modify_assembly=False, modify_form=True, modify_other=True, print_highres=True, print_lowres=True), aes=True, metadata=True)

Specify the encryption settings to apply when a PDF is saved.

  • owner (str) – The owner password to use. This allows full control of the file. If blank, the PDF will be encrypted and present as “(SECURED)” in PDF viewers. If the owner password is blank, the user password should be as well.
  • user (str) – The user password to use. With this password, some restrictions will be imposed by a typical PDF reader. If blank, the PDF can be opened by anyone, but only modified as allowed by the permissions in allow.
  • R (int) – Select the security handler algorithm to use. Choose from: 2, 3, 4 or 6. By default, the highest version is selected (6). 5 is a deprecated algorithm that should not be used.
  • allow (pikepdf.Permissions) – The permissions to set. If omitted, all permissions are granted to the user.
  • aes (bool) – If True, request the AES algorithm. If False, use RC4. If omitted, AES is selected whenever possible (R >= 4).
  • metadata (bool) – If True, also encrypt the PDF metadata. If False, metadata is not encrypted. Reading document metadata without decryption may be desirable in some cases. Requires aes=True. If omitted, metadata is encrypted whenever possible.
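
A minimal sketch of applying these settings when saving, assuming pikepdf.Encryption and pikepdf.Permissions accept the keyword arguments listed above. The helper names encryption_options and save_encrypted are illustrative, and the choice to forbid extraction is only an example.

```python
def encryption_options(owner_password, user_password):
    """Keyword arguments for pikepdf.Encryption; the values are illustrative."""
    return {
        "owner": owner_password,
        "user": user_password,
        "R": 6,            # highest security handler listed above
        "aes": True,
        "metadata": True,  # encrypting metadata requires aes=True
    }


def save_encrypted(pdf, filename, owner_password, user_password):
    """Save pdf with new encryption that forbids content extraction."""
    import pikepdf  # imported lazily so encryption_options() has no dependency

    options = encryption_options(owner_password, user_password)
    options["allow"] = pikepdf.Permissions(extract=False)
    pdf.save(filename, encryption=pikepdf.Encryption(**options))
```

As noted earlier, pikepdf cannot enforce permissions; it only records them for PDF readers to honor.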
exception pikepdf.PdfError
exception pikepdf.PasswordError

Object construction

class pikepdf.Object
append(self: pikepdf.Object, arg0: object) → None

Append another object to an array; fails if the object is not an array.

as_dict(self: pikepdf.Object) → pikepdf._qpdf._ObjectMapping
as_list(self: pikepdf.Object) → pikepdf._qpdf._ObjectList

emplace(other)

Copy all items from other without making a new object.

Particularly when working with pages, it may be desirable to remove all of the existing page’s contents and emplace (insert) a new page on top of it, in a way that preserves all links and references to the original page. (Or similarly, for other Dictionary objects in a PDF.)

When a page is assigned (pdf.pages[0] = new_page), only the application knows if references to the original page are still valid. For example, a PDF optimizer might restructure a page object into another visually similar one, and references would be valid; but for a program that reorganizes page contents such as an N-up compositor, references may not be valid anymore.

This method takes precautions to ensure that child objects in common with self and other are not inadvertently deleted.


>>> pdf.pages[0].objgen
(16, 0)
>>> pdf.pages[0].emplace(pdf.pages[1])
>>> pdf.pages[0].objgen
(16, 0)  # Same object
extend(self: pikepdf.Object, arg0: iterable) → None

Extend a pikepdf.Array with an iterable of other objects.

get(*args, **kwargs)

Overloaded function.

  1. get(self: pikepdf.Object, key: str, default: object = None) -> object

For pikepdf.Dictionary or pikepdf.Stream objects, behave as dict.get(key, default=None)

  2. get(self: pikepdf.Object, key: pikepdf.Object, default: object = None) -> object

For pikepdf.Dictionary or pikepdf.Stream objects, behave as dict.get(key, default=None)

get_raw_stream_buffer(self: pikepdf.Object) → pikepdf._qpdf.Buffer

Return a buffer protocol buffer describing the raw, encoded stream

get_stream_buffer(self: pikepdf.Object, decode_level: pikepdf._qpdf.StreamDecodeLevel = StreamDecodeLevel.generalized) → pikepdf._qpdf.Buffer

Return a buffer protocol buffer describing the decoded stream.

is_owned_by(self: pikepdf.Object, possible_owner: pikepdf.Pdf) → bool

Test if this object is owned by the indicated possible_owner.


Returns True if the object is a rectangle (an array of 4 numbers)

items(self: pikepdf.Object) → iterable
keys(self: pikepdf.Object) → Set[str]

For pikepdf.Dictionary or pikepdf.Stream objects, obtain the keys.


objgen

Return the object-generation number pair for this object.

If this is a direct object, then the returned value is (0, 0). By definition, if this is an indirect object, it has an “objgen”, and can be looked up using this in the cross-reference (xref) table. Direct objects cannot necessarily be looked up.

The generation number is usually 0, except for PDFs that have been incrementally updated. Incrementally updated PDFs are now uncommon, since it does not take too long for modern CPUs to reconstruct an entire PDF. pikepdf will consolidate all incremental updates when saving.
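
The relationship between objgen and Pdf.get_object() can be sketched as follows. The function name look_up_by_objgen is hypothetical, and returning None for direct objects is a choice made for this example rather than pikepdf behavior.

```python
def look_up_by_objgen(pdf, obj):
    """Return the object that obj.objgen refers to, or None for a direct object."""
    objgen = tuple(obj.objgen)
    if objgen == (0, 0):
        # Direct objects report (0, 0) and have no xref entry to look up.
        return None
    return pdf.get_object(objgen)
```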

page_contents_add(self: pikepdf.Object, contents: pikepdf.Object, prepend: bool = False) → None

Append or prepend to an existing page’s content stream.

page_contents_coalesce(self: pikepdf.Object) → None

Coalesce an array of page content streams into a single content stream.

The PDF specification allows the /Contents object to contain either an array of content streams or a single content stream. However, it simplifies parsing and editing if there is only a single content stream. This function merges all content streams.

static parse(stream: str, description: str = '') → pikepdf.Object

Parse PDF binary representation into PDF objects.

read_bytes(self: pikepdf.Object, decode_level: pikepdf._qpdf.StreamDecodeLevel = StreamDecodeLevel.generalized) → bytes

Decode and read the content stream associated with this object.

read_raw_bytes(self: pikepdf.Object) → bytes

Read the content stream associated with this object without decoding

same_owner_as(self: pikepdf.Object, arg0: pikepdf.Object) → bool

Test if two objects are owned by the same pikepdf.Pdf.


Access the dictionary key-values for a pikepdf.Stream.

to_json(self: pikepdf.Object, dereference: bool = False) → bytes

Convert to a QPDF JSON representation of the object.

See the QPDF manual for a description of its JSON representation.

Not necessarily compatible with other PDF-JSON representations that exist in the wild.

  • Names are encoded as UTF-8 strings
  • Indirect references are encoded as strings containing obj gen R
  • Strings are encoded as UTF-8 strings with unrepresentable binary characters encoded as \uHHHH
  • Encoding streams just encodes the stream’s dictionary; the stream data is not represented
  • Object types that are only valid in content streams (inline image, operator) as well as “reserved” objects are not representable and will be serialized as null.
Parameters:dereference (bool) – If True, dereference the object if this is an indirect object.
Returns:JSON bytestring of object. The object is UTF-8 encoded and may be decoded to a Python str that represents the binary values \x00-\xFF as U+0000 to U+00FF; that is, it may contain mojibake.
Return type:bytes
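
Since to_json() returns UTF-8 encoded bytes, one way to inspect the result is to decode it and parse it with the standard json module. This is a small sketch; the helper name object_to_python is illustrative and not part of pikepdf.

```python
import json


def object_to_python(obj, dereference=False):
    """Decode the UTF-8 JSON bytes from obj.to_json() into Python data."""
    raw = obj.to_json(dereference)
    return json.loads(raw.decode("utf-8"))
```

Names and indirect references arrive as plain strings, so the caller has to interpret values such as "3 0 R" itself.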
unparse(self: pikepdf.Object, resolved: bool = False) → bytes

Convert PDF objects into their binary representation, optionally resolving indirect objects.

wrap_in_array(self: pikepdf.Object) → pikepdf.Object

Return the object wrapped in an array if not already an array.

write(data, *, filter=None, decode_parms=None, type_check=True)

Replace stream object’s data with new (possibly compressed) data.

filter and decode_parms specify the compression that is present on the input data.

When writing the PDF in .save(), pikepdf may change the compression or apply compression to data that was not compressed, depending on the parameters given to that function. It will never change lossless to lossy encoding.

PNG and TIFF images, even if compressed, cannot be directly inserted into a PDF and displayed as images.

  • data (bytes) – the new data to use for replacement
  • filter (pikepdf.Name or pikepdf.Array) – The filter(s) with which the data is (already) encoded
  • decode_parms (pikepdf.Dictionary or pikepdf.Array) – Parameters for the filters with which the object is encoded
  • type_check (bool) – Check arguments; use False only if you want to intentionally create malformed PDFs.

If only one filter is specified, it may be a name such as Name('/FlateDecode'). If there are multiple filters, then an array of names should be given.

If there is only one filter, decode_parms is a Dictionary of parameters for that filter. If there are multiple filters, then decode_parms is an Array of Dictionary, where each array index corresponds to the filter at the same index.
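
As a sketch of the single-filter case, the data below is compressed with zlib before being written, and filter tells pikepdf that the bytes are already Flate-encoded. The helper names deflate and write_precompressed are assumptions for illustration; stream is assumed to be a pikepdf.Stream.

```python
import zlib


def deflate(raw):
    """Compress raw bytes with zlib, the format expected by /FlateDecode."""
    return zlib.compress(raw)


def write_precompressed(stream, raw):
    """Replace the data of stream with a Flate-compressed copy of raw."""
    import pikepdf  # imported lazily so deflate() can be used on its own

    stream.write(deflate(raw), filter=pikepdf.Name.FlateDecode)
```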

class pikepdf.Name

Constructs a PDF Name object

Names can be constructed with two notations:

  1. Name.Resources
  2. Name('/Resources')

The two are semantically equivalent. The former is preferred for names that are normally expected to be in a PDF. The latter is preferred for dynamic names and attributes.

static __new__(cls, name)

Create and return a new object. See help(type) for accurate signature.

class pikepdf.String

Constructs a PDF String object

static __new__(cls, s)
Parameters:s (str or bytes) – The string to use. String will be encoded for PDF, bytes will be constructed without encoding.
class pikepdf.Array

Constructs a PDF Array object

static __new__(cls, a=None)
Parameters:a (iterable) – An iterable of objects. All objects must be either pikepdf.Object or convertible to pikepdf.Object.
class pikepdf.Dictionary

Constructs a PDF Dictionary object

static __new__(cls, d=None, **kwargs)

Constructs a PDF Dictionary from either a Python dict or keyword arguments.

These two examples are equivalent:

pikepdf.Dictionary({'/NameOne': 1, '/NameTwo': 'Two'})

pikepdf.Dictionary(NameOne=1, NameTwo='Two')

In either case, the keys must be strings, and the strings correspond to the desired Names in the PDF Dictionary. The values must all be convertible to pikepdf.Object.

class pikepdf.Stream

Constructs a PDF Stream object

static __new__(cls, owner, obj)


class pikepdf.Operator

Internal objects

These objects are returned by other pikepdf objects. They are part of the API, but not intended to be created explicitly.

class pikepdf._qpdf.PageList

A list-like object enumerating all pages in a pikepdf.Pdf.

append(self: pikepdf._qpdf.PageList, page: object) → None

Add another page to the end.

extend(*args, **kwargs)

Overloaded function.

  1. extend(self: pikepdf._qpdf.PageList, other: pikepdf._qpdf.PageList) -> None

Extend the Pdf by adding pages from another Pdf.pages.

  2. extend(self: pikepdf._qpdf.PageList, iterable: iterable) -> None

Extend the Pdf by adding pages from an iterable of pages.

insert(self: pikepdf._qpdf.PageList, index: int, obj: object) → None

Insert a page at the specified location.

  • index (int) – location at which to insert page, 0-based indexing
  • obj (pikepdf.Object) – page object to insert
p(self: pikepdf._qpdf.PageList, pnum: int) → pikepdf.Object

Convenience - look up page number in ordinal numbering, .p(1) is first page

remove(self: pikepdf._qpdf.PageList, **kwargs) → None

Remove a page (using 1-based numbering)

Parameters:p (int) – 1-based page number
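
Because remove() uses 1-based numbering, removing several pages in ascending order would shift the numbers of the pages that follow. A hedged sketch that avoids this by working from the highest number down is shown here; the function name remove_pages is illustrative.

```python
def remove_pages(pdf, page_numbers):
    """Remove the given 1-based page numbers from pdf.pages."""
    # Work from the highest number down so that each removal does not
    # shift the numbers of pages still waiting to be removed.
    for number in sorted(set(page_numbers), reverse=True):
        pdf.pages.remove(p=number)
```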
reverse(self: pikepdf._qpdf.PageList) → None

Reverse the order of pages.