Support models¶
Support models are abstractions over “raw” objects within a Pdf. For example, a page
in a PDF is a Dictionary with /Type set to /Page. The Dictionary in
that case is the “raw” object. Upon establishing what type of object it is, we
can wrap it with a support model that adds features to ensure consistency with
the PDF specification.
In version 2.x, pikepdf did not apply support models to “raw” objects automatically.
Version 3.x automatically applies support models to /Page objects.
- class pikepdf.ObjectHelper¶
Base class for wrapper/helper around an Object.
Used to expose additional functionality specific to that object type.
- property obj¶
Get the underlying pikepdf.Object.
- class pikepdf.Page¶
Support model wrapper around a page dictionary object.
- add_content_token_filter(self: pikepdf.Page, tf: pikepdf.TokenFilter) None ¶
Attach a pikepdf.TokenFilter to a page’s content stream.
This function applies token filters lazily, if/when the page’s content stream is read for any reason, such as when the PDF is saved. If the content stream is never accessed, the token filter is not applied.
Multiple token filters may be added to a page/content stream.
Token filters may not be removed after being attached to a Pdf. Close and reopen the Pdf to remove token filters.
If the page’s contents is an array of streams, it is coalesced.
- add_overlay(other, rect=None, *, push_stack=True, shrink=True, expand=True)¶
Overlay another object on this page.
Overlays will be drawn after all previous content, potentially drawing on top of existing content.
- Parameters
other (pikepdf.objects.Object | pikepdf._core.Page) – A Page or Form XObject to render as an overlay on top of this page.
rect (pikepdf._core.Rectangle | None) – The PDF rectangle (in PDF units) in which to draw the overlay. If omitted, this page’s trimbox, cropbox or mediabox (in that order) will be used.
push_stack (bool) – If True (default), push the graphics stack of the existing content stream to ensure that the overlay is rendered correctly. Officially PDF limits the graphics stack depth to 32. Most viewers will tolerate more, but excessive pushes may cause problems. Multiple content streams may also be coalesced into a single content stream where this parameter is True, since the PDF specification permits PDF writers to coalesce streams as they see fit.
shrink (bool) – If True (default), allow the object to shrink to fit inside the rectangle. The aspect ratio will be preserved.
expand (bool) – If True (default), allow the object to expand to fit inside the rectangle. The aspect ratio will be preserved.
- Returns
The name of the Form XObject that contains the overlay.
- Return type
New in version 2.14.
Changed in version 4.0.0: Added the push_stack parameter. Previously, this method behaved as if push_stack were False.
Changed in version 4.2.0: Added the shrink and expand parameters. Previously, this method behaved as if shrink=True, expand=False.
Changed in version 4.3.0: Returns the name of the overlay in the resources dictionary instead of returning None.
- add_resource(res, res_type, name=None, *, prefix='', replace_existing=True)¶
Add a new resource to the page’s Resources dictionary.
If the Resources dictionaries do not exist, they will be created.
- Parameters
self – The object to add to the resources dictionary.
res (Object) – The dictionary object to insert into the resources dictionary.
res_type (Name) – Should be one of the following Resource dictionary types: ExtGState, ColorSpace, Pattern, Shading, XObject, Font, Properties.
name (pikepdf.objects.Name | None) – The name of the object. If omitted, a random name will be generated with enough randomness to be globally unique.
prefix (str) – A prefix for the name of the object. Allows convenient namespacing when using random names, e.g. prefix="Im" for images. Mutually exclusive with the name parameter.
replace_existing (bool) – If the name already exists in one of the resource dictionaries, remove it.
- Return type
Example
>>> resource_name = pdf.pages[0].add_resource(formxobj, Name.XObject)
New in version 2.3.
Changed in version 2.14: If res does not belong to the same Pdf that owns this page, a copy of res is automatically created and added instead. In previous versions, it was necessary to handle this case manually.
Changed in version 4.3.0: Returns the name of the overlay in the resources dictionary instead of returning None.
- add_underlay(other, rect=None, *, shrink=True, expand=True)¶
Underlay another object beneath this page.
Underlays will be drawn before all other content, so they may be overdrawn partially or completely.
There is no push_stack parameter for this function, since adding an underlay can be done without manipulating the graphics stack.
- Parameters
other (pikepdf.objects.Object | pikepdf._core.Page) – A Page or Form XObject to render as an underlay underneath this page.
rect (pikepdf._core.Rectangle | None) – The PDF rectangle (in PDF units) in which to draw the underlay. If omitted, this page’s trimbox, cropbox or mediabox (in that order) will be used.
shrink (bool) – If True (default), allow the object to shrink to fit inside the rectangle. The aspect ratio will be preserved.
expand (bool) – If True (default), allow the object to expand to fit inside the rectangle. The aspect ratio will be preserved.
- Returns
The name of the Form XObject that contains the underlay.
- Return type
New in version 2.14.
Changed in version 4.2.0: Added the shrink and expand parameters. Previously, this method behaved as if shrink=True, expand=False. Fixed issue with wrong page rect being selected.
- as_form_xobject(self: pikepdf.Page, handle_transformations: bool = True) pikepdf.Object ¶
Return a form XObject that draws this page.
This is useful for n-up operations, underlay, overlay, thumbnail generation, or any other case in which it is useful to replicate the contents of a page in some other context. The dictionaries are shallow copies of the original page dictionary, and the contents are coalesced from the page’s contents. The resulting object handle is not referenced anywhere.
- Parameters
handle_transformations (bool) – If True, the resulting form XObject’s /Matrix will be set to replicate rotation (/Rotate) and scaling (/UserUnit) in the page’s dictionary. In this way, the page’s transformations will be preserved when placing this object on another page.
- calc_form_xobject_placement(self: pikepdf.Page, formx: pikepdf.Object, name: pikepdf.Object, rect: pikepdf.Rectangle, *, invert_transformations: bool = True, allow_shrink: bool = True, allow_expand: bool = False) bytes ¶
Generate content stream segment to place a Form XObject on this page.
The content stream segment must then be added to the page’s content stream.
The default keyword parameters will preserve the aspect ratio.
- Parameters
formx – The Form XObject to place.
name – The name of the Form XObject in this page’s /Resources dictionary.
rect – Rectangle describing the desired placement of the Form XObject.
invert_transformations – Apply /Rotate and /UserUnit scaling when determining Form XObject placement.
allow_shrink – Allow the Form XObject to take less than the full dimensions of rect.
allow_expand – Expand the Form XObject to occupy all of rect.
New in version 2.14.
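One way to picture how allow_shrink and allow_expand interact is as a clamp on a single uniform scale factor. The sketch below is plain Python and is not the code QPDF runs; the function name fit_scale and its width/height arguments are hypothetical and exist only for illustration.

```python
def fit_scale(obj_w, obj_h, rect_w, rect_h, *, allow_shrink=True, allow_expand=False):
    """Return one uniform scale factor for placing an object in a rectangle.

    Hypothetical helper for illustration only. Using the same factor on both
    axes is what preserves the aspect ratio.
    """
    if obj_w <= 0 or obj_h <= 0:
        raise ValueError("object dimensions must be positive")
    # Largest uniform scale at which the object still fits in the rectangle.
    scale = min(rect_w / obj_w, rect_h / obj_h)
    if scale < 1 and not allow_shrink:
        return 1.0  # object is too large, but shrinking is not allowed
    if scale > 1 and not allow_expand:
        return 1.0  # object is smaller than rect, but expanding is not allowed
    return scale


# A 200x100 object placed in a 100x100 rectangle has to shrink by half.
print(fit_scale(200, 100, 100, 100))
# A 50x50 object keeps its size unless expansion is allowed.
print(fit_scale(50, 50, 100, 100))
print(fit_scale(50, 50, 100, 100, allow_expand=True))
```

With the defaults shown in the signature above (allow_shrink=True, allow_expand=False), an object is only ever scaled down, never up.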
- contents_add(contents, *, prepend=False)¶
Append or prepend to an existing page’s content stream.
- Parameters
contents (pikepdf.objects.Stream | bytes) – An existing content stream to append or prepend.
prepend (bool) – Prepend if true, append if false (default).
New in version 2.14.
- contents_coalesce(self: pikepdf.Page) None ¶
Coalesce a page’s content streams.
A page’s content may be a stream or an array of streams. If this page’s content is an array, concatenate the streams into a single stream. This can be useful when working with files that split content streams in arbitrary spots, such as in the middle of a token, as that can confuse some software.
- property cropbox¶
Return page’s effective /CropBox, in PDF units.
If the /CropBox is not defined, the /MediaBox is returned.
- externalize_inline_images(self: pikepdf.Page, min_size: int = 0, shallow: bool = False) None ¶
Convert inline images to normal (external) images.
- property form_xobjects: _ObjectMapping¶
Return all Form XObjects associated with this page.
This method does not recurse into nested Form XObjects.
New in version 7.0.0.
- get_filtered_contents(self: pikepdf.Page, tf: pikepdf.TokenFilter) bytes ¶
Apply a pikepdf.TokenFilter to a content stream, without modifying it.
This may be used when the results of a token filter do not need to be applied, such as when filtering is being used to retrieve information rather than edit the content stream.
Note that it is possible to create a subclassed TokenFilter that saves information of interest to its object attributes; it is not necessary to return data in the content stream.
To modify the content stream, use pikepdf.Page.add_content_token_filter().
- Returns
The modified content stream.
- property images: _ObjectMapping¶
Return all regular images associated with this page.
This method does not search for Form XObjects that contain images, and does not attempt to find inline images.
- property index¶
Returns the zero-based index of this page in the pages list.
That is, returns n such that pdf.pages[n] == this_page. A ValueError exception is thrown if the page is not attached to this Pdf.
New in version 2.2.
- property label¶
Returns the page label for this page, accounting for section numbers.
For example, if the PDF defines a preface with lower case Roman numerals (i, ii, iii…), followed by standard numbers, followed by an appendix (A-1, A-2, …), this function returns the appropriate label as a string.
It is possible for a PDF to define page labels such that multiple pages have the same labels. Labels are not guaranteed to be unique.
New in version 2.2.
Changed in version 2.9: Returns the ordinary page number if no special rules for page numbers are defined.
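The preface/body/appendix example above can be sketched in plain Python. This is only an illustration of the idea of section-based numbering, not pikepdf’s implementation; the `sections` structure, `page_label` and `to_roman` are hypothetical names invented for this sketch.

```python
def to_roman(n):
    """Return n (1..3999) as a lower-case Roman numeral."""
    numerals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
                (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
                (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def page_label(index, sections):
    """Label the page at zero-based `index`.

    `sections` is a hypothetical list of (first_index, style, prefix) tuples
    with unique first_index values; style is "roman" or "decimal".
    """
    # Pick the last section that starts at or before this page.
    first, style, prefix = max(s for s in sections if s[0] <= index)
    number = index - first + 1  # numbering restarts in each section
    text = to_roman(number) if style == "roman" else str(number)
    return prefix + text


# Three preface pages, seven body pages, then an appendix.
sections = [(0, "roman", ""), (3, "decimal", ""), (10, "decimal", "A-")]
print([page_label(i, sections) for i in (0, 2, 3, 9, 10, 11)])
```

Because numbering restarts in each section, two sections with the same style and prefix would produce repeated labels, which is why labels are not guaranteed to be unique.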
- property mediabox¶
Return page’s /MediaBox, in PDF units.
- property obj¶
Get the underlying pikepdf.Object.
- parse_contents(self: pikepdf.Page, arg0: pikepdf.StreamParser) None ¶
Parse a page’s content streams using a pikepdf.StreamParser.
The content stream may be interpreted by the StreamParser but is not altered.
If the page’s contents is an array of streams, it is coalesced.
- remove_unreferenced_resources(self: pikepdf.Page) None ¶
Removes from the resources dictionary any object not referenced in the content stream.
A page’s resources dictionary maps names to objects elsewhere in the file. This method walks through a page’s contents and keeps track of which resources are referenced somewhere in the contents. Then it removes from the resources dictionary any object that is not referenced in the contents. This method is used by page splitting code to avoid copying unused objects in files that use shared resource dictionaries across multiple pages.
- property resources: Dictionary¶
Return this page’s resources dictionary.
Changed in version 7.0.0: If the resources dictionary does not exist, an empty one will be created. A TypeError is raised if a page has a /Resources key but it is not a dictionary.
- rotate(self: pikepdf.Page, angle: int, relative: bool) None ¶
Rotate a page.
If relative is False, set the rotation of the page to angle. Otherwise, add angle to the rotation of the page. angle must be a multiple of 90. Adding 90 to the rotation rotates clockwise by 90 degrees.
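The difference between relative and absolute rotation can be shown with a small plain-Python sketch. The helper new_rotation is hypothetical, and normalising the result into the range 0 to 359 is an assumption made for this illustration rather than something stated above.

```python
def new_rotation(current, angle, relative):
    """Return the page rotation that results from rotate(angle, relative).

    Illustrative only: wrapping the result with % 360 is an assumption of
    this sketch.
    """
    if angle % 90 != 0:
        raise ValueError("angle must be a multiple of 90")
    result = current + angle if relative else angle
    return result % 360


print(new_rotation(90, 90, relative=True))    # added to the existing rotation
print(new_rotation(90, 180, relative=False))  # replaces the existing rotation
print(new_rotation(0, -90, relative=True))    # one quarter turn counter-clockwise
```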
- property trimbox¶
Return page’s effective /TrimBox, in PDF units.
If the /TrimBox is not defined, the /CropBox is returned (and if /CropBox is not defined, /MediaBox is returned).
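The fallback order for the effective boxes (trimbox, then cropbox, then mediabox) can be sketched as follows. A plain dict stands in for the page dictionary, and effective_box is a hypothetical helper, not part of pikepdf.

```python
def effective_box(page_dict, key):
    """Resolve a page box using the fallback chain described above.

    `page_dict` is a plain dict standing in for a page dictionary; this
    hypothetical helper is not how pikepdf is implemented.
    """
    fallbacks = {
        "/TrimBox": ["/TrimBox", "/CropBox", "/MediaBox"],
        "/CropBox": ["/CropBox", "/MediaBox"],
        "/MediaBox": ["/MediaBox"],
    }
    for candidate in fallbacks[key]:
        if candidate in page_dict:
            return page_dict[candidate]
    raise KeyError(f"page has no {key} and no fallback box")


letter = {"/MediaBox": [0, 0, 612, 792]}
cropped = {"/MediaBox": [0, 0, 612, 792], "/CropBox": [36, 36, 576, 756]}

print(effective_box(letter, "/TrimBox"))   # falls through to the media box
print(effective_box(cropped, "/TrimBox"))  # falls back to the crop box
```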
- class pikepdf.PdfMatrix(*args)¶
Support class for PDF content stream matrices.
PDF content stream matrices are 3x3 matrices summarized by a shorthand (a, b, c, d, e, f), where the first column vector is (a, c, e) and the second column vector is (b, d, f). The final column vector is always (0, 0, 1) since PDF uses homogeneous coordinates.
a is the horizontal scaling factor.
b is horizontal skewing.
c is vertical skewing.
d is the vertical scaling factor.
e is the horizontal translation.
f is the vertical translation.
For scaling, a and d are the scaling factors in the horizontal and vertical directions, respectively; for pure scaling, b and c are zero.
PDF uses row vectors. That is, vr @ A' gives the effect of transforming a row vector vr=(x, y, 1) by the matrix A'. Most textbook treatments use A @ vc where the column vector vc=(x, y, 1)'.
Matrices should be premultiplied with other matrices to concatenate transformations.
(@ is the Python matrix multiplication operator.)
Addition and other operations are not implemented because they’re not that meaningful in a PDF context (they can be defined and are mathematically meaningful in general).
PdfMatrix objects are immutable. All transformations on them produce a new matrix.
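The row-vector convention is easier to see with concrete numbers. The following sketch uses plain Python lists instead of PdfMatrix; the helper names to_3x3, matmul and transform are hypothetical and only illustrate the arithmetic.

```python
def to_3x3(shorthand):
    """Expand the shorthand (a, b, c, d, e, f) into a 3x3 matrix (list of rows)."""
    a, b, c, d, e, f = shorthand
    return [[a, b, 0], [c, d, 0], [e, f, 1]]


def matmul(m, n):
    """Multiply two 3x3 matrices given as lists of rows."""
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transform(point, shorthand):
    """Apply a matrix to the row vector (x, y, 1) and return the new (x, y)."""
    x, y = point
    a, b, c, d, e, f = shorthand
    return (a * x + c * y + e, b * x + d * y + f)


scale = (2, 0, 0, 2, 0, 0)       # double both axes
translate = (1, 0, 0, 1, 10, 5)  # shift right by 10 and up by 5

# With row vectors, the transformation applied first is the left-hand factor.
combined = matmul(to_3x3(scale), to_3x3(translate))
print(combined)
print(transform(transform((1, 1), scale), translate))
```

Swapping the factors gives a different matrix (the translation is scaled too), which is why the order of concatenation matters.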
- a¶
- b¶
- c¶
- d¶
- e¶
- f¶
Return one of the six “active values” of the affine matrix. e and f correspond to x- and y-axis translation respectively. The other four letters are a 2×2 matrix that can express rotation, scaling and skewing; a=1 b=0 c=0 d=1 is the identity matrix.
- property a¶
Return the horizontal scaling factor.
- property b¶
Return horizontal skew.
- property c¶
Return vertical skew.
- property d¶
Return the vertical scaling factor.
- property e¶
Return the horizontal translation.
Typically corresponds to translation on the x-axis.
- encode()¶
Encode this matrix in binary suitable for including in a PDF.
- property f¶
Return the vertical translation.
Typically corresponds to translation on the y-axis.
- static identity()¶
Return an identity matrix.
- inverse()¶
Return the inverse of this matrix.
The inverse matrix reverses the transformation of the original matrix.
This function requires numpy, which is an optional dependency of pikepdf. If numpy is not installed, an ImportError will be raised.
- rotated(angle_degrees_ccw)¶
Concatenate a rotation matrix to this matrix.
- scaled(x, y)¶
Concatenate a scaling matrix to this matrix.
- property shorthand¶
Return the 6-tuple (a,b,c,d,e,f) that describes this matrix.
- translated(x, y)¶
Translate this matrix.
- class pikepdf.PdfImage(obj)¶
Support class to provide a consistent API for manipulating PDF images.
The data structure for images inside PDFs is irregular and complex, making it difficult to use without introducing errors for less typical cases. This class addresses these difficulties by providing a regular, Pythonic API similar in spirit (and convertible to) the Python Pillow imaging library.
- Parameters
obj (Stream) –
- as_pil_image()¶
Extract the image as a Pillow Image, using decompression as necessary.
Caller must close the image.
- Return type
Image
- property decode_parms¶
List of the /DecodeParms, arguments to filters.
- extract_to(*, stream=None, fileprefix='')¶
Extract the image directly to a usable image file.
If possible, the compressed data is extracted and inserted into a compressed image file format without transcoding the compressed content. If this is not possible, the data will be decompressed and extracted to an appropriate format.
Because it is not known until attempted what image format will be extracted, users should not assume what format they are getting back. When saving the image to a file, use a temporary filename, and then rename the file to its final name based on the returned file extension.
Images might be saved as any of .png, .jpg, or .tiff.
Examples
>>> im.extract_to(stream=bytes_io)
'.png'
>>> im.extract_to(fileprefix='/tmp/image00')
'/tmp/image00.jpg'
- Parameters
stream (BinaryIO | None) – Writable stream to write data to.
fileprefix (str or Path) – The path to write the extracted image to, without the file extension.
- Returns
If fileprefix was provided, then the fileprefix with the appropriate extension. If no fileprefix, then an extension indicating the file type.
- Return type
- property filter_decodeparms¶
Return the normalized Filter and DecodeParms data.
PDF has a lot of possible data structures concerning /Filter and /DecodeParms. /Filter can be absent or a name or an array, /DecodeParms can be absent or a dictionary (if /Filter is a name) or an array (if /Filter is an array). When both are arrays the lengths match.
Normalize this into: [(/FilterName, {/DecodeParmName: Value, …}), …]
The order of /Filter matters as it indicates the encoding/decoding sequence.
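A minimal sketch of this normalization, using plain Python values in place of PDF objects (a str for a name, a dict for a dictionary, a list for an array, None for an absent key). The function normalize_filters is hypothetical and is not pikepdf’s implementation.

```python
def normalize_filters(filter_, decode_parms):
    """Normalize /Filter and /DecodeParms into a list of (name, parms) pairs.

    Plain Python values stand in for PDF objects; this is an illustration,
    not the library's code.
    """
    if filter_ is None:
        return []
    if isinstance(filter_, str):
        # A single filter name may carry a single parameter dictionary.
        return [(filter_, decode_parms or {})]
    # An array of filters pairs up with an array of parameter dictionaries.
    if decode_parms is None:
        decode_parms = [None] * len(filter_)
    if len(filter_) != len(decode_parms):
        raise ValueError("/Filter and /DecodeParms arrays differ in length")
    return [(name, parms or {}) for name, parms in zip(filter_, decode_parms)]


print(normalize_filters(None, None))
print(normalize_filters("/FlateDecode", {"/Predictor": 12}))
print(normalize_filters(["/ASCII85Decode", "/DCTDecode"], [None, {}]))
```

The list preserves the order of /Filter, so callers can walk it front to back when decoding.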
- property filters¶
List of names of the filters that we applied to encode this image.
- get_stream_buffer(decode_level=<StreamDecodeLevel.specialized: 2>)¶
Access this image with the buffer protocol.
- Parameters
decode_level (StreamDecodeLevel) –
- Return type
Buffer
- property icc: PIL.ImageCms.ImageCmsProfile | None¶
If an ICC profile is attached, return a Pillow object that describes it.
Most of the information may be found in icc.profile.
- property mode: str¶
PIL.Image.mode equivalent for this image, where possible.
If an ICC profile is attached to the image, we still attempt to resolve a Pillow mode.
- property palette: pikepdf.models.image.PaletteData | None¶
Retrieve the color palette for this image if applicable.
- read_bytes(decode_level=<StreamDecodeLevel.specialized: 2>)¶
Decompress this image and return it as unencoded bytes.
- Parameters
decode_level (StreamDecodeLevel) –
- Return type
- show()¶
Show the image however PIL wants to.
- class pikepdf.PdfInlineImage(*, image_data, image_object)¶
Support class for PDF inline images.
- class pikepdf.models.PdfMetadata(pdf, pikepdf_mark=True, sync_docinfo=True, overwrite_invalid_xml=True)¶
Read and edit the metadata associated with a PDF.
The PDF specification contains two types of metadata, the newer XMP (Extensible Metadata Platform, XML-based) and older DocumentInformation dictionary. The PDF 2.0 specification removes the DocumentInformation dictionary.
This primarily works with XMP metadata, but includes methods to generate XMP from DocumentInformation and will also coordinate updates to DocumentInformation so that the two are kept consistent.
XMP metadata fields may be accessed using the full XML namespace URI or the short name. For example metadata['dc:description'] and metadata['{http://purl.org/dc/elements/1.1/}description'] both refer to the same field. Several common XML namespaces are registered automatically.
See the XMP specification for details of allowable fields.
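The equivalence of the short and full forms can be sketched as a lookup from prefix to namespace URI. The NAMESPACES table and qualified_name function below are hypothetical; only the Dublin Core URI is taken from the text above.

```python
# Hypothetical registry for illustration; only the Dublin Core URI below
# appears in the text above.
NAMESPACES = {"dc": "http://purl.org/dc/elements/1.1/"}


def qualified_name(key):
    """Convert 'prefix:name' to '{uri}name'; pass full names through unchanged."""
    if key.startswith("{"):
        return key
    prefix, _, local = key.partition(":")
    return "{" + NAMESPACES[prefix] + "}" + local


short = qualified_name("dc:description")
full = qualified_name("{http://purl.org/dc/elements/1.1/}description")
print(short == full)
```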
To update metadata, use a with block.
Example
>>> with pdf.open_metadata() as records:
...     records['dc:title'] = 'New Title'
See also
- load_from_docinfo(docinfo, delete_missing=False, raise_failure=False)¶
Populate the XMP metadata object with DocumentInfo.
- Parameters
- Return type
None
A few entries in the deprecated DocumentInfo dictionary are considered approximately equivalent to certain XMP records. This method copies those entries into the XMP metadata.
- property pdfa_status: str¶
Return the PDF/A conformance level claimed by this PDF, or an empty string.
A PDF may claim to be PDF/A compliant without this being true. Use an independent verifier such as veraPDF to test if a PDF is truly conformant.
- Returns
The conformance level of the PDF/A, or an empty string if the PDF does not claim PDF/A conformance. Possible valid values are: 1A, 1B, 2A, 2B, 2U, 3A, 3B, 3U.
- property pdfx_status: str¶
Return the PDF/X conformance level claimed by this PDF, or an empty string.
A PDF may claim to be PDF/X compliant without this being true. Use an independent verifier such as veraPDF to test if a PDF is truly conformant.
- Returns
The conformance level of the PDF/X, or an empty string if the PDF does not claim PDF/X conformance.
- classmethod register_xml_namespace(uri, prefix)¶
Register a new XML/XMP namespace.
- Parameters
uri – The long form of the namespace.
prefix – The alias to use when interpreting XMP.
- class pikepdf.models.Encryption(owner='', user='', R=6, allow=Permissions(accessibility=True, extract=True, modify_annotation=True, modify_assembly=False, modify_form=True, modify_other=True, print_lowres=True, print_highres=True), aes=True, metadata=True)¶
Specify the encryption settings to apply when a PDF is saved.
- Parameters
- R: Literal[2, 3, 4, 5, 6]¶
Select the security handler algorithm to use. Choose from: 2, 3, 4 or 6. By default, the highest version is selected (6). 5 is a deprecated algorithm that should not be used.
- aes: bool¶
If True, request the AES algorithm. If False, use RC4. If omitted, AES is selected whenever possible (R >= 4).
- allow: Permissions¶
The permissions to set. If omitted, all permissions are granted to the user.
- metadata: bool¶
If True, also encrypt the PDF metadata. If False, metadata is not encrypted. Reading document metadata without decryption may be desirable in some cases. Requires aes=True. If omitted, metadata is encrypted whenever possible.
- class pikepdf.models.Outline(pdf, max_depth=15, strict=False)¶
Maintains an intuitive interface for creating and editing PDF document outlines.
See PDF 1.7 Reference Manual section 12.3.
- Parameters
pdf (Pdf) – PDF document object.
max_depth (int) – Maximum recursion depth to consider when reading the outline.
strict (bool) – If set to False (default) silently ignores structural errors. Setting it to True raises a pikepdf.OutlineStructureError if any object references re-occur while the outline is being read or written.
See also
- add(title, destination)¶
Add an item to the outline.
- Parameters
title (str) – Title of the outline item.
destination (pikepdf.objects.Array | int | None) – Destination to jump to when the item is selected.
- Returns
The newly created OutlineItem.
- Return type
- property root: list[pikepdf.models.outlines.OutlineItem]¶
Return the root node of the outline.
- class pikepdf.models.OutlineItem(title, destination=None, page_location=None, action=None, obj=None, *, left=None, top=None, right=None, bottom=None, zoom=None)¶
Manage a single item in a PDF document outlines structure.
Includes nested items.
- Parameters
title (str) – Title of the outlines item.
destination (Array | String | Name | int | None) – Page number, destination name, or any other PDF object to be used as a reference when clicking on the outlines entry. Note this should be None if an action is used instead. If set to a page number, it will be resolved to a reference at the time of writing the outlines back to the document.
page_location (PageLocation | str | None) – Supplemental page location for a page number in destination, e.g. PageLocation.Fit. May also be a simple string such as 'FitH'.
action (Dictionary | None) – Action to perform when clicking on this item. Will be ignored during writing if destination is also set.
obj (Dictionary | None) – Dictionary object representing this outlines item in a Pdf. May be None for creating a new object. If present, an existing object is modified in-place during writing and original attributes are retained.
left (float | None) – Describes the viewport position associated with a destination.
top (float | None) – Describes the viewport position associated with a destination.
bottom (float | None) – Describes the viewport position associated with a destination.
right (float | None) – Describes the viewport position associated with a destination.
zoom (float | None) – Describes the viewport position associated with a destination.
This object does not contain any information about higher-level or neighboring elements.
- Valid destination arrays:
[page /XYZ left top zoom] generally [page, PageLocationEntry, 0 to 4 ints]
- classmethod from_dictionary_object(obj)¶
Create an OutlineItem from a Dictionary.
Does not process nested items.
- Parameters
obj (Dictionary) – Dictionary object representing a single outline node.
- to_dictionary_object(pdf, create_new=False)¶
Create/update a Dictionary object from this outline node.
Page numbers are resolved to a page reference on the input Pdf object.
- Parameters
- Return type
- class pikepdf.Permissions(accessibility=True, extract=True, modify_annotation=True, modify_assembly=False, modify_form=True, modify_other=True, print_lowres=True, print_highres=True)¶
Stores the user-level permissions for an encrypted PDF.
A compliant PDF reader/writer should enforce these restrictions on people who have the user password and not the owner password. In practice, either password is sufficient to decrypt all document contents. A person who has the owner password should be allowed to modify the document in any way. pikepdf does not enforce the restrictions in any way; it is up to application developers to enforce them as they see fit.
Unencrypted PDFs implicitly have all permissions allowed. Permissions can only be changed when a PDF is saved.
- Parameters
- class pikepdf.models.EncryptionMethod¶
Describes which encryption method was used on a particular part of a PDF. These values are returned by pikepdf.EncryptionInfo but are not currently used to specify how encryption is requested.
- none¶
Data was not encrypted.
- unknown¶
An unknown algorithm was used.
- rc4¶
The RC4 encryption algorithm was used (obsolete).
- aes¶
The AES-based algorithm was used as described in the PDF 1.7 Reference Manual.
- aesv3¶
An improved version of the AES-based algorithm was used as described in the Adobe Supplement to the ISO 32000, requiring PDF 1.7 extension level 3. This algorithm still uses AES, but allows both AES-128 and AES-256, and improves how the key is derived from the password.
- class pikepdf.models.EncryptionInfo(encdict)¶
Reports encryption information for an encrypted PDF.
This information may not be changed, except when a PDF is saved. This object is not used to specify the encryption settings to save a PDF, due to non-overlapping information requirements.
- property bits: int¶
Return the number of bits in the encryption algorithm.
e.g. if the algorithm is AES-256, this returns 256.
- property file_method: EncryptionMethod¶
Encryption method used to encode the whole file.
- property stream_method: EncryptionMethod¶
Encryption method used to encode streams.
- property string_method: EncryptionMethod¶
Encryption method used to encode strings.
- property user_password: bytes¶
If possible, return the user password.
The user password can only be retrieved when a PDF is opened with the owner password and when older versions of the encryption algorithm are used.
The password is always returned as bytes even if it has a clear Unicode representation.
- class pikepdf.Annotation¶
Describes an annotation in a PDF, such as a comment, underline, copy editing marks, interactive widgets, redactions, 3D objects, sound and video clips.
See the PDF 1.7 Reference Manual section 12.5.6 for the full list of annotation types and definition of terminology.
New in version 2.12.
- property appearance_dict¶
Returns the annotation’s appearance dictionary.
- property appearance_state¶
Returns the annotation’s appearance state (or None).
For a checkbox or radio button, the appearance state may be pikepdf.Name.On or pikepdf.Name.Off.
- property flags¶
Returns the annotation’s flags.
- get_appearance_stream(*args, **kwargs)¶
Overloaded function.
get_appearance_stream(self: pikepdf.Annotation, which: pikepdf.Object) -> pikepdf.Object
Returns one of the appearance streams associated with an annotation.
- Args:
- which: Usually one of pikepdf.Name.N, pikepdf.Name.R or pikepdf.Name.D, indicating the normal, rollover or down appearance stream, respectively. If any other name is passed, an appearance stream with that name is returned.
get_appearance_stream(self: pikepdf.Annotation, which: pikepdf.Object, state: pikepdf.Object) -> pikepdf.Object
Returns one of the appearance streams associated with an annotation.
- Args:
- which: Usually one of pikepdf.Name.N, pikepdf.Name.R or pikepdf.Name.D, indicating the normal, rollover or down appearance stream, respectively. If any other name is passed, an appearance stream with that name is returned.
- state: The appearance state. For checkboxes or radio buttons, the appearance state is usually whether the button is on or off.
- get_page_content_for_appearance(self: pikepdf.Annotation, name: pikepdf.Object, rotate: int, required_flags: int = 0, forbidden_flags: int = 3) bytes ¶
Generate content stream text that draws this annotation as a Form XObject.
- Parameters
name (pikepdf.Name) – What to call the object we create.
rotate – Should be set to the page’s /Rotate value or 0.
Note
This method is done mainly with QPDF. Its behavior may change when different QPDF versions are used.
- property subtype¶
Returns the subtype of this annotation.
- class pikepdf._core.Attachments¶
This interface provides access to any files that are attached to this PDF, exposed as a Python collections.abc.MutableMapping interface.
The keys (virtual filenames) are always str, and values are always pikepdf.AttachedFileSpec.
Use this interface through pikepdf.Pdf.attachments.
New in version 3.0.
- clear() None. Remove all items from D. ¶
- get(k[, d]) D[k] if k in D, else d. d defaults to None. ¶
- items() a set-like object providing a view on D's items ¶
- keys() a set-like object providing a view on D's keys ¶
- pop(k[, d]) v, remove specified key and return the corresponding value. ¶
If key is not found, d is returned if given, otherwise KeyError is raised.
- popitem() (k, v), remove and return some (key, value) pair ¶
as a 2-tuple; but raise KeyError if D is empty.
- setdefault(k[, d]) D.get(k,d), also set D[k]=d if k not in D ¶
- update([E, ]**F) None. Update D from mapping/iterable E and F. ¶
If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v
- values() an object providing a view on D's values ¶
- class pikepdf.AttachedFileSpec¶
In a PDF, a file specification provides name and metadata for a target file.
Most file specifications are simple file specifications, and contain only one attached file. Call get_file() to get the attached file:
    pdf = Pdf.open(...)
    fs = pdf.attachments['example.txt']
    stream = fs.get_file()
To attach a new file to a PDF, you may construct an AttachedFileSpec.
    pdf = Pdf.open(...)
    fs = AttachedFileSpec.from_filepath(pdf, Path('somewhere/spreadsheet.xlsx'))
    pdf.attachments['spreadsheet.xlsx'] = fs
PDF supports the concept of having multiple, platform-specialized versions of the attached file (similar to resource forks on some operating systems). In theory, this attachment ought to be the same file, but encoded in different ways. For example, perhaps a PDF includes a text file encoded with Windows line endings (\r\n) and a different one with POSIX line endings (\n). Similarly, PDF allows for the possibility that you need to encode platform-specific filenames. pikepdf cannot directly create these, because they are arguably obsolete; it can provide access to them, however.
If you have to deal with platform-specialized versions, use get_all_filenames() to enumerate those available.
Described in the PDF 1.7 Reference Manual section 7.11.3.
New in version 3.0.
- __init__(self: pikepdf.AttachedFileSpec, q: pikepdf.Pdf, data: bytes, *, description: str = '', filename: str = '', mime_type: str = '', creation_date: str = '', mod_date: str = '') None ¶
Construct an attached file spec from data in memory.
To construct a file spec from a file on the computer’s file system, use from_filepath().
- Parameters
data – Resource to load.
description – Any description text for the attachment. May be shown in PDF viewers.
filename – Filename to display in PDF viewers.
mime_type – Helps PDF viewers decide how to display the information.
creation_date – PDF date string for when this file was created.
mod_date – PDF date string for when this file was last modified.
- property description¶
Description text associated with the embedded file.
- property filename¶
The main filename for this file spec.
In priority order, getting this returns the first of /UF, /F, /Unix, /DOS, /Mac if multiple filenames are set. Setting this will set a UTF-8 encoded Unicode filename and write it to /UF.
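The priority order can be sketched as a first-match lookup. A plain dict stands in for the file specification’s filename entries; main_filename is a hypothetical helper, and returning an empty string when no key is present is an assumption of this sketch.

```python
PRIORITY = ("/UF", "/F", "/Unix", "/DOS", "/Mac")


def main_filename(filenames):
    """Return the first filename found when checking keys in priority order.

    `filenames` is a plain dict standing in for a file specification's
    filename entries; the helper itself is hypothetical.
    """
    for key in PRIORITY:
        if key in filenames:
            return filenames[key]
    return ""  # assumption: nothing set means an empty filename


print(main_filename({"/F": "report.txt", "/DOS": "REPORT.TXT"}))
print(main_filename({"/Mac": "Report", "/UF": "r\u00e9sum\u00e9.txt", "/F": "resume.txt"}))
```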
- from_filepath(pdf, path, *, description='')¶
Construct a file specification from a file path.
This function will automatically add a creation and modified date using the file system, and a MIME type inferred from the file’s extension.
If the data required for the attachment is in memory, use pikepdf.AttachedFileSpec() instead.
- Parameters
pdf (Pdf) – The Pdf to attach this file specification to.
path (pathlib.Path | str) – A file path for the file to attach to this Pdf.
description (str) – An optional description. May be shown to the user in PDF viewers.
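As a rough illustration of the kind of metadata that can be derived from the file system, the sketch below gathers a MIME type guessed from the extension and a modification time using only the standard library. It is not pikepdf's implementation. The helper name describe_file and the returned dictionary layout are hypothetical, and the creation time is left out because its availability depends on the platform.

```python
import mimetypes
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path: Path) -> dict:
    """Collect attachment-style metadata for a file (hypothetical helper)."""
    stat = path.stat()
    # guess_type() looks only at the extension, not at the file's contents.
    mime_type, _encoding = mimetypes.guess_type(path.name)
    return {
        "filename": path.name,
        "mime_type": mime_type or "",
        "mod_date": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
        "size": stat.st_size,
    }


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.txt"
    sample.write_bytes(b"hello\n")
    info = describe_file(sample)
```

Because the MIME type comes from the extension alone, a misnamed file will be described incorrectly; when that matters, supplying mime_type explicitly through the in-memory constructor may be the safer route.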
- get_all_filenames(self: pikepdf.AttachedFileSpec) dict ¶
Return a Python dictionary that describes all filenames.
The returned dictionary is not a pikepdf Object.
Multiple filenames are generally a holdover from the pre-Unicode era. Modern PDFs can generally set UTF-8 filenames and avoid using punctuation or other marks that are forbidden in filenames.
- get_file(*args, **kwargs)¶
Overloaded function.
get_file(self: pikepdf.AttachedFileSpec) -> pikepdf._core.AttachedFile
Return the primary (usually only) attached file.
get_file(self: pikepdf.AttachedFileSpec, arg0: pikepdf.Object) -> pikepdf._core.AttachedFile
Return an attached file selected by pikepdf.Name.
Typical names would be /UF and /F. See PDF 1.7 Reference Manual for other obsolete names.
- property obj¶
Get the underlying pikepdf.Object.
- class pikepdf._core.AttachedFile¶
An object that contains an actual attached file. These objects do not need to be created manually; they are normally part of an AttachedFileSpec.
New in version 3.0.
- property md5¶
Get the MD5 checksum of the attached file according to the PDF creator.
- property mime_type¶
Get the MIME type of the attached file according to the PDF creator.
- property obj¶
Get the underlying pikepdf.Object.
- property size¶
Get length of the attached file in bytes according to the PDF creator.
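Since md5 and size report what the PDF creator recorded, a caller may want to compare them with the bytes actually extracted. The sketch below shows one way to do that with hashlib. It assumes the checksum is available as raw digest bytes; the helper name matches_declared and the sample payload are hypothetical.

```python
import hashlib


def matches_declared(data: bytes, declared_md5: bytes, declared_size: int) -> bool:
    """Check extracted bytes against the declared size and MD5 digest."""
    if len(data) != declared_size:
        return False
    return hashlib.md5(data).digest() == declared_md5


# Hypothetical attachment payload and the values a PDF creator would record.
payload = b"col1,col2\n1,2\n"
declared_md5 = hashlib.md5(payload).digest()
declared_size = len(payload)

ok = matches_declared(payload, declared_md5, declared_size)
tampered = matches_declared(payload + b"x", declared_md5, declared_size)
```

MD5 is suitable here only as an integrity hint against accidental corruption, not as protection against deliberate tampering.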
- class pikepdf.NameTree¶
An object for managing name tree data structures in PDFs.
A name tree is a key-value data structure. The keys are any binary strings (that is, Python bytes). If a str is provided as a key, the UTF-8 encoding of that string is used. Name trees are (confusingly) not indexed by pikepdf.Name objects. They behave like Dict[bytes, pikepdf.Object].
The keys are sorted; pikepdf will ensure that the order is preserved.
The value may be any PDF object. Typically it will be a dictionary or array.
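The key behaviour described above (bytes keys, str keys converted through UTF-8, sorted order) can be modelled with a toy mapping. This is only an illustration in plain Python, not pikepdf code. The class name SortedBytesMap is hypothetical, and byte-wise ordering is an assumption of the sketch.

```python
from typing import Any, Dict, Iterator, Union

Key = Union[bytes, str]


class SortedBytesMap:
    """Toy stand-in for the key behaviour described above (not pikepdf code)."""

    def __init__(self) -> None:
        self._items: Dict[bytes, Any] = {}

    @staticmethod
    def _normalize(key: Key) -> bytes:
        # A str key is treated as its UTF-8 encoding; bytes pass through.
        return key.encode("utf-8") if isinstance(key, str) else bytes(key)

    def __setitem__(self, key: Key, value: Any) -> None:
        self._items[self._normalize(key)] = value

    def __getitem__(self, key: Key) -> Any:
        return self._items[self._normalize(key)]

    def __contains__(self, key: Key) -> bool:
        return self._normalize(key) in self._items

    def __iter__(self) -> Iterator[bytes]:
        # Keys always come back in sorted (byte-wise) order.
        return iter(sorted(self._items))


tree = SortedBytesMap()
tree["zeta"] = 1
tree[b"alpha"] = 2
tree["é"] = 3
keys = list(tree)
```

In this model, "zeta" and b"zeta" address the same entry, which mirrors how a str key and its UTF-8 encoding refer to the same name.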
Internally in the PDF, a name tree can be a fairly complex tree data structure implemented with many dictionaries and arrays. pikepdf (using libqpdf) will automatically read, repair and maintain this tree for you. There should not be any reason to access the internal nodes of a name tree; use this interface instead.
NameTrees are used to store certain objects like file attachments in a PDF. Where a more specific interface exists, use that instead, and it will manipulate the name tree in a semantically correct manner for you.
Do not modify the internal structure of a name tree while you have a NameTree referencing it. Access it only through the NameTree object.
Name trees are described in the PDF 1.7 Reference Manual section 7.9.6. See section 7.7.4 for a list of PDF objects that are stored in name trees.
New in version 3.0.
- clear() None. Remove all items from D. ¶
- get(k[, d]) D[k] if k in D, else d. d defaults to None. ¶
- static new(pdf: pikepdf.Pdf, *, auto_repair: bool = True) pikepdf.NameTree ¶
Create a new NameTree in the provided Pdf.
You will probably need to insert the name tree in the PDF’s catalog. For example, to insert this name tree in /Root /Names /Dests:
    nt = NameTree.new(pdf)
    pdf.Root.Names.Dests = nt.obj
- property obj¶
Returns the underlying root object for this name tree.
- pop(k[, d]) v, remove specified key and return the corresponding value. ¶
If key is not found, d is returned if given, otherwise KeyError is raised.
- popitem() (k, v), remove and return some (key, value) pair ¶
as a 2-tuple; but raise KeyError if D is empty.
- setdefault(k[, d]) D.get(k,d), also set D[k]=d if k not in D ¶
- update([E, ]**F) None. Update D from mapping/iterable E and F. ¶
If E present and has a .keys() method, does: for k in E: D[k] = E[k]
If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v
In either case, this is followed by: for k, v in F.items(): D[k] = v
- class pikepdf.NumberTree¶
An object for managing number tree data structures in PDFs.
A number tree is a key-value data structure, like name trees, except that the key is an integer. It behaves like Dict[int, pikepdf.Object].
The keys can be sparse: not all integer positions will be populated. Keys are also always sorted; pikepdf will ensure that the order is preserved.
The value may be any PDF object. Typically it will be a dictionary or array.
Internally in the PDF, a number tree can be a fairly complex tree data structure implemented with many dictionaries and arrays. pikepdf (using libqpdf) will automatically read, repair and maintain this tree for you. There should not be any reason to access the internal nodes of a number tree; use this interface instead.
NumberTrees are not used much in PDF. The main thing they provide is a mapping between 0-based page numbers and user-facing page numbers (which pikepdf also exposes as Page.label). The /PageLabels number tree is where the page numbering rules are defined.
Number trees are described in the PDF 1.7 Reference Manual section 7.9.7. See section 12.4.2 for a description of the page labels number tree. Here is an example of modifying an existing page labels number tree:
    pagelabels = NumberTree(pdf.Root.PageLabels)
    # Label pages starting at 0 with lowercase Roman numerals
    pagelabels[0] = Dictionary(S=Name.r)
    # Label pages starting at 6 with decimal numbers
    pagelabels[6] = Dictionary(S=Name.D)
    # Page labels will now be:
    # i, ii, iii, iv, v, vi, 1, 2, 3, ...
Do not modify the internal structure of a number tree while you have a NumberTree referencing it. Access it only through the NumberTree object.
New in version 5.4.
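To make the role of sparse keys concrete, here is a minimal pure-Python sketch of how page labels could be derived from a sparse mapping of starting page index to numbering style. It is not how pikepdf or a PDF viewer computes labels. The function names are hypothetical, the style codes "r" and "D" stand in for Name.r and Name.D, and the sketch assumes every range restarts at 1 and ignores label prefixes.

```python
from bisect import bisect_right
from typing import Dict, List

ROMAN = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
    (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
    (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]


def to_roman(n: int) -> str:
    """Convert a positive integer to lowercase Roman numerals."""
    parts = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def page_labels(rules: Dict[int, str], page_count: int) -> List[str]:
    """Expand sparse {start_index: style} rules into one label per page."""
    starts = sorted(rules)
    labels = []
    for index in range(page_count):
        # Find the last rule whose starting index is <= this page index.
        pos = bisect_right(starts, index) - 1
        if pos < 0:
            labels.append("")  # no rule covers this page (sketch's choice)
            continue
        start = starts[pos]
        number = index - start + 1  # each range restarts at 1 (assumption)
        labels.append(to_roman(number) if rules[start] == "r" else str(number))
    return labels


labels = page_labels({0: "r", 6: "D"}, 9)
```

Only two keys are stored, yet every page receives a label: each key applies to all following pages until the next key takes over, which is why the tree can stay sparse.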
- clear() None. Remove all items from D. ¶
- get(k[, d]) D[k] if k in D, else d. d defaults to None. ¶
- static new(pdf: pikepdf.Pdf, *, auto_repair: bool = True) pikepdf.NumberTree ¶
Create a new NumberTree in the provided Pdf.
You will probably need to insert the number tree in the PDF’s catalog. For example, to insert this number tree in /Root /PageLabels:
    nt = NumberTree.new(pdf)
    pdf.Root.PageLabels = nt.obj
- pop(k[, d]) v, remove specified key and return the corresponding value. ¶
If key is not found, d is returned if given, otherwise KeyError is raised.
- popitem() (k, v), remove and return some (key, value) pair ¶
as a 2-tuple; but raise KeyError if D is empty.
- setdefault(k[, d]) D.get(k,d), also set D[k]=d if k not in D ¶
- update([E, ]**F) None. Update D from mapping/iterable E and F. ¶
If E present and has a .keys() method, does: for k in E: D[k] = E[k]
If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v
In either case, this is followed by: for k, v in F.items(): D[k] = v