Architecture
pikepdf uses nanobind to bind the C++ interface of QPDF. Earlier versions used pybind11, which was selected after evaluating Cython, CFFI and SWIG; pikepdf migrated to nanobind in v10 for better performance and free-threaded Python support.
In addition to the bindings, pikepdf includes support code written in a mix of C++ and Python, mainly to present a clean, Pythonic interface to the C++ library and to implement higher-level functionality.
Internals
Internally the package presents a module named pikepdf from which objects
can be imported. The C++ extension module is currently named pikepdf._core.
Users of pikepdf should not access _core directly, since it is an
internal interface. In previous versions, this module was named _qpdf.
In general, modules or objects behind an underscore are private (although they may be returned in some situations).
Thread safety
pikepdf supports free-threaded (no-GIL) CPython and publishes cp314t wheels.
On a GIL-enabled interpreter, it is safe to read pikepdf objects across Python threads, though the GIL means there may not be much performance gain from doing so. On a free-threaded interpreter there is no GIL serializing access, so even concurrent reads of an object that another thread may be modifying require synchronization.
In either case, if one or more threads will be modifying pikepdf objects, you
must coordinate read and write access with a threading.Lock.
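As a minimal sketch of this coordination, the helpers below guard every access to a shared document with a single threading.Lock. The names pdf_lock, count_pages and remove_last_page are hypothetical, and pdf is assumed to be a pikepdf.Pdf opened elsewhere and shared between threads.

```python
import threading

# One lock shared by every thread that touches the same document.
pdf_lock = threading.Lock()


def count_pages(pdf):
    """Read access: hold the lock so no writer can run concurrently."""
    with pdf_lock:
        return len(pdf.pages)


def remove_last_page(pdf):
    """Write access: hold the same lock while modifying the document."""
    with pdf_lock:
        del pdf.pages[-1]
```

Using one lock for both readers and writers is the simplest arrangement; it serializes all access to the document, which may be acceptable when the work done under the lock is short.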
It is not currently possible to pickle pikepdf objects or marshal them across
process boundaries (as would be required to pass them between processes with
multiprocessing). If this were implemented, it would not be much more
efficient than saving a full PDF and sending it to another process.
Parallelizing work (for example, by dividing work by PDF pages) can still be
achieved by having each worker process open the same file.
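The following is a rough sketch of that approach, not a recommended implementation. Each worker process opens the file itself and handles a contiguous range of pages, returning only plain Python values that can be pickled. The helper names (split_ranges, page_widths, all_page_widths) and the choice of measuring page widths from the media box are illustrative assumptions.

```python
from concurrent.futures import ProcessPoolExecutor


def split_ranges(n_pages, n_workers):
    """Split range(n_pages) into at most n_workers contiguous (start, stop) pairs."""
    size, extra = divmod(n_pages, n_workers)
    ranges = []
    start = 0
    for i in range(n_workers):
        stop = start + size + (1 if i < extra else 0)
        if stop > start:  # skip empty ranges when there are few pages
            ranges.append((start, stop))
        start = stop
    return ranges


def page_widths(path, start, stop):
    """Worker: open the file independently and measure pages start..stop-1."""
    import pikepdf  # imported here so split_ranges is usable without pikepdf

    with pikepdf.open(path) as pdf:
        widths = []
        for i in range(start, stop):
            box = pdf.pages[i].mediabox
            widths.append(float(box[2]) - float(box[0]))
        return widths  # a plain list of floats, which can be pickled


def all_page_widths(path, n_workers=4):
    """Parent: count the pages, then give each worker its own page range."""
    import pikepdf

    with pikepdf.open(path) as pdf:
        n_pages = len(pdf.pages)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(page_widths, path, start, stop)
            for start, stop in split_ranges(n_pages, n_workers)
        ]
        return [width for future in futures for width in future.result()]
```

On platforms that start worker processes with the spawn method, all_page_widths should be called from within an if __name__ == "__main__": guard.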
File handles
Because of technical limitations in the underlying libraries, pikepdf keeps the source PDF file open when content is copied from it to another PDF, even after all Python variables referring to the source have been removed. If a PDF is being assembled from many sources, all of those source files are held open.