When pycryptodome is not installed, pypdf fails to read some PDFs (those that need AES decryption) and raises this error:
pypdf.errors.DependencyError: PyCryptodome is required for AES algorithm
Because I wasn't familiar with pycryptodome, I wasn't sure what I needed to do to get it working. Eventually I figured out that pycryptodome is a Python library, and that all I had to do to fix the error was run pip3 install pycryptodome.
If possible, it would be nice if pypdf could either 1) install pycryptodome as a dependency when pypdf itself is installed, or 2) give more information in the error message, letting the user know that pycryptodome is a Python library that can be installed via pip.
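To illustrate option 2, here is a minimal sketch of a dependency check whose message tells the user how to install the missing library. This is not pypdf's actual code: require_pycryptodome and the local DependencyError class are hypothetical stand-ins, and the check assumes pycryptodome is importable under its top-level package name Crypto.

```python
import importlib.util


class DependencyError(Exception):
    """Hypothetical stand-in for pypdf.errors.DependencyError."""


def require_pycryptodome() -> None:
    """Raise an actionable error if pycryptodome cannot be imported.

    Hypothetical helper: pycryptodome is imported as the top-level
    package "Crypto", so we look for that module without importing it.
    """
    if importlib.util.find_spec("Crypto") is None:
        raise DependencyError(
            "PyCryptodome is required for AES algorithm. "
            "PyCryptodome is a Python library; install it with: "
            "pip install pycryptodome"
        )
```

Calling require_pycryptodome() right before AES decryption would surface the install command directly in the traceback, which is the information this report found missing.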
Environment
Which environment were you using when you encountered the problem?
$ python3 -m platform
macOS-13.1-arm64-arm-64bit
$ python3 -c "import pypdf;print(pypdf.__version__)"
3.1.0
Code + PDF
This is a minimal, complete example that shows the issue:
- Install pypdf (pip3 install pypdf).
- Make sure pycryptodome is not installed (pip3 uninstall pycryptodome).
- Run the following Python script:
from pypdf import PdfReader
from urllib.request import urlopen
from io import BytesIO
# Get the PDF and convert it into a byte stream
pdf_url = 'https://web.archive.org/web/30000101000000if_/http://www.latterdaytruth.org/pdf/100846.pdf'
pdf_file = urlopen(pdf_url).read()
pdf_bytes_stream = BytesIO(pdf_file)
# Load the file with pypdf
pdf_reader = PdfReader(pdf_bytes_stream)
# Print the number of pages
pages_count = len(pdf_reader.pages)
print('Number of pages: {0}'.format(pages_count))
This is the PDF I'm attempting to read: https://web.archive.org/web/30000101000000if_/http://www.latterdaytruth.org/pdf/100846.pdf
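The "pycryptodome is not installed" precondition of the steps above can also be confirmed from Python before running the script. A minimal sketch using the standard library's importlib.metadata (Python 3.8+); installed_version is a hypothetical helper name, and it looks up the installed distribution named pycryptodome rather than the importable Crypto package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version("pycryptodome")
    if v is None:
        print("pycryptodome is not installed; the reproduction should fail")
    else:
        print(f"pycryptodome {v} is installed; run: pip3 uninstall pycryptodome")
```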
Traceback
Traceback (most recent call last):
File "/Users/sbradshaw/Desktop/test-pypdf-pages.py", line 14, in <module>
pages_count = len(pdf_reader.pages)
File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/_page.py", line 2063, in __len__
return self.length_function()
File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/_reader.py", line 445, in _get_num_pages
return self.trailer[TK.ROOT]["/Pages"]["/Count"] # type: ignore
File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/generic/_data_structures.py", line 266, in __getitem__
return dict.__getitem__(self, key).get_object()
File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/generic/_base.py", line 259, in get_object
obj = self.pdf.get_object(self)
File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/_reader.py", line 1205, in get_object
retval = self._get_object_from_stream(indirect_reference) # type: ignore
File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/_reader.py", line 1136, in _get_object_from_stream
obj_stm: EncodedStreamObject = IndirectObject(stmnum, 0, self).get_object() # type: ignore
File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/generic/_base.py", line 259, in get_object
obj = self.pdf.get_object(self)
File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/_reader.py", line 1269, in get_object
retval = self._encryption.decrypt_object(
File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/_encryption.py", line 761, in decrypt_object
return cf.decrypt_object(obj)
File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/_encryption.py", line 185, in decrypt_object
obj._data = self.stmCrypt.decrypt(obj._data)
File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/_encryption.py", line 147, in decrypt
raise DependencyError("PyCryptodome is required for AES algorithm")
pypdf.errors.DependencyError: PyCryptodome is required for AES algorithm