Bug 2279080

Summary: Review Request: python-pypdf - Pure-Python PDF library
Product: [Fedora] Fedora
Reporter: Davide Cavalca <davide>
Component: Package Review
Assignee: Felix Schwarz <fschwarz>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rawhide
CC: fschwarz, itsme_410, package-review, panemade
Target Milestone: ---
Keywords: AutomationTriaged
Target Release: ---
Flags: fschwarz: fedora-review+
Hardware: Unspecified
OS: Unspecified
URL: https://pypdf.readthedocs.io
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-06-04 20:17:23 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Davide Cavalca 2024-05-04 17:35:12 UTC
Spec URL: https://dcavalca.fedorapeople.org/review/python-pypdf/python-pypdf.spec
SRPM URL: https://dcavalca.fedorapeople.org/review/python-pypdf/python-pypdf-4.2.0-1.fc41.src.rpm

Description:
pypdf is a free and open-source pure-python PDF library capable of splitting,
merging, cropping, and transforming the pages of PDF files. It can also add
custom data, viewing options, and passwords to PDF files. pypdf can retrieve
text and metadata from PDFs as well.

Fedora Account System Username: dcavalca

Comment 1 Davide Cavalca 2024-05-04 17:35:14 UTC
This package built on koji:  https://koji.fedoraproject.org/koji/taskinfo?taskID=117247728

Comment 2 Fedora Review Service 2024-05-04 17:41:56 UTC
Copr build:
https://copr.fedorainfracloud.org/coprs/build/7409010
(succeeded)

Review template:
https://download.copr.fedorainfracloud.org/results/@fedora-review/fedora-review-2279080-python-pypdf/fedora-rawhide-x86_64/07409010-python-pypdf/fedora-review/review.txt

Please take a look if any issues were found.


---
This comment was created by the fedora-review-service
https://github.com/FrostyX/fedora-review-service

If you want to trigger a new Copr build, add a comment containing new
Spec and SRPM URLs or [fedora-review-service-build] string.

Comment 3 Parag AN(पराग) 2024-05-14 12:44:54 UTC
Isn't this already packaged as https://src.fedoraproject.org/rpms/python-PyPDF2/blob/rawhide/f/python-PyPDF2.spec ?

Comment 4 Davide Cavalca 2024-05-14 15:05:38 UTC
They're both maintained by the same upstream but they're different projects: pypdf is the successor to PyPDF2 (which is deprecated).

Comment 5 Felix Schwarz 2024-05-21 07:12:08 UTC
Hi Davide,

thank you for taking care of pypdf. It is high time to update the package (pypdf2 in Fedora is still at version 1.26).

Did you contact the current maintainer of python-PyPDF2 (aarem) for coordination? IMHO it would be good to deprecate python-PyPDF2 for rawhide once your package goes in. Can both packages be installed at the same time?

I can try to review this in the coming days, feel free to ping me if this falls through the cracks.

Comment 6 Davide Cavalca 2024-05-21 22:34:14 UTC
> Did you contact the current maintainer of python-PyPDF2 (aarem) for coordination?

I did not but would be more than happy to collaborate, adding them in CC here now.

> IMHO it would be good to deprecate python-PyPDF2 for rawhide once your package goes in. Can both packages be installed at the same time?

pypdf is not a drop-in replacement for PyPDF2 (the API is somewhat different in places) and the packages have different names, so they should be coinstallable with no trouble. Replacing PyPDF2 with pypdf would involve updating all the projects depending on it to use pypdf, which could be a significant effort.
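
As a rough illustration of what "updating all the projects depending on it" can involve, a dependent project might probe for whichever library is installed. This is only a minimal sketch and is not taken from either project: `pick_pdf_backend` and `reader_class` are hypothetical helper names, and the class names assume pypdf's `PdfReader` and the `PdfFileReader` of the old PyPDF2 1.x series.

```python
import importlib


def pick_pdf_backend(candidates=("pypdf", "PyPDF2")):
    """Return (name, module) for the first importable candidate, else None."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue  # not installed, try the next candidate
    return None


def reader_class(module):
    """Look up the reader class under its new or old name (assumed names)."""
    for attr in ("PdfReader", "PdfFileReader"):
        cls = getattr(module, attr, None)
        if cls is not None:
            return cls
    raise AttributeError(f"no known reader class in {module!r}")
```

A shim like this only hides the import and class names; method-level differences between the two libraries would still have to be handled case by case.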

Comment 7 Felix Schwarz 2024-05-26 08:21:00 UTC
> pypdf is not a drop-in replacement for PyPDF2 (the API is somewhat different in places)

My assumption was that the latest PyPDF2 is basically the same as pypdf, because the authors of the PyPDF2 fork were able to merge their changes back into the original project. However, I now see that Fedora's PyPDF2 is really ancient, so API breaks are to be expected. In any case, I think it would be good to retire python-PyPDF2 at some point because it is unmaintained upstream. But that's unrelated to this ticket...

Anyway, I'll take a look at your review request now.

Comment 8 Felix Schwarz 2024-05-26 08:44:34 UTC
The package looks mostly OK, except for one issue: the main package does not install the license file. That must be fixed.

Oh, and I learned that fedora-review does not like failed copr builds, so I had to build this again, even though only the F38 build failed in your copr.


Package Review
==============

Legend:
[x] = Pass, [!] = Fail, [-] = Not applicable, [?] = Not evaluated
[ ] = Manual review needed


===== MUST items =====

Generic:
[x]: Package successfully compiles and builds into binary rpms on at least
     one supported primary architecture.
[x]: Package is licensed with an open-source compatible license and meets
     other legal requirements as defined in the legal section of Packaging
     Guidelines.
[x]: License field in the package spec file matches the actual license.
     Note: Checking patched sources after %prep for licenses. No licenses
     found. Please check the source files for licenses manually.
[!]: License file installed when any subpackage combination is installed.
[x]: Package requires other packages for directories it uses.
[x]: Package must own all directories that it creates.
[x]: Package contains no bundled libraries without FPC exception.
[x]: Changelog in prescribed format.
[x]: Sources contain only permissible code or content.
[x]: Macros in Summary, %description expandable at SRPM build time.
     Note: Macros in: python3-pypdf (description)
[-]: Package contains desktop file if it is a GUI application.
[-]: Development files must be in a -devel package
[x]: Package uses nothing in %doc for runtime.
[x]: Package consistently uses macros (instead of hard-coded directory
     names).
[x]: Package is named according to the Package Naming Guidelines.
[x]: Package does not generate any conflict.
[x]: Package obeys FHS, except libexecdir and /usr/target.
[-]: If the package is a rename of another package, proper Obsoletes and
     Provides are present.
[x]: Requires correct, justified where necessary.
[x]: Spec file is legible and written in American English.
[-]: Package contains systemd file(s) if in need.
[x]: Package is not known to require an ExcludeArch tag.
[x]: Large documentation must go in a -doc subpackage. Large could be size
     (~1MB) or number of files.
     Note: Documentation size is 60797 bytes in 3 files.
[x]: Package complies to the Packaging Guidelines
[x]: Package installs properly.
[x]: Rpmlint is run on all rpms the build produces.
     Note: There are rpmlint messages (see attachment).
[x]: If (and only if) the source package includes the text of the
     license(s) in its own file, then that file, containing the text of the
     license(s) for the package is included in %license.
[x]: The License field must be a valid SPDX expression.
[x]: Package does not own files or directories owned by other packages.
[x]: Package uses either %{buildroot} or $RPM_BUILD_ROOT
[x]: Package does not run rm -rf %{buildroot} (or $RPM_BUILD_ROOT) at the
     beginning of %install.
[x]: Dist tag is present.
[x]: Package does not contain duplicates in %files.
[x]: Permissions on files are set properly.
[x]: Package must not depend on deprecated() packages.
[x]: Package use %makeinstall only when make install DESTDIR=... doesn't
     work.
[x]: Package is named using only allowed ASCII characters.
[x]: Package does not use a name that already exists.
[x]: Package is not relocatable.
[x]: Sources used to build the package match the upstream source, as
     provided in the spec URL.
[x]: Spec file name must match the spec package %{name}, in the format
     %{name}.spec.
[x]: File names are valid UTF-8.
[x]: Packages must not store files under /srv, /opt or /usr/local

Python:
[-]: Binary eggs must be removed in %prep
[x]: Python eggs must not download any dependencies during the build
     process.
[x]: A package which is used by another package via an egg interface should
     provide egg info.
[x]: Package meets the Packaging Guidelines::Python
[x]: Package contains BR: python2-devel or python3-devel
[x]: Packages MUST NOT have dependencies (either build-time or runtime) on
     packages named with the unversioned python- prefix unless no properly
     versioned package exists. Dependencies on Python packages instead MUST
     use names beginning with python2- or python3- as appropriate.
[x]: Python packages must not contain %{pythonX_site(lib|arch)}/* in %files

===== SHOULD items =====

Generic:
[x]: Reviewer should test that the package builds in mock.
[-]: If the source package does not include license text(s) as a separate
     file from upstream, the packager SHOULD query upstream to include it.
[x]: Final provides and requires are sane (see attachments).
[x]: Fully versioned dependency in subpackages if applicable.
     Note: No Requires: %{name}%{?_isa} = %{version}-%{release} in
     python3-pypdf
[x]: Package functions as described.
[x]: Latest version is packaged.
[x]: Package does not include license text files separate from upstream.
[-]: Sources are verified with gpgverify first in %prep if upstream
     publishes signatures.
     Note: gpgverify is not used.
[x]: Package should compile and build into binary rpms on all supported
     architectures.
[x]: %check is present and all tests pass.
[x]: Packages should try to preserve timestamps of original installed
     files.
[x]: Buildroot is not present
[x]: Package has no %clean section with rm -rf %{buildroot} (or
     $RPM_BUILD_ROOT)
[x]: No file requires outside of /etc, /bin, /sbin, /usr/bin, /usr/sbin.
[x]: Packager, Vendor, PreReq, Copyright tags should not be in spec file
[x]: Sources can be downloaded from URI in Source: tag
[x]: SourceX is a working URL.
[x]: Spec use %global instead of %define unless justified.

===== EXTRA items =====

Generic:
[x]: Rpmlint is run on all installed packages.
     Note: No rpmlint messages.


Rpmlint
-------
Checking: python3-pypdf-4.2.0-1.fc41.noarch.rpm
          python-pypdf-doc-4.2.0-1.fc41.noarch.rpm
          python-pypdf-4.2.0-1.fc41.src.rpm
============================ rpmlint session starts ============================
rpmlint: 2.5.0
configuration:
    /usr/lib/python3.12/site-packages/rpmlint/configdefaults.toml
    /etc/xdg/rpmlint/fedora-legacy-licenses.toml
    /etc/xdg/rpmlint/fedora-spdx-licenses.toml
    /etc/xdg/rpmlint/fedora.toml
    /etc/xdg/rpmlint/scoring.toml
    /etc/xdg/rpmlint/users-groups.toml
    /etc/xdg/rpmlint/warn-on-functions.toml
rpmlintrc: [PosixPath('/tmp/tmp8xqn41tr')]
checks: 32, packages: 3

python-pypdf-doc.noarch: E: files-duplicated-waste 297108
python-pypdf-doc.noarch: W: files-duplicate /usr/share/doc/python-pypdf-doc/html/_static/releasing.drawio.png /usr/share/doc/python-pypdf-doc/html/_images/releasing.drawio.png
 3 packages and 0 specfiles checked; 1 errors, 1 warnings, 11 filtered, 1 badness; has taken 0.4 s 

Rpmlint (installed packages)
----------------------------
============================ rpmlint session starts ============================
rpmlint: 2.5.0
configuration:
    /usr/lib/python3.12/site-packages/rpmlint/configdefaults.toml
    /etc/xdg/rpmlint/fedora-legacy-licenses.toml
    /etc/xdg/rpmlint/fedora-spdx-licenses.toml
    /etc/xdg/rpmlint/fedora.toml
    /etc/xdg/rpmlint/scoring.toml
    /etc/xdg/rpmlint/users-groups.toml
    /etc/xdg/rpmlint/warn-on-functions.toml
checks: 32, packages: 2

 0 packages and 0 specfiles checked; 0 errors, 0 warnings, 0 filtered, 0 badness; has taken 0.0 s 

Source checksums
----------------
https://github.com/py-pdf/pypdf/archive/4.2.0/pypdf-4.2.0.tar.gz :
  CHECKSUM(SHA256) this package     : 4096459bdb19df0231360617f2266d8068a40b9eb202bbea9c54274a320f0c55
  CHECKSUM(SHA256) upstream package : 4096459bdb19df0231360617f2266d8068a40b9eb202bbea9c54274a320f0c55


Requires
--------
python3-pypdf (rpmlib, GLIBC filtered):
    python(abi)

python-pypdf-doc (rpmlib, GLIBC filtered):
    python3-docs



Provides
--------
python3-pypdf:
    python-pypdf
    python3-pypdf
    python3.12-pypdf
    python3.12dist(pypdf)
    python3dist(pypdf)

python-pypdf-doc:
    python-pypdf-doc



Generated by fedora-review 0.10.0 (e79b66b) last change: 2023-07-24
Command line :/bin/fedora-review --no-colors --prebuilt --rpm-spec --name python-pypdf --mock-config /var/lib/copr-rpmbuild/results/configs/child.cfg
Buildroot used: fedora-rawhide-x86_64
Active plugins: Shell-api, Generic, Python
Disabled plugins: Java, fonts, Perl, Ocaml, C/C++, PHP, Haskell, R, SugarActivity
Disabled flags: EXARCH, EPEL6, EPEL7, DISTTAG, BATCH
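
The "Source checksums" section of the report above compares the SHA256 digest of the tarball in the SRPM with that of the upstream tarball. The same comparison can be sketched by hand as follows; `sha256_of` and `same_source` are hypothetical helper names, and both files are assumed to have been downloaded locally already.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_source(packaged_path, upstream_path):
    """True if the packaged tarball is byte-identical to the upstream one."""
    return sha256_of(packaged_path) == sha256_of(upstream_path)
```

Reading in chunks keeps memory use flat for large tarballs; for a small archive like this one, hashing the whole file at once would work just as well.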

Comment 9 Ranjan Maitra 2024-05-28 13:45:33 UTC
(In reply to Davide Cavalca from comment #6)
> > Did you contact the current maintainer of python-PyPDF2 (aarem) for coordination?
> 
> I did not but would be more than happy to collaborate, adding them in CC
> here now.
> 
> > IMHO it would be good to deprecate python-PyPDF2 for rawhide once your package goes in. Can both packages be installed at the same time?
> 
> pypdf is not a drop-in replacement for PyPDF2 (the API is somewhat different
> in places) and the packages have different names, so they should be
> coinstallable with no trouble. Replacing PyPDF2 with pypdf would involve
> updating all the projects depending on it to use pypdf, which could be a
> significant effort.

I am happy to collaborate. I maintain PyPDF2 simply because it is a prerequisite for pdf-stapler. It would be nice if pdf-stapler itself could be updated to use pypdf, but I would appreciate some help with that. (The original upstream maintainer is not interested, and it could be forked under a different name if needed.) pdf-stapler (stapler in PyPI) is a very nice application.

Comment 10 Felix Schwarz 2024-05-28 15:07:08 UTC
Seems like there are open upstream issues regarding pypdf2 compatibility (but no commits for more than 2 years):

https://github.com/hellerbarde/stapler/issues/92
https://github.com/hellerbarde/stapler/issues/97
and even a PR https://github.com/hellerbarde/stapler/pull/95

Once pdf-stapler supports the latest pypdf2, switching to pypdf is basically trivial.

Comment 11 Davide Cavalca 2024-05-28 15:38:11 UTC
> The package looks mostly ok, except for one issue: The main package does not install the license file, that must be fixed.

The license is picked up automatically by the pyproject macros:

$ rpm -qlp /var/lib/mock/fedora-rawhide-aarch64/result/python3-pypdf-4.2.0-1.fc41.noarch.rpm | grep LICENSE
/usr/lib/python3.12/site-packages/pypdf-4.2.0.dist-info/LICENSE

Comment 12 Felix Schwarz 2024-05-29 06:29:31 UTC
The problem is that the LICENSE file is not marked as such in rpm so it only ends up in the dist-info metadata.

https://docs.fedoraproject.org/en-US/packaging-guidelines/LicensingGuidelines/#_license_text
"If the source package includes the text of the license(s) in its own file, then that file, containing the text of the license(s) for the package must be included in the %files list flagged with the %license directive."
...
"What is important is not the visible presence of the %license directive but instead that all relevant license files included in a package appear when using rpm -q --licensefiles."

Here are the contents of the extracted python3-pypdf rpm:

.
├── lib
│   └── python3.12
│       └── site-packages
│           ├── pypdf
│           │   ...
│           └── pypdf-4.2.0.dist-info
│               ├── INSTALLER
│               ├── LICENSE
│               ├── METADATA
│               └── WHEEL
└── share
    └── doc
        └── python3-pypdf
            ├── CHANGELOG.md
            ├── CONTRIBUTORS.md
            └── README.md

Once the file is recognized via '%license', it should show up as "/usr/share/licenses/python3-pypdf/LICENSE". Not sure if this is supposed to happen automatically via the Python macros but currently the LICENSE file is not recognized as such by rpm.
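
The check described here (a file that looks like a license but is missing from the `rpm -q --licensefiles` output) can be sketched roughly as below. This is an illustration under stated assumptions, not part of fedora-review: the helper names and the `LICENSE_PREFIXES` heuristic are hypothetical, and `rpm_query` assumes the `rpm` command is available on the machine.

```python
import subprocess
from pathlib import PurePosixPath

# Heuristic file-name prefixes; an assumption, not an official list.
LICENSE_PREFIXES = ("LICENSE", "LICENCE", "COPYING")


def unflagged_license_files(all_files, license_files):
    """Return files that look like licenses but are not flagged as such."""
    flagged = set(license_files)
    return [
        path
        for path in all_files
        if PurePosixPath(path).name.upper().startswith(LICENSE_PREFIXES)
        and path not in flagged
    ]


def rpm_query(rpm_path, *options):
    """Run 'rpm -qp <options> <rpm_path>' and return the non-empty lines."""
    result = subprocess.run(
        ["rpm", "-qp", *options, rpm_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]
```

With the two helpers combined, `unflagged_license_files(rpm_query(pkg, "-l"), rpm_query(pkg, "--licensefiles"))` would be expected to list the dist-info `LICENSE` path for the package as originally submitted, given the file listing shown in comment 11.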

Comment 13 Felix Schwarz 2024-05-29 06:36:11 UTC
btw: I just sent a message to Fedora's Python mailing list. Maybe it is just a shortcoming in the pyproject macros.

Comment 15 Davide Cavalca 2024-05-29 14:48:56 UTC
Thanks for root causing the LICENSE issue, much appreciated.

Spec URL: https://dcavalca.fedorapeople.org/review/python-pypdf/python-pypdf.spec
SRPM URL: https://dcavalca.fedorapeople.org/review/python-pypdf/python-pypdf-4.2.0-1.fc41.src.rpm

Changelog:
- tag LICENSE file to work around upstream issue

Comment 16 Felix Schwarz 2024-06-03 06:05:50 UTC
Thank you, package APPROVED.

Comment 17 Davide Cavalca 2024-06-03 16:15:30 UTC
I think there might be something wrong with your FAS account. Trying to get a repo for this fails with:

The email address "fschwarz" of the Bugzilla reviewer is not tied to a user in FAS or FAS check failed. Group membership can't be validated.

See https://pagure.io/releng/fedora-scm-requests/issue/62780 and https://pagure.io/releng/fedora-scm-requests/issue/62780

Comment 18 Felix Schwarz 2024-06-04 06:00:39 UTC
Not sure why this failed but I filed https://pagure.io/releng/issue/12148 to get some help.

Comment 19 Felix Schwarz 2024-06-04 07:27:35 UTC
There was a missing setting in my FAS account (not sure why this happened), but I hope everything is fine now. Please try requesting the repo again.

Comment 20 Fedora Admin user for bugzilla script actions 2024-06-04 20:05:51 UTC
The Pagure repository was created at https://src.fedoraproject.org/rpms/python-pypdf

Comment 21 Fedora Update System 2024-06-04 20:14:27 UTC
FEDORA-2024-484612395d (python-pypdf-4.2.0-1.fc41) has been submitted as an update to Fedora 41.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-484612395d

Comment 22 Fedora Update System 2024-06-04 20:17:23 UTC
FEDORA-2024-484612395d (python-pypdf-4.2.0-1.fc41) has been pushed to the Fedora 41 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 23 Fedora Update System 2024-06-04 20:51:48 UTC
FEDORA-2024-26b6017442 (python-pypdf-4.2.0-1.fc40) has been submitted as an update to Fedora 40.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-26b6017442

Comment 24 Fedora Update System 2024-06-04 21:01:44 UTC
FEDORA-2024-333bfe80f8 (python-pypdf-4.2.0-1.fc39) has been submitted as an update to Fedora 39.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-333bfe80f8

Comment 25 Fedora Update System 2024-06-05 02:04:05 UTC
FEDORA-2024-26b6017442 has been pushed to the Fedora 40 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf install --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-26b6017442 \*`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-26b6017442

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 26 Fedora Update System 2024-06-05 02:29:51 UTC
FEDORA-2024-333bfe80f8 has been pushed to the Fedora 39 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf install --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-333bfe80f8 \*`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-333bfe80f8

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 27 Fedora Update System 2024-06-13 03:03:18 UTC
FEDORA-2024-333bfe80f8 (python-pypdf-4.2.0-1.fc39) has been pushed to the Fedora 39 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 28 Fedora Update System 2024-06-13 04:05:06 UTC
FEDORA-2024-26b6017442 (python-pypdf-4.2.0-1.fc40) has been pushed to the Fedora 40 stable repository.
If problem still persists, please make note of it in this bug report.