Bug 823679
Summary: | Review Request: python-pdfminer - PDF data parsing library and tool | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Ben Rosser <rosser.bjr> |
Component: | Package Review | Assignee: | Lubomir Rintel <lkundrak> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | rawhide | CC: | j, lkundrak, package-review |
Target Milestone: | --- | Flags: | lkundrak:
fedora-review+
gwync: fedora-cvs+ |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | python-pdfminer-20140328-2.fc19 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2014-09-09 22:06:49 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 859246 | ||
Bug Blocks: |
Description
Ben Rosser
2012-05-21 21:00:20 UTC
Just a brief look. There are a few eyebrow raisers in the spec file: > %prep > %setup -q -n %{srcname}-%{version} > export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/lib/pkgconfig What's the purpose of this export? Nonarch pkgconfig files are stored in /usr/share/pkgconfig/, so there should not be any reason to alter the search path like that. > %build > CFLAGS="$RPM_OPT_FLAGS" This isn't needed at all, is it? If any C code/data files were compiled in this Python based package, that would deserve a comment in the spec file and would ask for a closer look during review. > %{__python} setup.py build > make cmap > chmod +x build/lib/pdfminer/* > mv build/scripts-2.7/dumppdf.py build/scripts-2.7/dumppdf > mv build/scripts-2.7/pdf2txt.py build/scripts-2.7/pdf2txt > mv build/scripts-2.7/latin2ascii.py build/scripts-2.7/latin2ascii I would move the "chmod" line and the three "mv" lines into the %install section. In order to distinguish between the steps, which are needed to build the upstream software, and the additional steps, which may only be necessary to bring into shape the files for installation/packaging. > %install > [...] > mkdir -p %{buildroot}%{_mandir}/man1/ No manual pages are available, however. > %files > [...] > %{python_sitelib}/%{srcname}-20110515-py2.7.egg-info > %{python_sitelib}/%{srcname}/* https://fedoraproject.org/wiki/Packaging:UnownedDirectories > %defattr(-,root,root,-) > %doc docs/* https://fedoraproject.org/wiki/Packaging:Guidelines#File_Permissions If there's a special/secret reason for putting the %defattr there, please document it. Fixed the unowned directories problem, and moved the chmod and mv commands to %install. Got rid of the %defattr (I guess I pulled that from an older RPM reference somewhere), the export, the CFLAGS, and the man mkdir- none of these were needed. Reposted spec file and SRPM; they're at the same URLs: Spec URL: http://venus.arosser.com/fedora/pdfminer/python-pdfminer.spec SRPM URL: http://venus.arosser.com/fedora/pdfminer/python-pdfminer-20110515-0.fc16.src.rpm As I realized I completely forgot to increment the release tag with these changes (thus changing the version number to 20110515-1).. I've now reuploaded the spec and a new SRPM: Spec: http://venus.arosser.com/fedora/pdfminer/python-pdfminer.spec SRPM: http://venus.arosser.com/fedora/pdfminer/python-pdfminer-20110515-1.fc16.src.rpm Just a note that this code appears to bundle some data from Adobe that is not under the MIT license. According to cmapsrc/README.txt it appears to be under the 3-clause BSD license. So there are two issues: The License: tag is wrong; it needs to be something like "MIT and BSD" and then indicate which parts of the final package are under which license. (And looking at the built package that's not immediately clear to me.) The bundling may or may not run afoul of the general bundling prohibition: https://fedoraproject.org/wiki/Packaging:No_Bundled_Libraries. Since, again, it's not immediately clear to me what happens to that cmap data, I can't really say what should happen here. Also, if you do not intend to package this for RHEL5, you can remove BuildRoot:, the entire %clean section, and the first line of %install. Update, with RHEL 5 legacy stuff removed and license modified to reflect the cmap content. Spec: http://venus.arosser.com/fedora/pdfminer/python-pdfminer.spec SRPM: http://venus.arosser.com/fedora/pdfminer/python-pdfminer-20110515-2.fc17.src.rpm (In reply to comment #4) > Just a note that this code appears to bundle some data from Adobe that is > not under the MIT license. According to cmapsrc/README.txt it appears to be > under the 3-clause BSD license. > > So there are two issues: > > The License: tag is wrong; it needs to be something like "MIT and BSD" and > then indicate which parts of the final package are under which license. > (And looking at the built package that's not immediately clear to me.) > > The bundling may or may not run afoul of the general bundling prohibition: > https://fedoraproject.org/wiki/Packaging:No_Bundled_Libraries. Since, > again, it's not immediately clear to me what happens to that cmap data, I > can't really say what should happen here. So... it looks like, because I was calling "make cmap" *after* "python setup.py build", the cmap data is not being installed into the package. I've fixed this, and now the cmap data is being stored in: "/usr/lib/python2.7/site-packages/pdfminer/cmap". So this folder would be under the BSD license, then- I've updated the License tag and put a comment there explaining this. I'm not sure about the bundling though. One option would be for me to simply remove it, as it's not a vital component of the package. The maintainer marks it as an optional step for CJK language support here- http://www.unixuser.org/~euske/python/pdfminer/index.html#install. So, if the bundling of the cmap data does violate the bundling prohibition, it can just be removed. > Also, if you do not intend to package this for RHEL5, you can remove > BuildRoot:, the entire %clean section, and the first line of %install. Ah, okay. Yeah, I have no intention of packaging for RHEL5, so I've removed all that. Update: effectively removed cmap bundling (by creating a separate cmap package here: https://bugzilla.redhat.com/show_bug.cgi?id=859246). Spec: http://venus.arosser.com/fedora/pdfminer/python-pdfminer.spec SRPM: http://venus.arosser.com/fedora/pdfminer/python-pdfminer-20110515-3.fc17.src.rpm Additional update: these are now the valid URLs. Spec: http://mars.arosser.com/fedora/pdfminer/python-pdfminer.spec SRPM: http://mars.arosser.com/fedora/pdfminer/python-pdfminer-20110515-3.fc17.src.rpm I'll need to update this spec to the latest version of pdfminer, and make it depend on the right cmap packages too. I guess you'd want to update the SPEC file for newly packaged cmap files; something along the lines of BuildRequires: cmap-japan1-6 BuildRequires: cmap-korean1-2 BuildRequires: cmap-gb1-5 BuildRequires: cmap-cns1-6 ... cp %{_datadir}/cmap/cmap-japan*/cid2code.txt cmaprsrc/cid2code_Adobe_Japan1.txt cp %{_datadir}/cmap/cmap-korea*/cid2code.txt cmaprsrc/cid2code_Adobe_Korea1.txt cp %{_datadir}/cmap/cmap-gb*/cid2code.txt cmaprsrc/cid2code_Adobe_GB1.txt cp %{_datadir}/cmap/cmap-cns*/cid2code.txt cmaprsrc/cid2code_Adobe_CNS1.txt (Maybe cmap-* packaging could be changed to avoid use of wildcards above?) Hmm, yes, the version should probably no longer be in the /usr/share/cmap/cmap-* path. That's an artifact of when "1.6" was the version, rather than the file modification date. I think it should probably become /usr/share/cmap/cmap-japan1-6/ (for example). I'll push updates to the other packages, but for now, here's an updated spec. Spec: http://mars.arosser.com/fedora/pdfminer/python-pdfminer.spec SRPM: http://mars.arosser.com/fedora/pdfminer/python-pdfminer-20140328-0.fc20.src.rpm 0.) rpmlint unhappy: python-pdfminer.src:17: W: mixed-use-of-spaces-and-tabs (spaces: line 3, tab: line 17) The specfile mixes use of spaces and tabs for indentation, which is a cosmetic annoyance. Use either spaces or tabs for indentation, not both. 1.) You don't need to quote your name with single quotes in changelog: * Sat Aug 16 2014 'Ben Rosser' <rosser.bjr> 20140328-0 2.) Please move the files only after actual install in %{buildroot}. This makes %install non-idempotent and thus breaks short-circuit of install phase (rpmbuild -bi --short-circuit). %install ... # Rename the python scripts to get rid of the *.py suffix mv build/scripts-2.7/dumppdf.py build/scripts-2.7/dumppdf mv build/scripts-2.7/pdf2txt.py build/scripts-2.7/pdf2txt mv build/scripts-2.7/latin2ascii.py build/scripts-2.7/latin2ascii 3.) The initial release number should be >= 1. Release numbers < 1 are reserved to pre-release snapshots. Release: 0%{?dist} Fixed the release numbers, the quotes in changelog, and the tabs. rpmlint is now happy. I changed the script renaming to this (and it happens after "python setup.py install") now: %install chmod +x build/lib/pdfminer/* %{__python} setup.py install --skip-build --root %{buildroot} # Rename the python scripts to get rid of the *.py suffix mv %{buildroot}/usr/bin/dumppdf.py %{buildroot}/usr/bin/dumppdf mv %{buildroot}/usr/bin/pdf2txt.py %{buildroot}/usr/bin/pdf2txt mv %{buildroot}/usr/bin/latin2ascii.py %{buildroot}/usr/bin/latin2ascii Is this better? Spec: http://mars.arosser.com/fedora/pdfminer/python-pdfminer.spec SRPM: http://mars.arosser.com/fedora/pdfminer/python-pdfminer-20140328-1.fc20.src.rpm Er, updated again to use %{_bindir} in those lines instead of /usr/bin. Spec: http://mars.arosser.com/fedora/pdfminer/python-pdfminer.spec SRPM: http://mars.arosser.com/fedora/pdfminer/python-pdfminer-20140328-2.fc20.src.rpm * Package named correctly * Package version is correct * Packaging the latest version * Source file checksum matches * SPEC file clean and legible * License tag correct * License fine for fedora * Filelist sane * Provides & requirements okay * Builds fine in mock * rpmlint happy APPROVED New Package SCM Request ======================= Package Name: python-pdfminer Short Description: PDF data parsing library and tool Upstream URL: http://www.unixuser.org/~euske/python/pdfminer/index.html Owners: tc01 Branches: f19 f20 f21 InitialCC: Git done (by process-git-requests). python-pdfminer-20140328-2.fc19 has been submitted as an update for Fedora 19. https://admin.fedoraproject.org/updates/python-pdfminer-20140328-2.fc19 python-pdfminer-20140328-2.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/python-pdfminer-20140328-2.fc20 python-pdfminer-20140328-2.fc19 has been pushed to the Fedora 19 testing repository. python-pdfminer-20140328-2.fc20 has been pushed to the Fedora 20 stable repository. python-pdfminer-20140328-2.fc19 has been pushed to the Fedora 19 stable repository. |