Bug 1481046 - Review Request: python-html5-parser - A fast, standards compliant, C based, HTML 5 parser for python
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: Package Review
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Raphael Groner
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
Reported: 2017-08-13 21:02 UTC by Kevin Fenzi
Modified: 2017-11-21 23:27 UTC
Last Closed: 2017-11-21 23:27:46 UTC
projects.rg: fedora-review+


Attachments
gumbo.diff (53.61 KB, patch), 2017-09-18 13:11 UTC, Raphael Groner

Description Kevin Fenzi 2017-08-13 21:02:05 UTC
Spec URL: http://www.scrye.com/~kevin/fedora/review/python-html5-parser/python-html5-parser.spec
SRPM URL: http://www.scrye.com/~kevin/fedora/review/python-html5-parser/python-html5-parser-0.4.4-1.fc27.src.rpm
Description: A fast, standards compliant, C based, HTML 5 parser for python
Fedora Account System Username: kevin

scratch build at: http://koji.fedoraproject.org/koji/taskinfo?taskID=21215878

Comment 1 Raphael Groner 2017-08-15 21:27:46 UTC
Hi Kevin,
are you interested in a review swap, maybe you can look into bug #1481775?

Comment 2 Kevin Fenzi 2017-08-17 00:05:19 UTC
Sure. I can... pretty busy, but I can try and get it in the next few days.

Comment 3 Raphael Groner 2017-09-02 16:56:13 UTC
Before I can run fedora-review successfully, some general hints:

- gumbo[-parser] is available as a separate package, please try to unbundle.

- No useful debuginfo is generated. Since this is a CPython extension, we need debuginfo.

Sorry for the delay of my answer.

Comment 4 Raphael Groner 2017-09-02 16:58:20 UTC
- You may also want to use %{sum} in the descriptions.

Comment 5 Kevin Fenzi 2017-09-16 21:17:39 UTC
So, the version of gumbo here is not the normal upstream one; it's https://github.com/Sigil-Ebook/sigil-gumbo, which is a modified fork. They have had no releases and warn against trying to replace their version with the original one. :( So I fear that, at least for now, we need to bundle it. I added a note about that to the spec.

The debuginfo issue is due to it overriding ldflags to add -O3. I think I have fixed that by telling it to do a debug build. 

Spec URL: http://www.scrye.com/~kevin/fedora/review/python-html5-parser/python-html5-parser.spec
SRPM URL: http://www.scrye.com/~kevin/fedora/review/python-html5-parser/python-html5-parser-0.4.4-2.fc28.src.rpm

Comment 6 Raphael Groner 2017-09-18 13:08:13 UTC
IMHO according to the guidelines, we must add at least a virtual provides:
Provides: gumbo-parser-static

Properly unbundling is not possible, because html5-parser/gumbo has obvious modifications compared to the API available in gumbo-parser/src. They are visible when diffing the headers (see the following attachment).

It's not only that gumbo is bundled: gumbo also gets compiled as a static library *and linked internally* with the help of setuptools' Extension. Therefore, it's impossible to have a separate devel subpackage with an explicit gumbo library (the headers are not installed by the build tool either, so it would be useless anyhow), as assumed in the guidelines: see 2. Static libraries only.

https://fedoraproject.org/wiki/Packaging:Guidelines#Packaging_Static_Libraries

Comment 7 Raphael Groner 2017-09-18 13:09:47 UTC
cd html5-parser/gumbo ; find . -name \*.h -exec diff -rNu '{}' ../../gumbo-parser/src/'{}' \; >gumbo.diff

Comment 8 Raphael Groner 2017-09-18 13:11 UTC
Created attachment 1327399
gumbo.diff

cd html5-parser/gumbo ; find . -name \*.h -exec diff -rNu '{}' ../../gumbo-parser/src/'{}' \; >/tmp/gumbo.diff
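
The header comparison above can also be sketched in Python, in case a portable version of the shell one-liner is useful. This is only an illustration: the function name diff_headers is hypothetical, and like the find command it only walks the bundled tree, so headers that exist only upstream are not reported.

```python
import difflib
from pathlib import Path


def diff_headers(bundled_dir, upstream_dir):
    """Return unified-diff lines for every *.h file under bundled_dir.

    A header missing from upstream_dir is compared against an empty file,
    which mirrors the -N flag of diff.
    """
    bundled_dir, upstream_dir = Path(bundled_dir), Path(upstream_dir)
    lines = []
    for header in sorted(bundled_dir.rglob("*.h")):
        other = upstream_dir / header.relative_to(bundled_dir)
        old = header.read_text(errors="replace").splitlines(keepends=True)
        new = []
        if other.is_file():
            new = other.read_text(errors="replace").splitlines(keepends=True)
        # unified_diff yields nothing when the two files are identical
        lines.extend(
            difflib.unified_diff(old, new, fromfile=str(header), tofile=str(other))
        )
    return lines
```

With the directory layout from the command above, it could be used as print("".join(diff_headers("html5-parser/gumbo", "gumbo-parser/src"))).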

Comment 9 Kevin Fenzi 2017-10-18 20:58:26 UTC
So, I never got any replies on my post to the packaging list. ;( 

I can file a ticket, but now that I come back to it, I am not sure what the outstanding issue is here? Is "Provides: bundled(sigil-gumbo)" not right/good enough?

Comment 10 Raphael Groner 2017-10-18 21:09:23 UTC
(In reply to Kevin Fenzi from comment #9)
> So, I never got any replies on my post to the packaging list. ;( 
> 
> I can file a ticket, but now that I come back to it, I am not sure what the
> outstanding issue is here? Is "Provides: bundled(sigil-gumbo)" not
> right/good enough?

There's no package sigil-gumbo. So your suggestion is pointless.

My suggestion is explained in comment #6:
Provides: gumbo-parser-static

Comment 11 Kevin Fenzi 2017-10-18 21:22:48 UTC
(In reply to Raphael Groner from comment #10)
>
> There's no package sigil-gumbo. So your suggestion is pointless.

Well, there's not currently, there may be someday. 

It gets back to what you think the reason for adding the Provides is. 
IMHO it's to allow someone to see what's bundled in case they want to check all packages for a known issue. That doesn't require that the package is in Fedora, only that packages that bundle it indicate so.

> 
> My suggestion is explained in comment #6:
> Provides: gumbo-parser-static

But that's just incorrect, because it's not gumbo-parser. It also makes it look like this package provides a static lib someone could link to, per https://fedoraproject.org/wiki/Packaging:Guidelines#Packaging_Static_Libraries

So, I guess I should file an FPC ticket and ask them what to put in Provides in this case.

Comment 12 Kevin Fenzi 2017-10-20 19:19:40 UTC
OK. I filed the ticket and got a suggestion to just add Provides for both projects, versioned as of the time they were forked.

FPC ticket: https://pagure.io/packaging-committee/issue/722

Here's what I am adding to the spec: 

# This package bundles sigil-gumbo a fork of gumbo
# Base project: https://github.com/google/gumbo-parser
# Forked from above: https://github.com/Sigil-Ebook/sigil-gumbo
# It also patches that bundled copy with other changes.
# sigul-gumbo bundled here was added 20170601
Provides:      bundled(sigil-gumbo) = 0.9.3-20170601git0830e1145fe08
# sigul-gumbo forked off grumbo-parser at this commit in 20160216
Provides:      bundled(gumbo-parser) = 0.9.3-20160216git69b580ab4de04
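
As a rough illustration of why these Provides help with auditing, the sketch below pulls the bundled() entries out of spec text. The helper name bundled_provides and the regular expression are assumptions made for this example; it handles only the simple one-per-line form shown above, not macros or several provides on one line.

```python
import re

# Matches lines such as: Provides:  bundled(name) = version
BUNDLED_RE = re.compile(r"^Provides:\s*bundled\(([^)]+)\)\s*(?:=\s*(\S+))?\s*$")


def bundled_provides(spec_text):
    """Map each bundled component name to its declared version (or None)."""
    found = {}
    for line in spec_text.splitlines():
        match = BUNDLED_RE.match(line.strip())
        if match:
            found[match.group(1)] = match.group(2)
    return found


spec = """\
Provides:      bundled(sigil-gumbo) = 0.9.3-20170601git0830e1145fe08
Provides:      bundled(gumbo-parser) = 0.9.3-20160216git69b580ab4de04
"""
for name, version in bundled_provides(spec).items():
    print(name, version)
```

Someone checking for a known gumbo issue could run a helper like this over many spec files and list every package that declares the affected component.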

New spec/src.rpm:

Spec URL: http://www.scrye.com/~kevin/fedora/review/python-html5-parser/python-html5-parser.spec
SRPM URL: http://www.scrye.com/~kevin/fedora/review/python-html5-parser/python-html5-parser-0.4.4-2.fc28.src.rpm

New scratch build: 
https://koji.fedoraproject.org/koji/taskinfo?taskID=22571236

Comment 13 Raphael Groner 2017-10-20 21:10:27 UTC
APPROVED
There are no real blockers. Please consider fixing the hints marked with [!] and commented below when importing the package.


Package Review
==============

Legend:
[x] = Pass, [!] = Fail, [-] = Not applicable, [?] = Not evaluated
[ ] = Manual review needed



===== MUST items =====

C/C++:
[x]: Package does not contain kernel modules.
[x]: Package contains no static executables.
[x]: Development (unversioned) .so files in -devel subpackage, if present.
     Note: Unversioned so-files in private %_libdir subdirectory (see
     attachment). Verify they are not in ld path.
=> OK. These are Python extension modules under site-packages, not libraries in the ld path.

[x]: Package does not contain any libtool archives (.la)
[x]: Rpath absent or only used for internal libs.

Generic:
[x]: Package is licensed with an open-source compatible license and meets
     other legal requirements as defined in the legal section of Packaging
     Guidelines.
[!]: License field in the package spec file matches the actual license.
     Note: Checking patched sources after %prep for licenses. Licenses
     found: "Apache (v2.0)", "Unknown or generated", "*No copyright*
     Apache", "MIT/X11 (BSD like) Apache (v2.0)", "Apache", "*No copyright*
     Apache (v2.0)". 41 files have unknown license. Detailed output of
     licensecheck in /home/builder/fedora-review/1481046-python-
     html5-parser/licensecheck.txt
=> Add MIT for html5-parser-0.4.4/gumbo/utf8.c and mention as comment.
   https://fedoraproject.org/wiki/Packaging:LicensingGuidelines#Multiple_Licensing_Scenarios

[x]: License file installed when any subpackage combination is installed.
[x]: %build honors applicable compiler flags or justifies otherwise.
[x]: Package contains no bundled libraries without FPC exception.
=> OK, see the discussion in the comments above.

[x]: Changelog in prescribed format.
[x]: Sources contain only permissible code or content.
[-]: Package contains desktop file if it is a GUI application.
[-]: Development files must be in a -devel package
[x]: Package uses nothing in %doc for runtime.
[x]: Package consistently uses macros (instead of hard-coded directory
     names).
[x]: Package is named according to the Package Naming Guidelines.
[x]: Package does not generate any conflict.
[x]: Package obeys FHS, except libexecdir and /usr/target.
[-]: If the package is a rename of another package, proper Obsoletes and
     Provides are present.
[x]: Requires correct, justified where necessary.
[!]: Spec file is legible and written in American English.
=> Please correct two typos: s|grumbo-parser|gumbo-parser|g s|sigul-gumbo|sigil-gumbo|g

[-]: Package contains systemd file(s) if in need.
[x]: Useful -debuginfo package or justification otherwise.
=> Sources found in debugsource subpackage.

[x]: Package is not known to require an ExcludeArch tag.
[-]: Large documentation must go in a -doc subpackage. Large could be size
     (~1MB) or number of files.
     Note: Documentation size is 20480 bytes in 2 files.
[!]: Package complies with the Packaging Guidelines
=> See comments above.

[x]: Package successfully compiles and builds into binary rpms on at least
     one supported primary architecture.
[x]: Package installs properly.
[x]: Rpmlint is run on all rpms the build produces.
     Note: No rpmlint messages.
[x]: If (and only if) the source package includes the text of the
     license(s) in its own file, then that file, containing the text of the
     license(s) for the package is included in %license.
[x]: Package requires other packages for directories it uses.
[x]: Package must own all directories that it creates.
[x]: Package does not own files or directories owned by other packages.
[x]: All build dependencies are listed in BuildRequires, except for any
     that are listed in the exceptions section of Packaging Guidelines.
[x]: Package uses either %{buildroot} or $RPM_BUILD_ROOT
[x]: Package does not run rm -rf %{buildroot} (or $RPM_BUILD_ROOT) at the
     beginning of %install.
[x]: Macros in Summary, %description expandable at SRPM build time.
[x]: Dist tag is present.
[x]: Package does not contain duplicates in %files.
[x]: Permissions on files are set properly.
[x]: Package uses %makeinstall only when make install DESTDIR=... doesn't
     work.
[x]: Package is named using only allowed ASCII characters.
[x]: Package does not use a name that already exists.
[x]: Package is not relocatable.
[x]: Sources used to build the package match the upstream source, as
     provided in the spec URL.
[x]: Spec file name must match the spec package %{name}, in the format
     %{name}.spec.
[x]: File names are valid UTF-8.
[x]: Packages must not store files under /srv, /opt or /usr/local

Python:
[x]: Python eggs must not download any dependencies during the build
     process.
[x]: A package which is used by another package via an egg interface should
     provide egg info.
[x]: Package meets the Packaging Guidelines::Python
[x]: Package contains BR: python2-devel or python3-devel
[x]: Binary eggs must be removed in %prep

===== SHOULD items =====

Generic:
[-]: If the source package does not include license text(s) as a separate
     file from upstream, the packager SHOULD query upstream to include it.
[x]: Final provides and requires are sane (see attachments).
[x]: Fully versioned dependency in subpackages if applicable.
     Note: No Requires: %{name}%{?_isa} = %{version}-%{release} in
     python2-html5-parser , python3-html5-parser , python-html5-parser-
     debuginfo
=> Ignore. There's no base package, because python.

[?]: Package functions as described.
[x]: Latest version is packaged.
[x]: Package does not include license text files separate from upstream.
[-]: Description and summary sections in the package spec file contain
     translations for supported Non-English languages, if available.
[x]: Package should compile and build into binary rpms on all supported
     architectures.
[-]: %check is present and all tests pass.
[x]: Packages should try to preserve timestamps of original installed
     files.
[x]: Reviewer should test that the package builds in mock.
[x]: Buildroot is not present
[x]: Package has no %clean section with rm -rf %{buildroot} (or
     $RPM_BUILD_ROOT)
[x]: No file requires outside of /etc, /bin, /sbin, /usr/bin, /usr/sbin.
[x]: Packager, Vendor, PreReq, Copyright tags should not be in spec file
[x]: Sources can be downloaded from URI in Source: tag
[x]: SourceX is a working URL.
[x]: Spec uses %global instead of %define unless justified.

===== EXTRA items =====

Generic:
[x]: Rpmlint is run on debuginfo package(s).
     Note: No rpmlint messages.
[x]: Rpmlint is run on all installed packages.
     Note: No rpmlint messages.
[x]: Large data in /usr/share should live in a noarch subpackage if package
     is arched.
[x]: Spec file according to URL is the same as in SRPM.


Rpmlint
-------
Checking: python2-html5-parser-0.4.4-2.fc28.x86_64.rpm
          python3-html5-parser-0.4.4-2.fc28.x86_64.rpm
          python-html5-parser-debuginfo-0.4.4-2.fc28.x86_64.rpm
          python-html5-parser-0.4.4-2.fc28.src.rpm
4 packages and 0 specfiles checked; 0 errors, 0 warnings.




Rpmlint (debuginfo)
-------------------
Checking: python-html5-parser-debuginfo-0.4.4-2.fc28.x86_64.rpm
1 packages and 0 specfiles checked; 0 errors, 0 warnings.





Rpmlint (installed packages)
----------------------------
3 packages and 0 specfiles checked; 0 errors, 0 warnings.



Requires
--------
python-html5-parser-debuginfo (rpmlib, GLIBC filtered):

python2-html5-parser (rpmlib, GLIBC filtered):
    libc.so.6()(64bit)
    libpthread.so.0()(64bit)
    libpython2.7.so.1.0()(64bit)
    libxml2.so.2()(64bit)
    libxml2.so.2(LIBXML2_2.4.30)(64bit)
    libxml2.so.2(LIBXML2_2.6.0)(64bit)
    python(abi)
    rtld(GNU_HASH)

python3-html5-parser (rpmlib, GLIBC filtered):
    libc.so.6()(64bit)
    libpthread.so.0()(64bit)
    libpython3.6m.so.1.0()(64bit)
    libxml2.so.2()(64bit)
    libxml2.so.2(LIBXML2_2.4.30)(64bit)
    libxml2.so.2(LIBXML2_2.6.0)(64bit)
    python(abi)
    rtld(GNU_HASH)



Provides
--------
python-html5-parser-debuginfo:
    python-html5-parser-debuginfo
    python-html5-parser-debuginfo(x86-64)

python2-html5-parser:
    bundled(gumbo-parser)
    bundled(sigil-gumbo)
    python-html5-parser
    python-html5-parser(x86-64)
    python2-html5-parser
    python2-html5-parser(x86-64)
    python2.7dist(html5-parser)
    python2dist(html5-parser)

python3-html5-parser:
    bundled(gumbo-parser)
    bundled(sigil-gumbo)
    python3-html5-parser
    python3-html5-parser(x86-64)
    python3.6dist(html5-parser)
    python3dist(html5-parser)



Unversioned so-files
--------------------
python2-html5-parser: /usr/lib64/python2.7/site-packages/html5_parser/html_parser.so
python3-html5-parser: /usr/lib64/python3.6/site-packages/html5_parser/html_parser.cpython-36m-x86_64-linux-gnu.so

Source checksums
----------------
https://files.pythonhosted.org/packages/source/h/html5-parser/html5-parser-0.4.4.tar.gz :
  CHECKSUM(SHA256) this package     : b9f3a1d4cdb8742e8e4ecafab04bff541bde4ff09af233293ed0b94028ec1ab5
  CHECKSUM(SHA256) upstream package : b9f3a1d4cdb8742e8e4ecafab04bff541bde4ff09af233293ed0b94028ec1ab5
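
The comparison reported above boils down to hashing both tarballs and checking that the digests agree. A minimal sketch, assuming both files are already on disk; the helper names are hypothetical, and this is not a description of how fedora-review itself is implemented.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_source(packaged_path, upstream_path):
    """True when the packaged tarball is byte-identical to the upstream one."""
    return sha256_of(packaged_path) == sha256_of(upstream_path)
```

Reading in chunks keeps memory use flat even for large source archives.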


Generated by fedora-review 0.6.1 (f03e4e7) last change: 2016-05-02
Command line :/usr/bin/fedora-review -v -m fedora-rawhide-x86_64 -b 1481046
Buildroot used: fedora-rawhide-x86_64
Active plugins: Python, Generic, Shell-api, C/C++
Disabled plugins: Java, SugarActivity, fonts, Haskell, Ocaml, Perl, R, PHP
Disabled flags: EXARCH, DISTTAG, EPEL5, BATCH, EPEL6

Comment 14 Kevin Fenzi 2017-10-20 21:19:37 UTC
Thanks! Will fix those things before import...
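
The two typo fixes requested in comment 13 are plain string substitutions. A minimal sketch, with a hypothetical helper name, equivalent in effect to the two s|...|...|g expressions from the review:

```python
def fix_spec_typos(spec_text):
    """Apply the two spelling corrections requested in the review."""
    replacements = {
        "grumbo-parser": "gumbo-parser",
        "sigul-gumbo": "sigil-gumbo",
    }
    for wrong, right in replacements.items():
        spec_text = spec_text.replace(wrong, right)
    return spec_text
```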

Comment 15 Gwyn Ciesla 2017-10-20 21:31:06 UTC
(fedrepo-req-admin):  The Pagure repository was created at https://src.fedoraproject.org/rpms/python-html5-parser

Comment 16 Fedora Update System 2017-10-21 02:44:10 UTC
python-html5-parser-0.4.4-3.fc27 calibre-3.10.0-1.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-4c28e33e56

Comment 17 Fedora Update System 2017-10-21 19:29:59 UTC
calibre-3.10.0-1.fc27, python-html5-parser-0.4.4-3.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-4c28e33e56

Comment 18 Fedora Update System 2017-10-23 19:18:09 UTC
calibre-3.10.0-2.fc27 python-html5-parser-0.4.4-3.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-4c28e33e56

Comment 19 Fedora Update System 2017-10-25 15:22:48 UTC
calibre-3.10.0-2.fc27, python-html5-parser-0.4.4-3.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.

Comment 20 Fedora Update System 2017-11-06 00:00:23 UTC
calibre-3.11.1-1.fc27 python-html5-parser-0.4.4-3.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-4c28e33e56

Comment 21 Fedora Update System 2017-11-06 21:13:59 UTC
calibre-3.11.1-1.fc27, python-html5-parser-0.4.4-3.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.

Comment 22 Fedora Update System 2017-11-08 23:06:06 UTC
calibre-3.11.1-2.fc27 python-html5-parser-0.4.4-4.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-4c28e33e56

Comment 23 Fedora Update System 2017-11-09 19:54:15 UTC
calibre-3.11.1-2.fc27, python-html5-parser-0.4.4-4.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.

Comment 24 Fedora Update System 2017-11-21 23:27:46 UTC
calibre-3.11.1-2.fc27, python-html5-parser-0.4.4-4.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

