Bug 2244406 - Review Request: python-rtfde - A library for extracting HTML content from RTF encapsulated HTML
Summary: Review Request: python-rtfde - A library for extracting HTML content from RTF...
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Fedora
Classification: Fedora
Component: Package Review
Version: rawhide
Hardware: All
OS: Linux
medium
medium
Target Milestone: ---
Assignee: Maxwell G
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 2244400
Blocks: 2244408
TreeView+ depends on / blocked
 
Reported: 2023-10-16 11:34 UTC by Sandro
Modified: 2023-12-31 13:08 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-12-31 13:08:14 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
The .spec file difference from Copr build 6674061 to 6737570 (2.54 KB, patch)
2023-12-10 01:40 UTC, Fedora Review Service
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Github seamustuohy RTFDE issues 24 0 None open License clarification needed 2023-11-05 23:27:42 UTC

Description Sandro 2023-10-16 11:34:32 UTC
Spec URL: https://download.copr.fedorainfracloud.org/results/gui1ty/extract-msg/fedora-rawhide-x86_64/06529620-python-RTFDE/python-RTFDE.spec
SRPM URL: https://download.copr.fedorainfracloud.org/results/gui1ty/extract-msg/fedora-rawhide-x86_64/06529620-python-RTFDE/python-RTFDE-0.1.0-1.20231015git66780b8.fc40.src.rpm

Description:
RTFDE: RTF De-Encapsulator

A python3 library for extracting encapsulated HTML & plain text content
from the RTF bodies of .msg files.

De-encapsulation enables previously encapsulated HTML and plain text
content to be extracted and rendered as HTML and plain text instead of
the encapsulating RTF content. After de-encapsulation, the HTML and
plain text should differ only minimally from the original HTML or plain
text content.

Features

 - De-encapsulate HTML from RTF encapsulated HTML
 - De-encapsulate plain text from RTF encapsulated text

Known Issues

 - This library _fully_ unquotes text it de-encapsulates because it does
 not know which text was quoted in the RTF conversion process and which
 text was quoted in the original html/text. So, for instance escaped
 Quoted-Printable text will be returned un-escaped.
 - This library currently can't combine attachments from a .MSG Message
 object with the de-encapsulated HTML. This is mostly because I could
 not get a good set of examples of encapsulated HTML which had
 attachment objects that needed to be integrated back into the body of
 the HTML.

Anti-Features (I don't intend to have this library do this.)

 - Extract plain text from RTF encapsulated HTML. If you want this,
 then you will have to parse the HTML using another library.

Fedora Account System Username: gui1ty

Copr Build: https://copr.fedorainfracloud.org/coprs/gui1ty/extract-msg/build/6529620/

Comment 1 Maxwell G 2023-11-05 19:53:01 UTC
Maxwell's Python Package Review Template
==============

Legend:
[x] = Pass, [!] = Fail, [-] = Not applicable, [?] = Not evaluated


- [!] The License tag reflects the package contents and uses the correct identifiers.
- [!] The license text is included and marked with %license.
NOTE: Files in the project include GPLv3+ headers, but the metadata and LICENSE claims LGPLv3. Can you clarify this with upstream?

- [x] The package builds successfully in mock.
- [x] The package is installable (checked by fedora-review).
- [x] There are no relevant rpmlint errors.
- [x] The package runs tests in %check.
- [x] The latest version is packaged or packaging an earlier version is justified.

- [!] The packager considers avoiding confusing `%foo_name` macros. (Not a blocker)

- [!] Libraries: The package name has a `python3-` prefix and uses the canonical project name
NOTE: The SRPM should be named python-rtfde and the binary package should be named python3-rftde.

- [x] The pyproject macros are used.
- [x] The package complies with the Python and general Packaging Guidelines.


Other Notes
===========

- You should add `%global distprefix %{nil}` so the forge macros don't add .20231015git66780b8 to the disttag. This is not a git snapshot; upstream just doesn't tag releases.
- It'd be better to use the `%{...}` syntax when referencing the `%forge*` macros in the specfile.
- The doc subpackage should not require the python3-... subpackage.

Thanks!

Comment 2 Sandro 2023-11-05 23:27:42 UTC
> - [!] The License tag reflects the package contents and uses the correct
> identifiers.
> - [!] The license text is included and marked with %license.
> NOTE: Files in the project include GPLv3+ headers, but the metadata and
> LICENSE claims LGPLv3. Can you clarify this with upstream?

I asked upstream: https://github.com/seamustuohy/RTFDE/issues/24
Let's hope they respond. So far I haven't received any response to my PRs for other issues.


> - [!] The packager considers avoiding confusing `%foo_name` macros. (Not a
> blocker)

I will consider it, if you tell what macros are confusing you. Was it `%pypi_name` by any chance?


> - [!] Libraries: The package name has a `python3-` prefix and uses the
> canonical project name
> NOTE: The SRPM should be named python-rtfde and the binary package should be
> named python3-rftde.

That part keeps confusing me. Why call the package `python3-rtfde` when the importable module is called `RTFDE`. Anyway, I fixed it.


> - You should add `%global distprefix %{nil}` so the forge macros don't add
> .20231015git66780b8 to the disttag. This is not a git snapshot; upstream
> just doesn't tag releases.

I fixed it by using `%autorelease -n`. My intention was to make it clear that this is build from a commit and not a tag.

> - It'd be better to use the `%{...}` syntax when referencing the `%forge*`
> macros in the specfile.
> - The doc subpackage should not require the python3-... subpackage.

Well, if that is a blocking requirement, I'd rather get rid of the doc sub package. I don't see how it is useful installing the doc sub package without the package it documents.

Since the license still needs to be clarified, I guess this review is stuck for the time being. I've implemented the changes I could:

Spec URL: https://download.copr.fedorainfracloud.org/results/gui1ty/extract-msg/fedora-rawhide-x86_64/06600937-python-rtfde/python-rtfde.spec
SRPM URL: https://download.copr.fedorainfracloud.org/results/gui1ty/extract-msg/fedora-rawhide-x86_64/06600937-python-rtfde/python-rtfde-0.1.0-1.src.rpm

Comment 3 Sandro 2023-11-11 15:06:46 UTC
I just found out that the LICENSE file previously _did_ contain the GPL-3.0-only text. It was replaced. Which seems odd.

https://github.com/seamustuohy/RTFDE/commit/2ffdd05afa9a630ea8c79935c992e22443469bc7

If upstream does not get back to my question regarding the license, would it be permissible to revert that commit bringing the license in sync with the headers?

Comment 4 Sandro 2023-11-21 11:48:04 UTC
Upstream has clarified the license situation. The package is licensed under LGPL-3.0-only. They will fix the individual file headers. Pending that patch and possibly a new release, this is what we have now:

Spec URL: https://download.copr.fedorainfracloud.org/results/gui1ty/extract-msg/fedora-rawhide-x86_64/06674018-python-rtfde/python-rtfde.spec
SRPM URL: https://download.copr.fedorainfracloud.org/results/gui1ty/extract-msg/fedora-rawhide-x86_64/06674018-python-rtfde/python-rtfde-0.1.0-2.fc40.src.rpm

Comment 5 Fedora Review Service 2023-11-21 11:57:19 UTC
Copr build:
https://copr.fedorainfracloud.org/coprs/build/6674061
(succeeded)

Review template:
https://download.copr.fedorainfracloud.org/results/@fedora-review/fedora-review-2244406-python-rtfde/fedora-rawhide-x86_64/06674061-python-rtfde/fedora-review/review.txt

Please take a look if any issues were found.


---
This comment was created by the fedora-review-service
https://github.com/FrostyX/fedora-review-service

If you want to trigger a new Copr build, add a comment containing new
Spec and SRPM URLs or [fedora-review-service-build] string.

Comment 7 Fedora Review Service 2023-12-10 01:40:10 UTC
Created attachment 2003527 [details]
The .spec file difference from Copr build 6674061 to 6737570

Comment 8 Fedora Review Service 2023-12-10 01:40:13 UTC
Copr build:
https://copr.fedorainfracloud.org/coprs/build/6737570
(succeeded)

Review template:
https://download.copr.fedorainfracloud.org/results/@fedora-review/fedora-review-2244406-python-rtfde/fedora-rawhide-x86_64/06737570-python-rtfde/fedora-review/review.txt

Please take a look if any issues were found.


---
This comment was created by the fedora-review-service
https://github.com/FrostyX/fedora-review-service

If you want to trigger a new Copr build, add a comment containing new
Spec and SRPM URLs or [fedora-review-service-build] string.

Comment 9 Sandro 2023-12-31 13:08:14 UTC
I'm closing this review requests in favor of maintaining extract-msg and its dependencies in Copr [1]. In the end extract-msg is just a niche package. I would have liked to make it available in Fedora proper, but my time is limited. What's in Copr now is usable. So, I decided to leave that in place and stop pursuing an import into Fedora repositories.

Thanks for your time!

[1] https://copr.fedorainfracloud.org/coprs/gui1ty/extract-msg/


Note You need to log in before you can comment on or make changes to this bug.