Bug 2252057 - python-webscrapbook fails to build in Fedora Rawhide: 3 tests fail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: python-webscrapbook
Version: 40
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: "FeRD" (Frank Dana)
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: F40FTBFS PYTHON3.13
Reported: 2023-11-29 09:22 UTC by Karolina Surma
Modified: 2024-09-23 14:27 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2024-09-23 14:27:45 UTC
Type: Bug
Embargoed:


Description Karolina Surma 2023-11-29 09:22:10 UTC
python-webscrapbook fails to build in Fedora Rawhide.

We discovered this during the ongoing Python 3.13 rebuild, but the failure is not limited to Python 3.13.
See Koschei: https://koschei.fedoraproject.org/package/python-webscrapbook?

FAIL: test_html_charset01 (tests.test_scrapbook_cache.TestFulltextCacheGenerator.test_html_charset01)
Detect charset from BOM. (UTF-16-LE)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/PyWebScrapBook-1.16.0/tests/test_scrapbook_cache.py", line 1761, in test_html_charset01
    self.assertEqual(book.fulltext, {
AssertionError: {'202[40 chars]t': 'ÿþ<! D O C T Y P E h t m l > <h t m l > <[121 chars]>'}}} != {'202[40 chars]t': 'English 中文'}}}
- {'20200101000000000': {'index.html': {'content': 'ÿþ<! D O C T Y P E h t m l > '
?                                                   ----------------- ^  ^^^^^^^^

+ {'20200101000000000': {'index.html': {'content': 'English 中文'}}}
?                                                    ^^^^^  ^^ +++

-                                                  '<h t m l > <h e a d > <m e t '
-                                                  'a c h a r s e t = " U T F - '
-                                                  '8 " > </ h e a d > <b o d y '
-                                                  '> E n g l i s h -N\x87e </ b '
-                                                  'o d y > </ h t m l >'}}}

======================================================================
FAIL: test_html_charset02 (tests.test_scrapbook_cache.TestFulltextCacheGenerator.test_html_charset02)
Detect charset from BOM. (UTF-16-BE)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/PyWebScrapBook-1.16.0/tests/test_scrapbook_cache.py", line 1790, in test_html_charset02
    self.assertEqual(book.fulltext, {
AssertionError: {'202[40 chars]t': 'þÿ <! D O C T Y P E h t m l > <h t m l > [122 chars]>'}}} != {'202[40 chars]t': 'English 中文'}}}
- {'20200101000000000': {'index.html': {'content': 'þÿ <! D O C T Y P E h t m l '
?                                                   ------------------ ^  ^^^^^^

+ {'20200101000000000': {'index.html': {'content': 'English 中文'}}}
?                                                    ^^^^^  ^^ +++

-                                                  '> <h t m l > <h e a d > <m e '
-                                                  't a c h a r s e t = " U T F '
-                                                  '- 8 " > </ h e a d > <b o d '
-                                                  'y > E n g l i s h N-e\x87 </ '
-                                                  'b o d y > </ h t m l >'}}}

======================================================================
FAIL: test_html_iframe_srcdoc01 (tests.test_scrapbook_cache.TestFulltextCacheGenerator.test_html_iframe_srcdoc01)
Include srcdoc content
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/PyWebScrapBook-1.16.0/tests/test_scrapbook_cache.py", line 2246, in test_html_iframe_srcdoc01
    self.assertEqual(book.fulltext, {
AssertionError: {'202[40 chars]t': 'XYZ987 ä¸\xadæ\x96\x87'}, 'linked.html[35 chars].'}}} != {'202[40 chars]t': 'XYZ987 中文'}, 'linked.html': {'content': '[19 chars].'}}}
- {'20200101000000000': {'index.html': {'content': 'XYZ987 ä¸\xadæ\x96\x87'},
?                                                   ---       ^^^^^^^^^^^^^^^

+ {'20200101000000000': {'index.html': {'content': 'XYZ987 中文'},
?                                                          ^^

                         'linked.html': {'content': 'Linked page content.'}}}

----------------------------------------------------------------------
Ran 1168 tests in 2.420s

FAILED (failures=3, skipped=26)
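
The unexpected strings in the three failures above look like what you get when encoded bytes are read one byte per character (Latin-1 style) instead of being decoded with the right charset. The following is a minimal sketch that reproduces only the BOM and the non-ASCII fragments seen in the log; `misdecode` is a hypothetical helper written for this illustration, not part of PyWebScrapBook, and the sketch does not identify the root cause of the failures.

```python
import codecs


def misdecode(text, encoding, bom=b""):
    """Encode `text` (optionally prefixed with a BOM), then decode the
    resulting bytes as Latin-1, i.e. one character per byte."""
    return (bom + text.encode(encoding)).decode("latin-1")


# test_html_charset01: UTF-16-LE BOM and payload read byte by byte
le = misdecode("中文", "utf-16-le", codecs.BOM_UTF16_LE)

# test_html_charset02: UTF-16-BE BOM and payload read byte by byte
be = misdecode("中文", "utf-16-be", codecs.BOM_UTF16_BE)

# test_html_iframe_srcdoc01: UTF-8 payload read byte by byte
u8 = misdecode("中文", "utf-8")

print(ascii(le))  # '\xff\xfe-N\x87e'  (\xff\xfe is 'ÿþ')
print(ascii(be))  # '\xfe\xffN-e\x87'  (\xfe\xff is 'þÿ')
print(ascii(u8))  # '\xe4\xb8\xad\xe6\x96\x87'  ('ä¸\xadæ\x96\x87')
```

These fragments match the 'ÿþ' ... '-N\x87e', 'þÿ' ... 'N-e\x87' and 'ä¸\xadæ\x96\x87' pieces in the assertion diffs, which suggests the expected charset is not being applied to the content in this build.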


For the build logs, see:
https://copr-be.cloud.fedoraproject.org/results/@python/python3.13/fedora-rawhide-x86_64/06692265-python-webscrapbook/

For all our attempts to build python-webscrapbook with Python 3.13, see:
https://copr.fedorainfracloud.org/coprs/g/python/python3.13/package/python-webscrapbook/

Testing and mass rebuild of packages is happening in copr.
You can follow these instructions to test locally in mock whether your package builds with Python 3.13:
https://copr.fedorainfracloud.org/coprs/g/python/python3.13/

Comment 1 Aoife Moloney 2024-02-15 23:06:12 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 40 development cycle.
Changing version to 40.

Comment 2 "FeRD" (Frank Dana) 2024-09-23 14:27:45 UTC
python-webscrapbook has since been updated to a git commit based on version 2.3.3 that includes unreleased fixes for the broken tests.

