Bug 2252057

Summary: python-webscrapbook fails to build in Fedora Rawhide: 3 tests fail
Product: [Fedora] Fedora Reporter: Karolina Surma <ksurma>
Component: python-webscrapbook Assignee: "FeRD" (Frank Dana) <ferdnyc>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 40 CC: ferdnyc, ksurma, mhroncok
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2024-09-23 14:27:45 UTC Type: Bug
Bug Blocks: 2231791, 2244836    

Description Karolina Surma 2023-11-29 09:22:10 UTC
python-webscrapbook fails to build in Fedora Rawhide.

We discovered this during the ongoing Python 3.13 rebuild, but the failure is not limited to Python 3.13.
See Koschei: https://koschei.fedoraproject.org/package/python-webscrapbook?

FAIL: test_html_charset01 (tests.test_scrapbook_cache.TestFulltextCacheGenerator.test_html_charset01)
Detect charset from BOM. (UTF-16-LE)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/PyWebScrapBook-1.16.0/tests/test_scrapbook_cache.py", line 1761, in test_html_charset01
    self.assertEqual(book.fulltext, {
AssertionError: {'202[40 chars]t': 'ÿþ<! D O C T Y P E h t m l > <h t m l > <[121 chars]>'}}} != {'202[40 chars]t': 'English 中文'}}}
- {'20200101000000000': {'index.html': {'content': 'ÿþ<! D O C T Y P E h t m l > '
?                                                   ----------------- ^  ^^^^^^^^

+ {'20200101000000000': {'index.html': {'content': 'English 中文'}}}
?                                                    ^^^^^  ^^ +++

-                                                  '<h t m l > <h e a d > <m e t '
-                                                  'a c h a r s e t = " U T F - '
-                                                  '8 " > </ h e a d > <b o d y '
-                                                  '> E n g l i s h -N\x87e </ b '
-                                                  'o d y > </ h t m l >'}}}

======================================================================
FAIL: test_html_charset02 (tests.test_scrapbook_cache.TestFulltextCacheGenerator.test_html_charset02)
Detect charset from BOM. (UTF-16-BE)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/PyWebScrapBook-1.16.0/tests/test_scrapbook_cache.py", line 1790, in test_html_charset02
    self.assertEqual(book.fulltext, {
AssertionError: {'202[40 chars]t': 'þÿ <! D O C T Y P E h t m l > <h t m l > [122 chars]>'}}} != {'202[40 chars]t': 'English 中文'}}}
- {'20200101000000000': {'index.html': {'content': 'þÿ <! D O C T Y P E h t m l '
?                                                   ------------------ ^  ^^^^^^

+ {'20200101000000000': {'index.html': {'content': 'English 中文'}}}
?                                                    ^^^^^  ^^ +++

-                                                  '> <h t m l > <h e a d > <m e '
-                                                  't a c h a r s e t = " U T F '
-                                                  '- 8 " > </ h e a d > <b o d '
-                                                  'y > E n g l i s h N-e\x87 </ '
-                                                  'b o d y > </ h t m l >'}}}

======================================================================
FAIL: test_html_iframe_srcdoc01 (tests.test_scrapbook_cache.TestFulltextCacheGenerator.test_html_iframe_srcdoc01)
Include srcdoc content
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/PyWebScrapBook-1.16.0/tests/test_scrapbook_cache.py", line 2246, in test_html_iframe_srcdoc01
    self.assertEqual(book.fulltext, {
AssertionError: {'202[40 chars]t': 'XYZ987 ä¸\xadæ\x96\x87'}, 'linked.html[35 chars].'}}} != {'202[40 chars]t': 'XYZ987 中文'}, 'linked.html': {'content': '[19 chars].'}}}
- {'20200101000000000': {'index.html': {'content': 'XYZ987 ä¸\xadæ\x96\x87'},
?                                                   ---       ^^^^^^^^^^^^^^^

+ {'20200101000000000': {'index.html': {'content': 'XYZ987 中文'},
?                                                          ^^

                         'linked.html': {'content': 'Linked page content.'}}}

----------------------------------------------------------------------
Ran 1168 tests in 2.420s

FAILED (failures=3, skipped=26)
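
The mismatched strings in the three failures above look like bytes that were decoded as Latin-1 instead of the charset the tests expect (UTF-16 with a BOM in the first two, UTF-8 in the srcdoc case). The sketch below only reproduces that byte pattern with the standard library; it is not taken from PyWebScrapBook, it does not establish the root cause, and `misread_as_latin1` and `decode_with_bom` are hypothetical helper names introduced for illustration.

```python
import codecs

EXPECTED = "English 中文"  # text the charset tests expect in the fulltext cache


def misread_as_latin1(data: bytes) -> str:
    """Decode bytes as Latin-1, which maps every byte to exactly one character."""
    return data.decode("latin-1")


def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Hypothetical helper: choose the codec from a leading BOM, else use fallback.

    Only the UTF-8 and UTF-16 BOMs are handled; UTF-32 is ignored in this sketch.
    """
    for bom, codec in (
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ):
        if data.startswith(bom):
            return data[len(bom):].decode(codec)
    return data.decode(fallback)


# Byte strings comparable to what the three failing tests feed in (assumed).
utf16_le = codecs.BOM_UTF16_LE + EXPECTED.encode("utf-16-le")
utf16_be = codecs.BOM_UTF16_BE + EXPECTED.encode("utf-16-be")
utf8 = "中文".encode("utf-8")

# Misreading shows the same markers as the log: a leading 'ÿþ' or 'þÿ',
# and '-N\x87e', 'N-e\x87', 'ä¸\xadæ\x96\x87' in place of '中文'.
for raw in (utf16_le, utf16_be, utf8):
    print(repr(misread_as_latin1(raw)))

# BOM-aware decoding recovers the expected text.
for raw in (utf16_le, utf16_be, utf8):
    print(decode_with_bom(raw))
```

In the log the UTF-16 output shows spaces where this sketch produces NUL characters, presumably because the fulltext extraction normalizes them; that step is not modeled here.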


For the build logs, see:
https://copr-be.cloud.fedoraproject.org/results/@python/python3.13/fedora-rawhide-x86_64/06692265-python-webscrapbook/

For all our attempts to build python-webscrapbook with Python 3.13, see:
https://copr.fedorainfracloud.org/coprs/g/python/python3.13/package/python-webscrapbook/

Testing and the mass rebuild of packages are happening in Copr.
You can follow these instructions to test locally in mock whether your package builds with Python 3.13:
https://copr.fedorainfracloud.org/coprs/g/python/python3.13/

Comment 1 Aoife Moloney 2024-02-15 23:06:12 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 40 development cycle.
Changing version to 40.

Comment 2 "FeRD" (Frank Dana) 2024-09-23 14:27:45 UTC
python-webscrapbook has since been updated to a git commit based on version 2.3.3, with unreleased fixes in place for the broken tests.