Bug 2458687 - ocrmypdf fails to build with Python 3.15: test_xml_metadata_preserved: AssertionError: acquired unexpected property dc:title with value 'Untitled'
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: ocrmypdf
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Elliott Sales de Andrade
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 2485727
Depends On:
Blocks: PYTHON3.15 F45FTBFS, RAWHIDEFTBFS F45FailsToInstall, RAWHIDEFailsToInstall
Reported: 2026-04-15 14:18 UTC by Karolina Surma
Modified: 2026-06-06 18:28 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:



Description Karolina Surma 2026-04-15 14:18:03 UTC
ocrmypdf fails to build with Python 3.15.0a8.

_________________ test_xml_metadata_preserved[3small.pdf-pdfa] _________________
[gw0] linux -- Python 3.15.0 /usr/bin/python3

libxmp_file_to_dict = <function file_to_dict at 0x7fc5f8c1d430>
test_file = '3small.pdf', output_type = 'pdfa'
resources = PosixPath('/builddir/build/BUILD/ocrmypdf-16.12.0-build/ocrmypdf-16.12.0/tests/resources')
outpdf = PosixPath('/tmp/pytest-of-mockbuild/pytest-0/popen-gw0/test_xml_metadata_preserved_3s0/out.pdf')

    @pytest.mark.parametrize(
        'test_file,output_type',
        [
            ('graph.pdf', 'pdf'),  # PDF with full metadata
            ('graph.pdf', 'pdfa'),  # PDF/A with full metadata
            ('overlay.pdf', 'pdfa'),  # /Title()
            ('3small.pdf', 'pdfa'),
        ],
    )
    def test_xml_metadata_preserved(
        libxmp_file_to_dict, test_file, output_type, resources, outpdf
    ):
        input_file = resources / test_file
    
        before = libxmp_file_to_dict(str(input_file))
    
        check_ocrmypdf(
            input_file,
            outpdf,
            '--output-type',
            output_type,
            '--skip-text',
            '--plugin',
            'tests/plugins/tesseract_noop.py',
        )
    
        after = libxmp_file_to_dict(str(outpdf))
    
        equal_properties = [
            'dc:contributor',
            'dc:coverage',
            'dc:creator',
            'dc:description',
            'dc:format',
            'dc:identifier',
            'dc:language',
            'dc:publisher',
            'dc:relation',
            'dc:rights',
            'dc:source',
            'dc:subject',
            'dc:title',
            'dc:type',
            'pdf:keywords',
        ]
        acquired_properties = ['dc:format']
    
        # Cleanup messy data structure
        # Top level is key-value mapping of namespaces to keys under namespace,
        # so we put everything in the same namespace
        def unify_namespaces(xmpdict):
            for entries in xmpdict.values():
                yield from entries
    
        # Now we have a list of (key, value, {infodict}). We don't care about
        # infodict. Just flatten to keys and values
        def keyval_from_tuple(list_of_tuples):
            for k, v, *_ in list_of_tuples:
                yield k, v
    
        before = dict(keyval_from_tuple(unify_namespaces(before)))
        after = dict(keyval_from_tuple(unify_namespaces(after)))
    
        for prop in equal_properties:
            if prop in before:
                assert prop in after, f'{prop} dropped from xmp'
                assert before[prop] == after[prop]
    
            # libxmp presents multivalued entries (e.g. dc:title) as:
            # 'dc:title': '' <- there's a title
            # 'dc:title[1]: 'The Title' <- the actual title
            # 'dc:title[1]/?xml:lang': 'x-default' <- language info
            propidx = f'{prop}[1]'
            if propidx in before:
                assert (
                    after.get(propidx) == before[propidx]
                    or after.get(prop) == before[propidx]
                )
    
            if prop in after and prop not in before:
>               assert prop in acquired_properties, (
                    f"acquired unexpected property {prop} with value "
                    f"{after.get(propidx) or after.get(prop)}"
                )
E               AssertionError: acquired unexpected property dc:title with value 'Untitled'
E               assert 'dc:title' in ['dc:format']
https://docs.python.org/3.15/whatsnew/3.15.html
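
For anyone triaging, the failing check can be reduced to a few lines. The following is only a sketch of the test's logic with made-up input: the namespace key and the `before_raw`/`after_raw` contents are assumptions chosen to mirror the error message, not the actual XMP of 3small.pdf, and `flatten` and `unexpected_acquired` are hypothetical helper names that do not exist in the ocrmypdf test suite.

```python
# Hypothetical libxmp-style dicts: namespace -> list of (key, value, info) tuples.
# The contents are assumed for illustration, not read from 3small.pdf.
DC = 'http://purl.org/dc/elements/1.1/'

before_raw = {DC: [('dc:format', 'application/pdf', {})]}
after_raw = {
    DC: [
        ('dc:format', 'application/pdf', {}),
        ('dc:title', '', {}),             # marker entry: a title exists
        ('dc:title[1]', 'Untitled', {}),  # the actual title value
    ]
}


def flatten(xmpdict):
    """Merge all namespaces into one key -> value dict, dropping the info dict."""
    return {k: v for entries in xmpdict.values() for k, v, *_ in entries}


def unexpected_acquired(before, after, checked, allowed):
    """Return checked properties present only in the output and not allowed."""
    return [
        prop
        for prop in checked
        if prop in after and prop not in before and prop not in allowed
    ]


before = flatten(before_raw)
after = flatten(after_raw)
result = unexpected_acquired(
    before, after, checked=['dc:format', 'dc:title'], allowed=['dc:format']
)
print(result)
```

With this assumed input, `dc:title` is reported because it appears only in the output and is not in the allowed list, which is the same condition the assertion above trips on. Where the 'Untitled' value comes from under Python 3.15 is not established by this sketch.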

For the build logs, see:
https://copr-be.cloud.fedoraproject.org/results/@python/python3.15/fedora-rawhide-x86_64/10318578-ocrmypdf/

For all our attempts to build ocrmypdf with Python 3.15, see:
https://copr.fedorainfracloud.org/coprs/g/python/python3.15/package/ocrmypdf/

Testing and mass rebuild of packages is happening in copr.
You can follow these instructions to test locally in mock if your package builds with Python 3.15:
https://copr.fedorainfracloud.org/coprs/g/python/python3.15/

Let us know here if you have any questions.

Python 3.15 is planned to be included in Fedora 45.
To make that update smoother, we're building Fedora packages with all pre-releases of Python 3.15.
A build failure prevents us from testing all dependent packages (transitive [Build]Requires),
so if this package is required a lot, it's important for us to get it fixed soon.

We'd appreciate help from the people who know this package best,
but if you don't want to work on this now, let us know so we can try to work around it on our side.

Comment 1 Karolina Surma 2026-06-06 18:20:21 UTC
*** Bug 2485727 has been marked as a duplicate of this bug. ***

