Bug 2357508

Summary: ocrmypdf fails to build with Python 3.14: multiprocessing.Process now starts with forkserver method instead of fork, causing pickling error
Product: [Fedora] Fedora
Reporter: Karolina Surma <ksurma>
Component: ocrmypdf
Assignee: Elliott Sales de Andrade <quantum.analyst>
Status: NEW
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: rawhide
CC: ksurma, mhroncok, python-packagers-sig, quantum.analyst
Hardware: Unspecified   
OS: Unspecified   
Bug Blocks: 2322407    

Description Karolina Surma 2025-04-04 15:34:24 UTC
ocrmypdf fails to build with Python 3.14.0a6. The test_semfree test fails with a pickling error:

_________________________________ test_semfree _________________________________
[gw2] linux -- Python 3.14.0 /usr/bin/python3

resources = PosixPath('/builddir/build/BUILD/ocrmypdf-16.7.0-build/ocrmypdf-16.7.0/tests/resources')
outpdf = PosixPath('/tmp/pytest-of-mockbuild/pytest-0/popen-gw2/test_semfree0/out.pdf')

    @pytest.mark.skipif(not is_linux(), reason='semfree plugin only works on Linux')
    def test_semfree(resources, outpdf):
        exitcode = run_ocrmypdf_api(
            resources / 'multipage.pdf',
            outpdf,
            '--skip-text',
            '--skip-big',
            '2',
            '--plugin',
            'ocrmypdf.extra_plugins.semfree',
            '--plugin',
            'tests/plugins/tesseract_noop.py',
        )
>       assert exitcode in (ExitCode.ok, ExitCode.pdfa_conversion_failed)
E       assert <ExitCode.other_error: 15> in (<ExitCode.ok: 0>, <ExitCode.pdfa_conversion_failed: 10>)

tests/test_semfree.py:26: AssertionError
------------------------------ Captured log call -------------------------------
WARNING  ocrmypdf._pipeline:_pipeline.py:374 page too big, skipping OCR (81.0 MPixels > 2.0 MPixels --skip-big)
WARNING  ocrmypdf._pipeline:_pipeline.py:374 page too big, skipping OCR (2.0 MPixels > 2.0 MPixels --skip-big)
WARNING  ocrmypdf._metadata:_metadata.py:63 Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
ERROR    ocrmypdf._pipelines._common:_common.py:296 An exception occurred while executing the pipeline
Traceback (most recent call last):
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/_pipelines/_common.py", line 261, in cli_exception_handler
    return fn(options, plugin_manager)
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/_pipelines/ocr.py", line 181, in _run_pipeline
    optimize_messages = exec_concurrent(context, executor)
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/_pipelines/ocr.py", line 145, in exec_concurrent
    pdf, messages = postprocess(pdf, context, executor)
                    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/_pipelines/_common.py", line 460, in postprocess
    return optimize_pdf(pdf_out, context, executor)
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/_pipeline.py", line 984, in optimize_pdf
    output_pdf, messages = context.plugin_manager.hook.optimize_pdf(
                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        input_pdf=input_file,
        ^^^^^^^^^^^^^^^^^^^^^
    ...<3 lines>...
        linearize=should_linearize(input_file, context),
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3.14/site-packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.14/site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
           ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.14/site-packages/pluggy/_callers.py", line 139, in _multicall
    raise exception.with_traceback(exception.__traceback__)
  File "/usr/lib/python3.14/site-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/builtin_plugins/optimize.py", line 145, in optimize_pdf
    result_path = optimize(input_pdf, output_pdf, context, save_settings, executor)
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/optimize.py", line 705, in optimize
    deflate_jpegs(pdf, root, options, executor)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/optimize.py", line 575, in deflate_jpegs
    executor(
    ~~~~~~~~^
        use_threads=True,  # We're sharing the pdf directly, must use threads
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<9 lines>...
        task_finished=finish,
        ^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/_concurrent.py", line 78, in __call__
    self._execute(
    ~~~~~~~~~~~~~^
        use_threads=use_threads,
        ^^^^^^^^^^^^^^^^^^^^^^^^
    ...<5 lines>...
        task_finished=task_finished,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/extra_plugins/semfree.py", line 157, in _execute
    process.start()
    ~~~~~~~~~~~~~^^
  File "/usr/lib64/python3.14/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ~~~~~~~~~~~^^^^^^
  File "/usr/lib64/python3.14/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/usr/lib64/python3.14/multiprocessing/context.py", line 300, in _Popen
    return Popen(process_obj)
  File "/usr/lib64/python3.14/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/usr/lib64/python3.14/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
    ~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/usr/lib64/python3.14/multiprocessing/popen_forkserver.py", line 47, in _launch
    reduction.dump(process_obj, buf)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.14/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
TypeError: cannot pickle 'pikepdf._core.Pdf' object
when serializing tuple item 0
when serializing list item 0
when serializing tuple item 4
when serializing dict item '_args'
when serializing multiprocessing.context.Process state
when serializing multiprocessing.context.Process object
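
The last frames of the traceback show the forkserver start method pickling the Process object, including its arguments, before handing it to the child. The sketch below mimics that check on a plain argument tuple. It is an illustration only: `first_unpicklable` is a hypothetical helper, not part of ocrmypdf or multiprocessing, and `threading.Lock()` stands in for the `pikepdf._core.Pdf` object as an example of something that cannot be pickled.

```python
import pickle
import threading


def first_unpicklable(args):
    """Return (index, message) for the first argument that cannot be
    pickled, or None if every argument pickles cleanly.

    Hypothetical helper for illustration; forkserver and spawn perform
    a similar serialization of Process arguments when start() is called.
    """
    for index, arg in enumerate(args):
        try:
            pickle.dumps(arg)
        except (pickle.PicklingError, TypeError, AttributeError) as exc:
            return index, str(exc)
    return None


# A lock is used here as a stand-in for any unpicklable object.
problem = first_unpicklable((1, "a", threading.Lock()))
```

With the fork start method no such serialization happens, which is presumably why the same arguments worked before the default changed.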

From What's New in Python 3.14 (https://docs.python.org/3.14/whatsnew/3.14.html):

The default start method changed from fork to forkserver on platforms other than macOS and Windows where it was already spawn.

If the threading incompatible fork method is required, you must explicitly request it via a context from multiprocessing.get_context() (preferred) or change the default via multiprocessing.set_start_method().

See forkserver restrictions for information and differences with the fork method and how this change may affect existing code with mutable global shared variables and/or shared objects that can not be automatically pickled.
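
As a rough sketch of the preferred approach quoted above, a process can be created from an explicit fork context so that its arguments are inherited rather than pickled. The names `worker` and `run_with_fork` are hypothetical, and the lock again stands in for an unpicklable object; this is not the actual semfree plugin code, and it assumes a platform where the fork start method is available.

```python
import multiprocessing
import threading


def worker(lock, queue):
    # The lock reached the child through fork(); it was never pickled.
    with lock:
        queue.put("ok")


def run_with_fork():
    """Start a child via an explicit fork context, passing an unpicklable argument."""
    if "fork" not in multiprocessing.get_all_start_methods():
        raise RuntimeError("the fork start method is not available on this platform")
    ctx = multiprocessing.get_context("fork")
    lock = threading.Lock()      # unpicklable stand-in object
    queue = ctx.SimpleQueue()    # created from the same context as the process
    proc = ctx.Process(target=worker, args=(lock, queue))
    proc.start()
    result = queue.get()         # read the child's message before joining
    proc.join()
    return result, proc.exitcode
```

Using a context object keeps the choice local to the code that needs fork, whereas `multiprocessing.set_start_method()` changes the default for the whole program.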

For the build logs, see:
https://copr-be.cloud.fedoraproject.org/results/@python/python3.14/fedora-rawhide-x86_64/08848362-ocrmypdf/

For all our attempts to build ocrmypdf with Python 3.14, see:
https://copr.fedorainfracloud.org/coprs/g/python/python3.14/package/ocrmypdf/

Testing and the mass rebuild of packages are happening in copr.
To test locally in mock whether your package builds with Python 3.14, follow these instructions:
https://copr.fedorainfracloud.org/coprs/g/python/python3.14/

Let us know here if you have any questions.

Python 3.14 is planned to be included in Fedora 43.
To make that update smoother, we're building Fedora packages with all pre-releases of Python 3.14.
A build failure prevents us from testing all dependent packages (transitive [Build]Requires),
so if many packages depend on this one, it's important for us to get it fixed soon.

We'd appreciate help from the people who know this package best,
but if you don't want to work on this now, let us know so we can try to work around it on our side.