Bug 2357508 - ocrmypdf fails to build with Python 3.14: multiprocessing.Process now starts with forkserver method instead of fork, causing pickling error
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: ocrmypdf
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Elliott Sales de Andrade
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: PYTHON3.14
Reported: 2025-04-04 15:34 UTC by Karolina Surma
Modified: 2025-04-04 15:34 UTC
CC List: 4 users

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:



Description Karolina Surma 2025-04-04 15:34:24 UTC
ocrmypdf fails to build with Python 3.14.0a6.

_________________________________ test_semfree _________________________________
[gw2] linux -- Python 3.14.0 /usr/bin/python3

resources = PosixPath('/builddir/build/BUILD/ocrmypdf-16.7.0-build/ocrmypdf-16.7.0/tests/resources')
outpdf = PosixPath('/tmp/pytest-of-mockbuild/pytest-0/popen-gw2/test_semfree0/out.pdf')

    @pytest.mark.skipif(not is_linux(), reason='semfree plugin only works on Linux')
    def test_semfree(resources, outpdf):
        exitcode = run_ocrmypdf_api(
            resources / 'multipage.pdf',
            outpdf,
            '--skip-text',
            '--skip-big',
            '2',
            '--plugin',
            'ocrmypdf.extra_plugins.semfree',
            '--plugin',
            'tests/plugins/tesseract_noop.py',
        )
>       assert exitcode in (ExitCode.ok, ExitCode.pdfa_conversion_failed)
E       assert <ExitCode.other_error: 15> in (<ExitCode.ok: 0>, <ExitCode.pdfa_conversion_failed: 10>)

tests/test_semfree.py:26: AssertionError
------------------------------ Captured log call -------------------------------
WARNING  ocrmypdf._pipeline:_pipeline.py:374 page too big, skipping OCR (81.0 MPixels > 2.0 MPixels --skip-big)
WARNING  ocrmypdf._pipeline:_pipeline.py:374 page too big, skipping OCR (2.0 MPixels > 2.0 MPixels --skip-big)
WARNING  ocrmypdf._metadata:_metadata.py:63 Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
ERROR    ocrmypdf._pipelines._common:_common.py:296 An exception occurred while executing the pipeline
Traceback (most recent call last):
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/_pipelines/_common.py", line 261, in cli_exception_handler
    return fn(options, plugin_manager)
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/_pipelines/ocr.py", line 181, in _run_pipeline
    optimize_messages = exec_concurrent(context, executor)
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/_pipelines/ocr.py", line 145, in exec_concurrent
    pdf, messages = postprocess(pdf, context, executor)
                    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/_pipelines/_common.py", line 460, in postprocess
    return optimize_pdf(pdf_out, context, executor)
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/_pipeline.py", line 984, in optimize_pdf
    output_pdf, messages = context.plugin_manager.hook.optimize_pdf(
                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        input_pdf=input_file,
        ^^^^^^^^^^^^^^^^^^^^^
    ...<3 lines>...
        linearize=should_linearize(input_file, context),
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3.14/site-packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.14/site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
           ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.14/site-packages/pluggy/_callers.py", line 139, in _multicall
    raise exception.with_traceback(exception.__traceback__)
  File "/usr/lib/python3.14/site-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/builtin_plugins/optimize.py", line 145, in optimize_pdf
    result_path = optimize(input_pdf, output_pdf, context, save_settings, executor)
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/optimize.py", line 705, in optimize
    deflate_jpegs(pdf, root, options, executor)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/optimize.py", line 575, in deflate_jpegs
    executor(
    ~~~~~~~~^
        use_threads=True,  # We're sharing the pdf directly, must use threads
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<9 lines>...
        task_finished=finish,
        ^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/_concurrent.py", line 78, in __call__
    self._execute(
    ~~~~~~~~~~~~~^
        use_threads=use_threads,
        ^^^^^^^^^^^^^^^^^^^^^^^^
    ...<5 lines>...
        task_finished=task_finished,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/builddir/build/BUILD/ocrmypdf-16.7.0-build/BUILDROOT/usr/lib/python3.14/site-packages/ocrmypdf/extra_plugins/semfree.py", line 157, in _execute
    process.start()
    ~~~~~~~~~~~~~^^
  File "/usr/lib64/python3.14/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ~~~~~~~~~~~^^^^^^
  File "/usr/lib64/python3.14/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/usr/lib64/python3.14/multiprocessing/context.py", line 300, in _Popen
    return Popen(process_obj)
  File "/usr/lib64/python3.14/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/usr/lib64/python3.14/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
    ~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/usr/lib64/python3.14/multiprocessing/popen_forkserver.py", line 47, in _launch
    reduction.dump(process_obj, buf)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.14/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
TypeError: cannot pickle 'pikepdf._core.Pdf' object
when serializing tuple item 0
when serializing list item 0
when serializing tuple item 4
when serializing dict item '_args'
when serializing multiprocessing.context.Process state
when serializing multiprocessing.context.Process object
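A minimal reproducer of the same failure mode, independent of ocrmypdf: under forkserver (and spawn), Process.start() pickles the whole Process object, including its args, before anything reaches the child. In this sketch threading.Lock stands in for pikepdf.Pdf (an assumption for illustration; both are unpicklable), print is used as the target only because it pickles by reference, so any error comes from the arguments, and start_or_explain is a hypothetical helper name.

```python
import multiprocessing
import pickle
import threading


def start_or_explain(method: str, payload: object) -> str:
    """Try to start a child process that receives `payload` as its argument.

    Hypothetical helper: returns "<method>: started" on success, or the
    pickling error that Process.start() raised while serializing the Process.
    """
    ctx = multiprocessing.get_context(method)
    # The target does not matter here; print pickles by reference, so a
    # failure can only come from the arguments.
    proc = ctx.Process(target=print, args=(payload,))
    try:
        proc.start()
    except (TypeError, pickle.PicklingError) as exc:
        return f"{method}: {exc}"
    proc.join()
    return f"{method}: started"


if __name__ == "__main__":
    # threading.Lock stands in for pikepdf.Pdf: neither object can be pickled.
    if "forkserver" in multiprocessing.get_all_start_methods():
        print(start_or_explain("forkserver", threading.Lock()))
```

Under forkserver the serialization happens in the parent before the server is contacted, which matches the traceback above ending in reduction.dump(process_obj, buf).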

https://docs.python.org/3.14/whatsnew/3.14.html

The default start method changed from fork to forkserver on platforms other than macOS and Windows where it was already spawn.

If the threading incompatible fork method is required, you must explicitly request it via a context from multiprocessing.get_context() (preferred) or change the default via multiprocessing.set_start_method().

See forkserver restrictions for information and differences with the fork method and how this change may affect existing code with mutable global shared variables and/or shared objects that can not be automatically pickled.

For the build logs, see:
https://copr-be.cloud.fedoraproject.org/results/@python/python3.14/fedora-rawhide-x86_64/08848362-ocrmypdf/

For all our attempts to build ocrmypdf with Python 3.14, see:
https://copr.fedorainfracloud.org/coprs/g/python/python3.14/package/ocrmypdf/

Testing and mass rebuild of packages is happening in copr.
You can follow these instructions to test locally in mock if your package builds with Python 3.14:
https://copr.fedorainfracloud.org/coprs/g/python/python3.14/

Let us know here if you have any questions.

Python 3.14 is planned to be included in Fedora 43.
To make that update smoother, we're building Fedora packages with all pre-releases of Python 3.14.
A build failure prevents us from testing all dependent packages (transitive [Build]Requires),
so if this package is required a lot, it's important for us to get it fixed soon.

We'd appreciate help from the people who know this package best,
but if you don't want to work on this now, let us know so we can try to work around it on our side.

