Bug 2253018 - python-torch: fails to import in Rawhide/F40
Summary: python-torch: fails to import in Rawhide/F40
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: python-torch
Version: rawhide
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Tom Rix
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-12-05 15:54 UTC by Ben Beasley
Modified: 2023-12-21 13:55 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2023-12-21 13:55:26 UTC
Type: ---
Embargoed:



Description Ben Beasley 2023-12-05 15:54:42 UTC
Trying to import torch fails.

Reproducible: Always

Steps to Reproduce:
1. mock -r fedora-rawhide-x86_64 --clean
2. mock -r fedora-rawhide-x86_64 -i python3-torch
3. mock -r fedora-rawhide-x86_64 --chroot 'python3 -c "import torch"'
Actual Results:  
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib64/python3.12/site-packages/torch/__init__.py", line 234, in <module>
    _load_global_deps()
  File "/usr/lib64/python3.12/site-packages/torch/__init__.py", line 193, in _load_global_deps
    raise err
  File "/usr/lib64/python3.12/site-packages/torch/__init__.py", line 174, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib64/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /usr/lib64/python3.12/site-packages/torch/lib/libtorch_global_deps.so: cannot open shared object file: No such file or directory

Expected Results:  
Successful import; no output
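
The failing step in the traceback can be isolated from PyTorch itself. The following is a minimal sketch, assuming only that the file named in the error is absent from disk; `try_load` is a hypothetical helper, not part of PyTorch, and the temporary directory stands in for torch/lib/.

```python
import ctypes
import os
import tempfile


def try_load(lib_path):
    """Hypothetical helper: return None on success, else the OSError text."""
    try:
        # Same call that appears in the traceback above.
        ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
    except OSError as err:
        return str(err)
    return None


# An empty temporary directory stands in for site-packages/torch/lib/.
with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "libtorch_global_deps.so")
    message = try_load(missing)

print(message)
```

The dynamic loader reports the exact path it was asked to open, so the message alone shows which file name the Python code expected to find.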

Comment 1 Tom Rix 2023-12-05 16:27:11 UTC
BuildRequires: pytorch-devel
should fix this.

Comment 2 Ben Beasley 2023-12-05 16:58:55 UTC
Thanks for your response.

It looks like the problem is that the build was patched to version these .so files. I don’t think that should be necessary, because the libraries in question are not in the linker search path. See the paragraph beginning “As an additional complication, […]” in https://docs.fedoraproject.org/en-US/packaging-guidelines/#_devel_packages. Unversioned .so files in /usr/lib64/python3.12/site-packages/torch/lib/ should not be any more problematic than compiled Python extension files like /usr/lib64/python3.12/site-packages/torch/_C.cpython-312-x86_64-linux-gnu.so. These are part of the Python package and are not system shared libraries.

If we must version these shared libraries for some reason, then I think the ctypes code in PyTorch should be patched so that it can find the versioned .so files.
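
A rough sketch of what such a patch might look like follows. It is not the actual PyTorch code and not the change that was eventually made in the package; `find_lib` and `load_global_deps` are hypothetical names, and the assumption is that a versioned copy is named like the unversioned one plus a dotted suffix (for example libtorch_global_deps.so.2.1).

```python
import ctypes
import glob
import os


def find_lib(lib_dir, name):
    """Hypothetical helper: locate `name`, or a versioned variant of it.

    Returns the unversioned path if it exists, otherwise the
    lexicographically last match of `name.*`, otherwise None.
    """
    exact = os.path.join(lib_dir, name)
    if os.path.exists(exact):
        return exact
    # Assumption: versioned files look like "<name>.<version>".
    candidates = sorted(glob.glob(glob.escape(exact) + ".*"))
    return candidates[-1] if candidates else None


def load_global_deps(lib_dir, name="libtorch_global_deps.so"):
    """Load the library with RTLD_GLOBAL, accepting a versioned file name."""
    lib_path = find_lib(lib_dir, name)
    if lib_path is None:
        raise OSError(f"{name}: no unversioned or versioned copy in {lib_dir}")
    return ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
```

Picking the lexicographically last match is a simplification; a real patch would more likely hard-code the expected soname so that the choice is deterministic.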

It doesn’t make sense for “import torch” to fail when the devel package is not installed, and it doesn’t make sense to require all dependent packages to add both BuildRequires: python3-torch-devel and Requires: python3-torch-devel. The python3-torch package should handle its own dependencies one way or another.

Comment 3 Tom Rix 2023-12-12 14:33:43 UTC
An earlier version did not have a devel package, but because C headers are delivered we have a devel package.
So I am conflicted on how best to solve this.
Would having just the main package be acceptable?

Comment 4 Tom Rix 2023-12-13 16:04:48 UTC
Can you give this a try?
https://koji.fedoraproject.org/koji/taskinfo?taskID=110278523

Comment 5 Ben Beasley 2023-12-13 19:17:00 UTC
I didn’t investigate *what* you changed, but I tested the RPM:

$ mock -r fedora-rawhide-x86_64 --clean && mock -r fedora-rawhide-x86_64 -i ./python3-torch-2.1.0-11.fc40.x86_64.rpm && mock -r fedora-rawhide-x86_64 --chroot "python3 -c 'import torch'"

Result:

Start: chroot ["python3 -c 'import torch'"]
/usr/lib64/python3.12/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /builddir/build/BUILD/pytorch-v2.1.0/torch/csrc/utils/tensor_numpy.cpp:84.)
  device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),

That looks OK.

Then I tried it with https://src.fedoraproject.org/rpms/python-ratinabox, which has optional PyTorch support. There are no tests for the PyTorch support, but I am able to do the %pyproject_check_import smoke test without excluding the modules that import PyTorch.

So I can at least say that the Koji build you linked does fix the import failure reported here.
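
For anyone wanting a similar check outside an RPM build, the idea of an import smoke test can be sketched as below. This is not how %pyproject_check_import is implemented; `check_imports` is a hypothetical helper, and the module names in the demo call are placeholders chosen so that the example runs without PyTorch installed.

```python
import importlib


def check_imports(module_names):
    """Hypothetical helper: try each import, return {name: error text}."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as err:  # ImportError, OSError from dlopen, etc.
            failures[name] = f"{type(err).__name__}: {err}"
    return failures


# Placeholder names: "json" is in the standard library, the other is absent.
failures = check_imports(["json", "no_such_module_for_demo"])
for name, reason in failures.items():
    print(f"{name}: {reason}")
```

Catching Exception rather than only ImportError matters here, because the failure reported in this bug surfaced as an OSError raised during import.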

Comment 6 Tom Rix 2023-12-17 14:29:59 UTC
Here is the official build:
https://koji.fedoraproject.org/koji/buildinfo?buildID=2333559

Comment 7 Ben Beasley 2023-12-17 16:51:38 UTC
Thank you!

