Bug 2015027 - atopgpu fails to start because pynvml is not installed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora EPEL
Classification: Fedora
Component: atop
Version: epel8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Gwyn Ciesla
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-10-18 09:18 UTC by jbdenis
Modified: 2022-01-20 12:08 UTC (History)

Fixed In Version: atop-2.7.1-1.el8
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-20 12:08:00 UTC
Type: Bug

Description jbdenis 2021-10-18 09:18:28 UTC
Description of problem:

atopgpu fails to start because pynvml is not installed. python3-py3nvml is installed as a dependency, though.


Version-Release number of selected component (if applicable):

atop-2.6.0-6.el8.x86_64

How reproducible:

Always

Steps to Reproduce:

# yum install atop

# systemctl start atopgpu.service
Job for atopgpu.service failed because the control process exited with error code.
See "systemctl status atopgpu.service" and "journalctl -xe" for details.

# systemctl status atopgpu.service
● atopgpu.service - Atop GPU stats daemon
   Loaded: loaded (/usr/lib/systemd/system/atopgpu.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2021-10-18 11:05:20 CEST; 1s ago
     Docs: man:atopgpud(8)
  Process: 191751 ExecStart=/usr/sbin/atopgpud (code=exited, status=1/FAILURE)
 Main PID: 191751 (code=exited, status=1/FAILURE)

Oct 18 11:05:20 maestro-3002 systemd[1]: Starting Atop GPU stats daemon...
Oct 18 11:05:20 maestro-3002 atopgpud[191751]: atopgpud ERROR: Python module 'pynvml' not installed!
Oct 18 11:05:20 maestro-3002 systemd[1]: atopgpu.service: Main process exited, code=exited, status=1/FAILURE
Oct 18 11:05:20 maestro-3002 systemd[1]: atopgpu.service: Failed with result 'exit-code'.
Oct 18 11:05:20 maestro-3002 systemd[1]: Failed to start Atop GPU stats daemon.

pynvml is not available:

# python3 -c "import pynvml"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'pynvml'

But py3nvml (from python3-py3nvml) is:
# python3 -c "import py3nvml"
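
Note: the two import checks above can also be run as a single probe. The following is a minimal sketch that uses only the Python standard library; the function name `probe_modules` is hypothetical and is not part of atop.

```python
import importlib.util


def probe_modules(names):
    """Map each top-level module name to True if Python can locate it for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Check the two NVML bindings discussed in this report.
    for name, found in probe_modules(["pynvml", "py3nvml"]).items():
        print(f"{name}: {'available' if found else 'missing'}")
```

Given the results shown above, this would be expected to report pynvml as missing and py3nvml as available on the affected system.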

My guess is that py3nvml is not a drop-in replacement for nvidia-ml-py. Atop upstream lists nvidia-ml-py as a dependency, not py3nvml.

I tried naively changing the import:

#import pynvml
import py3nvml.py3nvml as pynvml

But it is not that simple (from journalctl -u atopgpu):

systemd[1]: Starting Atop GPU stats daemon...
atopgpud[192025]: atopgpud INFO: Number of GPUs: 4
systemd[1]: Started Atop GPU stats daemon.
atopgpud[192024]: Traceback (most recent call last):
atopgpud[192024]:   File "/usr/sbin/atopgpud", line 583, in <module>
atopgpud[192024]:     main()
atopgpud[192024]:   File "/usr/sbin/atopgpud", line 288, in main
atopgpud[192024]:     gpulist.append( GpuProp(i) ) 
atopgpud[192024]:   File "/usr/sbin/atopgpud", line 74, in __init__
atopgpud[192024]:     self.stats.devname     = pynvml.nvmlDeviceGetName(gpuhandle).decode(
atopgpud[192024]: AttributeError: 'str' object has no attribute 'decode'

Using nvidia-ml-py-11.470.66 and installing it using "python3 setup.py install --prefix=/usr", everything works.
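
Note: the traceback above shows `nvmlDeviceGetName` returning a `str` under py3nvml, while atopgpud calls `.decode()` on it as if it were `bytes`. A minimal sketch of a helper that tolerates both return types follows. It is an illustration only and not the upstream fix; `as_text` is a hypothetical name, and the default encoding is an assumption because the `decode(` call in the traceback is truncated. Other calls in atopgpud may differ between the two bindings in ways this does not cover.

```python
def as_text(value, encoding="ascii", errors="replace"):
    """Return value as str, whether the NVML binding produced bytes or str.

    The encoding default is an assumption, not taken from atopgpud.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    return str(value)


# Hypothetical use at the failing line in atopgpud:
#     self.stats.devname = as_text(pynvml.nvmlDeviceGetName(gpuhandle))
```

With such a helper the device name would be handled the same way regardless of which binding is imported as `pynvml`.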

Comment 1 Gwyn Ciesla 2021-10-18 17:38:37 UTC
This breaks Fedora too, which I didn't realize since I lack an nvidia card. I'll package nvidia-ml-py.

Comment 2 Gwyn Ciesla 2021-10-18 19:59:19 UTC
We have a problem. Both nvidia-ml-py and py3nvml are Free Software, but they both, when used with atopgpud, require the libnvidia-ml library, which is not. I assume you installed this from rpmforge?

Comment 3 jbdenis 2021-10-19 17:36:08 UTC
It comes from the official nvidia-driver-devel package.

Comment 4 Gwyn Ciesla 2021-10-19 19:03:00 UTC
Ok. So that's not something we can ship in Fedora. What I should really do, then, is stop shipping atopgpud:

https://docs.fedoraproject.org/en-US/packaging-guidelines/what-can-be-packaged/#_packages_which_are_not_useful_without_external_code

I don't love it, but I can't force the atop developers to write code that doesn't rely on proprietary software.

Comment 5 jbdenis 2021-10-19 20:37:26 UTC
You're the one in control and I won't argue. But I do think that atopgpud should be shipped along with its .service file somewhere, even if it does not work without the required nvidia software installed.

Comment 6 jbdenis 2021-10-20 07:26:39 UTC
> You're the one in control and I won't argue. But I do think that atopgpud should be shipped along with its .service file somewhere, even if it does not work without the required nvidia software installed.

I'll argue ;) From my point of view, since atopgpu handles the missing library in its code with a clean exit, it does harm the system to keep it.

Comment 7 Gwyn Ciesla 2021-10-20 13:26:47 UTC
So you think removing it is the best way to go?

Comment 8 jbdenis 2021-10-20 13:30:59 UTC
Whoops. I meant it does *NOT* harm the system to keep it, sorry.

Comment 9 Fedora Update System 2022-01-11 16:44:56 UTC
FEDORA-EPEL-2022-8d256a0ff8 has been submitted as an update to Fedora EPEL 8. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2022-8d256a0ff8

Comment 10 Fedora Update System 2022-01-12 02:10:13 UTC
FEDORA-EPEL-2022-8d256a0ff8 has been pushed to the Fedora EPEL 8 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2022-8d256a0ff8

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 11 Fedora Update System 2022-01-20 12:08:00 UTC
FEDORA-EPEL-2022-8d256a0ff8 has been pushed to the Fedora EPEL 8 stable repository.
If the problem still persists, please make a note of it in this bug report.

