Description of problem:
atopgpu fails to start because pynvml is not installed. python3-py3nvml is installed as a dependency, though.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
# yum install atop
# systemctl start atopgpu.service
Job for atopgpu.service failed because the control process exited with error code.
See "systemctl status atopgpu.service" and "journalctl -xe" for details.
# systemctl status atopgpu.service
● atopgpu.service - Atop GPU stats daemon
Loaded: loaded (/usr/lib/systemd/system/atopgpu.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2021-10-18 11:05:20 CEST; 1s ago
Process: 191751 ExecStart=/usr/sbin/atopgpud (code=exited, status=1/FAILURE)
Main PID: 191751 (code=exited, status=1/FAILURE)
Oct 18 11:05:20 maestro-3002 systemd: Starting Atop GPU stats daemon...
Oct 18 11:05:20 maestro-3002 atopgpud: atopgpud ERROR: Python module 'pynvml' not installed!
Oct 18 11:05:20 maestro-3002 systemd: atopgpu.service: Main process exited, code=exited, status=1/FAILURE
Oct 18 11:05:20 maestro-3002 systemd: atopgpu.service: Failed with result 'exit-code'.
Oct 18 11:05:20 maestro-3002 systemd: Failed to start Atop GPU stats daemon.
pynvml is not available:
# python3 -c "import pynvml"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'pynvml'
# python3 -c "import py3nvml"
My guess is that py3nvml is not a drop-in replacement for nvidia-ml-py. Atop upstream mentions nvidia-ml-py as a dependency, not py3nvml.
I tried naively modifying the import with:
import py3nvml.py3nvml as pynvml
But it is not that simple (from journalctl -u atopgpu):
systemd: Starting Atop GPU stats daemon...
atopgpud: atopgpud INFO: Number of GPUs: 4
systemd: Started Atop GPU stats daemon.
atopgpud: Traceback (most recent call last):
atopgpud: File "/usr/sbin/atopgpud", line 583, in <module>
atopgpud: File "/usr/sbin/atopgpud", line 288, in main
atopgpud: gpulist.append( GpuProp(i) )
atopgpud: File "/usr/sbin/atopgpud", line 74, in __init__
atopgpud: self.stats.devname = pynvml.nvmlDeviceGetName(gpuhandle).decode(
atopgpud: AttributeError: 'str' object has no attribute 'decode'
Using nvidia-ml-py-11.470.66 and installing it using "python3 setup.py install --prefix=/usr", everything works.
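For reference, the traceback above suggests the two bindings differ in what nvmlDeviceGetName returns: the failing call got a str from py3nvml, while atopgpud calls .decode() on the result as if it were bytes. A minimal sketch of a shim that tolerates both is below. The helper names load_nvml and as_text are hypothetical and are not part of atopgpud, and the utf-8 encoding is an assumption, since the original decode() arguments are cut off in the log.

```python
import importlib


def load_nvml():
    """Return the first importable NVML binding, or None if neither is present.

    Hypothetical helper: tries nvidia-ml-py (module 'pynvml') first, then py3nvml.
    """
    for name in ("pynvml", "py3nvml.py3nvml"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


def as_text(value):
    """Normalize a device name to str, whether the binding returned bytes or str."""
    if isinstance(value, bytes):
        # Assumed encoding; the encoding atopgpud actually uses is not shown in the log.
        return value.decode("utf-8", errors="replace")
    return str(value)
```

With something like this, the failing line could become as_text(pynvml.nvmlDeviceGetName(gpuhandle)). I have not checked whether other calls in atopgpud differ between the two bindings in the same way, so this alone may not be enough.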
This breaks Fedora too, which I didn't realize since I lack an nvidia card. I'll package nvidia-ml-py.
We have a problem. Both nvidia-ml-py and py3nvml are Free Software, but they both, when used with atopgpud, require the libnvidia-ml library, which is not. I assume you installed this from rpmforge?
It comes from the official nvidia-driver-devel package.
Ok. So that's not something we can ship in Fedora. What I should really do, then, is stop shipping atopgpud:
I don't love it, but I can't force the atop developers to write code that doesn't rely on proprietary software.
You're the one in control and I won't argue. But I do think that atopgpud should be shipped along with its .service file somewhere, even if it does not work when the required nvidia software is not installed.
> You're the one in control and I won't argue. But I do think that atopgpud should be shipped along with its .service file somewhere, even if it does not work when the required nvidia software is not installed.
I'll argue ;) From my point of view, since atopgpu handles the missing library in its code with a clean exit, it does harm the system to keep it.
So you think removing it is the best way to go?
Whoops. I meant it does *NOT* harm the system to keep it, sorry.
FEDORA-EPEL-2022-8d256a0ff8 has been submitted as an update to Fedora EPEL 8. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2022-8d256a0ff8
FEDORA-EPEL-2022-8d256a0ff8 has been pushed to the Fedora EPEL 8 testing repository.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2022-8d256a0ff8
See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-EPEL-2022-8d256a0ff8 has been pushed to the Fedora EPEL 8 stable repository.
If the problem still persists, please make note of it in this bug report.