Description of problem:
I have a fresh, fully updated install of RHEL 8.2. I then installed the latest CUDA version from the official site. I have a 1660 graphics card.
<https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=RHEL&target_version=8&target_type=rpmnetwork>

When I restart after installing CUDA, GDM crashes with the “Oh no! Something has gone wrong.” screen. If I boot into multi-user mode, everything works as expected. I can even run “startx” and get to the desktop just fine. I uncommented the line “WaylandEnable=false” in /etc/gdm/custom.conf, with no change.

I have also posted on the NVIDIA forums, but was not able to resolve the problem.
<https://forums.developer.nvidia.com/t/centos-8-2-gdm-crashes-after-cuda-11-install/129136/9>

I have the same problem on CentOS 8 and filed a bug report there:
<https://bugs.centos.org/view.php?id=17538>

The NVIDIA driver works without a problem on CentOS 7. Please let me know if there is any other information I can provide to help resolve this problem.

How reproducible:
Every time.

Steps to Reproduce:
1. Install using the latest RHEL 8.2 ISO.
2. Update all packages using dnf.
3. Install CUDA using the latest instructions at <https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=CentOS&target_version=8&target_type=rpmnetwork>
4. Restart. GDM crashes and leaves the user at the “Oh no! Something has gone wrong.” screen.
5. Booting into multi-user mode works as expected. startx gets to the desktop fine, but if you try to start GDM you get the same “Oh no! Something has gone wrong.” screen.

Actual results:
GDM crashes with the “Oh no! Something has gone wrong.” screen.

Expected results:
The login screen comes up as normal.

Additional info:
Can you put Enable=true in the [debug] section of /etc/gdm/custom.conf, reboot, reproduce and then attach the full output of journalctl ?
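For anyone following along, here is a minimal sketch of the debug setting requested above. It assumes /etc/gdm/custom.conf still contains the commented-out "#Enable=true" line under [debug]; the helper name enable_gdm_debug is made up for illustration, and the sketch only touches a scratch copy.

```shell
# Hypothetical helper: uncomment the "#Enable=true" line in a GDM custom.conf.
# Assumption: the file still has that commented-out line under [debug].
enable_gdm_debug() {
    sed -i 's/^#[[:space:]]*Enable=true/Enable=true/' "$1"
}

# Try it on a scratch copy first, not on the real file.
scratch=$(mktemp)
printf '[daemon]\n#WaylandEnable=false\n\n[debug]\n#Enable=true\n' > "$scratch"
enable_gdm_debug "$scratch"
cat "$scratch"
```

To apply it for real, run the same sed expression as root against /etc/gdm/custom.conf, reboot, reproduce the crash, and save the log with something like `journalctl -b > journal.txt`.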
Created attachment 1698949 [details] journalctl output after enabling debug for GDM
I have attached the output. From a cursory look, it seems that it may be an SELinux problem.
Indeed. As a test, if you boot into the multi-user target, run setenforce 0, and then run systemctl isolate graphical.target, do things start working? I'm going to move this to the selinux-policy component, but they may refer you to the vendor. In the interim, it looks like the journal output provides a way to produce an SELinux module locally that works around the issue.
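The test suggested above can be sketched as follows. The helper name try_gdm_permissive and the RUN dry-run switch are illustrative assumptions and not part of any package. By default the sketch only prints the commands.

```shell
# Hypothetical helper. It prints the commands by default; call it as
# `RUN= try_gdm_permissive` as root, from a text console, to really run them.
try_gdm_permissive() {
    run=${RUN-echo}
    $run getenforce &&                        # show the current mode, normally "Enforcing"
    $run setenforce 0 &&                      # permissive until reboot or `setenforce 1`
    $run systemctl isolate graphical.target   # start the graphical target, including GDM
}

try_gdm_permissive
```

If the login screen appears in permissive mode, that points at an SELinux denial. `setenforce 1` switches back to enforcing afterwards.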
setenforce 0, then systemctl isolate graphical.target allows the login screen to come up normally. I think that confirms it is a bug related to selinux.
The attached journal file contains several messages like this:

SELinux is preventing gnome-session-c from map access on the file /memfd:/.nvidia_drv.XXXXXX (deleted)

which means that SELinux denials happened, but detailed information about these denials is missing. Please run the following command (on the tested machine), which collects SELinux denials with all the details:

# ausearch -m avc -m user_avc -m selinux_err -m user_selinux_err -i -ts today

Thank you.
Created attachment 1698990 [details] Output of # ausearch -m avc -m user_avc -m selinux_err -m user_selinux_err -i -ts today
Please let me know if there is anything I can do for you.
I've also reported this on the ELRepo issue tracker: https://elrepo.org/bugs/view.php?id=1022

Is this something that we can/should expect NVIDIA to resolve, or will it fall down to the users/packagers to do this? My SELinux knowledge is not that good.

========================================
***** Plugin restorecon (92.2 confidence) suggests ************************

If you want to fix the label.
/memfd:/.nvidia_drv.XXXXXX (deleted) default label should be default_t.
Then you can run restorecon. The access attempt may have been stopped due to insufficient permissions to access a parent directory in which case try to change the following command accordingly.
Do
# /sbin/restorecon -v /memfd:/.nvidia_drv.XXXXXX (deleted)

***** Plugin catchall_boolean (7.83 confidence) suggests ******************

If you want to allow domain to can mmap files
Then you must tell SELinux about this by enabling the 'domain_can_mmap_files' boolean.
Do
setsebool -P domain_can_mmap_files 1

***** Plugin catchall (1.41 confidence) suggests **************************

If you believe that gnome-session-c should be allowed map access on the .nvidia_drv.XXXXXX (deleted) file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'gnome-session-c' --raw | audit2allow -M my-gnomesessionc
# semodule -X 300 -i my-gnomesessionc.pp
========================================
Setting the SELinux boolean domain_can_mmap_files works around the issue:

setsebool -P domain_can_mmap_files 1

Michael Rochefort is also testing the following local SELinux policy module generated by audit2allow:

# cat nvidialocal.te
module nvidialocal 1.0;

require {
        type xserver_tmpfs_t;
        type xdm_t;
        class file map;
}

#============= xdm_t ==============
allow xdm_t xserver_tmpfs_t:file map;
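For reference, a hand-written .te file like nvidialocal.te is usually compiled and loaded roughly as sketched below. This assumes nvidialocal.te is in the current directory and that checkmodule, semodule_package and semodule are installed. The helper name build_local_module and the RUN dry-run switch are illustrative only, and by default the sketch just prints the commands.

```shell
# Hypothetical helper. It prints the commands by default; call it as
# `RUN= build_local_module nvidialocal` as root to really run them.
build_local_module() {
    name=$1
    run=${RUN-echo}
    $run checkmodule -M -m -o "$name.mod" "$name.te" &&    # compile the .te source
    $run semodule_package -o "$name.pp" -m "$name.mod" &&  # wrap it into a policy package
    $run semodule -i "$name.pp"                            # load the package into the policy
}

build_local_module nvidialocal
```

Compared with the boolean, this module only allows xdm_t to map xserver_tmpfs_t files, so it is the narrower workaround. Once a fixed selinux-policy package is installed, the local module can be removed again with `semodule -r nvidialocal`.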
I think this is the issue I referred to here: https://github.com/fedora-selinux/selinux-policy/pull/312 This policy change probably needs to be merged to RHEL.
Is there any kind of ETA for when this might get fixed in RHEL? It looks like Aaron Plattner has a solution that was merged for Fedora.
This bug has not been acknowledged by the selinux team to be resolved during the RHEL 8.3 development and testing phase, so it will be evaluated for inclusion into the next minor product update. Please also note this bug tracking system is not a mechanism for requesting support, and we are not able to guarantee the timeliness or suitability of a resolution. If this issue is critical or in any way time sensitive, please raise a ticket through the regular Red Hat support channels to ensure it receives the proper attention and prioritization to assure a timely resolution.
Thank you - I have raised a support case for this issue. Given that the fix has been in Fedora for the last 6 months, it would be nice if this could be resolved quickly for anyone using NVIDIA hardware.
*** Bug 1859804 has been marked as a duplicate of this bug. ***
*** Bug 1868605 has been marked as a duplicate of this bug. ***
I just verified this on a RHEL 8.3 pre-test version (4.18.0-221.el8.x86_64) and the issue is still present. Zdenek, could you please check which RHEL 8.3 build will include this change? Regards, Jiqi.
This bug was addressed in the selinux-policy-3.14.3-51 package.
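A quick way to check whether a machine already has that package is sketched below. The helper name version_at_least is made up for illustration. GNU `sort -V` only approximates RPM's own version comparison, so treat the result as a hint.

```shell
# Hypothetical helper: true when version $1 is at least $2 according to GNU `sort -V`.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

fixed="3.14.3-51"   # version named in the comment above
if installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' selinux-policy 2>/dev/null); then
    if version_at_least "$installed" "$fixed"; then
        echo "selinux-policy $installed is at or above $fixed"
    else
        echo "selinux-policy $installed is older than $fixed"
    fi
else
    echo "selinux-policy is not installed, or rpm is unavailable"
fi
```

A fix backported into a batch update of an older minor release may carry a lower release number, so an "older than" result there does not prove that the fix is missing.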
Could you please confirm which RHEL release will include the selinux-policy-3.14.3-51 RPM package? Thanks.
(In reply to Jiqi Li from comment #32)
> Could you please confirm which release version RHEL will include
> selinux-policy-3.14.3-51 rpm package, Thanks.

The fix for this bugzilla is expected to be delivered as a part of RHEL 8.3 GA. It was also scheduled for the next RHEL 8.2 batch update.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (selinux-policy bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4528