Bug 1851448 - GDM crashes after cuda install
Summary: GDM crashes after cuda install
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: selinux-policy
Version: 8.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: 8.3
Assignee: Zdenek Pytela
QA Contact: Milos Malik
URL:
Whiteboard:
Duplicates: 1859804 1868605
Depends On:
Blocks: 1712227 1712305 1725856 1796607 1866362
Reported: 2020-06-26 14:25 UTC by Trevor Clark
Modified: 2020-09-04 14:39 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1866362
Environment:
Last Closed:
Type: Bug
Target Upstream Version:


Attachments
journalctl output after enabling debug for GDM (6.02 MB, text/plain)
2020-06-26 18:35 UTC, Trevor Clark
Output of # ausearch -m avc -m user_avc -m selinux_err -m user_selinux_err -i -ts today (2.31 KB, text/plain)
2020-06-27 18:06 UTC, Trevor Clark

Description Trevor Clark 2020-06-26 14:25:36 UTC
Description of problem:

I have a fresh, fully updated install of RHEL 8.2. I then installed the latest CUDA version from the official site. I have an NVIDIA 1660 graphics card.

<https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=RHEL&target_version=8&target_type=rpmnetwork>

When I restart after installing CUDA, GDM crashes with the “Oh no! Something has gone wrong.” screen. If I boot in multiuser mode, everything works as expected. I can even run “startx” and get to the desktop just fine. I uncommented the line “WaylandEnable=false” in /etc/gdm/custom.conf, with no change.

I have also posted on the NVIDIA forums, but was not able to resolve the problem.

<https://forums.developer.nvidia.com/t/centos-8-2-gdm-crashes-after-cuda-11-install/129136/9>

I have the same problem on CentOS 8 and filed a bug report there:

<https://bugs.centos.org/view.php?id=17538>

The NVIDIA driver works without a problem on CentOS 7.

Please let me know if there is any other information I can get to help resolve this problem.


How reproducible:

Every time.

Steps to Reproduce:

1. Install using the latest RHEL 8.2 ISO.

2. Update all packages using dnf.

3. Install CUDA using the latest instructions at <https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=CentOS&target_version=8&target_type=rpmnetwork>

4. Restart. GDM crashes and leaves the user at the “Oh no! Something has gone wrong.” screen.

5. Booting into multiuser mode works as expected. startx gets to the desktop fine, but if you try to start gdm you get the same “Oh no! Something has gone wrong.” screen.

Actual results:

GDM crashes with the “Oh no! Something has gone wrong.” screen.


Expected results:

The login screen comes up as normal.


Additional info:

Comment 1 Ray Strode [halfline] 2020-06-26 18:06:02 UTC
Can you put

Enable=true

in the [debug] section of /etc/gdm/custom.conf, reboot, reproduce, and then attach the full output of journalctl?
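The request above can be sketched as a small shell helper. This is only an illustration: the function name enable_gdm_debug is hypothetical, it assumes GNU sed, and it assumes the file already contains a [debug] section with a commented-out or existing Enable= line. It changes nothing if the section is missing.

```shell
# Hypothetical helper: set Enable=true inside the [debug] section of a
# GDM custom.conf. Usage (as root): enable_gdm_debug /etc/gdm/custom.conf
enable_gdm_debug() {
    conf="$1"
    # Keep a backup of the original file next to it.
    cp -p "$conf" "$conf.bak" || return 1
    # Only touch lines between "[debug]" and the next section header (or EOF).
    sed -i '/^\[debug\]/,/^\[/ s/^#*[[:space:]]*Enable=.*/Enable=true/' "$conf"
    # Succeed only if the [debug] section now contains Enable=true.
    sed -n '/^\[debug\]/,/^\[/p' "$conf" | grep -qx 'Enable=true'
}
```

After a reboot and a reproduction of the crash, the journal for the current boot can be saved with `journalctl -b > gdm-journal.txt` and attached.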

Comment 2 Trevor Clark 2020-06-26 18:35:00 UTC
Created attachment 1698949
journalctl output after enabling debug for GDM

Comment 3 Trevor Clark 2020-06-26 18:35:42 UTC
I attached the output. From a cursory look, it seems that it may be an SELinux problem.

Comment 4 Ray Strode [halfline] 2020-06-26 19:28:25 UTC
Indeed. If you (as a test) boot into the multiuser target, then run setenforce 0, then systemctl isolate graphical.target, do things start working?

I'm going to move this to the selinux-policy component, but they may refer you to the vendor. In the interim, it looks like the journal output provides a way to produce an SELinux module locally that works around the issue.
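A minimal sketch of that test, assuming it is run as root from a text console in the multiuser target. The function name gdm_permissive_test is hypothetical; the commands inside it are the ones named above, plus getenforce to record the starting mode.

```shell
# Hypothetical helper: switch SELinux to permissive mode, then start the
# graphical target to see whether GDM comes up.
gdm_permissive_test() {
    echo "SELinux mode before the test: $(getenforce)"
    # Permissive mode lasts until the next reboot or until "setenforce 1".
    setenforce 0 || return 1
    systemctl isolate graphical.target
}
```

If the login screen appears, that points at an SELinux denial. Running `setenforce 1` afterwards restores enforcing mode.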

Comment 5 Trevor Clark 2020-06-26 22:51:14 UTC
After running setenforce 0 and then systemctl isolate graphical.target, the login screen comes up normally. I think that confirms the bug is related to SELinux.

Comment 6 Milos Malik 2020-06-27 04:42:45 UTC
The attached journal file contains several messages like this:

  SELinux is preventing gnome-session-c from map access on the file /memfd:/.nvidia_drv.XXXXXX (deleted)

which means that SELinux denials happened, but detailed information about these denials is missing.

Please run the following command (on the tested machine), which collects SELinux denials with all the details:

# ausearch -m avc -m user_avc -m selinux_err -m user_selinux_err -i -ts today

Thank you.
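For anyone reading the resulting output, a rough filter like the one below can condense it into counts per source type, target type, and object class. The function name summarize_avc is hypothetical, and the sketch assumes each denial line carries scontext=, tcontext=, and tclass= fields in that order; lines without them are skipped.

```shell
# Hypothetical helper: read audit records on stdin and print one line per
# distinct "source_type target_type class" triple, with a count.
summarize_avc() {
    sed -n 's/.*scontext=[^:]*:[^:]*:\([^: ]*\).*tcontext=[^:]*:[^:]*:\([^: ]*\).*tclass=\([^ ]*\).*/\1 \2 \3/p' |
        sort | uniq -c | sort -rn
}
```

Example use: `ausearch -m avc -m user_avc -m selinux_err -m user_selinux_err -i -ts today | summarize_avc`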

Comment 7 Trevor Clark 2020-06-27 18:06:44 UTC
Created attachment 1698990
Output of # ausearch -m avc -m user_avc -m selinux_err -m user_selinux_err -i -ts today

Comment 8 Trevor Clark 2020-06-27 18:07:11 UTC
Please let me know if there is anything I can do for you.

Comment 9 Michael Rochefort 2020-07-14 17:53:21 UTC
I've also reported this on the ELRepo issue tracker: https://elrepo.org/bugs/view.php?id=1022

Is this something that we can/should expect NVIDIA to resolve, or will it fall down to the users/packagers to do this? My SELinux knowledge is not that good.

========================================
*****  Plugin restorecon (92.2 confidence) suggests   ************************

If you want to fix the label. 
/memfd:/.nvidia_drv.XXXXXX (deleted) default label should be default_t.
Then you can run restorecon. The access attempt may have been stopped due to insufficient permissions to access a parent directory in which case try to change the following command accordingly.
Do
# /sbin/restorecon -v /memfd:/.nvidia_drv.XXXXXX (deleted)

*****  Plugin catchall_boolean (7.83 confidence) suggests   ******************

If you want to allow domain to can mmap files
Then you must tell SELinux about this by enabling the 'domain_can_mmap_files' boolean.

Do
setsebool -P domain_can_mmap_files 1

*****  Plugin catchall (1.41 confidence) suggests   **************************

If you believe that gnome-session-c should be allowed map access on the .nvidia_drv.XXXXXX (deleted) file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'gnome-session-c' --raw | audit2allow -M my-gnomesessionc
# semodule -X 300 -i my-gnomesessionc.pp
========================================

Comment 10 Phil Perry 2020-07-14 21:31:58 UTC
Setting the SELinux boolean domain_can_mmap_files works around the issue:

setsebool -P domain_can_mmap_files 1

Michael Rochefort is also testing the following local SELinux policy module generated by audit2allow:

# cat nvidialocal.te

module nvidialocal 1.0;

require {
	type xserver_tmpfs_t;
	type xdm_t;
	class file map;
}

#============= xdm_t ==============
allow xdm_t xserver_tmpfs_t:file map;
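For reference, a .te file like the one above is usually compiled and loaded with the standard SELinux userspace tools. The sketch below assumes nvidialocal.te is in the current directory and that it is run as root; the function name install_local_module is hypothetical. Compared with the domain_can_mmap_files boolean, which applies to every domain, this module only adds the single rule shown.

```shell
# Hypothetical helper: compile NAME.te into a policy package and load it.
install_local_module() {
    name="${1:-nvidialocal}"
    # Compile the source into a non-base module (-m) with MLS support (-M).
    checkmodule -M -m -o "$name.mod" "$name.te" || return 1
    # Wrap the compiled module into an installable package.
    semodule_package -o "$name.pp" -m "$name.mod" || return 1
    # Load the package into the running policy.
    semodule -i "$name.pp"
}
```

The module can later be removed with `semodule -r nvidialocal`.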

Comment 11 Aaron Plattner 2020-07-15 02:56:40 UTC
I think this is the issue I referred to here: https://github.com/fedora-selinux/selinux-policy/pull/312

This policy change probably needs to be merged to RHEL.

Comment 12 Trevor Clark 2020-07-22 18:05:49 UTC
Is there an ETA for when this might get fixed in RHEL? It looks like Aaron Plattner has a solution that was merged for Fedora.

Comment 13 Zdenek Pytela 2020-07-23 07:55:48 UTC
This bug has not been acknowledged by the selinux team to be resolved during the RHEL 8.3 development and testing phase, so it will be evaluated for inclusion into the next minor product update.

Please also note this bug tracking system is not a mechanism for requesting support, and we are not able to guarantee the timeliness or suitability of a resolution. If this issue is critical or in any way time sensitive, please raise a ticket through the regular Red Hat support channels to ensure it receives the proper attention and prioritization to assure a timely resolution.

Comment 14 Phil Perry 2020-07-24 14:58:44 UTC
Thank you - I have raised a support case for this issue.

Given that the fix has been in Fedora for the last 6 months, it would be nice if this could be resolved quickly for anyone using NVIDIA hardware.

Comment 15 Zdenek Pytela 2020-07-27 08:11:13 UTC
*** Bug 1859804 has been marked as a duplicate of this bug. ***

Comment 29 Zdenek Pytela 2020-08-20 10:20:15 UTC
*** Bug 1868605 has been marked as a duplicate of this bug. ***

Comment 30 Jiqi Li 2020-08-20 11:26:38 UTC
I just verified on a RHEL 8.3 pre-test version, 4.18.0-221.el8.x86_64, and still see this issue.

Zdenek, could you please check which RHEL 8.3 build will include this change?

Regards,
Jiqi.

Comment 31 Zdenek Pytela 2020-08-20 13:39:16 UTC
This bug was addressed in the selinux-policy-3.14.3-51 package.

Comment 32 Jiqi Li 2020-08-21 10:37:54 UTC
Could you please confirm which RHEL release will include the selinux-policy-3.14.3-51 rpm package? Thanks.

Comment 33 Zdenek Pytela 2020-08-24 08:09:03 UTC
(In reply to Jiqi Li from comment #32)
> Could you please confirm which RHEL release will include the
> selinux-policy-3.14.3-51 rpm package? Thanks.

The fix for this bugzilla is expected to be delivered as a part of RHEL 8.3 GA. It was also scheduled for the next RHEL 8.2 batch update.
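To see what a given machine currently has, something like the following may help. It is a sketch only: the function name check_gdm_map_rule is hypothetical, sesearch must be installed (it ships in setools-console), and it looks for the rule from the local module in comment 10, which may not be exactly how the packaged fix is written. Rules that depend on a boolean such as domain_can_mmap_files can also appear in the output, so it needs to be read rather than just checked for non-emptiness.

```shell
# Hypothetical helper: print the installed selinux-policy package and any
# allow rules letting xdm_t map files labeled xserver_tmpfs_t.
check_gdm_map_rule() {
    rpm -q selinux-policy
    sesearch -A -s xdm_t -t xserver_tmpfs_t -c file -p map
}
```

Once an updated policy provides the access, the workarounds from earlier comments can be reverted with `setsebool -P domain_can_mmap_files 0` and `semodule -r nvidialocal`.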

