1851448 – GDM crashes after cuda install

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	selinux-policy
Sub Component:
Version:	8.2
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	rc
Target Release:	8.3
Assignee:	Zdenek Pytela
QA Contact:	Milos Malik
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	1859804 1868605
Depends On:
Blocks:	1712227 1712305 1725856 1796607 1866362

Reported:	2020-06-26 14:25 UTC by Trevor Clark
Modified:	2023-12-15 18:19 UTC
CC List:	25 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1866362
Environment:
Last Closed:	2020-11-04 01:56:46 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
journalctl output after enabling debug for GDM (6.02 MB, text/plain) 2020-06-26 18:35 UTC, Trevor Clark
Output of # ausearch -m avc -m user_avc -m selinux_err -m user_selinux_err -i -ts today (2.31 KB, text/plain) 2020-06-27 18:06 UTC, Trevor Clark

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2020:4528	0	None	None	None	2020-11-04 01:57:16 UTC

Description Trevor Clark 2020-06-26 14:25:36 UTC

Description of problem:

I have a fresh, fully updated install of RHEL 8.2. I then installed the latest CUDA version from the official site. I have a 1660 graphics card.

<https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=RHEL&target_version=8&target_type=rpmnetwork>

When I restart after installing CUDA, GDM crashes with the “Oh no! Something has gone wrong.” screen. If I boot into multi-user mode, everything works as expected. I can even run “startx” and get to the desktop just fine. I uncommented the line “WaylandEnable=false” in /etc/gdm/custom.conf, with no change.

I have also posted on the NVIDIA forums, but was not able to resolve the problem.

<https://forums.developer.nvidia.com/t/centos-8-2-gdm-crashes-after-cuda-11-install/129136/9>

I have the same problem on CentOS 8 and filed a bug report there:

<https://bugs.centos.org/view.php?id=17538>

The NVIDIA driver works without a problem on CentOS 7.

Please let me know if there is any other information I can get to help resolve this problem.


How reproducible:

Every time.

Steps to Reproduce:

1. Install using the latest RHEL 8.2 ISO.

2. Update all packages using dnf

3. Install cuda using the latest instructions at <https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=CentOS&target_version=8&target_type=rpmnetwork>

4. Restart and GDM crashes and leaves the user at the “Oh no! Something has gone wrong.” screen.

5. Booting into multi-user mode works as expected. startx gets to the desktop fine, but if you try to start gdm you get the same “Oh no! Something has gone wrong.” screen.

Actual results:

GDM crashes with the “Oh no! Something has gone wrong.” screen.


Expected results:

The login screen comes up as normal.


Additional info:

Comment 1 Ray Strode [halfline] 2020-06-26 18:06:02 UTC

Can you put

Enable=true

in the [debug] section of /etc/gdm/custom.conf, reboot, reproduce and then attach the full output of journalctl ?
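A minimal sketch for confirming the setting before rebooting, assuming /etc/gdm/custom.conf uses the usual INI layout with the [debug] header on its own line. The function name gdm_debug_enabled is hypothetical and not part of GDM; it only checks that an uncommented Enable=true line sits under [debug].

```shell
# Hypothetical helper (not shipped with GDM): succeeds only if an
# uncommented "Enable=true" line appears inside the [debug] section.
gdm_debug_enabled() {
    conf="${1:-/etc/gdm/custom.conf}"
    awk '
        # Every section header switches the "inside [debug]" flag on or off.
        /^\[/ { in_debug = ($0 ~ /^\[debug\][ \t]*$/) }
        # A line starting with "#" does not match, so commented entries are ignored.
        in_debug && /^[ \t]*Enable[ \t]*=[ \t]*true[ \t]*$/ { found = 1 }
        END { exit !found }
    ' "$conf"
}
```

After the reboot and reproduction, something like the following could capture the log for attaching (journalctl -b limits the output to the current boot, which may be narrower than the full journal requested above):

# gdm_debug_enabled && journalctl -b > journal.txt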

Comment 2 Trevor Clark 2020-06-26 18:35:00 UTC

Created attachment 1698949
journalctl output after enabling debug for GDM

Comment 3 Trevor Clark 2020-06-26 18:35:42 UTC

I attached the output. At a cursory look, it seems that it may be an SELinux problem.

Comment 4 Ray Strode [halfline] 2020-06-26 19:28:25 UTC

indeed. if you (as a test), boot into multiuser target, then setenforce 0, then systemctl isolate graphical.target, do things start working?

I'm going to move this to the selinux-policy component, but they may refer you to the vendor. in the interim, it looks like the journal output provides a way to produce an selinux module locally that works around the issue.
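The test described in this comment could be scripted roughly as follows. This is a sketch that assumes a root shell on a text console in the multi-user target; the function name permissive_gdm_test is made up for illustration. setenforce 0 only lasts until the next reboot, so it is a diagnostic step rather than a fix.

```shell
# Hypothetical wrapper around the manual steps; must be run as root.
permissive_gdm_test() {
    # Record the current mode so it can be restored by hand afterwards.
    echo "SELinux mode before test: $(getenforce)"
    # Switch SELinux to permissive mode (denials are logged, not enforced).
    setenforce 0 || return 1
    # Start the graphical target, which launches GDM.
    systemctl isolate graphical.target
}
```

If the login screen appears in permissive mode, SELinux policy is the likely cause. Enforcing mode can be restored afterwards with:

# setenforce 1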

Comment 5 Trevor Clark 2020-06-26 22:51:14 UTC

Running setenforce 0 and then systemctl isolate graphical.target allows the login screen to come up normally. I think that confirms it is a bug related to SELinux.

Comment 6 Milos Malik 2020-06-27 04:42:45 UTC

The attached journal file contains several messages like this:

  SELinux is preventing gnome-session-c from map access on the file /memfd:/.nvidia_drv.XXXXXX (deleted)

which means that SELinux denials happened, but detailed information about these denials is missing.

Please run the following command (on the tested machine), which collects SELinux denials with all the details:

# ausearch -m avc -m user_avc -m selinux_err -m user_selinux_err -i -ts today

Thank you.

Comment 7 Trevor Clark 2020-06-27 18:06:44 UTC

Created attachment 1698990
Output of # ausearch -m avc -m user_avc -m selinux_err -m user_selinux_err -i -ts today

Comment 8 Trevor Clark 2020-06-27 18:07:11 UTC

Please let me know if there is anything I can do for you.

Comment 9 Mike Rochefort 2020-07-14 17:53:21 UTC

I've also reported this on the ELRepo issue tracker: https://elrepo.org/bugs/view.php?id=1022

Is this something that we can/should expect NVIDIA to resolve, or will it fall down to the users/packagers to do this? My SELinux knowledge is not that good.

========================================
*****  Plugin restorecon (92.2 confidence) suggests   ************************

If you want to fix the label. 
/memfd:/.nvidia_drv.XXXXXX (deleted) default label should be default_t.
Then you can run restorecon. The access attempt may have been stopped due to insufficient permissions to access a parent directory in which case try to change the following command accordingly.
Do
# /sbin/restorecon -v /memfd:/.nvidia_drv.XXXXXX (deleted)

*****  Plugin catchall_boolean (7.83 confidence) suggests   ******************

If you want to allow domain to can mmap files
Then you must tell SELinux about this by enabling the 'domain_can_mmap_files' boolean.

Do
setsebool -P domain_can_mmap_files 1

*****  Plugin catchall (1.41 confidence) suggests   **************************

If you believe that gnome-session-c should be allowed map access on the .nvidia_drv.XXXXXX (deleted) file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'gnome-session-c' --raw | audit2allow -M my-gnomesessionc
# semodule -X 300 -i my-gnomesessionc.pp
========================================

Comment 10 Phil Perry 2020-07-14 21:31:58 UTC

Setting the SELinux boolean domain_can_mmap_files works around the issue:

setsebool -P domain_can_mmap_files 1

Michael Rochefort is also testing the following local SELinux policy module generated by audit2allow:

# cat nvidialocal.te

module nvidialocal 1.0;

require {
	type xserver_tmpfs_t;
	type xdm_t;
	class file map;
}

#============= xdm_t ==============
allow xdm_t xserver_tmpfs_t:file map;
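For anyone wanting to try the module above, a sketch of compiling and loading it by hand, assuming the source is saved as nvidialocal.te in the current directory and that checkmodule and semodule_package are installed. The function name install_nvidialocal is hypothetical. This is a local workaround under test in this thread, not the official fix.

```shell
# Hypothetical helper; must be run as root in the directory holding nvidialocal.te.
install_nvidialocal() {
    # Compile the type enforcement source into a non-base policy module.
    checkmodule -M -m -o nvidialocal.mod nvidialocal.te &&
    # Wrap the compiled module into an installable policy package.
    semodule_package -o nvidialocal.pp -m nvidialocal.mod &&
    # Load the package into the running policy.
    semodule -i nvidialocal.pp
}
```

Once a fixed selinux-policy package is installed, the local module should no longer be needed and can be removed with:

# semodule -r nvidialocal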

Comment 11 Aaron Plattner 2020-07-15 02:56:40 UTC

I think this is the issue I referred to here: https://github.com/fedora-selinux/selinux-policy/pull/312

This policy change probably needs to be merged to RHEL.

Comment 12 Trevor Clark 2020-07-22 18:05:49 UTC

Is there any kind of ETA for when this might get fixed in RHEL? It looks like Aaron Plattner has a solution that was merged for Fedora.

Comment 13 Zdenek Pytela 2020-07-23 07:55:48 UTC

This bug has not been acknowledged by the selinux team to be resolved during the RHEL 8.3 development and testing phase, so it will be evaluated for inclusion into the next minor product update.

Please also note this bug tracking system is not a mechanism for requesting support, and we are not able to guarantee the timeliness or suitability of a resolution. If this issue is critical or in any way time sensitive, please raise a ticket through the regular Red Hat support channels to ensure it receives the proper attention and prioritization to assure a timely resolution.

Comment 14 Phil Perry 2020-07-24 14:58:44 UTC

Thank you - I have raised a support case for this issue.

Given the fix has been in fedora for the last 6 months, it would be nice if this could be resolved quickly for anyone using NVIDIA hardware.

Comment 15 Zdenek Pytela 2020-07-27 08:11:13 UTC

*** Bug 1859804 has been marked as a duplicate of this bug. ***

Comment 29 Zdenek Pytela 2020-08-20 10:20:15 UTC

*** Bug 1868605 has been marked as a duplicate of this bug. ***

Comment 30 Jiqi Li 2020-08-20 11:26:38 UTC

I just verified on a RHEL 8.3 pre-release version (kernel 4.18.0-221.el8.x86_64) and still see this issue.

Zdenek, please check which RHEL 8.3 build will include this change.

Regards,
Jiqi.

Comment 31 Zdenek Pytela 2020-08-20 13:39:16 UTC

This bug was addressed in the selinux-policy-3.14.3-51 package.

Comment 32 Jiqi Li 2020-08-21 10:37:54 UTC

Could you please confirm which RHEL release will include the selinux-policy-3.14.3-51 rpm package? Thanks.

Comment 33 Zdenek Pytela 2020-08-24 08:09:03 UTC

(In reply to Jiqi Li from comment #32)
> Could you please confirm which RHEL release will include the
> selinux-policy-3.14.3-51 rpm package? Thanks.

The fix for this bugzilla is expected to be delivered as a part of RHEL 8.3 GA. It was also scheduled for the next RHEL 8.2 batch update.
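A rough way to check whether an installed package is at least the version named in comment 31. This is a sketch: the function name has_fixed_policy is hypothetical, and the comparison drops the dist tag (.el8 and anything after it), so it is only meaningful for the RHEL 8.3 stream. A RHEL 8.2 batch update would carry a different release number, which this thread does not state.

```shell
# Hypothetical helper: succeeds if the given selinux-policy
# version-release is 3.14.3-51 or newer.
has_fixed_policy() {
    # Strip the dist tag, e.g. "3.14.3-54.el8" becomes "3.14.3-54".
    installed="${1%%.el*}"
    fixed="3.14.3-51"
    # sort -V orders version strings; the fixed version must sort first (or tie).
    lowest=$(printf '%s\n' "$fixed" "$installed" | sort -V | head -n 1)
    [ "$lowest" = "$fixed" ]
}
```

It could be used like this:

# has_fixed_policy "$(rpm -q --qf '%{VERSION}-%{RELEASE}' selinux-policy)" && echo "fix present"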

Comment 40 errata-xmlrpc 2020-11-04 01:56:46 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (selinux-policy bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4528
