Bug 2214569

Summary: The container_use_devices boolean doesn't allow the map operation on NVIDIA GPU
Product: Red Hat Enterprise Linux 9 Reporter: Fabien Dupont <fdupont>
Component: container-selinux Assignee: Jindrich Novy <jnovy>
Status: CLOSED ERRATA QA Contact: Edward Shen <weshen>
Severity: high Docs Contact:
Priority: unspecified    
Version: 9.2 CC: ajia, dornelas, dwalsh, jnovy, lsm5, mboddu, tsweeney
Target Milestone: rc Keywords: Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: container-selinux-2.218.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-11-07 08:24:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Fabien Dupont 2023-06-13 12:33:16 UTC
Description of problem:

While writing the blog post at https://cloud.redhat.com/blog/how-to-accelerate-workloads-with-nvidia-gpus-on-red-hat-device-edge, we found that SELinux denies the map operation on "xserver_misc_device_t", even when the "container_use_devices" boolean is set to on. This blocks the use of NVIDIA GPUs in containers run with MicroShift or podman.

Version-Release number of selected component (if applicable): 2.205.0-1.el9_2.noarch


How reproducible: Always


Steps to Reproduce:

$ podman run --rm -i --device /dev/nvidia --group-add keep-groups nvcr.io/nvidia/cuda:12.1.1-devel-ubi8 nvidia-smi

Actual results: Segmentation fault and AVC


Expected results: Returns info about the NVIDIA GPU
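
To confirm that a failure matches this report, it can help to check the boolean and look for the denial in the audit log. The following is a minimal diagnostic sketch, not part of the original report: the function name check_gpu_selinux is hypothetical, and it assumes getsebool and ausearch are installed and that the audit log is readable (usually root only).

```shell
#!/bin/sh
# Diagnostic sketch (hypothetical helper, not from the bug report).
# Prints the state of the container_use_devices boolean and any recent
# AVC records that mention xserver_misc_device_t.
check_gpu_selinux() {
    if command -v getsebool >/dev/null 2>&1; then
        # getsebool fails on hosts without SELinux; do not abort on that.
        getsebool container_use_devices || true
    else
        echo "getsebool not available"
    fi

    if command -v ausearch >/dev/null 2>&1; then
        # Reading the audit log normally requires root.
        ausearch -m AVC -ts recent 2>/dev/null | grep xserver_misc_device_t \
            || echo "no recent AVC for xserver_misc_device_t found"
    else
        echo "ausearch not available"
    fi
}

check_gpu_selinux
```

Run it right after the failing podman command, since "-ts recent" only covers the last ten minutes of audit records.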


Additional info:

The following module provides a workaround:

module nvidia-container-microshift 1.0;

require {
	type xserver_misc_device_t;
	type container_t;
	class chr_file map;
}

#============= container_t ==============
allow container_t xserver_misc_device_t:chr_file map;
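
The module above still has to be compiled and loaded before it takes effect. One way to do that is sketched below, assuming the checkmodule and semodule_package tools are installed; the WORKDIR variable and the file names are illustrative choices, not taken from the report. The final load step is only printed because it needs root.

```shell
#!/bin/sh
# Sketch: build the workaround module from the bug description.
# WORKDIR and the file names below are assumptions for illustration.
set -eu

workdir="${WORKDIR:-$(mktemp -d)}"
te_file="$workdir/nvidia-container-microshift.te"

# Write the policy source as given in the report.
cat > "$te_file" <<'EOF'
module nvidia-container-microshift 1.0;

require {
	type xserver_misc_device_t;
	type container_t;
	class chr_file map;
}

allow container_t xserver_misc_device_t:chr_file map;
EOF

# Compile and package only when the SELinux policy tools are present.
if command -v checkmodule >/dev/null 2>&1 &&
   command -v semodule_package >/dev/null 2>&1; then
    checkmodule -M -m -o "$workdir/nvidia-container-microshift.mod" "$te_file"
    semodule_package -o "$workdir/nvidia-container-microshift.pp" \
        -m "$workdir/nvidia-container-microshift.mod"
    echo "built $workdir/nvidia-container-microshift.pp"
    echo "load it as root with: semodule -i $workdir/nvidia-container-microshift.pp"
else
    echo "checkmodule or semodule_package not found; wrote $te_file only"
fi
```

Once a container-selinux build that contains the fix is installed, the module should no longer be needed and can be removed with "semodule -r nvidia-container-microshift".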

Comment 1 Daniel Walsh 2023-06-13 12:38:24 UTC
Fixed in container-selinux-2.218.0
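
A quick way to see whether a host already has a package at or above that version is sketched below. The helper name version_ge is hypothetical, the comparison relies on "sort -V" (GNU coreutils), and it looks only at the upstream version string, so it cannot tell whether a lower-numbered build carries a backport.

```shell
#!/bin/sh
# Sketch: compare the installed container-selinux version with 2.218.0.
# version_ge A B succeeds when version A is greater than or equal to B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

fixed="2.218.0"

if command -v rpm >/dev/null 2>&1 &&
   installed=$(rpm -q --qf '%{VERSION}' container-selinux 2>/dev/null); then
    if version_ge "$installed" "$fixed"; then
        echo "container-selinux $installed is at or above $fixed"
    else
        echo "container-selinux $installed predates $fixed"
    fi
else
    echo "container-selinux is not installed, or rpm is unavailable"
fi
```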

Comment 2 Tom Sweeney 2023-06-13 15:27:44 UTC
Assigning to @jnovy for any further packaging or BZ needs.

Comment 3 Fabien Dupont 2023-06-13 16:25:57 UTC
Could the fix be backported to RHEL 9.2, so that it benefits Red Hat Device Edge and customers who want to stay on EUS versions?

Comment 8 Edward Shen 2023-09-14 07:33:04 UTC
Hello Fabien Dupont, this issue involves a GPU-accelerated program, and we may not have an appropriate environment to test it. Could you please verify the fix? Thanks!

Comment 9 Derrick Ornelas 2023-09-14 15:24:22 UTC
I assume the fix for this is somewhere in https://github.com/containers/container-selinux/compare/2.217.0...v2.218.0 ? 

Is this still something we want to try to fix in 9.2 for RHEL & R4E users? If it really is fixed in v2.218.0, then as of two weeks ago it's already fixed for RHDE (i.e., MicroShift) users, who would be consuming container-selinux-2.221.0-1.rhaos4.13.el9 (https://access.redhat.com/downloads/content/container-selinux/2.221.0-1.rhaos4.13.el9/noarch/fd431d51/package) alongside MicroShift.

Comment 13 errata-xmlrpc 2023-11-07 08:24:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (container-selinux bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6328

Comment 14 Red Hat Bugzilla 2024-03-07 04:25:54 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days