Description of problem:
The app nodes are physical servers with NVIDIA V100 graphics cards. We used the following reference guide to install the required software:
When running the "nvidia-smi" command from within the pod, we get the error:
Failed to initialize NVML: Insufficient Permissions
To fix this, we need to explicitly run the following command, as stated in https://github.com/NVIDIA/nvidia-container-runtime/issues/28:
 chcon -t container_file_t /dev/nvidia*
As a fix, we added the SELinux file context entry and used the restorecond daemon to make the change persistent.
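For reference, a minimal sketch of how the relabel can be recorded in SELinux policy and re-applied is shown below; the device path regex is an assumption and may need adjusting for the node. Because /dev is recreated at boot, something like restorecond (as used here) or a udev rule is still needed to re-apply the label after a reboot.
 # Record a persistent file context rule for the NVIDIA device nodes (regex is an assumption)
 semanage fcontext -a -t container_file_t '/dev/nvidia.*'
 # Re-apply the recorded context to the existing device nodes
 restorecon -Rv /dev/nvidia*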
Version-Release number of selected component (if applicable):
OCP cluster 3.11.88
We need help to confirm whether there is a proper (supported) way to make this fix.
The command should have run without the need for any changes.
Any other information or documentation that can help identify this issue would be appreciated.
Seth, is this something that could or would be fixed in 4.x?
For 3.x, I wonder if we could settle for a kbase solution. I don't want to set expectations that this 3.x fix will bubble to the top of the list.
The referenced document is a blog post, not part of our official documentation (supported procedures).
The blog post should have included the chcon/semanage command from the beginning, as it is a required step for this to work with SELinux in enforcing mode.
The reporter is already doing the correct thing to resolve that issue.