Bug 1983079 - No "permittedHostDevices" section in HCO CR, allows any hostdevice in the VM spec.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.8.0
Hardware: Unspecified
OS: Unspecified
Severity: medium
Priority: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: sgott
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-07-16 12:59 UTC by Kedar Bidarkar
Modified: 2022-03-16 15:51 UTC (History)
9 users

Fixed In Version: virt-launcher-container-v4.10.0-128 hco-bundle-registry-container-v4.10.0-439
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-16 15:50:56 UTC
Target Upstream Version:
Embargoed:
nunnatsa: needinfo+




Links
System ID Private Priority Status Summary Last Updated
Github kubevirt kubevirt pull 6161 0 None open remove validation host device on vmi creation 2021-07-29 14:36:59 UTC
Red Hat Product Errata RHSA-2022:0947 0 None None None 2022-03-16 15:51:05 UTC

Description Kedar Bidarkar 2021-07-16 12:59:44 UTC
Description of problem:

1) An empty or missing "permittedHostDevices" section in the HCO CR allows any host device in the VM spec.

2) It appears that validation of the VM spec's host device is performed only after the first entry is added under the "permittedHostDevices" section in the HCO CR.

Version-Release number of selected component (if applicable):
CNV-4.8.0

How reproducible:
Without adding any "permittedHostDevices" entry in the HCO CR,
create a VMI spec with any arbitrary host device name.

Steps to Reproduce:
1. Create a VM with a host device, with no "permittedHostDevices" section in the HCO CR.
2. Creation of the VM with the host device is allowed.
3. (Of course, the VM is then stuck in a Pending state.)
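The steps above can be sketched with a minimal VMI manifest; the device and resource names here are purely illustrative, and any arbitrary name reproduces the same behavior when the HCO CR has no "permittedHostDevices" section:

```yaml
# Minimal VMI sketch (assumption: names are illustrative; the HCO CR
# contains no permittedHostDevices section at this point).
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: vmi-hostdevice-test
spec:
  domain:
    devices:
      hostDevices:
      - deviceName: vendor.example/SOME_RANDOM_DEVICE   # arbitrary, not permitted anywhere
        name: hostdevice
    resources:
      requests:
        memory: 1Gi
```

Creation of this VMI is accepted, and it then sits in Pending because no node exposes the requested resource.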

Actual results:

1) An empty or missing "permittedHostDevices" section in the HCO CR allows all host devices, instead of denying them by default.

2) Validation of the VM spec's host device is performed only after the first entry is added under the "permittedHostDevices" section in the HCO CR.

Expected results:

Even if the "permittedHostDevices" section is missing from the HCO CR,
creation of a VM with any host device should be denied with the message below.

Message: "admission webhook .* denied the request: HostDevice {GPU_DEVICE_NAME} is not permitted .*"
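For contrast, the validation only engages once at least one entry like the following exists in the HCO CR (the PCI selector and resource name below are example values):

```yaml
# HCO CR fragment (sketch; pciDeviceSelector/resourceName are example values)
spec:
  permittedHostDevices:
    pciHostDevices:
    - pciDeviceSelector: "10de:1db6"
      resourceName: "nvidia.com/GV100GL_Tesla_V100"
```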

Additional info:

Comment 1 sgott 2021-07-16 19:22:13 UTC
Vladik, can you clarify if this feature is working as expected from your point of view?

Comment 2 Vladik Romanovsky 2021-07-19 11:13:18 UTC
In order to support the legacy external NVIDIA device plugin for GPU assignment, KubeVirt allows users to leave `permittedHostDevices` empty in the KubeVirt CR and still request a GPU or a host device.
In this case the devices are not verified.

However, KubeVirt also has a feature gate (HostDevices), disabled by default, that allows host devices / GPUs to be used in the system with the `validateHostDevicesWithPassthroughEnabled` validation.

My expectation was that HCO will only enable this feature gate when permittedHostDevices are set.

I would ask the HCO team to look into this issue...

Nahshon, is it the expected behavior?
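As a sketch, the two mechanisms described above live side by side in the KubeVirt CR roughly as follows (values are illustrative; in CNV both fields are normally propagated by HCO rather than edited directly):

```yaml
# KubeVirt CR fragment (sketch)
spec:
  configuration:
    developerConfiguration:
      featureGates:
      - HostDevices            # the gate discussed above; disabled by default
    permittedHostDevices:
      pciHostDevices:
      - pciDeviceSelector: "10de:1db6"   # example vendor:device ID
        resourceName: "nvidia.com/GV100GL_Tesla_V100"
```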

Comment 3 Roman Mohr 2021-07-21 12:13:21 UTC
I think in such cases the desired way is always to just permit the object creation (except if we talk about access control in the sense of e.g. RBAC). The general expectation is that such devices can be added and removed at any time, and that we can always have objects which reference non-existing entries. Since we should be eventually consistent, the expectation is that this can be fixed at any moment and we should just retry.

Comment 4 Nahshon Unna-Tsameret 2021-07-26 05:29:50 UTC
@vromanso - we can do it. The thing is, the feature gates and their default values were introduced before the permittedHostDevices list, so this is not how it was implemented in HCO. @stirabos - what do you think?

Comment 5 Simone Tiraboschi 2021-08-19 12:59:17 UTC
If the value of HostDevices feature gate is always just a direct consequence of permittedHostDevices value, I think we should drop HostDevices feature gate and simply rely on permittedHostDevices.
Having two APIs for the same feature smells like a bad design.

Comment 6 Jed Lejosne 2021-09-14 17:30:40 UTC
Note: when installing CNV 4.9 from the web UI, there is (now?) an option for specifying permitted host devices.

Comment 7 Kedar Bidarkar 2022-01-20 19:58:09 UTC
1) Updated HCO CR with the below configuration.
  permittedHostDevices:
    pciHostDevices:
    - pciDeviceSelector: "10de:1db6"
      resourceName: "nvidia.com/V100GL_Tesla_V100"

2) Created a VM with the below spec in it.
spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: containerdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          hostDevices:
          - deviceName: nvidia.com/GV100GL_Tesla_V100
            name: hostdevice

3) Started the VM, VMI is in pending state, with the below msg.
status:
  conditions:
  - lastProbeTime: 
    lastTransitionTime: 
    message: virt-launcher pod has not yet been scheduled
    reason: PodNotExists
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 
    message: 'failed to render launch manifest: HostDevice nvidia.com/GV100GL_Tesla_V100
      is not permitted in permittedHostDevices configuration'
    reason: FailedCreate
    status: "False"
    type: Synchronized
  created: true
  printableStatus: Starting

Comment 8 Kedar Bidarkar 2022-01-20 20:01:22 UTC
Tested with HCO version: v4.10.0-605 ; virt-Operator: v4.10.0-197

Comment 9 Kedar Bidarkar 2022-01-20 20:22:17 UTC
1) Without HCO CR and KubeVirt CR PermittedHostDevices

[kbidarka@localhost ocs]$ oc get kubevirt kubevirt-kubevirt-hyperconverged -n openshift-cnv -o yaml | grep -i permittedHostDevices
[kbidarka@localhost ocs]$ 

[kbidarka@localhost ocs]$ oc get hyperconverged kubevirt-hyperconverged -n openshift-cnv -o yaml | grep -i permittedHostDevices
[kbidarka@localhost ocs]$ 

2) Created a VM with the below spec in it.
spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: containerdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          hostDevices:
          - deviceName: nvidia.com/GV100GL_Tesla_V100
            name: hostdevice

3) Started the VM, VMI is in "scheduling" state, with the below msg.

status:
  activePods:
    daa05c7f-48c4-45a9-9016-01321983a941: ""
  conditions:
  - lastProbeTime: "2022-01-20T20:10:09Z"
    lastTransitionTime: "2022-01-20T20:10:09Z"
    message: Guest VM is not reported as running
    reason: GuestNotRunning
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-01-20T20:10:09Z"
    message: '0/6 nodes are available: 1 Insufficient nvidia.com/GV100GL_Tesla_V100, 2 node(s)
      didn''t match Pod''s node affinity/selector, 3 node(s) had taint {node-role.kubernetes.io/master:
      }, that the pod didn''t tolerate.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled


VERIFIED with HCO version: v4.10.0-605 ; virt-operator: v4.10.0-197


NOTE: host device validation now happens at pod creation.

Comment 14 errata-xmlrpc 2022-03-16 15:50:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0947

