1969912 – PCI passthrough devices are enabled by default

Summary: PCI passthrough devices are enabled by default

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Installation
Sub Component:
Version:	4.8.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	4.8.0
Assignee:	Simone Tiraboschi
QA Contact:	ibesso
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-06-09 13:05 UTC by Fabian Deutsch
Modified:	2022-01-10 08:20 UTC
CC List:	7 users
Fixed In Version:	hco-bundle-registry-container-v4.8.0-398
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-07-27 14:32:39 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	kubevirt hyperconverged-cluster-operator pull 1384	None	closed	Remove the default PCI host devices	2021-06-11 17:48:30 UTC
Github	kubevirt hyperconverged-cluster-operator pull 1387	None	closed	[release-1.4] Remove the default PCI host devices	2021-06-11 07:41:07 UTC
Red Hat Product Errata	RHSA-2021:2920	None	None	None	2021-07-27 14:33:33 UTC

Description Fabian Deutsch 2021-06-09 13:05:17 UTC

Description of problem:
The new PCI passthrough support adds two NVIDIA devices to the HCO CR by default, and both devices are automatically enabled.
This is problematic because an update to CNV 4.8 will add and enable these devices on existing clusters. If the devices are also managed by the NVIDIA operator, this can cause issues when simply installing CNV, and it will definitely cause issues when a device is used by the NVIDIA operator and a VM at the same time.

To prevent this, the two devices should not be included in the list by default.

Version-Release number of selected component (if applicable):
4.8.0

How reproducible:
Always

Steps to Reproduce:
1. Install CNV 2.6.0
2. Update OCP+CNV to 4.8.0

Actual results:
Two devices are in the PCI passthrough list of the HCO CR, and both are enabled

Expected results:
PCI passthrough list is empty

Additional info:

Comment 5 ibesso 2021-06-10 16:03:43 UTC

Reference to the other bug (already VERIFIED), which this bug reverts:
https://bugzilla.redhat.com/show_bug.cgi?id=1958862

Comment 7 Kedar Bidarkar 2021-06-12 17:03:21 UTC

HCO operator, Build date: "build-date": "2021-06-11T22:42:06.208702"
hyperconverged-cluster-operator/images/v4.8.0-58

$ oc get hyperconverged kubevirt-hyperconverged -n openshift-cnv -o yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  ...
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  certConfig:
    ca:
      duration: 48h0m0s
      renewBefore: 24h0m0s
    server:
      duration: 24h0m0s
      renewBefore: 12h0m0s
  featureGates:
    sriovLiveMigration: false
    withHostPassthroughCPU: false
  infra: {}
  liveMigrationConfig:
    bandwidthPerMigration: 64Mi
    completionTimeoutPerGiB: 800
    parallelMigrationsPerCluster: 5
    parallelOutboundMigrationsPerNode: 2
    progressTimeout: 150
  version: v4.8.0
  workloads: {}
status:
  conditions:

---

$ oc get kubevirt kubevirt-kubevirt-hyperconverged -n openshift-cnv -o yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  ...
  name: kubevirt-kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  certificateRotateStrategy:
    selfSigned:
      ca:
        duration: 48h0m0s
        renewBefore: 24h0m0s
      server:
        duration: 24h0m0s
        renewBefore: 12h0m0s
  configuration:
    developerConfiguration:
      featureGates:
      - DataVolumes
      - SRIOV
      - LiveMigration
      - CPUManager
      - CPUNodeDiscovery
      - Snapshot
      - HotplugVolumes
      - GPU
      - HostDevices
      - WithHostModelCPU
      - HypervStrictCheck
    machineType: pc-q35-rhel8.4.0
    migrations:
      bandwidthPerMigration: 64Mi
      completionTimeoutPerGiB: 800
      parallelMigrationsPerCluster: 5
      parallelOutboundMigrationsPerNode: 2
      progressTimeout: 150
    network:
      defaultNetworkInterface: masquerade
    obsoleteCPUModels:
      "486": true
      Conroe: true
      athlon: true
      core2duo: true
      coreduo: true
      kvm32: true
      kvm64: true
      n270: true
      pentium: true
      pentium2: true
      pentium3: true
      pentiumpro: true
      phenom: true
      qemu32: true
      qemu64: true
    selinuxLauncherType: virt_launcher.process
    smbios:
      family: Red Hat
      manufacturer: Red Hat
      product: Container-native virtualization
      sku: 4.8.0
      version: 4.8.0
  customizeComponents: {}
  uninstallStrategy: BlockUninstallIfWorkloadsExist
  workloadUpdateStrategy: {}
status:
  conditions:
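
The manual check above (no PCI passthrough list in either CR) could also be scripted against `-o json` output. The following is only a minimal sketch, not the automation test used for verification. It assumes the list lives at `spec.permittedHostDevices.pciHostDevices` in the HyperConverged CR and at `spec.configuration.permittedHostDevices.pciHostDevices` in the KubeVirt CR; those paths are not shown anywhere in this report. The helper name `pci_host_devices` and the two trimmed sample documents are hypothetical.

```python
import json


def pci_host_devices(cr: dict) -> list:
    """Return the PCI host devices listed in a HyperConverged or KubeVirt CR.

    Assumed field paths (not confirmed by this bug report):
      HyperConverged: spec.permittedHostDevices.pciHostDevices
      KubeVirt:       spec.configuration.permittedHostDevices.pciHostDevices
    """
    spec = cr.get("spec") or {}
    if cr.get("kind") == "KubeVirt":
        # KubeVirt nests its cluster configuration one level deeper.
        spec = spec.get("configuration") or {}
    permitted = spec.get("permittedHostDevices") or {}
    # A missing or null list is treated the same as an empty one.
    return permitted.get("pciHostDevices") or []


# Hypothetical, heavily trimmed stand-ins for `oc get ... -o json` output.
hco = json.loads(
    '{"kind": "HyperConverged", "spec": {"infra": {}, "workloads": {}}}'
)
kubevirt = json.loads(
    '{"kind": "KubeVirt",'
    ' "spec": {"configuration": {"machineType": "pc-q35-rhel8.4.0"}}}'
)

for cr in (hco, kubevirt):
    devices = pci_host_devices(cr)
    state = "empty" if not devices else f"{len(devices)} device(s)"
    print(f'{cr["kind"]}: PCI passthrough list is {state}')
```

In practice the two dictionaries would come from `json.load` on the real command output rather than from inline strings. Treating an absent key like an empty list matches the output above, where the field does not appear at all.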

Comment 9 Krzysztof Majcher 2021-06-14 08:09:18 UTC

Hi Isaac, can you update the status of this one? It seems to be done already.

Comment 10 ibesso 2021-06-15 12:33:29 UTC

Verified with HCO 4.8.0-405
---------------------------
Verified both with the updated automation test and manually.

Moving to VERIFIED.

Comment 13 errata-xmlrpc 2021-07-27 14:32:39 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2920
