Bug 1911662

Summary: el6 guests don't work properly if virtio bus is specified on various devices
Product: Container Native Virtualization (CNV)
Component: Virtualization
Version: 2.6.0
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Reporter: Roman Mohr <rmohr>
Assignee: Roman Mohr <rmohr>
QA Contact: Israel Pinto <ipinto>
CC: abologna, chhu, cnv-qe-bugs, danken, oarribas, sgott
Target Milestone: ---
Target Release: 2.6.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: hco-bundle-registry-container-v2.6.0-489 virt-operator-container-v2.6.0-100
Type: Bug
Last Closed: 2021-03-10 11:22:46 UTC

Description Roman Mohr 2020-12-30 15:29:22 UTC
Description of problem:


el6 guests only properly support `virtio-transitional` device models. In CNV it is currently not possible, or only possible in a very convoluted fashion, to force devices to use `virtio-transitional` models.


How reproducible:

 * Start a rhel6 VM with virtio bus on its networking interface
 * Try to boot from a virtio disk which is not forced to be placed on the root complex

In both cases rhel6 can't use the devices.
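The underlying reason is that el6 kernels only know the legacy/transitional virtio interface; "modern-only" (virtio 1.0+) devices expose a different PCI device ID range that the old drivers never bind to. As an illustrative sketch (the ID ranges come from the virtio specification, not from this bug):

```python
# Illustrative helper: classify a virtio PCI device ID (vendor 0x1af4)
# according to the virtio spec ID ranges. Old guests such as el6 only
# bind to devices in the legacy/transitional range.
VIRTIO_VENDOR_ID = 0x1AF4

def virtio_flavor(device_id: int) -> str:
    """Return which virtio flavor a PCI device ID belongs to."""
    if 0x1000 <= device_id <= 0x103F:
        # Legacy/transitional range: visible to pre-virtio-1.0 drivers (el6).
        return "transitional"
    if 0x1040 <= device_id <= 0x107F:
        # Modern-only range (virtio 1.0+): el6 drivers do not recognize these.
        return "modern-only"
    return "unknown"

# e.g. virtio-blk: transitional is device ID 0x1001, modern-only is 0x1042
```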



Comment 1 Roman Mohr 2020-12-30 15:31:23 UTC
https://github.com/kubevirt/kubevirt/pull/4730 adds a boolean indicating that the whole VM should use `virtio-transitional` wherever possible.
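A minimal sketch of where that boolean sits in the VM spec (the `useVirtioTransitional` field name is from the PR; the surrounding names are illustrative, not taken from this bug):

```
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstance
metadata:
  name: vmi-el6          # illustrative name
spec:
  domain:
    devices:
      useVirtioTransitional: true   # force virtio-transitional models where supported
      disks:
        - name: rootdisk
          disk:
            bus: virtio
```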

Comment 2 Roman Mohr 2021-01-21 15:37:57 UTC
FYI, two more PRs related to this will land in master soon:

 1. https://github.com/kubevirt/kubevirt/pull/4862 (allows the use of virtio-(non-)transitional on the ballooning device)
 2. https://github.com/kubevirt/kubevirt/pull/4850 (will make the scsi controller use the new virtio values)

Regarding (1), sticking with virtio for the ballooning device did not reveal any problems with el6 guests, so we probably don't have to backport it if we don't want to.
Regarding (2), here we may pick the wrong virtio bus for the scsi controller, which could have an impact on hotplug and scsi drives on old el6 guests.
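For reference, at the libvirt level these settings map onto the `virtio-transitional` device models introduced in libvirt 5.2. A sketch of the relevant domain XML fragments (illustrative, not this bug's actual domain XML):

```
<!-- Sketch: virtio-transitional models in libvirt domain XML -->
<controller type='scsi' model='virtio-transitional'/>
<interface type='ethernet'>
  <model type='virtio-transitional'/>
</interface>
<memballoon model='virtio-transitional'/>
```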

Comment 3 Andrea Bolognani 2021-01-22 16:24:16 UTC
(In reply to Roman Mohr from comment #2)
> FYI, two more PRs will land regarding to this in master soon:
> 
>  1. https://github.com/kubevirt/kubevirt/pull/4862 (allows the usage of
> virtio-(non-)transitional on the ballooning device)
>  2. https://github.com/kubevirt/kubevirt/pull/4850 (will make on the scsi
> controller use of the new virtio values)
> 
> Regarding to (1), sticking with virtio for the ballooning device did not
> reveale any problems regarding to el6 guests, so we probably don't have to
> backport it, if we don't want.

Did you verify that the RHEL 6 guest not only boots with
non-transitional memballoon, but also that the ballooning actually
works afterwards? My understanding is that it would not.

I recommend backporting (1) either way, because having a single
device not obey the newly-introduced knob sounds like the perfect way
to cause subtle breakage that we won't discover until months down the
line :)

Comment 4 Roman Mohr 2021-01-25 13:50:11 UTC
(In reply to Andrea Bolognani from comment #3)
> (In reply to Roman Mohr from comment #2)
> > FYI, two more PRs will land regarding to this in master soon:
> > 
> >  1. https://github.com/kubevirt/kubevirt/pull/4862 (allows the usage of
> > virtio-(non-)transitional on the ballooning device)
> >  2. https://github.com/kubevirt/kubevirt/pull/4850 (will make on the scsi
> > controller use of the new virtio values)
> > 
> > Regarding to (1), sticking with virtio for the ballooning device did not
> > reveale any problems regarding to el6 guests, so we probably don't have to
> > backport it, if we don't want.
> 
> Did you verify that the RHEL 6 guest not only boots with
> non-transitional memballoon, but also that the ballooning actually
> works afterwards? My understanding is that it would not.
> 
> I recommend backporting (1) either way, because having a single
> device not obey the newly-introduced knob sounds like the perfect way
> to cause subtle breakage that we won't discover until months down the
> line :)

That would definitely make sense, but I think it will take some time until we can consume 8.3.1, if I understand Stu correctly.
If we backport it now, we would break CNV 2.6 for QE until that version is available.

Comment 5 Roman Mohr 2021-01-25 13:58:23 UTC
(In reply to Andrea Bolognani from comment #3)
> (In reply to Roman Mohr from comment #2)
> > FYI, two more PRs will land regarding to this in master soon:
> > 
> >  1. https://github.com/kubevirt/kubevirt/pull/4862 (allows the usage of
> > virtio-(non-)transitional on the ballooning device)
> >  2. https://github.com/kubevirt/kubevirt/pull/4850 (will make on the scsi
> > controller use of the new virtio values)
> > 
> > Regarding to (1), sticking with virtio for the ballooning device did not
> > reveale any problems regarding to el6 guests, so we probably don't have to
> > backport it, if we don't want.
> 
> Did you verify that the RHEL 6 guest not only boots with
> non-transitional memballoon, but also that the ballooning actually
> works afterwards? My understanding is that it would not.


I can see the following balloon stats for the guest:


```
  balloon.current=1000448
  balloon.maximum=1000448
  balloon.last-update=0
  balloon.rss=328316

```

This is the qemu command line:


```
/usr/libexec/qemu-kvm -name guest=default_vmi-centos6,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-default_vmi-centos6/master-key.aes -machine pc-q35-rhel8.3.0,accel=kvm,usb=off,dump-guest-core=off -cpu Skylake-Client-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on -m 977 -overcommit mem-lock=off -smp 1,sockets=1,dies=1,cores=1,threads=1 -object iothread,id=iothread1 -uuid 0aefd280-47db-43ca-b37d-0312cf6c4489 -smbios type=1,manufacturer=KubeVirt,product=None,uuid=0aefd280-47db-43ca-b37d-0312cf6c4489,family=KubeVirt -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=18,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-pci-bridge,id=pci.2,bus=pci.1,addr=0x0 -device pcie-root-port,port=0x11,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x3 -device virtio-scsi-pci,id=scsi0,bus=pci.3,addr=0x0 -device virtio-serial-pci-transitional,id=virtio-serial0,bus=pci.2,addr=0x2 -blockdev {"driver":"file","filename":"/var/run/kubevirt/container-disks/disk_0.img","node-name":"libvirt-3-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-3-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-3-storage","backing":null} -blockdev {"driver":"file","filename":"/var/run/kubevirt-ephemeral-disks/disk-data/containerdisk/disk.qcow2","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} 
-blockdev {"node-name":"libvirt-2-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-2-storage","backing":"libvirt-3-format"} -device virtio-blk-pci-transitional,bus=pci.2,addr=0x3,drive=libvirt-2-format,id=ua-containerdisk,bootindex=1,write-cache=on -blockdev {"driver":"file","filename":"/var/run/kubevirt-ephemeral-disks/cloud-init-data/default/vmi-centos6/noCloud.iso","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"} -device virtio-blk-pci-transitional,bus=pci.2,addr=0x4,drive=libvirt-1-format,id=ua-cloudinitdisk,write-cache=on -netdev tap,fd=20,id=hostua-default,vhost=on,vhostfd=21 -device virtio-net-pci-transitional,host_mtu=1440,netdev=hostua-default,id=ua-default,mac=8a:77:3e:ba:48:0f,bus=pci.2,addr=0x1,romfile= -chardev socket,id=charserial0,fd=22,server,nowait -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=23,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -vnc vnc=unix:/var/run/kubevirt-private/a1c0ccc9-a90b-4800-86bb-a88fcc9c3488/virt-vnc -device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 -device virtio-balloon-pci,id=balloon0,bus=pci.4,addr=0x0 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci-transitional,rng=objrng0,id=rng0,bus=pci.2,addr=0x5 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=o
```

Comment 6 Roman Mohr 2021-01-25 13:58:50 UTC
And `balloon.last-update=0` probably means that the guest balloon driver never performed an update.

Comment 7 Israel Pinto 2021-01-31 12:16:28 UTC
Verified with:
virt-operator-container-v2.6.0-106
virt-launcher-container-v2.6.0-106


Created a VM from the CNV common template to get useVirtioTransitional: true,
and updated the disk and network interface to the virtio bus; see VM spec [1].
Tested:
1. VM is running with virtio drivers
2. Connect via console and VNC
3. Connect with SSH
All PASS

[1]

apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  annotations:
    kubevirt.io/latest-observed-api-version: v1alpha3
    kubevirt.io/storage-observed-api-version: v1alpha3
    name.os.template.kubevirt.io/rhel6.10: Red Hat Enterprise Linux 6.0 or higher
    vm.kubevirt.io/flavor: small
    vm.kubevirt.io/os: rhel6
    vm.kubevirt.io/validations: |
      [
        {
          "name": "minimal-required-memory",
          "path": "jsonpath::.spec.domain.resources.requests.memory",
          "rule": "integer",
          "message": "This VM requires more memory.",
          "min": 536870912
        }
      ]
    vm.kubevirt.io/workload: server
  selfLink: /apis/kubevirt.io/v1alpha3/namespaces/rhel6/virtualmachines/rhel6-legal-gull
  resourceVersion: '2795855'
  name: rhel6-legal-gull
  uid: 9186b5ae-0dcb-46ea-92a7-36ae3e9f7fa9
  creationTimestamp: '2021-01-31T10:02:54Z'
  generation: 2
  managedFields:
    - apiVersion: kubevirt.io/v1alpha3
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:name.os.template.kubevirt.io/rhel6.10': {}
            'f:vm.kubevirt.io/flavor': {}
            'f:vm.kubevirt.io/os': {}
            'f:vm.kubevirt.io/validations': {}
            'f:vm.kubevirt.io/workload': {}
          'f:labels':
            'f:vm.kubevirt.io/template.version': {}
            'f:vm.kubevirt.io/template.namespace': {}
            'f:app': {}
            .: {}
            'f:os.template.kubevirt.io/rhel6.10': {}
            'f:vm.kubevirt.io/template.revision': {}
            'f:workload.template.kubevirt.io/server': {}
            'f:flavor.template.kubevirt.io/small': {}
            'f:vm.kubevirt.io/template': {}
        'f:spec':
          .: {}
          'f:dataVolumeTemplates': {}
          'f:running': {}
          'f:template':
            .: {}
            'f:metadata':
              .: {}
              'f:labels':
                .: {}
                'f:flavor.template.kubevirt.io/small': {}
                'f:kubevirt.io/domain': {}
                'f:kubevirt.io/size': {}
                'f:os.template.kubevirt.io/rhel6.10': {}
                'f:vm.kubevirt.io/name': {}
                'f:workload.template.kubevirt.io/server': {}
            'f:spec':
              .: {}
              'f:domain':
                .: {}
                'f:cpu':
                  .: {}
                  'f:cores': {}
                  'f:sockets': {}
                  'f:threads': {}
                'f:devices':
                  .: {}
                  'f:disks': {}
                  'f:interfaces': {}
                  'f:rng': {}
                  'f:useVirtioTransitional': {}
                'f:machine':
                  .: {}
                  'f:type': {}
                'f:resources':
                  .: {}
                  'f:requests':
                    .: {}
                    'f:memory': {}
              'f:evictionStrategy': {}
              'f:hostname': {}
              'f:networks': {}
              'f:terminationGracePeriodSeconds': {}
              'f:volumes': {}
      manager: Mozilla
      operation: Update
      time: '2021-01-31T10:02:54Z'
    - apiVersion: kubevirt.io/v1alpha3
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            'f:kubevirt.io/latest-observed-api-version': {}
            'f:kubevirt.io/storage-observed-api-version': {}
        'f:status':
          .: {}
          'f:conditions': {}
          'f:created': {}
          'f:ready': {}
          'f:volumeSnapshotStatuses': {}
      manager: virt-controller
      operation: Update
      time: '2021-01-31T10:06:09Z'
  namespace: rhel6
  labels:
    app: rhel6-legal-gull
    flavor.template.kubevirt.io/small: 'true'
    os.template.kubevirt.io/rhel6.10: 'true'
    vm.kubevirt.io/template: rhel6-server-small
    vm.kubevirt.io/template.namespace: openshift
    vm.kubevirt.io/template.revision: '1'
    vm.kubevirt.io/template.version: v0.13.1
    workload.template.kubevirt.io/server: 'true'
spec:
  dataVolumeTemplates:
    - apiVersion: cdi.kubevirt.io/v1beta1
      kind: DataVolume
      metadata:
        creationTimestamp: null
        name: rhel6-legal-gull-rootdisk
      spec:
        pvc:
          accessModes:
            - ReadWriteMany
          resources:
            requests:
              storage: 20Gi
          storageClassName: ocs-storagecluster-ceph-rbd
          volumeMode: Block
        source:
          http:
            url: >-
              http://cnv-qe-server.rhevdev.lab.eng.rdu2.redhat.com/files/cnv-tests/rhel-images/rhel-610.qcow2
  running: true
  template:
    metadata:
      creationTimestamp: null
      labels:
        flavor.template.kubevirt.io/small: 'true'
        kubevirt.io/domain: rhel6-legal-gull
        kubevirt.io/size: small
        os.template.kubevirt.io/rhel6.10: 'true'
        vm.kubevirt.io/name: rhel6-legal-gull
        workload.template.kubevirt.io/server: 'true'
    spec:
      domain:
        cpu:
          cores: 1
          sockets: 1
          threads: 1
        devices:
          disks:
            - bootOrder: 1
              disk:
                bus: virtio
              name: rootdisk
            - disk:
                bus: sata
              name: cloudinitdisk
          interfaces:
            - masquerade: {}
              model: virtio
              name: default
          rng: {}
          useVirtioTransitional: true
        machine:
          type: pc-q35-rhel8.3.0
        resources:
          requests:
            memory: 2Gi
      evictionStrategy: LiveMigrate
      hostname: rhel6-legal-gull
      networks:
        - name: default
          pod: {}
      terminationGracePeriodSeconds: 180
      volumes:
        - dataVolume:
            name: rhel6-legal-gull-rootdisk
          name: rootdisk
        - cloudInitNoCloud:
            userData: |-
              #cloud-config
              user: cloud-user
              password: rfh8-snre-fau3
              chpasswd: { expire: False }
          name: cloudinitdisk
status:
  conditions:
    - lastProbeTime: null
      lastTransitionTime: '2021-01-31T10:06:06Z'
      status: 'True'
      type: Ready
  created: true
  ready: true
  volumeSnapshotStatuses:
    - enabled: true
      name: rootdisk
    - enabled: false
      name: cloudinitdisk
      reason: Volume type does not support snapshots

Comment 10 errata-xmlrpc 2021-03-10 11:22:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0799

Comment 11 Petr Horáček 2021-08-18 09:26:23 UTC
*** Bug 1794243 has been marked as a duplicate of this bug. ***