Bug 1911662 - el6 guests don't work properly if virtio bus is specified on various devices
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 2.6.0
Assignee: Roman Mohr
QA Contact: Israel Pinto
URL:
Whiteboard:
Duplicates: 1794243
Depends On:
Blocks:
 
Reported: 2020-12-30 15:29 UTC by Roman Mohr
Modified: 2024-06-13 23:55 UTC
CC List: 6 users

Fixed In Version: hco-bundle-registry-container-v2.6.0-489 virt-operator-container-v2.6.0-100
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-10 11:22:46 UTC
Target Upstream Version:
Embargoed:


Links:
 * GitHub kubevirt/kubevirt pull 4730 (closed): Support virtio-transitional for old guests (last updated 2021-02-09 16:25:17 UTC)
 * Red Hat Bugzilla 1783192 (urgent, CLOSED): Guest kernel panic when start RHEL6.10 guest with q35 machine type and virtio disk in cnv (last updated 2024-06-13 22:20:29 UTC)
 * Red Hat Bugzilla 1794243 (medium, CLOSED): Default network is not work in rhel6.10 q35 VMI (last updated 2021-08-18 09:26:32 UTC)
 * Red Hat Issue Tracker CNV-9261 (last updated 2024-06-13 23:55:16 UTC)
 * Red Hat Product Errata RHSA-2021:0799 (last updated 2021-03-10 11:23:17 UTC)

Internal Links: 1911786

Description Roman Mohr 2020-12-30 15:29:22 UTC
Description of problem:


el6 guests only properly support the `virtio-transitional` device models. In CNV it is currently not possible, or only possible in a very convoluted way, to force devices to use the `virtio-transitional` models.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Start a rhel6 VM with the virtio bus on its network interface.
2. Try to boot from a virtio disk that is not forced to be placed on the root complex.

Actual results:

In both cases rhel6 can't use the devices.

Expected results:

The guest can use the virtio devices.

Additional info:
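
For reference, a minimal device configuration that triggers the problem. This excerpt is a sketch following the field layout of the verification spec in comment 7; only the relevant devices section is shown:

```
spec:
  domain:
    devices:
      disks:
        - name: rootdisk
          bootOrder: 1
          disk:
            bus: virtio   # on q35 this becomes a modern (non-transitional) virtio-blk device
      interfaces:
        - name: default
          masquerade: {}
          model: virtio   # likewise a modern virtio-net device, unusable by el6
```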

Comment 1 Roman Mohr 2020-12-30 15:31:23 UTC
https://github.com/kubevirt/kubevirt/pull/4730 adds a boolean flag indicating that the whole VM should use `virtio-transitional` models everywhere it can.
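
A minimal sketch of how the flag is set on a VM, using the field name that appears in the verification spec in comment 7:

```
spec:
  domain:
    devices:
      # Ask KubeVirt to use virtio-transitional models for every
      # virtio device where a transitional variant exists.
      useVirtioTransitional: true
```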

Comment 2 Roman Mohr 2021-01-21 15:37:57 UTC
FYI, two more PRs related to this will land in master soon:

 1. https://github.com/kubevirt/kubevirt/pull/4862 (allows the use of virtio-(non-)transitional on the ballooning device)
 2. https://github.com/kubevirt/kubevirt/pull/4850 (makes the scsi controller use the new virtio values)

Regarding (1), sticking with plain virtio for the ballooning device did not reveal any problems with el6 guests, so we probably don't have to backport it if we don't want to.
Regarding (2), here we may pick the wrong virtio bus for the scsi controller, which could have an impact on hotplug and scsi drives on old el6 guests.

Comment 3 Andrea Bolognani 2021-01-22 16:24:16 UTC
(In reply to Roman Mohr from comment #2)
>  1. https://github.com/kubevirt/kubevirt/pull/4862 (allows the use of
> virtio-(non-)transitional on the ballooning device)
>
> Regarding (1), sticking with plain virtio for the ballooning device did
> not reveal any problems with el6 guests, so we probably don't have to
> backport it if we don't want to.

Did you verify that the RHEL 6 guest not only boots with
non-transitional memballoon, but also that the ballooning actually
works afterwards? My understanding is that it would not.

I recommend backporting (1) either way, because having a single
device not obey the newly-introduced knob sounds like the perfect way
to cause subtle breakage that we won't discover until months down the
line :)

Comment 4 Roman Mohr 2021-01-25 13:50:11 UTC
(In reply to Andrea Bolognani from comment #3)
> I recommend backporting (1) either way, because having a single
> device not obey the newly-introduced knob sounds like the perfect way
> to cause subtle breakage that we won't discover until months down the
> line :)

That would definitely make sense, but I think it will take some time until we can consume 8.3.1, if I understand Stu correctly.
If we backport it now, we would break CNV 2.6 for QE until that version is available.

Comment 5 Roman Mohr 2021-01-25 13:58:23 UTC
(In reply to Andrea Bolognani from comment #3)
> Did you verify that the RHEL 6 guest not only boots with
> non-transitional memballoon, but also that the ballooning actually
> works afterwards? My understanding is that it would not.


I can see the following balloon stats for the running guest:

```
  balloon.current=1000448
  balloon.maximum=1000448
  balloon.last-update=0
  balloon.rss=328316
```

This is the QEMU command line:

```
/usr/libexec/qemu-kvm \
  -name guest=default_vmi-centos6,debug-threads=on \
  -S \
  -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-default_vmi-centos6/master-key.aes \
  -machine pc-q35-rhel8.3.0,accel=kvm,usb=off,dump-guest-core=off \
  -cpu Skylake-Client-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on \
  -m 977 \
  -overcommit mem-lock=off \
  -smp 1,sockets=1,dies=1,cores=1,threads=1 \
  -object iothread,id=iothread1 \
  -uuid 0aefd280-47db-43ca-b37d-0312cf6c4489 \
  -smbios type=1,manufacturer=KubeVirt,product=None,uuid=0aefd280-47db-43ca-b37d-0312cf6c4489,family=KubeVirt \
  -no-user-config \
  -nodefaults \
  -chardev socket,id=charmonitor,fd=18,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control \
  -rtc base=utc \
  -no-shutdown \
  -boot strict=on \
  -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
  -device pcie-pci-bridge,id=pci.2,bus=pci.1,addr=0x0 \
  -device pcie-root-port,port=0x11,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x1 \
  -device pcie-root-port,port=0x12,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x2 \
  -device pcie-root-port,port=0x13,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x3 \
  -device virtio-scsi-pci,id=scsi0,bus=pci.3,addr=0x0 \
  -device virtio-serial-pci-transitional,id=virtio-serial0,bus=pci.2,addr=0x2 \
  -blockdev {"driver":"file","filename":"/var/run/kubevirt/container-disks/disk_0.img","node-name":"libvirt-3-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} \
  -blockdev {"node-name":"libvirt-3-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-3-storage","backing":null} \
  -blockdev {"driver":"file","filename":"/var/run/kubevirt-ephemeral-disks/disk-data/containerdisk/disk.qcow2","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} \
  -blockdev {"node-name":"libvirt-2-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-2-storage","backing":"libvirt-3-format"} \
  -device virtio-blk-pci-transitional,bus=pci.2,addr=0x3,drive=libvirt-2-format,id=ua-containerdisk,bootindex=1,write-cache=on \
  -blockdev {"driver":"file","filename":"/var/run/kubevirt-ephemeral-disks/cloud-init-data/default/vmi-centos6/noCloud.iso","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} \
  -blockdev {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"} \
  -device virtio-blk-pci-transitional,bus=pci.2,addr=0x4,drive=libvirt-1-format,id=ua-cloudinitdisk,write-cache=on \
  -netdev tap,fd=20,id=hostua-default,vhost=on,vhostfd=21 \
  -device virtio-net-pci-transitional,host_mtu=1440,netdev=hostua-default,id=ua-default,mac=8a:77:3e:ba:48:0f,bus=pci.2,addr=0x1,romfile= \
  -chardev socket,id=charserial0,fd=22,server,nowait \
  -device isa-serial,chardev=charserial0,id=serial0 \
  -chardev socket,id=charchannel0,fd=23,server,nowait \
  -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
  -vnc vnc=unix:/var/run/kubevirt-private/a1c0ccc9-a90b-4800-86bb-a88fcc9c3488/virt-vnc \
  -device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.4,addr=0x0 \
  -object rng-random,id=objrng0,filename=/dev/urandom \
  -device virtio-rng-pci-transitional,rng=objrng0,id=rng0,bus=pci.2,addr=0x5 \
  -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
  -msg timestamp=on
```

Comment 6 Roman Mohr 2021-01-25 13:58:50 UTC
And `balloon.last-update=0` probably means that no update ever came from the guest.

Comment 7 Israel Pinto 2021-01-31 12:16:28 UTC
Verified with:
virt-operator-container-v2.6.0-106
virt-launcher-container-v2.6.0-106


Created a VM from the CNV common template to get useVirtioTransitional: true,
and updated the disk and interface model to virtio; see the VM spec [1].
Tested:
1. VM is running with the virtio drivers
2. Connect via console and VNC
3. Connect with SSH
All PASS
 

[1]

apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  annotations:
    kubevirt.io/latest-observed-api-version: v1alpha3
    kubevirt.io/storage-observed-api-version: v1alpha3
    name.os.template.kubevirt.io/rhel6.10: Red Hat Enterprise Linux 6.0 or higher
    vm.kubevirt.io/flavor: small
    vm.kubevirt.io/os: rhel6
    vm.kubevirt.io/validations: |
      [
        {
          "name": "minimal-required-memory",
          "path": "jsonpath::.spec.domain.resources.requests.memory",
          "rule": "integer",
          "message": "This VM requires more memory.",
          "min": 536870912
        }
      ]
    vm.kubevirt.io/workload: server
  selfLink: /apis/kubevirt.io/v1alpha3/namespaces/rhel6/virtualmachines/rhel6-legal-gull
  resourceVersion: '2795855'
  name: rhel6-legal-gull
  uid: 9186b5ae-0dcb-46ea-92a7-36ae3e9f7fa9
  creationTimestamp: '2021-01-31T10:02:54Z'
  generation: 2
  managedFields:
    - apiVersion: kubevirt.io/v1alpha3
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:name.os.template.kubevirt.io/rhel6.10': {}
            'f:vm.kubevirt.io/flavor': {}
            'f:vm.kubevirt.io/os': {}
            'f:vm.kubevirt.io/validations': {}
            'f:vm.kubevirt.io/workload': {}
          'f:labels':
            'f:vm.kubevirt.io/template.version': {}
            'f:vm.kubevirt.io/template.namespace': {}
            'f:app': {}
            .: {}
            'f:os.template.kubevirt.io/rhel6.10': {}
            'f:vm.kubevirt.io/template.revision': {}
            'f:workload.template.kubevirt.io/server': {}
            'f:flavor.template.kubevirt.io/small': {}
            'f:vm.kubevirt.io/template': {}
        'f:spec':
          .: {}
          'f:dataVolumeTemplates': {}
          'f:running': {}
          'f:template':
            .: {}
            'f:metadata':
              .: {}
              'f:labels':
                .: {}
                'f:flavor.template.kubevirt.io/small': {}
                'f:kubevirt.io/domain': {}
                'f:kubevirt.io/size': {}
                'f:os.template.kubevirt.io/rhel6.10': {}
                'f:vm.kubevirt.io/name': {}
                'f:workload.template.kubevirt.io/server': {}
            'f:spec':
              .: {}
              'f:domain':
                .: {}
                'f:cpu':
                  .: {}
                  'f:cores': {}
                  'f:sockets': {}
                  'f:threads': {}
                'f:devices':
                  .: {}
                  'f:disks': {}
                  'f:interfaces': {}
                  'f:rng': {}
                  'f:useVirtioTransitional': {}
                'f:machine':
                  .: {}
                  'f:type': {}
                'f:resources':
                  .: {}
                  'f:requests':
                    .: {}
                    'f:memory': {}
              'f:evictionStrategy': {}
              'f:hostname': {}
              'f:networks': {}
              'f:terminationGracePeriodSeconds': {}
              'f:volumes': {}
      manager: Mozilla
      operation: Update
      time: '2021-01-31T10:02:54Z'
    - apiVersion: kubevirt.io/v1alpha3
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            'f:kubevirt.io/latest-observed-api-version': {}
            'f:kubevirt.io/storage-observed-api-version': {}
        'f:status':
          .: {}
          'f:conditions': {}
          'f:created': {}
          'f:ready': {}
          'f:volumeSnapshotStatuses': {}
      manager: virt-controller
      operation: Update
      time: '2021-01-31T10:06:09Z'
  namespace: rhel6
  labels:
    app: rhel6-legal-gull
    flavor.template.kubevirt.io/small: 'true'
    os.template.kubevirt.io/rhel6.10: 'true'
    vm.kubevirt.io/template: rhel6-server-small
    vm.kubevirt.io/template.namespace: openshift
    vm.kubevirt.io/template.revision: '1'
    vm.kubevirt.io/template.version: v0.13.1
    workload.template.kubevirt.io/server: 'true'
spec:
  dataVolumeTemplates:
    - apiVersion: cdi.kubevirt.io/v1beta1
      kind: DataVolume
      metadata:
        creationTimestamp: null
        name: rhel6-legal-gull-rootdisk
      spec:
        pvc:
          accessModes:
            - ReadWriteMany
          resources:
            requests:
              storage: 20Gi
          storageClassName: ocs-storagecluster-ceph-rbd
          volumeMode: Block
        source:
          http:
            url: >-
              http://cnv-qe-server.rhevdev.lab.eng.rdu2.redhat.com/files/cnv-tests/rhel-images/rhel-610.qcow2
  running: true
  template:
    metadata:
      creationTimestamp: null
      labels:
        flavor.template.kubevirt.io/small: 'true'
        kubevirt.io/domain: rhel6-legal-gull
        kubevirt.io/size: small
        os.template.kubevirt.io/rhel6.10: 'true'
        vm.kubevirt.io/name: rhel6-legal-gull
        workload.template.kubevirt.io/server: 'true'
    spec:
      domain:
        cpu:
          cores: 1
          sockets: 1
          threads: 1
        devices:
          disks:
            - bootOrder: 1
              disk:
                bus: virtio
              name: rootdisk
            - disk:
                bus: sata
              name: cloudinitdisk
          interfaces:
            - masquerade: {}
              model: virtio
              name: default
          rng: {}
          useVirtioTransitional: true
        machine:
          type: pc-q35-rhel8.3.0
        resources:
          requests:
            memory: 2Gi
      evictionStrategy: LiveMigrate
      hostname: rhel6-legal-gull
      networks:
        - name: default
          pod: {}
      terminationGracePeriodSeconds: 180
      volumes:
        - dataVolume:
            name: rhel6-legal-gull-rootdisk
          name: rootdisk
        - cloudInitNoCloud:
            userData: |-
              #cloud-config
              user: cloud-user
              password: rfh8-snre-fau3
              chpasswd: { expire: False }
          name: cloudinitdisk
status:
  conditions:
    - lastProbeTime: null
      lastTransitionTime: '2021-01-31T10:06:06Z'
      status: 'True'
      type: Ready
  created: true
  ready: true
  volumeSnapshotStatuses:
    - enabled: true
      name: rootdisk
    - enabled: false
      name: cloudinitdisk
      reason: Volume type does not support snapshots

Comment 10 errata-xmlrpc 2021-03-10 11:22:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0799

Comment 11 Petr Horáček 2021-08-18 09:26:23 UTC
*** Bug 1794243 has been marked as a duplicate of this bug. ***

