Bug 1783192 - Guest kernel panic when starting RHEL6.10 guest with q35 machine type and virtio disk in cnv
Summary: Guest kernel panic when starting RHEL6.10 guest with q35 machine type and virtio...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 1.3
Hardware: x86_64
OS: Linux
urgent
high
Target Milestone: ---
: 2.6.0
Assignee: Igor Bezukh
QA Contact: Israel Pinto
URL:
Whiteboard: libvirt_CNV_INT
: 1913342 (view as bug list)
Depends On:
Blocks:
 
Reported: 2019-12-13 09:26 UTC by chhu
Modified: 2023-09-15 00:20 UTC (History)
20 users

Fixed In Version: virt-launcher-container-v2.6.0-99
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-10 11:16:12 UTC
Target Upstream Version:
Embargoed:
ibezukh: needinfo+


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github kubevirt common-templates pull 302 0 None closed Adopt RHEL6 and Centos 6 templates to new libvirt/QEMU changes 2021-01-31 15:44:34 UTC
Github kubevirt common-templates pull 305 0 None closed Centos 6 and RHEL 6 to use virtio bus for network and storage by default 2021-01-31 15:44:33 UTC
Github kubevirt kubevirt pull 4730 0 None closed Support virtio-transitional for old guests 2021-01-31 15:45:18 UTC
Github kubevirt kubevirt pull 4763 0 None closed [release-0.36] Support virtio-transitional for old guests 2021-01-31 15:44:34 UTC
Red Hat Product Errata RHSA-2021:0799 0 None None None 2021-03-10 11:17:36 UTC

Internal Links: 1911662

Description chhu 2019-12-13 09:26:37 UTC
Description of problem:
Starting a RHEL6.10 guest with the q35 machine type and a virtio disk results in a guest kernel panic.

Version-Release number of selected component (if applicable):
libvirt-daemon-driver-qemu-5.0.0-12.module+el8.0.1+3755+6782b0ed.x86_64
libvirt-daemon-kvm-5.0.0-12.module+el8.0.1+3755+6782b0ed.x86_64
qemu-kvm-core-3.1.0-30.module+el8.0.1+3755+6782b0ed.x86_64

How reproducible:
100%

Steps to Reproduce in cnv:
1. Start a rhel6.10 VMI in cnv2.1 with the yaml file asb-vmi-nfs-rhel.yaml, using a virtio disk:
    devices:
      disks:
      - disk:
          bus: virtio
        name: pvcdisk

2. Log in to the VMI; the kernel crashes:
# virtctl console asb-vmi-nfs-rhel
Successfully connected to asb-vmi-nfs-rhel console. The escape sequence is ^]
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Not tainted 2.6.32-754.el6.x86_64 #1
Call Trace:
 [<ffffffff8155344d>] ? panic+0xa7/0x18b
 [<ffffffff8112e250>] ? perf_event_exit_task+0xc0/0x340
 [<ffffffff810845f3>] ? do_exit+0x853/0x860
 [<ffffffff8119f2b5>] ? fput+0x25/0x30
 [<ffffffff81084658>] ? do_group_exit+0x58/0xd0
 [<ffffffff810846e7>] ? sys_exit_group+0x17/0x20
 [<ffffffff8155f3cb>] ? system_call_fastpath+0x2f/0x34

3. Try to change virtio to virtio-transitional; it is not supported in cnv2.1 yet:
# oc create -f asb-vmi-nfs-rhel.yaml
------------------------------------
    devices:
      disks:
      - disk:
          bus: virtio-transitional
        name: pvcdisk
-------------------------------------
The  "" is invalid: spec.domain.devices.disks[0].disk.bus: spec.domain.devices.disks[0] is set with an unrecognized bus virtio-transitional, must be one of: [virtio sata scsi]

Actual results:
In step 2: the guest kernel panics

Expected results:
In step 2: the guest starts successfully

Comment 4 Jaroslav Suchanek 2020-03-10 15:11:04 UTC
Switching this one to CNV as libvirt cannot do anything about it. This is a policy decision.

Comment 5 Fabian Deutsch 2020-03-11 14:32:28 UTC
According to https://access.redhat.com/articles/4234591, RHEL 6 is not supported at this point in time.

Comment 9 Kedar Bidarkar 2020-07-22 12:23:09 UTC
As per this kbase article, https://access.redhat.com/articles/4234591, we plan to close this bug, since RHEL6 is not supported with OpenShift Virtualization.

Comment 13 Fabian Deutsch 2020-11-18 13:36:22 UTC
The RHEL 6 templates in OpenShift Virtualization use the sata disk bus, because virtio is known to have issues.

The easy solution is to switch the bus type to sata.

An alternative solution is to place the disks on the PCI root bus; Israel explored this option.
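
For illustration, a minimal sketch of the sata workaround in the VMI spec (reusing the disk name "pvcdisk" from the reproduction steps above) would be:

    devices:
      disks:
      - disk:
          bus: sata
        name: pvcdisk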

Comment 14 sgott 2020-12-16 13:24:00 UTC
Omer, it appears that we already set the bus type to sata for RHEL 6. Can you confirm that?

Comment 15 Dr. David Alan Gilbert 2020-12-16 17:45:44 UTC
As I said on bz 1892340, the way to debug this is to attach a console, do a reset of the VM, and then edit grub, removing the 'quiet rhgb' kernel parameters, so you can watch the full boot and see where the root device goes missing.
Moving the PCIe devices to bus 0 makes this work with virtio devices.
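
For anyone reproducing this: on RHEL 6's legacy GRUB that means (roughly) pressing 'e' on the boot entry and again on the kernel line, deleting 'quiet rhgb', and booting with 'b'. And a trimmed sketch of what "disks on bus 0" looks like in the libvirt domain XML (driver and source elements omitted; slot 0x05 is only an example of a free slot on bus 0x00):

 <disk type='block' device='disk'>
   <target dev='vda' bus='virtio'/>
   <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
 </disk>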

Comment 21 Igor Bezukh 2021-01-10 07:40:33 UTC
The fix is now backported. 
I assume we need to update the templates to turn on the new API boolean "spec.domain.devices.useVirtioTransitional" for RHEL 6.x
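
For reference, a minimal fragment of a VM spec with the flag turned on (the field path matches the spec that was later verified in comment 31) would look like:

spec:
  template:
    spec:
      domain:
        devices:
          useVirtioTransitional: true
          disks:
          - disk:
              bus: virtio
            name: rootdisk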

Comment 22 Dan Kenigsberg 2021-01-10 07:58:19 UTC
(In reply to Igor Bezukh from comment #21)
> The fix is now backported. 

Which fix are you referring to? Please link all relevant PRs to the bug. Once they are all merged, move the BZ to MODIFIED.

> I assume we need to update the templates to turn on the new API boolean
> "spec.domain.devices.useVirtioTransitional" for RHEL 6.x

+1.

Comment 23 Igor Bezukh 2021-01-10 09:07:56 UTC
(In reply to Dan Kenigsberg from comment #22)
> (In reply to Igor Bezukh from comment #21)
> > The fix is now backported. 
> 
> Which fix are you referring to? Please link all relevant PRs to the bug.
> Once they are all merged, move the BZ to MODIFIED.
> 

Attached PR links. As soon as I locate them downstream (DS), I will update the "Fixed in version" field.

> > I assume we need to update the templates to turn on the new API boolean
> > "spec.domain.devices.useVirtioTransitional" for RHEL 6.x
> 
> +1.

WIP

Comment 24 Igor Bezukh 2021-01-10 13:03:12 UTC
Attached common-templates PR link

Comment 25 Israel Pinto 2021-01-10 13:09:42 UTC
*** Bug 1913342 has been marked as a duplicate of this bug. ***

Comment 27 Israel Pinto 2021-01-13 12:03:30 UTC
As the PR https://github.com/kubevirt/common-templates/pull/305/files is still open and conflicts with PR https://github.com/kubevirt/common-templates/pull/292/files on the NIC model, I am reopening this bug.

Comment 29 sgott 2021-01-19 12:55:39 UTC
The root cause of the verification failure was that no upstream release, which common-templates consumes, was tagged. We believe the relationship between PRs in Comment #27 is incorrect, thus moving this BZ back to ON_QA.

Comment 30 Igor Bezukh 2021-01-20 07:58:43 UTC
I've tested the rhel6.10 and centos6 images on my local KV setup, and the fix with the flag does work. I also created a VM from the templates and applied it on the local setup; it works as well.
The only remaining issue is with the upstream CI of common-templates and the KV release being untagged, so I am moving this bug to ON_QA; this scenario can be safely tested by QE.
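
For completeness, one way to double-check what libvirt actually received is to dump the domain XML from the virt-launcher pod and look for the transitional models (the pod and domain names below are illustrative placeholders, not taken from this bug):

# list the domain, then grep its XML for transitional devices
oc exec -n <namespace> <virt-launcher-pod> -c compute -- virsh list --all
oc exec -n <namespace> <virt-launcher-pod> -c compute -- virsh dumpxml <namespace>_<vm-name> | grep -i transitional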

Comment 31 Israel Pinto 2021-01-31 12:15:59 UTC
Verified with:
virt-operator-container-v2.6.0-106
virt-launcher-container-v2.6.0-106

Created a VM from the CNV common template to get useVirtioTransitional: true,
and updated the disk bus and interface model to virtio; see the VM spec [1].
Tested:
1. VM is running with virtio drivers
2. Connect via console and VNC
3. Connect with SSH
All PASS
 

[1]

apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  annotations:
    kubevirt.io/latest-observed-api-version: v1alpha3
    kubevirt.io/storage-observed-api-version: v1alpha3
    name.os.template.kubevirt.io/rhel6.10: Red Hat Enterprise Linux 6.0 or higher
    vm.kubevirt.io/flavor: small
    vm.kubevirt.io/os: rhel6
    vm.kubevirt.io/validations: |
      [
        {
          "name": "minimal-required-memory",
          "path": "jsonpath::.spec.domain.resources.requests.memory",
          "rule": "integer",
          "message": "This VM requires more memory.",
          "min": 536870912
        }
      ]
    vm.kubevirt.io/workload: server
  selfLink: /apis/kubevirt.io/v1alpha3/namespaces/rhel6/virtualmachines/rhel6-legal-gull
  resourceVersion: '2795855'
  name: rhel6-legal-gull
  uid: 9186b5ae-0dcb-46ea-92a7-36ae3e9f7fa9
  creationTimestamp: '2021-01-31T10:02:54Z'
  generation: 2
  managedFields:
    - apiVersion: kubevirt.io/v1alpha3
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:name.os.template.kubevirt.io/rhel6.10': {}
            'f:vm.kubevirt.io/flavor': {}
            'f:vm.kubevirt.io/os': {}
            'f:vm.kubevirt.io/validations': {}
            'f:vm.kubevirt.io/workload': {}
          'f:labels':
            'f:vm.kubevirt.io/template.version': {}
            'f:vm.kubevirt.io/template.namespace': {}
            'f:app': {}
            .: {}
            'f:os.template.kubevirt.io/rhel6.10': {}
            'f:vm.kubevirt.io/template.revision': {}
            'f:workload.template.kubevirt.io/server': {}
            'f:flavor.template.kubevirt.io/small': {}
            'f:vm.kubevirt.io/template': {}
        'f:spec':
          .: {}
          'f:dataVolumeTemplates': {}
          'f:running': {}
          'f:template':
            .: {}
            'f:metadata':
              .: {}
              'f:labels':
                .: {}
                'f:flavor.template.kubevirt.io/small': {}
                'f:kubevirt.io/domain': {}
                'f:kubevirt.io/size': {}
                'f:os.template.kubevirt.io/rhel6.10': {}
                'f:vm.kubevirt.io/name': {}
                'f:workload.template.kubevirt.io/server': {}
            'f:spec':
              .: {}
              'f:domain':
                .: {}
                'f:cpu':
                  .: {}
                  'f:cores': {}
                  'f:sockets': {}
                  'f:threads': {}
                'f:devices':
                  .: {}
                  'f:disks': {}
                  'f:interfaces': {}
                  'f:rng': {}
                  'f:useVirtioTransitional': {}
                'f:machine':
                  .: {}
                  'f:type': {}
                'f:resources':
                  .: {}
                  'f:requests':
                    .: {}
                    'f:memory': {}
              'f:evictionStrategy': {}
              'f:hostname': {}
              'f:networks': {}
              'f:terminationGracePeriodSeconds': {}
              'f:volumes': {}
      manager: Mozilla
      operation: Update
      time: '2021-01-31T10:02:54Z'
    - apiVersion: kubevirt.io/v1alpha3
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            'f:kubevirt.io/latest-observed-api-version': {}
            'f:kubevirt.io/storage-observed-api-version': {}
        'f:status':
          .: {}
          'f:conditions': {}
          'f:created': {}
          'f:ready': {}
          'f:volumeSnapshotStatuses': {}
      manager: virt-controller
      operation: Update
      time: '2021-01-31T10:06:09Z'
  namespace: rhel6
  labels:
    app: rhel6-legal-gull
    flavor.template.kubevirt.io/small: 'true'
    os.template.kubevirt.io/rhel6.10: 'true'
    vm.kubevirt.io/template: rhel6-server-small
    vm.kubevirt.io/template.namespace: openshift
    vm.kubevirt.io/template.revision: '1'
    vm.kubevirt.io/template.version: v0.13.1
    workload.template.kubevirt.io/server: 'true'
spec:
  dataVolumeTemplates:
    - apiVersion: cdi.kubevirt.io/v1beta1
      kind: DataVolume
      metadata:
        creationTimestamp: null
        name: rhel6-legal-gull-rootdisk
      spec:
        pvc:
          accessModes:
            - ReadWriteMany
          resources:
            requests:
              storage: 20Gi
          storageClassName: ocs-storagecluster-ceph-rbd
          volumeMode: Block
        source:
          http:
            url: >-
              http://cnv-qe-server.rhevdev.lab.eng.rdu2.redhat.com/files/cnv-tests/rhel-images/rhel-610.qcow2
  running: true
  template:
    metadata:
      creationTimestamp: null
      labels:
        flavor.template.kubevirt.io/small: 'true'
        kubevirt.io/domain: rhel6-legal-gull
        kubevirt.io/size: small
        os.template.kubevirt.io/rhel6.10: 'true'
        vm.kubevirt.io/name: rhel6-legal-gull
        workload.template.kubevirt.io/server: 'true'
    spec:
      domain:
        cpu:
          cores: 1
          sockets: 1
          threads: 1
        devices:
          disks:
            - bootOrder: 1
              disk:
                bus: virtio
              name: rootdisk
            - disk:
                bus: sata
              name: cloudinitdisk
          interfaces:
            - masquerade: {}
              model: virtio
              name: default
          rng: {}
          useVirtioTransitional: true
        machine:
          type: pc-q35-rhel8.3.0
        resources:
          requests:
            memory: 2Gi
      evictionStrategy: LiveMigrate
      hostname: rhel6-legal-gull
      networks:
        - name: default
          pod: {}
      terminationGracePeriodSeconds: 180
      volumes:
        - dataVolume:
            name: rhel6-legal-gull-rootdisk
          name: rootdisk
        - cloudInitNoCloud:
            userData: |-
              #cloud-config
              user: cloud-user
              password: rfh8-snre-fau3
              chpasswd: { expire: False }
          name: cloudinitdisk
status:
  conditions:
    - lastProbeTime: null
      lastTransitionTime: '2021-01-31T10:06:06Z'
      status: 'True'
      type: Ready
  created: true
  ready: true
  volumeSnapshotStatuses:
    - enabled: true
      name: rootdisk
    - enabled: false
      name: cloudinitdisk
      reason: Volume type does not support snapshots

Comment 34 errata-xmlrpc 2021-03-10 11:16:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0799

Comment 35 Red Hat Bugzilla 2023-09-15 00:20:12 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

