Bug 1784049 - Rhel6 guest with cluster default q35 chipset causes kernel panic
Summary: Rhel6 guest with cluster default q35 chipset causes kernel panic
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.4.0
Assignee: Liran Rotenberg
QA Contact: Nisim Simsolo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-12-16 15:07 UTC by Radek Duda
Modified: 2021-01-28 00:18 UTC

Fixed In Version: rhv-4.4.0-31
Doc Type: Bug Fix
Doc Text:
Previously, running a virtual machine (VM) with an old operating system such as RHEL 6 and a BIOS Type of Q35 Chipset caused a kernel panic. The current release fixes this issue. If a VM has an old operating system and the BIOS Type is a Q35 Chipset, it uses the VirtIO-transitional model for some devices, which enables the VM to run normally.
Clone Of:
Environment:
Last Closed: 2020-08-04 13:21:21 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:


Attachments
rhel6.10 q35 qemu cmdline (4.42 KB, text/plain)
2019-12-16 15:07 UTC, Radek Duda
kernel panic (20.25 KB, image/png)
2019-12-16 15:08 UTC, Radek Duda
rhel6 libvirt xml (11.29 KB, text/plain)
2020-01-02 13:36 UTC, Radek Duda


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 5749441 | 0 | None | None | None | 2021-01-27 23:23:33 UTC
Red Hat Product Errata | RHSA-2020:3247 | 0 | None | None | None | 2020-08-04 13:21:44 UTC
oVirt gerrit | 108136 | 0 | master | MERGED | core: support Q35 with legacy VirtIO | 2021-02-11 15:14:52 UTC
oVirt gerrit | 108220 | 0 | master | MERGED | core: remove legacy virtio to scsi | 2021-02-11 15:14:52 UTC
oVirt gerrit | 108287 | 0 | master | MERGED | core: Q35 devices virtio-transitional | 2021-02-11 15:14:52 UTC

Description Radek Duda 2019-12-16 15:07:27 UTC
Created attachment 1645617 [details]
rhel6.10 q35 qemu cmdline

Description of problem:
A RHEL 6.10 VM with the default configuration (Q35 chipset, cluster default) is unusable: a kernel panic occurs during boot.

Version-Release number of selected component (if applicable):
engine:
ovirt-engine-4.4.0-0.6.master.el7.noarch

host:
libvirt-5.0.0-7.module+el8+2887+effa3c42.x86_64
vdsm-4.40.0-164.git38a19bb.el8ev.x86_64
qemu-kvm-3.1.0-20.module+el8+2888+cdc893a8.x86_64

guest (rhel6.10 (20180613 latest compose)):
kernel-2.6.32-754.el6.x86_64
dracut-004-411.el6.noarch

How reproducible:
always

Steps to Reproduce:
1. Create a RHEL 6 VM in RHV 4.4 (choose "Red Hat Enterprise Linux 6.x x64" as the operating system).
2. Boot the VM.

Actual results:
The VM starts, but a kernel panic occurs during boot.

Expected results:
VM starts successfully

Additional info:
If I create the RHEL 6 VM with the i440-fx chipset ("Edit virtual machine" -> System -> Custom emulated machine: pc-i440fx-rhel7.6.0) instead of Q35, the VM works fine.

* This is a regression, since in RHV 4.3 the cluster default was the i440-fx chipset.

Comment 3 Radek Duda 2019-12-16 15:08:25 UTC
Created attachment 1645618 [details]
kernel panic

Comment 13 Michal Skrivanek 2019-12-17 08:25:09 UTC
Please test in 4.4.0-9, and make sure the Cluster you are creating has a concrete CPU set - do not use auto detection

Comment 14 Radek Duda 2019-12-17 09:04:05 UTC
I tested with 4.4.0-9 with the same result.
I do not see any possibility for CPU set auto detection.

My cluster settings are:
Cluster CPU Type: Intel SandyBridge Family
Emulated Machine: pc-q35-rhel8.0.0

Comment 15 Ryan Barry 2019-12-17 13:38:37 UTC
Q35 is only partly supported in RHEL6, but should boot. Can we get a screenshot of the panic?

Comment 16 Radek Duda 2019-12-17 13:42:48 UTC
(In reply to Ryan Barry from comment #15)
> Q35 is only partly supported in RHEL6, but should boot. Can we get a
> screenshot of the panic?

It is already attached

Comment 17 Ryan Barry 2019-12-17 13:49:29 UTC
Sorry, let me be clearer. The screenshot of the panic does not give enough information. The cmdline is not the kernel cmdline used, and it's not clear whether this comes from a template, fresh install with Anaconda, or other. What is the exact reproducer?

Comment 20 Radek Duda 2020-01-02 13:36:12 UTC
Created attachment 1649180 [details]
rhel6 libvirt xml

Comment 21 Martin Tessun 2020-03-23 14:13:12 UTC
See also https://bugzilla.redhat.com/show_bug.cgi?id=1380285 - legacy virtio is needed for RHEL 6 guests.

Comment 22 Michal Skrivanek 2020-03-31 13:19:45 UTC
If that's enough, then it's as "complex" as blocking RHEL 6, so let's give it a try with legacy virtio.

Comment 23 Liran Rotenberg 2020-03-31 15:19:09 UTC
It works for me on master.
I created a VM with RHEL 6.10 and the Q35 chipset (I also tried creating one from an i440fx template).
It booted without a kernel panic.

I got a CPU warning when booting (Skylake, which isn't signed and isn't recognized by el6).
Another issue was booting with VirtIO or VirtIO-SCSI as the disk interface.
In this scenario I did encounter a kernel panic.

At Michal's suggestion, I tried adding model="virtio-transitional" to the disk while using VirtIO, in order to use legacy virtio (https://libvirt.org/formatdomain.html#elementsVirtioTransitional).

    <disk snapshot="no" type="file" device="disk">
      <target dev="vda" bus="virtio"/>
      <source file="/rhev/data-center/99ea6e34-21a8-11ea-b3f7-482ae35a5f83/a32ac133-0d03-4be0-8a3d-97df0c85c653/images/3ee65f3c-ccc7-41e1-84ca-15e5c83a273f/786bf401-5007-4209-8ddc-d308ac9778b1">
        <seclabel model="dac" type="none" relabel="no"/>
      </source>
      <driver name="qemu" iothread="1" io="threads" type="qcow2" error_policy="stop" cache="none" model="virtio-transitional"/>
      <alias name="ua-3ee65f3c-ccc7-41e1-84ca-15e5c83a273f"/>
      <boot order="1"/>
      <serial>3ee65f3c-ccc7-41e1-84ca-15e5c83a273f</serial>
    </disk>

The domain XML returned by libvirt:

    <disk type='file' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='threads' iothread='1'/>
      <source file='/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_compute-ge-4_nfs__0/a32ac133-0d03-4be0-8a3d-97df0c85c653/images/3ee65f3c-ccc7-41e1-84ca-15e5c83a273f/786bf401-5007-4209-8ddc-d308ac9778b1'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <serial>3ee65f3c-ccc7-41e1-84ca-15e5c83a273f</serial>
      <boot order='1'/>
      <alias name='ua-3ee65f3c-ccc7-41e1-84ca-15e5c83a273f'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>

The result was still the same.

For what it's worth, we could ask the virt team what the expected behavior is for model="virtio-transitional", because the attribute was dropped from the returned XML.
As for RHEV, we could make SATA the default disk interface for the Q35 chipset, or perhaps make VirtIO/VirtIO-SCSI invalid for VMs with an OS older than el7.

Comment 24 Liran Rotenberg 2020-03-31 15:36:05 UTC
After a short discussion with the libvirt developers, it turned out I was setting the attribute on the wrong element:
- model="virtio-transitional" should be added to the disk element.
After setting it manually, the VM comes up with Q35 and VirtIO.
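
For illustration, the disk from comment 23 with the attribute moved from <driver> to <disk> might look roughly like this. It is a sketch of the manual change only, not necessarily the XML the engine generates after the fix; all values are copied from the earlier snippet.

```xml
<!-- model="virtio-transitional" is set on <disk>, not on <driver> -->
<disk snapshot="no" type="file" device="disk" model="virtio-transitional">
  <target dev="vda" bus="virtio"/>
  <source file="/rhev/data-center/99ea6e34-21a8-11ea-b3f7-482ae35a5f83/a32ac133-0d03-4be0-8a3d-97df0c85c653/images/3ee65f3c-ccc7-41e1-84ca-15e5c83a273f/786bf401-5007-4209-8ddc-d308ac9778b1">
    <seclabel model="dac" type="none" relabel="no"/>
  </source>
  <!-- the driver element no longer carries a model attribute -->
  <driver name="qemu" iothread="1" io="threads" type="qcow2" error_policy="stop" cache="none"/>
  <alias name="ua-3ee65f3c-ccc7-41e1-84ca-15e5c83a273f"/>
  <boot order="1"/>
  <serial>3ee65f3c-ccc7-41e1-84ca-15e5c83a273f</serial>
</disk>
```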

Comment 25 Lukas Svaty 2020-04-01 09:34:49 UTC
moving to 4.4.0, due to regression keyword

Comment 26 Michal Skrivanek 2020-04-01 11:03:16 UTC
We will probably fix this in time, but just in case, moving to 4.4.1. It's not a regression: RHEL 6 was never supported on Q35, because Q35 wasn't supported at all before 4.4.

Comment 28 Liran Rotenberg 2020-04-05 12:13:34 UTC
Moving back to POST. SCSI disk isn't supported and caused a regression. A new patch fixing it is posted.

Comment 29 Avihai 2020-04-06 08:01:33 UTC
(In reply to Liran Rotenberg from comment #28)
> Moving back to POST. SCSI disk isn't supported and caused a regression. A
> new patch fixing it is posted.

Just to clarify: automation test cases that created VMs with the RHEL6 OS type and virtio/virtio_scsi disks hit this issue, meaning those VMs did not boot.

Liran saw the issue, debugged and issued a patch to fix it.

Issue seen at:
ovirt-engine-4.4.0-0.31.master.el8ev.noarch
vdsm-4.40.11-1.el8ev.x86_64
libvirt-6.0.0-16.module+el8.2.0+6131+4e715f3b.x86_64
qemu-img-4.2.0-17.module+el8.2.0+6131+4e715f3b.x86_64

Comment 30 Liran Rotenberg 2020-04-06 13:57:53 UTC
I'm moving it back to ASSIGNED.
The next build will fix disks using VirtIO and resolve the regression seen by RHV QE.
Using VirtIO-SCSI is possible if the controller device is changed.
Other devices might also need the transitional model to support the VM's OS.
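
In libvirt terms, changing the controller device could look roughly like the fragment below. This is only a sketch, assuming the SCSI controller takes the transitional model the same way the disk does; the index value is an assumption for illustration and is not taken from this bug.

```xml
<!-- Assumption: index="0"; the transitional model is used in place of model="virtio-scsi" -->
<controller type="scsi" index="0" model="virtio-transitional"/>
```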

Comment 31 Nisim Simsolo 2020-04-22 10:31:55 UTC
Verified:
ovirt-engine-4.4.0-0.33.master.el8ev
vdsm-4.40.13-1.el8ev.x86_64
qemu-kvm-4.2.0-19.module+el8.2.0+6296+6b821950.x86_64
libvirt-daemon-6.0.0-17.module+el8.2.0+6257+0d066c28.x86_64

Verification scenario:
1. Run RHEL 6 VM with cluster default (Q35 chipset) and VirtIO disk interface.
   Verify VM is booting.
2. Run RHEL 6 VM with cluster default and VirtIO-SCSI disk interface.
   Verify VM is booting.

Comment 39 errata-xmlrpc 2020-08-04 13:21:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: RHV Manager (ovirt-engine) 4.4 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3247

