Bug 1783192
| Summary: | Guest kernel panic when starting a RHEL 6.10 guest with the q35 machine type and a virtio disk in CNV | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | chhu |
| Component: | Virtualization | Assignee: | Igor Bezukh <ibezukh> |
| Status: | CLOSED ERRATA | QA Contact: | Israel Pinto <ipinto> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 1.3 | CC: | abologna, berrange, cnv-qe-bugs, crobinso, danken, dgilbert, dyuan, fdeutsch, hhan, ibezukh, jdenemar, joedward, jsuchane, kbidarka, lmen, oyahud, rmohr, rnetser, sgott, xuzhang |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 2.6.0 | Flags: | ibezukh: needinfo+, ibezukh: needinfo+ |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | libvirt_CNV_INT | | |
| Fixed In Version: | virt-launcher-container-v2.6.0-99 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-03-10 11:16:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Switching this one to CNV as libvirt cannot do anything about it. This is a policy decision.

According to https://access.redhat.com/articles/4234591, RHEL 6 is not supported at this point in time.

As per this kbase article, https://access.redhat.com/articles/4234591, we plan to close this bug, as RHEL 6 is not supported with OpenShift Virtualization.

The RHEL 6 templates in OpenShift Virtualization use the sata disk bus, because virtio is known to have issues. The easy solution is to switch the bus type to sata. An alternative solution is to place the disks on the pci root bus. Israel explored this option.

Omer, it appears that we already set the bus type to sata for RHEL 6. Can you confirm that?

As I said on bug 1892340, the way to debug this is to attach a console, do a reset of the VM, and then edit the GRUB entry, removing the 'quiet rhgb' kernel parameters, so you can see the full boot where the root device is missing.

Moving the PCIe devices to bus 0 makes this work with virtio devices.

The fix is now backported. I assume we need to update the templates to turn on the new API boolean "spec.domain.devices.useVirtioTransitional" for RHEL 6.x.

(In reply to Igor Bezukh from comment #21)
> The fix is now backported.

Which fix are you referring to? Please link all relevant PRs to the bug. Once they are all merged, move the BZ to MODIFIED.

> I assume we need to update the templates to turn on the new API boolean
> "spec.domain.devices.useVirtioTransitional" for RHEL 6.x

+1.

(In reply to Dan Kenigsberg from comment #22)
> > The fix is now backported.
>
> Which fix are you referring to? Please link all relevant PRs to the bug.
> Once they are all merged, move the BZ to MODIFIED.

Attached PR links. As soon as I locate them downstream, I will update the "Fixed In Version" field.

> > I assume we need to update the templates to turn on the new API boolean
> > "spec.domain.devices.useVirtioTransitional" for RHEL 6.x
>
> +1.

WIP

Attached common-templates PR link.

*** Bug 1913342 has been marked as a duplicate of this bug. ***

As the PR https://github.com/kubevirt/common-templates/pull/305/files is still open, and it conflicts with https://github.com/kubevirt/common-templates/pull/292/files on the NIC model, I am reopening this bug.

The root cause of the verification failure was that no upstream release, which common-templates consumes, was tagged. We believe the relationship between the PRs in comment #27 is incorrect, thus moving this BZ back to ON_QA.

I've tested rhel6.10 and centos6 images on my local KubeVirt setup, and the fix with the flag does work. I also created a VM from the templates, applied it on the local setup, and it works as well. The only remaining issue is with the upstream CI of common-templates and the untagged KubeVirt release; moving this bug to ON_QA, as this scenario can be safely tested by QE.

Verify with:
virt-operator-container-v2.6.0-106
virt-launcher-container-v2.6.0-106
Create a VM from the CNV common template to get useVirtioTransitional: true,
and update the disk and drivers to virtio; see the VM spec [1].
Tested:
1. VM is running with virtio drivers (a sketch for checking this from inside the guest follows the list)
2. Connect via console and VNC
3. Connect with SSH
All PASS
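
For reference, a minimal sketch of how the first check (the VM running with virtio drivers) can be confirmed from inside the guest, assuming pciutils is installed in the image: a transitional virtio-blk device reports PCI ID 1af4:1001, while a modern-only device would report 1af4:1042. The output line below is illustrative and abbreviated; the exact wording may differ:

# lspci -nn | grep -i virtio
00:08.0 SCSI storage controller [0100]: Red Hat, Inc Virtio block device [1af4:1001]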
[1]
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
annotations:
kubevirt.io/latest-observed-api-version: v1alpha3
kubevirt.io/storage-observed-api-version: v1alpha3
name.os.template.kubevirt.io/rhel6.10: Red Hat Enterprise Linux 6.0 or higher
vm.kubevirt.io/flavor: small
vm.kubevirt.io/os: rhel6
vm.kubevirt.io/validations: |
[
{
"name": "minimal-required-memory",
"path": "jsonpath::.spec.domain.resources.requests.memory",
"rule": "integer",
"message": "This VM requires more memory.",
"min": 536870912
}
]
vm.kubevirt.io/workload: server
selfLink: /apis/kubevirt.io/v1alpha3/namespaces/rhel6/virtualmachines/rhel6-legal-gull
resourceVersion: '2795855'
name: rhel6-legal-gull
uid: 9186b5ae-0dcb-46ea-92a7-36ae3e9f7fa9
creationTimestamp: '2021-01-31T10:02:54Z'
generation: 2
managedFields:
- apiVersion: kubevirt.io/v1alpha3
fieldsType: FieldsV1
fieldsV1:
'f:metadata':
'f:annotations':
.: {}
'f:name.os.template.kubevirt.io/rhel6.10': {}
'f:vm.kubevirt.io/flavor': {}
'f:vm.kubevirt.io/os': {}
'f:vm.kubevirt.io/validations': {}
'f:vm.kubevirt.io/workload': {}
'f:labels':
'f:vm.kubevirt.io/template.version': {}
'f:vm.kubevirt.io/template.namespace': {}
'f:app': {}
.: {}
'f:os.template.kubevirt.io/rhel6.10': {}
'f:vm.kubevirt.io/template.revision': {}
'f:workload.template.kubevirt.io/server': {}
'f:flavor.template.kubevirt.io/small': {}
'f:vm.kubevirt.io/template': {}
'f:spec':
.: {}
'f:dataVolumeTemplates': {}
'f:running': {}
'f:template':
.: {}
'f:metadata':
.: {}
'f:labels':
.: {}
'f:flavor.template.kubevirt.io/small': {}
'f:kubevirt.io/domain': {}
'f:kubevirt.io/size': {}
'f:os.template.kubevirt.io/rhel6.10': {}
'f:vm.kubevirt.io/name': {}
'f:workload.template.kubevirt.io/server': {}
'f:spec':
.: {}
'f:domain':
.: {}
'f:cpu':
.: {}
'f:cores': {}
'f:sockets': {}
'f:threads': {}
'f:devices':
.: {}
'f:disks': {}
'f:interfaces': {}
'f:rng': {}
'f:useVirtioTransitional': {}
'f:machine':
.: {}
'f:type': {}
'f:resources':
.: {}
'f:requests':
.: {}
'f:memory': {}
'f:evictionStrategy': {}
'f:hostname': {}
'f:networks': {}
'f:terminationGracePeriodSeconds': {}
'f:volumes': {}
manager: Mozilla
operation: Update
time: '2021-01-31T10:02:54Z'
- apiVersion: kubevirt.io/v1alpha3
fieldsType: FieldsV1
fieldsV1:
'f:metadata':
'f:annotations':
'f:kubevirt.io/latest-observed-api-version': {}
'f:kubevirt.io/storage-observed-api-version': {}
'f:status':
.: {}
'f:conditions': {}
'f:created': {}
'f:ready': {}
'f:volumeSnapshotStatuses': {}
manager: virt-controller
operation: Update
time: '2021-01-31T10:06:09Z'
namespace: rhel6
labels:
app: rhel6-legal-gull
flavor.template.kubevirt.io/small: 'true'
os.template.kubevirt.io/rhel6.10: 'true'
vm.kubevirt.io/template: rhel6-server-small
vm.kubevirt.io/template.namespace: openshift
vm.kubevirt.io/template.revision: '1'
vm.kubevirt.io/template.version: v0.13.1
workload.template.kubevirt.io/server: 'true'
spec:
dataVolumeTemplates:
- apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
creationTimestamp: null
name: rhel6-legal-gull-rootdisk
spec:
pvc:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 20Gi
storageClassName: ocs-storagecluster-ceph-rbd
volumeMode: Block
source:
http:
url: >-
http://cnv-qe-server.rhevdev.lab.eng.rdu2.redhat.com/files/cnv-tests/rhel-images/rhel-610.qcow2
running: true
template:
metadata:
creationTimestamp: null
labels:
flavor.template.kubevirt.io/small: 'true'
kubevirt.io/domain: rhel6-legal-gull
kubevirt.io/size: small
os.template.kubevirt.io/rhel6.10: 'true'
vm.kubevirt.io/name: rhel6-legal-gull
workload.template.kubevirt.io/server: 'true'
spec:
domain:
cpu:
cores: 1
sockets: 1
threads: 1
devices:
disks:
- bootOrder: 1
disk:
bus: virtio
name: rootdisk
- disk:
bus: sata
name: cloudinitdisk
interfaces:
- masquerade: {}
model: virtio
name: default
rng: {}
useVirtioTransitional: true
machine:
type: pc-q35-rhel8.3.0
resources:
requests:
memory: 2Gi
evictionStrategy: LiveMigrate
hostname: rhel6-legal-gull
networks:
- name: default
pod: {}
terminationGracePeriodSeconds: 180
volumes:
- dataVolume:
name: rhel6-legal-gull-rootdisk
name: rootdisk
- cloudInitNoCloud:
userData: |-
#cloud-config
user: cloud-user
password: rfh8-snre-fau3
chpasswd: { expire: False }
name: cloudinitdisk
status:
conditions:
- lastProbeTime: null
lastTransitionTime: '2021-01-31T10:06:06Z'
status: 'True'
type: Ready
created: true
ready: true
volumeSnapshotStatuses:
- enabled: true
name: rootdisk
- enabled: false
name: cloudinitdisk
reason: Volume type does not suport snapshots
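
A hedged way to double-check how useVirtioTransitional is rendered at the libvirt level is to dump the domain XML from the virt-launcher pod. KubeVirt names the libvirt domain <namespace>_<vmi-name>; the pod name suffix below is a placeholder, and the matching attribute text may vary by libvirt version:

# oc get pods -n rhel6 -l kubevirt.io=virt-launcher
# oc exec -n rhel6 virt-launcher-rhel6-legal-gull-xxxxx -- virsh dumpxml rhel6_rhel6-legal-gull | grep -i transitional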
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0799

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.
Description of problem:
Starting a RHEL 6.10 guest with the q35 machine type and a virtio disk results in a guest kernel panic.

Version-Release number of selected component (if applicable):
libvirt-daemon-driver-qemu-5.0.0-12.module+el8.0.1+3755+6782b0ed.x86_64
libvirt-daemon-kvm-5.0.0-12.module+el8.0.1+3755+6782b0ed.x86_64
qemu-kvm-core-3.1.0-30.module+el8.0.1+3755+6782b0ed.x86_64

How reproducible:
100%

Steps to Reproduce in CNV:
1. Start a rhel6.10 VMI in CNV 2.1 with the yaml file asb-vmi-nfs-rhel.yaml, with a virtio disk:
devices:
  disks:
  - disk:
      bus: virtio
    name: pvcdisk

2. Log in to the VMI; the kernel crashes:
# virtctl console asb-vmi-nfs-rhel
Successfully connected to asb-vmi-nfs-rhel console. The escape sequence is ^]
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Not tainted 2.6.32-754.el6.x86_64 #1
Call Trace:
[<ffffffff8155344d>] ? panic+0xa7/0x18b
[<ffffffff8112e250>] ? perf_event_exit_task+0xc0/0x340
[<ffffffff810845f3>] ? do_exit+0x853/0x860
[<ffffffff8119f2b5>] ? fput+0x25/0x30
[<ffffffff81084658>] ? do_group_exit+0x58/0xd0
[<ffffffff810846e7>] ? sys_exit_group+0x17/0x20
[<ffffffff8155f3cb>] ? system_call_fastpath+0x2f/0x34

3. Try to change virtio to virtio-transitional; it is not supported in CNV 2.1 yet:
# oc create -f asb-vmi-nfs-rhel.yaml
------------------------------------
devices:
  disks:
  - disk:
      bus: virtio-transitional
    name: pvcdisk
-------------------------------------
The "" is invalid: spec.domain.devices.disks[0].disk.bus: spec.domain.devices.disks[0] is set with an unrecognized bus virtio-transitional, must be one of: [virtio sata scsi]

Actual results:
In step 2: guest kernel panic.

Expected results:
In step 2: the guest starts successfully.
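
With the fix (virt-launcher-container-v2.6.0-99 and later, per "Fixed In Version" above), transitional virtio devices are exposed through the devices-level boolean shown in the verified spec [1] rather than a new bus value. As a hypothetical sketch of enabling it on an existing VirtualMachine (the VM name rhel6-example is a placeholder, and the VM must be restarted for the change to take effect on the running instance):

# oc patch vm rhel6-example --type=merge -p \
    '{"spec":{"template":{"spec":{"domain":{"devices":{"useVirtioTransitional":true}}}}}}'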