Bug 1981005 - Migrations to 4.4.7 hosts fail - qemu regression
Summary: Migrations to 4.4.7 hosts fail - qemu regression
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-distribution
Classification: oVirt
Component: General
Version: 4.4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ovirt-4.4.8
Target Release: 4.4.8
Assignee: Eduardo Lima (Etrunko)
QA Contact: Lukas Svaty
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-10 12:54 UTC by Nathaniel Roach
Modified: 2021-08-23 11:15 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Rebased the qemu-kvm and libvirt packages in the CentOS Virtualization SIG for CentOS Stream 8. The rebase includes a fix for bug #1964326 - Qemu core dump when do tls migration via tcp protocol.
Clone Of:
Environment:
Last Closed: 2021-08-23 09:00:56 UTC
oVirt Team: Node
Embargoed:
pm-rhel: ovirt-4.4+


Attachments
engine.log (23.20 KB, text/plain), 2021-07-12 10:42 UTC, Nathaniel Roach
source vdsm.log (136.76 KB, text/plain), 2021-07-12 10:43 UTC, Nathaniel Roach
destination vdsm.log (161.56 KB, text/plain), 2021-07-12 10:44 UTC, Nathaniel Roach

Description Nathaniel Roach 2021-07-10 12:54:09 UTC
Description of problem:

After upgrading from ONN 4.4.6 to 4.4.7 (and applying the SELinux policy fix), VMs can no longer be migrated to 4.4.7 hosts.

Version-Release number of selected component (if applicable):
HE:
4.4.7.6-1.el8

ONN Version:
RHEL - 8.4.2105.0 - 3.el8
OS Description:
oVirt Node 4.4.7
Kernel Version:
4.18.0 - 315.el8.x86_64
KVM Version:
6.0.0 - 19.el8s
LIBVIRT Version:
libvirt-7.4.0-1.el8s
VDSM Version:
vdsm-4.40.70.6-1.el8
SPICE Version:
0.14.3 - 4.el8

How reproducible:
100% for me: either the migration fails, or the VM crashes on the destination after the source has stopped the guest.

Steps to Reproduce:
(For me)
1. Install ONN 4.4.6 on a host.
2. Upgrade to 4.4.7 through the HE.
3. (Fix the SELinux policy.)
4. Start a VM on another host (4.4.6 or 4.4.7).
5. Attempt to migrate the VM to a 4.4.7 host.

Actual results:
Either the VM crashes on the destination, or the migration fails and the VM keeps running on the source.

Expected results:
The VM migrates seamlessly.

Additional info:
VDSM log:
2021-07-09 22:02:17,491+0800 INFO  (libvirt/events) [vds] Channel state for vm_id=5d11885a-37d3-4f68-a953-72d808f43cdd changed from=UNKNOWN(-1) to=disconnected(2) (qemuguestagent:289)
2021-07-09 22:02:55,537+0800 INFO  (libvirt/events) [virt.vm] (vmId='5d11885a-37d3-4f68-a953-72d808f43cdd') underlying process disconnected (vm:1134)
2021-07-09 22:02:55,537+0800 INFO  (libvirt/events) [virt.vm] (vmId='5d11885a-37d3-4f68-a953-72d808f43cdd') Release VM resources (vm:5313)
2021-07-09 22:02:55,537+0800 INFO  (libvirt/events) [virt.vm] (vmId='5d11885a-37d3-4f68-a953-72d808f43cdd') Stopping connection (guestagent:438)
2021-07-09 22:02:55,539+0800 INFO  (libvirt/events) [virt.vm] (vmId='5d11885a-37d3-4f68-a953-72d808f43cdd') Stopping connection (guestagent:438)
2021-07-09 22:02:55,539+0800 INFO  (libvirt/events) [vdsm.api] START inappropriateDevices(thiefId='5d11885a-37d3-4f68-a953-72d808f43cdd') from=internal, task_id=7abe370b-13bc-4c49-bf02-2e40db142250 (api:48)
2021-07-09 22:02:55,544+0800 WARN  (vm/5d11885a) [virt.vm] (vmId='5d11885a-37d3-4f68-a953-72d808f43cdd') Couldn't destroy incoming VM: Domain not found: no domain with matching uuid '5d11885a-37d3-4f68-a953-72d808f43cdd' (vm:4046)
2021-07-09 22:02:55,544+0800 INFO  (vm/5d11885a) [virt.vm] (vmId='5d11885a-37d3-4f68-a953-72d808f43cdd') Changed state to Down: VM destroyed during the startup (code=10) (vm:1895)
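
For anyone triaging similar logs, here is a minimal Python sketch that pulls the timestamp, level, thread, logger and message out of VDSM lines like the ones above and picks out the "Changed state to Down" events. The regular expression is an assumption inferred from these few lines, not an official VDSM log format, and `parse_vdsm_line` and `down_events` are hypothetical helper names.

```python
import re
from datetime import datetime, timezone

# Layout inferred from the lines quoted in this report (an assumption):
# "<timestamp><utc offset> <LEVEL> (<thread>) [<logger>] <message>"
VDSM_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\((?P<thread>[^)]*)\)\s+"
    r"\[(?P<logger>[^\]]*)\]\s+"
    r"(?P<message>.*)$"
)


def parse_vdsm_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    match = VDSM_LINE.match(line.strip())
    if match is None:
        return None
    local = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f%z")
    return {
        # Normalise to UTC so entries can be compared with UTC-stamped logs.
        "utc": local.astimezone(timezone.utc),
        "level": match.group("level"),
        "thread": match.group("thread"),
        "logger": match.group("logger"),
        "message": match.group("message"),
    }


def down_events(lines):
    """Keep only the records that report a VM going Down."""
    records = (parse_vdsm_line(line) for line in lines)
    return [r for r in records if r and "Changed state to Down" in r["message"]]
```

Normalising to UTC helps when lining these entries up with the qemu log for the guest, which is stamped in UTC: 22:02:55 at +0800 is 14:02:55 UTC, about one second after the "shutting down, reason=failed" entry.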

syslog:
Jul 09 22:35:01 HOSTNAME abrt-hook-ccpp[177862]: Process 177022 (qemu-kvm) of user 107 killed by SIGABRT - dumping core

qemu log for that guest:
qemu-kvm: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed.
2021-07-09 14:02:54.521+0000: shutting down, reason=failed
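
The assertion message above follows the usual glibc `assert()` layout (program, file, line, function, expression). As a rough sketch for checking other guests' qemu logs for the same failure, the Python below extracts those fields and flags hits in `yank_unregister_instance`. The helper names are hypothetical, and how the log lines are obtained is left to the caller.

```python
import re

# glibc prints failed assertions as:
#   <program>: <file>:<line>: <function>: Assertion `<expr>' failed.
ASSERT_RE = re.compile(
    r"^(?P<prog>[\w.-]+): (?P<file>\S+):(?P<line>\d+): (?P<func>\w+): "
    r"Assertion `(?P<expr>.*)' failed\.$"
)


def find_failed_assertions(log_lines):
    """Return one dict per assertion-failure line found in log_lines."""
    hits = []
    for raw in log_lines:
        match = ASSERT_RE.match(raw.strip())
        if match:
            hits.append(match.groupdict())
    return hits


def is_yank_assertion(hit):
    """True if the failure is in the function named in this report."""
    return hit["func"] == "yank_unregister_instance"
```

Matching on the function name rather than the line number (107 here) is a deliberate choice, since line numbers can shift between qemu builds.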

Let me know if you need more details, I'm happy to fetch logs.

Comment 1 Arik 2021-07-11 08:21:53 UTC
Please attach engine.log, plus vdsm.log from both the source and destination hosts.

Comment 2 RHEL Program Management 2021-07-11 08:21:58 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 3 Jean-Louis Dupond 2021-07-12 07:21:00 UTC
This looks like https://bugzilla.redhat.com/show_bug.cgi?id=1964326

Comment 4 Nathaniel Roach 2021-07-12 10:42:26 UTC
Created attachment 1800748 [details]
engine.log

hosted-engine /var/log/ovirt-engine/engine.log from the minute of the migration

Comment 5 Nathaniel Roach 2021-07-12 10:43:28 UTC
Created attachment 1800749 [details]
source vdsm.log

source ONN /var/log/vdsm/vdsm.log from the minute of the migration

Comment 6 Nathaniel Roach 2021-07-12 10:44:05 UTC
Created attachment 1800750 [details]
destination vdsm.log

destination ONN /var/log/vdsm/vdsm.log from the minute of the migration

Comment 8 Milan Zamazal 2021-07-12 13:06:06 UTC
I checked that:

- After upgrading QEMU from 5.2.0 to 6.0.0-19, I can reproduce the bug.
- After further upgrading QEMU from 6.0.0-19 to 6.0.0-21, the bug disappears (thanks to Jean-Louis for pointing out the corresponding bug in Comment 3).

Since AV 8.4 has QEMU 5.2 and the upcoming AV 8.5 now has a fixed QEMU version, I think we can close this bug: it concerns only an interim QEMU version that nobody is going to use in the future.

Nathaniel, could you check whether updating QEMU to 15:6.0.0-21 fixes the problem for you too, once it is available in Stream?
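
To make that check concrete, here is a small Python sketch that classifies a qemu-kvm build string using only the data points in this comment: 5.2.0 did not show the bug, 6.0.0-19 does, and 6.0.0-21 does not. Treating every 6.0.0 release from 21 onward as fixed is an assumption, 6.0.0-20 is deliberately left as unknown because nobody here tested it, and the function names are hypothetical.

```python
import re

# Accepts forms such as "qemu-kvm-6.0.0-19.el8s", "15:6.0.0-21",
# "6.0.0 - 19.el8s" (as shown in the host's version details) or plain "5.2.0".
BUILD_RE = re.compile(
    r"^(?:qemu-kvm-)?(?:\d+:)?(?P<version>\d+(?:\.\d+)*)(?:-(?P<release>\d+))?"
)


def parse_qemu_build(text):
    """Return (version_tuple, release_or_None), or None if unparseable."""
    match = BUILD_RE.match(re.sub(r"\s+", "", text))
    if match is None:
        return None
    version = tuple(int(part) for part in match.group("version").split("."))
    release = match.group("release")
    return version, (int(release) if release is not None else None)


def classify(text):
    """Map a build string to a status based on the results in this bug."""
    parsed = parse_qemu_build(text)
    if parsed is None:
        return "unknown"
    version, release = parsed
    if version == (5, 2, 0):
        return "not-affected"
    if version == (6, 0, 0) and release is not None:
        if release == 19:
            return "affected"
        if release >= 21:  # assumption: later releases keep the fix
            return "reported-fixed"
    return "unknown"
```

On a host, the string to pass in could come from something like `rpm -q qemu-kvm`.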

Comment 9 Jean-Louis Dupond 2021-07-12 13:12:44 UTC
(In reply to Milan Zamazal from comment #8)
> 
> Since AV 8.4 has QEMU 5.2 and the upcoming AV 8.5 has a fixed QEMU version
> now, I think we can close this bug, it concerns only an interim QEMU version
> that nobody is going to use in future.
> 


But I think we should get a fixed qemu version into the Virt SIG ASAP and do an ovirt-node respin that includes it.

Comment 10 Milan Zamazal 2021-07-12 13:45:42 UTC
(In reply to Jean-Louis Dupond from comment #9)
> 
> But I think we should get fixed qemu version asap in Virt SIG and do a
> ovirt-node respin with the fixed version in it.

Ah, right, ovirt-node 4.4.7 has the broken QEMU version :-(.

Sandro, any chance to get an updated version in Node?

Comment 12 Lev Veyde 2021-07-13 00:48:49 UTC
(In reply to Milan Zamazal from comment #10)
> (In reply to Jean-Louis Dupond from comment #9)
> > 
> > But I think we should get fixed qemu version asap in Virt SIG and do a
> > ovirt-node respin with the fixed version in it.
> 
> Ah, right, ovirt-node 4.4.7 has the broken QEMU version :-(.
> 
> Sandro, any chance to get an updated version in Node?

Can somebody please test this build?

https://jenkins.ovirt.org/job/ovirt-node-ng-image_master_build-artifacts-el8-x86_64/784/artifact/exported-artifacts/ovirt-node-ng-installer-4.4.7-2021071221.el8.iso

It's a custom build with qemu-img/kvm*/guest-agent 5.2.0.

There is also a corresponding update package at:
https://jenkins.ovirt.org/job/ovirt-node-ng-image_master_build-artifacts-el8-x86_64/784/artifact/exported-artifacts/ovirt-node-ng-image-update-4.4.7.1-1.el8.noarch.rpm


Please note that this ISO and RPM also contain a fix for https://bugzilla.redhat.com/show_bug.cgi?id=1979624

Comment 13 Jean-Louis Dupond 2021-07-13 09:14:32 UTC
Tested with the RPM, and it seems to work fine!

Comment 14 Nathaniel Roach 2021-07-13 13:10:09 UTC
Tested it with the ISO, and I can confirm that it works fine for receiving migrations.

Comment 15 Sandro Bonazzola 2021-07-13 13:48:18 UTC
Eduardo, can you provide a new qemu-kvm build with the fix?

Comment 17 Eduardo Lima (Etrunko) 2021-07-28 14:22:10 UTC
(In reply to Sandro Bonazzola from comment #15)
> Eduardo, can you provide a new qemu-kvm build with the fix?

Sure, I just don't understand exactly for which distro the package should be built.

Comment 18 Jean-Louis Dupond 2021-07-29 06:51:09 UTC
(In reply to Eduardo Lima (Etrunko) from comment #17)
> (In reply to Sandro Bonazzola from comment #15)
> > Eduardo, can you provide a new qemu-kvm build with the fix?
> 
> Sure, I just don't understand exactly for which distro the package should be
> built.

For virt8s-advancedvirt-common-*, as the current one (qemu-kvm-6.0.0-19.el8s) contains some critical bugs.

Comment 19 Eduardo Lima (Etrunko) 2021-08-02 14:49:50 UTC
(In reply to Jean-Louis Dupond from comment #18)
> (In reply to Eduardo Lima (Etrunko) from comment #17)
> > (In reply to Sandro Bonazzola from comment #15)
> > > Eduardo, can you provide a new qemu-kvm build with the fix?
> > 
> > Sure, I just don't understand exactly for which distro the package should be
> > built.
> 
> For virt8s-advancedvirt-common-*, as the current one
> (qemu-kvm-6.0.0-19.el8s) contains some critical bugs.

Packages in the CentOS Stream 8 SIG have been updated. Can you please check?

qemu-kvm-6.0.0-26.el8s
libvirt-7.5.0-1.el8s

Comment 20 Lev Veyde 2021-08-02 16:29:40 UTC
(In reply to Eduardo Lima (Etrunko) from comment #19)
> (In reply to Jean-Louis Dupond from comment #18)
> > (In reply to Eduardo Lima (Etrunko) from comment #17)
> > > (In reply to Sandro Bonazzola from comment #15)
> > > > Eduardo, can you provide a new qemu-kvm build with the fix?
> > > 
> > > Sure, I just don't understand exactly for which distro the package should be
> > > built.
> > 
> > For virt8s-advancedvirt-common-*, as the current one
> > (qemu-kvm-6.0.0-19.el8s) contains some critical bugs.
> 
> Packages in CentOS Stream 8 SIG have been updated, can you please check?
> 
> qemu-kvm-6.0.0-26.el8s
> libvirt-7.5.0-1.el8s

Yes, I see both on a random mirror:

https://linux-mirrors.fnal.gov/linux/centos/8-stream/virt/x86_64/advancedvirt-common/Packages/

We'll plan a new build including these packages for 4.4.8 RC this week.

Comment 21 Lev Veyde 2021-08-06 00:38:45 UTC
Built a new ovirt-node-ng including the new Qemu version:

https://jenkins.ovirt.org/job/ovirt-node-ng-image_master_build-artifacts-el8-x86_64/814/

