Description of problem:
While live-migrating many instances concurrently, libvirt sometimes returns "internal error: migration was active, but no RAM info was set":
~~~
2022-03-30 06:08:37.197 7 WARNING nova.virt.libvirt.driver [req-5c3296cf-88ee-4af6-ae6a-ddba99935e23 - - - - -] [instance: af339c99-1182-4489-b15c-21e52f50f724] Error monitoring migration: internal error: migration was active, but no RAM info was set: libvirt.libvirtError: internal error: migration was active, but no RAM info was set
~~~
Version-Release number of selected component (if applicable):
libvirt-daemon-6.0.0-25.6.module+el8.2.1+12457+868e9540.ppc64le
How reproducible:
Random
Steps to Reproduce:
1. Live-evacuate a compute node, i.e. live-migrate all of its instances off it concurrently (see the command example below).
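For reference, the evacuation is typically driven from the OpenStack client side. Assuming python-novaclient is available, the commands look roughly like the following (hostnames and instance IDs are placeholders; the exact tooling depends on the deployment):
~~~
# Live-migrate a single instance (the scheduler picks a target if none is given)
nova live-migration <instance-uuid>

# Live-evacuate every instance off one compute node
nova host-evacuate-live <source-compute-host>
~~~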
Actual results:
Live migration fails and leaves the database info in an inconsistent state.
Expected results:
Live migration completes successfully.
Additional info:
Could you please attach debug logs [1] of libvirtd when this happens so that we can see what qemu returned? The error message you've seen happens when the RAM info is missing in the reply to 'query-migrate' from qemu.
[1] https://www.libvirt.org/kbase/debuglogs.html
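The linked article has the full details; a minimal libvirtd.conf setting for capturing these logs looks roughly like the snippet below (the exact recommended log_filters string may differ, so please follow the article):
~~~
# /etc/libvirt/libvirtd.conf (restart libvirtd afterwards)
log_filters="1:qemu 1:libvirt 4:util 4:rpc"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
~~~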
My understanding:
According to the description in the attached customer case, the VM migration itself succeeds; the only issue is that libvirt reports the error below when querying the job info:
migration was active, but no RAM info was set
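For context, this string is reported by libvirt when the 'query-migrate' reply says the migration is active but contains no 'ram' section. A self-contained C sketch of the shape of that check (illustrative names only, not the actual libvirt source):
~~~
/* sketch.c - illustrative only, not the actual libvirt code.
 * Shows the condition behind "migration was active, but no RAM info
 * was set": the reply's status is "active" but no "ram" object came back.
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified view of a parsed query-migrate reply. */
struct migrate_reply {
    const char *status;   /* "setup", "active", "completed", ... */
    bool has_ram_info;    /* was a "ram" object present in the JSON? */
};

static int parse_migration_stats(const struct migrate_reply *reply)
{
    if (strcmp(reply->status, "active") == 0 && !reply->has_ram_info) {
        /* This is the message seen in the nova warning above. */
        fprintf(stderr,
                "internal error: migration was active, but no RAM info was set\n");
        return -1;
    }
    /* ... otherwise read transferred/remaining/total from the ram section ... */
    return 0;
}

int main(void)
{
    struct migrate_reply bad = { .status = "active", .has_ram_info = false };
    return parse_migration_stats(&bad) == 0 ? 0 : 1;
}
~~~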
Comment 4 Dr. David Alan Gilbert
2022-04-13 09:42:00 UTC
Looking at the customers logs, it doesn't look to me as if they're migrating lots in parallel on any one machine; only one or two VMs at a time.
Most of the VMs migrate very quickly (a few seconds).
Looking at the qemu code, it looks pretty solid; I can kind of see a 'maybe' theoretical race:
the code that generates the 'query-migrate' reply reads the 'status' twice. Maybe if it changes
between the two reads you could end up with an inconsistency. Maybe. Never seen it though.
Comparing the timestamps of initiating the migration and of the error:
1) 2022-03-30 06:08:37.025+0000: initiating migration
2) 2022-03-30 06:08:37.197 7 WARNING nova.virt.libvirt.driver [req-5c3296cf-88ee-4af6-ae6a-ddba99935e23 - - - - -] [instance: af339c99-1182-4489-b15c-21e52f50f724] Error monitoring migration: internal error: migration was active, but no RAM info was set: libvirt.libvirtError: internal error: migration was active, but no RAM info was set
We can see the error happened in the very early phase of the migration.
Comment 6 Dr. David Alan Gilbert
2022-04-13 11:17:57 UTC
That could be the race I can imagine: if the code read the status as 'setup', it wouldn't save the RAM info.
If the migration then switched setup->active
and the code then copied the status field as 'active',
you'd see this symptom.
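To make that interleaving concrete, here is a minimal, self-contained C sketch of the 'read the status twice' pattern (illustrative only; the real code lives in qemu's migration code and uses different names):
~~~
/* race_sketch.c - illustrative only, not the actual qemu code.
 * The query-migrate handler reads the migration state twice: once to
 * decide whether to fill in the RAM stats, and once to report the
 * status. If the state flips setup -> active between the two reads,
 * the reply says "active" but carries no RAM info.
 */
#include <stdbool.h>

enum mig_state { MIG_SETUP, MIG_ACTIVE, MIG_COMPLETED };

struct mig_info {
    enum mig_state status;
    bool has_ram;
    /* ... transferred, remaining, total ... */
};

/* Updated concurrently by the migration thread. */
static volatile enum mig_state current_state = MIG_SETUP;

static void fill_migration_info(struct mig_info *info)
{
    /* First read: decide whether RAM stats apply. */
    if (current_state == MIG_ACTIVE || current_state == MIG_COMPLETED) {
        info->has_ram = true;    /* RAM stats would be populated here */
    }

    /* ... the migration thread may switch MIG_SETUP -> MIG_ACTIVE here ... */

    /* Second read: report the status; it can now disagree with the first. */
    info->status = current_state;
}
~~~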
Comment 7 Dr. David Alan Gilbert
2022-04-13 11:35:09 UTC
I've just posted a qemu fix:
migration: Read state once
It's a theoretical fix, in the sense we've not got enough debug to know if this is the real cause.
Comment 8 Kashyap Chamarthy
2022-04-13 12:16:31 UTC
(In reply to Dr. David Alan Gilbert from comment #7)
> I've just posted a qemu fix:
> migration: Read state once
Link to the above:
https://lists.gnu.org/archive/html/qemu-devel/2022-04/msg01395.html
Thanks, Dave (G)!
> It's a theoretical fix, in the sense we've not got enough debug to know if
> this is the real cause.
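Continuing the sketch from comment 6: the posted fix reads the state into a local variable once and uses that single snapshot for both decisions, so the reported status can no longer disagree with whether the RAM info was filled in. Again illustrative, not the literal patch (it reuses the types from the earlier sketch):
~~~
/* Illustrative sketch of the "read state once" approach from the patch
 * linked above; simplified names, not the literal qemu code. */
static void fill_migration_info_fixed(struct mig_info *info)
{
    /* Take one snapshot of the state and use it consistently. */
    enum mig_state state = current_state;

    if (state == MIG_ACTIVE || state == MIG_COMPLETED) {
        info->has_ram = true;
    }

    info->status = state;    /* always matches the RAM-info decision */
}
~~~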
Comment 11 Jaroslav Suchanek
2022-04-14 15:04:03 UTC
Passing down to qemu-kvm for further processing.
Comment 12 Dr. David Alan Gilbert
2022-04-14 15:21:26 UTC
Hmm, so I'm not too sure what to do with this; assuming my fix for the theoretical cause goes in upstream, then what?
We've not got a reproducer for it; do we bother backporting it or just take it in the next one?
Comment 16 Dr. David Alan Gilbert
2022-04-25 08:38:50 UTC
My qemu fix is in upstream qemu: commit 552de79bfdd5e9e53847eb3c6d6e4cd898a4370e ("migration: Read state once").
Tested on kernel-4.18.0-449.el8.x86_64 and qemu-kvm-6.2.0-29.module+el8.8.0+17991+08d03241.x86_64, repeating the reproducer (see Comment 13) nearly 300 times; all runs passed and no issues were hit.
Marking this bug verified per the above results.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2023:2757