Bug 1788783
| Summary: | after_migration is not sent to the guest after migration | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Germano Veit Michel <gveitmic> | |
| Component: | vdsm | Assignee: | Milan Zamazal <mzamazal> | |
| Status: | CLOSED ERRATA | QA Contact: | Beni Pelled <bpelled> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | medium | |||
| Version: | 4.3.7 | CC: | lsurette, mzamazal, pelauter, rdlugyhe, sgoodman, srevivo, ycui | |
| Target Milestone: | ovirt-4.4.0 | Flags: | lsvaty: testing_plan_complete- | |
| Target Release: | --- | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: |
Previously, when migrating a virtual machine, information about the running guest agent was not always passed to the destination host. In these cases, the migrated virtual machine on the destination host did not receive an after_migration life cycle event notification.
This update fixes this issue. The after_migration notification works as expected now.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 1795726 (view as bug list) | Environment: | ||
| Last Closed: | 2020-08-04 13:27:53 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1795726 | |||
|
Description
Germano Veit Michel
2020-01-08 05:32:14 UTC
Hi Germano, it's weird. If I understand it right, guestAgentAPIVersion is always present in params, but not always in metadata. But the value is set to be the same in both params and metadata in Vm.migration_parameters(). We have a bug in 4.4 that the value is not set in metadata, but then it's also not set in params.

We always get the API version from metadata in 4.3. If it is missing, there is a race: if after_migrate is called before the API version is retrieved, it doesn't work.

Now the question is why the value is missing in metadata after migration. Can you see it in the source log in the domain XML sent to the destination? Can you see the "dumped metadata" debug log message in vdsm.log on the source just before a migration demonstrating the failure?

We have libvirt hooks transforming the domain XML on migrations, but they are completely unrelated and should be harmless. A libvirt bug is also unlikely; why would it eat just guestAgentAPIVersion? I can't see any obvious point of failure at the moment; we should look into the source vdsm.log to see whether we can get any hint from it.

Hi Milan,
It is indeed very weird. I could not find any clue as to where this is coming from. It seems like the guestAgentAPIVersion is simply wiped on the destination host at some point.
I don't have the logs from that host anymore, I've wiped it. But I can reproduce this again, with debug also on the source.
Besides virt debug on both src and dst, do you have any other idea of a log to enable or even add to the code to try to clarify this?
Thanks

(In reply to Germano Veit Michel from comment #3)
> Besides virt debug on both src and dst, do you have any other idea of log to
> enable or even add to the code to try to clarify this?
Enabling the libvirt debug log may be useful, in case the Vdsm logs don't provide enough info.

I can reproduce it now. The libvirt log shows a correct and completed metadata update, but then the inserted item is apparently missing in the domain XML sent to the destination. I suspect some timing issues with change propagation in libvirt, will look into it further.

Thanks. I've reproduced it again as well, will attach logs anyway just in case.
2020-01-21 13:29:31,362+1000 DEBUG (vm/59244f31) [virt.vm] (vmId='59244f31-f6a2-474d-b52c-afe2a1e5e978') send_lifecycle_event after_migration called (guestagent:565)
Slightly after:
2020-01-21 13:29:34,782+1000 INFO (vmchannels) [virt.vm] (vmId='59244f31-f6a2-474d-b52c-afe2a1e5e978') Guest API version changed from 0 to 3 (guestagent:292)

I think the problem is that the initial destination metadata is not read from the libvirt domain XML but from the _srcDomXML parameter provided in the Vdsm migration creation call. But the guest agent API version is set on the source only after the _srcDomXML value is retrieved, so it's missing in _srcDomXML. I may still be missing something, because this by itself doesn't explain why the problem doesn't appear on every migration. But up-to-date metadata should be present in _srcDomXML anyway, and that should be fixed.

Germano, would you have a chance to check whether https://gerrit.ovirt.org/106483 fixes the problem with after_migration events?

Hi Milan, sure I can. I just did not have any chance to work on this today. Please hold a bit, hopefully tomorrow.

(In reply to Milan Zamazal from comment #8)
> Germano, would you have a chance to check whether
> https://gerrit.ovirt.org/106483 fixes the problem with after_migration
> events?
It seems to work for me. Did 4 migrations in a row, and it worked 4 times:
# grep after /var/log/ovirt-guest-agent/ovirt-guest-agent.log
Dummy-1::INFO::2020-01-24 13:14:07,029::hooks::64::Hooks::Hook(after_migration) "/etc/ovirt-guest-agent/hooks.d/after_migration/55_flush-caches" executed
Dummy-1::INFO::2020-01-24 13:19:58,731::hooks::64::Hooks::Hook(after_migration) "/etc/ovirt-guest-agent/hooks.d/after_migration/55_flush-caches" executed
Dummy-1::INFO::2020-01-24 13:21:23,520::hooks::64::Hooks::Hook(after_migration) "/etc/ovirt-guest-agent/hooks.d/after_migration/55_flush-caches" executed
Dummy-1::INFO::2020-01-24 13:22:41,195::hooks::64::Hooks::Hook(after_migration) "/etc/ovirt-guest-agent/hooks.d/after_migration/55_flush-caches" executed
And the only Guest API change was when the guest started, nothing on migrations:
2020-01-24 13:13:40,653+1000 INFO (vmchannels) [virt.vm] (vmId='59244f31-f6a2-474d-b52c-afe2a1e5e978') Guest API version changed from 0 to 3 (guestagent:292)

Hi Germano, thank you for testing. I posted a patch fixing the bug in master.

You are welcome, and thanks for the fix! Is this on the radar for 4.3.z after the fix is accepted? There is a customer ticket attached...

The patch is basically one line, so it's definitely a suitable target.

This bug is targeting 4.4.1 and is in modified state. Can we retarget to 4.4.0 and move to QE?

(In reply to Sandro Bonazzola from comment #15)
> this bug is targeting 4.4.1 and is in modified state. Can we retarget to
> 4.4.0 and move to QE?
Yes.

Verified on:
- RHV 4.4.0-0.26.master.el8ev
- vdsm-4.40.7-1.el8ev.x86_64 (host)
- libvirt-6.0.0-13.module+el8.2.0+6048+0fa476b4.x86_64 (host)
Verification steps:
1. Start a VM
2. Migrate the VM.
Result:
after_migration is sent to the guest:
# tail -f /var/log/vdsm/vdsm.log | grep 'after_migration'
2020-03-09 13:52:38,583+0200 DEBUG (vm/be5d4f39) [virt.vm] (vmId='be5d4f39-cb05-4a4d-9c6e-9856ff9c3e61') send_lifecycle_event after_migration called (guestagent:565)
2020-03-09 13:52:38,932+0200 DEBUG (vm/be5d4f39) [virt.vm] (vmId='be5d4f39-cb05-4a4d-9c6e-9856ff9c3e61') sent '{"__name__": "lifecycle-event", "type": "after_migration"}\n' (guestagent:342)
(DEBUG mode should be enabled in order to see these lines)
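The same check can be scripted instead of eyeballing the grep output. The following is a minimal sketch, assuming the debug lines keep the format quoted in this report; the function name after_migration_status is made up for illustration and is not part of Vdsm. It reports, per VM, whether send_lifecycle_event was called and whether the lifecycle event was actually sent to the guest.

```python
import re

# Matches the "(vmId='...')" field of the vdsm.log lines quoted in this report.
VM_ID = re.compile(r"\(vmId='([0-9a-f-]+)'\)")


def after_migration_status(lines):
    """Return {vm_id: {"called": bool, "sent": bool}} for after_migration events.

    Hypothetical helper: it relies only on the two debug message shapes shown
    in this bug report, which may differ in other Vdsm versions.
    """
    status = {}
    for line in lines:
        if "after_migration" not in line:
            continue
        match = VM_ID.search(line)
        if match is None:
            continue
        entry = status.setdefault(match.group(1), {"called": False, "sent": False})
        if "send_lifecycle_event after_migration called" in line:
            entry["called"] = True
        elif " sent " in line and "lifecycle-event" in line:
            entry["sent"] = True
    return status


if __name__ == "__main__":
    # DEBUG logging must be enabled, otherwise neither line is written.
    with open("/var/log/vdsm/vdsm.log") as log:
        for vm_id, entry in after_migration_status(log).items():
            print(vm_id, entry)
```

A VM with "called" set but "sent" unset would match the failure described in this bug: the event was requested but never delivered to the guest.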
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2020:3246
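For readers following the root-cause discussion in the comments, the ordering problem can be illustrated with a toy model. This is only a sketch and not Vdsm code: the SourceVm class, its methods, the refresh_snapshot flag, and the use of a dict in place of domain XML are all hypothetical. It assumes, as the thread suggests, that the destination takes its initial metadata from _srcDomXML and does not send the event while the guest agent API version is still 0.

```python
# Toy model only: every class and function name here is hypothetical.


class SourceVm:
    def __init__(self):
        # Stand-in for the VM metadata stored in the libvirt domain XML.
        self.metadata = {}

    def snapshot_domain_xml(self):
        # Stand-in for reading the domain XML, metadata included.
        return dict(self.metadata)

    def migration_parameters(self, refresh_snapshot):
        stale = self.snapshot_domain_xml()         # captured first
        self.metadata["guestAgentAPIVersion"] = 3  # written afterwards
        # With refresh_snapshot the XML is read again after the metadata
        # update, so _srcDomXML carries the up-to-date value.
        src_dom_xml = self.snapshot_domain_xml() if refresh_snapshot else stale
        return {"guestAgentAPIVersion": 3, "_srcDomXML": src_dom_xml}


def destination_sends_after_migration(params):
    # Assumption of this model: the destination reads the API version from
    # the metadata in _srcDomXML and treats a missing value as version 0.
    api_version = params["_srcDomXML"].get("guestAgentAPIVersion", 0)
    return api_version > 0


if __name__ == "__main__":
    for refresh in (False, True):
        params = SourceVm().migration_parameters(refresh_snapshot=refresh)
        print(refresh, destination_sends_after_migration(params))
```

In this model the event is lost only when the snapshot predates the metadata write, even though guestAgentAPIVersion is present in params in both cases. The model does not explain why the real problem was intermittent, a point the thread also leaves open.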