Bug 1795726
| Summary: | after_migration is not sent to the guest after migration [RHV clone - 4.3.9] | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | RHV bug bot <rhv-bugzilla-bot> |
| Component: | vdsm | Assignee: | Milan Zamazal <mzamazal> |
| Status: | CLOSED ERRATA | QA Contact: | Beni Pelled <bpelled> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.3.7 | CC: | gveitmic, lsurette, mzamazal, rbarry, sgoodman, srevivo, ycui |
| Target Milestone: | ovirt-4.3.9 | Keywords: | ZStream |
| Target Release: | 4.3.9 | Flags: | lsvaty: testing_plan_complete- |
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | vdsm-4.30.42 | Doc Type: | Bug Fix |
| Doc Text: | Before this update, when migrating a virtual machine, information about the running guest agent wasn't always passed to the destination host, resulting in an after_migration life cycle event not being sent to the virtual machine on the destination host. This update fixes the issue, and the after_migration notification should work as expected. | Story Points: | --- |
| Clone Of: | 1788783 | Environment: | |
| Last Closed: | 2020-04-02 16:32:08 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1788783 | ||
| Bug Blocks: | |||
|
Description
RHV bug bot
2020-01-28 17:22:18 UTC
Hi Germano, it's weird. If I understand it right, guestAgentAPIVersion is always present in params, but not always in metadata. But the value is set to be the same in both params and metadata in Vm.migration_parameters(). We have a bug in 4.4 that the value is not set in metadata, but then it's also not set in params.

We always get the API version from metadata in 4.3. If it is missing, there is a race: if after_migration is called before the API version is retrieved, it doesn't work.

Now the question is why the value is missing in metadata after migration. Can you see it in the source log in the domain XML sent to the destination? Can you see a "dumped metadata" debug log message in vdsm.log on the source just before a migration demonstrating the failure?

We have libvirt hooks transforming the domain XML on migrations but they are completely unrelated and should be harmless. A libvirt bug is also unlikely; why would it eat just guestAgentAPIVersion? I can't see any obvious point of failure at the moment; we should look into the source vdsm.log to see whether we can get any hint from it.

(Originally by Milan Zamazal)

Hi Milan,

It is indeed very weird. I could not find any clue on where this is coming from. It seems like the guestAgentAPIVersion is simply wiped on the destination host at some point.

I don't have the logs from that host anymore, I've wiped it. But I can reproduce this again, with debug also on source. Besides virt debug on both src and dst, do you have any other idea of log to enable or even add to the code to try to clarify this?

Thanks

(Originally by Germano Veit Michel)

(In reply to Germano Veit Michel from comment #3)
> Besides virt debug on both src and dst, do you have any other idea of log to
> enable or even add to the code to try to clarify this?

Enabling libvirt debug log may be useful, in case Vdsm logs don't provide enough info.

(Originally by Milan Zamazal)

I can reproduce it now. The libvirt log shows a correct and completed metadata update, but then the inserted item is apparently missing in the domain XML sent to the destination. I suspect some timing issues with change propagation in libvirt; I will look into it further.

(Originally by Milan Zamazal)

Thanks. I've reproduced it again as well, will attach logs anyway just in case.

2020-01-21 13:29:31,362+1000 DEBUG (vm/59244f31) [virt.vm] (vmId='59244f31-f6a2-474d-b52c-afe2a1e5e978') send_lifecycle_event after_migration called (guestagent:565)

Slightly after:

2020-01-21 13:29:34,782+1000 INFO (vmchannels) [virt.vm] (vmId='59244f31-f6a2-474d-b52c-afe2a1e5e978') Guest API version changed from 0 to 3 (guestagent:292)

(Originally by Germano Veit Michel)

I think the problem is that the initial destination metadata is not read from the libvirt domain XML but from the _srcDomXML parameter provided in the Vdsm migration creation call. But the guest agent API version is set on the source only after the _srcDomXML value is retrieved, so it's missing in _srcDomXML. I may still be missing something, because this doesn't by itself explain why the problem doesn't appear on every migration. But up-to-date metadata should be present in _srcDomXML anyway and that should be fixed.

Germano, would you have a chance to check whether https://gerrit.ovirt.org/106483 fixes the problem with after_migration events?

(Originally by Milan Zamazal)

Hi Milan, sure I can. Just did not have any chance to work on this today. Please hold a bit, hopefully tomorrow.

(Originally by Germano Veit Michel)

(In reply to Milan Zamazal from comment #8)
> Germano, would you have a chance to check whether
> https://gerrit.ovirt.org/106483 fixes the problem with after_migration
> events?

It seems to work for me. Did 4 migrations in a row, and it worked 4 times:

# grep after /var/log/ovirt-guest-agent/ovirt-guest-agent.log
Dummy-1::INFO::2020-01-24 13:14:07,029::hooks::64::Hooks::Hook(after_migration) "/etc/ovirt-guest-agent/hooks.d/after_migration/55_flush-caches" executed
Dummy-1::INFO::2020-01-24 13:19:58,731::hooks::64::Hooks::Hook(after_migration) "/etc/ovirt-guest-agent/hooks.d/after_migration/55_flush-caches" executed
Dummy-1::INFO::2020-01-24 13:21:23,520::hooks::64::Hooks::Hook(after_migration) "/etc/ovirt-guest-agent/hooks.d/after_migration/55_flush-caches" executed
Dummy-1::INFO::2020-01-24 13:22:41,195::hooks::64::Hooks::Hook(after_migration) "/etc/ovirt-guest-agent/hooks.d/after_migration/55_flush-caches" executed

And the only Guest API change was when the guest started, nothing on migrations:

2020-01-24 13:13:40,653+1000 INFO (vmchannels) [virt.vm] (vmId='59244f31-f6a2-474d-b52c-afe2a1e5e978') Guest API version changed from 0 to 3 (guestagent:292)

(Originally by Germano Veit Michel)

Hi Germano, thank you for testing. I posted a patch fixing the bug in master.

(Originally by Milan Zamazal)

You are welcome, and thanks for the fix! Is this on the radar for 4.3.z after the fix is accepted? There is a customer ticket attached...

(Originally by Germano Veit Michel)

The patch is basically one line, so it's definitely a suitable target.

(Originally by Ryan Barry)

WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:
[Found non-acked flags: '{'rhevm-4.3.z': '?'}', ]
For more info please contact: rhv-devops
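
The ordering problem described in the analysis above (the guest agent API version being written to the metadata only after the _srcDomXML value has been captured) can be illustrated with a small sketch. This is not Vdsm code: `SourceVm`, `destination_can_send_after_migration` and the `snapshot_first` switch are hypothetical names, the metadata is modelled as a plain dict rather than domain XML, and the "non-zero version" check is an assumption based only on the "changed from 0 to 3" log message. It also makes the failure deterministic, so it does not model why the real problem appeared only on some migrations.

```python
import copy


class SourceVm:
    """Hypothetical stand-in for the source-side VM object (not Vdsm's class)."""

    def __init__(self):
        # Metadata as a plain dict; the real metadata lives in the domain XML.
        self.metadata = {}

    def migration_parameters(self, snapshot_first):
        api_version = 3  # value seen in the "changed from 0 to 3" log line
        if snapshot_first:
            # Problematic order: the snapshot sent to the destination is taken
            # before the API version is stored in the metadata.
            src_dom_xml = copy.deepcopy(self.metadata)
            self.metadata["guestAgentAPIVersion"] = api_version
        else:
            # Intended order: update the metadata first, then take the snapshot.
            self.metadata["guestAgentAPIVersion"] = api_version
            src_dom_xml = copy.deepcopy(self.metadata)
        return {"guestAgentAPIVersion": api_version, "_srcDomXML": src_dom_xml}


def destination_can_send_after_migration(params):
    # Assumption: the destination reads the version only from the metadata
    # snapshot and needs a non-zero value before sending lifecycle events.
    return params["_srcDomXML"].get("guestAgentAPIVersion", 0) > 0


stale = SourceVm().migration_parameters(snapshot_first=True)
fresh = SourceVm().migration_parameters(snapshot_first=False)
print(destination_can_send_after_migration(stale))
print(destination_can_send_after_migration(fresh))
```

In the stale case the version is present in the top-level parameters but absent from the snapshot, which matches the observation that guestAgentAPIVersion was always in params but not always in metadata.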
To clarify the scope of this issue: after_migration hook functionality is available only when using ovirt-guest-agent. That means it can work only with el7 guests; it doesn't work with el8 guests, since there is no ovirt-guest-agent on el8 and AFAIK there is no substitute for this kind of hook functionality anywhere.

Verified on:
- RHV 4.3.9.1-0.1.el7
- vdsm-4.30.42-1.el7ev.x86_64 (host)
- libvirt-4.5.0-33.el7.x86_64 (host)
Verification steps:
1. Migrate VM.
Result:
after_migration is sent to the guest:
# tail -f /var/log/vdsm/vdsm.log | grep 'after_migration'
2020-03-09 13:52:38,583+0200 DEBUG (vm/be5d4f39) [virt.vm] (vmId='be5d4f39-cb05-4a4d-9c6e-9856ff9c3e61') send_lifecycle_event after_migration called (guestagent:565)
2020-03-09 13:52:38,932+0200 DEBUG (vm/be5d4f39) [virt.vm] (vmId='be5d4f39-cb05-4a4d-9c6e-9856ff9c3e61') sent '{"__name__": "lifecycle-event", "type": "after_migration"}\n' (guestagent:342)
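
The manual check above could be scripted roughly as follows. This is only a sketch: the function name `after_migration_status` is made up for illustration, and the regular expressions are derived solely from the two log lines quoted in this report, so they may need adjusting for other Vdsm versions. Both messages are logged at DEBUG level, so debug logging has to be enabled on the host.

```python
import re

# Patterns derived from the vdsm.log lines quoted in this report (assumption:
# other Vdsm versions may format these messages differently).
CALLED = re.compile(
    r"\(vmId='([0-9a-f-]+)'\) send_lifecycle_event after_migration called")
SENT = re.compile(
    r"""\(vmId='([0-9a-f-]+)'\) sent '.*"type": "after_migration"}""")


def after_migration_status(lines):
    """Map each VM id to True if the event was actually sent to the guest,
    or False if send_lifecycle_event was only called."""
    status = {}
    for line in lines:
        match = CALLED.search(line)
        if match:
            status.setdefault(match.group(1), False)
            continue
        match = SENT.search(line)
        if match:
            status[match.group(1)] = True
    return status
```

A possible use is `with open("/var/log/vdsm/vdsm.log") as f: print(after_migration_status(f))`; a VM id mapped to False corresponds to the failing case in this bug, where the call is logged but nothing is sent.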
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1307