Bug 871616
Summary: | Guest agent information is missing after few VM's migrations |
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Oded Ramraz <oramraz> |
Component: | vdsm | Assignee: | Vinzenz Feenstra [evilissimo] <vfeenstr> |
Status: | CLOSED ERRATA | QA Contact: | Jiri Belka <jbelka> |
Severity: | high | Docs Contact: | |
Priority: | unspecified |
Version: | 3.1.0 | CC: | abaron, acathrow, bazulay, danken, dcaroest, dyasny, hateya, iheim, italkohe, lpeer, michal.skrivanek, mkenneth, mpavlik, pep, pstehlik, rhev-integ, sgrinber, sputhenp, vipatel, ykaul, zdover |
Target Milestone: | --- | Keywords: | Regression |
Target Release: | 3.2.0 |
Hardware: | Unspecified |
OS: | Unspecified |
Whiteboard: | virt |
Fixed In Version: | vdsm-4.10.2-16.0.el6ev | Doc Type: | Bug Fix |
Doc Text: |
Previously, guest agent information vanished after virtual machines were migrated several times. This was because the virtual machine channel listener was not handling any errors. If an error occurred, VDSM did not try to reconnect and the connection to the guest was lost for the lifetime of the guest or until VDSM was restarted.
A patch to VDSM introduces a mechanism to reconnect to the channel. When an error occurs, the setup callback is called, which gives the handled client a chance to recreate the socket and prepare it for a connect.
After that callback has been called, the erroneous connection is moved into the unconnected items dict, where it is handled by the event loop.
If 5 or more reconnect attempts have failed, the reconnect rate is slowed down to the same interval as specified for the 'read timeout'.
Items that are slowed down in this way are moved into the 'reconnect_cooldown' dict.
After this patch is applied, guest agent information does not vanish after several virtual machine migrations.
|
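The reconnect mechanism described in the Doc Text can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code of the actual VDSM patch: the class name `ChannelListener`, its method names, the per-item fields and the injectable `clock` are all hypothetical. Only the setup callback, the unconnected items dict, the `reconnect_cooldown` dict, the threshold of 5 attempts and the use of the read timeout as the slowed-down interval come from the text above.

```python
import time


class ChannelListener:
    """Toy model of a channel listener that reconnects after errors.

    Hypothetical sketch; names and structure are assumptions and do not
    mirror the real VDSM implementation.
    """

    MAX_FAST_ATTEMPTS = 5  # threshold mentioned in the Doc Text

    def __init__(self, read_timeout=30.0, clock=time.monotonic):
        self._read_timeout = read_timeout
        self._clock = clock
        self.channels = {}            # connected items
        self.unconnected = {}         # items waiting for a connect attempt
        self.reconnect_cooldown = {}  # items whose reconnects are slowed down

    def register(self, key, setup_cb, connect_cb):
        """Add a client; connect_cb returns True when the connect succeeds."""
        self.unconnected[key] = {
            'setup': setup_cb,
            'connect': connect_cb,
            'attempts': 0,
            'retry_at': 0.0,
        }

    def handle_error(self, key):
        """On error, let the client recreate its socket and queue a reconnect."""
        item = self.channels.pop(key)
        item['setup']()
        self.unconnected[key] = item

    def _try_connect(self, key, item):
        if item['connect']():
            item['attempts'] = 0
            self.channels[key] = item
            return
        item['attempts'] += 1
        if item['attempts'] >= self.MAX_FAST_ATTEMPTS:
            # Too many failures: retry only once per read timeout.
            item['retry_at'] = self._clock() + self._read_timeout
            self.reconnect_cooldown[key] = item
        else:
            self.unconnected[key] = item

    def process_once(self):
        """One event loop iteration over cooled-down and unconnected items."""
        now = self._clock()
        for key, item in list(self.reconnect_cooldown.items()):
            if now >= item['retry_at']:
                del self.reconnect_cooldown[key]
                self.unconnected[key] = item
        for key, item in list(self.unconnected.items()):
            del self.unconnected[key]
            self._try_connect(key, item)
```

In this sketch a client that keeps failing is retried on every loop iteration for its first five attempts and afterwards only once per read timeout, so a permanently broken channel does not keep the event loop busy while a recovered guest is still picked up eventually.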
Story Points: | --- |
Clone Of: | |
: | 947888 | Environment: | |
Last Closed: | 2013-06-10 20:32:37 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |
Bug Depends On: | |
Bug Blocks: | 947888 |
Attachments: |
|
Description
Oded Ramraz
2012-10-30 21:08:37 UTC
Created attachment 635804 [details]
vdsm and engine logs
Oded a few questions:
1 - did the info exist through vdsm-cli ?
2 - can we get the agent logs (debug mode) as well ?

(In reply to comment #2)
> Oded a few questions:
>
> 1 - did the info exist through vdsm-cli ?

The info is visible via vdsmCli after the guest installation process, but it disappears after a few migrations (or hibernate / resume VM operations, since the test performs both actions). We are able to reproduce this issue easily on a few environments.

> 2 - can we get the agent logs (debug mode) as well ?

Yes, we'll attach those logs soon (hopefully tomorrow).

Created attachment 645773 [details]
Guest Agent log
(In reply to comment #2)
> Oded a few questions:
>
> 1 - did the info exist through vdsm-cli ?
> 2 - can we get the agent logs (debug mode) as well ?

It seems that the IP doesn't return after the VM suspend action.

Could the problem be with virtio-serial not working after suspend - we can test to see if shutdown command works, etc.

Also is this after suspend only or is it really caused by migration (impacts severity)

(In reply to comment #7)
> Could the problem be with virtio-serial not working after suspend - we can
> test to see if shutdown command works, etc.
>
> Also is this after suspend only or is it really caused by migration (impacts
> severity)

It seems that restarting VDSM solves that issue. The agent can't be un/re-installed, probably because virtio-serial is not working. Restarting VDSM solves both issues. As for the other questions, the sequence does not seem to be so simple; when I manage to find it, I'll give the answers.

I have noticed two things in the logs:
1. The VDSM Host 2 logs are full of SSL certificate validation errors, which leads me to the conclusion of a misconfigured Host/Engine setup.
2. The VDSM Host 1 logs seem to have some libvirt connection issues.
I am not really sure what it is, however in the light of having RHEVM 3.1 just right of the door I would like to see this reproduced with the RHEVM 3.1 release version. Also, this issue is not related to the guest agent; it must be somewhere in VDSM, libvirt, qemu or the drivers. Please try to reproduce this with an appropriately configured setup and please provide fresh logs from the RHEVM 3.1 environment. Thanks.

(In reply to comment #7)
> Could the problem be with virtio-serial not working after suspend - we can
> test to see if shutdown command works, etc.
>
> Also is this after suspend only or is it really caused by migration (impacts
> severity)

the shutdown command is working though

This happens after a few migrations. I think that I can reproduce it by migrating the VMs between the two hosts using the SDK. Which logs do you need ? engine and 2 VDSMs ?

I personally need only the VDSM logs, however if there's a problem somewhere else the engine logs might be needed as well. Therefore add both of them. Thanks.

(In reply to comment #10)
> (In reply to comment #7)
> > Could the problem be with virtio-serial not working after suspend - we can
> > test to see if shutdown command works, etc.
> >
> > Also is this after suspend only or is it really caused by migration (impacts
> > severity)
>
> the shutdown command is working though

Well, the question is how the shutdown was performed in the end. It won't tell us anything if the GA version timed out and the ACPI shutdown kicked in.

(In reply to comment #13)
> (In reply to comment #10)
> > (In reply to comment #7)
> > > Could the problem be with virtio-serial not working after suspend - we can
> > > test to see if shutdown command works, etc.
> > >
> > > Also is this after suspend only or is it really caused by migration (impacts
> > > severity)
> >
> > the shutdown command is working though
>
> Well the question is how the shutdown performed in the end. It won't tell
> anything if the GA version timed out and the ACPI shutdown kicked in.

The shutdown performed smoothly. It seems to be a virtio issue, but I'll let you decide once I manage to get the logs.

Please attach the log files as soon as you have them. Thanks.

Created attachment 674021 [details]
logs from local run
*** Bug 870447 has been marked as a duplicate of this bug. ***

merged to master: http://gerrit.ovirt.org/gitweb?p=vdsm.git;a=commit;h=5b5c58580e20ffaf3ceff7193f4c28cbadd8c42f

ok, vdsm-4.10.2-16.0.el6ev.x86_64.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0886.html