Bug 871616 - Guest agent information is missing after few VM's migrations
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.2.0
Assigned To: Vinzenz Feenstra [evilissimo]
QA Contact: Jiri Belka
Whiteboard: virt
Keywords: Regression
Duplicates: 870447
Depends On:
Blocks: 947888
Reported: 2012-10-30 17:08 EDT by Oded Ramraz
Modified: 2013-06-10 16:32 EDT
CC List: 21 users

See Also:
Fixed In Version: vdsm-4.10.2-16.0.el6ev
Doc Type: Bug Fix
Doc Text:
Previously, guest agent information vanished after virtual machines were migrated several times. This happened because the virtual machine channel listener did not handle errors. If an error occurred, VDSM did not try to reconnect, and the connection to the guest was lost for the lifetime of the guest or until VDSM was restarted. A patch to VDSM introduces a mechanism to reconnect to the channel. When an error occurs, the setup callback is called, which gives the handled client a chance to recreate the socket and prepare it for a connect. After that callback has been called, the erroneous connection is moved into the unconnected items dict, where it is handled by the event loop. After 5 or more unsuccessful attempts, the reconnect rate is slowed down to the interval specified for the 'read timeout'. The items that are slowed down are moved into the 'reconnect_cooldown' dict. With this patch applied, guest agent information no longer vanishes after several virtual machine migrations.
Story Points: ---
Clone Of:
Clones: 947888
Environment:
Last Closed: 2013-06-10 16:32:37 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
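
The reconnect behaviour described in the Doc Text above can be illustrated with a minimal Python sketch. This is not the VDSM patch itself. The class name `ChannelTracker`, its method names, the `setup`/`connect` callback signatures, and the injectable `clock` are all hypothetical; only the "unconnected" dict, the 'reconnect_cooldown' dict, the threshold of 5 attempts, and the 'read timeout' interval come from the Doc Text. The sketch also assumes that the attempt counter is reset when an error moves a channel back to the unconnected state, which the Doc Text does not specify.

```python
import time


class ChannelTracker:
    """Hypothetical sketch of the reconnect logic; not VDSM's real class."""

    MAX_FAST_ATTEMPTS = 5  # threshold taken from the Doc Text

    def __init__(self, read_timeout, clock=time.monotonic):
        self.read_timeout = read_timeout
        self._clock = clock            # injectable so the logic can be tested
        self.connected = {}
        self.unconnected = {}          # retried on every event loop pass
        self.reconnect_cooldown = {}   # retried once per read_timeout

    def register(self, key, setup, connect):
        # setup(): recreate the socket; connect(): return True on success.
        self.unconnected[key] = {
            'setup': setup, 'connect': connect, 'attempts': 0, 'retry_at': 0.0}

    def handle_error(self, key):
        # On error, let the client recreate its socket, then queue a reconnect.
        item = self.connected.pop(key)
        item['setup']()
        item['attempts'] = 0           # assumption: counter restarts here
        self.unconnected[key] = item

    def tick(self):
        """One pass of the event loop over everything that is not connected."""
        now = self._clock()
        # Items whose cooldown has expired become eligible for a retry again.
        for key, item in list(self.reconnect_cooldown.items()):
            if now >= item['retry_at']:
                del self.reconnect_cooldown[key]
                self.unconnected[key] = item
        for key, item in list(self.unconnected.items()):
            del self.unconnected[key]
            if item['connect']():
                item['attempts'] = 0
                self.connected[key] = item
                continue
            item['attempts'] += 1
            if item['attempts'] >= self.MAX_FAST_ATTEMPTS:
                # Slow down: next retry only after one read_timeout interval.
                item['retry_at'] = now + self.read_timeout
                self.reconnect_cooldown[key] = item
            else:
                self.unconnected[key] = item
```

In this sketch a channel that keeps failing is retried on every pass for its first 5 attempts and once per `read_timeout` afterwards, so a guest that is temporarily unreachable during migration is picked up again without VDSM having to be restarted.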


Attachments
vdsm and engine logs (3.61 MB, application/zip)
2012-10-30 17:18 EDT, Oded Ramraz
Guest Agent log (1.11 KB, application/x-gzip)
2012-11-15 12:26 EST, Barak Dagan
logs from local run (1.76 MB, application/x-gzip)
2013-01-07 08:46 EST, Barak Dagan


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 11977 None None None Never

Description Oded Ramraz 2012-10-30 17:08:37 EDT
Description of problem:

After creating 5 VMs and installing RHEL 6.3 with the guest agent on them, I migrated the VMs between 2 hosts a few times (using automated scripts).
After a few migrations, guest agent information such as the VMs' IP addresses was missing in both the UI and the API.

Comment 1 Oded Ramraz 2012-10-30 17:18:53 EDT
Created attachment 635804
vdsm and engine logs
Comment 2 Barak 2012-11-01 06:10:18 EDT
Oded, a few questions:

1 - Was the info available through vdsm-cli?
2 - Can we get the agent logs (debug mode) as well?
Comment 3 Oded Ramraz 2012-11-13 15:24:18 EST
(In reply to comment #2)
> Oded a few questions:
> 
> 1 - did the info exist through vdsm-cli ?

The info is visible via vdsmCli after the guest installation process, but it disappears after a few migrations (or hibernate/resume VM operations, since the test performs both actions).
We are able to reproduce this issue easily on a few environments.

> 2 - can we get the agent logs (debug mode) as well ?

Yes, we'll attach those logs soon (hopefully tomorrow).
Comment 5 Barak Dagan 2012-11-15 12:26:37 EST
Created attachment 645773
Guest Agent log
Comment 6 Barak Dagan 2012-11-15 12:28:16 EST
(In reply to comment #2)
> Oded a few questions:
> 
> 1 - did the info exist through vdsm-cli ?
> 2 - can we get the agent logs (debug mode) as well ?

It seems that the IP does not come back after the VM suspend action.
Comment 7 Andrew Cathrow 2012-12-03 06:52:58 EST
Could the problem be that virtio-serial is not working after suspend? We can test whether the shutdown command works, etc.

Also, does this happen after suspend only, or is it really caused by migration? (This impacts severity.)
Comment 8 Barak Dagan 2012-12-16 13:27:00 EST
(In reply to comment #7)
> Could the problem be with virtio-serial not working after suspend - we can
> test to see if shutdown command works, etc.
> 
> Also is this after suspend only or is it really caused by migration (impacts
> severity)

It seems that restarting VDSM solves the issue. The agent also cannot be uninstalled or reinstalled, probably because virtio-serial is not working; restarting VDSM fixes both problems. As for the other questions, the sequence does not seem to be simple. Once I manage to pin it down, I'll give the answers.
Comment 9 Vinzenz Feenstra [evilissimo] 2012-12-17 09:01:26 EST
I have noticed two things in the logs:

1. The VDSM logs on Host 2 are full of SSL certificate validation errors, which leads me to conclude that the Host/Engine setup is misconfigured.
2. The VDSM logs on Host 1 show some libvirt connection issues. I am not sure what they are; however, with RHEVM 3.1 about to go out the door, I would like to see this reproduced with the RHEVM 3.1 release version.

Also, this issue is not related to the guest agent; it must be somewhere in VDSM, libvirt, qemu, or the drivers.

Please try to reproduce this with a correctly configured setup, and please provide fresh logs from the RHEVM 3.1 environment.

Thanks.
Comment 10 Barak Dagan 2012-12-18 07:17:58 EST
(In reply to comment #7)
> Could the problem be with virtio-serial not working after suspend - we can
> test to see if shutdown command works, etc.
> 
> Also is this after suspend only or is it really caused by migration (impacts
> severity)

The shutdown command is working, though.
Comment 11 Barak Dagan 2012-12-26 14:02:42 EST
This happens after a few migrations. I think I can reproduce it by migrating the VMs between the two hosts using the SDK.

Which logs do you need? Engine and both VDSMs?
Comment 12 Vinzenz Feenstra [evilissimo] 2013-01-03 01:56:46 EST
I personally need only the VDSM logs; however, if there is a problem somewhere else, the engine logs might be needed as well.
Therefore, please attach both. Thanks.
Comment 13 Vinzenz Feenstra [evilissimo] 2013-01-03 06:56:33 EST
(In reply to comment #10)
> (In reply to comment #7)
> > Could the problem be with virtio-serial not working after suspend - we can
> > test to see if shutdown command works, etc.
> > 
> > Also is this after suspend only or is it really caused by migration (impacts
> > severity)
> 
> the shutdown command is working though

Well, the question is how the shutdown was performed in the end. It tells us nothing if the guest agent shutdown timed out and the ACPI shutdown kicked in.
Comment 14 Barak Dagan 2013-01-03 08:14:21 EST
(In reply to comment #13)
> (In reply to comment #10)
> > (In reply to comment #7)
> > > Could the problem be with virtio-serial not working after suspend - we can
> > > test to see if shutdown command works, etc.
> > > 
> > > Also is this after suspend only or is it really caused by migration (impacts
> > > severity)
> > 
> > the shutdown command is working though
> 
> Well the question is how the shutdown performed in the end. It won't tell
> anything if the GA version timed out and the ACPI shutdown kicked in.

The shutdown was performed smoothly. It seems to be a virtio issue, but I'll let you decide once I manage to get the logs.
Comment 15 Vinzenz Feenstra [evilissimo] 2013-01-04 04:34:02 EST
Please attach the log files as soon as you have them. Thanks.
Comment 16 Barak Dagan 2013-01-07 08:46:35 EST
Created attachment 674021
logs from local run
Comment 31 Frantisek Kobzik 2013-04-11 04:37:57 EDT
*** Bug 870447 has been marked as a duplicate of this bug. ***
Comment 35 Jiri Belka 2013-04-30 06:59:39 EDT
ok, vdsm-4.10.2-16.0.el6ev.x86_64.
Comment 38 errata-xmlrpc 2013-06-10 16:32:37 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0886.html
