Bug 1026474 - blank guestIPs are returned in vmGetStats after migration
Summary: blank guestIPs are returned in vmGetStats after migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: Windows Guest Tools
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: 3.3.0
Assignee: Gal Hammer
QA Contact: Ilanit Stein
URL:
Whiteboard: virt
: 1050993 (view as bug list)
Depends On: 1037547
Blocks: 1050993
TreeView+ depends on / blocked
 
Reported: 2013-11-04 18:13 UTC by David Jaša
Modified: 2014-02-28 07:38 UTC (History)
23 users (show)

Fixed In Version: rhev-guest-tools-iso-3.3-10
Doc Type: Enhancement
Doc Text:
Windows' virtio-serial driver behavior was changed so a write request when the host is not connected results in immediate return with an error. The vmGetStats command no longer yields blank IPs after migration.
Clone Of:
: 1050993 (view as bug list)
Environment:
Last Closed: 2014-01-21 15:55:46 UTC
oVirt Team: ---
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Install.rhev-agent-service log (1.10 MB, image/png)
2013-11-07 15:05 UTC, Ilanit Stein
no flags Details
Restarting the guest agent service on the affected VMs failed on: (1.19 MB, image/png)
2013-11-07 15:13 UTC, Ilanit Stein
no flags Details
Vdsm log (594.06 KB, text/x-log)
2014-01-02 16:18 UTC, Artyom
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1069775 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Product Errata RHBA-2014:0075 0 normal SHIPPED_LIVE rhev-guest-tools-iso bug fix and enhancement update 2014-01-21 20:14:48 UTC
oVirt gerrit 23216 0 None None None Never

Internal Links: 1069775

Description David Jaša 2013-11-04 18:13:41 UTC
Description of problem:
blank guestIPs are returned in vmGetStats after migration. Sometimes, the info gets updated correctly again, sometimes blank guest IPs persist "forever". Sometimes, the screen gets locked for no apparent reason but engine - ... - agent connectivity glitch with subsequent call to lock desktop is my suspect, supported also by different behaviour of different guests:
 * RHEL guest seems immune to blank guest IPs problem but sudden locks do sometimes occur
 * windows 2008 R2 doesn't re-fill guestIPs ever, screensaver locks sometime occur
 * windows 7 don't refill guestIPs but I didn't notice desktop locks
 * windows xp refills guestIPs after few minutes

Version-Release number of selected component (if applicable):
is 20.2
agent from 3.3-6 tools
vdsm-4.13.0-0.3.beta1.el6ev.x86_64
rhevm-backend-3.3.0-0.30.beta1.el6ev.noarch

How reproducible:
always to random (see Description for random symptoms)

Steps to Reproduce:
1. migrate a Windows VM with rhev agent running
2.
3.

Actual results:
* ip info is lost after migration
* sometimes, the desktop gets locked

Expected results:
ip info doesn't change
the desktop should not get locked (this is harder to reproduce/verify than ip info however)

Additional info:

Comment 1 David Jaša 2013-11-06 15:52:52 UTC
Few more findings:
  * guestIPs is blank after both migration and vdsmd restart
  * fqdn is filled after migration but empty after vdsmd restart
  * applist is filled in both cases

agent log snippet: agent restart followed by migration followed by vdsm restart:
Dummy-1::INFO::2013-11-06 15:40:04,663::ovirtguestservice::76::root::Stopping OVirt Guest Agent service
Dummy-1::INFO::2013-11-06 15:40:07,155::ovirtguestservice::63::root::Starting OVirt Guest Agent service
Dummy-2::ERROR::2013-11-06 15:41:28,280::ovirtagentlogic::148::root::AgentLogicBase::doWork
Traceback (most recent call last):
  File "OVirtAgentLogic.pyc", line 130, in doWork
  File "VirtIoChannel.pyc", line 198, in write
  File "VirtIoChannel.pyc", line 145, in write
  File "WinFile.pyc", line 66, in write
error: (554, 'GetOverlappedResult', 'Used to indicate that an operation cannot continue without blocking for I/O.')
Dummy-3::INFO::2013-11-06 15:41:28,678::ovirtagentlogic::169::root::Received an external command: refresh...
Dummy-3::ERROR::2013-11-06 15:41:28,694::guestagentwin32::76::root::Error retrieving network interfaces.
Traceback (most recent call last):
  File "GuestAgentWin32.pyc", line 54, in GetNetworkInterfaces
  File "win32com\client\__init__.pyc", line 95, in Dispatch
  File "win32com\client\dynamic.pyc", line 108, in _GetGoodDispatchAndUserName
  File "win32com\client\dynamic.pyc", line 85, in _GetGoodDispatch
com_error: (-2147221008, 'CoInitialize has not been called.', None, None)
Dummy-3::INFO::2013-11-06 15:45:59,206::ovirtagentlogic::169::root::Received an external command: refresh...
Dummy-3::ERROR::2013-11-06 15:45:59,450::guestagentwin32::386::root::Error on user name translation.
Traceback (most recent call last):
  File "GuestAgentWin32.pyc", line 379, in getActiveUser
error: (5, 'TranslateName', 'Access is denied.')
Dummy-3::ERROR::2013-11-06 15:45:59,473::guestagentwin32::76::root::Error retrieving network interfaces.
Traceback (most recent call last):
  File "GuestAgentWin32.pyc", line 54, in GetNetworkInterfaces
  File "win32com\client\__init__.pyc", line 95, in Dispatch
  File "win32com\client\dynamic.pyc", line 108, in _GetGoodDispatchAndUserName
  File "win32com\client\dynamic.pyc", line 85, in _GetGoodDispatch
com_error: (-2147221008, 'CoInitialize has not been called.', None, None)

Comment 2 Ilanit Stein 2013-11-07 15:05:22 UTC
Created attachment 821201 [details]
Install.rhev-agent-service log

Comment 3 Ilanit Stein 2013-11-07 15:09:02 UTC
Comment on attachment 821201 [details]
Install.rhev-agent-service log

Same problem occurred to me as well. Adding log

Comment 4 Ilanit Stein 2013-11-07 15:13:03 UTC
Created attachment 821209 [details]
Restarting the guest agent service on the affected VMs failed on:

Comment 5 Vinzenz Feenstra [evilissimo] 2013-11-07 15:49:45 UTC
(In reply to Ilanit Stein from comment #4)
> Created attachment 821209 [details]
> Restarting the guest agent service on the affected VMs failed on:

That's a strange error, please check the the VirtIO Serial driver is loaded.

Comment 6 David Jaša 2013-11-08 22:26:32 UTC
Regarding desktop locks - based on tests I've done during this week, it seems that they were caused by SWITCH_HOST mode of migration triggered by bug 1022565: SWITCH_HOST means that the client connects to destination after migration finishes and it disconnects from source when it exits so when there is any delay in establishing the connection, it can easily take 2 s from disconnect from source to connection to destination, making the lock kick in.

Comment 9 Michal Skrivanek 2013-11-19 15:51:50 UTC
may be related to bug 1023489

Comment 11 Mike Cao 2013-11-22 05:15:08 UTC
Hi, does the two hosts in the same subnet ?
pls paste the host ip address ,gateway for both source host and destination hosts

Comment 12 David Jaša 2013-11-25 17:25:35 UTC
(In reply to Ronen Hod from comment #10)
> Can you try the latest virtio-win drivers <...>

I tried and now the behaviour is different, right after migration, IP disappears for a while and this error is in Install.rhev-agent-service log file:
Dummy-3::ERROR::2013-11-06 15:45:59,473::guestagentwin32::76::root::Error retrieving network interfaces.
Traceback (most recent call last):
  File "GuestAgentWin32.pyc", line 54, in GetNetworkInterfaces
  File "win32com\client\__init__.pyc", line 95, in Dispatch
  File "win32com\client\dynamic.pyc", line 108, in _GetGoodDispatchAndUserName
  File "win32com\client\dynamic.pyc", line 85, in _GetGoodDispatch
com_error: (-2147221008, 'CoInitialize has not been called.', None, None)

After a short while (~15 seconds, did not measure), the IP is reported in webadmin again. So it seems that virtio part of this bug is fixed by the new drivers.

Comment 13 Vinzenz Feenstra [evilissimo] 2013-12-05 14:02:46 UTC
Works with drivers fix in BZ#1037547 (is26 contains them)

Comment 14 Artyom 2014-01-02 16:06:15 UTC
Checked on is29, RHEV-toolsSetup_3.3_9.iso
Host 1: ip addr:10.35.64.85 Gateway: 10.35.64.254
Host 2: ip addr:10.35.97.36 Gateway: 10.35.97.254

After start, ip appear in UI.
After first migration from host2->host1, ip appear after about 15 minutes
After second migration from host1->host2: ip not appear, also after waiting 5 minutes

Also still have exceptions in Install.rhev-agent-service:
Dummy-3::ERROR::2014-01-02 15:48:31,905::ovirtagentlogic::148::root::AgentLogicBase::doWork
Traceback (most recent call last):
  File "OVirtAgentLogic.pyc", line 130, in doWork
  File "VirtIoChannel.pyc", line 198, in write
  File "VirtIoChannel.pyc", line 145, in write
  File "WinFile.pyc", line 66, in write
error: (554, 'GetOverlappedResult', 'Used to indicate that an operation cannot continue without blocking for I/O.')
Dummy-2::INFO::2014-01-02 15:48:33,375::ovirtagentlogic::169::root::Received an external command: refresh...
Dummy-2::ERROR::2014-01-02 15:48:33,390::guestagentwin32::76::root::Error retrieving network interfaces.
Traceback (most recent call last):
  File "GuestAgentWin32.pyc", line 54, in GetNetworkInterfaces
  File "win32com\client\__init__.pyc", line 95, in Dispatch
  File "win32com\client\dynamic.pyc", line 108, in _GetGoodDispatchAndUserName
  File "win32com\client\dynamic.pyc", line 85, in _GetGoodDispatch
com_error: (-2147221008, 'CoInitialize has not been called.', None, None)
Dummy-2::INFO::2014-01-02 15:49:37,046::ovirtagentlogic::169::root::Received an external command: lock-screen...
Dummy-2::INFO::2014-01-02 15:49:37,092::ovirtagentlogic::169::root::Received an external command: lock-screen...
Dummy-2::INFO::2014-01-02 15:49:37,108::ovirtagentlogic::169::root::Received an external command: lock-screen...
Dummy-2::INFO::2014-01-02 15:49:37,108::ovirtagentlogic::169::root::Received an external command: lock-screen...
Dummy-2::INFO::2014-01-02 15:49:37,108::ovirtagentlogic::169::root::Received an external command: lock-screen...

Comment 15 Artyom 2014-01-02 16:07:00 UTC
Also on vm run windows 7 64-bit

Comment 16 Artyom 2014-01-02 16:18:27 UTC
Created attachment 844614 [details]
Vdsm log

Also see exception in vdsm log, log from one of hosts, but the same error I can see on second host.

Comment 18 Artyom 2014-01-03 09:01:40 UTC
After first migration from host2->host1, ip appear after about 15 minutes
After 15 seconds

(In reply to Artyom from comment #14)
> Checked on is29, RHEV-toolsSetup_3.3_9.iso
> Host 1: ip addr:10.35.64.85 Gateway: 10.35.64.254
> Host 2: ip addr:10.35.97.36 Gateway: 10.35.97.254
> 
> After start, ip appear in UI.
> After first migration from host2->host1, ip appear after about 15 minutes
> After second migration from host1->host2: ip not appear, also after waiting
> 5 minutes
> 
> Also still have exceptions in Install.rhev-agent-service:
> Dummy-3::ERROR::2014-01-02
> 15:48:31,905::ovirtagentlogic::148::root::AgentLogicBase::doWork
> Traceback (most recent call last):
>   File "OVirtAgentLogic.pyc", line 130, in doWork
>   File "VirtIoChannel.pyc", line 198, in write
>   File "VirtIoChannel.pyc", line 145, in write
>   File "WinFile.pyc", line 66, in write
> error: (554, 'GetOverlappedResult', 'Used to indicate that an operation
> cannot continue without blocking for I/O.')
> Dummy-2::INFO::2014-01-02 15:48:33,375::ovirtagentlogic::169::root::Received
> an external command: refresh...
> Dummy-2::ERROR::2014-01-02 15:48:33,390::guestagentwin32::76::root::Error
> retrieving network interfaces.
> Traceback (most recent call last):
>   File "GuestAgentWin32.pyc", line 54, in GetNetworkInterfaces
>   File "win32com\client\__init__.pyc", line 95, in Dispatch
>   File "win32com\client\dynamic.pyc", line 108, in
> _GetGoodDispatchAndUserName
>   File "win32com\client\dynamic.pyc", line 85, in _GetGoodDispatch
> com_error: (-2147221008, 'CoInitialize has not been called.', None, None)
> Dummy-2::INFO::2014-01-02 15:49:37,046::ovirtagentlogic::169::root::Received
> an external command: lock-screen...
> Dummy-2::INFO::2014-01-02 15:49:37,092::ovirtagentlogic::169::root::Received
> an external command: lock-screen...
> Dummy-2::INFO::2014-01-02 15:49:37,108::ovirtagentlogic::169::root::Received
> an external command: lock-screen...
> Dummy-2::INFO::2014-01-02 15:49:37,108::ovirtagentlogic::169::root::Received
> an external command: lock-screen...
> Dummy-2::INFO::2014-01-02 15:49:37,108::ovirtagentlogic::169::root::Received
> an external command: lock-screen...

Comment 19 Mike Cao 2014-01-06 03:09:27 UTC
(In reply to Artyom from comment #14)
> Checked on is29, RHEV-toolsSetup_3.3_9.iso
> Host 1: ip addr:10.35.64.85 Gateway: 10.35.64.254
> Host 2: ip addr:10.35.97.36 Gateway: 10.35.97.254
> 
Looks they are in the different subnets .Could you reproduce it on 2 hosts in the same subnet ? Does it happen on Linux guest as well ?

Mike

Comment 20 Vinzenz Feenstra [evilissimo] 2014-01-06 07:36:31 UTC
(In reply to Mike Cao from comment #19)
> (In reply to Artyom from comment #14)
> > Checked on is29, RHEV-toolsSetup_3.3_9.iso
> > Host 1: ip addr:10.35.64.85 Gateway: 10.35.64.254
> > Host 2: ip addr:10.35.97.36 Gateway: 10.35.97.254
> > 
> Looks they are in the different subnets .Could you reproduce it on 2 hosts
> in the same subnet ? Does it happen on Linux guest as well ?
> 
> Mike

How is this relevant? The problem is that the virtio-serial driver failing for whatever reason:
> error: (554, 'GetOverlappedResult', 'Used to indicate that an operation
> cannot continue without blocking for I/O.')

Since the virtio-serial driver is not working, the communication with the hypervisor is blocked and therefore the reporting of the IPs is no longer working.
The report says 'blank guestIPs' however there is way more not reported than just the IPs and that it completely unrelated.

Comment 21 Mike Cao 2014-01-06 07:48:54 UTC
(In reply to Vinzenz Feenstra [evilissimo] from comment #20)
> (In reply to Mike Cao from comment #19)
> > (In reply to Artyom from comment #14)
> > > Checked on is29, RHEV-toolsSetup_3.3_9.iso
> > > Host 1: ip addr:10.35.64.85 Gateway: 10.35.64.254
> > > Host 2: ip addr:10.35.97.36 Gateway: 10.35.97.254
> > > 
> > Looks they are in the different subnets .Could you reproduce it on 2 hosts
> > in the same subnet ? Does it happen on Linux guest as well ?
> > 
> > Mike
> 
> How is this relevant? The problem is that the virtio-serial driver failing
> for whatever reason:


> > error: (554, 'GetOverlappedResult', 'Used to indicate that an operation
> > cannot continue without blocking for I/O.')
> 
> Since the virtio-serial driver is not working, the communication with the
> hypervisor is blocked and therefore the reporting of the IPs is no longer
> working.
> The report says 'blank guestIPs' however there is way more not reported than
> just the IPs and that it completely unrelated.

Not sure I understand this issue correctly .I assume the guest is using DHCP to get the IP .When guest has the same subnet ip as host A when it is located in host A. while the the guest still has the hostA' subnet IP when it migrate to hostB .the dhcp server may assign a new ip to the guest which may need plenty of time .

As the reporter did not see any abnormal for applications ,login user shown in RHEVM ,What's why I think it may related to network 


Mike

Comment 22 Vinzenz Feenstra [evilissimo] 2014-01-06 08:05:22 UTC
(In reply to Mike Cao from comment #21)
> Not sure I understand this issue correctly .I assume the guest is using DHCP
> to get the IP .When guest has the same subnet ip as host A when it is
> located in host A. while the the guest still has the hostA' subnet IP when
> it migrate to hostB .the dhcp server may assign a new ip to the guest which
> may need plenty of time .
> 
> As the reporter did not see any abnormal for applications ,login user shown
> in RHEVM ,What's why I think it may related to network 

The application list is stored and kept on the database in RHEVM that's why it is not empty in the webadmin interface.

I have originally reproduced this issue myself and this seemed to have been a regression in the drivers as updating the drivers fixed for the first migration, however in Comment c#14 it showed that it still now still doing the same after the second migration. It looks to me like that the driver state gets affected during migrations and the fix applied to fix the issue did only resolve it partially.

You can ask to get this reproduced in the same subnet, but this issue will persist.

@Artyom: For the sake of problem exclusion please retest as Mike requested, by using two hypervisors in the same subnet. thank you.

Comment 23 Mike Cao 2014-01-06 08:12:17 UTC
(In reply to Vinzenz Feenstra [evilissimo] from comment #13)
> Works with drivers fix in BZ#1037547 (is26 contains them)

Do you mean comment #12 ? But I don't think we have a fix for BZ#1037547 currently and it will be fixed soon ...

Mike

Comment 24 Mike Cao 2014-01-06 08:16:48 UTC
(In reply to Mike Cao from comment #23)
> (In reply to Vinzenz Feenstra [evilissimo] from comment #13)
> > Works with drivers fix in BZ#1037547 (is26 contains them)
> 
> Do you mean comment #12 ? But I don't think we have a fix for BZ#1037547
> currently and it will be fixed soon ...
> 
> Mike

Rechecked comment #0 
"of different guests:
 * RHEL guest seems immune to blank guest IPs problem but sudden locks do sometimes occur
 * windows 2008 R2 doesn't re-fill guestIPs ever, screensaver locks sometime occur
 * windows 7 don't refill guestIPs but I didn't notice desktop locks
 * windows xp refills guestIPs after few minutes
"

BZ#1037547 is a windows server 2012 virtio-serial specified bug ,it can not affect other platforms

Comment 26 Artyom 2014-01-06 13:15:55 UTC
Checked on hosts:
Host 1 10.35.109.15
Host 2 10.35.109.14

Same result for windows 7 x64, same error message in vdsm.log and Install.rhev-agent-service log

Vm with RHEL6.5 always show ip without any error messages in logs

Comment 27 Gal Hammer 2014-01-13 13:30:45 UTC
A patch was posted upstream: http://gerrit.ovirt.org/23216

Comment 28 Gal Hammer 2014-01-14 08:12:36 UTC
*** Bug 1050993 has been marked as a duplicate of this bug. ***

Comment 29 Gal Hammer 2014-01-14 08:39:14 UTC
The bug is not directly connected to migration. A simpler scenario would be to stop vdsm, wait ~10 second (give the agent a change to write something to the virtio-serial port) and then start vdsm.

Comment 30 Vinzenz Feenstra [evilissimo] 2014-01-14 10:06:34 UTC
Merged to u/s ovirt-guest-agent master as http://gerrit.ovirt.org/gitweb?p=ovirt-guest-agent.git;a=commit;h=17d61aa3be1f85b70f72bb2756f8b3f046db3de8

Comment 33 Ilanit Stein 2014-01-16 13:53:56 UTC
Verified on rhevm is32.2:

1. Run a win 7x64 VM, install rhev-guest-tools-iso-3.3-9, 
   See the VM ip iddress in UI.
   Stop vdsmd service, wait a few min, and start vdsmd service.
   See that the ip address is indeed not seen in UI.

2. Install rhev-guest-tools-iso-3.3-10, 
   Stop vdsmd service, wait a few min, and start vdsmd service.
   See that the ip address is SEEN in UI.

Comment 34 errata-xmlrpc 2014-01-21 15:55:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0075.html


Note You need to log in before you can comment on or make changes to this bug.