Description of problem: blank guestIPs are returned in vmGetStats after migration. Sometimes, the info gets updated correctly again, sometimes blank guest IPs persist "forever". Sometimes, the screen gets locked for no apparent reason but engine - ... - agent connectivity glitch with subsequent call to lock desktop is my suspect, supported also by different behaviour of different guests: * RHEL guest seems immune to blank guest IPs problem but sudden locks do sometimes occur * windows 2008 R2 doesn't re-fill guestIPs ever, screensaver locks sometime occur * windows 7 don't refill guestIPs but I didn't notice desktop locks * windows xp refills guestIPs after few minutes Version-Release number of selected component (if applicable): is 20.2 agent from 3.3-6 tools vdsm-4.13.0-0.3.beta1.el6ev.x86_64 rhevm-backend-3.3.0-0.30.beta1.el6ev.noarch How reproducible: always to random (see Description for random symptoms) Steps to Reproduce: 1. migrate a Windows VM with rhev agent running 2. 3. Actual results: * ip info is lost after migration * sometimes, the desktop gets locked Expected results: ip info doesn't change the desktop should not get locked (this is harder to reproduce/verify than ip info however) Additional info:
Few more findings: * guestIPs is blank after both migration and vdsmd restart * fqdn is filled after migration but empty after vdsmd restart * applist is filled in both cases agent log snippet: agent restart followed by migration followed by vdsm restart: Dummy-1::INFO::2013-11-06 15:40:04,663::ovirtguestservice::76::root::Stopping OVirt Guest Agent service Dummy-1::INFO::2013-11-06 15:40:07,155::ovirtguestservice::63::root::Starting OVirt Guest Agent service Dummy-2::ERROR::2013-11-06 15:41:28,280::ovirtagentlogic::148::root::AgentLogicBase::doWork Traceback (most recent call last): File "OVirtAgentLogic.pyc", line 130, in doWork File "VirtIoChannel.pyc", line 198, in write File "VirtIoChannel.pyc", line 145, in write File "WinFile.pyc", line 66, in write error: (554, 'GetOverlappedResult', 'Used to indicate that an operation cannot continue without blocking for I/O.') Dummy-3::INFO::2013-11-06 15:41:28,678::ovirtagentlogic::169::root::Received an external command: refresh... Dummy-3::ERROR::2013-11-06 15:41:28,694::guestagentwin32::76::root::Error retrieving network interfaces. Traceback (most recent call last): File "GuestAgentWin32.pyc", line 54, in GetNetworkInterfaces File "win32com\client\__init__.pyc", line 95, in Dispatch File "win32com\client\dynamic.pyc", line 108, in _GetGoodDispatchAndUserName File "win32com\client\dynamic.pyc", line 85, in _GetGoodDispatch com_error: (-2147221008, 'CoInitialize has not been called.', None, None) Dummy-3::INFO::2013-11-06 15:45:59,206::ovirtagentlogic::169::root::Received an external command: refresh... Dummy-3::ERROR::2013-11-06 15:45:59,450::guestagentwin32::386::root::Error on user name translation. Traceback (most recent call last): File "GuestAgentWin32.pyc", line 379, in getActiveUser error: (5, 'TranslateName', 'Access is denied.') Dummy-3::ERROR::2013-11-06 15:45:59,473::guestagentwin32::76::root::Error retrieving network interfaces. Traceback (most recent call last): File "GuestAgentWin32.pyc", line 54, in GetNetworkInterfaces File "win32com\client\__init__.pyc", line 95, in Dispatch File "win32com\client\dynamic.pyc", line 108, in _GetGoodDispatchAndUserName File "win32com\client\dynamic.pyc", line 85, in _GetGoodDispatch com_error: (-2147221008, 'CoInitialize has not been called.', None, None)
Created attachment 821201 [details] Install.rhev-agent-service log
Comment on attachment 821201 [details] Install.rhev-agent-service log Same problem occurred to me as well. Adding log
Created attachment 821209 [details] Restarting the guest agent service on the affected VMs failed on:
(In reply to Ilanit Stein from comment #4) > Created attachment 821209 [details] > Restarting the guest agent service on the affected VMs failed on: That's a strange error, please check the the VirtIO Serial driver is loaded.
Regarding desktop locks - based on tests I've done during this week, it seems that they were caused by SWITCH_HOST mode of migration triggered by bug 1022565: SWITCH_HOST means that the client connects to destination after migration finishes and it disconnects from source when it exits so when there is any delay in establishing the connection, it can easily take 2 s from disconnect from source to connection to destination, making the lock kick in.
may be related to bug 1023489
Hi, does the two hosts in the same subnet ? pls paste the host ip address ,gateway for both source host and destination hosts
(In reply to Ronen Hod from comment #10) > Can you try the latest virtio-win drivers <...> I tried and now the behaviour is different, right after migration, IP disappears for a while and this error is in Install.rhev-agent-service log file: Dummy-3::ERROR::2013-11-06 15:45:59,473::guestagentwin32::76::root::Error retrieving network interfaces. Traceback (most recent call last): File "GuestAgentWin32.pyc", line 54, in GetNetworkInterfaces File "win32com\client\__init__.pyc", line 95, in Dispatch File "win32com\client\dynamic.pyc", line 108, in _GetGoodDispatchAndUserName File "win32com\client\dynamic.pyc", line 85, in _GetGoodDispatch com_error: (-2147221008, 'CoInitialize has not been called.', None, None) After a short while (~15 seconds, did not measure), the IP is reported in webadmin again. So it seems that virtio part of this bug is fixed by the new drivers.
Works with drivers fix in BZ#1037547 (is26 contains them)
Checked on is29, RHEV-toolsSetup_3.3_9.iso Host 1: ip addr:10.35.64.85 Gateway: 10.35.64.254 Host 2: ip addr:10.35.97.36 Gateway: 10.35.97.254 After start, ip appear in UI. After first migration from host2->host1, ip appear after about 15 minutes After second migration from host1->host2: ip not appear, also after waiting 5 minutes Also still have exceptions in Install.rhev-agent-service: Dummy-3::ERROR::2014-01-02 15:48:31,905::ovirtagentlogic::148::root::AgentLogicBase::doWork Traceback (most recent call last): File "OVirtAgentLogic.pyc", line 130, in doWork File "VirtIoChannel.pyc", line 198, in write File "VirtIoChannel.pyc", line 145, in write File "WinFile.pyc", line 66, in write error: (554, 'GetOverlappedResult', 'Used to indicate that an operation cannot continue without blocking for I/O.') Dummy-2::INFO::2014-01-02 15:48:33,375::ovirtagentlogic::169::root::Received an external command: refresh... Dummy-2::ERROR::2014-01-02 15:48:33,390::guestagentwin32::76::root::Error retrieving network interfaces. Traceback (most recent call last): File "GuestAgentWin32.pyc", line 54, in GetNetworkInterfaces File "win32com\client\__init__.pyc", line 95, in Dispatch File "win32com\client\dynamic.pyc", line 108, in _GetGoodDispatchAndUserName File "win32com\client\dynamic.pyc", line 85, in _GetGoodDispatch com_error: (-2147221008, 'CoInitialize has not been called.', None, None) Dummy-2::INFO::2014-01-02 15:49:37,046::ovirtagentlogic::169::root::Received an external command: lock-screen... Dummy-2::INFO::2014-01-02 15:49:37,092::ovirtagentlogic::169::root::Received an external command: lock-screen... Dummy-2::INFO::2014-01-02 15:49:37,108::ovirtagentlogic::169::root::Received an external command: lock-screen... Dummy-2::INFO::2014-01-02 15:49:37,108::ovirtagentlogic::169::root::Received an external command: lock-screen... Dummy-2::INFO::2014-01-02 15:49:37,108::ovirtagentlogic::169::root::Received an external command: lock-screen...
Also on vm run windows 7 64-bit
Created attachment 844614 [details] Vdsm log Also see exception in vdsm log, log from one of hosts, but the same error I can see on second host.
After first migration from host2->host1, ip appear after about 15 minutes After 15 seconds (In reply to Artyom from comment #14) > Checked on is29, RHEV-toolsSetup_3.3_9.iso > Host 1: ip addr:10.35.64.85 Gateway: 10.35.64.254 > Host 2: ip addr:10.35.97.36 Gateway: 10.35.97.254 > > After start, ip appear in UI. > After first migration from host2->host1, ip appear after about 15 minutes > After second migration from host1->host2: ip not appear, also after waiting > 5 minutes > > Also still have exceptions in Install.rhev-agent-service: > Dummy-3::ERROR::2014-01-02 > 15:48:31,905::ovirtagentlogic::148::root::AgentLogicBase::doWork > Traceback (most recent call last): > File "OVirtAgentLogic.pyc", line 130, in doWork > File "VirtIoChannel.pyc", line 198, in write > File "VirtIoChannel.pyc", line 145, in write > File "WinFile.pyc", line 66, in write > error: (554, 'GetOverlappedResult', 'Used to indicate that an operation > cannot continue without blocking for I/O.') > Dummy-2::INFO::2014-01-02 15:48:33,375::ovirtagentlogic::169::root::Received > an external command: refresh... > Dummy-2::ERROR::2014-01-02 15:48:33,390::guestagentwin32::76::root::Error > retrieving network interfaces. > Traceback (most recent call last): > File "GuestAgentWin32.pyc", line 54, in GetNetworkInterfaces > File "win32com\client\__init__.pyc", line 95, in Dispatch > File "win32com\client\dynamic.pyc", line 108, in > _GetGoodDispatchAndUserName > File "win32com\client\dynamic.pyc", line 85, in _GetGoodDispatch > com_error: (-2147221008, 'CoInitialize has not been called.', None, None) > Dummy-2::INFO::2014-01-02 15:49:37,046::ovirtagentlogic::169::root::Received > an external command: lock-screen... > Dummy-2::INFO::2014-01-02 15:49:37,092::ovirtagentlogic::169::root::Received > an external command: lock-screen... > Dummy-2::INFO::2014-01-02 15:49:37,108::ovirtagentlogic::169::root::Received > an external command: lock-screen... > Dummy-2::INFO::2014-01-02 15:49:37,108::ovirtagentlogic::169::root::Received > an external command: lock-screen... > Dummy-2::INFO::2014-01-02 15:49:37,108::ovirtagentlogic::169::root::Received > an external command: lock-screen...
(In reply to Artyom from comment #14) > Checked on is29, RHEV-toolsSetup_3.3_9.iso > Host 1: ip addr:10.35.64.85 Gateway: 10.35.64.254 > Host 2: ip addr:10.35.97.36 Gateway: 10.35.97.254 > Looks they are in the different subnets .Could you reproduce it on 2 hosts in the same subnet ? Does it happen on Linux guest as well ? Mike
(In reply to Mike Cao from comment #19) > (In reply to Artyom from comment #14) > > Checked on is29, RHEV-toolsSetup_3.3_9.iso > > Host 1: ip addr:10.35.64.85 Gateway: 10.35.64.254 > > Host 2: ip addr:10.35.97.36 Gateway: 10.35.97.254 > > > Looks they are in the different subnets .Could you reproduce it on 2 hosts > in the same subnet ? Does it happen on Linux guest as well ? > > Mike How is this relevant? The problem is that the virtio-serial driver failing for whatever reason: > error: (554, 'GetOverlappedResult', 'Used to indicate that an operation > cannot continue without blocking for I/O.') Since the virtio-serial driver is not working, the communication with the hypervisor is blocked and therefore the reporting of the IPs is no longer working. The report says 'blank guestIPs' however there is way more not reported than just the IPs and that it completely unrelated.
(In reply to Vinzenz Feenstra [evilissimo] from comment #20) > (In reply to Mike Cao from comment #19) > > (In reply to Artyom from comment #14) > > > Checked on is29, RHEV-toolsSetup_3.3_9.iso > > > Host 1: ip addr:10.35.64.85 Gateway: 10.35.64.254 > > > Host 2: ip addr:10.35.97.36 Gateway: 10.35.97.254 > > > > > Looks they are in the different subnets .Could you reproduce it on 2 hosts > > in the same subnet ? Does it happen on Linux guest as well ? > > > > Mike > > How is this relevant? The problem is that the virtio-serial driver failing > for whatever reason: > > error: (554, 'GetOverlappedResult', 'Used to indicate that an operation > > cannot continue without blocking for I/O.') > > Since the virtio-serial driver is not working, the communication with the > hypervisor is blocked and therefore the reporting of the IPs is no longer > working. > The report says 'blank guestIPs' however there is way more not reported than > just the IPs and that it completely unrelated. Not sure I understand this issue correctly .I assume the guest is using DHCP to get the IP .When guest has the same subnet ip as host A when it is located in host A. while the the guest still has the hostA' subnet IP when it migrate to hostB .the dhcp server may assign a new ip to the guest which may need plenty of time . As the reporter did not see any abnormal for applications ,login user shown in RHEVM ,What's why I think it may related to network Mike
(In reply to Mike Cao from comment #21) > Not sure I understand this issue correctly .I assume the guest is using DHCP > to get the IP .When guest has the same subnet ip as host A when it is > located in host A. while the the guest still has the hostA' subnet IP when > it migrate to hostB .the dhcp server may assign a new ip to the guest which > may need plenty of time . > > As the reporter did not see any abnormal for applications ,login user shown > in RHEVM ,What's why I think it may related to network The application list is stored and kept on the database in RHEVM that's why it is not empty in the webadmin interface. I have originally reproduced this issue myself and this seemed to have been a regression in the drivers as updating the drivers fixed for the first migration, however in Comment c#14 it showed that it still now still doing the same after the second migration. It looks to me like that the driver state gets affected during migrations and the fix applied to fix the issue did only resolve it partially. You can ask to get this reproduced in the same subnet, but this issue will persist. @Artyom: For the sake of problem exclusion please retest as Mike requested, by using two hypervisors in the same subnet. thank you.
(In reply to Vinzenz Feenstra [evilissimo] from comment #13) > Works with drivers fix in BZ#1037547 (is26 contains them) Do you mean comment #12 ? But I don't think we have a fix for BZ#1037547 currently and it will be fixed soon ... Mike
(In reply to Mike Cao from comment #23) > (In reply to Vinzenz Feenstra [evilissimo] from comment #13) > > Works with drivers fix in BZ#1037547 (is26 contains them) > > Do you mean comment #12 ? But I don't think we have a fix for BZ#1037547 > currently and it will be fixed soon ... > > Mike Rechecked comment #0 "of different guests: * RHEL guest seems immune to blank guest IPs problem but sudden locks do sometimes occur * windows 2008 R2 doesn't re-fill guestIPs ever, screensaver locks sometime occur * windows 7 don't refill guestIPs but I didn't notice desktop locks * windows xp refills guestIPs after few minutes " BZ#1037547 is a windows server 2012 virtio-serial specified bug ,it can not affect other platforms
Checked on hosts: Host 1 10.35.109.15 Host 2 10.35.109.14 Same result for windows 7 x64, same error message in vdsm.log and Install.rhev-agent-service log Vm with RHEL6.5 always show ip without any error messages in logs
A patch was posted upstream: http://gerrit.ovirt.org/23216
*** Bug 1050993 has been marked as a duplicate of this bug. ***
The bug is not directly connected to migration. A simpler scenario would be to stop vdsm, wait ~10 second (give the agent a change to write something to the virtio-serial port) and then start vdsm.
Merged to u/s ovirt-guest-agent master as http://gerrit.ovirt.org/gitweb?p=ovirt-guest-agent.git;a=commit;h=17d61aa3be1f85b70f72bb2756f8b3f046db3de8
Verified on rhevm is32.2: 1. Run a win 7x64 VM, install rhev-guest-tools-iso-3.3-9, See the VM ip iddress in UI. Stop vdsmd service, wait a few min, and start vdsmd service. See that the ip address is indeed not seen in UI. 2. Install rhev-guest-tools-iso-3.3-10, Stop vdsmd service, wait a few min, and start vdsmd service. See that the ip address is SEEN in UI.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0075.html