Bug 693157 - [vdsm] vdsm hangs forever after - prepareForShutdown returns 'OK'
Summary: [vdsm] vdsm hangs forever after - prepareForShutdown returns 'OK'
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.1
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: rc
: ---
Assignee: Federico Simoncelli
QA Contact: Jakub Libosvar
URL:
Whiteboard:
: 693165 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-04-03 08:48 UTC by David Naori
Modified: 2011-12-06 07:11 UTC (History)
7 users (show)

Fixed In Version: vdsm-4.9-65
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-06 07:11:41 UTC
Target Upstream Version:


Attachments (Terms of Use)
vdsm log (633.87 KB, application/x-gzip)
2011-04-03 08:48 UTC, David Naori
no flags Details
vdsm log (352.70 KB, application/octet-stream)
2011-05-09 09:50 UTC, Jakub Libosvar
no flags Details
vdsm + backend logs (167 bytes, application/x-gzip)
2011-05-18 12:26 UTC, Jakub Libosvar
no flags Details
Backend vdsm logs (821.50 KB, application/x-gzip)
2011-05-18 13:18 UTC, Jakub Libosvar
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2011:1782 0 normal SHIPPED_LIVE new packages: vdsm 2011-12-06 11:55:51 UTC

Description David Naori 2011-04-03 08:48:42 UTC
Created attachment 489630 [details]
vdsm log

Description of problem:

when no vms are running, vdsm is not aware to libvirtd status.
in a situation that libvirtd is down and trying to run vm: vdsm fails to communicate with libvirtd, prepareForShutdown starts and return 'OK' but the process dose not restarts and hangs forever.

Version-Release number of selected component (if applicable):
vdsm-4.9-57.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1.stop libvirtd 
2.try to run vm
  
Actual results:
vdsm hangs forever

Expected results:
vdsm should restart

Full vdsm log attached.

Comment 2 Dan Kenigsberg 2011-04-03 21:08:41 UTC
*** Bug 693165 has been marked as a duplicate of this bug. ***

Comment 3 Jakub Libosvar 2011-05-09 09:50:57 UTC
Created attachment 497751 [details]
vdsm log

Still reproducible in vdsm-4.9-64.el6.x86_64, vdsm hangs.

Thread-170::INFO::2011-05-09 05:34:16,499::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: prepareForShutdown, Return response: {'status': {'message': 'OK', 'code': 0}}
Thread-170::ERROR::2011-05-09 05:34:16,502::vm::618::vm.Vm::(_startUnderlyingVm) vmId=`e7436a97-d9ab-41d1-bbcd-048fb103edde`::Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 588, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 906, in _run
    if self._dom.UUIDString() != self.id:
  File "/usr/share/vdsm/libvirtvm.py", line 262, in __getattr__
    attr = getattr(self._dom, name)
AttributeError: 'NoneType' object has no attribute 'UUIDString'

Thread-170::DEBUG::2011-05-09 05:34:16,505::vm::1760::vm.Vm::(setDownStatus) vmId=`e7436a97-d9ab-41d1-bbcd-048fb103edde`::Changed state to Down: 'NoneType' object has no attribute 'UUIDString'

Comment 4 Dan Kenigsberg 2011-05-09 11:22:49 UTC
Actually,

AttributeError: 'NoneType' object has no attribute 'UUIDString'

is an artifact of bug 695244.

Please recheck with vdsm-4.9-65 that fixed bug 702275.

Comment 5 Haim 2011-05-12 13:35:12 UTC
tested on vdsm-4.9-65 and bug still occurs, scenario is a bit different:

1) host (SPM) with running VMs on it
2) perform storage actions, such as deatch\attach
3) restart libvirtd; 


Thread-954::ERROR::2011-05-12 15:51:28,145::utils::526::vm.Vm::(collect) vmId=`0c5efaeb-ba63-421e-8a59-1fe0f4ee7662`::Stats function failed: <AdvancedStatsFunc
tion _highWrite at 0x1acaa08>
Traceback (most recent call last):
  File "/usr/share/vdsm/utils.py", line 523, in collect
    statsFunction()
  File "/usr/share/vdsm/utils.py", line 407, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 62, in _highWrite
    dCap, dAlloc, dPhys = self._vm._dom.blockInfo(vmDrive.path, 0)
  File "/usr/share/vdsm/libvirtvm.py", line 306, in f
    ret = attr(*args, **kwargs)
  File "/usr/share/vdsm/libvirtconnection.py", line 59, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 806, in blockInfo
    if ret is None: raise libvirtError ('virDomainGetBlockInfo() failed', dom=self)
libvirtError: cannot send data: Broken pipe
Thread-946::ERROR::2011-05-12 15:51:28,148::libvirtconnection::69::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-946::DEBUG::2011-05-12 15:51:28,149::clientIF::141::vds::(prepareForShutdown) cannot run prepareForShutdown twice
Thread-946::ERROR::2011-05-12 15:51:28,149::utils::526::vm.Vm::(collect) vmId=`074d712e-6f3d-4e1e-bffd-ab40a56c9067`::Stats function failed: <AdvancedStatsFunc
tion _highWrite at 0x1acaa08>
Traceback (most recent call last):
  File "/usr/share/vdsm/utils.py", line 523, in collect
    statsFunction()
  File "/usr/share/vdsm/utils.py", line 407, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 62, in _highWrite
    dCap, dAlloc, dPhys = self._vm._dom.blockInfo(vmDrive.path, 0)
  File "/usr/share/vdsm/libvirtvm.py", line 306, in f
    ret = attr(*args, **kwargs)
  File "/usr/share/vdsm/libvirtconnection.py", line 59, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 806, in blockInfo
    if ret is None: raise libvirtError ('virDomainGetBlockInfo() failed', dom=self)
libvirtError: cannot send data: Broken pipe
Thread-1007::ERROR::2011-05-12 15:51:28,183::libvirtconnection::69::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-1007::DEBUG::2011-05-12 15:51:28,183::clientIF::141::vds::(prepareForShutdown) cannot run prepareForShutdown twice
Thread-1007::ERROR::2011-05-12 15:51:28,184::utils::526::vm.Vm::(collect) vmId=`7ccf8808-d464-42ea-8a09-5ec4f46de527`::Stats function failed: <AdvancedStatsFun
:

Comment 6 Federico Simoncelli 2011-05-17 16:46:19 UTC
The bug didn't reproduce using a ruth test implementing the steps described in comment #5:

http://gerrit.usersys.redhat.com/419

Maybe we can get rid of the AdvancedStatsFunction exceptions stopping the AdvancedStatsThread when a libvirt broken pipe is raised.

Can you check that this is still reproducible on vdsm-4.9-66.1?

Comment 7 Ayal Baron 2011-05-18 11:31:28 UTC
(In reply to comment #6)
> The bug didn't reproduce using a ruth test implementing the steps described in
> comment #5:
> 
> http://gerrit.usersys.redhat.com/419
> 
> Maybe we can get rid of the AdvancedStatsFunction exceptions stopping the
> AdvancedStatsThread when a libvirt broken pipe is raised.
> 
> Can you check that this is still reproducible on vdsm-4.9-66.1?

Please retest with vdsm-4.9-67

Comment 8 Jakub Libosvar 2011-05-18 12:25:32 UTC
Still not good. Using steps described in description (comment 0) vdsm hangs on

Thread-1648::INFO::2011-05-18 14:03:34,128::vm::625::vm.Vm::(_startUnderlyingVm) vmId=`a4c6fec2-f16e-4092-8aab-bed9d202d2b2`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 592, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 949, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/share/vdsm/libvirtconnection.py", line 59, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1353, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: cannot send data: Broken pipe
Thread-1648::DEBUG::2011-05-18 14:03:34,139::vm::1764::vm.Vm::(setDownStatus) vmId=`a4c6fec2-f16e-4092-8aab-bed9d202d2b2`::Changed state to Down: cannot send data: Broken pipe

This is last log message appeared in log.

Comment 9 Jakub Libosvar 2011-05-18 12:26:09 UTC
Created attachment 499586 [details]
vdsm + backend logs

Vdsm and backend logs attached

Comment 11 Jakub Libosvar 2011-05-18 13:18:40 UTC
Created attachment 499598 [details]
Backend vdsm logs

My apologize, I tarred symlinks. These logs should contain info.

Comment 12 Jakub Libosvar 2011-05-26 16:12:27 UTC
After some more tests and discussion with Federico, moving this bug to verified using vdsm-4.9-70.el6.x86_64. 
There is still trouble if libvirt is stopped and not running then vdsm is not operating, doesn't respond to backend - but according bug 678084, libvirt should be able to revive itself in condition it is not started by init script but by upstart. 
Please correct or accurate my statements if I'm wrong. Still - after host installation to rhevm libvirt is started by System V initscript (bug 694026) in ic119.

Comment 13 errata-xmlrpc 2011-12-06 07:11:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1782.html


Note You need to log in before you can comment on or make changes to this bug.