Bug 659351 - [vdsm] [service] vdsm doesn't attempt to restart as required when connection to libvirt is broken
Summary: [vdsm] [service] vdsm doesn't attempt to restart as required when connection to libvirt is broken
Keywords:
Status: CLOSED DUPLICATE of bug 591506
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Dan Kenigsberg
QA Contact: yeylon@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-12-02 15:47 UTC by Haim
Modified: 2016-04-18 06:35 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-12-06 12:09:26 UTC
Target Upstream Version:


Attachments
vdsm log. (500.98 KB, application/x-gzip), 2010-12-02 15:52 UTC, Haim

Description Haim 2010-12-02 15:47:45 UTC
Description of problem:

The vdsm service enters a deadlock after the libvirt service was in a deadlock and got restarted.

vdsm fails to respond to both getVdsCaps and getVdsStats, as well as basic commands such as list table.

The scenario was multiple concurrent migrations, which caused libvirt to enter a deadlock; after I manually restarted libvirt, vdsm entered a deadlock.
Attached gdb output.

When examining the vdsm log, I see the following output:

Thread-10093::DEBUG::2010-12-02 15:51:23,531::libvirtvm::892::vds.vmlog.d2bd1c7a-d0b9-411c-b47f-56deae673db3::(destroy) destroy Called
Thread-9976::ERROR::2010-12-02 15:51:23,532::clientIF::48::vds::(wrapper) Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 44, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/clientIF.py", line 439, in destroy
    v.destroy()
  File "/usr/share/vdsm/libvirtvm.py", line 900, in destroy
    self._dom.destroy()
  File "/usr/share/vdsm/libvirtvm.py", line 146, in f
    raise e
libvirtError: cannot send data: Broken pipe

Thread-10093::ERROR::2010-12-02 15:51:23,536::libvirtvm::1071::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-10093::DEBUG::2010-12-02 15:51:23,537::clientIF::119::vds::(prepareForShutdown) cannot run prepareForShutdown twice
Thread-10128::DEBUG::2010-12-02 15:51:23,538::clientIF::45::vds::(wrapper) return destroy with {'status': {'message': 'Virtual machine does not exist', 'code': 1}}
Thread-10093::ERROR::2010-12-02 15:51:23,540::clientIF::48::vds::(wrapper) Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 44, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/clientIF.py", line 439, in destroy
    v.destroy()
  File "/usr/share/vdsm/libvirtvm.py", line 900, in destroy
    self._dom.destroy()
  File "/usr/share/vdsm/libvirtvm.py", line 146, in f
    raise e
libvirtError: cannot send data: Broken pipe

vdsm-4.9-28.el6.x86_64
libvirt-0.8.1-28.el6.x86_64
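
The traceback pattern above (clientIF.py's wrapper, libvirtvm.py's f) suggests vdsm wraps libvirt calls and is supposed to take itself down once the connection is gone, per the "taking vdsm down" and "cannot run prepareForShutdown twice" messages. A minimal sketch of that idea, assuming the broken pipe surfaces as VIR_ERR_SYSTEM_ERROR; this is not vdsm's actual code:

import os
import threading
import logging

import libvirt

log = logging.getLogger('vds')

_shutdown_lock = threading.Lock()
_shutdown_started = False

def prepareForShutdown():
    # Must be idempotent: several threads can hit the dead
    # connection at once, hence the "cannot run
    # prepareForShutdown twice" message in the log above.
    global _shutdown_started
    with _shutdown_lock:
        if _shutdown_started:
            log.debug('cannot run prepareForShutdown twice')
            return
        _shutdown_started = True
    log.error('connection to libvirt broken. taking vdsm down.')
    os._exit(1)  # exit hard so the service script can respawn vdsm

def wrapper(f):
    # Decorate a libvirt call: on an error that indicates a dead
    # connection, start the shutdown instead of letting every
    # caller keep failing with "Broken pipe".
    def wrapped(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except libvirt.libvirtError as e:
            # Assumption: "cannot send data: Broken pipe" is
            # reported as VIR_ERR_SYSTEM_ERROR.
            if e.get_error_code() == libvirt.VIR_ERR_SYSTEM_ERROR:
                prepareForShutdown()
            raise
    return wrapped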

Comment 1 Haim 2010-12-02 15:50:23 UTC
Please note that libvirt was fully responsive at the time vdsm entered the deadlock.

Comment 2 Haim 2010-12-02 15:52:26 UTC
Created attachment 464284 [details]
vdsm log.

Comment 5 Haim 2010-12-06 09:34:49 UTC
Dan, the real problem is that vdsm doesn't try to kill itself when the connection to libvirt is broken.
This is a regression.


[root@nott-vds1 ~]# ps -o etime `pgrep libvirt`
    ELAPSED
      07:46


Thread-3170::ERROR::2010-12-06 11:17:44,503::utils::424::vds.vmlog.d2bd1c7a-d0b9-411c-b47f-56deae673db3::(run) Traceback (most recent call last):
  File "/usr/share/vdsm/utils.py", line 416, in run
    self._samples.append(self.sample())
  File "/usr/share/vdsm/vm.py", line 132, in sample
    s = self.VmSampleClass(self._pid, self._ifids, self._vm)
  File "/usr/share/vdsm/libvirtvm.py", line 75, in __init__
    raise e
libvirtError: cannot send data: Broken pipe

Easy to reproduce.
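
For reference, the expected behavior as I understand it: vdsm should notice the dead connection and exit, so that its service script respawns it with a fresh connection. A rough sketch of such a liveness check (hypothetical, not vdsm code; getLibVersion() is used because it is a cheap round trip available in libvirt 0.8.x):

import os
import time

import libvirt

def libvirt_alive(conn):
    # getLibVersion() is a cheap round trip to libvirtd; on a
    # broken socket it raises libvirtError instead of hanging.
    try:
        conn.getLibVersion()
        return True
    except libvirt.libvirtError:
        return False

def monitor(conn, interval=10):
    # Poll the connection; if it dies, exit hard so the service
    # respawns vdsm rather than leaving it wedged.
    while True:
        if not libvirt_alive(conn):
            os._exit(1)
        time.sleep(interval)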

Comment 6 Dan Kenigsberg 2010-12-06 12:09:13 UTC
I believe this is a duplicate of the now-reopened bug 591506.

Comment 7 Dan Kenigsberg 2010-12-06 12:09:26 UTC

*** This bug has been marked as a duplicate of bug 591506 ***

