Bug 990498 - SuperVdsm not restart if deleting /var/run/vdsm/svdsm.sock file
SuperVdsm not restart if deleting /var/run/vdsm/svdsm.sock file
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm (Show other bugs)
3.3.0
x86_64 Linux
unspecified Severity medium
: ---
: 3.3.0
Assigned To: Yaniv Bronhaim
vvyazmin@redhat.com
infra
: Triaged
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2013-07-31 06:07 EDT by vvyazmin@redhat.com
Modified: 2016-02-10 14:06 EST (History)
7 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-08-01 09:25:42 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
## Logs rhevm, vdsm, libvirt, thread dump, superVdsm (2.77 MB, application/x-gzip)
2013-07-31 06:07 EDT, vvyazmin@redhat.com
no flags Details

  None (edit)
Description vvyazmin@redhat.com 2013-07-31 06:07:57 EDT
Created attachment 781035 [details]
## Logs rhevm, vdsm, libvirt, thread dump, superVdsm

Description of problem:
SuperVdsm not restart if deleting /var/run/vdsm/svdsm.sock file

Version-Release number of selected component (if applicable):
RHEVM 3.3 - IS7 environment:

RHEVM: rhevm-3.3.0-0.11.master.el6ev.noarch
VDSM: vdsm-4.12.0-rc3.12.git139ec2f.el6ev.x86_64
LIBVIRT: libvirt-0.10.2-18.el6_4.9.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.355.el6_4.5.x86_64
SANLOCK: sanlock-2.6-2.el6.x86_64
PythonSDK: rhevm-sdk-python-3.3.0.7-1.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
Create iSCSI DC with 2 hosts
Validate that vmds and supervdsm daemon are up and running
Verify when you run command with supervdsm execution all work OK (in my case was forceScsiScan command)
Delete /var/run/vdsm/svdsm.sock file on one of host
Run again same command with supervdsm execution

Actual results:
Action failed.

Expected results:
Supervdsm daemon restart, and execution command succeeds.

Impact on user:
Failed run command which depend on supervdsm execution

Workaround:
Manually restart supervdsm daemon, logout and login from RHEVM, and run again same command with supervdsm execution

Additional info:

UI:
Could not retrieve LUNs, please check your storage.

/var/log/ovirt-engine/engine.log
2013-07-31 12:29:21,548 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (ajp-/127.0.0.1:8702-9) START, GetDeviceListVDSCommand(HostName = tigris03.scl.lab.tlv.redhat.com, HostId = c9ce4f60-4d58-453d-9588-e5a7b5beb6fd, storageType=ISCSI), log id: 71461cc0
2013-07-31 12:29:21,617 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (ajp-/127.0.0.1:8702-9) Failed in GetDeviceListVDS method
2013-07-31 12:29:21,618 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (ajp-/127.0.0.1:8702-9) Error code BlockDeviceActionError and error message VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: ()
2013-07-31 12:29:21,618 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (ajp-/127.0.0.1:8702-9) Command org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand return value 
 
LUNListReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=600, mMessage=Error block device action: ()]]

2013-07-31 12:29:21,618 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (ajp-/127.0.0.1:8702-9) HostName = tigris03.scl.lab.tlv.redhat.com
2013-07-31 12:29:21,618 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (ajp-/127.0.0.1:8702-9) Command GetDeviceListVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: ()
2013-07-31 12:29:21,618 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (ajp-/127.0.0.1:8702-9) FINISH, GetDeviceListVDSCommand, log id: 71461cc0
2013-07-31 12:29:21,619 ERROR [org.ovirt.engine.core.bll.storage.GetDeviceListQuery] (ajp-/127.0.0.1:8702-9) Query GetDeviceListQuery failed. Exception message is VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: () (Failed with VDSM error BlockDeviceActionError and code 600)
2013-07-31 12:29:23,597 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (ajp-/127.0.0.1:8702-8) START, GetDeviceListVDSCommand(HostName = tigris02.scl.lab.tlv.redhat.com, HostId = 3cca5914-a984-45e5-9e02-1e22f2073049, storageType=ISCSI), log id: 60c24d17


/var/log/vdsm/vdsm.log
Thread-129470::DEBUG::2013-07-31 10:27:02,432::BindingXMLRPC::177::vds::(wrapper) client [10.35.161.29]
Thread-129470::DEBUG::2013-07-31 10:27:02,433::task::579::TaskManager.Task::(_updateState) Task=`73d4d3eb-8487-4979-ba50-171053f6086c`::moving from state init -> state preparing
Thread-129470::INFO::2013-07-31 10:27:02,433::logUtils::44::dispatcher::(wrapper) Run and protect: getDeviceList(storageType=3, options={})
Thread-129470::DEBUG::2013-07-31 10:27:02,433::misc::807::SamplingMethod::(__call__) Trying to enter sampling method (storage.sdc.refreshStorage)
Thread-129470::DEBUG::2013-07-31 10:27:02,434::misc::809::SamplingMethod::(__call__) Got in to sampling method
Thread-129470::DEBUG::2013-07-31 10:27:02,434::misc::807::SamplingMethod::(__call__) Trying to enter sampling method (storage.iscsi.rescan)
Thread-129470::DEBUG::2013-07-31 10:27:02,434::misc::809::SamplingMethod::(__call__) Got in to sampling method
Thread-129470::DEBUG::2013-07-31 10:27:02,435::iscsiadm::91::Storage.Misc.excCmd::(_runCmd) '/usr/bin/sudo -n /sbin/iscsiadm -m session -R' (cwd None)
Thread-129470::DEBUG::2013-07-31 10:27:02,466::iscsiadm::91::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-129470::DEBUG::2013-07-31 10:27:02,467::misc::817::SamplingMethod::(__call__) Returning last result
Thread-129470::ERROR::2013-07-31 10:27:02,467::task::850::TaskManager.Task::(_setError) Task=`73d4d3eb-8487-4979-ba50-171053f6086c`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1886, in getDeviceList
    devices = self._getDeviceList(storageType)
  File "/usr/share/vdsm/storage/hsm.py", line 1890, in _getDeviceList
    sdCache.refreshStorage()
  File "/usr/share/vdsm/storage/misc.py", line 826, in helper
    return sm(*args, **kwargs)
  File "/usr/share/vdsm/storage/misc.py", line 811, in __call__
    self.__lastResult = self.__func(*args, **kwargs)
  File "/usr/share/vdsm/storage/sdc.py", line 83, in refreshStorage
    multipath.rescan()
  File "/usr/share/vdsm/storage/multipath.py", line 108, in rescan
    supervdsm.getProxy().forceScsiScan()
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in forceScsiScan
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 722, in _callmethod
    self._connect()
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 709, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/usr/lib64/python2.6/multiprocessing/connection.py", line 143, in Client
    c = SocketClient(address)
  File "/usr/lib64/python2.6/multiprocessing/connection.py", line 263, in SocketClient
    s.connect(address)
  File "<string>", line 1, in connect
error: [Errno 2] No such file or directory
Thread-129470::DEBUG::2013-07-31 10:27:02,470::task::869::TaskManager.Task::(_run) Task=`73d4d3eb-8487-4979-ba50-171053f6086c`::Task._run: 73d4d3eb-8487-4979-ba50-171053f6086c (3,) {} failed - stopping task
Thread-129470::DEBUG::2013-07-31 10:27:02,470::task::1194::TaskManager.Task::(stop) Task=`73d4d3eb-8487-4979-ba50-171053f6086c`::stopping in state preparing (force False)
Thread-129470::DEBUG::2013-07-31 10:27:02,470::task::974::TaskManager.Task::(_decref) Task=`73d4d3eb-8487-4979-ba50-171053f6086c`::ref 1 aborting True
Thread-129470::INFO::2013-07-31 10:27:02,471::task::1151::TaskManager.Task::(prepare) Task=`73d4d3eb-8487-4979-ba50-171053f6086c`::aborting: Task is aborted: u'[Errno 2] No such file or directory' - code 100
Thread-129470::DEBUG::2013-07-31 10:27:02,471::task::1156::TaskManager.Task::(prepare) Task=`73d4d3eb-8487-4979-ba50-171053f6086c`::Prepare: aborted: [Errno 2] No such file or directory
Thread-129470::DEBUG::2013-07-31 10:27:02,471::task::974::TaskManager.Task::(_decref) Task=`73d4d3eb-8487-4979-ba50-171053f6086c`::ref 0 aborting True
Thread-129470::DEBUG::2013-07-31 10:27:02,472::task::909::TaskManager.Task::(_doAbort) Task=`73d4d3eb-8487-4979-ba50-171053f6086c`::Task._doAbort: force False
Thread-129470::DEBUG::2013-07-31 10:27:02,472::resourceManager::976::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-129470::DEBUG::2013-07-31 10:27:02,472::task::579::TaskManager.Task::(_updateState) Task=`73d4d3eb-8487-4979-ba50-171053f6086c`::moving from state preparing -> state aborting
Thread-129470::DEBUG::2013-07-31 10:27:02,472::task::534::TaskManager.Task::(__state_aborting) Task=`73d4d3eb-8487-4979-ba50-171053f6086c`::_aborting: recover policy none
Thread-129470::DEBUG::2013-07-31 10:27:02,473::task::579::TaskManager.Task::(_updateState) Task=`73d4d3eb-8487-4979-ba50-171053f6086c`::moving from state aborting -> state failed
Thread-129470::DEBUG::2013-07-31 10:27:02,473::resourceManager::939::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-129470::DEBUG::2013-07-31 10:27:02,473::resourceManager::976::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-129470::ERROR::2013-07-31 10:27:02,473::dispatcher::70::Storage.Dispatcher.Protect::(run) [Errno 2] No such file or directory
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/dispatcher.py", line 62, in run
    result = ctask.prepare(self.func, *args, **kwargs)
  File "/usr/share/vdsm/storage/task.py", line 1159, in prepare
    raise self.error
error: [Errno 2] No such file or directory
Comment 1 Barak 2013-08-01 09:25:42 EDT
This is a problem in python multiprocessing library (it validates the existence of the socket each communication).

Usually in linux deletion of a LDS  does not effect the processes that holds the inode (like in crubRPC).

However this is not an interesting bug,
No one should delete manually the svdsm.sock file under /var/run

It is the same like deleting the vdsm py pyo and pyc files .... vdsm will not work.

Hence moving it to CLOSE NOTABUG

Note You need to log in before you can comment on or make changes to this bug.