Bug 875486 - 3.2 - SuperVdsm is not functional after proccesing an exception resulted in delete network operation.
3.2 - SuperVdsm is not functional after proccesing an exception resulted in d...
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm (Show other bugs)
3.2.0
x86_64 Linux
urgent Severity high
: ---
: 3.2.0
Assigned To: Yaniv Bronhaim
Meni Yakove
infra
: ZStream
Depends On:
Blocks: 884722
  Show dependency treegraph
 
Reported: 2012-11-11 10:40 EST by Meni Yakove
Modified: 2016-02-10 14:32 EST (History)
12 users (show)

See Also:
Fixed In Version: vdsm-4.10.2-1.0.el6
Doc Type: Bug Fix
Doc Text:
SuperVdsm (svdsm) was not functional after processing an exception which caused a network delete operation. Consequently, storage and network actions which required svdsm permissions failed. With this update, when svdsm recognizes that the vdsm pid does not exist, it sends a SIGKILL signal and terminates itself. The exception which causes the svdsm to malfunction does not occur.
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
vdsm.log (10.17 MB, text/x-log)
2012-11-11 10:43 EST, Meni Yakove
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 341473 None None None Never

  None (edit)
Description Meni Yakove 2012-11-11 10:40:06 EST
Description of problem:
This is the error that trigger supervdsm faulty stat:
MainProcess|Thread-205192::ERROR::2012-11-11 13:45:57,309::configNetwork::1373::setupNetworks::(setupNetworks) (23, "delNetwork: ['eth2', 'eth3', 'vnet1'] are not all nics enslaved to bond0")
Traceback (most recent call last):
  File "/usr/share/vdsm/configNetwork.py", line 1315, in setupNetworks
    implicitBonding=False)
  File "/usr/share/vdsm/configNetwork.py", line 1068, in delNetwork
    (nics, bonding))
ConfigNetworkError: (23, "delNetwork: ['eth2', 'eth3', 'vnet1'] are not all nics enslaved to bond0")
MainProcess|Thread-205192::ERROR::2012-11-11 13:45:57,318::supervdsmServer::68::SuperVdsm.ServerCallback::(wrapper) Error in setupNetworks
Traceback (most recent call last):
  File "/usr/share/vdsm/supervdsmServer.py", line 66, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/vdsm/supervdsmServer.py", line 118, in setupNetworks
    return configNetwork.setupNetworks(networks, bondings, **options)
  File "/usr/share/vdsm/configNetwork.py", line 1315, in setupNetworks
    implicitBonding=False)
  File "/usr/share/vdsm/configNetwork.py", line 1068, in delNetwork
    (nics, bonding))
ConfigNetworkError: (23, "delNetwork: ['eth2', 'eth3', 'vnet1'] are not all nics enslaved to bond0")
Thread-205192::ERROR::2012-11-11 13:45:57,319::API::1138::vds::(setupNetworks) delNetwork: ['eth2', 'eth3', 'vnet1'] are not all nics enslaved to bond0
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1136, in setupNetworks
    supervdsm.getProxy().setupNetworks(networks, bondings, options)
  File "/usr/share/vdsm/supervdsm.py", line 69, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 60, in <lambda>
    **kwargs)
  File "<string>", line 2, in setupNetworks
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
    raise convert_to_error(kind, result)
ConfigNetworkError: (23, "delNetwork: ['eth2', 'eth3', 'vnet1'] are not all nics enslaved to bond0")

This is the error when vdsm tries to communicate with supervdsm:
Thread-19::ERROR::2012-11-11 17:13:56,921::task::853::TaskManager.Task::(_setError) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2435, in getStorageDomainsList
    sdCache.refreshStorage()
  File "/usr/share/vdsm/storage/misc.py", line 1063, in helper
    return sm(*args, **kwargs)
  File "/usr/share/vdsm/storage/misc.py", line 1048, in __call__
    self.__lastResult = self.__func(*args, **kwargs)
  File "/usr/share/vdsm/storage/sdc.py", line 82, in refreshStorage
    multipath.rescan()
  File "/usr/share/vdsm/storage/multipath.py", line 70, in rescan
    supervdsm.getProxy().forceIScsiScan()
  File "/usr/share/vdsm/supervdsm.py", line 69, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 59, in <lambda>
    getattr(self._supervdsmProxy._svdsm, self._funcName)(*args,
AttributeError: 'ProxyCaller' object has no attribute 'forceIScsiScan'
Thread-19::DEBUG::2012-11-11 17:13:56,922::task::872::TaskManager.Task::(_run) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::Task._run: dc1304b1-e013-4a52-b93d-4b649bc23ffe ('00000000-0000-0000-0000-000000000000', None, None, None) {} failed - stopping task
Thread-19::DEBUG::2012-11-11 17:13:56,922::task::1199::TaskManager.Task::(stop) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::stopping in state preparing (force False)
Thread-19::DEBUG::2012-11-11 17:13:56,923::task::978::TaskManager.Task::(_decref) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::ref 1 aborting True
Thread-19::INFO::2012-11-11 17:13:56,923::task::1157::TaskManager.Task::(prepare) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::aborting: Task is aborted: u"'ProxyCaller' object has no attribute 'forceIScsiScan'" - code 100
Thread-19::DEBUG::2012-11-11 17:13:56,923::task::1162::TaskManager.Task::(prepare) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::Prepare: aborted: 'ProxyCaller' object has no attribute 'forceIScsiScan'
Thread-19::DEBUG::2012-11-11 17:13:56,923::task::978::TaskManager.Task::(_decref) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::ref 0 aborting True
Thread-19::DEBUG::2012-11-11 17:13:56,924::task::913::TaskManager.Task::(_doAbort) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::Task._doAbort: force False
Thread-19::DEBUG::2012-11-11 17:13:56,924::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-19::DEBUG::2012-11-11 17:13:56,924::task::588::TaskManager.Task::(_updateState) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::moving from state preparing -> state aborting
Thread-19::DEBUG::2012-11-11 17:13:56,924::task::537::TaskManager.Task::(__state_aborting) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::_aborting: recover policy none
Thread-19::DEBUG::2012-11-11 17:13:56,925::task::588::TaskManager.Task::(_updateState) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::moving from state aborting -> state failed
Thread-19::DEBUG::2012-11-11 17:13:56,925::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-19::DEBUG::2012-11-11 17:13:56,925::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-19::ERROR::2012-11-11 17:13:56,925::dispatcher::69::Storage.Dispatcher.Protect::(run) 'ProxyCaller' object has no attribute 'forceIScsiScan'
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/dispatcher.py", line 61, in run
    result = ctask.prepare(self.func, *args, **kwargs)
  File "/usr/share/vdsm/storage/task.py", line 1164, in prepare
    raise self.error
AttributeError: 'ProxyCaller' object has no attribute 'forceIScsiScan'



Current state and implications:

Each action that requires sudo (supervdsm) permissions fails. (all storage/network actions).

How to reproduce:

make network operation trow exaction.

mitigation:
kill supervdsm service.(restart vdsm doesn't help!!!)
Comment 2 Meni Yakove 2012-11-11 10:43:55 EST
Created attachment 642901 [details]
vdsm.log
Comment 3 Yaniv Bronhaim 2012-11-11 12:19:22 EST
We missed the panic call, it is between the delNetwork error to supervdsm attributeError:

Thread-208741::ERROR::2012-11-11 15:00:04,171::sdc::150::Storage.StorageDomainCache::(_findDomain) Error while looking for domain `46e8e180-16e8-4881-b867-f391d35d5cd3`
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 145, in _findDomain
    return mod.findDomain(sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 1095, in findDomain
    return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/blockSD.py", line 1069, in findDomainPath
    if _isSD(vg):
  File "/usr/share/vdsm/storage/blockSD.py", line 1092, in _isSD
    return STORAGE_DOMAIN_TAG in vg.tags
  File "/usr/share/vdsm/storage/lvm.py", line 68, in __getattr__
    raise AttributeError("Failed reload: %s" % self.name)
AttributeError: Failed reload: 46e8e180-16e8-4881-b867-f391d35d5cd3
Thread-208741::ERROR::2012-11-11 15:00:04,175::misc::184::Storage.Misc::(panic) Panic: Unrecoverable errors during SPM stop process.
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sp.py", line 397, in stopSpm
    self.masterDomain.releaseClusterLock()
  File "/usr/share/vdsm/storage/sdc.py", line 49, in __getattr__
    return getattr(self.getRealDomain(), attrName)
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 121, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 152, in _findDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('46e8e180-16e8-4881-b867-f391d35d5cd3',)
MainThread::INFO::2012-11-11 15:00:04,247::vdsm::70::vds::(run) I am the actual vdsm 4.9-41.0

This shows we called misc.panic and somehow old svdsm pid was still exist.
supervdsm should die and delete its pid in such case, and it didn't.

We still need reproduction for that to be sure.
Comment 4 Yaniv Bronhaim 2012-11-12 11:37:12 EST
patch upstream: http://gerrit.ovirt.org/#/c/9196/

Verifies that svdsm will kill itself when vdsm dies.
Comment 10 Meni Yakove 2012-12-26 03:40:19 EST
Verified on vdsm-4.10.2-1.0.el6.x86_64
Comment 12 Itamar Heim 2013-06-11 04:49:56 EDT
3.2 has been released
Comment 13 Itamar Heim 2013-06-11 04:50:03 EDT
3.2 has been released
Comment 14 Itamar Heim 2013-06-11 04:50:03 EDT
3.2 has been released
Comment 15 Itamar Heim 2013-06-11 04:54:26 EDT
3.2 has been released
Comment 16 Itamar Heim 2013-06-11 05:24:55 EDT
3.2 has been released

Note You need to log in before you can comment on or make changes to this bug.