Description of problem: This is the error that trigger supervdsm faulty stat: MainProcess|Thread-205192::ERROR::2012-11-11 13:45:57,309::configNetwork::1373::setupNetworks::(setupNetworks) (23, "delNetwork: ['eth2', 'eth3', 'vnet1'] are not all nics enslaved to bond0") Traceback (most recent call last): File "/usr/share/vdsm/configNetwork.py", line 1315, in setupNetworks implicitBonding=False) File "/usr/share/vdsm/configNetwork.py", line 1068, in delNetwork (nics, bonding)) ConfigNetworkError: (23, "delNetwork: ['eth2', 'eth3', 'vnet1'] are not all nics enslaved to bond0") MainProcess|Thread-205192::ERROR::2012-11-11 13:45:57,318::supervdsmServer::68::SuperVdsm.ServerCallback::(wrapper) Error in setupNetworks Traceback (most recent call last): File "/usr/share/vdsm/supervdsmServer.py", line 66, in wrapper return func(*args, **kwargs) File "/usr/share/vdsm/supervdsmServer.py", line 118, in setupNetworks return configNetwork.setupNetworks(networks, bondings, **options) File "/usr/share/vdsm/configNetwork.py", line 1315, in setupNetworks implicitBonding=False) File "/usr/share/vdsm/configNetwork.py", line 1068, in delNetwork (nics, bonding)) ConfigNetworkError: (23, "delNetwork: ['eth2', 'eth3', 'vnet1'] are not all nics enslaved to bond0") Thread-205192::ERROR::2012-11-11 13:45:57,319::API::1138::vds::(setupNetworks) delNetwork: ['eth2', 'eth3', 'vnet1'] are not all nics enslaved to bond0 Traceback (most recent call last): File "/usr/share/vdsm/API.py", line 1136, in setupNetworks supervdsm.getProxy().setupNetworks(networks, bondings, options) File "/usr/share/vdsm/supervdsm.py", line 69, in __call__ return callMethod() File "/usr/share/vdsm/supervdsm.py", line 60, in <lambda> **kwargs) File "<string>", line 2, in setupNetworks File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod raise convert_to_error(kind, result) ConfigNetworkError: (23, "delNetwork: ['eth2', 'eth3', 'vnet1'] are not all nics enslaved to bond0") This is the error when vdsm tries to communicate with supervdsm: Thread-19::ERROR::2012-11-11 17:13:56,921::task::853::TaskManager.Task::(_setError) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 861, in _run return fn(*args, **kargs) File "/usr/share/vdsm/logUtils.py", line 38, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 2435, in getStorageDomainsList sdCache.refreshStorage() File "/usr/share/vdsm/storage/misc.py", line 1063, in helper return sm(*args, **kwargs) File "/usr/share/vdsm/storage/misc.py", line 1048, in __call__ self.__lastResult = self.__func(*args, **kwargs) File "/usr/share/vdsm/storage/sdc.py", line 82, in refreshStorage multipath.rescan() File "/usr/share/vdsm/storage/multipath.py", line 70, in rescan supervdsm.getProxy().forceIScsiScan() File "/usr/share/vdsm/supervdsm.py", line 69, in __call__ return callMethod() File "/usr/share/vdsm/supervdsm.py", line 59, in <lambda> getattr(self._supervdsmProxy._svdsm, self._funcName)(*args, AttributeError: 'ProxyCaller' object has no attribute 'forceIScsiScan' Thread-19::DEBUG::2012-11-11 17:13:56,922::task::872::TaskManager.Task::(_run) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::Task._run: dc1304b1-e013-4a52-b93d-4b649bc23ffe ('00000000-0000-0000-0000-000000000000', None, None, None) {} failed - stopping task Thread-19::DEBUG::2012-11-11 17:13:56,922::task::1199::TaskManager.Task::(stop) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::stopping in state preparing (force False) Thread-19::DEBUG::2012-11-11 17:13:56,923::task::978::TaskManager.Task::(_decref) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::ref 1 aborting True Thread-19::INFO::2012-11-11 17:13:56,923::task::1157::TaskManager.Task::(prepare) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::aborting: Task is aborted: u"'ProxyCaller' object has no attribute 'forceIScsiScan'" - code 100 Thread-19::DEBUG::2012-11-11 17:13:56,923::task::1162::TaskManager.Task::(prepare) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::Prepare: aborted: 'ProxyCaller' object has no attribute 'forceIScsiScan' Thread-19::DEBUG::2012-11-11 17:13:56,923::task::978::TaskManager.Task::(_decref) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::ref 0 aborting True Thread-19::DEBUG::2012-11-11 17:13:56,924::task::913::TaskManager.Task::(_doAbort) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::Task._doAbort: force False Thread-19::DEBUG::2012-11-11 17:13:56,924::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {} Thread-19::DEBUG::2012-11-11 17:13:56,924::task::588::TaskManager.Task::(_updateState) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::moving from state preparing -> state aborting Thread-19::DEBUG::2012-11-11 17:13:56,924::task::537::TaskManager.Task::(__state_aborting) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::_aborting: recover policy none Thread-19::DEBUG::2012-11-11 17:13:56,925::task::588::TaskManager.Task::(_updateState) Task=`dc1304b1-e013-4a52-b93d-4b649bc23ffe`::moving from state aborting -> state failed Thread-19::DEBUG::2012-11-11 17:13:56,925::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {} Thread-19::DEBUG::2012-11-11 17:13:56,925::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {} Thread-19::ERROR::2012-11-11 17:13:56,925::dispatcher::69::Storage.Dispatcher.Protect::(run) 'ProxyCaller' object has no attribute 'forceIScsiScan' Traceback (most recent call last): File "/usr/share/vdsm/storage/dispatcher.py", line 61, in run result = ctask.prepare(self.func, *args, **kwargs) File "/usr/share/vdsm/storage/task.py", line 1164, in prepare raise self.error AttributeError: 'ProxyCaller' object has no attribute 'forceIScsiScan' Current state and implications: Each action that requires sudo (supervdsm) permissions fails. (all storage/network actions). How to reproduce: make network operation trow exaction. mitigation: kill supervdsm service.(restart vdsm doesn't help!!!)
Created attachment 642901 [details] vdsm.log
We missed the panic call, it is between the delNetwork error to supervdsm attributeError: Thread-208741::ERROR::2012-11-11 15:00:04,171::sdc::150::Storage.StorageDomainCache::(_findDomain) Error while looking for domain `46e8e180-16e8-4881-b867-f391d35d5cd3` Traceback (most recent call last): File "/usr/share/vdsm/storage/sdc.py", line 145, in _findDomain return mod.findDomain(sdUUID) File "/usr/share/vdsm/storage/blockSD.py", line 1095, in findDomain return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID)) File "/usr/share/vdsm/storage/blockSD.py", line 1069, in findDomainPath if _isSD(vg): File "/usr/share/vdsm/storage/blockSD.py", line 1092, in _isSD return STORAGE_DOMAIN_TAG in vg.tags File "/usr/share/vdsm/storage/lvm.py", line 68, in __getattr__ raise AttributeError("Failed reload: %s" % self.name) AttributeError: Failed reload: 46e8e180-16e8-4881-b867-f391d35d5cd3 Thread-208741::ERROR::2012-11-11 15:00:04,175::misc::184::Storage.Misc::(panic) Panic: Unrecoverable errors during SPM stop process. Traceback (most recent call last): File "/usr/share/vdsm/storage/sp.py", line 397, in stopSpm self.masterDomain.releaseClusterLock() File "/usr/share/vdsm/storage/sdc.py", line 49, in __getattr__ return getattr(self.getRealDomain(), attrName) File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain return self._cache._realProduce(self._sdUUID) File "/usr/share/vdsm/storage/sdc.py", line 121, in _realProduce domain = self._findDomain(sdUUID) File "/usr/share/vdsm/storage/sdc.py", line 152, in _findDomain raise se.StorageDomainDoesNotExist(sdUUID) StorageDomainDoesNotExist: Storage domain does not exist: ('46e8e180-16e8-4881-b867-f391d35d5cd3',) MainThread::INFO::2012-11-11 15:00:04,247::vdsm::70::vds::(run) I am the actual vdsm 4.9-41.0 This shows we called misc.panic and somehow old svdsm pid was still exist. supervdsm should die and delete its pid in such case, and it didn't. We still need reproduction for that to be sure.
patch upstream: http://gerrit.ovirt.org/#/c/9196/ Verifies that svdsm will kill itself when vdsm dies.
Verified on vdsm-4.10.2-1.0.el6.x86_64
3.2 has been released