Bug 875486
Summary: | 3.2 - SuperVdsm is not functional after processing an exception resulted in delete network operation. | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Meni Yakove <myakove> | ||||
Component: | vdsm | Assignee: | Yaniv Bronhaim <ybronhei> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Meni Yakove <myakove> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | urgent | ||||||
Version: | 3.2.0 | CC: | bazulay, chetan, cpelland, dornelas, dyasny, hateya, iheim, lpeer, mavital, mburman, ybronhei, ykaul | ||||
Target Milestone: | --- | Keywords: | ZStream | ||||
Target Release: | 3.2.0 | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | infra | ||||||
Fixed In Version: | vdsm-4.10.2-1.0.el6 | Doc Type: | Bug Fix | ||||
Doc Text: |
SuperVdsm (svdsm) was not functional after processing an exception which caused a network delete operation. Consequently, storage and network actions which required svdsm permissions failed. With this update, when svdsm recognizes that the vdsm pid does not exist, it sends a SIGKILL signal and terminates itself. The exception which caused svdsm to malfunction no longer occurs.
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | Type: | Bug | |||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 884722 | ||||||
Attachments: |
|
Description
Meni Yakove
2012-11-11 15:40:06 UTC
Created attachment 642901 [details]
vdsm.log
We missed the panic call. It sits between the delNetwork error and the supervdsm AttributeError:

Thread-208741::ERROR::2012-11-11 15:00:04,171::sdc::150::Storage.StorageDomainCache::(_findDomain) Error while looking for domain `46e8e180-16e8-4881-b867-f391d35d5cd3`
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 145, in _findDomain
    return mod.findDomain(sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 1095, in findDomain
    return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/blockSD.py", line 1069, in findDomainPath
    if _isSD(vg):
  File "/usr/share/vdsm/storage/blockSD.py", line 1092, in _isSD
    return STORAGE_DOMAIN_TAG in vg.tags
  File "/usr/share/vdsm/storage/lvm.py", line 68, in __getattr__
    raise AttributeError("Failed reload: %s" % self.name)
AttributeError: Failed reload: 46e8e180-16e8-4881-b867-f391d35d5cd3
Thread-208741::ERROR::2012-11-11 15:00:04,175::misc::184::Storage.Misc::(panic) Panic: Unrecoverable errors during SPM stop process.
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sp.py", line 397, in stopSpm
    self.masterDomain.releaseClusterLock()
  File "/usr/share/vdsm/storage/sdc.py", line 49, in __getattr__
    return getattr(self.getRealDomain(), attrName)
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 121, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 152, in _findDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('46e8e180-16e8-4881-b867-f391d35d5cd3',)
MainThread::INFO::2012-11-11 15:00:04,247::vdsm::70::vds::(run) I am the actual vdsm 4.9-41.0

This shows that misc.panic was called, yet the old svdsm pid somehow still existed. In that case supervdsm should die and delete its pid, and it didn't. We still need a reproduction to be sure.
patch upstream: http://gerrit.ovirt.org/#/c/9196/

Verifies that svdsm will kill itself when vdsm dies.

Verified on vdsm-4.10.2-1.0.el6.x86_64

3.2 has been released
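
For readers who want a concrete picture of the behaviour described in the Doc Text (svdsm notices that the vdsm pid no longer exists, then sends itself SIGKILL), here is a minimal Python sketch. It is not the code from the upstream patch. The names `pid_exists` and `check_parent`, the `pid_files` argument, and the injectable `kill` parameter are all hypothetical and exist only for illustration. Only standard library calls are used.

```python
import errno
import os
import signal


def pid_exists(pid):
    """Return True if a process with this pid currently exists."""
    if pid <= 0:
        return False
    try:
        # Signal 0 delivers nothing; it only performs the existence
        # and permission checks.
        os.kill(pid, 0)
    except OSError as e:
        # ESRCH: no such process.
        # EPERM: the process exists but belongs to another user.
        return e.errno == errno.EPERM
    return True


def check_parent(vdsm_pid, pid_files=(), kill=os.kill):
    """Hypothetical watchdog step, not the actual vdsm implementation.

    If the vdsm process is gone, remove the given stale pid files and
    send SIGKILL to the current process. Returns False when vdsm is
    still alive. The True return is only reachable when `kill` is
    replaced by a stub, since a real SIGKILL ends the process.
    """
    if pid_exists(vdsm_pid):
        return False
    for path in pid_files:
        try:
            os.unlink(path)
        except OSError as e:
            # A pid file that is already missing is fine.
            if e.errno != errno.ENOENT:
                raise
    kill(os.getpid(), signal.SIGKILL)
    return True
```

In a real daemon, a check like this would presumably run periodically from a background thread. The pid files are removed before the signal is sent because SIGKILL cannot be caught, so no cleanup can run afterwards.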