Bug 1006203
Summary: | no SPM failover after SPM lost connection to storage
---|---
Product: | Red Hat Enterprise Virtualization Manager
Reporter: | Aharon Canan <acanan>
Component: | vdsm
Assignee: | Yaniv Bronhaim <ybronhei>
Status: | CLOSED ERRATA
QA Contact: | Elad <ebenahar>
Severity: | urgent
Priority: | unspecified
Version: | 3.3.0
CC: | acathrow, bazulay, iheim, lpeer, lveyde, pstehlik, rhev-integ, smizrahi, ybronhei, yeylon
Target Milestone: | ---
Target Release: | 3.3.0
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: | infra
Fixed In Version: | is16
Doc Type: | Bug Fix
Doc Text: | After the Storage Pool Manager (SPM) lost connection to the storage and became non-operational, the expected behavior was for another host to take its place as the SPM, but this did not happen because SuperVdsm was not passing kwargs to fuser. This has now been fixed, so when the SPM becomes non-operational a failover takes place.
Story Points: | ---
Last Closed: | 2014-01-21 16:15:36 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
Category: | ---
oVirt Team: | Infra
Cloudforms Team: | ---
Description
Aharon Canan
2013-09-10 08:36:57 UTC
Created attachment 795900 [details]
logs
vdsm on the SPM node was restarted properly [1] but failed to start up due to a failure in unmounting (looks like a regression introduced by http://gerrit.ovirt.org/13779). This means that the engine will not start the SPM on the new host without fencing the first host.

It looks like supervdsmServer is not passing kwargs to fuser. If so, the patch should be:

```
-    def fuser(self, *args):
-        return fuser.fuser(*args)
+    def fuser(self, *args, **kwargs):
+        return fuser.fuser(*args, **kwargs)
```

[1]

```
MainThread::DEBUG::2013-09-10 11:07:28,050::vdsm::45::vds::(sigtermHandler) Received signal 15
MainThread::DEBUG::2013-09-10 11:07:28,050::clientIF::232::vds::(prepareForShutdown) cannot run prepareForShutdown twice
MainThread::INFO::2013-09-10 11:07:28,735::vdsm::101::vds::(run) (PID: 17959) I am the actual vdsm 4.12.0-92.gita04386d.el6ev camel-vdsc.qa.lab.tlv.redhat.com (2.6.32-358.el6.x86_64)
MainThread::DEBUG::2013-09-10 11:07:30,322::resourceManager::420::ResourceManager::(registerNamespace) Registering namespace 'Storage'
MainThread::DEBUG::2013-09-10 11:07:30,322::threadPool::35::Misc.ThreadPool::(__init__) Enter - numThreads: 10.0, waitTimeout: 3, maxTasks: 500.0
MainThread::WARNING::2013-09-10 11:07:30,326::fileUtils::167::Storage.fileUtils::(createdir) Dir /rhev/data-center/mnt already exists
MainThread::DEBUG::2013-09-10 11:07:30,328::sp::387::Storage.StoragePool::(cleanupMasterMount) unmounting /rhev/data-center/mnt/blockSD/3a260f93-26e5-4aeb-9854-a7ccb6fba54b/master
MainThread::DEBUG::2013-09-10 11:07:30,918::mount::226::Storage.Misc.excCmd::(_runcmd) '/usr/bin/sudo -n /bin/umount /rhev/data-center/mnt/blockSD/3a260f93-26e5-4aeb-9854-a7ccb6fba54b/master' (cwd None)
MainThread::DEBUG::2013-09-10 11:07:30,933::supervdsm::77::SuperVdsmProxy::(_connect) Trying to connect to Super Vdsm
MainThread::ERROR::2013-09-10 11:07:30,949::clientIF::260::vds::(_initIRS) Error initializing IRS
Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 258, in _initIRS
    self.irs = Dispatcher(HSM())
  File "/usr/share/vdsm/storage/hsm.py", line 346, in __init__
    sp.StoragePool.cleanupMasterMount()
  File "/usr/share/vdsm/storage/sp.py", line 389, in cleanupMasterMount
    blockSD.BlockStorageDomain.doUnmountMaster(master)
  File "/usr/share/vdsm/storage/blockSD.py", line 1181, in doUnmountMaster
    pids = svdsmp.fuser(masterMount.fs_file, mountPoint=True)
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in fuser
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
    raise convert_to_error(kind, result)
TypeError: fuser() got an unexpected keyword argument 'mountPoint'
MainThread::INFO::2013-09-10 11:07:31,015::momIF::47::MOM::(__init__) Starting up MOM
MainThread::INFO::2013-09-10 11:07:31,047::vmChannels::187::vds::(settimeout) Setting channels' timeout to 30 seconds.
clientIFinit::DEBUG::2013-09-10 11:07:31,047::libvirtconnection::124::libvirtconnection::(get) trying to connect libvirt
VM Channels Listener::INFO::2013-09-10 11:07:31,049::vmChannels::170::vds::(run) Starting VM channels listener thread
```

http://gerrit.ovirt.org/19250 - please help to verify and review your suggestion. Thanks.

Host becomes non-operational and an SPM fail-over takes place after the SPM loses its connectivity to the storage. Verified on RHEVM 3.3 is18.

This bug is currently attached to errata RHBA-2013:15291. If this change is not to be documented in the text for this errata, please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag. Otherwise, to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as "the bug doesn't present anymore".)

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug. For further details on the Cause, Consequence, Fix, Result format, please refer to: https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes Thanks in advance.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html
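For readers unfamiliar with the failure mode, here is a minimal standalone sketch of why a `*args`-only wrapper produces the `TypeError` in the traceback above. All names below are stand-ins (the real code is in vdsm's supervdsm.py and blockSD.py, and the call actually crosses a multiprocessing manager boundary); the sketch only shows the argument-forwarding behavior.

```python
def fuser(path, mountPoint=False):
    # Stand-in for vdsm's fuser helper; the real one shells out to fuser(1).
    # Returns a fake PID list just so the calls below have something to return.
    return [1234] if mountPoint else [1234, 5678]


class BrokenProxy:
    # Pre-fix behavior: only positional arguments are accepted, so any
    # caller passing a keyword (e.g. mountPoint=True) fails before the
    # wrapped function is even reached.
    def fuser(self, *args):
        return fuser(*args)


class FixedProxy:
    # Post-fix behavior: both positional and keyword arguments are
    # forwarded to the wrapped function.
    def fuser(self, *args, **kwargs):
        return fuser(*args, **kwargs)


if __name__ == "__main__":
    try:
        BrokenProxy().fuser("/rhev/data-center/mnt", mountPoint=True)
    except TypeError as e:
        # Raised at the proxy itself: unexpected keyword argument 'mountPoint'
        print("broken proxy:", e)

    print("fixed proxy:", FixedProxy().fuser("/rhev/data-center/mnt", mountPoint=True))
```

This is the whole substance of the one-line patch referenced above: the wrapper's signature must accept and forward `**kwargs`, otherwise keyword arguments used by callers such as `doUnmountMaster` are rejected at the proxy boundary.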