Created attachment 711441 [details]
logs

Description of problem:
To verify bug 910013 I deleted 150 VMs with wipe=true. At some point I started getting "Exception: No free file handlers in pool", and then vdsm restarted and could not recover, failing with "AttributeError: 'list' object has no attribute 'split'". I had to restart vdsm manually.

Version-Release number of selected component (if applicable):
sf10 4.10-11.0

How reproducible:

Steps to Reproduce:
1. Create a two-host iSCSI pool with 3 domains, 100G each.
2. Create a wipe=true template (1GB disk).
3. Create 3 pools from the template, with 50 VMs in each pool.
4. Detach and remove the VMs from each pool (I detached -> removed one pool at a time, without waiting for the delete to finish on the previous pool).

Actual results:
We get "Exception: No free file handlers in pool", and vdsm suddenly restarts and fails to recover.

Expected results:
vdsm should recover.

Additional info:
logs
I also reproduced this issue with a much simpler scenario:
1. Run two VMs with two disks each, both thin provisioned.
2. Live migrate the disks on both VMs twice (move disks -> wait for the move to finish -> move again).

vdsm crashed:

MainThread::ERROR::2013-03-18 21:55:51,991::clientIF::263::vds::(_initIRS) Error initializing IRS
Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 261, in _initIRS
    self.irs = Dispatcher(HSM())
  File "/usr/share/vdsm/storage/hsm.py", line 344, in __init__
    sp.StoragePool.cleanupMasterMount()
  File "/usr/share/vdsm/storage/sp.py", line 356, in cleanupMasterMount
    blockSD.BlockStorageDomain.doUnmountMaster(master)
  File "/usr/share/vdsm/storage/blockSD.py", line 1128, in doUnmountMaster
    pids = fuser(masterMount.fs_file, mountPoint=True)
  File "/usr/share/vdsm/storage/fuser.py", line 34, in fuser
    return [int(pid) for pid in out.split()]
AttributeError: 'list' object has no attribute 'split'

We also got an AttributeError from the VM channel:

Thread-15::ERROR::2013-03-18 21:55:52,999::guestIF::103::vm.Vm::(__init__) vmId=`8df501ee-12eb-4f21-b709-0a44b2d33051`::Failed to prepare vmchannel
Traceback (most recent call last):
  File "/usr/share/vdsm/guestIF.py", line 101, in __init__
    self._prepare_socket()
  File "/usr/share/vdsm/guestIF.py", line 113, in _prepare_socket
    supervdsm.getProxy().prepareVmChannel(self._socketName)
  File "/usr/share/vdsm/supervdsm.py", line 76, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 66, in <lambda>
    getattr(self._supervdsmProxy._svdsm, self._funcName)(*args,
AttributeError: 'ProxyCaller' object has no attribute 'prepareVmChannel'

clientIFinit::ERROR::2013-03-18 21:55:55,263::clientIF::409::vds::(_recoverExistingVms) Vm's recovery failed
Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 395, in _recoverExistingVms
    not self.irs.getConnectedStoragePoolsList()['poollist']:
AttributeError: 'NoneType' object has no attribute 'getConnectedStoragePoolsList'
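The first traceback boils down to treating a list as a string. A minimal standalone illustration of the failure mode (the list-of-lines shape of `out` is an assumption inferred from the traceback, not a reproduction of vdsm's actual command wrapper):

```python
# Hypothetical stdout shape: the command runner hands back stdout as a
# list of lines, but fuser.py treated it as one big string.
out = ["1234 5678"]  # assumed list-of-lines shape

try:
    [int(pid) for pid in out.split()]  # the buggy pattern from fuser.py
except AttributeError as e:
    print(e)  # 'list' object has no attribute 'split'

# What the code presumably intended: split each line, then convert.
pids = [int(pid) for line in out for pid in line.split()]
print(pids)  # [1234, 5678]
```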
Created attachment 712225 [details]
logs
Goodness. storage.fuser.fuser() has never worked. When solving this bug, please write a unit test for the function.
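In that spirit, here is a sketch of what the fix plus the requested unit test could look like. `parse_fuser_output` is a hypothetical helper name, and the list-of-lines input shape is assumed from the traceback above; this is not vdsm's actual API.

```python
import unittest


def parse_fuser_output(lines):
    """Collect PIDs from fuser(1) stdout given as a list of lines.

    Hypothetical replacement for the body of storage.fuser.fuser():
    the original called out.split() on a value that was already a
    list of lines, hence the AttributeError.
    """
    pids = []
    for line in lines:
        pids.extend(int(tok) for tok in line.split())
    return pids


class ParseFuserOutputTest(unittest.TestCase):
    def test_single_line(self):
        self.assertEqual(parse_fuser_output(["1234 5678"]), [1234, 5678])

    def test_multiple_lines(self):
        self.assertEqual(parse_fuser_output(["12 34", "56"]), [12, 34, 56])

    def test_empty_output(self):
        self.assertEqual(parse_fuser_output([]), [])
```

Run with `python -m unittest <module>` so the test class is picked up by the normal test runner.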
This bug is currently attached to errata RHBA-2012:14332. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise, to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.
Verified on vdsm-4.10.2-14.0.el6ev.x86_64.

vdsm did not crash. I also tested a storage issue in which vdsm had to restart, and it was able to recover without the split issue.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0886.html