Bug 922515

Summary: vdsm: vdsm fails to recover after restart with 'AttributeError: 'list' object has no attribute 'split'' error
Product: Red Hat Enterprise Virtualization Manager
Reporter: Dafna Ron <dron>
Component: vdsm
Assignee: Yaniv Bronhaim <ybronhei>
Status: CLOSED ERRATA
QA Contact: Dafna Ron <dron>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 3.2.0
CC: amureini, bazulay, danken, eedri, hateya, iheim, lpeer, sgrinber, ybronhei, ykaul, zdover
Target Milestone: ---
Keywords: Regression
Target Release: 3.2.0
Hardware: x86_64
OS: Linux
Whiteboard: infra
Fixed In Version: vdsm-4.10.2-13.0.el6ev
Doc Type: Bug Fix
Doc Text:
Previously, VDSM failed to recover after restarts, and reported an error "AttributeError: 'list' object has no attribute 'split'". The function storage.fuser.fuser() was patched, and VDSM now recovers as expected after restarts.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-06-10 20:45:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  logs (flags: none)
  logs (flags: none)

Description Dafna Ron 2013-03-17 15:33:08 UTC
Created attachment 711441 [details]
logs

Description of problem:

to verify bug 910013 I deleted 150 VMs with wipe=true.
at some point I started getting "Exception: No free file handlers in pool" and then vdsm restarted and could not recover with "AttributeError: 'list' object has no attribute 'split'"

I had to manually restart vdsm

Version-Release number of selected component (if applicable):

sf10
4.10-11.0 

How reproducible:


Steps to Reproduce:
1. create a 2-host iSCSI pool with 3 domains, 100G each
2. create a wipe=true template (1GB disk)
3. create 3 pools from the template, with 50 VMs in each pool
4. detach and remove the VMs from each pool (I detached -> removed each pool one at a time, without waiting for the delete to finish on the previous pool).
  
Actual results:

we get "Exception: No free file handlers in pool", and then vdsm suddenly restarts and fails to recover

Expected results:

vdsm should recover

Additional info: logs

Comment 2 Dafna Ron 2013-03-18 20:05:54 UTC
I also reproduced this issue with a much simpler scenario.

1. run two VMs, each with two thin-provisioned disks
2. live migrate the disks on both VMs twice (move disks -> wait for finish -> move again)

vdsm crashed:

MainThread::ERROR::2013-03-18 21:55:51,991::clientIF::263::vds::(_initIRS) Error initializing IRS
Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 261, in _initIRS
    self.irs = Dispatcher(HSM())
  File "/usr/share/vdsm/storage/hsm.py", line 344, in __init__
    sp.StoragePool.cleanupMasterMount()
  File "/usr/share/vdsm/storage/sp.py", line 356, in cleanupMasterMount
    blockSD.BlockStorageDomain.doUnmountMaster(master)
  File "/usr/share/vdsm/storage/blockSD.py", line 1128, in doUnmountMaster
    pids = fuser(masterMount.fs_file, mountPoint=True)
  File "/usr/share/vdsm/storage/fuser.py", line 34, in fuser
    return [int(pid) for pid in out.split()]
AttributeError: 'list' object has no attribute 'split'
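The last frame shows the problem: `out.split()` assumes `out` is a string, but the command helper handed back stdout as a list of lines, and a list has no `split()` method. A minimal sketch of a parser that tolerates both shapes (the helper name is illustrative, not vdsm's actual patch):

```python
def parse_fuser_pids(out):
    """Return the PIDs from fuser output as a list of ints.

    Accepts stdout either as one string ("123 456\n") or as a
    list of lines (["123 456"]), since exec helpers may return
    either shape depending on how they are called.
    """
    if isinstance(out, list):
        # Flatten a list of output lines into one string first
        out = " ".join(out)
    return [int(pid) for pid in out.split()]
```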


and we also have an attribute error from the VM channel:

Thread-15::ERROR::2013-03-18 21:55:52,999::guestIF::103::vm.Vm::(__init__) vmId=`8df501ee-12eb-4f21-b709-0a44b2d33051`::Failed to prepare vmchannel
Traceback (most recent call last):
  File "/usr/share/vdsm/guestIF.py", line 101, in __init__
    self._prepare_socket()
  File "/usr/share/vdsm/guestIF.py", line 113, in _prepare_socket
    supervdsm.getProxy().prepareVmChannel(self._socketName)
  File "/usr/share/vdsm/supervdsm.py", line 76, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 66, in <lambda>
    getattr(self._supervdsmProxy._svdsm, self._funcName)(*args,
AttributeError: 'ProxyCaller' object has no attribute 'prepareVmChannel'

clientIFinit::ERROR::2013-03-18 21:55:55,263::clientIF::409::vds::(_recoverExistingVms) Vm's recovery failed
Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 395, in _recoverExistingVms
    not self.irs.getConnectedStoragePoolsList()['poollist']:
AttributeError: 'NoneType' object has no attribute 'getConnectedStoragePoolsList'

Comment 3 Dafna Ron 2013-03-18 20:08:10 UTC
Created attachment 712225 [details]
logs

Comment 4 Dan Kenigsberg 2013-03-24 09:47:20 UTC
Goodness. storage.fuser.fuser() has never worked. When solving this bug, please write a unit test for the function.
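A unit test along these lines could stub the external fuser command so the parsing path is exercised without a real mount. This is only a sketch of the idea: the stand-in function and subprocess call below are assumptions, not vdsm's actual command-execution API.

```python
import subprocess
from unittest import mock

# Illustrative stand-in for storage.fuser.fuser(); the real function
# shells out via vdsm's own command helpers, whose API differs.
def fuser(path, mountPoint=False):
    cmd = ["/sbin/fuser"]
    if mountPoint:
        cmd.append("-m")  # report all processes using the mount
    cmd.append(path)
    # fuser prints the PIDs space-separated on stdout
    out = subprocess.check_output(cmd).decode()
    return [int(pid) for pid in out.split()]

def test_fuser_parses_pids():
    # Stub the external command so the test needs no real mount
    with mock.patch("subprocess.check_output",
                    return_value=b" 123 456\n"):
        assert fuser("/rhev/data-center", mountPoint=True) == [123, 456]

def test_fuser_no_users():
    with mock.patch("subprocess.check_output", return_value=b"\n"):
        assert fuser("/tmp") == []
```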

Comment 5 Cheryn Tan 2013-04-03 07:01:47 UTC
This bug is currently attached to errata RHBA-2012:14332. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.

* Consequence: What happens when the bug presents.

* Fix: What was done to fix the bug.

* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.

Comment 6 Dafna Ron 2013-04-07 12:23:47 UTC
verified on vdsm-4.10.2-14.0.el6ev.x86_64
vdsm did not crash. I also tested a storage issue in which vdsm had to restart, and it was able to recover without the split issue.

Comment 8 errata-xmlrpc 2013-06-10 20:45:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0886.html