Bug 616055

Summary: [vdsm] [libvirt] (scale) OSError: [Errno 11] Resource temporarily unavailable (python leak?)
Product: Red Hat Enterprise Linux 6 Reporter: Haim <hateya>
Component: vdsm Assignee: Dan Kenigsberg <danken>
Status: CLOSED DUPLICATE QA Contact: yeylon <yeylon>
Severity: medium Docs Contact:
Priority: low    
Version: 6.1 CC: abaron, bazulay, hateya, iheim, mgoldboi, Rhev-m-bugs, smizrahi, srevivo, yeylon, ykaul
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-01-02 10:40:27 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 650588    
Bug Blocks:    
Attachments:
Description Flags
oserror.vdsm.log none
lsof.oserror.vdsm none

Description Haim 2010-07-19 14:33:21 UTC
Created attachment 432905 [details]
oserror.vdsm.log

Description of problem:

The following issue has happened twice so far on one particular setup (scale of 180 VMs - 3 per host), where I start to get the following message in the vdsm log:
 
OSError: [Errno 11] Resource temporarily unavailable

vdsm appears to have 827 open files (counted with lsof) - see attachment.
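
A rough way to get a comparable number without lsof is to list /proc/<pid>/fd. The sketch below is illustrative only and assumes Linux with /proc mounted; count_open_fds and fd_report are hypothetical helper names, not part of vdsm. Note that lsof also lists entries such as the working directory and memory-mapped files, so its total usually runs higher than the raw descriptor count.

```python
import os
import resource


def count_open_fds(pid="self"):
    """Count entries in /proc/<pid>/fd (Linux only).

    Reading another process's fd directory needs matching privileges.
    The listing itself briefly holds one descriptor, which is included.
    """
    return len(os.listdir("/proc/%s/fd" % pid))


def fd_report():
    """Return (open_fds, soft_limit) for the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return count_open_fds(), soft


if __name__ == "__main__":
    used, limit = fd_report()
    print("open fds: %d, soft RLIMIT_NOFILE: %d" % (used, limit))
```

To inspect a running daemon, pass its pid as a string, e.g. count_open_fds("1234") (the pid here is a placeholder).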

When it occurs, the system stops functioning, the host goes non-operational, and there is nothing to do (except maybe kill the vdsm service and try again).

Thread-6432::ERROR::2010-07-19 16:09:11,523::misc::58::irs::[Errno 11] Resource temporarily unavailable
Thread-6494::ERROR::2010-07-19 16:09:11,523::misc::59::irs::Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 973, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1440, in public_getVolumeSize
    apparentsize = str(volume.Volume.getVSize(sdUUID, spUUID, imgUUID, volUUID, bs=1))
  File "/usr/share/vdsm/storage/volume.py", line 249, in getVSize
    return mysd.getVolumeClass().getVSize(mysd, imgUUID, volUUID, bs)
  File "/usr/share/vdsm/storage/blockVolume.py", line 45, in getVSize
    return int(int(sdobj.vg.getLVInfo(volUUID))/bs)
  File "/usr/share/vdsm/storage/vg.py", line 727, in getLVInfo
    return self.lvSize(name)
  File "/usr/share/vdsm/storage/vg.py", line 720, in lvSize
    (rc, out, err) = self.syncExecCmd(name, cmd, exclusive=True)
  File "/usr/share/vdsm/storage/vg.py", line 89, in syncExecCmd
    return misc.execCmd(cmd)
  File "/usr/share/vdsm/storage/misc.py", line 102, in execCmd
    stdin=infile, stdout=outfile, stderr=subprocess.PIPE)
  File "/usr/lib64/python2.6/subprocess.py", line 595, in __init__
    if startupinfo is not None:
  File "/usr/lib64/python2.6/subprocess.py", line 1009, in _execute_child
    fcntl.fcntl(fd, fcntl.F_SETFD, old | cloexec_flag)
OSError: [Errno 11] Resource temporarily unavailable

We are not sure whether this is a vdsm leak or a Python one; it requires further investigation.
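
As a starting point for that investigation, the descriptors can be grouped by what they point to. This is a minimal sketch, assuming Linux /proc; classify_fds is a hypothetical helper, not vdsm code. Since the failing call in the traceback passes stderr=subprocess.PIPE, a steadily growing "pipe" count would be one thing to look for, but that is a guess and not a finding from this report.

```python
import collections
import os


def classify_fds(pid="self"):
    """Group the open descriptors of a process by kind (Linux only)."""
    fd_dir = "/proc/%s/fd" % pid
    kinds = collections.Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("pipe:"):
            kinds["pipe"] += 1
        elif target.startswith("socket:"):
            kinds["socket"] += 1
        elif target.startswith("anon_inode:"):
            kinds["anon_inode"] += 1
        else:
            # Regular files, devices, directories and anything else.
            kinds["file"] += 1
    return kinds


if __name__ == "__main__":
    for kind, count in sorted(classify_fds().items()):
        print("%-12s %d" % (kind, count))
```

Sampling this periodically and comparing the counters shows which kind of descriptor is accumulating.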

Repro steps (might be hard to reproduce, but it has happened twice, so I decided to open the bug):

1) make sure the setup consists of 2 or more hosts running 60 VMs over iSCSI
2) restart the libvirtd service
3) kill some of the VMs with kill -9
4) start more VMs

Comment 1 Haim 2010-07-19 14:36:25 UTC
Created attachment 432907 [details]
lsof.oserror.vdsm

Comment 2 Barak 2010-11-28 16:11:42 UTC
It depends on bug 650588.
Added conditional nak on capacity; once the Python bug is fixed, the issue will be re-examined.

Comment 3 Dan Kenigsberg 2011-01-02 10:40:27 UTC

*** This bug has been marked as a duplicate of bug 650588 ***