Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 922517

Summary: vdsm: [scale] vdsm restarts after getting "No free file handlers in pool" when deleting (+ wipe after delete) ~150 vms
Product: Red Hat Enterprise Virtualization Manager
Reporter: Dafna Ron <dron>
Component: vdsm
Assignee: Yaniv Bronhaim <ybronhei>
Status: CLOSED DUPLICATE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.2.0
CC: bazulay, ewarszaw, hateya, iheim, lpeer, sgrinber, ykaul
Target Milestone: ---
Keywords: Regression
Target Release: 3.2.0
Hardware: x86_64
OS: Linux
Whiteboard: infra
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 948210 (view as bug list)
Environment:
Last Closed: 2013-04-04 10:18:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 948210
Attachments: logs (flags: none)

Description Dafna Ron 2013-03-17 15:36:54 UTC
Created attachment 711443 [details]
logs

Description of problem:

To verify bug 910013, I deleted 150 VMs with wipe=true.
We start seeing "No free file handlers in pool" and then vdsm restarts.

Version-Release number of selected component (if applicable):

sf10
4.10-11.0

How reproducible:

Steps to Reproduce:
1. create a 2-host iSCSI pool with 3 domains of 100G each
2. create a wipe=true template (1GB disk)
3. create 3 pools from the template with 50 VMs in each pool
4. detach and remove the VMs from each pool (I detached -> removed one pool at a time without waiting for the delete to finish on the previous pool).
  
Actual results:

vdsm restart

Expected results:

vdsm should not restart

Additional info: logs
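One way to watch this failure mode approach before vdsm restarts is to compare a process's open file descriptors with its soft RLIMIT_NOFILE. The sketch below is a hypothetical monitoring helper, not part of vdsm, and assumes a Linux host with /proc mounted:

```python
import os

def fd_usage(pid):
    """Return (open_fds, soft_limit) for the given pid, read from /proc.

    soft_limit is None if the 'Max open files' row cannot be found.
    """
    open_fds = len(os.listdir('/proc/%d/fd' % pid))
    soft_limit = None
    with open('/proc/%d/limits' % pid) as f:
        for line in f:
            # Row format: "Max open files  <soft>  <hard>  files"
            if line.startswith('Max open files'):
                soft_limit = int(line.split()[3])
    return open_fds, soft_limit

if __name__ == '__main__':
    used, limit = fd_usage(os.getpid())
    print('%d of %s file descriptors in use' % (used, limit))
```

Run against the vdsm pid (e.g. from `pidof vdsm`), a steadily climbing count during the wipe storm would point at the leak suspected later in this bug.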

Comment 2 Barak 2013-03-19 09:40:31 UTC
What are the numbers we need to support here?

Comment 3 Barak 2013-03-19 09:41:52 UTC
Is it a duplicate of Bug 920532?

Comment 4 Simon Grinberg 2013-03-19 10:54:28 UTC
(In reply to comment #2)
> What are the numbers we need to support here?

There is no hard number per host, but a ratio between vCPUs and cores: for the desktop use case we say 10:1. Taking that to the current limits, on a 64-core machine it may come to 640 VMs. http://www.redhat.com/resourcelibrary/whitepapers/rhev-server-whitepaper-specvirt got to 7:1 on average.


The ratio of VMs to threads is for you to calculate; it's internal design as far as I see it.
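The capacity estimate in this comment is simple arithmetic: overcommit ratio times physical cores, assuming single-vCPU desktop VMs as in the 10:1 figure. A minimal worked version (the 10:1 and 7:1 ratios are the figures quoted above):

```python
def max_vms(cores, vcpu_per_core_ratio):
    """Rough VM ceiling for a host: cores * overcommit ratio."""
    return cores * vcpu_per_core_ratio

print(max_vms(64, 10))  # desktop rule of thumb -> 640
print(max_vms(64, 7))   # SPECvirt average     -> 448
```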

Comment 5 Barak 2013-03-24 13:03:19 UTC
Eduardo, what is the reason we run out of file descriptors?
Is it due to the pool thread limitation?

Comment 6 Barak 2013-03-24 17:07:48 UTC
Is Bug 920532 a duplicate?

Comment 7 Barak 2013-04-02 08:59:21 UTC
Per comment 10 in Bug 920532, we will increase the number of FDs allowed in VDSM.
Not closing as a duplicate since both scenarios need to be tested.
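At the process level, "increase the amount of FDs" means raising RLIMIT_NOFILE. The sketch below is illustrative only, using Python's standard resource module; the actual vdsm change may instead adjust the service's ulimit or the host's limits.conf:

```python
import resource

# Read the current soft/hard limits on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

if hard != resource.RLIM_INFINITY:
    # An unprivileged process may raise its soft limit up to the hard
    # limit; going beyond the hard limit requires root (or a
    # /etc/security/limits.conf change on the host).
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))

print('RLIMIT_NOFILE: soft=%s hard=%s'
      % resource.getrlimit(resource.RLIMIT_NOFILE))
```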

Comment 9 Eduardo Warszawski 2013-04-03 12:56:02 UTC
(In reply to comment #5)
> Eduardo, what is the reason we run out of file descriptors?
> Is it due to the pool thread limitation?

This is not a duplicate of Bug 910013.

The file descriptors are opened as part of the scheduling due to the postZero=True flag.

This is unnecessary, especially in the case of file SDs.

In addition, there is probably an fd leak related to Bug 920532 (open fds that are not closed).

IMHO the solution is in two parts:
1) Storage ops can (and should) avoid using the task recovery mechanism, as we did with deletes with postZero=False.
(This should alleviate the issue.)

2) Prove or rule out an fd leak.
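The pool exhaustion Eduardo describes can be illustrated with a toy fixed-size handle pool. This is not vdsm's actual implementation; the class and error text below only mimic the reported message. Every postZero delete scheduled acquires a handle, so ~150 concurrent wipes can drain a small pool:

```python
import threading

class Handle(object):
    """Stand-in for a pooled file handle."""
    def __init__(self, n):
        self.n = n

class HandlePool(object):
    """Toy fixed-size handle pool: acquiring beyond capacity fails
    immediately, mimicking 'No free file handlers in pool'."""
    def __init__(self, size):
        self._lock = threading.Lock()
        self._free = [Handle(i) for i in range(size)]

    def acquire(self):
        with self._lock:
            if not self._free:
                raise RuntimeError('No free file handlers in pool')
            return self._free.pop()

    def release(self, handle):
        with self._lock:
            self._free.append(handle)

pool = HandlePool(size=3)
held = [pool.acquire() for _ in range(3)]  # pool now empty
try:
    pool.acquire()                         # fourth acquire fails
except RuntimeError as e:
    print(e)
```

Raising the ulimit (the fix chosen for 3.2) enlarges the pool's backing resource; avoiding the task recovery mechanism for postZero deletes, as proposed in part 1, would shrink demand on it instead.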

Comment 11 Barak 2013-04-04 10:18:04 UTC
Since the solution for this issue in 3.2 is the same (increase the ulimit),
closing this bug as a duplicate of Bug 920532.

*** This bug has been marked as a duplicate of bug 920532 ***