Bug 922517 - vdsm: [scale] vdsm restarts after getting "No free file handlers in pool" when deleting (+ wipe after delete) ~150 vms
Summary: vdsm: [scale] vdsm restarts after getting "No free file handlers in pool" when deleting (+ wipe after delete) ~150 vms
Keywords:
Status: CLOSED DUPLICATE of bug 920532
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.2.0
Assignee: Yaniv Bronhaim
QA Contact:
URL:
Whiteboard: infra
Depends On:
Blocks: 948210
 
Reported: 2013-03-17 15:36 UTC by Dafna Ron
Modified: 2016-02-10 19:16 UTC
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 948210 (view as bug list)
Environment:
Last Closed: 2013-04-04 10:18:04 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
logs (1.19 MB, application/x-gzip)
2013-03-17 15:36 UTC, Dafna Ron

Description Dafna Ron 2013-03-17 15:36:54 UTC
Created attachment 711443 [details]
logs

Description of problem:

To verify bug 910013 I deleted 150 VMs with wipe=true.
We start seeing "No free file handlers in pool" and then vdsm restarts.

Version-Release number of selected component (if applicable):

sf10
4.10-11.0

How reproducible:

Steps to Reproduce:
1. create a 2 hosts iscsi pool with 3 domains 100G each
2. create a wipe=true template (1GB disk)
3. create 3 pools from the template with 50 vm's on each pool
4. detach and remove the vms from each pool (I detached -> removed each pool one at a time without waiting for the delete to end on the previous pool).
  
Actual results:

vdsm restart

Expected results:

vdsm should not restart

Additional info:logs

Comment 2 Barak 2013-03-19 09:40:31 UTC
What are the numbers we need to support here ?

Comment 3 Barak 2013-03-19 09:41:52 UTC
Is it a duplicate of Bug 920532 ?

Comment 4 Simon Grinberg 2013-03-19 10:54:28 UTC
(In reply to comment #2)
> What are the numbers we need to support here ?

There is no hard number per host, but a ratio between vCPUs and cores; for the desktop use case we say 10:1. Taking that to the current limits, on a 64-core machine it may come to 640 VMs. http://www.redhat.com/resourcelibrary/whitepapers/rhev-server-whitepaper-specvirt got to 7:1 on average.
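The oversubscription arithmetic above can be sketched in a few lines (a minimal illustration; `vm_capacity` is a made-up name, not a vdsm or RHEV API):

```python
def vm_capacity(physical_cores, vcpus_per_core=10):
    """Estimated VM count a host could carry at a given vCPU:core ratio.

    The 10:1 desktop ratio and the 64-core host come from the comment
    above; 7:1 is the SPECvirt average mentioned there.
    """
    return physical_cores * vcpus_per_core

print(vm_capacity(64))     # 10:1 desktop ratio on 64 cores -> 640
print(vm_capacity(64, 7))  # 7:1 SPECvirt average on 64 cores -> 448
```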


The ratio of VMs to threads is for you to calculate; it's internal design as far as I see it.

Comment 5 Barak 2013-03-24 13:03:19 UTC
Eduardo, what is the reason we run out of file descriptors ?
Is it due to the pool thread limitation ?

Comment 6 Barak 2013-03-24 17:07:48 UTC
Is Bug 920532 a duplicate ?

Comment 7 Barak 2013-04-02 08:59:21 UTC
Per comment 10 in Bug 920532 we will increase the number of FDs allowed in VDSM.
Not closing as a duplicate since both scenarios need to be tested.
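Increasing the FD allowance for a running process normally means raising the soft RLIMIT_NOFILE (going beyond the hard limit needs privileges or a limits.conf change). A minimal sketch of such a bump from inside a Python process, using the standard `resource` module (illustrative only; this is not the actual vdsm change):

```python
import resource

def raise_nofile_soft_limit():
    """Raise the soft RLIMIT_NOFILE up to the current hard limit.

    An unprivileged process may freely raise its soft limit as far as
    the hard limit; raising the hard limit itself requires privileges
    (CAP_SYS_RESOURCE on Linux). Returns the resulting soft limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY and soft < hard:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

A daemon would typically do this once at startup, before spawning its worker/thread pools.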

Comment 9 Eduardo Warszawski 2013-04-03 12:56:02 UTC
(In reply to comment #5)
> Eduardo, what is the reason we run out of file descriptors ?
> Is it due to the pool thread limitation ?

This is not a duplicate of Bug 910013.

The file descriptors are opened as part of the scheduling due to the postZero=True flag.

This is unnecessary, especially in the case of file SDs.

In addition there is probably an fd leak related to Bug 920532. (Open FDs that are not closed.)

IMHO the solution is in two parts:
1) Storage ops can (and should) avoid using the task recovery mechanism, as we did with deletes with postZero=False.
(This should alleviate the issue.)

2) Prove or rule out an fd leak.
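One cheap way to prove or rule out an fd leak on Linux is to sample the process's /proc fd directory over time and watch for monotonic growth. A minimal sketch (illustrative; not part of vdsm):

```python
import os

def open_fd_count(pid="self"):
    """Number of file descriptors currently open in a process.

    On Linux, every open fd of a process appears as an entry under
    /proc/<pid>/fd, so counting directory entries counts open FDs.
    """
    return len(os.listdir("/proc/%s/fd" % pid))

# Sampling this before and after a batch of delete+wipe operations and
# seeing the count fail to return to its baseline would point to a leak.
```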

Comment 11 Barak 2013-04-04 10:18:04 UTC
Since the solution to this issue for 3.2 is the same, increasing the ulimit,
closing this bug as a duplicate of Bug 920532.

*** This bug has been marked as a duplicate of bug 920532 ***

