Bug 922517 - vdsm: [scale] vdsm restarts after getting "No free file handlers in pool" when deleting (+ wipe after delete) ~150 vms
Summary: vdsm: [scale] vdsm restarts after getting "No free file handlers in pool" when deleting (+ wipe after delete) ~150 vms
Keywords:
Status: CLOSED DUPLICATE of bug 920532
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.2.0
Assignee: Yaniv Bronhaim
QA Contact:
URL:
Whiteboard: infra
Depends On:
Blocks: 948210
 
Reported: 2013-03-17 15:36 UTC by Dafna Ron
Modified: 2016-02-10 19:16 UTC
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 948210 (view as bug list)
Environment:
Last Closed: 2013-04-04 10:18:04 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
logs (1.19 MB, application/x-gzip)
2013-03-17 15:36 UTC, Dafna Ron

Description Dafna Ron 2013-03-17 15:36:54 UTC
Created attachment 711443 [details]
logs

Description of problem:

To verify bug 910013 I deleted 150 VMs with wipe=true.
We start seeing "No free file handlers in pool" and then vdsm restarts.

Version-Release number of selected component (if applicable):

sf10
4.10-11.0

How reproducible:

Steps to Reproduce:
1. create a 2 hosts iscsi pool with 3 domains 100G each
2. create a wipe=true template (1GB disk)
3. create 3 pools from the template with 50 vm's on each pool
4. detach and remove the vms from each pool (I detached -> removed each pool one at a time without waiting for the delete to end on the previous pool).
  
Actual results:

vdsm restart

Expected results:

vdsm should not restart

Additional info:logs

Comment 2 Barak 2013-03-19 09:40:31 UTC
What are the numbers we need to support here ?

Comment 3 Barak 2013-03-19 09:41:52 UTC
Is it a duplicate of Bug 920532 ?

Comment 4 Simon Grinberg 2013-03-19 10:54:28 UTC
(In reply to comment #2)
> What are the numbers we need to support here ?

There is no hard number per host, but a ratio between vCPUs and cores; for the desktop use case we say 10:1. Taking that to the current limits, on a 64-core machine it may come to 640 VMs. http://www.redhat.com/resourcelibrary/whitepapers/rhev-server-whitepaper-specvirt got to 7:1 on average.
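The oversubscription arithmetic above can be sketched in a few lines (a minimal illustration; `vm_capacity` is a made-up name, not a vdsm or RHEV API):

```python
def vm_capacity(physical_cores, vcpus_per_core=10):
    """Estimated VM count a host could carry at a given vCPU:core ratio.

    The 10:1 desktop ratio and the 64-core host come from the comment
    above; 7:1 is the SPECvirt average mentioned there.
    """
    return physical_cores * vcpus_per_core

print(vm_capacity(64))     # 10:1 desktop ratio on 64 cores -> 640
print(vm_capacity(64, 7))  # 7:1 SPECvirt average on 64 cores -> 448
```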


The ratio of VMs to threads is for you to calculate; it's internal design as far as I see it.

Comment 5 Barak 2013-03-24 13:03:19 UTC
Eduardo, what is the reason we run out of file descriptors ?
Is it due to the pool thread limitation ?

Comment 6 Barak 2013-03-24 17:07:48 UTC
Is Bug 920532 a duplicate ?

Comment 7 Barak 2013-04-02 08:59:21 UTC
Per comment 10 in Bug 920532 we will increase the number of FDs allowed in VDSM.
Not closing as a duplicate since both scenarios need to be tested.
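Increasing the FD allowance for a running process normally means raising the soft RLIMIT_NOFILE (going beyond the hard limit needs privileges or a limits.conf change). A minimal sketch of such a bump from inside a Python process, using the standard `resource` module (illustrative only; this is not the actual vdsm change):

```python
import resource

def raise_nofile_soft_limit():
    """Raise the soft RLIMIT_NOFILE up to the current hard limit.

    An unprivileged process may freely raise its soft limit as far as
    the hard limit; raising the hard limit itself requires privileges
    (CAP_SYS_RESOURCE on Linux). Returns the resulting soft limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY and soft < hard:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

A daemon would typically do this once at startup, before spawning its worker/thread pools.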

Comment 9 Eduardo Warszawski 2013-04-03 12:56:02 UTC
(In reply to comment #5)
> Eduardo, what is the reason we run out of file descriptors ?
> Is it due to the pool thread limitation ?

This is not a duplicate of Bug 910013.

The file descriptors are opened as part of the scheduling due to the postZero=True flag.

This is unnecessary, especially in the case of file SDs.

In addition there is probably an fd leak related to Bug 920532. (Open FDs that are not closed.)

IMHO the solution is in two parts:
1) Storage ops can (and should) avoid using the task recovery mechanism, as we did with deletes with postZero=False.
(This should alleviate the issue.)

2) Prove or rule out an fd leak.
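One cheap way to prove or rule out an fd leak on Linux is to sample the process's /proc fd directory over time and watch for monotonic growth. A minimal sketch (illustrative; not part of vdsm):

```python
import os

def open_fd_count(pid="self"):
    """Number of file descriptors currently open in a process.

    On Linux, every open fd of a process appears as an entry under
    /proc/<pid>/fd, so counting directory entries counts open FDs.
    """
    return len(os.listdir("/proc/%s/fd" % pid))

# Sampling this before and after a batch of delete+wipe operations and
# seeing the count fail to return to its baseline would point to a leak.
```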

Comment 11 Barak 2013-04-04 10:18:04 UTC
Since the solution to this issue for 3.2 is the same, increasing the ulimit,
closing this bug as a duplicate of Bug 920532.

*** This bug has been marked as a duplicate of bug 920532 ***

