Bug 1372958 - supervdsm running too many open files
Summary: supervdsm running too many open files
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: vdsm
Classification: oVirt
Component: SuperVDSM
Version: 4.18.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.2.0
Target Release: ---
Assignee: Yaniv Bronhaim
QA Contact: eberman
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-04 11:14 UTC by Eldad Marciano
Modified: 2017-07-04 10:48 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-07-04 10:48:15 UTC
oVirt Team: Infra
oourfali: ovirt-4.2?
mgoldboi: planning_ack+
rule-engine: devel_ack?
pstehlik: testing_ack+


Description Eldad Marciano 2016-09-04 11:14:54 UTC
Description of problem:
vdsm and supervdsm throw exceptions related to too many open files.

The engine marks the hosts affected by this problem with "activating" status.


Version-Release number of selected component (if applicable):
vdsm-4.18.11-1.el7ev.x86_64

How reproducible:
not clear

Steps to Reproduce:
1. Run a setup with 33 hosts, 12 NFS storage domains (SDs), and 77 VMs per host.
2. Sporadically, some of the hosts hit this problem.

Actual results:
OS error "too many open files"; the engine marks the host as activating for a long time.

Expected results:
A stable number of open files.

Additional info:

Comment 8 Yaniv Bronhaim 2016-09-20 09:07:28 UTC
Although it's indeed critical, I can't say anything about the environment. I don't know what calls were run before vdsm got into this state. The logs are not relevant and I can't reproduce it. I suspect that it happened because the storage domain was not reachable for a while and the hbaRescan() function hung until the open fds limit was reached: every call to supervdsm opens a new fd that stays open until the call returns. If all calls get stuck, we end up crossing the limit.

Nir, can you say if this sounds like a reasonable scenario? Eldad, can you check whether such a flow reproduces the same behavior?
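
One way to check this hypothesis on a host is to watch the fd count of the supervdsm process while storage is unreachable. The following is only a minimal sketch, assuming a Linux host with /proc mounted; the helper names count_open_fds and soft_nofile_limit are hypothetical and are not part of vdsm.

```python
import os


def count_open_fds(pid="self"):
    """Count the open file descriptors of a process via /proc (Linux only).

    Note: when pid is "self", the directory listing itself briefly holds
    one extra fd, so compare counts rather than trusting the absolute value.
    """
    return len(os.listdir("/proc/%s/fd" % pid))


def soft_nofile_limit(pid="self"):
    """Return the soft "Max open files" limit of a process, or None if unlimited."""
    with open("/proc/%s/limits" % pid) as f:
        for line in f:
            if line.startswith("Max open files"):
                # Columns: "Max open files" <soft> <hard> <units>
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    raise RuntimeError("no 'Max open files' entry for pid %s" % pid)


if __name__ == "__main__":
    # Replace "self" with the supervdsm pid (reading another user's
    # /proc/<pid>/fd usually requires root).
    print("open fds: %d, soft limit: %s"
          % (count_open_fds(), soft_nofile_limit()))
```

Sampling these two values periodically would show whether the count climbs steadily toward the limit while calls are stuck, or stays stable.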

Comment 9 Eldad Marciano 2016-09-20 15:04:12 UTC
(In reply to Yaniv Bronhaim from comment #8)
> Although it's indeed critical, I can't say anything about the environment. I
> don't know what calls were run before vdsm got into this state. The logs are
> not relevant and I can't reproduce it. I suspect that it happened because
> the storage domain was not reachable for a while and the hbaRescan() function
> hung until the open fds limit was reached: every call to supervdsm opens a
> new fd that stays open until the call returns. If all calls get stuck, we
> end up crossing the limit.
> 
> Nir, can you say if this sounds like a reasonable scenario? Eldad, can you
> check whether such a flow reproduces the same behavior?

Yes, I'll try to get some priority for that.
But what you are describing, opening that many fds in this scenario, sounds like overhead. What about doing it via a single fd, at least for the hbaRescan() call? Also, what about checking whether there are already existing fds for that purpose?
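
To illustrate the idea of capping the number of outstanding calls, here is a minimal sketch. It is not how vdsm or supervdsm is implemented; BoundedCaller and TooManyPendingCalls are hypothetical names, and the sketch assumes the calls are issued from threads within one process. With a cap of 1, a hung call causes later calls to fail fast instead of each one holding another fd.

```python
import threading


class TooManyPendingCalls(Exception):
    """Raised when the cap on in-flight calls is already reached."""


class BoundedCaller(object):
    """Allow at most max_in_flight concurrent calls (hypothetical helper)."""

    def __init__(self, max_in_flight=1):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def call(self, func, *args, **kwargs):
        # Do not wait for a free slot: if earlier calls are stuck,
        # reject the new call rather than piling up more resources.
        if not self._slots.acquire(blocking=False):
            raise TooManyPendingCalls("too many calls already in flight")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()
```

The trade-off is that callers must handle the rejection, for example by retrying later or by reporting the stuck call.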

Comment 10 Oved Ourfali 2016-09-21 11:37:07 UTC
Restoring needinfo on Nir.

Comment 11 Oved Ourfali 2016-11-17 12:37:16 UTC
Moving to 4.0.7 as we haven't received the required info.

Comment 12 Eldad Marciano 2016-12-21 10:06:15 UTC
Oved, please let's keep it open or retarget it, since we don't have much priority for it.

Comment 13 Oved Ourfali 2016-12-21 10:07:13 UTC
It is open.