Created attachment 711443 [details]
logs

Description of problem:
To verify bug 910013 I deleted 150 VMs with wipe=true. We start seeing "No free file handlers in pool" and then vdsm restarts.

Version-Release number of selected component (if applicable):
sf10 4.10-11.0

How reproducible:

Steps to Reproduce:
1. Create a 2-host iSCSI pool with 3 domains, 100G each.
2. Create a wipe=true template (1GB disk).
3. Create 3 pools from the template with 50 VMs in each pool.
4. Detach and remove the VMs from each pool (I detached -> removed one pool at a time, without waiting for the delete to end on the previous pool).

Actual results:
vdsm restarts

Expected results:
vdsm should not restart

Additional info:
logs
What are the numbers we need to support here ?
Is it a duplicate of Bug 920532 ?
(In reply to comment #2)
> What are the numbers we need to support here ?

There is no hard number per host, but rather a ratio between vCPUs and cores; for the desktop use case we say 10:1. Taken to the current limits, a 64-core machine may come to 640 VMs.
http://www.redhat.com/resourcelibrary/whitepapers/rhev-server-whitepaper-specvirt got to 7:1 on average.
The ratio of VMs to threads is for you to calculate; it's an internal design matter as far as I see it.
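For reference, the arithmetic behind the 640 figure can be sketched as below. This is only an illustration: the function name is hypothetical, and it assumes one vCPU per VM, which is what equating the vCPU:core ratio with a VM count implies.

```python
def max_vms(cores: int, vcpus_per_core: int = 10, vcpus_per_vm: int = 1) -> int:
    """Upper bound on VMs per host for a given vCPU:core ratio.

    vcpus_per_vm=1 is an assumption for illustration, not a documented limit.
    """
    return (cores * vcpus_per_core) // vcpus_per_vm


# 10:1 ratio on a 64-core host
print(max_vms(64))
```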
Eduardo, what is the reason we run out of file descriptors ? Is it due to the pool thread limitation ?
Is Bug 920532 a duplicate ?
Per comment 10 in Bug 920532, we will increase the number of FDs allowed in VDSM. Not closing as a duplicate, since both scenarios need to be tested.
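When testing both scenarios it may help to confirm which limit the process is actually running with. The following is a minimal sketch using Python's standard `resource` module; the helper names are hypothetical, and this is not the change made in VDSM itself (a service limit is normally set through system configuration).

```python
import resource


def fd_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_fd_limit(target):
    """Raise the soft open-file limit toward target, capped at the hard limit.

    Never lowers the limit. Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


soft, hard = fd_limits()
print("soft limit:", soft, "hard limit:", hard)
```

An unprivileged process can only raise its soft limit up to the hard limit, so a larger value has to come from the service or system configuration.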
(In reply to comment #5)
> Eduardo, what is the reason we run out of file descriptors ?
> Is it due to the pool thread limitation ?

This is not a duplicate of Bug 910013.
The file descriptors are opened as part of the scheduling due to the postZero=True flag. This is unnecessary, especially in the case of file SDs.
In addition, there is probably an fd leak related to Bug 920532 (open fds that are never closed).

IMHO the solution has two parts:
1) Storage ops can (and should) avoid using the task recovery mechanism, as we did for deletes with postZero=False. (This should alleviate the issue.)
2) Prove or rule out an fd leak.
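One way to approach part 2 is to count a process's open descriptors before and after repeating an operation. The sketch below assumes Linux (it reads /proc/<pid>/fd) and uses hypothetical helper names; it is not part of vdsm. To inspect a running vdsm, pass its pid to count_open_fds() from a process with sufficient privileges.

```python
import os


def count_open_fds(pid="self"):
    """Count open file descriptors of a process via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_growth(operation, repeat=50):
    """Run operation repeatedly in this process and return the change in open fds.

    A result that grows with repeat suggests a leak; zero suggests none.
    """
    before = count_open_fds()
    for _ in range(repeat):
        operation()
    return count_open_fds() - before


def well_behaved():
    # Opens and closes a descriptor, so the count should not change.
    with open(os.devnull):
        pass


print("fd growth:", fd_growth(well_behaved))
```

Sampling the count on the host periodically during the mass delete would show whether the number returns to its baseline once the deletes finish.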
Since the solution for this issue in 3.2 is the same (increasing the ulimit), closing this bug as a duplicate of Bug 920532.

*** This bug has been marked as a duplicate of bug 920532 ***