Bug 1435966 - [scale] Sanlock fails with "Too many open files" when running hundreds of VMs (HA or not) due to the sanlock locking manager that is needed to hot plug leases.
Summary: [scale] Sanlock fails with "Too many open files" when running hundreds of vms...
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sanlock
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: pre-dev-freeze
Target Release: ---
Assignee: David Teigland
QA Contact:
URL:
Whiteboard:
Duplicates: 1435965
Depends On:
Blocks:
 
Reported: 2017-03-26 10:24 UTC by guy chen
Modified: 2021-09-03 12:11 UTC (History)
CC List: 11 users

Fixed In Version: sanlock-3.5.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-25 14:55:19 UTC
Target Upstream Version:
Embargoed:


Attachments

Description guy chen 2017-03-26 10:24:12 UTC
Description of problem:

There are a lot of sanlock leases on the system, about one for each VM, which eventually leads to a "Too many open files" error, even though the "Target Storage Domain for VM Lease" high availability property is set to "No VM Lease".

Version-Release number of selected component (if applicable):
vdsm 4.19.9

How reproducible:
Always

Steps to Reproduce:
1. Create 1005 VMs.
2. Start the VMs.
3. "Too many open files" errors appear in the VDSM log and the sanlock process reaches 100% CPU.

Actual results:
"Too many open files" errors appear in the VDSM log and the sanlock process reaches 100% CPU.

Expected results:
Sanlock should not open 1 lease per VM

Additional info:
Logs will be attached
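
For reference, a quick way to confirm the symptom on a host is sketched below (assuming the default VDSM log location; paths may differ):

$ grep -c 'Too many open files' /var/log/vdsm/vdsm.log   # error count in the VDSM log
$ ps -o pid,pcpu,comm -p "$(pidof sanlock)"              # CPU usage of the sanlock daemon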

Comment 3 guy chen 2017-03-26 10:31:25 UTC
A workaround for the "Too many open files" error is to increase the sanlock file descriptor limit (ulimit); I have set the value below for 1000 VMs:

LimitNOFILE=10000 in /usr/lib/systemd/system/sanlock.service
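
The same override can also be applied without editing the packaged unit file, via a systemd drop-in (a sketch; the drop-in file name is illustrative):

# /etc/systemd/system/sanlock.service.d/nofile.conf  (hypothetical file name)
[Service]
LimitNOFILE=10000

$ systemctl daemon-reload
$ systemctl restart sanlock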

Comment 5 guy chen 2017-03-26 13:53:22 UTC
*** Bug 1435965 has been marked as a duplicate of this bug. ***

Comment 6 Nir Soffer 2017-03-26 14:02:36 UTC
Please attach the output of:

    readlink /proc/<sanlock pid>/fd/*
    sanlock client status

When you get the error about too many open files.

And /var/log/sanlock.log.
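
Something along these lines could capture all three at the moment the error appears (a sketch; output file names are illustrative):

$ pid=$(pidof sanlock)
$ readlink /proc/$pid/fd/* > sanlock_fd_list.txt       # fd targets of the sanlock daemon
$ sanlock client status > sanlock_client_status.txt 2>&1
$ cp /var/log/sanlock.log .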

Comment 7 guy chen 2017-03-26 14:04:55 UTC
I will attach the needed info. I have also checked both the VMs and the templates of the pool; in both cases HA is disabled:


<name>2_HDD_No_NIC</name>
<link href="/ovirt-engine/api/templates/685e3777-1198-4503-a9c1-5b57b182885a/permissions" rel="permissions"/>
<link href="/ovirt-engine/api/templates/685e3777-1198-4503-a9c1-5b57b182885a/tags" rel="tags"/>
<link href="/ovirt-engine/api/templates/685e3777-1198-4503-a9c1-5b57b182885a/watchdogs" rel="watchdogs"/>
<link href="/ovirt-engine/api/templates/685e3777-1198-4503-a9c1-5b57b182885a/cdroms" rel="cdroms"/>
<link href="/ovirt-engine/api/templates/685e3777-1198-4503-a9c1-5b57b182885a/nics" rel="nics"/>
<link href="/ovirt-engine/api/templates/685e3777-1198-4503-a9c1-5b57b182885a/diskattachments" rel="diskattachments"/>
<link href="/ovirt-engine/api/templates/685e3777-1198-4503-a9c1-5b57b182885a/graphicsconsoles" rel="graphicsconsoles"/>
<bios>...</bios>
<cpu>...</cpu>
<cpu_shares>0</cpu_shares>
<creation_time>2017-03-15T19:05:28.341+02:00</creation_time>
<delete_protected>false</delete_protected>
<display>...</display>
<high_availability>
<enabled>false</enabled>
<priority>1</priority>
</high_availability>
<io>

Comment 8 guy chen 2017-03-27 06:36:56 UTC
Created attachment 1266557 [details]
sanlock log

Comment 10 guy chen 2017-03-27 06:40:30 UTC
(In reply to Nir Soffer from comment #6)
> Please attach the output of:
> 
>     readlink /proc/<sanlock pid>/fd/*
>     sanlock client status
> 
> When you get the error about too many open files.
> 
> And /var/log/sanlock.log.

Added the fd listing and the sanlock log from when the error occurred.
"sanlock client status" doesn't print any output.

Comment 11 guy chen 2017-03-27 13:26:04 UTC
Just a comment: after investigation with Raz, this is not related to HA.

Comment 12 Nir Soffer 2017-03-27 14:21:03 UTC
(In reply to guy chen from comment #10)
> added fs and sanlock log once the error occurred.
> sanlock client status doesn't print any output.

In sanlock_fs, we see exactly 1024 files - makes sense since this is the default
limit, and sanlock probably does not modify the default.

$ wc -l sanlock_fs 
1024 sanlock_fs
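
The effective limit of the running daemon can be read straight from the kernel (a sketch; /proc/<pid>/limits shows the soft and hard values):

$ grep 'Max open files' /proc/$(pidof sanlock)/limits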

If we look at the values, we see a lot of sockets:

$ sort sanlock_fs | grep socket | wc -l
1012

$ sort sanlock_fs | grep socket | head -10
socket:[110826]
socket:[1282058]
socket:[1282113]
socket:[1282114]
socket:[1282190]
socket:[1282225]
socket:[1282246]
socket:[1282247]
socket:[1282248]
socket:[1282269]
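
For completeness, the whole listing can be broken down by descriptor type in one pass (a sketch over the same attached sanlock_fs file, collapsing inode numbers):

$ sed 's/\[[0-9]*\]/[...]/' sanlock_fs | sort | uniq -c | sort -rn | head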

These sockets are client connections to sanlock; this may be because a VM has a lease, or there are other client connections. We need the output of "sanlock client status" to understand this, but we have no output.

I guess that when sanlock is out of file descriptors, sanlock client commands fail to connect to the daemon, so the missing output is expected.
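
A way to count client connections without asking the daemon is to look at its unix socket from outside (a sketch, assuming iproute2's ss is available; each connected client shows up once under the socket path):

$ ss -x | grep -c '/var/run/sanlock/sanlock.sock'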

Looking at the sanlock log and the sanlock log dump, we don't see traces of unexpected leases:

$ egrep 's[0-9]:r[0-9]' sanlock.log 
2017-03-20 21:08:21+0200 11419 [1695]: s4:r1 resource 69f885a3-3eaf-4971-8a94-fa9f127fbc82:179aceeb-6889-457a-a754-6c576c3fd987:/dev/69f885a3-3eaf-4971-8a94-fa9f127fbc82/leases:976224256 for 94,110,12497
2017-03-20 21:21:44+0200 12222 [15618]: s6:r2 resource f8f4650b-abf2-4322-8aa7-197e3be6059b:179aceeb-6889-457a-a754-6c576c3fd987:/dev/f8f4650b-abf2-4322-8aa7-197e3be6059b/leases:113246208 for 94,110,12497
2017-03-20 21:21:46+0200 12224 [15510]: s6:r3 resource f8f4650b-abf2-4322-8aa7-197e3be6059b:fa76eda2-9afa-41af-b1ee-02d9dbaeb808:/dev/f8f4650b-abf2-4322-8aa7-197e3be6059b/leases:114294784 for 94,110,12497
2017-03-20 21:57:11+0200 14349 [1695]: s4:r4 resource 69f885a3-3eaf-4971-8a94-fa9f127fbc82:edf38333-dd12-4900-bfb6-9a4f1e030cfd:/dev/69f885a3-3eaf-4971-8a94-fa9f127fbc82/leases:975175680 for 94,110,12497
2017-03-21 15:05:26+0200 76044 [1695]: s4:r5 resource 69f885a3-3eaf-4971-8a94-fa9f127fbc82:SDM:/dev/69f885a3-3eaf-4971-8a94-fa9f127fbc82/leases:1048576 for 94,110,12497
2017-03-22 08:37:06+0200 244 [1640]: s1:r1 resource 69f885a3-3eaf-4971-8a94-fa9f127fbc82:SDM:/dev/69f885a3-3eaf-4971-8a94-fa9f127fbc82/leases:1048576 for 2,16,12078
2017-03-22 12:40:06+0200 249 [1677]: s1:r1 resource 69f885a3-3eaf-4971-8a94-fa9f127fbc82:SDM:/dev/69f885a3-3eaf-4971-8a94-fa9f127fbc82/leases:1048576 for 2,17,13058
2017-03-22 14:34:31+0200 249 [1679]: s1:r1 resource 69f885a3-3eaf-4971-8a94-fa9f127fbc82:SDM:/dev/69f885a3-3eaf-4971-8a94-fa9f127fbc82/leases:1048576 for 2,16,13022
2017-03-23 16:06:54+0200 286 [1682]: s1:r1 resource 69f885a3-3eaf-4971-8a94-fa9f127fbc82:SDM:/dev/69f885a3-3eaf-4971-8a94-fa9f127fbc82/leases:1048576 for 2,16,12569
2017-03-26 12:30:35+0300 242907 [1682]: s1:r2 resource 69f885a3-3eaf-4971-8a94-fa9f127fbc82:SDM:/dev/69f885a3-3eaf-4971-8a94-fa9f127fbc82/leases:1048576 for 2,16,58689
2017-03-26 17:28:58+0300 261 [16088]: s1:r1 resource 69f885a3-3eaf-4971-8a94-fa9f127fbc82:SDM:/dev/69f885a3-3eaf-4971-8a94-fa9f127fbc82/leases:1048576 for 2,14,13273

But we see a huge number of renewal failures:

$ grep 'renewal error' sanlock.log | wc -l
1798

Please shut down some VMs until you get output from "sanlock client status", and then attach that output. If shutting down VMs does not give the needed output, please reboot the host and collect the output of "sanlock client status" while adding VMs, so we have the output from just before the point when sanlock runs out of open files.

Another way is to increase the sanlock open files limit and then get the output of "sanlock client status".

Comment 13 Nir Soffer 2017-03-27 14:28:32 UTC
David, can you take a look and explain why sanlock is out of open file descriptors?

We don't use VM or volume leases in the ~1000 VMs running on this host.

Comment 15 Nir Soffer 2017-03-29 12:01:28 UTC
In sanlock client status, we see many processes connected to sanlock:

$ egrep "^p\b.+_" sanlock_client_status.log | head -10
p 19530 2_HDD_with_NIC-155
p 19600 2_HDD_with_NIC-125
p 19687 2_HDD_with_NIC-174
p 19876 2_HDD_with_NIC-178
p 20164 2_HDD_with_NIC-106
p 20324 2_HDD_with_NIC-223
p 20500 2_HDD_with_NIC-264
p 20595 2_HDD_with_NIC-145
p 20751 2_HDD_with_NIC-116
p 20906 2_HDD_with_NIC-233

$ egrep "^p\b.+_" sanlock_client_status.log | wc -l
999

But we don't use any leases; this is very strange, maybe this is a libvirt bug.

We need to debug this environment in this state.

Comment 16 Nir Soffer 2017-03-29 13:22:49 UTC
Looking at another host with 240 VMs, we see the same picture: every qemu-kvm process has an open unix socket to /var/run/sanlock/sanlock.sock, although no leases are used.

[root@buri01 ~]# lsof -U /var/run/sanlock/sanlock.sock | awk '/qemu-kvm/ {print $2}' | uniq | wc -l
239

[root@buri01 ~]# ps aux | grep qemu-kvm | grep -v grep | wc -l
239

# lsof -U /var/run/sanlock/sanlock.sock | awk '/qemu-kvm/ {print $2}' | sort | head -1
10001

# lsof -p 10001 | egrep unix.+socket
qemu-kvm 10001 qemu    3u     unix 0xffff8817502c9000        0t0 645039 socket
qemu-kvm 10001 qemu   17u     unix 0xffff8817502cbc00        0t0 645049 socket
qemu-kvm 10001 qemu   18u     unix 0xffff8817502cd800        0t0 645050 socket
qemu-kvm 10001 qemu   30u     unix 0xffff8813b4fe2c00        0t0 642408 socket
qemu-kvm 10001 qemu   31u     unix 0xffff8813b4fe6000        0t0 642409 socket

Peter, maybe libvirt is opening a sanlock connection and letting qemu inherit the
file descriptor by mistake, when no lease is used?

Comment 17 Nir Soffer 2017-03-29 13:37:32 UTC
I discussed this with Peter on IRC; the sanlock connection per VM is probably required by the lock manager driver to allow hot plugging a lease.

It seems that sanlock needs to increase the default open file limit to support
this scale.
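
As a rough sizing sketch, assuming (from the data above) roughly one client socket per running VM plus a small fixed overhead for lockspaces, logs and listeners:

$ echo $(( 1000 + 64 ))   # 64 is an assumed margin, not a measured value
1064

That lands just above the default soft limit of 1024, which matches the failures seen around 1000 running VMs.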

Waiting for Daniel's confirmation on this.

Comment 18 Yaniv Lavi 2017-03-29 14:20:42 UTC
In general, only HA VMs should be in this list, since you can't hot plug a lease into a non-HA VM.

Comment 19 Yaniv Lavi 2017-03-29 14:22:25 UTC
This can wait for 4.1.2. Lowering severity and priority since the scale is above the tested supported limits.

Comment 20 Nir Soffer 2017-03-29 14:48:05 UTC
Guy, to complete the scale testing, please increase the sanlock open file limit to 2048 in the sanlock unit file.
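
If a service restart is undesirable while testing, the limit of the already-running daemon can also be raised in place with util-linux's prlimit (a sketch; 2048 mirrors the value requested above):

$ prlimit --pid "$(pidof sanlock)" --nofile=2048:2048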

Comment 21 guy chen 2017-03-29 16:22:11 UTC
Changed to 2048 in the sanlock unit file.
No errors when passing 1000 VMs.
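
For the record, headroom after such a change can be checked against actual usage (a sketch):

$ pid=$(pidof sanlock)
$ ls /proc/$pid/fd | wc -l                  # descriptors currently in use
$ grep 'Max open files' /proc/$pid/limits   # configured soft/hard limit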

Comment 22 Nir Soffer 2017-04-08 13:56:44 UTC
Fixed upstream in sanlock:
https://pagure.io/sanlock/c/64502dfee897d1690e6fa2a18fa0e5587bb7bbd3?branch=master

Comment 23 Nir Soffer 2017-04-08 14:03:03 UTC
(adding back needinfo removed in comment 21)

Daniel, can you confirm that having an open sanlock connection in qemu for every VM, even when the VM does not have a lease, is expected?

See comment 16 and comment 17.

Comment 24 Daniel Berrangé 2017-04-10 08:28:13 UTC
Libvirtd opens the sanlock connection when starting the VM and then passes the FD to QEMU. So you'll need to allow 1 FD per VM you want running for sanlock.

Comment 25 Nir Soffer 2017-04-16 11:05:16 UTC
Moving to sanlock since this is a sanlock configuration issue, and it was already fixed upstream.

Comment 28 David Teigland 2017-05-25 14:27:33 UTC
This is already in 7.4, but I missed this bz.  We could just close this bug as fixed in the next release, or we could get flags/acks and attach it to the errata (I'd suggest the former.)

Comment 29 Allon Mureinik 2017-05-25 14:45:59 UTC
(In reply to David Teigland from comment #28)
> This is already in 7.4, but I missed this bz.  We could just close this bug
> as fixed in the next release, or we could get flags/acks and attach it to
> the errata (I'd suggest the former.)

With no customer ticket attached, I don't think it's worth the hassle of adding it to the erratum.

I'd like to have an exact version, though, so we can require it in VDSM.

Comment 30 David Teigland 2017-05-25 14:55:19 UTC
The patch is in sanlock-3.5.0-1
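
To check whether a given host already carries the fix, one could compare the installed package against that build (a sketch; the Requires line only illustrates how VDSM could express the dependency in its spec file, per comment 29):

$ rpm -q sanlock    # expect sanlock-3.5.0-1.el7 or later

# illustrative vdsm.spec fragment:
Requires: sanlock >= 3.5.0-1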

