Bug 1435966
| Summary: | [scale] Sanlock fails with "Too many open files" when running hundreds of vms (HA or not) due the sanlock locking manager that is needed to hot plug leases. | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | guy chen <guchen> |
| Component: | sanlock | Assignee: | David Teigland <teigland> |
| Status: | CLOSED NEXTRELEASE | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 7.3 | CC: | agk, amureini, berrange, bugs, cluster-maint, guchen, mgoldboi, pkrempa, teigland, tnisan, ylavi |
| Target Milestone: | pre-dev-freeze | Keywords: | Performance |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | sanlock-3.5.0-1.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-05-25 14:55:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
guy chen
2017-03-26 10:24:12 UTC
A workaround for the Too many open files error is to increase the sanlock ulimits, i have put the below value for an amount of 1000 VMS : LimitNOFILE=10000 at /usr/lib/systemd/system/sanlock.service *** Bug 1435965 has been marked as a duplicate of this bug. *** Please attach the output of:
readlink /proc/<sanlock pid>/fd/*
sanlock client status
When you get the error about too many open files.
And /var/log/sanlock.log.
Will attached the need info, also I have checked both the VMS and the templates of the pool - in both cases the HA is disabled : <name>2_HDD_No_NIC</name> <link href="/ovirt-engine/api/templates/685e3777-1198-4503-a9c1-5b57b182885a/permissions" rel="permissions"/> <link href="/ovirt-engine/api/templates/685e3777-1198-4503-a9c1-5b57b182885a/tags" rel="tags"/> <link href="/ovirt-engine/api/templates/685e3777-1198-4503-a9c1-5b57b182885a/watchdogs" rel="watchdogs"/> <link href="/ovirt-engine/api/templates/685e3777-1198-4503-a9c1-5b57b182885a/cdroms" rel="cdroms"/> <link href="/ovirt-engine/api/templates/685e3777-1198-4503-a9c1-5b57b182885a/nics" rel="nics"/> <link href="/ovirt-engine/api/templates/685e3777-1198-4503-a9c1-5b57b182885a/diskattachments" rel="diskattachments"/> <link href="/ovirt-engine/api/templates/685e3777-1198-4503-a9c1-5b57b182885a/graphicsconsoles" rel="graphicsconsoles"/> <bios>...</bios> <cpu>...</cpu> <cpu_shares>0</cpu_shares> <creation_time>2017-03-15T19:05:28.341+02:00</creation_time> <delete_protected>false</delete_protected> <display>...</display> <high_availability> <enabled>false</enabled> <priority>1</priority> </high_availability> <io> Created attachment 1266557 [details]
sanlock log
Added (In reply to Nir Soffer from comment #6) > Please attach the output of: > > readlink /proc/<sanlock pid>/fd/* > sanlock client status > > When you get the error about too many open files. > > And /var/log/sanlock.log. added fs and sanlock log once the error occurred. sanlock client status doesn't print any output. Just a comment, after investigation with Raz it does not related to HA. (In reply to guy chen from comment #10) > added fs and sanlock log once the error occurred. > sanlock client status doesn't print any output. In sanlock_fs, we see exactly 1024 files - makes sense since this is the default limit, and sanlock probably does not modify the default. $ wc -l sanlock_fs 1024 sanlock_fs If we looks at the values, we see lot of sockets: $ sort sanlock_fs | grep socket | wc -l 1012 $ sort sanlock_fs | grep socket | head -10 socket:[110826] socket:[1282058] socket:[1282113] socket:[1282114] socket:[1282190] socket:[1282225] socket:[1282246] socket:[1282247] socket:[1282248] socket:[1282269] These sockets are a client connection to sanlock, this may be because this vm has a lease, or there other client connection. We need the output of "sanlock client status" to understand this, but we have no output. I guess that when sanlock is out of file descriptors, sanlock client commands will fail to connect to sanlock, so it is expected to fail. Looking in sanlock log and sanlock log dump we don't see traces for unexpected leases: $ egrep 's[0-9]:r[0-9]' sanlock.log 2017-03-20 21:08:21+0200 11419 [1695]: s4:r1 resource 69f885a3-3eaf-4971-8a94-fa9f127fbc82:179aceeb-6889-457a-a754-6c576c3fd987:/dev/69f885a3-3eaf-4971-8a94-fa9f127fbc82/leases:976224256 for 94,110,12497 2017-03-20 21:21:44+0200 12222 [15618]: s6:r2 resource f8f4650b-abf2-4322-8aa7-197e3be6059b:179aceeb-6889-457a-a754-6c576c3fd987:/dev/f8f4650b-abf2-4322-8aa7-197e3be6059b/leases:113246208 for 94,110,12497 2017-03-20 21:21:46+0200 12224 [15510]: s6:r3 resource f8f4650b-abf2-4322-8aa7-197e3be6059b:fa76eda2-9afa-41af-b1ee-02d9dbaeb808:/dev/f8f4650b-abf2-4322-8aa7-197e3be6059b/leases:114294784 for 94,110,12497 2017-03-20 21:57:11+0200 14349 [1695]: s4:r4 resource 69f885a3-3eaf-4971-8a94-fa9f127fbc82:edf38333-dd12-4900-bfb6-9a4f1e030cfd:/dev/69f885a3-3eaf-4971-8a94-fa9f127fbc82/leases:975175680 for 94,110,12497 2017-03-21 15:05:26+0200 76044 [1695]: s4:r5 resource 69f885a3-3eaf-4971-8a94-fa9f127fbc82:SDM:/dev/69f885a3-3eaf-4971-8a94-fa9f127fbc82/leases:1048576 for 94,110,12497 2017-03-22 08:37:06+0200 244 [1640]: s1:r1 resource 69f885a3-3eaf-4971-8a94-fa9f127fbc82:SDM:/dev/69f885a3-3eaf-4971-8a94-fa9f127fbc82/leases:1048576 for 2,16,12078 2017-03-22 12:40:06+0200 249 [1677]: s1:r1 resource 69f885a3-3eaf-4971-8a94-fa9f127fbc82:SDM:/dev/69f885a3-3eaf-4971-8a94-fa9f127fbc82/leases:1048576 for 2,17,13058 2017-03-22 14:34:31+0200 249 [1679]: s1:r1 resource 69f885a3-3eaf-4971-8a94-fa9f127fbc82:SDM:/dev/69f885a3-3eaf-4971-8a94-fa9f127fbc82/leases:1048576 for 2,16,13022 2017-03-23 16:06:54+0200 286 [1682]: s1:r1 resource 69f885a3-3eaf-4971-8a94-fa9f127fbc82:SDM:/dev/69f885a3-3eaf-4971-8a94-fa9f127fbc82/leases:1048576 for 2,16,12569 2017-03-26 12:30:35+0300 242907 [1682]: s1:r2 resource 69f885a3-3eaf-4971-8a94-fa9f127fbc82:SDM:/dev/69f885a3-3eaf-4971-8a94-fa9f127fbc82/leases:1048576 for 2,16,58689 2017-03-26 17:28:58+0300 261 [16088]: s1:r1 resource 69f885a3-3eaf-4971-8a94-fa9f127fbc82:SDM:/dev/69f885a3-3eaf-4971-8a94-fa9f127fbc82/leases:1048576 for 2,14,13273 But we see huge amount of renewal failures: $ grep 'renewal error' sanlock.log | wc -l 1798 Please shutdown some vms until you get output from sanlock client status, and then attach the output. If shutting down vms does not give the needed output, please reboot the host and collect the output from "sanlock client status" while adding vms, so we have the output just before the point when sanlock is out of open files. Another way is to increase sanlock open files limit and then get the output from "sanlock client status". David, can you take a look and explain why sanlock is out of open file descriptors? We don't use vm or volume leases in the ~1000 vms running on this host. In sanlock client status, we see many processes connected to sanlock: $ egrep "^p\b.+_" sanlock_client_status.log | head -10 p 19530 2_HDD_with_NIC-155 p 19600 2_HDD_with_NIC-125 p 19687 2_HDD_with_NIC-174 p 19876 2_HDD_with_NIC-178 p 20164 2_HDD_with_NIC-106 p 20324 2_HDD_with_NIC-223 p 20500 2_HDD_with_NIC-264 p 20595 2_HDD_with_NIC-145 p 20751 2_HDD_with_NIC-116 p 20906 2_HDD_with_NIC-233 $ egrep "^p\b.+_" sanlock_client_status.log | wc -l 999 But we don't use any leases, this is very strange, maybe this is libvirt bug. We need to debug this environment in this state. Looking on another host with 240 vms we see the same picture - every qemu-kvm
process has an open unix socket to /var/run/sanlock/sanlock.sock - although
no leases are used.
[root@buri01 ~]# lsof -U /var/run/sanlock/sanlock.sock | awk '/qemu-kvm/ {print $2}' | uniq | wc -l
239
[root@buri01 ~]# ps aux | grep qemu-kvm | grep -v grep | wc -l
239
# lsof -U /var/run/sanlock/sanlock.sock | awk '/qemu-kvm/ {print $2}' | sort | head -1
10001
# lsof -p 10001 | egrep unix.+socket
qemu-kvm 10001 qemu 3u unix 0xffff8817502c9000 0t0 645039 socket
qemu-kvm 10001 qemu 17u unix 0xffff8817502cbc00 0t0 645049 socket
qemu-kvm 10001 qemu 18u unix 0xffff8817502cd800 0t0 645050 socket
qemu-kvm 10001 qemu 30u unix 0xffff8813b4fe2c00 0t0 642408 socket
qemu-kvm 10001 qemu 31u unix 0xffff8813b4fe6000 0t0 642409 socket
Peter, maybe libvirt is opening a sanlock connection and letting qemu inherit the
file descriptor by mistake, when no lease is used?
I discussed this with Peter on irc, the sanlock connection per vm is probably required by the lock manager driver to allow hot plug a lease. It seems that sanlock needs to increase the default open file limit to support this scale. Waiting for Daniel confirmation on this. In general only HA VMs should be in this list since you can't hot plug a lease to a non HA VM. This can wait for 4.1.2. Lowering severity and priority since the scale is above the tested supported limits. Guy, for completing scale testing, please increase sanlock open file limit to 2048 in sanlock unit file. Changed to 2048 in sanlock unit file. No error if passing 1000 VMS. Fixed upstream in sanlock: https://pagure.io/sanlock/c/64502dfee897d1690e6fa2a18fa0e5587bb7bbd3?branch=master (adding back needinfo removed in comment 21) Daniel, can you confirm that having an open sanlock connection in qemu for every vm even when a vm does not have a lease is expected? See comment 16 and comment 17. Libvirtd opens the sanlock connection when starting the VM and then passes the FD to QEMU. So you'll need to allow 1 FD per VM you want running for sanlock. Moving to sanlock since this is sanlock configuration issue, and it was already fixed in upstream. This is already in 7.4, but I missed this bz. We could just close this bug as fixed in the next release, or we could get flags/acks and attach it to the errata (I'd suggest the former.) (In reply to David Teigland from comment #28) > This is already in 7.4, but I missed this bz. We could just close this bug > as fixed in the next release, or we could get flags/acks and attach it to > the errata (I'd suggest the former.) With no customer ticket attached, I don't think it's worth the hassle adding it to the erratum. I'd like to have an exact version, though, so we can require it in VDSM. The patch is in sanlock-3.5.0-1 |