Description of problem:

In the host we see:

~~~
Feb 11 00:19:03 hostname.example.com libvirtd[4742]: Failed to connect socket to '/run/libvirt/virtlogd-sock': Permission denied
~~~

The file is present and has the correct permissions:

~~~
$ ls -laZ /run/libvirt/virtlogd-sock
srw-------. 1 root root system_u:object_r:virtlogd_var_run_t:s0 0 Dec 1 07:44 /run/libvirt/virtlogd-sock
~~~

Just before this occurs, libvirtd reports "Too many open files":

~~~
Feb 11 00:19:03 hostname.example.com libvirtd[4742]: Unable to create pipes: Too many open files
Feb 11 00:19:03 hostname.example.com libvirtd[4742]: Unable to create pipes: Too many open files
Feb 11 00:19:03 hostname.example.com libvirtd[4742]: unable to clear socket security context 'system_u:system_r:svirt_t:s0:c434,c889': Too many open files
Feb 11 00:19:03 hostname.example.com libvirtd[4742]: Unable to create UNIX socket: Too many open files
Feb 11 00:19:03 hostname.example.com libvirtd[4742]: unable to clear socket security context 'system_u:system_r:svirt_t:s0:c434,c889': Too many open files
Feb 11 00:19:03 hostname.example.com libvirtd[4742]: Unable to create pipes: Too many open files
Feb 11 00:19:03 hostname.example.com libvirtd[4742]: Unable to create pipes: Too many open files
Feb 11 00:19:03 hostname.example.com libvirtd[4742]: cannot open directory '/var/lib/libvirt/qemu/domain-5010-fsc-q83exxgb-windows': Too many open files
Feb 11 00:19:03 hostname.example.com libvirtd[4742]: Unable to create pipes: Too many open files
Feb 11 00:19:03 hostname.example.com libvirtd[4742]: Hook script execution failed: Unable to create pipes: Too many open files
Feb 11 00:19:03 hostname.example.com libvirtd[4742]: Unable to create pipes: Too many open files
Feb 11 00:19:03 hostname.example.com libvirtd[4742]: Hook script execution failed: Unable to create pipes: Too many open files
~~~

If we check the lsof output collected in the sos report:

~~~
$ grep ^libvirtd lsof | wc -l
8270
~~~

It is not clear what these STREAM entries are exactly, but they make up the majority:

~~~
$ grep ^libvirtd lsof | grep STREAM | wc -l
8101
~~~

Most look something like:

~~~
libvirtd 4742 0 1u unix 0xffff8d1f9626f080 0t0 652 type=STREAM
libvirtd 4742 0 2u unix 0xffff8d1f9626f080 0t0 652 type=STREAM
~~~

In the libvirtd.service file:

~~~
# At least 1 FD per guest, often 2 (eg qemu monitor + qemu agent).
# eg if we want to support 4096 guests, we'll typically need 8192 FDs
# If changing this, also consider virtlogd.service & virtlockd.service
# limits which are also related to number of guests
LimitNOFILE=8192
~~~

Currently there are no guests running:

~~~
$ cat sos_commands/virsh/virsh_-r_list_--all
 Id   Name   State
--------------------
~~~

On another host with similar symptoms, we can see the VM ID counter is above 4000, although there are only 9 VMs actually running:

~~~
$ cat sos_commands/virsh/virsh_-r_list_--all
 Id     Name                 State
----------------------------------------
 1      redacted-name-here   running
 1679   redacted-name-here   running
 1904   redacted-name-here   running
 3109   redacted-name-here   running
 3407   redacted-name-here   running
 3455   redacted-name-here   running
 3459   redacted-name-here   running
 3502   redacted-name-here   running
 4123   redacted-name-here   running
~~~

It *seems* like VMs that have been shut down do not release libvirtd's open files, and that eventually causes libvirtd to exceed its limit, after which we see a bunch of failures. It's also possible I am looking at this the wrong way, but the virtlogd socket issue appears to be related to the above. Rebooting the host resolved the issue, but it will likely recur as time goes on.

Version-Release number of selected component (if applicable):
RHV-H 4.4.9

How reproducible:
Only in 1 customer case as of now.

Actual results:
virtlogd fails to connect to its socket, and libvirtd is holding open a lot of files that it should not.

Expected results:
virtlogd works, and libvirtd doesn't hold lots of unneeded open files.

Additional info:
We could increase libvirtd's LimitNOFILE, but it will likely hit that limit eventually as well if the files continue to stay open.
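The triage from the description can be wrapped in a small helper for any saved lsof dump in a sos report. This is only a sketch: the function names are mine, and the dump path is whatever the sos report provides.

```shell
# Sketch: given a saved `lsof` dump (e.g. from a sos report), count how many
# entries belong to libvirtd and how many of those are unix STREAM sockets,
# which are the suspected leak. The dump file path is an assumption.
count_libvirtd_fds() {
    grep -c '^libvirtd' "$1"
}

count_libvirtd_streams() {
    grep '^libvirtd' "$1" | grep -c 'type=STREAM'
}
```

With the dump from this case, `count_libvirtd_fds` would report 8270 and `count_libvirtd_streams` 8101, i.e. the STREAM sockets dominate.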
Milan, can you please check this?
A similar issue has been mentioned on the libvirt-users mailing list recently:
https://listman.redhat.com/archives/libvirt-users/2022-February/msg00044.html

There is a related bug: BZ 2007168

And a fix:
https://gitlab.com/libvirt/libvirt/-/commit/5de203f8795d96229d2663e9ea1a24fba5db38fc

If we had a way to reproduce the problem, we could check whether the fix above helps. Maybe we could set LimitNOFILE to a low value, start and stop VMs, and see if the error occurs.
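One way to try the low-limit reproduction without editing the shipped unit file is a systemd drop-in. A minimal sketch, assuming the standard drop-in path for libvirtd.service; the helper name is mine:

```shell
# Write a systemd drop-in that lowers libvirtd's FD limit for the
# reproduction attempt. $1 = drop-in directory, $2 = FD limit.
write_nofile_dropin() {
    mkdir -p "$1"
    printf '[Service]\nLimitNOFILE=%s\n' "$2" > "$1/nofile.conf"
}

# On a real host this would be (assumption: default systemd paths):
#   write_nofile_dropin /etc/systemd/system/libvirtd.service.d 54
#   systemctl daemon-reload
#   systemctl restart libvirtd
```

Removing the drop-in directory and reloading systemd restores the original LimitNOFILE=8192 from the unit file.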
I can reproduce a file descriptor leak by setting LimitNOFILE=54 and starting and restarting several VMs quickly. After a while, no VM can be started anymore, with the error "Hook script execution failed: Unable to create pipes: Too many open files". This happens with both libvirt-daemon-7.6.0-6.el8s.x86_64 and 8.0.0-1.module+el8.6.0+13896+a8fa8f67.x86_64.

Michal, any idea about this?
(In reply to Milan Zamazal from comment #3)
> A similar issue has been mentioned on libvirt-users mailing list recently:
> https://listman.redhat.com/archives/libvirt-users/2022-February/msg00044.html
> There is a related bug: BZ 2007168
> And a fix:

This ^^^

> https://gitlab.com/libvirt/libvirt/-/commit/5de203f8795d96229d2663e9ea1a24fba5db38fc

and this ^^^ are two separate problems. In both, the problem lies in glib, and libvirt is merely working around broken glib. I admit that my initial gut feeling when reading the libvirt-users e-mail was that it's the same issue, but it turned out the ML issue is the bug you linked. Also, the ML issue is reported against CentOS 8, where CentOS is still behind RHEL (which is fixed).

(In reply to Milan Zamazal from comment #4)
> I can reproduce a file descriptor leak by setting LimitNOFILE=54 and
> starting and restarting several VMs quickly. After a while, no VM can be
> started anymore, with the error "Hook script execution failed: Unable to
> create pipes: Too many open files". This happens with both libvirt
> libvirt-daemon-7.6.0-6.el8s.x86_64 and
> 8.0.0-1.module+el8.6.0+13896+a8fa8f67.x86_64.
>
> Michal, any idea about this?

Just to be perfectly clear, you have to allow libvirt enough FDs, otherwise it won't be able to spawn any machines. Therefore, setting too tight a limit on NOFILE and seeing errors when starting VMs is expected behaviour. Each VM requires at least one FD for the QEMU monitor, another one for the guest agent, and possibly other FDs (depending on configuration). Not to mention, libvirt opens (and closes) various files/pipes/sockets when starting a VM. Therefore, in order to be sure that it is an FD leak you're seeing, you have to check the number of open FDs *BEFORE* any VM is started, then start VM(s), shut them down, disconnect all the clients, and check the number of FDs again. At this point the numbers should be equal.
> Therefore, in order to be sure that it is an FD leak you're seeing you have to check the number of opened FDs *BEFORE* any VM is started, then start VM(s), shut them down, disconnect all the clients and check the number of FDs again. At this point, the numbers should be equal.

I'm checking with `lsof -p PID', where PID is the one reported by `systemctl status libvirtd'. Starting a VM without a guest OS and powering it off leaves one more file descriptor open per VM run. After restarting vdsmd and supervdsmd, when no VMs are running, the number of open file descriptors is the same as before the restart. The leaked file descriptors (after 4 VM runs) apparently are:

~~~
libvirtd 661466 root 41u unix 0xffff990fbf70da00 0t0 29580486 type=STREAM
libvirtd 661466 root 42u unix 0xffff991030b7cc80 0t0 29583488 type=STREAM
libvirtd 661466 root 43u unix 0xffff991001def500 0t0 29586122 type=STREAM
libvirtd 661466 root 44u unix 0xffff991030b7ba80 0t0 29602433 type=STREAM
~~~

Versions:

libvirt-daemon-8.0.0-1.module+el8.6.0+13896+a8fa8f67.x86_64
glib2-2.56.4-158.el8.x86_64 (the same also happens with glib2-2.56.4-156.el8.x86_64)

Is this a known leak, or is there anything else I should check?
(In reply to Milan Zamazal from comment #6)
> Checking with `lsof -p PID' where PID is the one reported by `systemctl
> status libvirtd'. Starting a VM without a guest OS and powering it off makes
> one more open file descriptor left per each VM run. After restarting vdsmd
> and supervdsmd, when no VMs are running, the number of open file descriptors
> is the same as before the restart. The leaked file descriptors apparently
> are (after 4 VM runs):
>
> libvirtd 661466 root 41u unix 0xffff990fbf70da00 0t0 29580486 type=STREAM
> libvirtd 661466 root 42u unix 0xffff991030b7cc80 0t0 29583488 type=STREAM
> libvirtd 661466 root 43u unix 0xffff991001def500 0t0 29586122 type=STREAM
> libvirtd 661466 root 44u unix 0xffff991030b7ba80 0t0 29602433 type=STREAM

This smells like bug 2045879. Basically, glib screwed up a backport, which broke the way we work around another of its bugs, so we needed a workaround for the workaround.

> libvirt-daemon-8.0.0-1.module+el8.6.0+13896+a8fa8f67.x86_64
> glib2-2.56.4-158.el8.x86_64 (the same also with glib2-2.56.4-156.el8.x86_64)

And this assures me that you are seeing that bug. It should be fixed with libvirt-8.0.0-3.module+el8.6.0+14098+5bee65f4.

Thanks glib!
Thank you, Michal, for the explanation. Indeed, after upgrading libvirt to libvirt-daemon-kvm-8.0.0-4.module+el8.6.0+14253+42d4e2d6.x86_64, the problem is gone. So this problem will be fixed in the next RHV release, once we switch to RHEL 8.6.
Test version:

RHVH-4.4-20220321.0-RHVH-x86_64-dvd1.iso
libvirt-7.6.0-6.1.module+el8.5.0+14474+b3410d40.x86_64

~~~
# imgbase w
You are on rhvh-4.4.10.3-0.20220321.0+1
# rpm -qa | grep libvirt-7.6.0-6.1.module+el8.5.0+14474+b3410d40
libvirt-7.6.0-6.1.module+el8.5.0+14474+b3410d40.x86_64
~~~

RHVH includes the correct libvirt package, so the bug is fixed; changing bug status to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Virtualization Host security and enhancement update [ovirt-4.4.10] Async #2), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:1053