Bug 2057048
Summary: | consume libvirt fix for: Failed to connect socket to '/run/libvirt/virtlogd-sock' - possibly caused by Too many open files from libvirtd | |
---|---|---|---
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | amashah
Component: | redhat-virtualization-host | Assignee: | Sandro Bonazzola <sbonazzo>
Status: | CLOSED ERRATA | QA Contact: | cshao <cshao>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 4.4.9 | CC: | abpatil, ahadas, emarcus, mavital, mgokhool, michal.skrivanek, mkalinin, mprivozn, mzamazal, sfroemer, usurse
Target Milestone: | ovirt-4.4.10-3 | Keywords: | Rebase, Tracking, ZStream
Target Release: | 4.4.10 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | libvirt-7.6.0-6.1.module+el8.5.0+14474+b3410d40 | Doc Type: | Rebase: Bug Fixes Only
Doc Text: | Rebase package(s) to version: libvirt-7.6.0-6.1.module+el8.5.0+14474+b3410d40. Highlights and important bug fixes: consume libvirt fix for failure to connect socket to '/run/libvirt/virtlogd-sock' - possibly caused by too many open files from libvirtd. | |
Story Points: | --- | |
Clone Of: | | Environment: |
Last Closed: | 2022-03-24 13:30:59 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | Node | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 2045879, 2060945, 2063286 | |
Bug Blocks: | | |
Description
amashah
2022-02-22 15:57:40 UTC
Milan, can you please check this?

A similar issue has been mentioned on the libvirt-users mailing list recently:
https://listman.redhat.com/archives/libvirt-users/2022-February/msg00044.html

There is a related bug: BZ 2007168

And a fix:
https://gitlab.com/libvirt/libvirt/-/commit/5de203f8795d96229d2663e9ea1a24fba5db38fc

If we had a way to reproduce the problem, we could check whether the fix above helps. Maybe we could try to set LimitNOFILE to a low value, start and stop VMs, and see if the error occurs.

I can reproduce a file descriptor leak by setting LimitNOFILE=54 and starting and restarting several VMs quickly. After a while, no VM can be started anymore, with the error "Hook script execution failed: Unable to create pipes: Too many open files". This happens with both libvirt-daemon-7.6.0-6.el8s.x86_64 and 8.0.0-1.module+el8.6.0+13896+a8fa8f67.x86_64.

Michal, any idea about this?

(In reply to Milan Zamazal from comment #3)
> A similar issue has been mentioned on libvirt-users mailing list recently:
> https://listman.redhat.com/archives/libvirt-users/2022-February/msg00044.html
> There is a related bug: BZ 2007168
> And a fix:

This ^^^

> https://gitlab.com/libvirt/libvirt/-/commit/5de203f8795d96229d2663e9ea1a24fba5db38fc

and this ^^^ are two separate problems. In both cases the problem lies in glib, and libvirt is merely working around broken glib. I admit that my initial gut feeling when reading the libvirt-users e-mail was that it's the same issue, but it turned out the ML issue is the bug you've linked. Also, the ML issue is reported against CentOS-8, where CentOS is still behind RHEL (which is fixed).

(In reply to Milan Zamazal from comment #4)
> I can reproduce a file descriptor leak by setting LimitNOFILE=54 and
> starting and restarting several VMs quickly. After a while, no VM can be
> started anymore, with the error "Hook script execution failed: Unable to
> create pipes: Too many open files". This happens with both libvirt
> libvirt-daemon-7.6.0-6.el8s.x86_64 and
> 8.0.0-1.module+el8.6.0+13896+a8fa8f67.x86_64.
>
> Michal, any idea about this?

Just to be perfectly clear, you have to allow libvirt to have FDs, otherwise it won't be able to spawn any machines. Therefore, setting a too-tight NOFILE limit and seeing errors when starting VMs is expected behaviour. Each VM requires at least one FD for the QEMU monitor, another one for the guest agent, and possibly other FDs (depending on config). Not to mention, libvirt opens (and closes) various files/pipes/sockets when starting a VM. Therefore, in order to be sure that it is an FD leak you're seeing, you have to check the number of opened FDs *BEFORE* any VM is started, then start VM(s), shut them down, disconnect all the clients, and check the number of FDs again. At this point, the numbers should be equal.
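For illustration, a minimal sketch of that check, assuming libvirtd runs as the monolithic systemd service on the host. The drop-in file name "nofile.conf", the VM name "testvm", and the use of virsh (instead of starting VMs through the RHV engine, as in the report) are illustrative assumptions, not taken from this bug; LimitNOFILE=54 is the value mentioned above.

# Optional: lower libvirtd's FD limit via a systemd drop-in so the leak shows up quickly.
mkdir -p /etc/systemd/system/libvirtd.service.d
printf '[Service]\nLimitNOFILE=54\n' > /etc/systemd/system/libvirtd.service.d/nofile.conf
systemctl daemon-reload
systemctl restart libvirtd

# Count libvirtd's open FDs before any VM is started.
pid=$(systemctl show --property MainPID --value libvirtd)
before=$(ls /proc/"$pid"/fd | wc -l)

# Start and stop a VM a few times ("testvm" is a placeholder; destroy is enough
# for a VM without a guest OS).
for i in 1 2 3 4; do
    virsh start testvm
    sleep 5
    virsh destroy testvm
done

# With all VMs down and all clients disconnected, the two counts should match.
after=$(ls /proc/"$pid"/fd | wc -l)
echo "open FDs before: $before  after: $after"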
> Therefore, in order to be sure that it is an FD leak you're seeing, you have to check the number of opened FDs *BEFORE* any VM is started, then start VM(s), shut them down, disconnect all the clients, and check the number of FDs again. At this point, the numbers should be equal.

Checking with `lsof -p PID', where PID is the one reported by `systemctl status libvirtd'. Starting a VM without a guest OS and powering it off leaves one more open file descriptor per VM run. After restarting vdsmd and supervdsmd, when no VMs are running, the number of open file descriptors is the same as before the restart. The leaked file descriptors apparently are (after 4 VM runs):
libvirtd 661466 root 41u unix 0xffff990fbf70da00 0t0 29580486 type=STREAM
libvirtd 661466 root 42u unix 0xffff991030b7cc80 0t0 29583488 type=STREAM
libvirtd 661466 root 43u unix 0xffff991001def500 0t0 29586122 type=STREAM
libvirtd 661466 root 44u unix 0xffff991030b7ba80 0t0 29602433 type=STREAM
libvirt-daemon-8.0.0-1.module+el8.6.0+13896+a8fa8f67.x86_64
glib2-2.56.4-158.el8.x86_64 (the same also with glib2-2.56.4-156.el8.x86_64)
Is it a known or unknown leak or is there anything else I should check?
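For reference, a small sketch of the lsof check described above; the grep pattern simply narrows the output to the anonymous unix STREAM sockets seen in the listing, and nothing here is specific to this bug beyond the libvirtd service name.

# Take libvirtd's main PID from systemd, as with `systemctl status libvirtd'.
pid=$(systemctl show --property MainPID --value libvirtd)

# List (and count) the unix STREAM sockets libvirtd holds open; comparing the
# output before and after a VM start/stop cycle shows which descriptors leaked.
lsof -p "$pid" | grep 'type=STREAM'
lsof -p "$pid" | grep -c 'type=STREAM'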
(In reply to Milan Zamazal from comment #6)
> > Therefore, in order to be sure that it is an FD leak you're seeing you have to check the number of opened FDs *BEFORE* any VM is started, then start VM(s), shut them down, disconnect all the clients and check the number of FDs again. At this point, the numbers should be equal.
>
> Checking with `lsof -p PID' where PID is the one reported by `systemctl
> status libvirtd'. Starting a VM without a guest OS and powering it off leaves
> one more open file descriptor per VM run. After restarting vdsmd
> and supervdsmd, when no VMs are running, the number of open file descriptors
> is the same as before the restart. The leaked file descriptors apparently
> are (after 4 VM runs):
>
> libvirtd 661466 root 41u unix 0xffff990fbf70da00 0t0 29580486 type=STREAM
> libvirtd 661466 root 42u unix 0xffff991030b7cc80 0t0 29583488 type=STREAM
> libvirtd 661466 root 43u unix 0xffff991001def500 0t0 29586122 type=STREAM
> libvirtd 661466 root 44u unix 0xffff991030b7ba80 0t0 29602433 type=STREAM

This smells like bug 2045879. Basically, glib screwed up a backport, which broke the way we work around another of its bugs. So we needed another workaround for the workaround.

> libvirt-daemon-8.0.0-1.module+el8.6.0+13896+a8fa8f67.x86_64
> glib2-2.56.4-158.el8.x86_64 (the same also with glib2-2.56.4-156.el8.x86_64)

And this assures me that you are seeing that bug. Should be fixed with libvirt-8.0.0-3.module+el8.6.0+14098+5bee65f4. Thanks, glib!

Thank you, Michal, for the explanation. Indeed, after upgrading libvirt to libvirt-daemon-kvm-8.0.0-4.module+el8.6.0+14253+42d4e2d6.x86_64, the problem is gone. So this problem will be fixed in the next RHV release, once we switch to RHEL 8.6.

Test version:
RHVH-4.4-20220321.0-RHVH-x86_64-dvd1.iso
libvirt-7.6.0-6.1.module+el8.5.0+14474+b3410d40.x86_64

# imgbase w
You are on rhvh-4.4.10.3-0.20220321.0+1

# rpm -qa | grep libvirt-7.6.0-6.1.module+el8.5.0+14474+b3410d40
libvirt-7.6.0-6.1.module+el8.5.0+14474+b3410d40.x86_64

RHVH includes the correct libvirt package, so the bug is fixed; changing bug status to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Virtualization Host security and enhancement update [ovirt-4.4.10] Async #2), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1053