Bug 659855
Summary: | libvirt crash on src/util/util.c in __virExec | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | ritz <rkhadgar> | ||||
Component: | libvirt | Assignee: | Eric Blake <eblake> | ||||
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 6.0 | CC: | apevec, bsarathy, dallan, deubeulyou, eblake, gren, jyang, vbian, xen-maint, yoyzhang | ||||
Target Milestone: | rc | Keywords: | Patch | ||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | libvirt-0.8.7-1.el6 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 665549 (view as bug list) | Environment: | |||||
Last Closed: | 2011-05-19 13:24:43 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 665549 | ||||||
Attachments: |
|
The proposed patch is wrong. fdset is already type const fd_set*, so taking &fdset would end up using fd_set**. However, the crash is real, so we need to get the correct patch written. Can you give me more of the backtrace, to see if this is a bug in the caller passing an incorrect fdset argument? Then again, there is only one place that sets hook=qemudSecurityHook: qemudStartVMDaemon; but it has already been rewritten upstream to use upstream's new virCommand (therefore, contrary to comment 2, upstream does NOT need a fix). I need to know what 'rpm -q libvirt' shows for you. At any rate, I think I do see an issue in __virExec, whether or not it is root cause of the crash you provided. We're iterating over [3,openmax), but if openmax > FD_SETSIZE, then this can result in undefined behavior as it accesses outside the bounds of *fdset. Here's the upstream patch for the real bug identified in comment 4: https://www.redhat.com/archives/libvir-list/2010-December/msg00242.html On Linux, FD_SETSIZE is 1024, and sysconf(_SC_OPEN_MAX) is generally 1024 (I didn't test it could get larger, but if we have that many fd's open, we've probably got other issues with an fd leak), so RHEL is probably okay without backporting this patch. But other platforms definitely need the patch - for example, on cygwin, FD_SETSIZE is 64, but _SC_OPEN_MAX starts at 32, and can grow as high as 3200, and FD_ISSET is an open-coded macro that will happily dereference beyond array bounds, and can result in leaking fds larger than 32 to the child if reading beyond bounds happened to hit a 1 bit outside of the fdset. Was the original report an actual crash, or just a backtrace of a working program where you mis-read the code? In other words, is the upstream patch sufficient to call this issue resolved upstream? FYI, this does bite me on a debian "squeeze" host, where sysconf(_SC_OPEN_MAX) returns 65535. The end result is indeed a segfault in FD_ISSET (line number messed up due to local changes of mine): ==24130== Process terminating with default action of signal 11 (SIGSEGV): dumping core ==24130== Bad permissions for mapped region at address 0xC34F988 ==24130== at 0x506BAFE: virExecDaemonize (util.c:823) ==24130== by 0x44DA36: qemudStartVMDaemon (qemu_driver.c:4144) ==24130== by 0x4507E2: qemudDomainCreate (qemu_driver.c:4882) ==24130== by 0x50A9A8A: virDomainCreateXML (libvirt.c:1999) ==24130== by 0x429BC7: remoteDispatchDomainCreateXml (remote.c:1273) ==24130== by 0x42C546: remoteDispatchClientRequest (dispatch.c:530) ==24130== by 0x41CA57: qemudWorker (libvirtd.c:1586) ==24130== by 0x710C8B9: start_thread (pthread_create.c:300) ==24130== by 0x75F502C: clone (clone.S:112) In all other __virExec calls I looked at while debugging this, subprocesses closed their 65535 file descriptors successfully, as they have a null keepfd. Pretty crazy in itself, calling close that much, but anyway :p I'm posting this just in case it's useful to know that it's an actual problem (instead of something that might never trigger) and a linux one too (not just other platforms). I'll add a paranoid remark: the proposed patch ends up not closing fds > FD_SETSIZE, when technically they are not in the "keep" set and for that reason should be closed. Thanks for the additional information. RHEL is affected after all. For example: $ cat foo.c #include <unistd.h> #include <stdio.h> int main(void) { printf("%ld\n", sysconf(_SC_OPEN_MAX)); return 0; } $ ./foo 1024 $ ulimit -n 2048 $ ./foo 2048 So even though the default RHEL ulimit settings will prevent the problem, RHEL is not immune, and this patch MUST be backported to RHEL. You are also correct that the logic is wrong in the original upstream patch. Updated patches will be posted shortly. Here's an alternate way to expose the bug that does not trigger the crash (the crash depends on the sizing of your stack and whether your ulimit -n is large enough for the out-of-bounds reference to go beyond stack bounds). This formula depends on the stack having '1' bits, so you won't necessarily have the same fds leaked to the qemu process, but the presence or absence of the leak with a smaller ulimit is directly correlated to the presence of a crash with a larger ulimit. Using two terminals both logged in as root (designated 1# and 2# in their prompt), and interspersed with explanatory comments: 1# # set up our shell to have high-valued fds to leak into children 1# ulimit -n 2048 1# for i in $(seq 1700 1800); do eval exec $i\>/dev/null; done 1# # the libvirt daemon probably has sane limits, so stop it 1# service libvirtd stop 1# # and start a new instance in the foreground with our extra fds 1# libvirtd 2# # make sure libvirtd really has high-valued fds... 2# ps -ea | grep libvirt 25568 pts/0 00:00:00 libvirtd 2# ls /proc/25568/fd 0 17 1708 1717 1726 1735 1744 1753 1762 1771 1780 1789 1798 6 1 1700 1709 1718 1727 1736 1745 1754 1763 1772 1781 1790 1799 7 10 1701 1710 1719 1728 1737 1746 1755 1764 1773 1782 1791 18 8 11 1702 1711 1720 1729 1738 1747 1756 1765 1774 1783 1792 1800 9 12 1703 1712 1721 1730 1739 1748 1757 1766 1775 1784 1793 19 13 1704 1713 1722 1731 1740 1749 1758 1767 1776 1785 1794 2 14 1705 1714 1723 1732 1741 1750 1759 1768 1777 1786 1795 3 15 1706 1715 1724 1733 1742 1751 1760 1769 1778 1787 1796 4 16 1707 1716 1725 1734 1743 1752 1761 1770 1779 1788 1797 5 2# # start a qemu VM; here, I have a VM named rhel6 2# virsh start rhel6 2# # see what leaked into the qemu child 2# ps -ea | grep qemu 25709 ? 00:00:00 qemu-kvm 2# ls /proc/25709/fd 0 12 16 1702 1737 1743 1760 1768 1772 1800 4 8 1 13 17 1703 1738 1754 1762 1769 1773 2 5 9 10 14 1700 1708 1739 1755 1763 1770 1774 22 6 11 15 1701 1733 1741 1756 1766 1771 1799 3 7 2# # buggy libvirtd leaked some, but not all, of 1700-1800 to child 2# # a fixed libvirtd should not have any fds >= 1700 open Built into libvirt-0.8.7-1.el6 checked with : libvirt-0.8.1-27.el6.x86_64.rpm --- reproducer pkg libvirt-0.8.7-6.el6.x86_64.rpm --- verification pkg [reproducer] 1# ulimit -n 2048 1# for i in $(seq 1700 1800); do eval exec $i\>/dev/null; done 1# service libvirtd stop Stopping libvirtd daemon: [ OK ] 1# libvirtd 2# ps -ea|grep libvirt 30913 pts/1 00:00:00 libvirtd 2# ls /proc/30913/fd 0 17 1708 1717 1726 1735 1744 1753 1762 1771 1780 1789 1798 8 1 1700 1709 1718 1727 1736 1745 1754 1763 1772 1781 1790 1799 9 10 1701 1710 1719 1728 1737 1746 1755 1764 1773 1782 1791 1800 11 1702 1711 1720 1729 1738 1747 1756 1765 1774 1783 1792 2 12 1703 1712 1721 1730 1739 1748 1757 1766 1775 1784 1793 3 13 1704 1713 1722 1731 1740 1749 1758 1767 1776 1785 1794 4 14 1705 1714 1723 1732 1741 1750 1759 1768 1777 1786 1795 5 15 1706 1715 1724 1733 1742 1751 1760 1769 1778 1787 1796 6 16 1707 1716 1725 1734 1743 1752 1761 1770 1779 1788 1797 7 2# virsh start rhel5.5 Domain rhel5.5 started 2# ps -ea|grep qemu 30980 ? 00:00:00 qemu-kvm 2# ls /proc/30980/fd 0 11 14 17 1702 1735 1741 1758 1763 1769 1772 1798 23 5 8 1 12 15 1700 1703 1737 1754 1759 1764 1770 1773 18 3 6 9 10 13 16 1701 1733 1740 1757 1761 1768 1771 1774 2 4 7 leak has fds >= 1700 open [verfication] 1# ulimit -n 2048 1# for i in $(seq 1700 1800); do eval exec $i\>/dev/null; done 1# service libvirtd stop Stopping libvirtd daemon: [ OK ] 1# libvirtd 2# ps -ea|grep libvirt 31107 pts/1 00:00:00 libvirtd 2# ls /proc/31107/fd 0 17 1708 1717 1726 1735 1744 1753 1762 1771 1780 1789 1798 8 1 1700 1709 1718 1727 1736 1745 1754 1763 1772 1781 1790 1799 9 10 1701 1710 1719 1728 1737 1746 1755 1764 1773 1782 1791 1800 11 1702 1711 1720 1729 1738 1747 1756 1765 1774 1783 1792 2 12 1703 1712 1721 1730 1739 1748 1757 1766 1775 1784 1793 3 13 1704 1713 1722 1731 1740 1749 1758 1767 1776 1785 1794 4 14 1705 1714 1723 1732 1741 1750 1759 1768 1777 1786 1795 5 15 1706 1715 1724 1733 1742 1751 1760 1769 1778 1787 1796 6 16 1707 1716 1725 1734 1743 1752 1761 1770 1779 1788 1797 7 2# virsh start rhel5.5 Domain rhel5.5 started 2# ps -ea|grep qemu 31184 ? 00:00:00 qemu-kvm 2# ls /proc/31184/fd 0 1 10 11 12 13 14 15 16 17 18 2 23 3 4 5 6 7 8 9 Doesn't have libvirt leak to the qemu child . So set bug status to VERIFIED An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0596.html |
Created attachment 464654 [details] fix typo Description of problem: Crash in src/util/util.c in __virExec. patch attached bt follows (gdb) fram 0 #0 0x0000000000438020 in __virExec (argv=0x7fcc5c005900, envp=0x7fcc5c001a20, keepfd=0x7fcc6d189350, retpid=0x7fcc6d189448, infd=-1, outfd=0x7fcc6d189444, errfd=0x7fcc6d189444, flags=7, hook=0x43f5f0 <qemudSecurityHook>, data=0x7fcc6d1893d0, pidfile=0x7fcc5c001010 "/var/run/libvirt/qemu/Windows2008.pid") at util/util.c:570 570 !FD_ISSET(i, keepfd))) (gdb) bt #0 0x0000000000438020 in __virExec (argv=0x7fcc5c005900, envp=0x7fcc5c001a20, keepfd=0x7fcc6d189350, retpid=0x7fcc6d189448, infd=-1, outfd=0x7fcc6d189444, errfd=0x7fcc6d189444, flags=7, hook=0x43f5f0 <qemudSecurityHook>, data=0x7fcc6d1893d0, pidfile=0x7fcc5c001010 "/var/run/libvirt/qemu/Windows2008.pid") at util/util.c:570 #1 0x0000000000438c16 in virExecWithHook (argv=0x7fcc5c005900, envp=0x7fcc5c001a20, keepfd=0x7fcc6d189350, retpid=0x7fcc6d189448, infd=-1, outfd=0x7fcc6d189444, errfd=0x7fcc6d189444, flags=7, hook=0x43f5f0 <qemudSecurityHook>, data=0x7fcc6d1893d0, pidfile=0x7fcc5c001010 "/var/run/libvirt/qemu/Windows2008.pid") at util/util.c:712 #2 0x0000000000438d05 in virExecDaemonize (argv=<value optimized out>, envp=<value optimized out>, keepfd=<value optimized out>, retpid=0x7fcc6d189448, infd=<value optimized out>, outfd=<v