RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 659855 - libvirt crash on src/util/util.c in __virExec
Summary: libvirt crash on src/util/util.c in __virExec
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.0
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: rc
: ---
Assignee: Eric Blake
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 665549
TreeView+ depends on / blocked
 
Reported: 2010-12-03 21:16 UTC by ritz
Modified: 2018-11-14 16:13 UTC (History)
10 users (show)

Fixed In Version: libvirt-0.8.7-1.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 665549 (view as bug list)
Environment:
Last Closed: 2011-05-19 13:24:43 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
fix typo (411 bytes, patch)
2010-12-03 21:16 UTC, ritz
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0596 0 normal SHIPPED_LIVE libvirt bug fix and enhancement update 2011-05-18 17:56:36 UTC

Description ritz 2010-12-03 21:16:15 UTC
Created attachment 464654 [details]
fix typo

Description of problem:

Crash in src/util/util.c in __virExec. patch attached 

bt follows

(gdb) fram 0
#0  0x0000000000438020 in __virExec (argv=0x7fcc5c005900, envp=0x7fcc5c001a20, keepfd=0x7fcc6d189350, 
    retpid=0x7fcc6d189448, infd=-1, outfd=0x7fcc6d189444, errfd=0x7fcc6d189444, flags=7, 
    hook=0x43f5f0 <qemudSecurityHook>, data=0x7fcc6d1893d0, 
    pidfile=0x7fcc5c001010 "/var/run/libvirt/qemu/Windows2008.pid") at util/util.c:570
570	             !FD_ISSET(i, keepfd)))
(gdb) bt
#0  0x0000000000438020 in __virExec (argv=0x7fcc5c005900, envp=0x7fcc5c001a20, keepfd=0x7fcc6d189350, 
    retpid=0x7fcc6d189448, infd=-1, outfd=0x7fcc6d189444, errfd=0x7fcc6d189444, flags=7, 
    hook=0x43f5f0 <qemudSecurityHook>, data=0x7fcc6d1893d0, 
    pidfile=0x7fcc5c001010 "/var/run/libvirt/qemu/Windows2008.pid") at util/util.c:570
#1  0x0000000000438c16 in virExecWithHook (argv=0x7fcc5c005900, envp=0x7fcc5c001a20, keepfd=0x7fcc6d189350, 
    retpid=0x7fcc6d189448, infd=-1, outfd=0x7fcc6d189444, errfd=0x7fcc6d189444, flags=7, 
    hook=0x43f5f0 <qemudSecurityHook>, data=0x7fcc6d1893d0, 
    pidfile=0x7fcc5c001010 "/var/run/libvirt/qemu/Windows2008.pid") at util/util.c:712
#2  0x0000000000438d05 in virExecDaemonize (argv=<value optimized out>, envp=<value optimized out>, 
    keepfd=<value optimized out>, retpid=0x7fcc6d189448, infd=<value optimized out>, outfd=<v

Comment 2 Eric Blake 2010-12-03 21:35:54 UTC
The proposed patch is wrong.  fdset is already type const fd_set*, so taking &fdset would end up using fd_set**.

However, the crash is real, so we need to get the correct patch written.

Comment 3 Eric Blake 2010-12-03 21:41:03 UTC
Can you give me more of the backtrace, to see if this is a bug in the caller passing an incorrect fdset argument?  Then again, there is only one place that sets hook=qemudSecurityHook: qemudStartVMDaemon; but it has already been rewritten upstream to use upstream's new virCommand (therefore, contrary to comment 2, upstream does NOT need a fix).  I need to know what 'rpm -q libvirt' shows for you.

Comment 4 Eric Blake 2010-12-03 21:47:34 UTC
At any rate, I think I do see an issue in __virExec, whether or not it is root cause of the crash you provided.  We're iterating over [3,openmax), but if openmax > FD_SETSIZE, then this can result in undefined behavior as it accesses outside the bounds of *fdset.

Comment 5 Eric Blake 2010-12-10 21:06:54 UTC
Here's the upstream patch for the real bug identified in comment 4:
https://www.redhat.com/archives/libvir-list/2010-December/msg00242.html

On Linux, FD_SETSIZE is 1024, and sysconf(_SC_OPEN_MAX) is generally 1024 (I didn't test it could get larger, but if we have that many fd's open, we've probably got other issues with an fd leak), so RHEL is probably okay without backporting this patch.

But other platforms definitely need the patch - for example, on cygwin, FD_SETSIZE is 64, but _SC_OPEN_MAX starts at 32, and can grow as high as 3200, and FD_ISSET is an open-coded macro that will happily dereference beyond array bounds, and can result in leaking fds larger than 32 to the child if reading beyond bounds happened to hit a 1 bit outside of the fdset.

Was the original report an actual crash, or just a backtrace of a working program where you mis-read the code?  In other words, is the upstream patch sufficient to call this issue resolved upstream?

Comment 9 Stefan Praszalowicz 2010-12-24 15:15:42 UTC
FYI, this does bite me on a debian "squeeze" host, where sysconf(_SC_OPEN_MAX) returns 65535. 

The end result is indeed a segfault in FD_ISSET (line number messed up due to local changes of mine):
==24130== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==24130==  Bad permissions for mapped region at address 0xC34F988
==24130==    at 0x506BAFE: virExecDaemonize (util.c:823)
==24130==    by 0x44DA36: qemudStartVMDaemon (qemu_driver.c:4144)
==24130==    by 0x4507E2: qemudDomainCreate (qemu_driver.c:4882)
==24130==    by 0x50A9A8A: virDomainCreateXML (libvirt.c:1999)
==24130==    by 0x429BC7: remoteDispatchDomainCreateXml (remote.c:1273)
==24130==    by 0x42C546: remoteDispatchClientRequest (dispatch.c:530)
==24130==    by 0x41CA57: qemudWorker (libvirtd.c:1586)
==24130==    by 0x710C8B9: start_thread (pthread_create.c:300)
==24130==    by 0x75F502C: clone (clone.S:112)

In all other __virExec calls I looked at while debugging this, subprocesses closed their 65535 file descriptors successfully, as they have a null keepfd. Pretty crazy in itself, calling close that much, but anyway :p

I'm posting this just in case it's useful to know that it's an actual problem (instead of something that might never trigger) and a linux one too (not just other platforms).

I'll add a paranoid remark: the proposed patch ends up not closing fds > FD_SETSIZE, when technically they are not in the "keep" set and for that reason should be closed.

Comment 10 Eric Blake 2010-12-24 15:32:14 UTC
Thanks for the additional information.  RHEL is affected after all.  For example:

$ cat foo.c
#include <unistd.h>
#include <stdio.h>
int main(void) {
  printf("%ld\n", sysconf(_SC_OPEN_MAX));
  return 0;
}
$ ./foo
1024
$ ulimit -n 2048
$ ./foo
2048

So even though the default RHEL ulimit settings will prevent the problem, RHEL is not immune, and this patch MUST be backported to RHEL.

You are also correct that the logic is wrong in the original upstream patch.  Updated patches will be posted shortly.

Comment 13 Eric Blake 2011-01-05 15:19:15 UTC
Here's an alternate way to expose the bug that does not trigger the crash (the crash depends on the sizing of your stack and whether your ulimit -n is large enough for the out-of-bounds reference to go beyond stack bounds).  This formula depends on the stack having '1' bits, so you won't necessarily have the same fds leaked to the qemu process, but the presence or absence of the leak with a smaller ulimit is directly correlated to the presence of a crash with a larger ulimit.

Using two terminals both logged in as root (designated 1# and 2# in their prompt), and interspersed with explanatory comments:

1# # set up our shell to have high-valued fds to leak into children
1# ulimit -n 2048
1# for i in $(seq 1700 1800); do eval exec $i\>/dev/null; done
1# # the libvirt daemon probably has sane limits, so stop it
1# service libvirtd stop
1# # and start a new instance in the foreground with our extra fds
1# libvirtd

2# # make sure libvirtd really has high-valued fds...
2# ps -ea | grep libvirt
25568 pts/0    00:00:00 libvirtd
2# ls /proc/25568/fd
0   17    1708  1717  1726  1735  1744  1753  1762  1771  1780  1789  1798  6
1   1700  1709  1718  1727  1736  1745  1754  1763  1772  1781  1790  1799  7
10  1701  1710  1719  1728  1737  1746  1755  1764  1773  1782  1791  18    8
11  1702  1711  1720  1729  1738  1747  1756  1765  1774  1783  1792  1800  9
12  1703  1712  1721  1730  1739  1748  1757  1766  1775  1784  1793  19
13  1704  1713  1722  1731  1740  1749  1758  1767  1776  1785  1794  2
14  1705  1714  1723  1732  1741  1750  1759  1768  1777  1786  1795  3
15  1706  1715  1724  1733  1742  1751  1760  1769  1778  1787  1796  4
16  1707  1716  1725  1734  1743  1752  1761  1770  1779  1788  1797  5
2# # start a qemu VM; here, I have a VM named rhel6
2# virsh start rhel6
2# # see what leaked into the qemu child
2# ps -ea | grep qemu
25709 ?        00:00:00 qemu-kvm
2# ls /proc/25709/fd
0   12  16    1702  1737  1743  1760  1768  1772  1800  4  8
1   13  17    1703  1738  1754  1762  1769  1773  2     5  9
10  14  1700  1708  1739  1755  1763  1770  1774  22    6
11  15  1701  1733  1741  1756  1766  1771  1799  3     7
2# # buggy libvirtd leaked some, but not all, of 1700-1800 to child
2# # a fixed libvirtd should not have any fds >= 1700 open

Comment 14 Jiri Denemark 2011-01-09 23:56:43 UTC
Built into libvirt-0.8.7-1.el6

Comment 17 Vivian Bian 2011-02-16 03:29:48 UTC
checked with :
libvirt-0.8.1-27.el6.x86_64.rpm  --- reproducer pkg
libvirt-0.8.7-6.el6.x86_64.rpm  --- verification pkg

[reproducer]

1# ulimit -n 2048
1# for i in $(seq 1700 1800); do eval exec $i\>/dev/null; done
1# service libvirtd stop
Stopping libvirtd daemon:                                  [  OK  ]
1# libvirtd


2# ps -ea|grep libvirt
30913 pts/1    00:00:00 libvirtd

2# ls /proc/30913/fd
0   17    1708  1717  1726  1735  1744  1753  1762  1771  1780  1789  1798  8
1   1700  1709  1718  1727  1736  1745  1754  1763  1772  1781  1790  1799  9
10  1701  1710  1719  1728  1737  1746  1755  1764  1773  1782  1791  1800
11  1702  1711  1720  1729  1738  1747  1756  1765  1774  1783  1792  2
12  1703  1712  1721  1730  1739  1748  1757  1766  1775  1784  1793  3
13  1704  1713  1722  1731  1740  1749  1758  1767  1776  1785  1794  4
14  1705  1714  1723  1732  1741  1750  1759  1768  1777  1786  1795  5
15  1706  1715  1724  1733  1742  1751  1760  1769  1778  1787  1796  6
16  1707  1716  1725  1734  1743  1752  1761  1770  1779  1788  1797  7

2# virsh start rhel5.5
Domain rhel5.5 started

2# ps -ea|grep qemu
30980 ?        00:00:00 qemu-kvm

2# ls /proc/30980/fd 
0   11  14  17    1702  1735  1741  1758  1763  1769  1772  1798  23  5  8
1   12  15  1700  1703  1737  1754  1759  1764  1770  1773  18    3   6  9
10  13  16  1701  1733  1740  1757  1761  1768  1771  1774  2     4   7

leak has fds >= 1700 open

[verfication]





1# ulimit -n 2048
1# for i in $(seq 1700 1800); do eval exec $i\>/dev/null; done
1# service libvirtd stop
Stopping libvirtd daemon:                                  [  OK  ]
1# libvirtd


2# ps -ea|grep libvirt
31107 pts/1    00:00:00 libvirtd

2# ls /proc/31107/fd
0   17    1708  1717  1726  1735  1744  1753  1762  1771  1780  1789  1798  8
1   1700  1709  1718  1727  1736  1745  1754  1763  1772  1781  1790  1799  9
10  1701  1710  1719  1728  1737  1746  1755  1764  1773  1782  1791  1800
11  1702  1711  1720  1729  1738  1747  1756  1765  1774  1783  1792  2
12  1703  1712  1721  1730  1739  1748  1757  1766  1775  1784  1793  3
13  1704  1713  1722  1731  1740  1749  1758  1767  1776  1785  1794  4
14  1705  1714  1723  1732  1741  1750  1759  1768  1777  1786  1795  5
15  1706  1715  1724  1733  1742  1751  1760  1769  1778  1787  1796  6
16  1707  1716  1725  1734  1743  1752  1761  1770  1779  1788  1797  7

2# virsh start rhel5.5
Domain rhel5.5 started

2# ps -ea|grep qemu
31184 ?        00:00:00 qemu-kvm

2# ls /proc/31184/fd
0  1  10  11  12  13  14  15  16  17  18  2  23  3  4  5  6  7  8  9

Doesn't have libvirt leak to the qemu child . 

So set bug status to VERIFIED

Comment 20 errata-xmlrpc 2011-05-19 13:24:43 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html


Note You need to log in before you can comment on or make changes to this bug.