Bug 964359

Summary: virSetUIDGID can deadlock due to unsafe use of getpwuid_r
Product: Red Hat Enterprise Linux 6
Reporter: Eric Blake <eblake>
Component: libvirt
Assignee: Eric Blake <eblake>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 6.5
CC: acathrow, ahadas, ajia, berrange, bharrington, bili, clalancette, dallan, dyuan, eblake, gickowic, gsun, itamar, jdenemar, jforbes, jyang, kcleveng, laine, libvirt-maint, lsu, lyarwood, mjenner, shyu, veillard, ydu, zhwang
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: libvirt-0.10.2-24.el6
Doc Type: Bug Fix
Doc Text:
Cause: Libvirtd was setting up supplemental groups of child processes by making a call between fork and exec to a function (getpwuid_r) that could grab a mutex.
Consequence: If another thread already held the getpwuid_r mutex at the time libvirtd forked, the forked child would deadlock, which in turn hung libvirtd.
Fix: Code to compute the set of supplemental groups was refactored so that no mutex is required after fork.
Result: The deadlock scenario is no longer possible.
Story Points: ---
Clone Of: 964358
Environment:
Last Closed: 2013-11-21 09:01:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1006509, 1008328

Description Eric Blake 2013-05-18 03:12:27 UTC
Cloning to RHEL.

+++ This bug was initially created as a clone of Bug #964358 +++

Description of problem:
I hit an (admittedly rare) deadlock today where a multithreaded parent spawned off a child, and the child then called getpwuid_r.  POSIX says that getpwuid_r is NOT async-signal-safe, and therefore unsafe to use between fork and exec, precisely because of the possibility of the deadlock I hit.  Libvirt needs to rewrite virSetUIDGID so that the information getpwuid_r provides is computed in advance, prior to the fork, and passed in as a parameter, rather than looked up inline after the fork.

Version-Release number of selected component (if applicable):
The deadlock hit me on the latest libvirt.git, but git blame says that virSetUIDGID has been calling getpwuid_r since at least 2011, so I'm raising it against Fedora 18.

How reproducible:
Extremely rare

Steps to Reproduce:
1. Start a VM and happen to hit the small window where one thread of the parent process holds the getpwuid_r mutex while another thread forks and execs.  While I was (extremely) (un-)lucky enough to hit it in real life, it will probably be a lot easier with gdb's help.
  
Actual results:
Deadlock.
Parent process hung with the following backtrace:
(gdb) t a a bt

Thread 2 (Thread 0x7fd56bbf2700 (LWP 19367)):
#0  0x00007fd5761ec54d in read () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fd578aace46 in saferead (fd=20, buf=0x7fd56bbf1117, count=1)
    at util/virfile.c:926
#2  0x00007fd578a9e97a in virCommandHandshakeWait (cmd=0x7fd55c231f10)
    at util/vircommand.c:2463
#3  0x00007fd56326ff40 in qemuProcessStart (conn=0x7fd53c000df0, 
    driver=0x7fd55c0dc4f0, vm=0x7fd54800b100, migrateFrom=0x0, stdin_fd=-1, 
    stdin_path=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, 
    flags=1) at qemu/qemu_process.c:3729
#4  0x00007fd5632af536 in qemuDomainObjStart (conn=0x7fd53c000df0, 
    driver=0x7fd55c0dc4f0, vm=0x7fd54800b100, flags=0)
    at qemu/qemu_driver.c:5453
#5  0x00007fd5632af796 in qemuDomainCreateWithFlags (dom=0x7fd55c307d30, 
    flags=0) at qemu/qemu_driver.c:5502
#6  0x00007fd5632af829 in qemuDomainCreate (dom=0x7fd55c307d30)
    at qemu/qemu_driver.c:5520
#7  0x00007fd578b91569 in virDomainCreate (domain=0x7fd55c307d30)
    at libvirt.c:8462
#8  0x00007fd5795ee7aa in remoteDispatchDomainCreate (server=0x7fd579d95280, 
    client=0x7fd579da5010, msg=0x7fd579da87c0, rerr=0x7fd56bbf1aa0, 
    args=0x7fd55c307d70) at remote_dispatch.h:2910
#9  0x00007fd5795ee6a0 in remoteDispatchDomainCreateHelper (
    server=0x7fd579d95280, client=0x7fd579da5010, msg=0x7fd579da87c0, 
    rerr=0x7fd56bbf1aa0, args=0x7fd55c307d70, ret=0x7fd55c305430)
    at remote_dispatch.h:2888
#10 0x00007fd578bfdbfa in virNetServerProgramDispatchCall (
    prog=0x7fd579d91cf0, server=0x7fd579d95280, client=0x7fd579da5010, 
    msg=0x7fd579da87c0) at rpc/virnetserverprogram.c:439
#11 0x00007fd578bfd771 in virNetServerProgramDispatch (prog=0x7fd579d91cf0, 
    server=0x7fd579d95280, client=0x7fd579da5010, msg=0x7fd579da87c0)
    at rpc/virnetserverprogram.c:305
#12 0x00007fd578c03ff5 in virNetServerProcessMsg (srv=0x7fd579d95280, 
    client=0x7fd579da5010, prog=0x7fd579d91cf0, msg=0x7fd579da87c0)
    at rpc/virnetserver.c:162
#13 0x00007fd578c040e4 in virNetServerHandleJob (jobOpaque=0x7fd579da91d0, 
    opaque=0x7fd579d95280) at rpc/virnetserver.c:183
#14 0x00007fd578ae3ab6 in virThreadPoolWorker (opaque=0x7fd579d731a0)
    at util/virthreadpool.c:144
#15 0x00007fd578ae347b in virThreadHelper (data=0x7fd579d73110)
    at util/virthreadpthread.c:161
#16 0x00007fd5761e5851 in start_thread (arg=0x7fd56bbf2700)
    at pthread_create.c:301
#17 0x00007fd575b2c90d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
Thread 1 (Thread 0x7fd579597800 (LWP 19341)):
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fd578ae32a1 in virCondWait (c=0x7fd579d95490, m=0x7fd579d95438)
    at util/virthreadpthread.c:117
#2  0x00007fd578ae3f84 in virThreadPoolFree (pool=0x7fd579d95400)
    at util/virthreadpool.c:264
#3  0x00007fd578c06b93 in virNetServerDispose (obj=0x7fd579d95280)
    at rpc/virnetserver.c:1168
#4  0x00007fd578acecb4 in virObjectUnref (anyobj=0x7fd579d95280)
    at util/virobject.c:264
#5  0x00007fd5795e5031 in main (argc=1, argv=0x7fff3eaec128) at libvirtd.c:1500



Child process hung with the following backtrace:
(gdb) t a a bt

Thread 1 (Thread 0x7fd56bbf2700 (LWP 3212)):
#0  __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007fd5761e7388 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007fd5761e7257 in __pthread_mutex_lock (mutex=0x7fd56be00360)
    at pthread_mutex_lock.c:61
#3  0x00007fd56bbf9fc5 in _nss_files_getpwuid_r (uid=0, result=0x7fd56bbf0c70, 
    buffer=0x7fd55c2a65f0 "", buflen=1024, errnop=0x7fd56bbf25b8)
    at nss_files/files-pwd.c:40
#4  0x00007fd575aeff1d in __getpwuid_r (uid=0, resbuf=0x7fd56bbf0c70, 
    buffer=0x7fd55c2a65f0 "", buflen=1024, result=0x7fd56bbf0cb0)
    at ../nss/getXXbyYY_r.c:253
#5  0x00007fd578aebafc in virSetUIDGID (uid=0, gid=0) at util/virutil.c:1031
#6  0x00007fd578aebf43 in virSetUIDGIDWithCaps (uid=0, gid=0, capBits=0, 
    clearExistingCaps=true) at util/virutil.c:1388
#7  0x00007fd578a9a20b in virExec (cmd=0x7fd55c231f10) at util/vircommand.c:654
#8  0x00007fd578a9dfa2 in virCommandRunAsync (cmd=0x7fd55c231f10, pid=0x0)
    at util/vircommand.c:2247
#9  0x00007fd578a9d74e in virCommandRun (cmd=0x7fd55c231f10, exitstatus=0x0)
    at util/vircommand.c:2100
#10 0x00007fd56326fde5 in qemuProcessStart (conn=0x7fd53c000df0, 
    driver=0x7fd55c0dc4f0, vm=0x7fd54800b100, migrateFrom=0x0, stdin_fd=-1, 
    stdin_path=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, 
    flags=1) at qemu/qemu_process.c:3694
#11 0x00007fd5632af536 in qemuDomainObjStart (conn=0x7fd53c000df0, 
    driver=0x7fd55c0dc4f0, vm=0x7fd54800b100, flags=0)
    at qemu/qemu_driver.c:5453
#12 0x00007fd5632af796 in qemuDomainCreateWithFlags (dom=0x7fd55c307d30, 
    flags=0) at qemu/qemu_driver.c:5502
#13 0x00007fd5632af829 in qemuDomainCreate (dom=0x7fd55c307d30)
    at qemu/qemu_driver.c:5520
#14 0x00007fd578b91569 in virDomainCreate (domain=0x7fd55c307d30)
    at libvirt.c:8462
#15 0x00007fd5795ee7aa in remoteDispatchDomainCreate (server=0x7fd579d95280, 
    client=0x7fd579da5010, msg=0x7fd579da87c0, rerr=0x7fd56bbf1aa0, 
    args=0x7fd55c307d70) at remote_dispatch.h:2910
#16 0x00007fd5795ee6a0 in remoteDispatchDomainCreateHelper (
    server=0x7fd579d95280, client=0x7fd579da5010, msg=0x7fd579da87c0, 
    rerr=0x7fd56bbf1aa0, args=0x7fd55c307d70, ret=0x7fd55c305430)
    at remote_dispatch.h:2888
#17 0x00007fd578bfdbfa in virNetServerProgramDispatchCall (
    prog=0x7fd579d91cf0, server=0x7fd579d95280, client=0x7fd579da5010, 
    msg=0x7fd579da87c0) at rpc/virnetserverprogram.c:439
#18 0x00007fd578bfd771 in virNetServerProgramDispatch (prog=0x7fd579d91cf0, 
    server=0x7fd579d95280, client=0x7fd579da5010, msg=0x7fd579da87c0)
    at rpc/virnetserverprogram.c:305
#19 0x00007fd578c03ff5 in virNetServerProcessMsg (srv=0x7fd579d95280, 
    client=0x7fd579da5010, prog=0x7fd579d91cf0, msg=0x7fd579da87c0)
    at rpc/virnetserver.c:162
#20 0x00007fd578c040e4 in virNetServerHandleJob (jobOpaque=0x7fd579da91d0, 
    opaque=0x7fd579d95280) at rpc/virnetserver.c:183
#21 0x00007fd578ae3ab6 in virThreadPoolWorker (opaque=0x7fd579d731a0)
    at util/virthreadpool.c:144
#22 0x00007fd578ae347b in virThreadHelper (data=0x7fd579d73110)
    at util/virthreadpthread.c:161
#23 0x00007fd5761e5851 in start_thread (arg=0x7fd56bbf2700)
    at pthread_create.c:301
#24 0x00007fd575b2c90d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115



Expected results:
No deadlock.  getpwuid_r must be called BEFORE fork, with the results cached and passed as parameters to virSetUIDGID after fork, so that the child never calls a non-async-signal-safe function that can deadlock it and, in turn, hang the parent.
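
For illustration, here is a minimal, self-contained sketch of that ordering in plain C.  The names and values are illustrative only (uid 107 standing in for the qemu user), not libvirt's actual refactoring: every passwd/group database lookup happens before fork(), and the child limits itself to async-signal-safe calls.

#define _GNU_SOURCE
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    uid_t uid = 107;                 /* e.g. the qemu user */
    gid_t groups[64];
    int ngroups = 64;
    struct passwd pwd, *res = NULL;
    char buf[4096];

    /* Pre-fork: getpwuid_r/getgrouplist may take internal NSS locks,
     * which is harmless because no fork has happened yet. */
    if (getpwuid_r(uid, &pwd, buf, sizeof(buf), &res) != 0 || !res) {
        perror("getpwuid_r");
        return 1;
    }
    if (getgrouplist(pwd.pw_name, pwd.pw_gid, groups, &ngroups) < 0) {
        fprintf(stderr, "too many supplementary groups\n");
        return 1;
    }

    pid_t child = fork();
    if (child == 0) {
        /* Post-fork: only async-signal-safe syscalls, no NSS lookups,
         * so there is no mutex left to deadlock on. */
        if (setgroups(ngroups, groups) < 0 ||
            setgid(pwd.pw_gid) < 0 ||
            setuid(uid) < 0)
            _exit(127);
        execl("/bin/true", "true", (char *) NULL);
        _exit(127);
    }
    if (child > 0)
        waitpid(child, NULL, 0);
    return 0;
}

This matches the Doc Text above: the supplemental group set is computed before fork, so the child needs no mutex at all.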

Additional info:
In searching for duplicates, I found that bug 452629 described similar symptoms, but without a root cause analysis to the POSIX async-signal safety aspect.

Comment 1 Laine Stump 2013-05-18 05:17:09 UTC
getpwuid_r() has been called by libvirt between fork() and exec() since commit b7bd75c, which was in v0.8.5.  (The code was originally in a different location, but was later moved into the general-purpose virSetUIDGID() function and called from even more places.)  I haven't looked, but this code movement could make backporting a fix to older branches a bit tedious.

Comment 2 Eric Blake 2013-05-21 20:19:56 UTC
I hit this again today while testing a fix for bug 903433; I'm surprised that no one else seems to have hit this issue.

Comment 3 Eric Blake 2013-05-22 20:47:52 UTC
Testing notes: in /etc/libvirt/qemu.conf, set 'user' and 'group' to qemu (not root), and restart libvirtd.  As a sanity check, inspect /proc/nnn/status of a qemu process started by this libvirt; assuming default installation, it should have:

Uid:    107     107     107     107
Gid:    107     107     107     107
...
Groups: 36 107 

both pre- and post-patch (that is, this validates that we are correctly assigning supplemental group 36 (kvm) to the qemu process).

I have not yet played with root-squash NFS in the context of this bug, but I suspect it should be possible to write a test case where a file owned by the kvm group but NOT by the qemu uid or gid is still usable by a guest (i.e. using root-squash NFS as proof that the code paths where we have to fork a child to work around root-squash limitations still add the right supplementary groups to mimic what the real qemu user can access).

So far, I haven't found a RELIABLE way to trigger the deadlock pre-patch.  I'm still experimenting with running libvirtd under gdb, trying to figure out where to put a breakpoint in glibc's getpwuid_r and what steps will make one thread of libvirtd do a uid lookup; the idea is that once the breakpoint is hit, I let another thread run that forks and does a uid lookup in the child, to reliably prove that the pre-patch deadlock exists.  If I come up with a reproducible formula (rather than just starting and stopping multiple domains in parallel while running a hook script in /etc/libvirt/hooks/qemu), I'll add it as another note.

Comment 4 Eric Blake 2013-05-23 16:59:14 UTC
v1 of a potential patch has been posted upstream; it will need a v2 depending on what gnulib gives us, but I plan on doing a scratch build soon.

https://www.redhat.com/archives/libvir-list/2013-May/msg01600.html

Comment 6 Eric Blake 2013-07-19 17:39:40 UTC
Upstream is now in place with this series of commits:
bfc183c
29fe5d7
fdb3bde
ee777e9
75c1256
c1983ba

and optionally a gnulib submodule update (I'll have to figure out how to avoid backporting that, as it messes with the build process)

Comment 9 Eric Blake 2013-08-05 20:09:57 UTC
An additional issue was identified upstream that affects Debian's choice of using a non-primary group for running qemu, while leaving the files only owned by the primary group:
https://www.redhat.com/archives/libvir-list/2013-August/msg00209.html
But since RHEL doesn't use Debian's uid setup, it should not matter if that patch is not backported to RHEL.

Comment 12 yanbing du 2013-08-15 08:55:34 UTC
Hi Eric,
I'm trying to verify this bug with the latest libvirt, but I'm not clear on how to trigger it.  Could you give some suggestions so I can verify it?  Thanks!

Comment 13 Eric Blake 2013-08-15 12:03:55 UTC
Positive verification is easy - both pre-patch and post-patch, run with /etc/libvirt/qemu.conf set to a non-root user that belongs to more than one group, and ensure that qemu child processes are associated with all of the supplementary groups of that user rather than just the primary group.

Negative verification (triggering the pre-patch race) is much harder.  I happened to hit the race while running a hook script associated with domain startup (http://libvirt.org/hooks.html), where all the hook script did was sleep 2 seconds, and then attempting to start multiple domains in parallel.  That was enough to make multiple libvirtd threads probe the group database while other threads were trying to fork; but even then, I only saw the deadlock twice across hundreds of attempts.  I think you will only get a reliable test if you run libvirtd under gdb and put a breakpoint at the point where glibc's getpwuid_r acquires its mutex, although I have not attempted to set up such a test.  I'm also trying to work on an LD_PRELOAD addition to libvirt/tests that could serve a similar purpose, but don't have that finished yet; a possible shape for such a shim is sketched below.
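
For anyone pursuing that LD_PRELOAD idea, one possible (untested, purely illustrative) shape for such a shim follows; it is not the test that eventually went into libvirt.  The wrapper serializes getpwuid_r on its own mutex and holds it across a sleep, so a fork() in another thread is very likely to copy a locked mutex into the child, reproducing the same failure mode as glibc's internal NSS lock:

/* pwuid_delay.c - hypothetical LD_PRELOAD shim, illustrative only.
 * Build: gcc -shared -fPIC -o pwuid_delay.so pwuid_delay.c -lpthread -ldl
 * Use:   LD_PRELOAD=$PWD/pwuid_delay.so libvirtd
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <pthread.h>
#include <pwd.h>
#include <unistd.h>

int getpwuid_r(uid_t uid, struct passwd *pwd, char *buf, size_t buflen,
               struct passwd **result)
{
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static int (*real)(uid_t, struct passwd *, char *, size_t,
                       struct passwd **);

    pthread_mutex_lock(&lock);
    if (!real)
        *(void **)(&real) = dlsym(RTLD_NEXT, "getpwuid_r");
    sleep(2);   /* hold the lock long enough for another thread to fork */
    int ret = real(uid, pwd, buf, buflen, result);
    pthread_mutex_unlock(&lock);
    return ret;
}

With the shim loaded and a few domains started in parallel, a pre-patch child should hang inside getpwuid_r almost every time; post-patch, the child never calls getpwuid_r at all, so the shim has no effect.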

Comment 14 yanbing du 2013-08-21 07:00:24 UTC
Hi, Eric
Testing with libvirt-0.10.2-22.el6.x86_64, the qemu processes' supplementary groups are always 0, which may be a problem.  Details follow:

Positive test with libvirt-0.10.2-21.el6.x86_64
1. Set user/group to "qemu", restart libvirt, then start a guest
2. Check the qemu child processes:
#  ps -eaL|grep qemu
10695 10695 ?        00:00:00 qemu-kvm
10695 10697 ?        00:00:00 qemu-kvm
10695 10698 ?        00:00:00 qemu-kvm
# cat /proc/10698/status 
...	0
Uid:	107	107	107	107
Gid:	107	107	107	107
Utrace:	0
FDSize:	64
Groups:	36 107 
# cat /proc/10697/status 
...	0
Uid:	107	107	107	107
Gid:	107	107	107	107
Utrace:	0
FDSize:	64
Groups:	36 107 

# cat /proc/10695/status 
...
Uid:	107	107	107	107
Gid:	107	107	107	107
Utrace:	0
FDSize:	64
Groups:	36 107 

------------
Then update libvirt to libvirt-0.10.2-22.el6.x86_64, repeat step1&2:
#  ps -eaL|grep qemu
11006 11006 ?        00:00:00 qemu-kvm
11006 11008 ?        00:00:00 qemu-kvm
11006 11009 ?        00:00:00 qemu-kvm

# cat /proc/11009/status
...	0
Uid:	107	107	107	107
Gid:	107	107	107	107
Utrace:	0
FDSize:	64
Groups:	0 

# cat /proc/11008/status
...
Uid:	107	107	107	107
Gid:	107	107	107	107
Utrace:	0
FDSize:	64
Groups:	0 

# cat /proc/11006/status
...
Uid:	107	107	107	107
Gid:	107	107	107	107
Utrace:	0
FDSize:	64
Groups:	0

Comment 15 Eric Blake 2013-08-22 12:36:34 UTC
(In reply to yanbing du from comment #14)
> Hi, Eric
> Testing with libvirt-0.10.2-22.el6.x86_64, the qemu processes' supplementary
> groups are always 0, which may be a problem.  Details follow:

Yes, it IS a problem.  Reopening this bug.


> ------------
> Then update libvirt to libvirt-0.10.2-22.el6.x86_64, repeat step1&2:
> #  ps -eaL|grep qemu
> 11006 11006 ?        00:00:00 qemu-kvm
> 11006 11008 ?        00:00:00 qemu-kvm
> 11006 11009 ?        00:00:00 qemu-kvm
> 
> # cat /proc/11009/status
> ...	0
> Uid:	107	107	107	107
> Gid:	107	107	107	107
> Utrace:	0
> FDSize:	64
> Groups:	0 

That should not be happening.  I'm investigating...

Comment 19 Eric Blake 2013-08-26 14:37:47 UTC
*** Bug 997350 has been marked as a duplicate of this bug. ***

Comment 20 Eric Blake 2013-08-29 19:29:37 UTC
Now upstream as of commit 745aa55fb, and v2 is in POST on rhvirt-archives (would post the URL, but rhvirt-archives appears to be stuck since the 27th...)

Comment 21 Eric Blake 2013-08-29 19:38:18 UTC
(In reply to Eric Blake from comment #20)
> Now upstream as of commit 745aa55fb, and v2 is in POST on rhvirt-archives
> (would post the URL, but rhvirt-archives appears to be stuck since the
> 27th...)

Here's the message ID to look for, if the archives ever get unstuck:
<1377804152-29348-1-git-send-email-eblake>

Comment 23 Jiri Denemark 2013-09-02 08:19:37 UTC
*** Bug 1002115 has been marked as a duplicate of this bug. ***

Comment 26 Eric Blake 2013-09-06 21:59:35 UTC
*** Bug 1003488 has been marked as a duplicate of this bug. ***

Comment 27 zhenfeng wang 2013-09-09 09:57:48 UTC
Hi Eric,
I just tried to verify this bug with libvirt-0.10.2-24.el6.x86_64.  The following are my verification steps; please check whether they are enough to verify this bug.  Thanks.

Reproduce bug with libvirt-0.10.2-22.el6.x86_64
1. create new group "abc" and add the user "qemu" to it
# id qemu
uid=107(qemu) gid=107(qemu) groups=107(qemu),36(kvm),500(abc)

2. Set user/group to "qemu" in qemu.conf, restart libvirt, then start a guest
3. Check the qemu child processes:

# ps -eaL|grep qemu
29517 29517 ?        00:00:03 qemu-kvm
29517 29543 ?        00:00:40 qemu-kvm
29517 29544 ?        00:00:00 qemu-kvm
29517 30038 ?        00:00:00 qemu-kvm

# cat /proc/29544/status
Name:	qemu-kvm
State:	S (sleeping)
Tgid:	29517
Pid:	29544
PPid:	1
TracerPid:	0
Uid:	107	107	107	107
Gid:	107	107	107	107
Utrace:	0
FDSize:	64
Groups:	0 492 

Then update libvirt to libvirt-0.10.2-24.el6.x86_64 and repeat steps 2 & 3; the qemu child processes are now associated with all of the supplementary groups of that user:
# ps -eaL|grep qemu
30736 30736 ?        00:00:00 qemu-kvm
30736 30758 ?        00:00:00 qemu-kvm
30736 30759 ?        00:00:02 qemu-kvm
30736 30760 ?        00:00:00 qemu-kvm

# cat /proc/30760/status 
Name:	qemu-kvm
State:	S (sleeping)
Tgid:	30736
Pid:	30760
PPid:	1
TracerPid:	0
Uid:	107	107	107	107
Gid:	107	107	107	107
Utrace:	0
FDSize:	64
Groups:	36 107 500 

# cat /proc/30759/status 
Name:	qemu-kvm
State:	S (sleeping)
Tgid:	30736
Pid:	30759
PPid:	1
TracerPid:	0
Uid:	107	107	107	107
Gid:	107	107	107	107
Utrace:	0
FDSize:	64
Groups:	36 107 500

Comment 28 Eric Blake 2013-09-09 13:42:52 UTC
*** Bug 1005781 has been marked as a duplicate of this bug. ***

Comment 29 Eric Blake 2013-09-10 12:11:14 UTC
*** Bug 999230 has been marked as a duplicate of this bug. ***

Comment 32 errata-xmlrpc 2013-11-21 09:01:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1581.html