Bug 772320 - gnome-keyring/p11-kit abuses pthread_atfork in a way that breaks libvirt's unit tests
Summary: gnome-keyring/p11-kit abuses pthread_atfork in a way that breaks libvirt's un...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: gnome-keyring
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Tomáš Bžatek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-01-06 19:57 UTC by Eric Blake
Modified: 2015-03-03 23:04 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-02-14 02:31:59 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)

Description Eric Blake 2012-01-06 19:57:20 UTC
Description of problem:

On rawhide, running 'make check' on a fresh checkout of libvirt.git fails, due to bad interactions from the gnome-keyring code trying to reinitialize itself after fork(), but in a manner that violates POSIX constraints.

POSIX is clear that a multi-threaded app cannot rely on the state of a mutex between fork() and exec() where the parent process was multi-threaded, as the mutex may have been locked in a different thread and thus not available from the thread present in the fork().  Hence, POSIX excludes pthread_mutex_lock from the list of async-signal-safe functions, and portable apps cannot blindly use it after fork:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_atfork.html

"The pthread_atfork( ) function was intended to provide multi-threaded libraries with a means to protect themselves from innocent application programs that call fork( ), and to provide multi-threaded application programs with a standard mechanism for protecting themselves from fork( ) calls in a library routine or the application itself."

"The expected usage was that the prepare handler would acquire all mutex locks and the other two fork handlers would release them."

"For example, an application could have supplied a prepare routine that acquires the necessary mutexes the library maintains and supplied child and parent routines that release those mutexes, thus ensuring that the child would have got a consistent snapshot of the state of the library (and that no mutexes would have been left stranded). This is good in theory, but in reality not practical. Each and every mutex and lock in the process must be located and locked. Every component of a program including third-party components must participate and they must agree who is responsible for which mutex or lock. This is especially problematic for mutexes and locks in dynamically allocated memory. All mutexes and locks internal to the implementation must be locked, too. This possibly delays the thread calling fork( ) for a long time or even indefinitely since uses of these synchronization objects may not be under control of the application. A final problem to mention here is the problem of locking streams. At least the streams under control of the system (like stdin, stdout, stderr) must be protected by locking the stream with flockfile( ). But the application itself could have done that, possibly in the same thread calling fork( ). In this case, the process will deadlock."

"Alternatively, some libraries might have been able to supply just a child routine that reinitializes the mutexes in the library and all associated states to some known value (for example, what it was when the image was originally executed). This approach is not possible, though, because implementations are allowed to fail *_init( ) and *_destroy( ) calls for mutexes and locks if the
mutex or lock is still locked. In this case, the child routine is not able to reinitialize the mutexes and locks."

...
"As explained, there is no suitable solution for functionality which requires non-atomic operations to be protected through mutexes and locks. This is why the POSIX.1 standard since the 1996 release requires that the child process after fork() in a multi-threaded process only calls async-signal-safe interfaces."


p11-kit is calling pthread_atfork with _just_ a child handler routine (see modules.c:init_globals_unlocked()), then from that child handler, calls into the gnome-keyring callback (see gkm-rpc-module.c:rpc_C_Initialize) - this callback starts out with a call to pthread_mutex_lock(), which is a problem by everything mentioned above - the _only_ safe thing to do is use just async-signal-safe functions (http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03).  There are probably additional constraint violations being done in this atfork handler, where the code in libvirt's unit test sets up a situation that confuses the library reinitialization attempts (libvirt's ./commandtest program is a unit test for testing whether libvirt's wrappers around fork/exec behave sanely, and the program intentionally closes fds 3, 4, and 5 so that the child has reproducible behavior in reopening those fds rather than whatever fd is next available; the child has no need to use any features of gnome-keyring, but the parent happened to initialize the p11-kit library prior to closing fds).


Version-Release number of selected component (if applicable):
p11-kit-0.9-1.fc17.i686
gnome-keyring-3.3.3.1-3.fc17.i686

How reproducible:
100%

Steps to Reproduce:
1. git clone git://libvirt.org/libvirt.git
2. cd libvirt && ./autogen.sh && make
3. make -C tests check TESTS=commandtest VIR_TEST_DEBUG=1
  
Actual results:
The test fails, including output like:
    TEST: commandtest
    1) Command Exec test0 test ... WARNING: gnome-keyring:: couldn't send data: Bad file descriptor
    OK
    2) Command Exec test1 test ... WARNING: gnome-keyring:: couldn't send data: Bad file descriptor
    OK
    3) Command Exec test2 test ... WARNING: gnome-keyring:: couldn't send data: Bad file descriptor
    WARNING: gnome-keyring:: couldn't send data: Bad file descriptor
    OK
    4) Command Exec test3 test ... 8ay)PRIVATE-GNOME-KEYRING-PKCS11-PROTOCOL-V-1WARNING: gnome-keyring:: couldn't receive data: Input/output error
    OK
    5) Command Exec test4 test ... WARNING: gnome-keyring:: couldn't send data: Bad file descriptor
    OK
    6) Command Exec test5 test ... WARNING: gnome-keyring:: couldn't send data: Bad file descriptor
    OK
    7) Command Exec test6 test ... WARNING: gnome-keyring:: couldn't send data: Bad file descriptor
    OK
    8) Command Exec test7 test ... WARNING: gnome-keyring:: couldn't send data: Bad file descriptor
    OK
    9) Command Exec test8 test ... WARNING: gnome-keyring:: couldn't send data: Bad file descriptor
    OK
    10) Command Exec test9 test ... WARNING: gnome-keyring:: couldn't send data: Bad file descriptor
    OK
    11) Command Exec test10 test ... WARNING: gnome-keyring:: couldn't send data: Bad file descriptor
    OK
    12) Command Exec test11 test ... WARNING: gnome-keyring:: couldn't send data: Bad file descriptor
    OK
    13) Command Exec test12 test ... WARNING: gnome-keyring:: couldn't receive data: Bad file descriptor
    OK
    14) Command Exec test13 test ... WARNING: gnome-keyring:: couldn't receive data: Bad file descriptor
     
    Offset 25
    Expect []
    Actual [END STDOUT
    ]
    ... FAILED
    15) Command Exec test14 test ... WARNING: gnome-keyring:: couldn't receive data: Bad file descriptor
     
    Offset 25
    Expect []
    Actual [END STDOUT
    ]
    ... FAILED
    16) Command Exec test15 test ... WARNING: gnome-keyring:: couldn't send data: Bad file descriptor
    OK
    17) Command Exec test16 test ... OK
    18) Command Exec test17 test ... WARNING: gnome-keyring:: couldn't send data: Bad file descriptor
    WARNING: gnome-keyring:: couldn't send data: Bad file descriptor
    libvir: error : failed to setup stdout file handle: Bad file descriptor
    Cannot run child internal error Child process (true) status unexpected: exit status 1
    libvir: error : internal error Child process (true) status unexpected: exit status 1
    FAILED
    19) Command Exec test18 test ... WARNING: gnome-keyring:: couldn't send data: Bad file descriptor
    OK
    20) Command Exec test19 test ... WARNING: gnome-keyring:: couldn't send data: Bad file descriptor
    OK
    FAIL: commandtest
    =======================================
    1 of 1 test failed
    Please report to libvir-list



Expected results:
The test should pass, and there should be no warnings from gnome-keyring littering stderr output.

Additional info:
Backtrace of the problematic write() call from gnome-keyring:
    (gdb) b virFork
    Breakpoint 1 at 0x805f1d0: file util/command.c, line 212.
    (gdb) c
    Continuing.
    Breakpoint 1, virFork (pid=pid@entry=0xbfb5c050) at util/command.c:212
    212 {
    (gdb) set follow-fork-mode child
    (gdb) b write
    Breakpoint 2 at 0x4faa40e0: write. (2 locations)
    (gdb) c
    Continuing.
    [New process 2083]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/libthread_db.so.1".
    [Switching to Thread 0xb786c780 (LWP 2083)]
     
    Breakpoint 2, write () at ../sysdeps/unix/syscall-template.S:82
    82 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
    (gdb) bt
    #0 write () at ../sysdeps/unix/syscall-template.S:82
    #1 0x0055d389 in call_write (cs=cs@entry=0x8995698,
    data=data@entry=0xbfb5bc48 "", len=<optimized out>, len@entry=4)
    at gkm-rpc-module.c:390
    #2 0x0055d59b in call_send_recv (cs=0x8995698) at gkm-rpc-module.c:490
    #3 call_run (cs=0x8995698) at gkm-rpc-module.c:553
    #4 0x0055e4e8 in rpc_C_Initialize (init_args=0x89952b4)
    at gkm-rpc-module.c:1209
    #5 0x423d33a8 in initialize_module_unlocked_reentrant (mod=0x89952b0)
    at modules.c:512
    #6 0x423d358c in reinitialize_after_fork () at modules.c:556
    #7 0x4f99be6b in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/fork.c:189
    #8 0x4faa69e4 in __fork () at ../nptl/sysdeps/unix/sysv/linux/pt-fork.c:26
    #9 0x0805f2da in virFork (pid=pid@entry=0xbfb5c050) at util/command.c:235
    #10 0x0805f60c in virExecWithHook (argv=0x8998478, envp=0x0, keepfd=0x0,
    keepfd_size=0, retpid=retpid@entry=0x89984fc, infd=-1, outfd=0x89984d0,
    errfd=0x89984d4, flags=0, data=data@entry=0x8998488, pidfile=0x0,
    hook=0x805e900 <virCommandHook>) at util/command.c:452
    #11 0x08061db2 in virCommandRunAsync (cmd=cmd@entry=0x8998488,
    pid=pid@entry=0x0) at util/command.c:2145
    #12 0x0806269a in virCommandRun (cmd=cmd@entry=0x8998488,
    exitstatus=exitstatus@entry=0x0) at util/command.c:1938
    #13 0x0805c44f in test0 (unused=unused@entry=0x0) at commandtest.c:102
    ---Type <return> to continue, or q <return> to quit---
    #14 0x0805d1e0 in virtTestRun (
    title=title@entry=0x82663ac "Command Exec test0 test",
    nloops=nloops@entry=1, body=body@entry=0x805c420 <test0>,
    data=data@entry=0x0) at testutils.c:141
    #15 0x0805c3fc in mymain () at commandtest.c:811
    #16 0x0805d7da in virtTestMain (argc=argc@entry=1, argv=argv@entry=0xbfb5c954,
    func=func@entry=0x805c320 <mymain>) at testutils.c:696
    #17 0x0805c20f in main (argc=1, argv=0xbfb5c954) at commandtest.c:836
    (gdb)

Comment 1 Eric Blake 2012-01-06 22:20:05 UTC
Note that the WARNING message printed during the pthread_atfork was a result of libvirt's test pulling out the rug from under gnome-keyring, by calling close on an fd that gnome-keyring was expecting to be present in the pthread_atfork handler.  Libvirt has since fixed the testsuite flaw:
 http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=74ff57506c76

That said, there's still a bug in the gnome-keyring code, and I suspect it should be possible to construct a case where one thread is in the middle of code that holds the mutex while another thread calls fork() and deadlocks because of gnome-manager trying to obtain an impossible lock.

Comment 2 Matthias Clasen 2012-02-14 02:31:59 UTC
pthread_atfork usage has been removed in p11-kit 0.11


Note You need to log in before you can comment on or make changes to this bug.