When pthreads is transitively linked (the program links against a shared object which links against pthreads, but the program does not link against pthreads directly) pthread_mutex_unlock fails when the lock is acquired with pthread_mutex_trylock.

Source code for a tiny demonstration is at http://graphics.stanford.edu/~eldridge/pthread_bug.tar

The build looks like

gcc -g -Wall -shared -o libtinylib.so tinylib.c -lpthread
gcc -g -Wall -o program program.c -L. -Wl,-rpath,. -ltinylib
gcc -g -Wall -o program2 program.c -L. -Wl,-rpath,. -ltinylib -lpthread

> ./program
After pthread_mutex_init mutex->__m_lock.__status=0
After pthread_mutex_trylock mutex->__m_lock.__status=1
After pthread_mutex_unlock mutex->__m_lock.__status=1

> ./program2
After pthread_mutex_init mutex->__m_lock.__status=0
After pthread_mutex_trylock mutex->__m_lock.__status=1
After pthread_mutex_unlock mutex->__m_lock.__status=0

In the first case pthread_mutex_unlock() fails to actually release the lock. I think this may be related to bug 17145, although my version of glibc (2.1.94-3) includes the patches from that bug.

Thanks,
-Matthew
The bug exhibits the "doesn't release the lock" behavior when acquired as

while (pthread_mutex_trylock( sl ) == EBUSY)
    ; /* EMPTY */

but instead acts as "doesn't acquire the lock" (never modifies any of the elements of the lock structure) when acquired as

pthread_mutex_lock( &mutex );

I've updated the sample code.
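For readers without access to the tarball, the two acquisition styles above can be checked roughly as follows. This is only a sketch, not the code from the report: the names mutex_is_held and check_unlock_releases are made up for illustration, and instead of printing the LinuxThreads-internal __m_lock.__status field it probes the lock state with a second pthread_mutex_trylock, which assumes a default (non-recursive) mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical helper: probe whether a default (non-recursive) mutex is held.
 * Returns 1 if held, 0 if free, -1 on an unexpected error.
 * If the probe itself acquires the mutex, it releases it again. */
static int mutex_is_held(pthread_mutex_t *m)
{
    assert(m != NULL);
    int rc = pthread_mutex_trylock(m);
    if (rc == EBUSY)
        return 1;
    if (rc != 0)
        return -1;
    pthread_mutex_unlock(m);
    return 0;
}

/* Hypothetical check: acquire with a trylock spin (use_trylock != 0) or with
 * pthread_mutex_lock (use_trylock == 0), release, and probe after each step.
 * Returns 0 if both steps behave as expected, 1 if the acquire was not
 * observed, 2 if the mutex still looks held after unlock, -1 if init failed. */
int check_unlock_releases(int use_trylock)
{
    pthread_mutex_t mutex;
    if (pthread_mutex_init(&mutex, NULL) != 0)
        return -1;

    if (use_trylock) {
        while (pthread_mutex_trylock(&mutex) == EBUSY)
            ; /* EMPTY */
    } else {
        pthread_mutex_lock(&mutex);
    }
    int held_after_lock = mutex_is_held(&mutex);

    pthread_mutex_unlock(&mutex);
    int held_after_unlock = mutex_is_held(&mutex);

    printf("%s: held after lock=%d, held after unlock=%d\n",
           use_trylock ? "trylock" : "lock",
           held_after_lock, held_after_unlock);

    pthread_mutex_destroy(&mutex);

    if (held_after_lock != 1)
        return 1;
    if (held_after_unlock != 0)
        return 2;
    return 0;
}
```

On a correctly working installation both check_unlock_releases(1) and check_unlock_releases(0) should return 0. By the descriptions above, the trylock case would presumably come back as 2 and the pthread_mutex_lock case as 1 on an affected machine, though that has not been verified against the original setup.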
Strange, cannot reproduce this:

$ make; ./program; ./program2; ldd ./program; ldd ./program2; rpm -q glibc; rpm -q --qf '%{ARCH}\n' glibc
gcc -g -Wall -shared -o libtinylib.so tinylib.c -lpthread
tinylib.c: In function `tiny_function':
tinylib.c:29: warning: implicit declaration of function `memset'
gcc -g -Wall -o program program.c -L. -Wl,-rpath,. -ltinylib
gcc -g -Wall -o program2 program.c -L. -Wl,-rpath,. -ltinylib -lpthread
After pthread_mutex_init mutex->__m_lock.__status=0
After pthread_mutex_trylock mutex->__m_lock.__status=1
After pthread_mutex_unlock mutex->__m_lock.__status=0
After pthread_mutex_init mutex->__m_lock.__status=0
After pthread_mutex_trylock mutex->__m_lock.__status=1
After pthread_mutex_unlock mutex->__m_lock.__status=0
libtinylib.so => ./libtinylib.so (0x40018000)
libc.so.6 => /lib/libc.so.6 (0x40023000)
libpthread.so.0 => /lib/libpthread.so.0 (0x40149000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
libtinylib.so => ./libtinylib.so (0x40018000)
libpthread.so.0 => /lib/libpthread.so.0 (0x40023000)
libc.so.6 => /lib/libc.so.6 (0x40039000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
glibc-2.1.94-3
i686

Are you using the i686 or i386 rpm (wonder whether I should install it and check again)?
So here's the story -- Matthew and I are installing these machines automatically over a network (it's a cluster), and for some reason whoever created the directory full of "stuff to install after RH7.0" (i.e., the released updates) put both the i386 and the i686 versions of glibc-2.1.94 in that directory. They conflicted (duh), so RPM installed neither, and we were (stupidly) running the older 2.1.92 version of glibc. I haven't personally witnessed the new installation process, so I don't know whether we should have seen the conflicts, but the machines were behind on their libc.

I manually upgraded one of our machines to the newer version of glibc, and the tiny demo now runs fine (as does *our* software, which was what we really cared about).

Sorry for the confusion. Very strange bug (took a whole day to track down).
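Since the root cause turned out to be a machine running a different glibc than assumed, one way to rule that out early is to have the test program report the glibc it actually loaded. The sketch below uses gnu_get_libc_version() from <gnu/libc-version.h>, which is glibc-specific; the helper name running_glibc_is is made up for illustration, and the version string passed in is whatever you expect to be installed.

```c
#include <assert.h>
#include <gnu/libc-version.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: compare the glibc version this process is running
 * against an expected version string such as "2.1.94".
 * Returns 1 on an exact match, 0 otherwise. */
int running_glibc_is(const char *expected)
{
    assert(expected != NULL);
    const char *actual = gnu_get_libc_version();
    printf("running glibc: %s (expected %s)\n", actual, expected);
    return strcmp(actual, expected) == 0;
}
```

Calling running_glibc_is("2.1.94") at the start of the demo would have flagged a machine still on 2.1.92. Note that this reports the upstream version string only, not the RPM release or architecture, so it complements rather than replaces rpm -q glibc.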
Then it is an up2date bug (actually an rhns server bug, I think), and I was told it is being worked on.