From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2.1) Gecko/20010901

Description of problem:
There seems to be a race condition in electric fence and pthreads. I have a large multithreaded application that, during shutdown, has many threads exiting and many calls to "free" all at once. About 1 in 5 shutdowns, two threads get deadlocked. gdb reveals that one thread is as follows:

(gdb) where
#0 0x400cf8a5 in __sigsuspend (set=0x41b228ec) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1 0x400920d9 in __pthread_wait_for_restart_signal (self=0x41b22c00) at pthread.c:934
#2 0x400930ac in __new_sem_wait (sem=0x400a173c) at restart.h:34
#3 0x4009f179 in lock () from /usr/lib/libefence.so.0
#4 0x4009fa10 in free () from /usr/lib/libefence.so.0
#5 0x40092bf3 in __pthread_destroy_specifics () at specific.c:165
#6 0x4008f2d7 in pthread_exit (retval=0x0) at join.c:37
#7 0x4008fc05 in pthread_start_thread (arg=0x41b22c00) at manager.c:265

And another is:

(gdb) where
#0 0x400cf8a5 in __sigsuspend (set=0x4132260c) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1 0x400920d9 in __pthread_wait_for_restart_signal (self=0x41322c00) at pthread.c:934
#2 0x400930ac in __new_sem_wait (sem=0x400a173c) at restart.h:34
#3 0x4009f179 in lock () from /usr/lib/libefence.so.0
#4 0x4009fa10 in free () from /usr/lib/libefence.so.0
#5 0x08085064 in wms_free (deadbuf=0x45d1cff0) at wms.c:109

(wms_free() is my function that calls free). It seems that all other threads exit fine, these two are stuck, so I'm guessing some kind of bad interaction when a thread is exiting just as another is freeing memory. If you have trouble reproducing this, let me know.

Version-Release number of selected component (if applicable):
ElectricFence-2.2.2-7

How reproducible:
Sometimes

Steps to Reproduce:
1. Uhhh...run my large non-free multithreaded server
2. exit
3. OK so this is hard for you to reproduce.
If a glance at the code doesn't make the bug obvious, let me know and I'll see if I can write a small program to reproduce it. I feel really lazy asking you to look at the code instead of doing it myself, but I'm trying to put together a release in the next 2 hours and I don't have time. Maybe later today I'll see what I can find, if I can get this release done.

Actual Results:
Two threads got deadlocked.

Expected Results:
Threads should have just plain exited.

Additional info:
It would be wonderful to get efence working with threads. I think I have a bug that accesses memory after freeing it. I can't find it, and it happens very rarely (about once per month of server uptime), so I need to be able to run the production server with efence, and that isn't possible while the threat of this shutdown deadlock is there.
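
No reproducer was attached to this report. As a rough sketch only, a small program along the following lines exercises the same two code paths seen in the backtraces: free() called from ordinary code, and free() called as a thread-specific data destructor while a thread exits. The names run_shutdown_storm, worker, and tsd_key are made up for illustration, and whether this actually triggers the deadlock on an affected system is an assumption, not something confirmed in this report.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define MAX_THREADS 64

/* Hypothetical key; its destructor is free(), so free() runs while
 * the thread is exiting, as in frame #5 of the first backtrace. */
static pthread_key_t tsd_key;
static pthread_once_t tsd_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    pthread_key_create(&tsd_key, free);
}

static void *worker(void *arg)
{
    (void)arg;

    /* Per-thread buffer released by the key destructor at thread exit. */
    void *tsd = malloc(64);
    if (tsd != NULL && pthread_setspecific(tsd_key, tsd) != 0)
        free(tsd);

    /* Ordinary allocate/free traffic, as in the second backtrace. */
    for (int i = 0; i < 100; i++) {
        char *buf = malloc(128);
        if (buf != NULL) {
            memset(buf, 0, 128);
            free(buf);
        }
    }
    return NULL; /* thread exit runs the destructor, which calls free() */
}

/* Starts up to nthreads workers and returns how many were joined. */
int run_shutdown_storm(int nthreads)
{
    pthread_t tids[MAX_THREADS];
    int started = 0, joined = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    pthread_once(&tsd_once, make_key);

    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[started], NULL, worker, NULL) == 0)
            started++;
    for (int i = 0; i < started; i++)
        if (pthread_join(tids[i], NULL) == 0)
            joined++;
    return joined;
}
```

To try it against Electric Fence, call run_shutdown_storm() from main() in a loop and link the program with libefence and the pthread library. A run that never returns from pthread_join would be consistent with the hang described above.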
Created attachment 89771
Patch to efence.c to fix the deadlock problem.

The problem is that efence calls sem_wait, which deadlocks inside pthread_exit. The attached patch to efence.c adds the option of using a pthread_mutex rather than a semaphore. Note: you *must* compile with -DUSE_MUTEX instead of -DUSE_SEMAPHORE. I've mailed the same patch to Bruce Perens. So far, the patch has solved my problem, which was identical to the one reported in this bug.
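
The attachment itself is not reproduced in this thread. The general shape of such a change, selecting a pthread mutex or a semaphore at compile time behind one lock/unlock pair, might look like the sketch below. Only the USE_MUTEX and USE_SEMAPHORE macro names come from the comment above; alloc_lock, alloc_unlock, guarded_counter_run, and the default to USE_MUTEX when neither macro is defined are assumptions made for illustration and are not taken from efence.c or from the patch.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* Assumption for this sketch: default to the mutex if neither is chosen. */
#if !defined(USE_MUTEX) && !defined(USE_SEMAPHORE)
#define USE_MUTEX 1
#endif

#ifdef USE_MUTEX
static pthread_mutex_t alloc_mutex = PTHREAD_MUTEX_INITIALIZER;

static void alloc_lock(void)   { pthread_mutex_lock(&alloc_mutex); }
static void alloc_unlock(void) { pthread_mutex_unlock(&alloc_mutex); }
#else
static sem_t alloc_sem;
static pthread_once_t alloc_sem_once = PTHREAD_ONCE_INIT;

static void alloc_sem_init(void) { sem_init(&alloc_sem, 0, 1); }

static void alloc_lock(void)
{
    pthread_once(&alloc_sem_once, alloc_sem_init);
    /* Retry if the wait is interrupted by a signal. */
    while (sem_wait(&alloc_sem) == -1 && errno == EINTR)
        ;
}
static void alloc_unlock(void) { sem_post(&alloc_sem); }
#endif

/* Small exerciser: several threads bump a counter under the lock. */
static long counter;

static void *bump(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        alloc_lock();
        counter++;
        alloc_unlock();
    }
    return NULL;
}

/* Returns the final counter value after all threads finish. */
long guarded_counter_run(int nthreads, long iters)
{
    pthread_t tids[32];
    int started = 0;

    if (nthreads > 32)
        nthreads = 32;
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[started], NULL, bump, &iters) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

Keeping both variants behind the same two functions means the allocator's callers do not change; only the compile-time flag does.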
Note: My patch was applied on a Red Hat Linux 8.0 box. The bug appears to be present on all releases from 7.1 to 8.0.
Red Hat Linux is no longer supported by Red Hat, Inc. If you are still running Red Hat Linux, you are strongly advised to upgrade to a current Fedora Core release or Red Hat Enterprise Linux or comparable. Some information on which option may be right for you is available at http://www.redhat.com/rhel/migrate/redhatlinux/. Red Hat apologizes that these issues have not been resolved yet. We do want to make sure that no important bugs slip through the cracks. Please check if this issue is still present in a current Fedora Core release. If so, please change the product and version to match, and check the box indicating that the requested information has been provided. Note that any bug still open against Red Hat Linux will be closed as 'CANTFIX' on September 30, 2006. Thanks again for your help.
Closing as CANTFIX.