Bug 54368 - Race condition in Electric Fence and Pthreads
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: ElectricFence
Version: 7.1
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Petr Machata
QA Contact:
URL: None
Whiteboard:
Depends On:
Blocks:
Reported: 2001-10-04 21:54 UTC by William Shubert
Modified: 2015-05-05 01:32 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2006-10-18 14:26:00 UTC
Embargoed:


Attachments
Patch to efence.c to fix the deadlock problem. (2.03 KB, patch)
2003-02-02 01:04 UTC, simra

Description William Shubert 2001-10-04 21:54:06 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2.1) Gecko/20010901

Description of problem:
There seems to be a race condition between Electric Fence and pthreads. I
have a large multithreaded application that, during shutdown, has many
threads exiting and many calls to "free" all at once. In about 1 in 5
shutdowns, two threads deadlock. gdb shows that one thread is stuck here:

(gdb) where
#0  0x400cf8a5 in __sigsuspend (set=0x41b228ec)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1  0x400920d9 in __pthread_wait_for_restart_signal (self=0x41b22c00)
    at pthread.c:934
#2  0x400930ac in __new_sem_wait (sem=0x400a173c) at restart.h:34
#3  0x4009f179 in lock () from /usr/lib/libefence.so.0
#4  0x4009fa10 in free () from /usr/lib/libefence.so.0
#5  0x40092bf3 in __pthread_destroy_specifics () at specific.c:165
#6  0x4008f2d7 in pthread_exit (retval=0x0) at join.c:37
#7  0x4008fc05 in pthread_start_thread (arg=0x41b22c00) at manager.c:265

And another is:

(gdb) where
#0  0x400cf8a5 in __sigsuspend (set=0x4132260c)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1  0x400920d9 in __pthread_wait_for_restart_signal (self=0x41322c00)
    at pthread.c:934
#2  0x400930ac in __new_sem_wait (sem=0x400a173c) at restart.h:34
#3  0x4009f179 in lock () from /usr/lib/libefence.so.0
#4  0x4009fa10 in free () from /usr/lib/libefence.so.0
#5  0x08085064 in wms_free (deadbuf=0x45d1cff0) at wms.c:109

(wms_free() is my function that calls free.) All other threads exit fine;
only these two are stuck. Both are waiting on the same semaphore
(sem=0x400a173c) in efence's lock(), so I'm guessing there is some kind of
bad interaction when one thread is exiting just as another is freeing
memory. If you have trouble reproducing this, let me know.
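
The pattern in the two backtraces can be sketched as a small C program. This
is only an illustrative sketch, not the reporter's code and not a confirmed
reproducer: every thread stores a malloc'd block as thread-specific data, so
that free() runs from the key destructor while the thread is exiting, and the
same threads also call free() directly. The names run_shutdown_sketch,
worker, destroy_specific and NTHREADS are hypothetical. Whether it actually
hangs when linked against the affected libefence is an assumption that has
not been verified.

```c
#include <pthread.h>
#include <stdlib.h>

#define NTHREADS 16                 /* hypothetical thread count */

static pthread_key_t key;           /* slot for thread-specific data */

/* Key destructor: called while the thread is exiting, which is where
   free() appears in frame #5 of the first backtrace. */
static void destroy_specific(void *p)
{
    free(p);
}

static void *worker(void *arg)
{
    (void)arg;
    /* If malloc fails the value stays NULL and no destructor runs. */
    pthread_setspecific(key, malloc(64));

    /* Direct free() calls, like wms_free() in the second backtrace. */
    for (int i = 0; i < 1000; i++)
        free(malloc(32));

    return NULL;                    /* thread exit runs destroy_specific */
}

/* Starts NTHREADS workers and returns how many were joined, or -1 if
   the key could not be created. */
int run_shutdown_sketch(void)
{
    pthread_t threads[NTHREADS];
    int created = 0;
    int joined = 0;

    if (pthread_key_create(&key, destroy_specific) != 0)
        return -1;

    for (int i = 0; i < NTHREADS; i++) {
        if (pthread_create(&threads[created], NULL, worker, NULL) != 0)
            break;
        created++;
    }

    for (int i = 0; i < created; i++)
        if (pthread_join(threads[i], NULL) == 0)
            joined++;

    pthread_key_delete(key);
    return joined;
}
```

With a working allocator, run_shutdown_sketch() should return NTHREADS. A
hang in pthread_join would correspond to the deadlock described above.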

Version-Release number of selected component (if applicable):
ElectricFence-2.2.2-7

How reproducible:
Sometimes

Steps to Reproduce:
1. Uhhh... run my large non-free multithreaded server.
2. Exit.
3. OK, so this is hard for you to reproduce. If a glance at the code doesn't
make the bug obvious, let me know and I'll see if I can write a small program
to reproduce it. I feel really lazy telling you to look at the code instead
of doing it myself. Really, I would, but I'm trying to put together a release
in the next 2 hours and I don't have time. Maybe later today, once the
release is done, I'll see what I can find.

Actual Results:  Two threads got deadlocked.

Expected Results:  Threads should have just plain exited.

Additional info:

It would be wonderful to get efence working with threads. I think I have a
bug that accesses memory after freeing it. I can't find it, and it happens
very rarely (about once per month of server uptime), so I need to be able to
run the production server with efence, and that isn't possible while the
threat of this shutdown deadlock is there.

Comment 1 simra 2003-02-02 01:04:50 UTC
Created attachment 89771
Patch to efence.c to fix the deadlock problem.

The problem is that efence calls sem_wait, which deadlocks inside
pthread_exit. The attached patch to efence.c gives the option of using a
pthread_mutex rather than a semaphore.  Note: you *must* compile with
-DUSE_MUTEX instead of -DUSE_SEMAPHORE.

I've mailed the same patch to Bruce Perens.  So far, the patch has solved my
problem, which was identical to that of the bug reported.
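
The attached patch itself is not reproduced in this report. As a rough
sketch of the idea described above, a compile-time choice between a
semaphore and a pthread mutex could look like the following. Only the
USE_MUTEX and USE_SEMAPHORE macro names come from the comment; ef_lock,
ef_unlock, run_lock_sketch, the default to USE_MUTEX and the counter demo
are hypothetical and are not taken from efence.c.

```c
#include <pthread.h>
#include <semaphore.h>

/* Assumption for this sketch: fall back to the mutex if neither
   -DUSE_MUTEX nor -DUSE_SEMAPHORE is given on the command line. */
#if !defined(USE_MUTEX) && !defined(USE_SEMAPHORE)
#define USE_MUTEX
#endif

#ifdef USE_MUTEX
static pthread_mutex_t ef_mutex = PTHREAD_MUTEX_INITIALIZER;
#else
static sem_t ef_sem;
static pthread_once_t ef_once = PTHREAD_ONCE_INIT;
static void ef_sem_setup(void) { sem_init(&ef_sem, 0, 1); }
#endif

/* Hypothetical stand-in for the allocator's internal lock. */
static void ef_lock(void)
{
#ifdef USE_MUTEX
    pthread_mutex_lock(&ef_mutex);
#else
    pthread_once(&ef_once, ef_sem_setup);
    while (sem_wait(&ef_sem) != 0)
        ;                           /* retry if interrupted by a signal */
#endif
}

static void ef_unlock(void)
{
#ifdef USE_MUTEX
    pthread_mutex_unlock(&ef_mutex);
#else
    sem_post(&ef_sem);
#endif
}

/* Demo only: several threads bump a counter under the lock. */
static long counter;

static void *bump(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        ef_lock();
        counter++;
        ef_unlock();
    }
    return NULL;
}

/* Runs up to 16 threads and returns the final counter value. */
long run_lock_sketch(int nthreads)
{
    pthread_t threads[16];
    int created = 0;

    if (nthreads > 16)
        nthreads = 16;
    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&threads[created], NULL, bump, NULL) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    return counter;
}
```

Both variants serialize the critical section in the same way. The
difference reported in this bug is only in how the semaphore path behaved
inside pthread_exit on the affected releases.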

Comment 2 simra 2003-02-02 01:08:14 UTC
Note: My patch was applied on a Red Hat 8.0 box. The bug appears to be
present on all releases from 7.1 to 8.0.


Comment 3 Bill Nottingham 2006-08-07 17:22:07 UTC
Red Hat Linux is no longer supported by Red Hat, Inc. If you are still
running Red Hat Linux, you are strongly advised to upgrade to a
current Fedora Core release or Red Hat Enterprise Linux or comparable.
Some information on which option may be right for you is available at
http://www.redhat.com/rhel/migrate/redhatlinux/.

Red Hat apologizes that these issues have not been resolved yet. We do
want to make sure that no important bugs slip through the cracks.
Please check if this issue is still present in a current Fedora Core
release. If so, please change the product and version to match, and
check the box indicating that the requested information has been
provided. Note that any bug still open against Red Hat Linux will be
closed as 'CANTFIX' on September 30, 2006. Thanks again for your help.


Comment 4 Bill Nottingham 2006-10-18 14:26:00 UTC
Red Hat Linux is no longer supported by Red Hat, Inc. If you are still
running Red Hat Linux, you are strongly advised to upgrade to a
current Fedora Core release or Red Hat Enterprise Linux or comparable.
Some information on which option may be right for you is available at
http://www.redhat.com/rhel/migrate/redhatlinux/.

Closing as CANTFIX.

