Bug 138384 - posix_locks_deadlock() getting stuck in endless loop
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Hardware: All  OS: Linux
Priority: medium  Severity: high
Assigned To: Frank Hirtz
Brian Brock
Reported: 2004-11-08 14:32 EST by David Lehman
Modified: 2007-11-30 17:07 EST
Doc Type: Bug Fix
Last Closed: 2004-12-20 15:56:56 EST
Description David Lehman 2004-11-08 14:32:41 EST
*** This bug has been split off bug 132540 ***

Description of problem:

posix_locks_deadlock() gets stuck in an endless loop when running 
samba stress.  Samba uses both flocks and posix locks.  When a flock 
is blocked, it is added to blocked_list without first being checked 
for possible deadlocks by posix_locks_deadlock(), whereas every posix 
lock request is checked for possible deadlocks *before* it is added 
to blocked_list.

When there is a circular dependency in blocked_list, 
posix_locks_deadlock() follows that circle and never returns.
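
The failure mode can be sketched in user space.  This is a simplified 
model, not the kernel code: struct wait_edge, walk_wait_chain() and the 
max_steps bound are hypothetical names introduced for illustration, and 
owners are plain integers.  The walk follows "waiter is blocked by 
blocker" edges starting from the lock that blocks the caller.  It 
terminates if it reaches the caller (deadlock) or runs off the end of the 
chain (no deadlock), but if the list already holds a cycle that does not 
include the caller, it would circle forever; the step bound here stands 
in for that hang.

```c
#include <assert.h>
#include <stddef.h>

/* One entry of a simplified blocked list: `waiter` is blocked by `blocker`.
 * Owners are plain ints here; this is an illustrative model only. */
struct wait_edge {
    int waiter;
    int blocker;
};

enum walk_result {
    WALK_NO_DEADLOCK = 0,   /* chain ended without reaching the caller */
    WALK_DEADLOCK    = 1,   /* chain leads back to the caller */
    WALK_STUCK       = -1   /* step bound hit: an unbounded walk would spin */
};

/* Follow the wait chain starting at `blocker`, looking for `caller`.
 * `max_steps` is an assumption added so the sketch always terminates. */
static int walk_wait_chain(const struct wait_edge *list, size_t n,
                           int caller, int blocker, unsigned max_steps)
{
    int current = blocker;

    assert(list != NULL || n == 0);

    for (unsigned step = 0; step < max_steps; step++) {
        size_t i;

        if (current == caller)
            return WALK_DEADLOCK;

        /* Is `current` itself waiting on someone?  If so, move on to them. */
        for (i = 0; i < n; i++) {
            if (list[i].waiter == current) {
                current = list[i].blocker;
                break;
            }
        }
        if (i == n)
            return WALK_NO_DEADLOCK;    /* `current` is not blocked */
    }
    return WALK_STUCK;
}
```

With entries {1 waits on 2, 2 waits on 1} already in the list (as two 
unchecked flock waiters could produce) and owner 3 asking about blocker 
1, the walk alternates between 1 and 2 and only stops because of the 
bound.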

The fix is to not add flock requests to blocked_list when they are 
blocked.  blocked_list is only used to check for possible deadlocks, 
which should only be done for posix locks.  Here's a patch:

--- locks.c.orig	2004-09-14 11:12:26.000000000 -0500
+++ locks.c	2004-09-14 11:13:32.000000000 -0500
@@ -459,7 +459,8 @@ static void locks_insert_block(struct fi
 	list_add_tail(&waiter->fl_block, &blocker->fl_block);
 	waiter->fl_next = blocker;
-	list_add(&waiter->fl_link, &blocked_list);
+	if (IS_POSIX(blocker))
+		list_add(&waiter->fl_link, &blocked_list);
 /* Wake up processes blocked waiting for blocker.
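
The effect of the patch can be illustrated with a small user-space 
model.  Everything here is an assumption made for the sketch: struct 
lock_req, insert_block(), and the fixed-size blocked[] array are 
hypothetical stand-ins for the kernel's struct file_lock, 
locks_insert_block(), and blocked_list.  As in the patch, the waiter is 
always linked to its blocker, but it is recorded in the global list only 
when the blocker is a posix lock.

```c
#include <assert.h>
#include <stddef.h>

enum lock_kind { KIND_POSIX, KIND_FLOCK };

/* Simplified stand-in for a lock request (not the kernel structure). */
struct lock_req {
    enum lock_kind kind;
    int owner;
    const struct lock_req *waiting_on;  /* lock this request is blocked by */
};

#define MAX_BLOCKED 16

/* Model of the global list consulted by the deadlock check. */
static const struct lock_req *blocked[MAX_BLOCKED];
static size_t n_blocked;

/* Link `waiter` to `blocker`; record it globally only for posix blockers,
 * mirroring the IS_POSIX(blocker) test added by the patch. */
static void insert_block(const struct lock_req *blocker,
                         struct lock_req *waiter)
{
    waiter->waiting_on = blocker;
    if (blocker->kind == KIND_POSIX) {
        assert(n_blocked < MAX_BLOCKED);
        blocked[n_blocked++] = waiter;
    }
}
```

In this model a blocked flock request still waits on its blocker, but it 
never appears in the list that the deadlock walk scans, so unchecked 
flock waiters cannot introduce a cycle there.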

Version-Release number of selected component (if applicable):

How reproducible:
Reproducible every time with the right setup, but the right setup is 
hard to achieve.

Steps to Reproduce:

1. Connect a SCSI drive and a USB hard drive to a server.
2. Share the SCSI drive, the USB drive, and a RAM drive with samba.
3. Connect 30 clients to the server and have each run 3 threads of 
   network stress, one against each of the three samba shares.
4. The system fails in an hour or so.

Actual results:

The system hangs, but SysRq is generally still functional.  SysRq shows 
that one task is stuck in posix_locks_deadlock(), while others are 
waiting for the big kernel lock, which is held by the task in 
posix_locks_deadlock().

Expected results:

The system should continue to run without problems.

Additional info:

------- Additional comment by Jason Baron on 2004.10.21 14:48 -------

This patch is included in the latest beta of RHEL4.
Comment 2 Rik van Riel 2004-11-08 14:40:37 EST
Why are you reporting a RHEL3 bug for kernel 2.6.7-1.451.2.3 ?
Comment 3 David Lehman 2004-11-08 14:47:58 EST
My mistake. 

This bug was reported against kernel-2.4.21-20.EL.
Comment 6 Ernie Petrides 2004-11-08 14:56:06 EST
This problem was already fixed in U4 (on 22-Sep-2004 in kernel
version 2.4.21-20.10.EL).  Please verify fix in current U4 beta.
Comment 7 John Flanagan 2004-12-20 15:56:56 EST
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.