Bug 138384 - posix_locks_deadlock() getting stuck in endless loop
Summary: posix_locks_deadlock() getting stuck in endless loop
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Frank Hirtz
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-11-08 19:32 UTC by David Lehman
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-12-20 20:56:56 UTC
Target Upstream Version:
Embargoed:



Links
System: Red Hat Product Errata
ID: RHBA-2004:550
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 4
Last Updated: 2004-12-20 05:00:00 UTC

Description David Lehman 2004-11-08 19:32:41 UTC
*** This bug has been split off bug 132540 ***

Description of problem:

posix_locks_deadlock() gets stuck in an endless loop when running a 
Samba stress test.  Samba uses both flocks and POSIX locks.  When a 
flock is blocked, it is added to blocked_list without first being 
checked for a possible deadlock by posix_locks_deadlock(), whereas 
every POSIX lock request is checked for a possible deadlock *before* 
it is added to blocked_list.

Once blocked_list contains a circular dependency, 
posix_locks_deadlock() gets stuck following that circle.
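
To make the failure mode concrete, here is a minimal user-space sketch of 
the wait-for walk.  It is not the kernel code: struct waiter, 
deadlock_walk() and the max_steps cap are hypothetical names introduced 
for illustration, and owners are reduced to plain integers.  The kernel 
loop has no step cap, which is why a circle that does not include the 
caller is never left.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an entry on blocked_list:
 * the owner of a blocked request and the owner it is waiting for. */
struct waiter {
	int owner;          /* owner of the blocked request */
	int blocker_owner;  /* owner of the lock it waits on */
};

/* Follow the wait-for chain starting at blocker_owner.
 * Returns  1 if the chain leads back to caller_owner (real deadlock),
 *          0 if the chain ends (no deadlock),
 *         -1 if max_steps hops pass without either outcome.
 * max_steps is an assumption of this sketch; without it the -1 case
 * would simply never return. */
int deadlock_walk(const struct waiter *blocked, size_t n,
		  int caller_owner, int blocker_owner, int max_steps)
{
	int steps;

	for (steps = 0; steps < max_steps; steps++) {
		size_t i;
		int found = 0;

		if (blocker_owner == caller_owner)
			return 1;
		for (i = 0; i < n; i++) {
			if (blocked[i].owner == blocker_owner) {
				/* Hop to whoever this waiter is blocked on. */
				blocker_owner = blocked[i].blocker_owner;
				found = 1;
				break;
			}
		}
		if (!found)
			return 0;
	}
	return -1;
}
```

With entries {1 waits on 2} and {2 waits on 1} already on the list, a 
request from owner 3 that is blocked by owner 1 bounces between 1 and 2 
and never reaches either exit condition.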

The fix is to not add flock requests to blocked_list when they are 
blocked.  blocked_list is only used to check for possible deadlocks, 
which should only be done for posix locks.  Here's a patch:


--- locks.c.orig	2004-09-14 11:12:26.000000000 -0500
+++ locks.c	2004-09-14 11:13:32.000000000 -0500
@@ -459,7 +459,8 @@ static void locks_insert_block(struct fi
 	}
 	list_add_tail(&waiter->fl_block, &blocker->fl_block);
 	waiter->fl_next = blocker;
-	list_add(&waiter->fl_link, &blocked_list);
+	if (IS_POSIX(blocker))
+		list_add(&waiter->fl_link, &blocked_list);
 }
 
 /* Wake up processes blocked waiting for blocker.
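
The effect of the patch can be sketched in user space as follows.  This 
is an illustration only: enum lock_kind, struct blocked_list, 
MAX_BLOCKED and insert_block() are hypothetical names, not kernel 
identifiers, and the per-blocker wait queue that the real 
locks_insert_block() also maintains is left out.

```c
#include <stddef.h>

/* Hypothetical tags standing in for the kernel's lock-type flags. */
enum lock_kind { LOCK_POSIX, LOCK_FLOCK };

#define MAX_BLOCKED 16  /* arbitrary capacity for this sketch */

/* Stand-in for the global list used only for deadlock detection. */
struct blocked_list {
	int owners[MAX_BLOCKED];  /* owners of the recorded waiters */
	size_t count;
};

/* Mirrors the idea of the patched insertion: a waiter is recorded for
 * deadlock detection only when its blocker is a POSIX lock.
 * Returns 1 if the waiter was recorded, 0 otherwise. */
int insert_block(struct blocked_list *bl, enum lock_kind blocker_kind,
		 int waiter_owner)
{
	if (blocker_kind != LOCK_POSIX)
		return 0;  /* flock waiters stay off the list */
	if (bl->count >= MAX_BLOCKED)
		return 0;  /* sketch-only limit; not present in the kernel */
	bl->owners[bl->count++] = waiter_owner;
	return 1;
}
```

Because flock waiters never reach the list, only requests that have 
already passed the deadlock check can appear on it.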



Version-Release number of selected component (if applicable):
2.6.7-1.451.2.3

How reproducible:
reproducible every time with the right setup, but it's hard to get 
the right setup.

Steps to Reproduce:

1. Connect a SCSI drive and a USB hard drive to a server.
2. Share the SCSI drive, the USB drive, and a RAM drive with Samba.
3. Connect 30 clients to the server and make each run 3 threads of 
   network stress, one to each of the three Samba shares.
4. The system fails in an hour or so.

Actual results:

System hangs, but SysRq is generally still functional.  SysRq shows 
that one task is stuck in posix_locks_deadlock(), while others are 
waiting for the big kernel lock that is held by 
posix_locks_deadlock().


Expected results:

system should continue to run without problems.

Additional info:

------- Additional comment by Jason Baron on 2004.10.21 14:48 -------

This patch is included in the latest beta of RHEL4.

Comment 2 Rik van Riel 2004-11-08 19:40:37 UTC
Why are you reporting a RHEL3 bug for kernel 2.6.7-1.451.2.3?

Comment 3 David Lehman 2004-11-08 19:47:58 UTC
My mistake. 

This bug was reported against kernel-2.4.21-20.EL.


Comment 6 Ernie Petrides 2004-11-08 19:56:06 UTC
This problem was already fixed in U4 (on 22-Sep-2004 in kernel
version 2.4.21-20.10.EL).  Please verify fix in current U4 beta.


Comment 7 John Flanagan 2004-12-20 20:56:56 UTC
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html


