*** This bug has been split off bug 132540 ***

Description of problem:
posix_locks_deadlock() is getting stuck in an endless loop when running samba stress. This is because samba is using both flocks and posix locks, and, when flocks are blocked, they are added to the blocked_list without first checking for possible deadlocks with the function posix_locks_deadlock()--whereas all posix lock requests are checked for possible deadlocks *before* they are added to blocked_list. When there is a circular dependency in blocked_list, posix_locks_deadlock() gets stuck in that circle.

The fix is to not add flock requests to blocked_list when they are blocked. blocked_list is only used to check for possible deadlocks, which should only be done for posix locks. Here's a patch:

--- locks.c.orig	2004-09-14 11:12:26.000000000 -0500
+++ locks.c	2004-09-14 11:13:32.000000000 -0500
@@ -459,7 +459,8 @@ static void locks_insert_block(struct fi
 	}
 	list_add_tail(&waiter->fl_block, &blocker->fl_block);
 	waiter->fl_next = blocker;
-	list_add(&waiter->fl_link, &blocked_list);
+	if (IS_POSIX(blocker))
+		list_add(&waiter->fl_link, &blocked_list);
 }
 
 /* Wake up processes blocked waiting for blocker.

Version-Release number of selected component (if applicable):
2.6.7-1.451.2.3

How reproducible:
Reproducible every time with the right setup, but it's hard to get the right setup.

Steps to Reproduce:
1. Connect a SCSI drive and a USB hard drive to a server.
2. Share the SCSI drive, the USB drive, and a RAM drive with samba.
3. Connect 30 clients to the server, and make each run 3 threads of network stress--one to each of the three samba shares.
4. System fails in an hour or so.

Actual results:
System hangs, but SysRq is generally still functional. SysRq shows that one task is stuck in posix_locks_deadlock(), while others are waiting for the big kernel lock that is held by posix_locks_deadlock().

Expected results:
System should continue to run without problems.
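The endless loop described above can be illustrated with a small userspace model. This is only a sketch, not the kernel code: lock owners are reduced to plain integers, blocked_list becomes an array of waiter/blocker pairs, and the names wait_edge, model_deadlock_walk and STEP_LIMIT are hypothetical. The walk follows "who is my blocker waiting on" until it reaches the caller (deadlock) or runs out of entries (no deadlock). A step limit, which the modelled walk would not otherwise have, stands in for the hang so that the example terminates.

```c
/* Userspace model only; all names here are hypothetical, not kernel API. */

#define STEP_LIMIT 1000 /* assumption: added so the model terminates */

/* One entry of the modelled blocked list: `waiter` is blocked on `blocker`. */
struct wait_edge {
    int waiter;
    int blocker;
};

/*
 * Follow the chain of blockers starting from `blocker`.
 * Returns  1 if the chain leads back to `caller` (a deadlock),
 *          0 if the chain ends without reaching `caller`,
 *         -1 if the walk exceeds STEP_LIMIT, which in this model means
 *            it is circling a cycle that does not contain `caller`.
 */
static int model_deadlock_walk(const struct wait_edge *blocked, int n,
                               int caller, int blocker)
{
    int steps = 0;
    int i;

next_owner:
    if (blocker == caller)
        return 1;
    if (++steps > STEP_LIMIT)
        return -1;
    for (i = 0; i < n; i++) {
        if (blocked[i].waiter == blocker) {
            /* The current blocker is itself waiting; follow it. */
            blocker = blocked[i].blocker;
            goto next_owner;
        }
    }
    return 0;
}
```

With a list in which owners 2 and 3 wait on each other (entries that were never checked before being added, as with the blocked flocks in this report), a call for caller 1 blocked by owner 2 returns -1: the walk never reaches the caller and never runs out of entries. With the 3-waits-on-2 entry removed, the same call returns 0. Keeping unchecked entries out of the list, as the patch does for flocks, removes the source of such cycles.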
Additional info:

------- Additional comment by Jason Baron on 2004.10.21 14:48 -------
This patch is included in the latest beta of RHEL4.
Why are you reporting a RHEL3 bug for kernel 2.6.7-1.451.2.3?
My mistake. This bug was reported against kernel-2.4.21-20.EL.
This problem was already fixed in U4 (on 22-Sep-2004 in kernel version 2.4.21-20.10.EL). Please verify fix in current U4 beta.
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-550.html