Bug 143823

Summary: [PATCH] Stale POSIX flock
Product: Red Hat Enterprise Linux 3
Reporter: Garik E <kiragon>
Component: kernel
Assignee: Peter Staubach <staubach>
Status: CLOSED ERRATA
QA Contact:
Severity: high
Docs Contact:
Priority: medium
Version: 3.0
CC: bnocera, jbaron, kiragon, peterm, petrides, riel, tao
Target Milestone: ---
Target Release: ---
Hardware: i686
OS: Linux
Whiteboard:
Fixed In Version: RHSA-2005-663
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-09-28 14:40:29 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 156320    
Attachments:
Description Flags
Patch to remove a stale POSIX flock none
Testcase to reproduce the problem none
Patch to remove locks when the close/fcntl race is detected none
Patch to remove locks when the close/fcntl race is detected none

Description Garik E 2004-12-29 04:57:44 UTC
Description of problem:
Hello,

I've hit a strange bug:
A POSIX lock is left in the system after the owning process terminates.
The lock also blocks all other processes that try to obtain a POSIX
flock on the same file. Removing the file does not release the lock.
Attempting to print /proc/locks crashes the system.

After some debugging I found that filp->f_count at the end of the
sys_fcntl64 function is 1 (???), so the subsequent call to fput releases
the file structure but leaves the POSIX lock behind.
I added a check for the FL_POSIX flag to locks_remove_flock and the
problem stopped.
This bug happens very often in my environment, but the user-mode setup is
a multi-process, multi-threaded application, so I have no idea how to
reduce it to a simple test program.


I am running Enterprise Linux 3 on an IBM xSeries 345, Pentium 4 3000 x2, 2 GB RAM.
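
For reference, the behaviour that the stale lock violates can be checked from user space. The following is only a minimal sketch, not code from this report: the function name lock_released_after_exit and the file path passed to it are made up for illustration. A child process takes an exclusive POSIX lock with fcntl(F_SETLK) and exits without unlocking; on a correctly working kernel the parent can then take the same lock, because POSIX locks are dropped when their owner exits.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to take a non-blocking exclusive POSIX lock on the whole file. */
static int try_write_lock(int fd)
{
    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                     /* 0 means "up to end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

/*
 * Hypothetical helper: returns 1 if the lock is free again after the
 * child that held it has exited, 0 if it is still held (the stale-lock
 * symptom described above), and -1 on a setup error.
 */
int lock_released_after_exit(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: take the lock, then exit without unlocking. */
        _exit(try_write_lock(fd) == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }

    /* The parent is a different lock owner, so this succeeds only if
     * the kernel dropped the child's lock when the child exited. */
    int released = (try_write_lock(fd) == 0);
    close(fd);
    return released;
}
```

Calling lock_released_after_exit("/tmp/some-scratch-file") should return 1 on a healthy kernel. This simple case does not trigger the bug by itself; it only shows the expected semantics.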

Comment 1 Garik E 2004-12-29 08:37:20 UTC
Created attachment 109163 [details]
Patch to remove a stale POSIX flock 

here is a patch I've used

Comment 3 Garik E 2005-01-04 05:08:49 UTC
Hello,

Here is another symptom of this bug:
If a process that is blocked by the stale POSIX lock receives SIGINT (Ctrl-C from
the terminal), it may crash. Here is the oops message:
CPU:    0                                                                       
EIP:    0060:[<8016582f>]    Tainted: PF                                        
EFLAGS: 00010282                                                                
                                                                                
EIP is at __fput [kernel] 0xf (2.4.21-20cpsmp/i686)                             
eax: f72ffc00   ebx: f72ffc00   ecx: 8667fce4   edx: f72ffc00                   
esi: 00000000   edi: f6337c80   ebp: 00000000   esp: f1e83f10                   
ds: 0068   es: 0068   ss: 0068                                                  
Process awk (pid: 9572, stackpage=f1e83000)                                     
Stack: 00000296 8667fce4 836fe580 f72ffc00 f6337c80 835801bc 80165817 8017ce72  
       836fe580 00000000 00000000 836fe5cc f6fb2e80 00000003 f6337c80 00000000  
       80163a97 f6fb2e80 f6337c80 00000001 00000003 f6337c80 00000001 8012d71c  
Call Trace:                                                                     
[<80165817>] fput [kernel] 0x17 (0xf1e83f28)                                    
[<8017ce72>] locks_remove_posix [kernel] 0x132 (0xf1e83f2c)                     
[<80163a97>] filp_close [kernel] 0x87 (0xf1e83f50)                              
[<8012d71c>] put_files_struct [kernel] 0x6c (0xf1e83f6c)                        
[<8012dfea>] do_exit [kernel] 0x1ba (0xf1e83f88)                                
[<8012e35b>] do_group_exit [kernel] 0x8b (0xf1e83fa4)                           
[<8012e3a3>] sys_exit_group [kernel] 0x13 (0xf1e83fb8)                          
                                                                                
Code:  8b 7d 08 89 04 24 e8 76 76 01 00 8b 4b 60 85 c9 0f 85 cd 00

Comment 4 Garik E 2005-01-04 05:22:39 UTC
This issue is already resolved in 2.6, in revision 1.63 of fs/locks.c:
http://linux.bkbits.net:8080/linux-2.6/diffs/fs/locks.c@1.63?nav=index.html|src/|src/fs|hist/fs/locks.c




Comment 5 D. Michaud 2005-02-16 15:24:30 UTC
We're seeing this same kernel crash. We have a multithreaded
application that uses file locking over NFS. Within 24 hours the
RHE3-u3-4 kernels will panic with traces similar to the one above, always
referencing "locks_remove_posix".

Applying the patch in comment #1 prevents the kernel panic, although
the error message added by the patch shows up frequently.

Request that an official patch be rolled into U5.

Comment 6 Garik E 2005-02-17 10:12:06 UTC
The patch does not solve the underlying problem.
It is only a recovery step applied after the file reference count becomes inconsistent.
 

Comment 10 Peter Staubach 2005-05-11 15:13:29 UTC
Is there any known way to reproduce this situation?  A reproducible
testcase sure would be a help.

Comment 14 Garik E 2005-05-16 04:19:15 UTC
Unfortunately I don't have a simple testcase for this crash. You can ask Linux
if he has one: he has fixed this bug in 2.6

Comment 15 Garik E 2005-05-16 04:21:45 UTC
... well, it's Linus.
My misspelling :)

Comment 16 Peter Staubach 2005-05-16 14:33:52 UTC
*** Bug 157846 has been marked as a duplicate of this bug. ***

Comment 17 Ernie Petrides 2005-05-24 22:46:22 UTC
Removing from U5 blocker list.

Comment 18 Garik E 2005-05-25 04:19:09 UTC
Well, it's a shameful decision, but frankly, I wasn't expecting anything else.
So, does somebody at RH support have the balls to admit that this bug won't be
solved in the EL3 lifetime?


Comment 19 Ernie Petrides 2005-06-01 00:33:31 UTC
A fix for this problem has just been committed to the RHEL3 U6
patch pool this evening (in kernel version 2.4.21-32.6.EL).


Comment 20 Peter Staubach 2005-06-01 15:03:07 UTC
Created attachment 115031 [details]
Testcase to reproduce the problem

This is a testcase which shows the dangling lock problem by reproducing
the fcntl/close race.
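
The general shape of such a race can be sketched in user space as follows. This is not the attached testcase (its contents are not shown in this report), only an illustration under assumptions: the names locker, closer, lock_is_held and count_stale_locks are hypothetical. One thread requests a POSIX lock with fcntl(F_SETLK) while a second thread closes the same descriptor; afterwards a child process, which is a different lock owner, uses F_GETLK to see whether a lock was left behind.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_fd;   /* descriptor that both threads operate on */

/* Thread A: request an exclusive POSIX lock; may fail if close() wins. */
static void *locker(void *arg)
{
    (void)arg;
    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fcntl(shared_fd, F_SETLK, &fl);   /* result intentionally ignored */
    return NULL;
}

/* Thread B: close the same descriptor concurrently. */
static void *closer(void *arg)
{
    (void)arg;
    close(shared_fd);
    return NULL;
}

/* Ask from a child process (a different lock owner) whether any lock
 * is held on the file: 1 = held, 0 = free, -1 = error. */
static int lock_is_held(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        struct flock fl = {0};
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        if (fd < 0 || fcntl(fd, F_GETLK, &fl) < 0)
            _exit(2);
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}

/* Run the race `rounds` times; return how many rounds left a lock
 * behind after the descriptor was closed, or -1 on error. */
int count_stale_locks(const char *path, int rounds)
{
    assert(path != NULL);
    int stale = 0;
    for (int i = 0; i < rounds; i++) {
        shared_fd = open(path, O_RDWR | O_CREAT, 0600);
        if (shared_fd < 0)
            return -1;

        pthread_t a, b;
        if (pthread_create(&a, NULL, locker, NULL) != 0) {
            close(shared_fd);
            return -1;
        }
        if (pthread_create(&b, NULL, closer, NULL) != 0) {
            pthread_join(a, NULL);
            close(shared_fd);
            return -1;
        }
        pthread_join(a, NULL);
        pthread_join(b, NULL);

        int held = lock_is_held(path);
        if (held < 0)
            return -1;
        stale += held;
    }
    return stale;
}
```

On a kernel that handles the race correctly, count_stale_locks should return 0 regardless of which thread wins. A non-zero count would indicate a lock that outlived its descriptor. The race window is narrow, so many rounds may be needed on an affected kernel, and the program has to be built with thread support (for example with -pthread).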

Comment 21 Peter Staubach 2005-06-01 15:11:39 UTC
Created attachment 115032 [details]
Patch to remove locks when the close/fcntl race is detected

These changes detect the fcntl/close race and correctly release the lock
which was just acquired.

Comment 22 Peter Staubach 2005-06-01 15:12:05 UTC
Created attachment 115033 [details]
Patch to remove locks when the close/fcntl race is detected

These changes detect the fcntl/close race and correctly release the lock
which was just acquired.

Comment 24 Peter Staubach 2005-07-21 19:58:06 UTC
*** Bug 157846 has been marked as a duplicate of this bug. ***

Comment 28 Red Hat Bugzilla 2005-09-28 14:40:29 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-663.html