Description of problem:

Hello, I've got a strange bug: a POSIX lock is left in the system after the owning process terminates. The lock also blocks all other processes that try to obtain a POSIX lock on the same file. Removing the file does not release the lock. Attempting to read /proc/locks crashes the system.

After some debugging I found that filp->f_count at the end of the sys_fcntl64 function is 1(???), so the subsequent call to fput releases the file structure but leaves the POSIX lock behind. I added a check for the FL_POSIX flag to locks_remove_flock and the problem stopped.

This bug happens very often in my environment, but the user-mode setup is a multi-process/multi-threaded application, so I have no clue how to reduce it to a simple test program.

Environment: Enterprise Linux 3, IBM xSeries345, Pentium4 3000 x2, 2GB RAM
Created attachment 109163 [details]
Patch to remove a stale POSIX flock

Here is the patch I've used.
Hello,

Here is another issue related to this bug: if a process that is blocked by the stale POSIX lock gets SIGINT (Ctrl-C from the terminal), it may crash. Here is the oops message:

CPU:    0
EIP:    0060:[<8016582f>]    Tainted: PF
EFLAGS: 00010282
EIP is at __fput [kernel] 0xf (2.4.21-20cpsmp/i686)
eax: f72ffc00   ebx: f72ffc00   ecx: 8667fce4   edx: f72ffc00
esi: 00000000   edi: f6337c80   ebp: 00000000   esp: f1e83f10
ds: 0068   es: 0068   ss: 0068
Process awk (pid: 9572, stackpage=f1e83000)
Stack: 00000296 8667fce4 836fe580 f72ffc00 f6337c80 835801bc 80165817 8017ce72
       836fe580 00000000 00000000 836fe5cc f6fb2e80 00000003 f6337c80 00000000
       80163a97 f6fb2e80 f6337c80 00000001 00000003 f6337c80 00000001 8012d71c
Call Trace:
[<80165817>] fput [kernel] 0x17 (0xf1e83f28)
[<8017ce72>] locks_remove_posix [kernel] 0x132 (0xf1e83f2c)
[<80163a97>] filp_close [kernel] 0x87 (0xf1e83f50)
[<8012d71c>] put_files_struct [kernel] 0x6c (0xf1e83f6c)
[<8012dfea>] do_exit [kernel] 0x1ba (0xf1e83f88)
[<8012e35b>] do_group_exit [kernel] 0x8b (0xf1e83fa4)
[<8012e3a3>] sys_exit_group [kernel] 0x13 (0xf1e83fb8)
Code: 8b 7d 08 89 04 24 e8 76 76 01 00 8b 4b 60 85 c9 0f 85 cd 00
This issue is already resolved in 2.6, in revision 1.63 of fs/locks.c: http://linux.bkbits.net:8080/linux-2.6/diffs/fs/locks.c@1.63?nav=index.html|src/|src/fs|hist/fs/locks.c
We're seeing this same kernel crash. We have a multithreaded application that uses file locking over NFS. Within 24 hours the RHE3-u3-4 kernels will panic with traces similar to the one above, always referencing "locks_remove_posix". Applying the patch in comment #1 prevents the kernel panic, although the error message added by the patch shows up frequently. We request that an official patch be rolled into U5.
The patch does not solve the underlying problem. It is only a recovery step applied after the file counter has become inconsistent.
Is there any known way to reproduce this situation? A reproducible testcase sure would be a help.
Unfortunately I don't have a simple testcase for this crash. You can ask Linux if he has one: he has fixed this bug in 2.6
... well, it's Linus. My misspelling :)
*** Bug 157846 has been marked as a duplicate of this bug. ***
Removing from U5 blocker list.
Well, it's a shameful decision, but frankly, I wasn't expecting anything else. So, does somebody at RH support have the balls to admit that this bug won't be solved in EL3's lifetime?
A fix for this problem has just been committed to the RHEL3 U6 patch pool this evening (in kernel version 2.4.21-32.6.EL).
Created attachment 115031 [details]
Testcase to reproduce the problem

This is a testcase which shows the dangling lock problem by reproducing the fcntl/close race.
Created attachment 115032 [details]
Patch to remove locks when the close/fcntl race is detected

These changes detect the fcntl/close race and correctly release the lock which was just acquired.

Created attachment 115033 [details]
Patch to remove locks when the close/fcntl race is detected

These changes detect the fcntl/close race and correctly release the lock which was just acquired.
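To make the race easier to picture, here is a toy user-space model of it. This is not the attached patch and not kernel code: every type and function name below is hypothetical, and the model assumes the race works roughly as the comments in this report suggest, namely that one thread looks up the file for a descriptor, another thread closes that descriptor, and the first thread then attaches its lock to a file nobody will clean up. The recheck flag models the idea of detecting the race and releasing the lock that was just acquired.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FDS 4

/* Toy stand-ins for kernel objects; all names are hypothetical. */
struct toy_inode { int posix_locks; };              /* locks attached to the inode */
struct toy_file  { struct toy_inode *inode; };
struct toy_task  { struct toy_file *fds[MAX_FDS]; };

/* Simplified close(): drop the table entry, then remove the POSIX locks. */
static void toy_close(struct toy_task *t, int fd)
{
    struct toy_file *filp = t->fds[fd];
    if (filp == NULL)
        return;
    t->fds[fd] = NULL;
    filp->inode->posix_locks = 0;
}

/* First half of a lock request: resolve the descriptor to a file. */
static struct toy_file *toy_setlk_lookup(struct toy_task *t, int fd)
{
    return t->fds[fd];
}

/* Second half: attach the lock, optionally re-checking the descriptor. */
static void toy_setlk_apply(struct toy_task *t, int fd,
                            struct toy_file *filp, bool recheck)
{
    filp->inode->posix_locks++;            /* lock is now on the inode */
    if (recheck && t->fds[fd] != filp)     /* fd was closed in the meantime */
        filp->inode->posix_locks--;        /* release the lock just acquired */
}

/* Plays the racy interleaving; returns the number of locks left dangling. */
int toy_race(bool recheck)
{
    struct toy_inode inode = { 0 };
    struct toy_file file = { &inode };
    struct toy_task task = { { NULL } };
    task.fds[0] = &file;

    struct toy_file *filp = toy_setlk_lookup(&task, 0);  /* thread A enters fcntl */
    toy_close(&task, 0);                                 /* thread B closes the fd */
    toy_setlk_apply(&task, 0, filp, recheck);            /* thread A attaches lock */
    return inode.posix_locks;
}
```

In this model toy_race(false) leaves one lock behind after the descriptor is gone, while toy_race(true) notices that the descriptor no longer maps to the same file and leaves none. The real kernel paths involve reference counting and locking that the model deliberately ignores.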
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-663.html