Description of problem:
When you try two flocks, one after the other from the same process, with different file descriptors on the same file, gfs2 trips the kernel BUG at fs/gfs2/glock.c:1118!

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Run the test program flucker.c like ./flucker /mnt/gfs2/foo
2. Boom

Actual results:
Boom

Expected results:
No Boom.

Additional info:
ext3 behaves as expected:
  SH followed by SH - granted
  SH followed by EX - EAGAIN
  EX followed by SH - EAGAIN
  EX followed by EX - EAGAIN
gfs2 trips this assert in all the above cases.

Stack trace:
original: gfs2_flock+0x16a/0x1e9 [gfs2]
new: gfs2_flock+0x16a/0x1e9 [gfs2]
------------[ cut here ]------------
kernel BUG at fs/gfs2/glock.c:1118!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /fs/gfs2/niobe:gfs2/lock_module/block
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth lock_dlm gfs2 dlm configfs sunrpc ipv6 video sbs backlight i2c_ec button battery asus_acpi ac lp ata_piix libata sg floppy ide_cd parport_pc parport cdrom i2c_i810 i2c_algo_bit i2c_i801 i2c_core pcspkr e1000 dm_snapshot dm_zero dm_mirror dm_mod qla2xxx scsi_transport_fc sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<e0d1efb4>]    Not tainted VLI
EFLAGS: 00010246   (2.6.18-44.gfs2abhi.003 #1)
EIP is at gfs2_glock_nq+0xe2/0x184 [gfs2]
eax: 00000020   ebx: d2e4f854   ecx: e0d34090   edx: d12b7ed8
esi: d28bcb14   edi: d0e39980   ebp: d0e39980   esp: d12b7ed4
ds: 007b   es: 007b   ss: 0068
Process flucker (pid: 2553, ti=d12b7000 task=d08cf000 task.ti=d12b7000)
Stack: e0d34090 00000006 00000001 e0d34083 000009f9 e0d34090 00000006 00000001
       e0d34083 000009f9 d0e1c000 00000000 00000000 d28bcb14 de6786c0 00000001
       e0d2732c d28bcb14 dfdb15fc 00000006 d040a9d4 d28bcb04 de6786c0 00000000
Call Trace:
 [<e0d2732c>] gfs2_flock+0x17a/0x1e9 [gfs2]
 [<c046c4a6>] cache_alloc_refill+0x14b/0x44f
 [<c044aba1>] audit_syscall_entry+0x11c/0x14e
 [<e0d271b2>] gfs2_flock+0x0/0x1e9 [gfs2]
 [<c0482ca1>] sys_flock+0x114/0x147
 [<c0404eff>] syscall_call+0x7/0xb
=======================
Code: df 8b 56 20 b8 b3 40 d3 e0 e8 4e 08 72 df ff 76 0c 68 83 40 d3 e0 e8 44 78 70 df ff 77 20 ff 77 14 68 90 40 d3 e0 e8 34 78 70 df <0f> 0b 5e 04 87 3e d3 e0 83 c4 28 8b 5e 0c 8d 4f 48 8b 47 48 eb
EIP: [<e0d1efb4>] gfs2_glock_nq+0xe2/0x184 [gfs2] SS:ESP 0068:d12b7ed4
<0>Kernel panic - not syncing: Fatal exception
Created attachment 183661 [details] Program to reproduce the problem.
Created attachment 193221 [details]
Initial patch

There are two scenarios when doing multiple flocks from the same process:

a) flocks through a single file descriptor. One fd means the same struct file* and the same holder structure for all flocks.

b) flocks through multiple file descriptors. Each fd has a different holder structure.

This patch adds a new function, gfs2_flock_glock_nq(), that is almost identical to gfs2_glock_nq(). It does the list_add from add_to_queue() but does not perform the checks that disallow the same process from queueing multiple holders onto a glock. We need this because of scenario (b), where it is legitimate for multiple flocks to come from the same process through multiple file descriptors.

In scenario (a), when a process requests the second flock through the same file descriptor, we dequeue the first flock, reinitialize the holder with the new flock, and enqueue. In scenario (b), when a process requests the second flock through another file descriptor, we need to find the glock (held by the first flock) and queue another holder (corresponding to the second file descriptor). This goes through gfs2_flock_glock_nq(), which doesn't trip BUG()s when it's the same process requesting the glocks.

Existing problems that this patch doesn't fix:

1) With gfs2, ctrl-c will not break out of a process that is blocked waiting for an flock. So, if a single-threaded process takes a SH flock and then requests a blocking EX flock, it will block. Since the SH flock can't be unlocked, we have a deadlock. If the process had two threads, one for each flock, things go smoothly when the first thread unlocks the SH flock. I'm not sure how this case can be handled, or whether it's acceptable to deadlock when a user's rogue program attempts such a thing.

2) When one process requests promotion or demotion of an flock through the same file descriptor (scenario (a) above) - SH followed by EX, or EX followed by SH - we currently unlock, reinitialize the holder, and relock. There's a race window between the unlock and the relock where another process/node can capture the lock. I don't know whether LM_FLAG_PRIORITY would help; ideally we would have an atomic operation to promote or demote an flock. This bz does not cover that issue, but I have a gfs1 bz that does.
Created attachment 194051 [details]
Second attempt

This patch adds a new flag, GL_FLOCK, to the gfs2_holder structure. It is set on holders of glocks representing flocks. The flag is checked in add_to_queue(), and a process is permitted to queue more than one holder onto a glock when it is set. I'm in the middle of testing this patch and will update this bz with my results.
That patch looks much better I think.
http://post-office.corp.redhat.com/archives/rhkernel-list/2007-September/msg00320.html Posted the rhel5.1 version of this patch to rhkernel-list. Marking this bz POST.
Fixed in 2.6.18-48.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0959.html