This is the BUG on RHEL4 somewhat mangled by netconsole - will try to get a cleaner version:

------------[ cut here ]------------
kernel BUG at fs/locks.c:1798!
invalid operand: 0000 [#1]
SMP
CPU:    0
Modules linked in: nfs lockd nfs_acl md5 ipv6 parport_pc lp parport netconsole netdump autofs4 sunrpc dm_multipath usb_storage button battery ac uhci_hcd ehci_hcd hw_random snd_azx snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc tg3 qla2xxx scsi_transport_fc dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ata_piix libata sd_mod scsi_mod
Process locker (pid: 4908, threadinfo=f6256000 task=f686a8b0)
EIP:    0060:[<c016efc4>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9-55.ELsmp)
EIP is at locks_remove_flock+0xa1/0xe1
eax: f6b3612c   ebx: f6e414b4   ecx: 00000000   edx: 00000081
esi: 00000000   edi: f6e4140c   ebp: f67a7380   esp: f6256f28
ds: 007b   es: 007b   ss: 0068
Stack: 00000007 f67a7380 00000003 00000003 f667f00c 00000286 00000046 f667f00c
       c01fd679 00000000 00000286 0000001e c01fe418 0000001e 0000001e 00000000
       b7f57d42 c0202c45 f667f00c 00000001 00000000 00000246 00000000 f67a7380
Pid: 4908, comm: locker
EIP: 0060:[<c016efc4>] CPU: 0
EIP is at locks_remove_flock+0xa1/0xe1
EFLAGS: 00010246    Not tainted  (2.6.9-55.ELsmp)
EAX: f6b3612c EBX: f6e414b4 ECX: 00000000 EDX: 00000081
ESI: 00000000 EDI: f6e4140c EBP: f67a7380
DS: 007b ES: 007b
CR0: 8005003b CR2: b7fa8000 CR3: 36fb6fa0 CR4: 000006f0
Code: 38 39 68 2c 75 2d 0f b6 50 30 f6 c2 02 74 09 89 d8 e8 9f df ff ff eb 1d f6 c2 20 74 0e ba 02 00 00 00 89 d8 e8 ba ec ff ff eb 0a <0f> 0b 06 07 33 ad 2e c0 89 c3 8b 03 eb c4 b8 00 f0 ff ff 21 e0
Call Trace:
 [<c015bd82>] __fput+0x41/0x100
 [<c016ab21>] sys_fcntl64+0x79/0x80
 [<c02d5ee3>] syscall_call+0x7/0xb
This is a problem with how the RHEL4/5 kernels handle POSIX locks. It is possible for an fcntl call to race with a close on the same file descriptor. A similar problem was fixed recently but this appears to be a separate issue (reproduced on 2.6.18-92.1.6.el5).

Acquiring a POSIX lock via fcntl goes like this:

    sys_fcntl()
      fget()
      do_fcntl()
        fcntl_setlk()
      fput()
        if (!count)
          __fput()
            locks_remove_flock()

The fcntl_setlk() call can potentially block for long durations (SETLKW waiting, slow/distant lockd), allowing another thread in the same process to e.g. close the file descriptor. And a close looks like this:

    sys_close()
      filp_close()
        locks_remove_posix()
        fput()
          if (!count)
            __fput()
              locks_remove_flock()

If we close the file descriptor from one thread while another is still blocked in fcntl_setlk, but before the lock has been granted (and before the struct file_lock is placed on the inode's i_flock list), the call to locks_remove_posix in the close path will "miss" the POSIX lock. We also fail to call locks_remove_flock at this point since the thread in fcntl_setlk is still holding a reference on the file. This causes the final fput from the sys_fcntl return path to trigger a BUG when __fput calls locks_remove_flock:

    /*
     * This function is called on the last close of an open file.
     */
    void locks_remove_flock(struct file *filp)
    {
        [...]
        while ((fl = *before) != NULL) {
            if (fl->fl_file == filp) {
                if (IS_FLOCK(fl)) {
                    locks_delete_lock(before);
                    continue;
                }
                if (IS_LEASE(fl)) {
                    lease_modify(before, F_UNLCK);
                    continue;
                }
                if (IS_POSIX(fl))
                    continue;
                /* What? */
                BUG();    <----
            }
            before = &fl->fl_next;
        }
        unlock_kernel();
    }
Discussing this with Al Viro this appears similar to a problem fixed in our trees back in May. 2.6.18-92.1.6.el5 should contain that fix however so this appears to be something different.
Created attachment 312392 [details] Lock a file with fcntl/POSIX locking. Lock a file with fcntl/POSIX locking.
I can reliably reproduce this with the code in comment #5 and comment #6.

Steps to reproduce:

1. Mount a writable NFS volume from a remote host with nfslock running.
2. Compile the two test programs:
   gcc -pthread -Wall -o locker locker.c
   gcc -Wall -o lockit lockit.c
3. In one terminal start lockit running on a file on the NFS volume:
   # lockit /mnt/test
4. In a 2nd terminal start locker running on the same file:
   # locker /mnt/test
5. Terminate lockit (CTRL-C)

Actual results:

# ./locker /mnt/locktest
pid 4907 main launching thread
pid 4907 thread 4908 in do_lock
pid 4907 thread 4908 locking
Read from remote host p380-1.gsslab: Connection reset by peer
Connection to p380-1.gsslab closed.

Host died with BUG() in comment #2

Expected results:

No bug.

Additional information:

Should be possible to trigger this with just one program but I found having something already locking the file and then using SETLKW seemed to make it easier to trigger.
Reproducible locally: same steps as in comment #7 but the NFS volume is exported locally from the test machine:

kernel BUG at fs/locks.c:1798!
invalid operand: 0000 [#1]
SMP
Modules linked in: netconsole netdump loop nfs nfsd exportfs lockd nfs_acl md5 ipv6 parport_pc lp parport autofs4 sunrpc dm_multipath usb_storage button battery ac uhci_hcd ehci_hcd hw_random snd_azx snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc tg3 qla2xxx scsi_transport_fc dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ata_piix libata sd_mod scsi_mod
CPU:    0
EIP:    0060:[<c016efc4>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9-55.ELsmp)
EIP is at locks_remove_flock+0xa1/0xe1
eax: f69068ac   ebx: f61294b4   ecx: 00000000   edx: 00000081
esi: 00000000   edi: f612940c   ebp: f6e4d180   esp: f62a2f28
ds: 007b   es: 007b   ss: 0068
Process locker (pid: 5333, threadinfo=f62a2000 task=f6d61230)
Stack: 00000007 f6e4d180 00000003 f62a2f5c c011cd42 00000286 f6f81680 f6f81430
       f6ed5c80 f7e3d980 f6f8159c f5f9cec4 c02d3d56 f62a2fbc c02d3d86 00000000
       b7fb5d42 f6f81430 f6f81430 00000001 00000000 00000246 00000000 f6e4d180
Call Trace:
 [<c011cd42>] recalc_task_prio+0x128/0x133
 [<c02d3d56>] schedule+0x84e/0x8ec
 [<c02d3d86>] schedule+0x87e/0x8ec
 [<c015bd82>] __fput+0x41/0x100
 [<c016ab21>] sys_fcntl64+0x79/0x80
 [<c02d5ee3>] syscall_call+0x7/0xb
Code: 38 39 68 2c 75 2d 0f b6 50 30 f6 c2 02 74 09 89 d8 e8 9f df ff ff eb 1d f6 c2 20 74 0e ba 02 00 00 00 89 d8 e8 ba ec ff ff eb 0a <0f> 0b 06 07 33 ad 2e c0 89 c3 8b 03 eb c4 b8 00 f0 ff ff 21 e0
Created attachment 312399 [details] log of BUG in comment #8
This seems to be the same problem discussed here: http://lkml.org/lkml/2006/2/17/135 And this was also reported back in 2004: http://lkml.org/lkml/2004/10/1/116 Chris Wright provided a workaround patch for testing: http://lkml.org/lkml/2004/10/1/276 The patch adds a check in the return path from sys_fcntl that the current entry in the file table for our fd matches the file we're holding.
Created attachment 312401 [details] Patch from http://lkml.org/lkml/2004/10/1/276
Discussed this with Al Viro, who mentioned linux-2.6-fs-race-condition-in-dnotify.patch (and bug 443440 / bug 439759), which fixes a similar fcntl/close race (I can't see the bugs, but the patch header describes a reordering bug updating i_flock). I can't seem to trigger this on 2.6.9-78.EL with the steps in comment #7, but I think that might be down to the test case. This can be triggered using those steps on 2.6.18-92.1.6.el5, which does contain this patch.
Created attachment 312402 [details] reproducer - lock file in one thread and close in another Slightly less stupid test case. Still doesn't seem to trigger it on 2.6.9-78.EL so it may already be fixed there?
Used gcc-4.1.2-42.el5, and it complained "locker.c:13: error: expected declaration specifiers or '...' before 'gettid'". Modified locker.c as follows:

13  /* _syscall0(pid_t,gettid); */
14  pid_t gettid(void)
15  {
16      return syscall(__NR_gettid);
17  }
So I tested this on 2.6.18-92.el5, and given the test scenario in comment #7, I managed to trigger the problem after numerous tries. intel-s5000phb-01.rhts.bos.redhat.com login: ------------[ cut here ]------------ kernel BUG at fs/locks.c:2054! invalid opcode: 0000 [#1] SMP last sysfs file: /devices/pci0000:00/0000:00:02.0/0000:01:00.0/0000:02:02.0/0000:07:00.1/irq Modules linked in: nfs fscache nfsd exportfs lockd nfs_acl auth_rpcgss autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api dm_multipath video sbs backlight i2c_ec button battery asus_acpi ac parport_pc lp parport sg i5000_edac i2c_i801 ide_cd edac_mc cdrom i2c_core pcspkr e1000e serio_raw dm_snapshot dm_zero dm_mirror dm_mod ata_piix libata mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd CPU: 3 EIP: 0060:[<c0482513>] Not tainted VLI EFLAGS: 00010246 (2.6.18-92.el5 #1) EIP is at locks_remove_flock+0xc1/0xde eax: f6e5c41c ebx: d88a89f0 ecx: 00000286 edx: 00000000 esi: f7b44cc0 edi: f7b44cc0 ebp: d88a8950 esp: da0fdf18 ds: 007b es: 007b ss: 0068 Process locker2 (pid: 16672, ti=da0fd000 task=c1a1b000 task.ti=da0fd000) Stack: 00000000 00000000 00000000 00000000 00000000 f7b44cc0 0000411f 00000000 00000000 00000000 f7b44cc0 00000212 00000000 00000000 ffffffff 7fffffff 00000000 00000000 00000000 00000000 00000000 db1375a0 00100100 00200200 Call Trace: [<c047189b>] __fput+0x88/0x167 [<c047fdf6>] sys_fcntl64+0x79/0x80 [<c0404eff>] syscall_call+0x7/0xb ======================= Code: 34 39 70 28 75 2d 0f b6 50 2c f6 c2 02 74 09 89 d8 e8 b7 fe ff ff eb 1d 80 e2 20 74 0e ba 02 00 00 00 89 d8 e8 12 ff ff ff eb 0a <0f> 0b 06 08 6d f4 62 c0 89 c3 8b 03 85 c0 75 c6 e8 12 71 18 00 EIP: [<c0482513>] locks_remove_flock+0xc1/0xde SS:ESP 0068:da0fdf18 <0>Kernel panic - not syncing: Fatal exception
Bryn, I managed to trigger the problem on rhel-4 with kernel-2.6.9-78.ELsmp, so it's not fixed. I am going to redo the test with the latest kernel to make sure. dell-pesc420-01.rhts.bos.redhat.com login: ------------[ cut here ]------------ kernel BUG at fs/locks.c:1823! invalid operand: 0000 [#1] SMP Modules linked in: nfs lockd nfs_acl md5 ipv6 parport_pc lp parport autofs4 sunrpc cpufreq_powersave loop button battery ac uhci_hcd ehci_hcd tg3 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ata_piix libata sd_mod scsi_mod CPU: 0 EIP: 0060:[<c0170a30>] Not tainted VLI EFLAGS: 00010246 (2.6.9-78.ELsmp) EIP is at locks_remove_flock+0xa1/0xe1 eax: c2def78c ebx: f6320214 ecx: 00000000 edx: 00000081 esi: 00000000 edi: f632016c ebp: f6807a80 esp: d1253f24 ds: 007b es: 007b ss: 0068 Process locker2 (pid: 21124, threadinfo=d1253000 task=f71ce630) Stack: f632016c 00000007 f6807a80 00000003 00000003 f426700c 00000286 00000046 f426700c c02014f4 00000000 00000286 00000020 c0202299 00000020 00000020 00000000 b7fb7d44 c0206b2f f426700c 00000001 00000246 00000000 f6807a80 Call Trace: [<c02014f4>] tty_ldisc_deref+0x50/0x5f [<c0202299>] tty_write+0x258/0x262 [<c0206b2f>] write_chan+0x0/0x1d0 [<c015d39a>] __fput+0x45/0x11a [<c016c4ed>] sys_fcntl64+0x79/0x80 [<c02e09db>] syscall_call+0x7/0xb Code: 38 39 68 2c 75 2d 0f b6 50 30 f6 c2 02 74 09 89 d8 e8 ff de ff ff eb 1d f6 c2 20 74 0e ba 02 00 00 00 89 d8 e8 72 ec ff ff eb 0a <0f> 0b 1f 07 ec 5a 2f c0 89 c3 8b 03 eb c4 b8 00 f0 ff ff 21 e0 <0>Fatal exception: panic in 5 seconds Kernel panic - not syncing: Fatal exception
Eugene mentioned the revised test case was also failing on git HEAD with an EBADF from locker. I guess I'll have to add real synchronisation between the threads... (it's hitting the close before the fcntl).
Created attachment 314636 [details] Patch to check for fcntl/close race and clean up POSIX locks Chris's original patch from LKML ported to RHEL4 & fcheck.
Al, the patch in comment #27 is still not pretty but it does seem like it should stop the BUG(). Any feedback on this? I see some not dissimilar checks in related code, but I don't have enough experience with this part of the kernel to say whether this is viable.
I don't believe that it's going to help. That patch is equivalent to what's *already* done in locks.c. Do we have it reproduced on mainline kernel?
We seem to have some positive testing results with the patch in comment #27.
Eugene - I was about to post this to LKML but I realised we don't actually have testing on current upstream (my last test was from git about a month ago & I've long thrown that kernel away). I have a build from last night's git ready to go on x86_64 but you/Harald seem to be the ones who can trigger this.. I'll give it some testing here and see if I can get it to go but failing that would be great if you or Harald can try it. Harald's latest results are for 78.0.1.EL vs. 78.0.1.bz456282.EL (just the patch from comment #27). He can reliably trigger the BUG on the former but has been unable to trigger it on 78.0.1.bz456282.EL. I'm looking into a console message he reported with only the patched kernel now: nlmclnt_lock: VFS is out of sync with lock manager! Will update with further information when I have it.
Output from a kernel with the lockdump patch applied: Oct 24 10:56:23 [...network console startup...] Oct 24 10:56:44 192.168.122.67 fs/locks.c: invalid struct file_lock at c1950530 Oct 24 10:56:44 192.168.122.67 { Oct 24 10:56:44 192.168.122.67 fl_next 00000000 Oct 24 10:56:44 192.168.122.67 fl_link (next=c19503e4, prev=c037358c) Oct 24 10:56:44 192.168.122.67 fl_block (next=c195053c, prev=c195053c) Oct 24 10:56:44 192.168.122.67 fl_owner e5e01900 Oct 24 10:56:44 192.168.122.67 fl_pid 6232 Oct 24 10:56:44 192.168.122.67 fl_file e5b7e340 Oct 24 10:56:44 192.168.122.67 fl_flags 0x81 Oct 24 10:56:44 192.168.122.67 fl_type F_WRLCK Oct 24 10:56:44 192.168.122.67 fl_start 0 Oct 24 10:56:44 192.168.122.67 fl_end 0 Oct 24 10:56:44 192.168.122.67 fl_fasync 00000000 Oct 24 10:56:44 192.168.122.67 fl_break_time 0 Oct 24 10:56:44 192.168.122.67 fl_ops f8b12a8c [] Oct 24 10:56:44 192.168.122.67 fl_lmops 00000000 [] Oct 24 10:56:44 192.168.122.67 fl_u.nfs_fl { state = 0, flags = 0x0 (), owner = dcd72ee0 } Oct 24 10:56:44 192.168.122.67 } Oct 24 10:56:44 ------------[ cut here ]------------ Oct 24 10:56:44 kernel BUG at fs/locks.c:1871! 
Oct 24 10:56:44 invalid operand: 0000 [#1] Oct 24 10:56:44 Modules linked in: nfs lockd nfs_acl netconsole parport_pc lp parport autofs4 sunrpc ib_srp scsi_mod ib_sdp ib_ipoib md5 ipv6 rdma_ucm rdma_cm ib_local_sa iw_cm ib_addr ib_umad ib_ucm ib_uverbs ib_cm ib_sa ib_mad ib_core loop button battery ac uhci_hcd 8139cp mii floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod Oct 24 10:56:44 CPU: 0 Oct 24 10:56:44 EIP: 0060:[<c018653b>] Not tainted VLI Oct 24 10:56:44 EFLAGS: 00010292 (2.6.9-67.0.15.0.bz456282.bmr0.EL) Oct 24 10:56:44 EIP is at locks_remove_flock+0x11e/0x1b9 Oct 24 10:56:44 eax: 0000004b ebx: c1950530 ecx: c0333136 edx: dae9cee0 Oct 24 10:56:45 esi: daed2bf8 edi: daed2b20 ebp: e5b7e340 esp: dae9cf18 Oct 24 10:56:45 ds: 007b es: 007b ss: 0068 Oct 24 10:56:45 Process locker (pid: 6233, threadinfo=dae9c000 task=dae9a2b0) Oct 24 10:56:45 Stack: 00000003 fffffe00 c0185b4b daed2b20 00000007 e5b7e340 00000003 00000000 Oct 24 10:56:45 192.168.122.67 00000046 f631100c 00000286 00000000 c022d46d 00000286 0000001e f6311000 Oct 24 10:56:45 192.168.122.67 c022e843 0000001e b7f5cd42 c0234c2a f631100c 0000001e b7f5cd24 00000001 Oct 24 10:56:45 Call Trace: Oct 24 10:56:45 192.168.122.67 [<c0185b4b>] fcntl_setlk+0x3e2/0x3ec Oct 24 10:56:45 192.168.122.67 [<c022d46d>] tty_ldisc_deref+0xc8/0x139 Oct 24 10:56:45 192.168.122.67 [<c022e843>] tty_write+0x381/0x38b Oct 24 10:56:45 192.168.122.67 [<c0234c2a>] write_chan+0x0/0x1d5 Oct 24 10:56:45 192.168.122.67 [<c016e066>] __fput+0x41/0xee Oct 24 10:56:45 192.168.122.67 [<c018145e>] sys_fcntl64+0x79/0x80 Oct 24 10:56:45 192.168.122.67 [<c031e103>] syscall_call+0x7/0xb Oct 24 10:56:45 Code: 32 0f b6 43 40 a8 02 74 09 89 f0 e8 02 d5 ff ff eb 23 a8 20 74 0e ba 02 00 00 00 89 f0 e8 b0 e5 ff ff eb 11 89 d8 e8 a7 fd ff ff <0f> 0b 4f 07 fa 2e 33 c0 89 de 8b 1e eb bf b8 00 f0 ff ff 21 e0 Oct 24 10:56:45 192.168.122.67 <0>Fatal exception: panic in 5 seconds Oct 24 10:56:49 Kernel panic - not syncing: Fatal exception
fl_flags 0x81 == FL_POSIX | FL_SLEEP
Another crash with the debug kernel. This time I added calls to dump_struct_lock_info() inside locks_remove_posix as well. The lock that we end up BUG()ging on was not removed by locks_remove_posix (despite having fl_flags == 0x81):

Oct 24 16:58:14 locks_remove_posix: NOT removing lock at c194ebc0
Oct 24 16:58:14 fs/locks.c: dumping struct file_lock at c194ebc0
Oct 24 16:58:14 {
Oct 24 16:58:14   fl_next 00000000
Oct 24 16:58:14   fl_link (next=c194e224, prev=c037458c)
Oct 24 16:58:14   fl_block (next=c194ebcc, prev=c194ebcc)
Oct 24 16:58:15   fl_owner e6700200
Oct 24 16:58:15   fl_pid 3928
Oct 24 16:58:15   fl_file e4ae9400
Oct 24 16:58:15   fl_flags 0x81
Oct 24 16:58:15   fl_type F_RDLCK
Oct 24 16:58:15   fl_start 4
Oct 24 16:58:15   fl_end 4
Oct 24 16:58:15   fl_fasync 00000000
Oct 24 16:58:15   fl_break_time 0
Oct 24 16:58:15   fl_ops 00000000 []
Oct 24 16:58:15   fl_lmops 00000000 []
Oct 24 16:58:15   fl_u.nfs_fl { state = 0, flags = 0x0 (n/a), owner = 00000000 }
Oct 24 16:58:15 }

[...]

Oct 24 16:59:06 fs/locks.c:locks_remove_flock() invalid file_lock
Oct 24 16:59:06 fs/locks.c: dumping struct file_lock at c194ebc0
Oct 24 16:59:06 {
Oct 24 16:59:06   fl_next 00000000
Oct 24 16:59:06   fl_link (next=c194e224, prev=c037458c)
Oct 24 16:59:06   fl_block (next=c194ebcc, prev=c194ebcc)
Oct 24 16:59:06   fl_owner e6700040
Oct 24 16:59:06   fl_pid 4647
Oct 24 16:59:06   fl_file e40623c0
Oct 24 16:59:06   fl_flags 0x81
Oct 24 16:59:06   fl_type F_WRLCK
Oct 24 16:59:06   fl_start 0
Oct 24 16:59:06   fl_end 0
Oct 24 16:59:06   fl_fasync 00000000
Oct 24 16:59:06   fl_break_time 0
Oct 24 16:59:06   fl_ops f8b12a8c []
Oct 24 16:59:06   fl_lmops 00000000 []
Oct 24 16:59:06   fl_u.nfs_fl { state = 0, flags = 0x0 (n/a), owner = e455c4e0 }
Oct 24 16:59:06 }
Oct 24 16:59:06 ------------[ cut here ]------------
Oct 24 16:59:06 kernel BUG at fs/locks.c:1876!
Oct 24 16:59:06 invalid operand: 0000 [#1]
Oct 24 16:59:06 Modules linked in: nfs lockd nfs_acl netconsole parport_pc lp parport autofs4 sunrpc ib_srp scsi_mod ib_sdp ib_ipoib md5 ipv6 rdma_ucm rdma_cm ib_local_sa iw_cm ib_addr ib_umad ib_ucm ib_uverbs ib_cm ib_sa ib_mad ib_core loop button battery ac uhci_hcd 8139cp mii floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod
Oct 24 16:59:07 CPU: 0
Oct 24 16:59:07 EIP: 0060:[<c018656b>] Not tainted VLI
Oct 24 16:59:07 EFLAGS: 00010286 (2.6.9-67.0.15.0.bz456282.bmr1.EL)
Oct 24 16:59:07 EIP is at locks_remove_flock+0x128/0x1c4
Oct 24 16:59:07 eax: 0000004f ebx: c194ebc0 ecx: c033315a edx: dae2fedc
Oct 24 16:59:07 esi: db5f15a0 edi: db5f14c8 ebp: e40623c0 esp: dae2ff14
Oct 24 16:59:07 ds: 007b es: 007b ss: 0068
Oct 24 16:59:07 Process locker (pid: 4653, threadinfo=dae2f000 task=dae8d1a0)
Oct 24 16:59:07 Stack: c03331ff 00000003 fffffe00 c0185b4b db5f14c8 00000007 e40623c0 00000003
Oct 24 16:59:07        00000000 00000046 e898000c 00000286 00000000 c022d49d 00000286 0000001e
Oct 24 16:59:07        e8980000 c022e873 eab64e10 dae2ff88 c011f43a e898000c 0000001e b7f03d24
Oct 24 16:59:07 Call Trace:
Oct 24 16:59:07  [<c0185b4b>] fcntl_setlk+0x3e2/0x3ec
Oct 24 16:59:07  [<c022d49d>] tty_ldisc_deref+0xc8/0x139
Oct 24 16:59:07  [<c022e873>] tty_write+0x381/0x38b
Oct 24 16:59:07  [<c011f43a>] recalc_task_prio+0x128/0x133
Oct 24 16:59:07  [<c016e066>] __fput+0x41/0xee
Oct 24 16:59:07  [<c018145e>] sys_fcntl64+0x79/0x80
Oct 24 16:59:07  [<c031e133>] syscall_call+0x7/0xb
Oct 24 16:59:07 Code: f0 e8 dc d4 ff ff eb 2e a8 20 74 0e ba 02 00 00 00 89 f0 e8 8a e5 ff ff eb 1c 68 ff 31 33 c0 e8 a9 e3 f9 ff 89 d8 e8 6d fb ff ff <0f> 0b 54 07 19 2f 33 c0 59 89 de 8b 1e eb b4 b8 00 f0 ff ff 21
Oct 24 16:59:07 <0>Fatal exception: panic in 5 seconds
Oct 24 16:59:11 Kernel panic - not syncing: Fatal exception

Since we know that IS_POSIX() for this lock will return true, this must have been because posix_same_owner(fl, &lock) for the two locks returned false:

	while (*before != NULL) {
		struct file_lock *fl = *before;
		if (IS_POSIX(fl) && posix_same_owner(fl, &lock)) {
			printk(KERN_WARNING "locks_remove_posix: removing lock at %p", fl);
			dump_struct_file_lock(fl);
			locks_delete_lock(before);
			continue;
		}
		printk(KERN_WARNING "locks_remove_posix: NOT removing lock at %p", fl);
		dump_struct_file_lock(fl);
		before = &fl->fl_next;
	}

Just trying to get a vmcore from one of these crashes so I can take a look at the state of the lock list in more detail.
So, I don't really understand this code, so please forgive my ignorance, but I don't see how this block can be allowed to skip cleaning up some POSIX locks:

	struct file_lock *fl = *before;
	if (IS_POSIX(fl) && posix_same_owner(fl, &lock)) {
		printk(KERN_WARNING "locks_remove_posix: removing lock at %p", fl);
		dump_struct_file_lock(fl);
		locks_delete_lock(before);
		continue;
	}

If we skip these locks because of the posix_same_owner() check then we *know* that we will BUG() on one of them when this fd is shut down by the final call to fput(). I guess I don't understand when it would be valid to return from sys_close() -> locks_remove_posix() leaving FL_POSIX locks on the list.
Because it also gets called from nfs4state.c:release_stateid() I guess
I've just tried to crash 78.0.1.0.bz456282 (see comment 38) again. No chance. best regards, Hari
(In reply to comment #59) > I've just tried to crash 78.0.1.0.bz456282 (see comment 38) again. No chance. Thanks Hari. Did you try to trigger it on the latest upstream kernel? Eugene
(In reply to comment #60) just tested with 2.6.27.4 on 32bit. No crash, but I noticed those messages: mount used greatest stack depth: 1824 bytes left locker used greatest stack depth: 1788 bytes left locker used greatest stack depth: 1732 bytes left locker used greatest stack depth: 1660 bytes left Hari
Those messages are generally harmless; each time a thread hits a new high-water (low-water?) mark, it prints this message. 1k+ is still a bit of room.... (though if this was on x86_64, with 8k of stack, that may be worrisome for x86, with only 4k)
test in comment #61 was done on a 32bit KVM instance.
*** Bug 464763 has been marked as a duplicate of this bug. ***
Copying over Uli's problem description from bug 464763: Comment #1 From Ulrich Obergfell (uobergfe) 2008-09-30 07:52:28 problem description ------------------- Kernel panics reproducibly with the following footprint: kernel BUG at fs/locks.c:1823! invalid operand: 0000 [#1] SMP Modules linked in: nfs nfsd exportfs lockd nfs_acl parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc loop button battery ac md5 ipv6 pcnet32 mii floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm _mod ata_piix libata mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod CPU: 1 EIP: 0060:[<c0170a30>] Not tainted VLI EFLAGS: 00010246 (2.6.9-78.ELsmp) EIP is at locks_remove_flock+0xa1/0xe1 eax: cf029c0c ebx: ca64936c ecx: 00000000 edx: 00000001 esi: 00000000 edi: ced6940c ebp: c6580ec0 esp: ce107f24 ds: 007b es: 007b ss: 0068 Process flocktest (pid: 15982, threadinfo=ce107000 task=c63ef6b0) Stack: ced6940c 00000006 c6580ec0 00000003 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000246 00000000 c6580ec0 Call Trace: [<c015d39a>] __fput+0x45/0x11a [<c016c4ed>] sys_fcntl64+0x79/0x80 [<c02e09db>] syscall_call+0x7/0xb Code: 38 39 68 2c 75 2d 0f b6 50 30 f6 c2 02 74 09 89 d8 e8 ff de ff ff eb 1d f6 c2 20 74 0e ba 02 00 00 00 89 d8 e8 72 ec ff ff eb 0a <0f> 0b 1f 07 d9 5a 2f c0 89 c3 8b 03 eb c4 b8 00 f0 ff ff 21 e0 <0>Fatal exception: panic in 5 seconds The multithreaded "flocktest" process opens files via NFS and uses the fcntl() system call to lock ranges of bytes in those files. 
The flow of execution through the fcntl() system call that is suspected to cause the problem is as follows:

sys_fcntl
    filp = fget(fd)                 increment f_count
    do_fcntl
    {
        switch (cmd)
        case F_SETLK:
          fcntl_setlk
          {
          again:
            // filp->f_op->lock
            nfs_lock
            {
              do_setlk
              {
                // NFS_PROTO(inode)->lock
                nfs3_proc_lock
                  nlmclnt_proc - nlmclnt_lock - nlmclnt_call - rpc_call_sync

                if ("NFS_PROTO(inode)->lock" returned -EINTR)
                  posix_lock_file_wait   Create a local FL_POSIX lock. ....
                                                                          :
                return (status from "NFS_PROTO(inode)->lock")             :
              }                                                           :
            }                                                             :
                                                                          :
            // Attempt to detect a close/fcntl race and recover by        :
            // releasing the lock that was just acquired.                 :
                                                                          :
            if ("do_setlk/nfs_lock" returned zero && file has been closed):
              set l_type = F_UNLCK                                        :
              goto again                                                  :
          }                                                               :
    }                                                                     :
                                                                          :
    fput(filp)                      decrement f_count                     :
    if (f_count is zero)                                                  :
      __fput                                                              :
        locks_remove_flock                                                :
          BUG() because the local FL_POSIX lock created by do_setlk() ....:
          still exists.

One thread in the process executes the fcntl() system call. sys_fcntl() calls fget() which increments the reference count "f_count" of the file structure. Let's assume that "f_count" is now equal to two. The thread subsequently issues an RPC call in nfs3_proc_lock() and gets blocked while waiting for a response from the NFS server.

Another thread in the process closes the file. filp_close() calls fput() which decrements "f_count" but does not call __fput() because the reference count is still greater than zero (there is a reference from sys_fcntl()).

The thread that is waiting for a response from the NFS server is interrupted by a signal, so "NFS_PROTO(inode)->lock" returns -EINTR and do_setlk() calls posix_lock_file_wait() in order to create a local FL_POSIX lock.
fcntl_setlk() executes the following code after "filp->f_op->lock" returns:

	spin_lock(&current->files->file_lock);
	f = fcheck(fd);
	spin_unlock(&current->files->file_lock);

	if (!error &&
	    ^^^^^^ is -EINTR
	    cmd != F_UNLCK &&
	    f != filp &&
	    flock.l_type != F_UNLCK) {
		flock.l_type = F_UNLCK;
		goto again;
	}

This fragment of code from fcntl_setlk() attempts to detect a close/fcntl race and recover by releasing the lock that was just acquired. However, since error is -EINTR in the scenario outlined above, the body of the "if" statement is not executed and thus the local FL_POSIX lock created by do_setlk() is not released. When sys_fcntl() subsequently calls fput(), "f_count" goes down to zero (no more references). locks_remove_flock() BUG()s because the local FL_POSIX lock created by do_setlk() still exists.
Comment #2 From Ulrich Obergfell (uobergfe) 2008-09-30 07:57:43 EDT

The problem exists in kernel 2.6.9-78 (RHEL 4.7) and in 2.6.18-53 (RHEL 5.1). Panic is reproducible on a lab machine with an instrumented 2.6.18-53 kernel. Please find below:

1. description of instrumentation
2. description of method used to reproduce
3. details from a vmcore
4. source code of test program "t.c"

1. instrumentation
------------------

The instrumentation consists of a printk() in do_setlk() and three printk()'s in fcntl_setlk() in order to trace the flow of execution through the suspected code.

- code fragment from fs/nfs/file.c:

static int do_setlk(struct file *filp, int cmd, struct file_lock *fl)
{
	...
	if (!(NFS_SERVER(inode)->flags & NFS_MOUNT_NONLM)) {
		status = NFS_PROTO(inode)->lock(filp, cmd, fl);
		...
		if (status == -EINTR || status == -ERESTARTSYS) {
			printk("do_setlk: status=%d\n", status);    <-------- (1)
			do_vfs_lock(filp, fl);
		}
	} else
		status = do_vfs_lock(filp, fl);
	...
	if (status < 0)
		goto out;
	...
out:
	return status;
}

- code fragment from fs/locks.c:

int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
		struct flock __user *l)
{
	...
	if (filp->f_op && filp->f_op->lock != NULL) {
		error = filp->f_op->lock(filp, cmd, file_lock);
		if (error)
			printk("fcntl_setlk: error=%d\n", error);   <------- (2)
	} else {
		...
	}
	...
	if (error && fcheck(fd) != filp && flock.l_type != F_UNLCK)
		printk("fcntl_setlk: *no* 'goto again'\n");         <------------- (3)

	if (!error && fcheck(fd) != filp && flock.l_type != F_UNLCK) {
		flock.l_type = F_UNLCK;
		printk("fcntl_setlk: 'goto again'\n");
		goto again;
		^^^^^^^^^^  ... not executed
	}
	...
}

2. method
---------

Note: The messages from do_setlk() and fcntl_setlk() indicate that the printk()'s at (1), (2), and (3) were actually executed. The error / status was -4 == -EINTR. The body of the "if" statement that contains the "goto again" was not executed.
# echo "test" > /tmp/testfile # mount -o intr,hard localhost:/tmp /mnt # ./t /mnt/testfile Shutting down NFS mountd: [ OK ] Shutting down NFS daemon: nfsd: last server has exited nfsd: unexporting all filesystems [ OK ] Shutting down NFS quotas: [ OK ] Shutting down NFS services: [ OK ] do_setlk: status=-4 fcntl_setlk: error=-4 fcntl_setlk: *no* 'goto again' ----------- [cut here ] --------- [please bite here ] --------- Kernel BUG at fs/locks.c:1999 invalid opcode: 0000 [1] SMP last sysfs file: /kernel/kexec_crash_loaded CPU 6 Modules linked in: nfs(U) fscache(U) nfsd(U) exportfs(U) lockd(U) nfs_acl(U) aut ofs4(U) hidp(U) rfcomm(U) l2cap(U) bluetooth(U) sunrpc(U) bonding(U) ipv6(U) dm_ multipath(U) video(U) sbs(U) backlight(U) i2c_ec(U) i2c_core(U) button(U) batter y(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) joydev (U) shpchp(U) pcspkr(U) bnx2(U) ide_cd(U) cdrom(U) sg(U) dm_snapshot(U) dm_zero( U) dm_mirror(U) dm_mod(U) ata_piix(U) libata(U) megaraid_sas(U) sd_mod(U) scsi_m od(U) ext3(U) jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U) Pid: 4618, comm: t Not tainted 2.6.18-53.el5_UliO_Test #9 RIP: 0010:[<ffffffff80027074>] [<ffffffff80027074>] locks_remove_flock+0xe4/0x122 RSP: 0018:ffff8109a6905e48 EFLAGS: 00010246 RAX: ffff8109a6815298 RBX: ffff8109a639b2f8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000286 RDI: ffffffff802fee80 RBP: ffff8109bcd89380 R08: ffff8109b2824580 R09: 0000000000000020 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8109a639b1f8 R13: ffff8109a639b1f8 R14: ffff8101222ae480 R15: ffff8109a6ca4a98 FS: 0000000040a00940(0063) GS:ffff8109bfc44b40(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000360c66c0c7 CR3: 00000009af07b000 CR4: 00000000000006e0 Process t (pid: 4618, threadinfo ffff8109a6904000, task ffff8109bcf23080) Stack: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ffff8109bcd89380 00000000000011e1 0000000000000000 
0000000000000000 0000000000000000 ffff8109bcd89380 0000000000000212
Call Trace:
 [<ffffffff80012302>] __fput+0x94/0x198
 [<ffffffff8002e2d2>] sys_fcntl+0x2d0/0x2dc
 [<ffffffff8005c28d>] tracesys+0xd5/0xe0

Code: 0f 0b 68 aa ee 28 80 c2 cf 07 48 89 c3 48 8b 03 48 85 c0 75
RIP  [<ffffffff80027074>] locks_remove_flock+0xe4/0x122
 RSP <ffff8109a6905e48>

3. details from a vmcore
------------------------

1963 void locks_remove_flock(struct file *filp)
1964 {
   :
1985         lock_kernel();
1986         before = &inode->i_flock;
1987
1988         while ((fl = *before) != NULL) {
1989                 if (fl->fl_file == filp) {
1990                         if (IS_FLOCK(fl)) {
1991                                 locks_delete_lock(before);
1992                                 continue;
1993                         }
1994                         if (IS_LEASE(fl)) {
1995                                 lease_modify(before, F_UNLCK);
1996                                 continue;
1997                         }
1998                         /* What? */
-> 1999 <-                  BUG();
2000                 }
2001                 before = &fl->fl_next;
2002         }
2003         unlock_kernel();
2004 }

crash> sys
      KERNEL: vmlinux
    DUMPFILE: vmcore
        CPUS: 16
        DATE: Mon Sep 29 12:00:29 2008
      UPTIME: 00:04:45
LOAD AVERAGE: 0.40, 0.38, 0.18
       TASKS: 249
    NODENAME: ibm-x3850m2-1
     RELEASE: 2.6.18-53.el5_UliO_Test
     VERSION: #9 SMP Mon Sep 29 11:46:37 EDT 2008
     MACHINE: x86_64 (2932 Mhz)
      MEMORY: 37.5 GB
       PANIC: ""

crash> log | grep _setlk
do_setlk: status=-4
fcntl_setlk: error=-4
fcntl_setlk: *no* 'goto again'

crash> log | grep BUG
Kernel BUG at fs/locks.c:1999

crash> bt
PID: 4618   TASK: ffff8109bcf23080  CPU: 6   COMMAND: "t"
 #0 [ffff8109a6905ca0] die at ffffffff8006a681
 #1 [ffff8109a6905cd0] do_invalid_op at ffffffff8006ac37
 #2 [ffff8109a6905d90] error_exit at ffffffff8005cde9
    [exception RIP: locks_remove_flock+228]
    RIP: ffffffff80027074  RSP: ffff8109a6905e48  RFLAGS: 00010246
    RAX: ffff8109a6815298  RBX: ffff8109a639b2f8  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000286  RDI: ffffffff802fee80
    RBP: ffff8109bcd89380   R8: ffff8109b2824580   R9: 0000000000000020
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff8109a639b1f8
    R13: ffff8109a639b1f8  R14: ffff8101222ae480  R15: ffff8109a6ca4a98
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #3 [ffff8109a6905e40] locks_remove_flock at ffffffff80027045
 #4 [ffff8109a6905f10] __fput at ffffffff80012302
 #5 [ffff8109a6905f50] sys_fcntl at ffffffff8002e2d2
 #6 [ffff8109a6905f80] tracesys at ffffffff8005c28d (via system_call)
    RIP: 000000360d20cddf  RSP: 0000000040a00010  RFLAGS: 00000297
    RAX: ffffffffffffffda  RBX: ffffffff8005c28d  RCX: ffffffffffffffff
    RDX: 0000000040a00110  RSI: 0000000000000006  RDI: 0000000000000003
    RBP: 0000000040a01000   R8: 0000000040a00940   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000297  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000003  R15: 0000000040a00110
    ORIG_RAX: 0000000000000048  CS: 0033  SS: 002b

crash> px ((struct task_struct *)0xffff8109bcf23080)->thread_info.flags
$1 = 0x84
       ^ TIF_SIGPENDING is set

crash> px ((struct task_struct *)0xffff8109bcf23080)->pending.signal.sig
$2 = {0x100}
        ^ SIGKILL is pending

crash> dis 0xffffffff800122fa 3
// rdi = rbp (file)
0xffffffff800122fa <__fput+140>:   mov    %rbp,%rdi
// locks_remove_flock()
0xffffffff800122fd <__fput+143>:   callq  0xffffffff80026f90
0xffffffff80012302 <__fput+148>:   mov    0x20(%rbp),%rax

crash> dis locks_remove_flock 2
0xffffffff80026f90 <locks_remove_flock>:     push   %r12
// save rbp (file)
0xffffffff80026f92 <locks_remove_flock+2>:   push   %rbp

crash> bt -f
PID: 4618   TASK: ffff8109bcf23080  CPU: 6   COMMAND: "t"
...
 #3 [ffff8109a6905e40] locks_remove_flock at ffffffff80027045
    ffff8109a6905e48: 0000000000000000 0000000000000000
    ...
                                       saved rbp (file)
    ffff8109a6905ef8: 0000000000000010 ffff8109bcd89380
    ffff8109a6905f08: ffff8109a639b1f8 ffffffff80012302
...

crash> px *((struct file *)0xffff8109bcd89380)->f_dentry->d_inode->i_flock
$3 = {
  fl_next = 0x0,
  ...
  fl_pid = 0x11e1 = 4577,
  ...
  fl_file = 0xffff8109bcd89380,    <- same struct file
  fl_flags = 0x1,                  <- FL_POSIX lock exists for "testfile"
  fl_type = 0x0,
  fl_start = 0x0,
  fl_end = 0x0,
  ...
}

crash> ps -g 4577
PID: 4577   TASK: ffff8109bac7c080  CPU: 6   COMMAND: "t"
PID: 4618   TASK: ffff8109bcf23080  CPU: 6   COMMAND: "t"

crash> bt 4577
PID: 4577   TASK: ffff8109bac7c080  CPU: 6   COMMAND: "t"
 #0 [ffff8109a6ca7e38] schedule at ffffffff80061f29
 #1 [ffff8109a6ca7f20] do_exit at ffffffff80015579
 #2 [ffff8109a6ca7f80] tracesys at ffffffff8005c28d (via system_call)
    RIP: 000000360c69585f  RSP: 00007fff9bee28b8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: ffffffff8005c28d  RCX: ffffffffffffffff
    RDX: 0000000000000000  RSI: 000000000000003c  RDI: 0000000000000000
    RBP: 000000360c946878   R8: 00000000000000e7   R9: ffffffffffffffb0
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000000
    R13: 000000360c946878  R14: ffffffff80046f0e  R15: 00007fff9bee29b0
    ORIG_RAX: 00000000000000e7  CS: 0033  SS: 002b

4. source code of test program
------------------------------

Note: compile with "cc t.c -lpthread -o t"

#include <sys/stat.h>
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>

int fd;
pthread_t other_thread;

void *other_thread_routine(void *dummy_arg)
{
	struct flock fl;

	fl.l_type = F_RDLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 1;

	if (fcntl(fd, F_SETLK, &fl) == -1) {
		perror("fcntl lock");
		goto out;
	}
	fl.l_type = F_UNLCK;
	if (fcntl(fd, F_SETLK, &fl) == -1)
		perror("fcntl unlock");
out:
	return (void *)0;
}

int main(int argc, char *argv[])
{
	int ret;

	if ((fd = open(argv[1], O_RDONLY)) == -1) {
		perror("open");
		exit(1);
	}
	system("service nfs stop");
	ret = pthread_create(&other_thread, (pthread_attr_t *)0,
			     other_thread_routine, (void *)0);
	if (ret) {
		perror("pthread_create");
		exit(1);
	}
	sleep(1);
	if (close(fd) == -1) {
		perror("close");
		exit(1);
	}
	return 0;
}
commit c4d7c402b788b73dc24f1e54a57f89d3dc5eb7bc
Author: Trond Myklebust <Trond.Myklebust>
Date:   Tue Apr 1 20:26:52 2008 -0400

    NFS: Remove the buggy lock-if-signalled case from do_setlk()

    Both NLM and NFSv4 should be able to clean up adequately in the case
    where the user interrupts the RPC call...

    Signed-off-by: Trond Myklebust <Trond.Myklebust>

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 10e8b80..742cb74 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -566,17 +566,9 @@ static int do_setlk(struct file *filp, int cmd, struct file_lock *fl)

 	lock_kernel();
 	/* Use local locking if mounted with "-onolock" */
-	if (!(NFS_SERVER(inode)->flags & NFS_MOUNT_NONLM)) {
+	if (!(NFS_SERVER(inode)->flags & NFS_MOUNT_NONLM))
 		status = NFS_PROTO(inode)->lock(filp, cmd, fl);
-		/* If we were signalled we still need to ensure that
-		 * we clean up any state on the server. We therefore
-		 * record the lock call as having succeeded in order to
-		 * ensure that locks_remove_posix() cleans it out when
-		 * the process exits.
-		 */
-		if (status == -EINTR || status == -ERESTARTSYS)
-			do_vfs_lock(filp, fl);
-	} else
+	else
 		status = do_vfs_lock(filp, fl);
 	unlock_kernel();
 	if (status < 0)
Created attachment 325729 [details]
Backport c4d7c402b788b73dc24f1e54a57f89d3dc5eb7bc to RHEL4

Backport of Trond's patch to remove the bogus signal-detection check from
do_setlk(). The check is useless (the locking module should be able to clean
up properly after being interrupted by a signal) and causes problems, since
it can leave a half-baked POSIX lock lying around for locks_remove_flock()
to trip over in the return path from sys_fcntl() (after racing with close()).

The original patch uses do_vfs_lock(), an NFS helper routine that is not
present in RHEL4's 2.6.9 kernel (it was introduced in 2.6.12). All that
helper does is make the appropriate posix_lock_file_wait()/
flock_lock_file_wait() call and log a warning if the lock manager returns an
error. This version calls posix_lock_file_wait() directly and prints the
warning from do_setlk() itself.
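For readers without the source tree at hand, the do_vfs_lock() helper referred to above looks roughly like the following. This is a sketch of kernel code reconstructed from the description in this comment, not something buildable on its own, so treat the details as approximate:

```c
/* Sketch (not verbatim) of the upstream do_vfs_lock() helper: dispatch
 * to the VFS lock routine matching the lock type, and warn if the VFS
 * view ever gets out of sync with the lock manager. */
static int do_vfs_lock(struct file *file, struct file_lock *fl)
{
	int res = 0;

	switch (fl->fl_flags & (FL_POSIX | FL_FLOCK)) {
	case FL_POSIX:
		res = posix_lock_file_wait(file, fl);
		break;
	case FL_FLOCK:
		res = flock_lock_file_wait(file, fl);
		break;
	default:
		BUG();
	}
	if (res < 0)
		printk(KERN_WARNING
		       "%s: VFS is out of sync with lock manager!\n",
		       __FUNCTION__);
	return res;
}
```

The "VFS is out of sync with lock manager!" message matches the warnings seen in the test logs later in this bug, which is the RHEL4 backport printing the same warning from do_setlk() directly.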
Upstream patch applies OK to RHEL5
Tested with 2.6.9-78.0.1.2.bz456282.EL (see comment #67):

[root@rhel4-as-i386 ~]# ./t /mnt/testfile
Shutting down RPC svcgssd:  [FAILED]
Shutting down NFS mountd:   [  OK  ]
Shutting down NFS daemon:   [  OK  ]
Shutting down NFS quotas:   [  OK  ]
Shutting down NFS services: [  OK  ]
[...]
nfsd: last server has exited
nfsd: unexporting all filesystems
do_setlk: VFS is out of sync with lock manager!

Can't trigger the BUG().
Can't trigger the BUG() using the testcase in comment #7 either.
Hmm. So I added the patch, and brew built a rhel-5 kernel. Did you see this
before?

[root@rhel5-server-i386 ~]# ./t /mnt/testfile
Shutting down NFS mountd:   [  OK  ]
Shutting down NFS daemon:   [  OK  ]
Shutting down NFS services: [  OK  ]

nfsd: last server has exited
nfsd: unexporting all filesystems
lockd: cannot monitor 127.0.0.1
lockd: failed to monitor 127.0.0.1
portmap: server localhost not responding, timed out
RPC: failed to contact portmap (errno -5).
BUG: unable to handle kernel paging request at virtual address 205c1e7f
 printing eip:
c0431a3f
*pde = 00000000
Oops: 0000 [#1]
SMP
last sysfs file: /block/dm-1/range
Modules linked in: nfs fscache nfsd exportfs lockd nfs_acl auth_rpcgss sunrpc
ipt_REJECT ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6
xfrm_nalgo crypto_api dm_multipath scsi_dh video hwmon backlight sbs i2c_ec
button battery asus_acpi ac parport_pc lp parport floppy pcspkr ide_cd
i2c_piix4 i2c_core 8139too serio_raw 8139cp mii cdrom dm_snapshot dm_zero
dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd
ohci_hcd ehci_hcd
CPU:    1
EIP:    0060:[<c0431a3f>]    Not tainted VLI
EFLAGS: 00010297   (2.6.18-125.el5.bz456288 #1)
EIP is at flush_workqueue+0x3f/0x66
eax: 205c1e7f   ebx: 00000000   ecx: 00000286   edx: 00000000
esi: de78f340   edi: 00000000   ebp: 00000801   esp: dc705ee0
ds: 007b   es: 007b   ss: 0068
Process rpc.nfsd (pid: 1862, ti=dc705000 task=df63e000 task.ti=dc705000)
Stack: e0baf320 de78f340 c0431a87 dd9d80e0 00000000 e0b9b4fd c060fce7 dd9d80c0
       dd9d80e0 dd9d80c0 00000007 e0b874e5 dd9d80c0 fffffdff e0b11055 e0bad768
       e0b8749a dfc89a40 00000008 e0b8792a 00000000 e0b87aef e0ba07c0 dfc89a40
Call Trace:
 [<c0431a87>] cancel_rearming_delayed_workqueue+0x21/0x23
 [<e0b9b4fd>] nfs4_state_shutdown+0x19/0x155 [nfsd]
 [<c060fce7>] _spin_lock_bh+0x8/0x18
 [<e0b874e5>] nfsd_last_thread+0x39/0x62 [nfsd]
 [<e0b11055>] svc_destroy+0x66/0x95 [sunrpc]
 [<e0b8749a>] nfsd_svc+0x1aa/0x1bc [nfsd]
 [<e0b8792a>] write_svc+0x0/0x18 [nfsd]
 [<e0b87aef>] nfsctl_transaction_write+0x36/0x60 [nfsd]
 [<c0497245>] sys_nfsservctl+0x13d/0x17c
 [<c0471319>] do_sys_open+0xa6/0xae
 [<c0404f17>] syscall_call+0x7/0xb
 =======================
Code: 15 94 88 7a c0 5b 5e f7 d0 8b 04 90 e9 54 ff ff ff b8 fc fc 67 c0 e8 69
d4 1d 00 b8 e0 ef 6e c0 e8 83 44 0b 00 eb 18 8b 06 f7 d0 <8b> 04 98 e8 32 ff
ff ff ba e0 ef 6e c0 89 d8 e8 83 44 0b 00 83
EIP: [<c0431a3f>] flush_workqueue+0x3f/0x66 SS:ESP 0068:dc705ee0
<0>Kernel panic - not syncing: Fatal exception
(In reply to comment #74)
> Hmm. So I added the patch, and brew built a rhel-5 kernel. Did you see this
> before?
>
> [root@rhel5-server-i386 ~]# ./t /mnt/testfile
> Shutting down NFS mountd: [ OK ]
> Shutting down NFS daemon: [ OK ]
> Shutting down NFS services: [ OK ]
>
> nfsd: last server has exited
> nfsd: unexporting all filesystems
> lockd: cannot monitor 127.0.0.1
> lockd: failed to monitor 127.0.0.1
> portmap: server localhost not responding, timed out
> RPC: failed to contact portmap (errno -5).
> BUG: unable to handle kernel paging request at virtual address 205c1e7f

2.6.27.5-117.fc10.x86_64 seems ok.
Tested with kernel-2.6.9-78.0.1.2.bz456282.EL.i686.rpm (patch from comment
#69); was not able to crash. Plenty of these, though:

do_setlk: VFS is out of sync with lock manager!

br,
Hari
Created attachment 326611 [details]
revised patch
Tested with kernel-2.6.9-78.0.1.3.bz456282.EL.i686.rpm (patch from comment
#79) and the original reproducer; was not able to crash. I did not get any
"out of sync" messages.

br,
Hari
Can someone please repost the tester code from comment #5? I have no
permissions to view it. I am seeing this crash on 2.6.9-42.0.2 (RHEL4 U4)
randomly while running a custom application that I don't know enough about to
comment on. I have applied the revised patch from comment #79 and the issue
persists. I would like to be able to replicate it with a set of test code,
since I will not be able to determine what our application is doing when it
crashes. Does this patch depend on any previous patches being applied? I had
a hard time applying it, since locks.c was already modified by a previous
patch in the 2.6.9-42 source tree. We also have 2.6.9-67 that I can try, but
looking at the source it is largely the same code. Please advise if you have
any insight. Thanks.
The code is in comment #5 and comment #13. Steps to reproduce are listed in comment #7. None of these are private and I've checked the permissions are correct on the attachments themselves.
Thanks. I canNOT get the server to crash with the test code from comment
#5/#13 on UNpatched kernel versions 2.6.9-42.0.2, -67, and -78. I CAN get it
to crash using the lockit code from comment #5 and the t.c code from comment
#67. The patched version of 2.6.9-42.0.2 seems to ride through both test
cases. However, I have had an operational crash with the same BUG on this
patched kernel, so I am still trying to understand why; maybe our application
code locks files differently.
There are several other problems in the locking code in older kernels that
can lead to a panic with the same BUG message and backtrace; it's quite
possible that you are hitting one of those.
(In reply to comment #87)
> Can someone please repost the tester code from comment #5 no perms to view it.
> I am seeing this crash in 2.6.9-42.0.2 rhel4u4 randomly while running a custom
> application that dont know enough about what it is doing to comment on. I have
> applied the revised patch from comment #79 and the issue persists. I would
> like to be able to replicate with a set of test code since will not be able to
> determine what our application is doing when it crashes. Does this patch
> depend on any previous patches being applied - as i had a hard time issuing the
> patch since locks.c was already being modified by a previous patch in
> 2.6.9-42src tree? We also have 2.6.9-67 that i can try but looking at the src
> it is relatively the same code. Please advise if any insight. Thanks

The crash signature in this situation is unfortunately somewhat "generic".
There are a lot of potential root causes for this problem, but the panic will
look the same. We've fixed several of these sorts of problems in earlier
kernels. This BZ has been opened specifically to address the
problem/reproducer that Bryn and Uli described in comment 67.

If you're still seeing panics that look like this, my suggestion would be to
try to reproduce them on as recent a kernel as possible. Even better, if you
have a non-critical place to do so, would be to test the kernels on my
people.redhat.com page:

http://people.redhat.com/jlayton/

They have patches that are slated for RHEL4.8. If you can still reproduce
this panic with those, then please open another BZ. It's probable that the
underlying cause is different than the one reported here.
I appear to be hitting this issue in RHEL5, is there a clone of this bug open for RHEL5?
(In reply to comment #94) > I appear to be hitting this issue in RHEL5, is there a clone of this bug open > for RHEL5? Yes, but the bugs are for tracking purposes. Please use this bug for reporting the problem you experienced. Thanks.
*** Bug 491695 has been marked as a duplicate of this bug. ***
This issue has been addressed in following products: MRG for RHEL-5 Via RHSA-2009:0451 https://rhn.redhat.com/errata/RHSA-2009-0451.html
I am running into this issue on 5.2. I am getting terrible performance on an NFS mount because of the problem. Will this be fixed prior to 5.4, or will I have to wait until 5.4 for an official kernel package?
Hello Ryan,

Are you sure that this is the problem you are experiencing? This problem does
not result in performance degradation per se. When it triggers, it causes the
NFS client to panic with a BUG message (though I suppose that if it is causing
your nodes to reboot frequently, that will cause poor perceived performance
overall).

Due to the security concerns and wide impact of this bug, it is planned to be
addressed via an async errata release for all supported RHEL products where
this problem exists. A fix for this issue has been proposed and is now
undergoing quality assurance and verification prior to final release.

If you have active Red Hat Enterprise Linux support entitlements, please file
a case with your Red Hat Global Support Services contacts, who will be able
to assist you and advise on the availability of fixed packages for affected
releases.

Regards,
This issue does not trigger node reboots, the reduced application performance is from reduced NFS performance. We have empirically measured NFS performance to be low, specifically in the area of lock acquisition. Processes are blocking for significantly longer waiting on locks when this condition triggers, specifically the one (1) lock that the error condition is being triggered on. We're using a Kernel we built with the patch, there's simply no other option. We couldn't wait for a fix. With 100% assurance I can say that this patch fixed our performance problems with multiple processes synchronizing on an NFS file lock.
(In reply to comment #117)
> This issue does not trigger node reboots, the reduced application performance
> is from reduced NFS performance. We have empirically measured NFS performance
> to be low, specifically in the area of lock acquisition. Processes are
> blocking for significantly longer waiting on locks when this condition
> triggers, specifically the one (1) lock that the error condition is being
> triggered on.
>
> We're using a Kernel we built with the patch, there's simply no other option.
> We couldn't wait for a fix. With 100% assurance I can say that this patch
> fixed our performance problems with multiple processes synchronizing on an
> NFS file lock.

This comment is a bit out of context. I thought I was replying to a different
bug (https://bugzilla.redhat.com/show_bug.cgi?id=491695), which happened to
be marked as a duplicate of this bug. I got confused; I only had one bug open
and thought I was replying to that one, so while it's the same issue the
response is a bit out of context. I was only referring to my setup/tests in
which I produced the other bug, which has the same code solution in the
kernel: i.e. bug #491695.

In my setup, application performance is affected by a dramatic decline in
locking performance on NFS. Not just locking performance but locking
correctness, I should add: strace will show a return indicating that a
process has successfully acquired a lock, yet it will still be stuck in the
fcntl64() call somehow...
Interesting - that's completely believable. We do have reports of multiple different NFS problems that are addressed by the patch for this problem (although this bugzilla is specific to client-side crashes triggered by the BUG() macro in locks_remove_flock). This is the first report we have of a significant performance degradation related to this change (e.g. we have reports of inconsistent lock status between the VFS and NLM leading to locks that can never be released).
Yeah, sorry to muddy the waters. My problem is the same as J. Ryan's (Bug #491695), which was marked as a duplicate of this bug. I'm lucky in the fact that I can work around it by using "nolock", which does fix the problem for me. I don't need locking on *these* clients, because they are just compute nodes that work on different files for each node.
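For anyone else needing the same workaround: local-only locking is selected with the "nolock" NFS mount option (the same NFS_MOUNT_NONLM path the patch above touches). A sketch of an fstab entry; the server name, export, and mount point below are placeholders, not from this report:

```
# /etc/fstab -- hypothetical server, export and mount point
nfsserver:/export  /mnt/compute  nfs  rw,nolock  0 0
```

Note that with "nolock" the client's locks are not visible to the server or to other clients, so this is only safe when, as above, each node works on its own files.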
This issue has been addressed in following products: Red Hat Enterprise Linux 4 Via RHSA-2009:0459 https://rhn.redhat.com/errata/RHSA-2009-0459.html
This issue has been addressed in following products: Red Hat Enterprise Linux 5 Via RHSA-2009:0473 https://rhn.redhat.com/errata/RHSA-2009-0473.html
*** Bug 499089 has been marked as a duplicate of this bug. ***