This was originally thought to be a case of BZ526612 but the problem still persisted after upgrading to a kernel with that fix. It appears that the kernel function shrink_dcache_sb() has the same problem that was fixed in prune_dcache() with kernel 2.6.18-174. The two functions do very similar things but are different enough that they can't be merged, so we'll need a separate fix.

From the customer:

Just saw the RHSA-2009-1670 announcement that included bugzilla 533822 and 537019. However, I went back and looked -- our system running the 2.6.18-174.el5 test kernel is showing soft lockups (apparently in dcache-related areas):

BUG: soft lockup - CPU#3 stuck for 10s! [pcscd:2774]
CPU 3:
Modules linked in: nfs fscache nfs_acl autofs4 lockd sunrpc dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sr_mod cdrom joydev usb_storage i5000_edac bnx2 sg edac_mc serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache shpchp megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 2774, comm: pcscd Not tainted 2.6.18-174.el5 #1
RIP: 0010:[<ffffffff80064bcc>]  [<ffffffff80064bcc>] .text.lock.spinlock+0x2/0x30
RSP: 0018:ffff81042a46de20  EFLAGS: 00000286
RAX: ffff81042e422400 RBX: ffffffff80315b80 RCX: 0000000000000000
RDX: ffff8100ba01d12c RSI: ffffffff802a6add RDI: ffffffff803ec480
RBP: ffff81042e423910 R08: 0000000000000000 R09: ffffff9c2a46de58
R10: ffff81042a46dd78 R11: 0000000000000048 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
FS:  000000004193b940(0063) GS:ffff81010eb9e6c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaacc4000 CR3: 000000042c502000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff800227e5>] d_alloc+0x166/0x1a9
 [<ffffffff800f12b2>] dcache_dir_open+0x0/0x26
 [<ffffffff800f12c6>] dcache_dir_open+0x14/0x26
 [<ffffffff8001e9eb>] __dentry_open+0xd9/0x1dc
 [<ffffffff800274b4>] do_filp_open+0x2a/0x38
 [<ffffffff80063cd5>] do_nanosleep+0x47/0x70
 [<ffffffff80019db6>] do_sys_open+0x44/0xbe
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

BUG: soft lockup - CPU#5 stuck for 10s! [sendmail:2932]
CPU 5:
Modules linked in: nfs fscache nfs_acl autofs4 lockd sunrpc dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sr_mod cdrom joydev usb_storage i5000_edac bnx2 sg edac_mc serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_log dm ehci_hcd
Pid: 2932, comm: sendmail Not tainted 2.6.18-174.el5 #1
RSP: 0018:ffff8104271fbc90  EFLAGS: 00000286
RAX: ffff81010eb38400 RBX: ffff8104271fbd28 RCX: 0000000000000000
RDX: ffff810403896eac RSI: ffff81042136800d RDI: ffffffff803ec480
RBP: 0000000000000106 R08: ffff8104271fbc38 R09: 0000000000000000
R10: ffff8104271fbc98 R11: 0000000000000048 R12: 0000000000000000
R13: 000000000000002f R14: 0000000000000007 R15: ffff8104271fbc4c
FS:  00002b2c98294bd0(0000) GS:ffff81042fc21540(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b497681d4f8 CR3: 0000000427776000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff800227e5>] d_alloc+0x166/0x1a9
 [<ffffffff8000cee6>] do_lookup+0xc0/0x1e6
 [<ffffffff8000a22b>] __link_path_walk+0xa01/0xf42
 [<ffffffff8000e9d6>] link_path_walk+0x42/0xb2
 [<ffffffff8000cca6>] do_path_lookup+0x275/0x2f1
 [<ffffffff80129178>] selinux_file_alloc_security+0x2a/0x54
 [<ffffffff80023684>] __path_lookup_intent_open+0x56/0x97
 [<ffffffff8001ae77>] open_namei+0x73/0x6d5
 [<ffffffff800274a6>] do_filp_open+0x1c/0x38
 [<ffffffff80019db6>] do_sys_open+0x44/0xbe
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

BUG: soft lockup - CPU#7 stuck for 10s! [check_avmon:7426]
CPU 7:
Modules linked in: nfs fscache nfs_acl autofs4 lockd sunrpc dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sr_mod cdrom joydev usb_storage i5000_edac bnx2 sg edac_mc serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache shpchp megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 7426, comm: check_avmon Not tainted 2.6.18-174.el5 #1
RIP: 0010:[<ffffffff80064bcc>]  [<ffffffff80064bcc>] .text.lock.spinlock+0x2/0x30
RSP: 0018:ffff8103fb8addf0  EFLAGS: 00000286
RAX: 0000000000000000 RBX: ffffffff803ec480 RCX: 0000000000000001
RDX: 00007fff42e4a0e0 RSI: ffffffff803ec480 RDI: ffffffff803ec480
RBP: ffff8104151e1110 R08: ffff8103fb8adcb8 R09: 0000000000000000
R10: ffff8103fb8add98 R11: 0000000000000048 R12: ffff8104151e0810
R13: ffff81042f36f780 R14: ffff810346ce7000 R15: 000000157a914da0
FS:  00002b9c3c15ff90(0000) GS:ffff81042fcab3c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000033dda99bc0 CR3: 0000000333b3e000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff8000c5f2>] _atomic_dec_and_lock+0x39/0x57
 [<ffffffff8000d3ae>] dput+0x2c/0x114
 [<ffffffff80022ca2>] path_release+0xc/0x24
 [<ffffffff8003f372>] vfs_lstat_fd+0x39/0x47
 [<ffffffff8002ab0d>] sys_newlstat+0x19/0x31
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

BUG: soft lockup - CPU#0 stuck for 10s! [mount.nfs:7951]
CPU 0:
Modules linked in: nfs fscache nfs_acl autofs4 lockd sunrpc dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sr_mod cdrom joydev usb_storage i5000_edac bnx2 sg edac_mc serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache shpchp megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 7951, comm: mount.nfs Not tainted 2.6.18-174.el5 #1
RIP: 0010:[<ffffffff800ea90a>]  [<ffffffff800ea90a>] shrink_dcache_sb+0x6b/0xda
RSP: 0018:ffff81042413b838  EFLAGS: 00000287
RAX: 0000000000000000 RBX: ffff8102a1da6228 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff803155e0
RBP: 0000000000000041 R08: ffff81010eb2e280 R09: 0000000000000000
R10: ffff81010eb2e280 R11: 00000000000000d0 R12: 0000000000007473
R13: 6f686c61636f6c40 R14: 736f6967616e3d4c R15: 49414d454e494d44
FS:  00002af42828a6e0(0000) GS:ffffffff803c5000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fff48276000 CR3: 000000040faa9000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff800ea8b8>] shrink_dcache_sb+0x19/0xda
 [<ffffffff800e274b>] do_remount_sb+0x4e/0x153
 [<ffffffff800e3174>] get_sb_single+0x8d/0xb3
 [<ffffffff800e3007>] vfs_kern_mount+0x93/0x11a
 [<ffffffff800f15b3>] simple_pin_fs+0x42/0xa3
 [<ffffffff882f289d>] :sunrpc:rpc_get_mount+0x1e/0x2f
 [<ffffffff882e4e56>] :sunrpc:rpc_setup_pipedir+0x34/0xc5
 [<ffffffff882e521f>] :sunrpc:rpc_new_client+0x1c3/0x299
 [<ffffffff882e567b>] :sunrpc:rpc_create_client+0xb/0x3c
 [<ffffffff88359b92>] :nfs:nfs_create_rpc_client+0x11d/0x160
 [<ffffffff8835a482>] :nfs:nfs_create_server+0xf9/0x4ae
 [<ffffffff801283ec>] avc_has_perm+0x46/0x58
 [<ffffffff88361918>] :nfs:nfs_get_sb+0x1fd/0x472
 [<ffffffff8012ad2a>] selinux_sb_copy_data+0x23/0x1c5
 [<ffffffff800e3007>] vfs_kern_mount+0x93/0x11a
 [<ffffffff800e30d0>] do_kern_mount+0x36/0x4d
 [<ffffffff800ed3d0>] do_mount+0x6a9/0x719
 [<ffffffff8002fc43>] dev_queue_xmit+0x250/0x271
 [<ffffffff80032210>] ip_output+0x29a/0x2dd
 [<ffffffff800347af>] ip_queue_xmit+0x437/0x491
 [<ffffffff800220d4>] __up_read+0x19/0x7f
 [<ffffffff800ec150>] copy_mount_options+0xce/0x127
 [<ffffffff8009e745>] search_exception_tables+0x1d/0x2d
 [<ffffffff80066ca5>] do_page_fault+0x64b/0x830
 [<ffffffff8003dbed>] lock_timer_base+0x1b/0x3c
 [<ffffffff8001cbd0>] __mod_timer+0xb0/0xbe
 [<ffffffff800cb6c0>] zone_statistics+0x3e/0x6d
 [<ffffffff8000f2c4>] __alloc_pages+0x78/0x308
 [<ffffffff8004c7a5>] sys_mount+0x8a/0xcd
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

BUG: soft lockup - CPU#6 stuck for 10s! [problem_handler:7944]
CPU 6:
ehci_hcd
Pid: 7944, comm: problem_handler Not tainted 2.6.18-174.el5 #1
RSP: 0018:ffff81036ea9dcb0  EFLAGS: 00000286
RAX: ffff810427d496c0 RBX: ffff810428ea4ac0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff81036ea9de48 RDI: ffffffff803ec480
RBP: 000000000000012d R08: ffff81036ea9dc58 R09: 0000000000000000
R10: ffff81036ea9dcb8 R11: 0000000000000048 R12: 0000000000000000
R13: 00000000000000e7 R14: 0000000000000007 R15: ffff81036ea9de88
FS:  00002ac58e5cb250(0000) GS:ffff81042fcabbc0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000002ea30a8 CR3: 0000000147d8e000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff88330d0f>] :autofs4:autofs4_revalidate+0xa4/0x149
 [<ffffffff8000cfb5>] do_lookup+0x18f/0x1e6
 [<ffffffff80009bd0>] __link_path_walk+0x3a6/0xf42
 [<ffffffff8000e9d6>] link_path_walk+0x42/0xb2
 [<ffffffff8000cca6>] do_path_lookup+0x275/0x2f1
 [<ffffffff800127f8>] getname+0x15b/0x1c2
 [<ffffffff80023870>] __user_walk_fd+0x37/0x4c
 [<ffffffff80028848>] vfs_stat_fd+0x1b/0x4a
 [<ffffffff80066b58>] do_page_fault+0x4fe/0x830
 [<ffffffff80062486>] __sched_text_start+0xf6/0xbd6
 [<ffffffff800235a2>] sys_newstat+0x19/0x31
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
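To make the failure mode concrete, here is a condensed sketch of the 2.6.18-era shrink_dcache_sb() pattern being blamed (an illustration only, with the refcount/locking details trimmed -- not the literal RHEL5 source). Both passes walk the single global dentry_unused list while holding dcache_lock, and the pruning pass restarts from the head after every dentry it frees, so with tens of millions of unused dentries every other CPU that needs dcache_lock (the d_alloc() and dput() callers spinning in .text.lock.spinlock above) is stuck for the duration.

/*
 * Condensed sketch of the 2.6.18-era shrink_dcache_sb() pattern,
 * for illustration only (refcount and d_lock handling trimmed).
 */
void shrink_dcache_sb(struct super_block *sb)
{
	struct list_head *tmp, *next;
	struct dentry *dentry;

	spin_lock(&dcache_lock);

	/* Pass 1: walk the ENTIRE global unused list just to gather this
	 * superblock's dentries at one end of it. */
	list_for_each_safe(tmp, next, &dentry_unused) {
		dentry = list_entry(tmp, struct dentry, d_lru);
		if (dentry->d_sb == sb)
			list_move(tmp, &dentry_unused);
	}

	/* Pass 2: walk the ENTIRE list again and prune.  prune_one_dentry()
	 * drops dcache_lock, so the walk restarts from the top after every
	 * dentry freed. */
repeat:
	list_for_each_safe(tmp, next, &dentry_unused) {
		dentry = list_entry(tmp, struct dentry, d_lru);
		if (dentry->d_sb != sb)
			continue;
		dentry_stat.nr_unused--;
		list_del_init(tmp);
		prune_one_dentry(dentry);	/* releases and retakes dcache_lock */
		goto repeat;
	}
	spin_unlock(&dcache_lock);
}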
Created attachment 385557 [details]
slabinfo before-and-after soft lockup

Caught another one from the system still running the 2.6.18-174.el5 test kernel. Attached is a capture of slabinfo from before (trimmed to ~10m prior, but I have data going further back if needed) and after the soft lockup was detected.
There are almost 60 million entries in the dentry cache - no wonder it's having problems.

I have a test kernel, kernel-2.6.18-183.el5.it345190.1, that includes a fix for the spinlock contention on the dcache lock in shrink_dcache_sb(). Can you give it a try? The kernel is available from here:

http://people.redhat.com/~lmcilroy/bz554574/

Lachlan
Created attachment 385567 [details]
Patch to fix spinlock contention in shrink_dcache_sb().
Ok, there are two fixes for this issue:

1) Easy to backport, but possibly not complete for a huge number of entries:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=37c42524d6090644206ae6d310d7e830bd3ccb47

2) Complete, but a large backport:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=da3bbdd4632c0171406b2677e31494afa5bde2f8

The second patch should solve the issue completely. Cc'ing some FS folks.
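For anyone skimming: the approach of the second commit is to stop keeping one global list of unused dentries and keep them per superblock instead, so shrinking one filesystem never has to walk every other filesystem's entries. A sketch of the shape of that change (not the literal diff) -- and the reason it touches struct super_block, which is where the kABI trouble discussed below comes from:

/* Sketch only: per-superblock LRU for unused dentries. */
struct super_block {
	/* ... existing fields, unchanged ... */
	struct list_head	s_dentry_lru;		/* this sb's unused dentries */
	int			s_nr_dentry_unused;	/* how many are on it */
};

/* shrink_dcache_sb() then only walks sb->s_dentry_lru, so its cost is
 * bounded by this filesystem's own unused dentries, not the global total. */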
I tried pushing patch 2 into RHEL5.4 to fix BZ526612 but it got knocked back in review due to the kABI changes to the superblock structure.

Patch 1 should help in most cases but there is still the risk that another shrink_dcache_sb() will race when the dcache_lock is released in prune_one_dentry() or cond_resched_lock(). In that case it won't be a few new dentries we have to skip but a whole filesystem's worth that has just been moved to the tail.

The patch I attached to this BZ doesn't have that problem since it only scans the dentry_unused list once.
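To illustrate what "only scans the dentry_unused list once" buys us, here is a rough sketch of that idea (an illustration of the description above, not the attached patch itself): pull the superblock's unused dentries onto a private list in a single pass, then prune from the private list.

/*
 * Illustration of the single-scan approach described above -- not the
 * attached patch, just the shape of the idea.
 */
static void shrink_dcache_sb_single_scan(struct super_block *sb)
{
	LIST_HEAD(tmp);		/* private list, invisible to other CPUs */
	struct dentry *dentry, *next;

	spin_lock(&dcache_lock);

	/* One pass over the global unused list. */
	list_for_each_entry_safe(dentry, next, &dentry_unused, d_lru) {
		if (dentry->d_sb == sb)
			list_move_tail(&dentry->d_lru, &tmp);
	}

	/*
	 * Prune from the private list.  prune_one_dentry() still drops and
	 * retakes dcache_lock, but nothing a racing shrink_dcache_sb() does
	 * to dentry_unused can put entries back onto 'tmp', so the work is
	 * bounded by the number of dentries found in the single pass above.
	 */
	while (!list_empty(&tmp)) {
		dentry = list_entry(tmp.prev, struct dentry, d_lru);
		list_del_init(&dentry->d_lru);
		dentry_stat.nr_unused--;

		spin_lock(&dentry->d_lock);
		if (atomic_read(&dentry->d_count)) {
			/* picked up a new reference while the lock was dropped */
			spin_unlock(&dentry->d_lock);
			continue;
		}
		prune_one_dentry(dentry);	/* drops and retakes dcache_lock */
	}
	spin_unlock(&dcache_lock);
}

Because 'tmp' is private, a racing shrink_dcache_sb() on another superblock can shuffle dentry_unused all it likes without ever making this walk longer.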
It'd be a bit of a hack, but we could possibly put the list in/on the ext2/ext3/ext4 private superblocks instead, with appropriate tests etc, to help resolve the issue for those filesystems...
Well, let me look at Lachlan's proposed patch again; Josef reminds me that we can probably play tricks w/ #ifdefs etc, right...
Ok, I should have reread the whole thread through again before commenting. It seems that we mostly psyched ourselves out on Lachlan's big patch, but there are probably some valid concerns that are hard to pin down, as they relate to what 3rd party filesystems -could- possibly do ....

Josef, I think Lachlan's patch attached in comment #4 is a forward-port of what you did for RHEL3 - would you mind looking at it with an eye towards this problem on RHEL5? It seems to me that it should certainly help us be more efficient in shrink_dcache_sb(), at least.

Do we have a decent testcase for this problem so that we can evaluate the effectiveness of the patch?

Thanks,
-Eric
(In reply to comment #3)
> There's almost 60 million entries in the dentry cache - no wonder it's having
> problems.

Right, but:

inode_cache        6186      6335
dentry_cache   59704364  59704488

What's going on here?
Looks like there are truck loads of dentrys with no inode.
(In reply to comment #11)
> (In reply to comment #3)
> > There's almost 60 million entries in the dentry cache - no wonder it's having
> > problems.
>
> Right, but:
>
> inode_cache        6186      6335
> dentry_cache   59704364  59704488
>
> What's going on here?
> Looks like there are truck loads of dentrys with no inode.

Good point. I suspect these dentries have been put on the dentry_unused list after the file was deleted (and therefore unlinked from the dentry). Or maybe we have a leak.
(In reply to comment #8)
> It'd be a bit of a hack, but we could possibly put the list in/on the
> ext2/ext3/ext4 private superblocks instead, with appropriate tests etc, to help
> resolve the issue for those filesystems...

It would have to work for NFS too since this issue is primarily seen with NFS when using autofs.
(In reply to comment #10)
> Ok, I should have reread the whole thread through again before commenting. It
> seems that we mostly psyched ourselves out on Lachlan's big patch, but there
> are probably some valid concerns that are hard to pin down, as they relate to
> what 3rd party filesystems -could- possibly do ....
>
> Josef, I think Lachlan's patch attached in comment #4 is a forward-port of what
> you did for RHEL3 - would you mind looking at it with an eye towards this
> problem on RHEL5? It seems to me that it should certainly help us be more
> efficient in shrink_dcache_sb(), at least.

There's a patch for this in RHEL3? You might be thinking of:

https://bugzilla.redhat.com/show_bug.cgi?id=413731

which got forward ported to RHEL5 in:

https://bugzilla.redhat.com/show_bug.cgi?id=526612

It's a very similar problem but in a different bit of code.

> Do we have a decent testcase for this problem so that we can evaluate the
> effectiveness of the patch?

I haven't had any luck yet. It may be that I simply don't have enough memory to cache so many dentries or maybe we need to find a way to get multiple threads to call shrink_dcache_sb simultaneously.
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #3)
> > > There's almost 60 million entries in the dentry cache - no wonder it's having
> > > problems.
> >
> > Right, but:
> >
> > inode_cache        6186      6335
> > dentry_cache   59704364  59704488
> >
> > What's going on here?
> > Looks like there are truck loads of dentrys with no inode.
>
> Good point. I suspect these dentries have been put on the dentry_unused list
> after the file was deleted (and therefore unlinked from the dentry). Or maybe
> we have a leak.

It looks suspect: shrink_dcache_sb() -> prune_one_dentry() unconditionally calls d_free() after dentry_iput(), which is when the dentry becomes negative during the release process (I'm assuming that the rcu list is actually getting a chance to free these guys after d_free() is called). We'd need to look at all the other cases, but I believe negative dentrys are only possible where the dentry has a positive reference count, is not actually instantiated, and the reference count is not decremented.

Do we have any chance of finding out more about these negative dentrys? A kdump perhaps, or some info-gathering printks in a test kernel?
(In reply to comment #14)
> >
> > Do we have a decent testcase for this problem so that we can evaluate the
> > effectiveness of the patch?
>
> I haven't had any luck yet. It may be that I simply don't have enough memory
> to cache so many dentries or maybe we need to find a way to get multiple
> threads to call shrink_dcache_sb simultaneously.

It would be useful to inspect the inode_cache and dentry_cache usage when trying to reproduce this. If we don't see anomalous values there, we probably haven't actually reproduced the problem.
> Do we have any chance of finding out more about these negative
> dentrys? A kdump perhaps or some info gathering printks in a
> test kernel?

We don't have a vmcore from this customer. I have already provided one test kernel (with the proposed fix attached to this BZ) but I don't think we've got any feedback yet.

Could we get the information with a systemtap module?
(In reply to comment #17)
> > Do we have any chance of finding out more about these negative
> > dentrys? A kdump perhaps or some info gathering printks in a
> > test kernel?
>
> We don't have a vmcore from this customer. I have already provided one test
> kernel (with the proposed fix attached to this BZ) but I don't think we've got
> any feedback yet.
>
> Could we get the information with a systemtap module?

Don't know, I haven't really worked with systemtap. What I think would be useful to know is whether these dentrys are really negative, what their d_count is, and what fs they belong to, to see if they all belong to a particular fs.
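In case it helps, something along these lines might get part of that answer without a kdump. This is a rough, untested sketch: the probe point, the fields read and the 30s interval are all assumptions, it needs the kernel debuginfo installed, and probing dput() is hot, so it should only be run briefly on a test box.

#! /usr/bin/stap
# Rough sketch: count dput() calls on negative dentries (d_inode == NULL),
# broken down by filesystem type, to see which fs they belong to.
global negative

probe kernel.function("dput")
{
        if ($dentry->d_inode == 0)
                negative[kernel_string($dentry->d_sb->s_type->name)]++
}

probe timer.s(30)
{
        printf("negative-dentry dput() calls by fstype:\n")
        foreach (fs in negative-)
                printf("  %-10s %d\n", fs, negative[fs])
        delete negative
}

It only sees dentries that are actively being released, not the ones already parked on dentry_unused, so it complements rather than replaces a kdump.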
(In reply to comment #17)
>
> We don't have a vmcore from this customer. I have already provided one test
> kernel (with the proposed fix attached to this BZ) but I don't think we've got
> any feedback yet.
>

This sounds a lot like RHEL-4 bug 501565, except it's dentrys instead of inodes.

If we think that reducing the list traversal size is all that needs to be done to fix this, then the proposed fix here doesn't seem quite enough. To reduce the dentry list size without still needing a full list traversal we would need to add a per-superblock list for the unused dentrys, similar to the inode list of bug 501565. Of course that can never fly because it's a kABI breaker, as are the inode list patches for 501565.
(In reply to comment #14)
> > Josef, I think Lachlan's patch attached in comment #4 is a forward-port of what
> > you did for RHEL3 - would you mind looking at it with an eye towards this
> > problem on RHEL5? It seems to me that it should certainly help us be more
> > efficient in shrink_dcache_sb(), at least.
>
> There's a patch for this in RHEL3?

I guess I was thinking of the fix for #413731 which is a similar flavor but not quite the same.
Re: comment #10 -- it's like the fix for RHEL3 (which was already forward ported to RHEL5), but it covers the other side of things, when the VM decides to clear things out. Since it's a proven method of mitigating the unmount problem I don't see why it wouldn't also be an effective way of fixing this particular problem, though I think I'd rather wait for feedback from the customer since we can't easily reproduce this problem.
> tho I think I'd rather wait for feedback from the customer since we
> can't easily reproduce this problem.

Present. Sorry for leaving this hanging.

> I suspect these dentries have been put on the dentry_unused list after the
> file was deleted (and therefore unlinked from the dentry). Or maybe
> we have a leak.

It seems suspect. The machine I was reproducing on has ~72k inodes in use across 3 ext3's. The NFS filesystem w/ every home directory adds ~1.1m. >50m of transient usage seems fishy at best.

What data are best to confirm/deny this and move forward? With or without the patched kernel from comment 3 (which I shamefully haven't tried yet)?
For what it's worth, the workload from comment 0 and comment 1 is a hacked up Nagios instance that will tickle the lockup on restart.

We also had a cluster of Apache instances doing mostly static file delivery off NFS with the bug 526612 updates that are also still seeing this. The pattern is consistent, ~60m entries in dentry_cache without a comparable inode_cache (as a point of reference, these hosts do have about 28m inodes in mounted filesystems).

$ grep '^inod\|den' /proc/slabinfo
inode_cache         1677     1694    560    7    1 : tunables   54   27    8 : slabdata     242     242      0
dentry_cache    65085024 65096424    216   18    1 : tunables  120   60    8 : slabdata 3616468 3616468      0
$
$ grep '^inod\|den' /proc/slabinfo
inode_cache         1689     1708    560    7    1 : tunables   54   27    8 : slabdata     244     244      0
dentry_cache    62927063 63320094    216   18    1 : tunables  120   60    8 : slabdata 3517783 3517783      0
$
$ grep '^inod\|den' /proc/slabinfo
inode_cache         1694     1694    560    7    1 : tunables   54   27    8 : slabdata     242     242      0
dentry_cache    59596668 59596668    216   18    1 : tunables  120   60    8 : slabdata 3310926 3310926      0
$
(In reply to comment #23)
> For what its worth, the workload from comment 0 and comment 1 is a hacked up
> Nagios instance that will tickle the lockup on restart.
>
> We also had a cluster of Apache instances doing mostly static file delivery off
> NFS with bug 526612 updates that are also still seeing this. The pattern is
> consistent, ~60m entries in dentry_cache without a comparable inode_cache (as a
> point of reference, these hosts do have about 28m inodes in mounted
> filesystems).
>
> $ grep '^inod\|den' /proc/slabinfo
> inode_cache 1677 1694 560 7 1 : tunables 54 27 8 :
> slabdata 242 242 0
> dentry_cache 65085024 65096424 216 18 1 : tunables 120 60 8
> : slabdata 3616468 3616468 0
> $

There has to be something broken here, and it sounds like you're saying the NFS client may be a candidate, but we still need to confirm that. I think that at this stage we are still trying to narrow down potential places to look for the problem.

This bug is fairly recent, so there must have been a point in the past when this wasn't happening, or other machines with older kernels not showing the problem?

If this is something that happens after a dentry has been made positive and is now on the way to being destroyed then there are a fairly small number of places to look in the VFS. It could be a filesystem causing it but that would be more likely to result in an oops at some point before the cache grows this large. It is more likely dentrys are not being instantiated upon creation and the reference count is not being decremented for some reason. A missing dput() in a patch could cause this.

There are a couple of things we need to do. Look at the kernel version history to see if we can identify a time when this started to happen. Find out what we have in the dentry cache: check if the dentrys are really negative, what their d_count is, if there is a particular fs they belong to, and if they are hashed or not. Probably the best way to do this is to reproduce on a test machine and get a kdump so we can have a look.
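Once there is a vmcore, something along these lines in crash(8) should answer most of that (a sketch -- the member names assume the 2.6.18 dentry layout and the exact option syntax is worth checking against crash's built-in help):

crash> kmem -s dentry_cache
crash> p dentry_stat
crash> list -H dentry_unused -o dentry.d_lru -s dentry.d_count,d_inode,d_sb,d_flags > dentries.txt

Counting how many entries have d_inode = 0x0 and grouping by d_sb would show whether they are really negative and which superblock(s) they belong to; with tens of millions of entries the list walk will take a while, so redirecting to a file and post-processing it is probably the way to go.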
(In reply to comment #24)
> >
> > $ grep '^inod\|den' /proc/slabinfo
> > inode_cache 1677 1694 560 7 1 : tunables 54 27 8 :
> > slabdata 242 242 0
> > dentry_cache 65085024 65096424 216 18 1 : tunables 120 60 8
> > : slabdata 3616468 3616468 0
> > $

Oh ... hang on, maybe I'm talking nonsense. What are the values in /proc/sys/fs/dentry-state and /proc/sys/fs/inode-state?
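For anyone reading along, my understanding of those files on 2.6.18 (worth double-checking against Documentation/sysctl/fs.txt) is that the first two fields of each are the interesting ones -- the total object count and how many of those are sitting unused on the LRU:

$ cat /proc/sys/fs/dentry-state
# nr_dentry  nr_unused  age_limit  want_pages  (2 unused fields)
$ cat /proc/sys/fs/inode-state
# nr_inodes  nr_unused  (5 unused fields)

If nr_unused tracks nr_dentry closely, the dentries are at least sitting reclaimable on the unused list rather than being pinned by elevated reference counts.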
> What are the values in /proc/sys/fs/dentry-state [...]

$ cat /proc/sys/fs/{inode,dentry}-state
44763	12	0	0	0	0	0
64803227	64786025	45	0	0	0

$ grep '^inod\|den' /proc/slabinfo
inode_cache         1745     1785    560    7    1 : tunables   54   27    8 : slabdata     255     255      0
dentry_cache    64799130 64845090    216   18    1 : tunables  120   60    8 : slabdata 3602505 3602505      0
$

> This bug is fairly recent so there must have been a point in the past when this
> wasn't happening or other machines with older kernels not showing the problem?

These are actually complete workload groups -- there's the one loner, and the 3 covered in comment 23 are all of them. These just started getting load ~8mo ago and have seen the lockups since. (Most machines w/ similar workloads in our environment are still RHEL4, which might explain the latent discovery.)

> it sounds like your saying the NFS client may be candidate

Only because it's the only file system getting meaningful activity (...and since you're the one asking, Ian, it's all autofs'y as well).
(In reply to comment #26)
> > What are the values in /proc/sys/fs/dentry-state [...]
>
> $ cat /proc/sys/fs/{inode,dentry}-state
> 44763 12 0 0 0 0 0
> 64803227 64786025 45 0 0 0

Phew, we're still in business.

> $ grep '^inod\|den' /proc/slabinfo
> inode_cache 1745 1785 560 7 1 : tunables 54 27 8 :
> slabdata 255 255 0
> dentry_cache 64799130 64845090 216 18 1 : tunables 120 60 8
> : slabdata 3602505 3602505 0
> $
>
> > This bug is fairly recent so there must have been a point in the past when this
> > wasn't happening or other machines with older kernels not showing the problem?
>
> These are actually complete workload groups -- there's the one loner and the 3
> covered in comment 23 is all of them. These just started getting load ~8mo ago
> and have seen the lockups since. (Most machines w/ similar workloads in our
> environment are still RHEL4, which might explain the latent discovery.

Bummer.

> > it sounds like your saying the NFS client may be candidate
>
> Only because its the only file system getting meaningful activity (...and since
> you're the one asking, Ian, its all autofs'y as well).

Hahaha, yep, it could be autofs, and if we can verify that then that's fine and we should be able to fix it.

In the kernel you're using, in fact since kernel rev 106, autofs dentrys exist and are negative only during directory creation (between the kernel lookup and the following kernel mkdir); they never go negative after that until they are released, assuming the VFS isn't doing something it isn't supposed to. So that narrows down the code to check quite a lot. I'm fairly confident it isn't autofs but I'll certainly have a look.
I see the FS guys are paying attention to this. Reassigning to default.
Jiri, Lachlan has been working on the fix for this & keeping the others in the FS team involved as needed. Moving it back to him, thanks!
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.6, and Red Hat does not plan to fix this issue in the currently developed update. Contact your manager or support representative in case you need to escalate this bug.
We don't have enough information to make progress here. Please reopen if this continues to be an issue and we can try to gather fresh data. Thanks!