* This appears to be a regression caused by the patch applied to fix the problem in bz 526612.

A user reported a machine hang with kswapd running at 100%.

crash> bt 561
PID: 561  TASK: ffff81087fff57a0  CPU: 5  COMMAND: "kswapd0"
 #0 [ffff81011de29f20] crash_nmi_callback at ffffffff8007af27
 #1 [ffff81011de29f40] do_nmi at ffffffff8006588a
 #2 [ffff81011de29f50] nmi at ffffffff80064eef
    [exception RIP: .text.lock.spinlock+5]
    RIP: ffffffff80064bff  RSP: ffff81087fac9d58  RFLAGS: 00000202
    RAX: 00000000ffffffff  RBX: ffff81011dc6be00  RCX: 0000000000000064
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffffffff803f1480
    RBP: 00000000ffd1bab0  R8:  ffff81087fac9d58  R9:  0101010101010101
    R10: 0000000000000000  R11: 00000000ffd1bab0  R12: 0000000000000000
    R13: 0000000000000080  R14: 0000000000000000  R15: 0000000000000080
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #3 [ffff81087fac9d58] .text.lock.spinlock at ffffffff80064bff (via _spin_lock)
 #4 [ffff81087fac9d58] prune_dcache at ffffffff8002e73d
 #5 [ffff81087fac9d88] shrink_dcache_memory at ffffffff800eb979
 #6 [ffff81087fac9d98] shrink_slab at ffffffff8003f462
 #7 [ffff81087fac9dd8] kswapd at ffffffff80057e80
 #8 [ffff81087fac9ee8] kthread at ffffffff80032894
 #9 [ffff81087fac9f48] kernel_thread at ffffffff8005dfb1

The kswapd process was waiting for the dcache_lock spinlock. Along with kswapd, 109 other processes were waiting for the same dcache_lock, called from prune_dcache() via cond_resched_lock().
crash> bt 904
PID: 904  TASK: ffff8104be6920c0  CPU: 4  COMMAND: "yum-updatesd-he"
 #0 [ffff8102fa15db18] schedule at ffffffff80062f96
 #1 [ffff8102fa15dbf0] __cond_resched at ffffffff8008e8e5
 #2 [ffff8102fa15dc00] cond_resched_lock at ffffffff8003b2cd
 #3 [ffff8102fa15dc20] prune_dcache at ffffffff8002e74e
 #4 [ffff8102fa15dc50] shrink_dcache_memory at ffffffff800eb979
 #5 [ffff8102fa15dc60] shrink_slab at ffffffff8003f462
 #6 [ffff8102fa15dca0] try_to_free_pages at ffffffff800cbad1
 #7 [ffff8102fa15dd30] __alloc_pages at ffffffff8000f450
 #8 [ffff8102fa15dda0] __handle_mm_fault at ffffffff80008e44
 #9 [ffff8102fa15de60] do_page_fault at ffffffff80066b55
#10 [ffff8102fa15df50] error_exit at ffffffff8005dde9
    RIP: 00002b2c11c08d9f  RSP: 00007fffd4a229b0  RFLAGS: 00010206
    RAX: 000000000000001b  RBX: 000000000000002c  RCX: 0000000000000037
    RDX: 0000000000000000  RSI: 000000001fdb4000  RDI: 000000000000002c
    RBP: 0000000000000005  R8:  fefefefefefefeff  R9:  0000000000000000
    R10: 0000000000000000  R11: 000000000000000a  R12: 000000000000000a
    R13: 00002b2c11eb41c0  R14: 000000001fbd3f08  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b

The trigger appears to be a negative dentry_stat->nr_unused, which tracks the total number of dentries on the dentry_unused list (dentries are linked onto that list through dentry->d_lru).

crash> p dentry_stat
dentry_stat = $7 = {
  nr_dentry = 4037,
  nr_unused = -3032498,
  age_limit = 45,
  want_pages = 0,
  dummy = {0, 0}
}

The negative nr_unused value feeds into the calculation of the unsigned variable total_scan in shrink_slab(). total_scan is used in the while condition that decides how many times the shrinker for that cache is called. When nr_unused is negative, the conversion to an unsigned type yields a huge total_scan, and as a result shrink_dcache_memory()->prune_dcache() is called an enormous number of times. The last patch that changed how the dentry_stat counter is used was made in bz 526612.
We passed this report to Lachlan, the author of the patch for bz 526612, who confirmed that it is caused by the same patch:

-- I think I see what's going on here - the accounting for nr_unused is getting out of whack. Most of the code in dcache.c assumes that the dentry->lru list pointers (when in use) always refer to the dentry_unused list and therefore do nr_unused++/-- when adding or removing a dentry from the list. The new changes I added overload the dentry->lru pointers to put the dentry on a private list which doesn't use the nr_unused++/-- accounting. It's possible for a dentry to be hashed into the filesystem and be on either of these lists. When select_parent() traverses all the dentries hashed into the filesystem and finds one on an lru list, it assumes it is the dentry_unused list, removes it, and does nr_unused--. But due to the cond_resched_lock(dcache_lock) in prune_dcache() it's possible to expose a dentry that's on a private lru list to select_parent(). It removes it from the list and does an erroneous nr_unused--. I can fix this easily by doing nr_unused accounting for the private list too. I'll work up a patch. --

We are currently testing this patch in a test kernel given to the user.
Created attachment 439909 [details]
Patch to fix nr_unused counting. Patch by lmcilroy.