Bug 495861
Summary: | A deadlock can occur between kjournald and drop_pagecache | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Harshula Jayasuriya <harshula> | ||||
Component: | kernel | Assignee: | Larry Woodman <lwoodman> | ||||
Status: | CLOSED WONTFIX | QA Contact: | Red Hat Kernel QE team <kernel-qe> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | urgent | ||||||
Version: | 4.8 | CC: | dhoward, jolsa, ktokunag, mgahagan, moshiro, sghosh, tao, vmayatsk | ||||
Target Milestone: | rc | Keywords: | ZStream | ||||
Target Release: | 4.9 | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2010-02-08 14:31:34 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 505649 | ||||||
Attachments: |
|
Description
Harshula Jayasuriya
2009-04-15 08:04:55 UTC
I can reproduce the hang in RHEL4-U8 without this patch by placing artificial stalls in invalidate_all_mapping_pages(). This happens when kjournald gets to acquire the journal->j_list_lock and then spins on the global inode_lock and once invalidate_all_mapping_pages continues it later spins on the journal->j_list_lock, deadlocking the system. With this patch applied I can no longer get that deadlock becauce we drop the inode_lock before calling invalidate_all_mapping_pages(). --- linux-2.6.9/fs/inode.c.orig 2009-04-16 11:43:56.000000000 -0400 +++ linux-2.6.9/fs/inode.c 2009-04-16 12:00:25.000000000 -0400 @@ -314,9 +314,13 @@ static int invalidate_list(struct list_h inode = list_entry(tmp, struct inode, i_list); if (inode->i_sb != sb) continue; - if (pages) + if (pages) { + __iget(inode); + spin_unlock(&inode_lock); invalidate_all_mapping_pages(inode->i_mapping); - else { + iput(inode); + spin_lock(&inode_lock); + } else { invalidate_inode_buffers(inode); if (!atomic_read(&inode->i_count)) { hlist_del_init(&inode->i_hash); Hi, We need to hold a reference to the inode till we can safely get the next inode. So the patch should look something like: ------------------------------------------------------------------- --- fs/inode.c.orig 2009-04-14 15:18:22.000000000 +1000 +++ fs/inode.c 2009-04-21 22:04:28.000000000 +1000 @@ -301,12 +301,13 @@ static int invalidate_list(struct list_head *head, struct super_block * sb, struct list_head * dispose, int pages) { struct list_head *next; + struct inode * inode; + struct inode * toput_inode = NULL; int busy = 0, count = 0; next = head->next; for (;;) { struct list_head * tmp = next; - struct inode * inode; next = next->next; if (tmp == head) @@ -314,9 +315,14 @@ inode = list_entry(tmp, struct inode, i_list); if (inode->i_sb != sb) continue; - if (pages) + if (pages) { + __iget(inode); + spin_unlock(&inode_lock); invalidate_all_mapping_pages(inode->i_mapping); - else { + iput(toput_inode); + toput_inode = inode; + spin_lock(&inode_lock); + } else { invalidate_inode_buffers(inode); if (!atomic_read(&inode->i_count)) { hlist_del_init(&inode->i_hash); @@ -328,6 +334,7 @@ } busy = 1; } + iput(toput_inode); /* only unused inodes may be cached with i_count zero */ inodes_stat.nr_unused -= count; return busy; ------------------------------------------------------------------- NOTE: this patch has not been tested nor compiled. In my patch above, the last iput(toput_inode) is called whilst holding inode_lock. However, iput() needs to be called while inode_lock is *not* held. Unfortunately the caller of invalidate_list() first grabs the inode_lock and then drops it after invalidate_list() calls return. One option is to push the initial inode_lock grab and the final drop down into invalidate_list(). Then the iput(toput_inode) can be called immediately after inode_lock is dropped: invalidate_list(): ... spin_lock(&inode_lock); loop-start ... loop-finish ... spin_unlock(&inode_lock); iput(toput_inode); ... Created attachment 345792 [details]
Patch that fixed this problem
The attached patch fixed this problem:
Committed in 89.2.EL Event posted on 2009-06-10 18:24 EST by hjayasur Hi, I haven't looked at the core dump, but the log output is quite clear. The panic is due to the inode's i_state being I_CLEAR: ----------------------------------------------------------- 1136 void iput(struct inode *inode) 1137 { 1138 if (inode) { 1139 struct super_operations *op = inode->i_sb->s_op; 1140 1141 if (inode->i_state == I_CLEAR) => 1142 BUG(); ----------------------------------------------------------- invalidate_list() walks the inode list and will do an iput() without skipping inodes with an i_state of I_CLEAR: ----------------------------------------------------------- 301 static int invalidate_list(struct list_head *head, struct super_block * sb, struct list_head * dispose, int pages) 302 { 303 struct list_head *next; 304 int busy = 0, count = 0; 305 struct inode * inode; 306 struct inode * toput_inode = NULL; 307 308 next = head->next; 309 for (;;) { 310 struct list_head * tmp = next; 311 312 next = next->next; 313 if (tmp == head) 314 break; 315 inode = list_entry(tmp, struct inode, i_list); 316 if (inode->i_sb != sb) 317 continue; 318 if (pages) { 319 __iget(inode); 320 spin_unlock(&inode_lock); 321 invalidate_all_mapping_pages(inode->i_mapping); => 322 iput(toput_inode); ----------------------------------------------------------- RHEL 5.3 and TOT skip inodes with an i_state of I_CLEAR. Regards, Harshula This event sent from IssueTracker by hjayasur issue 282073 Event posted on 2009-06-10 18:31 EST by hjayasur Correction, RHEL 5.3 contains this bug (bz 500164), but it is fixed in kernel-2.6.18-152.el5 (RHEL 5.4). This event sent from IssueTracker by hjayasur issue 282073 Agreed, we need to skip the inodes that the i_state is I_CLEAR. I can not reproduce this problem, can you try out this patch and see if the probme still occurs??? Thanks, Larry Woodman ------------------------------------------------------------------------------ --- linux-2.6.9/fs/inode.c.orig 2009-06-11 14:54:05.000000000 -0400 +++ linux-2.6.9/fs/inode.c 2009-06-11 14:54:10.000000000 -0400 @@ -313,7 +313,7 @@ static int invalidate_list(struct list_h if (tmp == head) break; inode = list_entry(tmp, struct inode, i_list); - if (inode->i_sb != sb) + if (inode->i_sb != sb || inode->i_state == I_CLEAR) continue; if (pages) { __iget(inode); ------------------------------------------------------------------------------ Actually, we should emulate the upstream code and skip indoes that are I_FREEING or I_CLEAR or I_WILL_FREE. I'll build a kernel with this patch and place it on my people page so you can test it, OK? Larry Woodman ------------------------------------------------------------------------- --- linux-2.6.9/fs/inode.c.orig 2009-06-11 14:54:05.000000000 -0400 +++ linux-2.6.9/fs/inode.c 2009-06-11 15:01:53.000000000 -0400 @@ -313,7 +313,7 @@ static int invalidate_list(struct list_h if (tmp == head) break; inode = list_entry(tmp, struct inode, i_list); - if (inode->i_sb != sb) + if (inode->i_sb != sb || inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE)) continue; if (pages) { __iget(inode); -------------------------------------------------------------------------
Can someone try the kernel located here:
>>>http://people.redhat.com/~lwoodman/RHEL4/
Larry Woodman
Hi Larry, This new RHEL 4 crash the customer has seen is more related to RHEL 5 bug 500164. I have requested that they open a new bug for this new crash. Can we please use this bug only for the original bug? Regards, Harshula I made another quick change based on BZ500164 and rebuilt the kernel. Everything needed is located in http://people.redhat.com/~lwoodman/RHEL4/ Larry Woodman Sorry, I forgot to show you the patch: ----------------------------------------------------------------------- --- linux-2.6.9/fs/inode.c.orig 2009-06-11 14:54:05.000000000 -0400 +++ linux-2.6.9/fs/inode.c 2009-06-12 10:09:23.000000000 -0400 @@ -316,6 +316,8 @@ static int invalidate_list(struct list_h if (inode->i_sb != sb) continue; if (pages) { + if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE|I_NEW)) + continue; __iget(inode); spin_unlock(&inode_lock); invalidate_all_mapping_pages(inode->i_mapping); ------------------------------------------------------------------------ Ignore my Comment #15 regarding using a separate bug, we are only using this bug. Hi, Since we can't backport the patch safely, feel free to close/wontfix this bug. Regards, Harshula Cant safely fix this BUG. I backported the dropcaches functionality into RHEL4 but after several attemts gave up trying to get it totally safe from deadlocks with kjournald. This would require additional fields in the inode struct and therefore break the RHEL4 kABI. Larry Woodman |