Bug 141757 (IT_56112)
Summary: | Infinite loop when syncing over automounted NFS
---|---
Product: | Red Hat Enterprise Linux 3
Component: | kernel
Version: | 3.0
Hardware: | All
OS: | Linux
Status: | CLOSED ERRATA
Severity: | medium
Priority: | medium
Reporter: | Bastien Nocera <bnocera>
Assignee: | Steve Dickson <steved>
QA Contact: | Brian Brock <bbrock>
CC: | hgarcia, hooft, kanderso, peterm, petrides, riel, sct, tao
Doc Type: | Bug Fix
Last Closed: | 2005-05-18 13:28:48 UTC
Bug Blocks: | 132991
Description
Bastien Nocera
2004-12-03 16:19:47 UTC
The modified wait_on_locked() and sync_inodes_sb():

    static void wait_on_locked(struct list_head *head)
    {
        printk("wait_on_locked begins\n");
        struct list_head * tmp;
        while ((tmp = head->prev) != head) {
            struct inode *inode = list_entry(tmp, struct inode, i_list);
            printk("i_ino: %lu\n", inode->i_ino);
            printk("struct inode address: %p\n", (void *)inode);
            printk("head address: %p\n", (void *)head);
            printk("tmp address: %p\n", (void *)tmp);
            printk("head->prev address: %p\n", (void *)(head->prev));
            __iget(inode);
            spin_unlock(&inode_lock);
            __wait_on_inode(inode);
            iput(inode);
            spin_lock(&inode_lock);
        }
        printk("wait_on_locked ends\n");
    }

The above function is originally inline in the kernel source, and it is called by:

    void sync_inodes_sb(struct super_block *sb)
    {
        spin_lock(&inode_lock);
        while (!list_empty(&sb->s_dirty)||!list_empty(&sb->s_locked_inodes)) {
            sync_list(&sb->s_dirty);
            wait_on_locked(&sb->s_locked_inodes);
        }
        spin_unlock(&inode_lock);
    }

The output when reproducing the problem:

    head address: ce03f06c
    tmp address: cd6a1788
    head->prev address: cd6a1788
    i_ino: 213089
    struct inode address: cd6a1780
    head address: ce03f06c
    tmp address: cd6a1788
    head->prev address: cd6a1788
    i_ino: 213089
    struct inode address: cd6a1780
    head address: ce03f06c
    tmp address: cd6a1788
    head->prev address: cd6a1788
    i_ino: 213089
    struct inode address: cd6a1780
    head address: ce03f06c
    tmp address: cd6a1788
    head->prev address: cd6a1788
    i_ino: 213089
    struct inode address: cd6a1780
    head address: ce03f06c
    tmp address: cd6a1788
    head->prev address: cd6a1788
    etc.

Created attachment 107839
script.pl
Reproducer script
Created attachment 107840
altsysrq.txt
Alt+SysRq+T of the "hang"
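The repeating trace in the description shows head->prev pointing at the same inode (i_ino 213089) on every pass, so the while condition in wait_on_locked() never becomes false. The following is a minimal user-space sketch of that behaviour, not kernel code: toy_inode, its stuck field, and the max_iter cap are all hypothetical names introduced for illustration, and the list helpers are simplified stand-ins for the kernel's list API. In the kernel the inode is taken off the list elsewhere while the loop waits; here that step is modelled inline.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void list_del(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = n;
}

/* Hypothetical toy inode; 'stuck' models an inode that nothing
 * ever removes from the locked list. */
struct toy_inode {
    struct list_head i_list;
    unsigned long i_ino;
    int stuck;
};

/* Same loop shape as wait_on_locked(). Returns the number of passes
 * needed to drain the list, or -1 if max_iter passes were made and
 * the list is still not empty (the "hang" case, cut short). */
static int toy_wait_on_locked(struct list_head *head, int max_iter)
{
    int iter = 0;
    struct list_head *tmp;

    while ((tmp = head->prev) != head) {
        struct toy_inode *inode = (struct toy_inode *)
            ((char *)tmp - offsetof(struct toy_inode, i_list));

        if (iter++ >= max_iter)
            return -1;
        /* Assumption: a healthy inode leaves the list once the wait
         * is over; a stuck one stays, so head->prev never changes. */
        if (!inode->stuck)
            list_del(tmp);
    }
    return iter;
}
```

Calling toy_wait_on_locked() on a list holding one healthy inode returns after a single pass, while a list holding one stuck inode runs until the cap and returns -1, which corresponds to the endlessly repeating printk output above.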
It's not an autofs loopback mount; the NFS mounts are automounted on the clients from the NFS server.

It seems to me that this can be reproduced without the automounter, yes?

automount only seems to trigger the hang quicker, but with a normal NFS mount, the problem still happens after a couple of sync()'s. The NFS options triggering the bug were: acregmin=1,acregmax=1. Any low value of acregmax would trigger the hang.

Steve, please let me know if this is normal. If it is, the bug can be closed.

It looks like, after more thorough testing, the default values for acregmin and acregmax don't fix the issue.

OK. Would it be possible to get the raw output of the dump? Meaning either "tethereal -w /tmp/ethdump.pcap" or "tcpdump -w /tmp/tcpdump.pcap". Having the raw data makes it easier to sort out the noise. Also, you might want to bzip2 any dumps, since that makes them easier to download.

Created attachment 108552
Proposed Patch
The sync process loops in wait_on_locked(), when called from
sync_inodes_sb(), since the "broken" inode cannot be cleared
from the locked inode list.
This patch sets the NFS_INO_STALE bit in the write path (via
nfs_writeback_done), which marks the inode as "broken" early enough
to stop it from being added to that list.
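As a rough illustration of the idea in the patch description, here is a user-space sketch, not the patch itself. Everything in it is hypothetical: TOY_INO_STALE stands in for NFS_INO_STALE, toy_writeback_done() and toy_queue_locked() are invented names, and the condition for setting the bit (a negative write status) is an assumption, since the description does not say what the patch checks.

```c
#include <stdbool.h>

/* Hypothetical flag bit, standing in for NFS_INO_STALE. */
#define TOY_INO_STALE 0x1u

/* Hypothetical toy inode: a flag word plus list membership. */
struct toy_nfs_inode {
    unsigned int flags;
    bool on_locked_list;
};

/* Models the write-completion step. Assumption: a negative status
 * means the write failed and the inode should be marked stale. */
static void toy_writeback_done(struct toy_nfs_inode *inode, int status)
{
    if (status < 0)
        inode->flags |= TOY_INO_STALE;
}

/* Models queueing an inode on the locked list during sync.
 * A stale inode is refused, so it can never be the entry that
 * keeps the wait loop spinning. Returns true if it was queued. */
static bool toy_queue_locked(struct toy_nfs_inode *inode)
{
    if (inode->flags & TOY_INO_STALE)
        return false;
    inode->on_locked_list = true;
    return true;
}
```

The point of the sketch is the ordering: because the flag is set at write completion, the check at queueing time sees it before the inode could reach the list that wait_on_locked() drains.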
A fix for this problem has just been committed to the RHEL3 U5 patch pool this evening (in kernel version 2.4.21-27.7.EL).

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-294.html