Bug 444961 - softlockup when repeatedly dropping caches
Summary: softlockup when repeatedly dropping caches
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: All
OS: Linux
medium
medium
Target Milestone: rc
: ---
Assignee: Bryn M. Reeves
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks: 391501 KernelPrio5.3
TreeView+ depends on / blocked
 
Reported: 2008-05-02 11:48 UTC by Bryn M. Reeves
Modified: 2018-10-20 01:40 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-20 20:08:30 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Proposed upstream patch to drop inode_lock across __invalidate_mapping_pages (724 bytes, patch)
2008-05-02 11:48 UTC, Bryn M. Reeves
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2009:0225 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.3 kernel security and bug fix update 2009-01-20 16:06:24 UTC

Description Bryn M. Reeves 2008-05-02 11:48:37 UTC
Description of problem:
When repeatedly dropping caches while a lot of file system activity is taking
place the system will eventually start producing softlockup warnings and
everything starts to contend for the inode_lock.

This was reported on lkml (while running XFS stress tests):

http://lkml.org/lkml/2008/3/18/150

This can also be reproduced, e.g. running a kernel build over NFS and dropping
the page cache in a shell loop.

Version-Release number of selected component (if applicable):
kernel-2.6.18-89.el5.x86_64.rpm

How reproducible:
100%

Steps to Reproduce:
1. Generate some file system load, e.g. make -j8 on a kernel tree located on an
NFS mounted file system
2. Run:
# while true;do echo 1 > /proc/sys/vm/drop_caches;sleep 2;done
  

Actual results:
Within a few minutes the system will start logging softlockup warnings for
various threads trying to acquire the inode_lock, e.g:

 BUG: soft lockup - CPU#3 stuck for 10s! [id:21024]
 CPU 3:
 Modules linked in: ib_ipoib rdma_ucm rdma_cm iw_cm ib_addr ib_ucm ib_uverbs
ib_umad ib_mthca ib_cm ib_sa ib_mad ib_core ipv6 xfrm_nalgo crypto_api sd_mod sg
dm_mirror dm_multipa
th dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac
parport_pc lp parport joydev usb_storage sata_nv floppy i2c_nforce2 libata
ohci_hcd i2c_core pcspkr scsi_mod k8temp
k8_edac hwmon edac_mc shpchp serio_raw nfs nfs_acl lockd fscache sunrpc e1000
 Pid: 21024, comm: id Not tainted 2.6.18-89.el5 #1
 RIP: 0010:[<ffffffff80064b57>]  [<ffffffff80064b57>] .text.lock.spinlock+0x5/0x30
 RSP: 0018:ffff8100ccf39b80  EFLAGS: 00000282
 RAX: 0000000000006621 RBX: ffff810103871280 RCX: 0000000000000014
 RDX: 00000000f0000008 RSI: ffff81000c54fe08 RDI: ffffffff802f40b0
 RBP: 0000000000000246 R08: 0000000000000000 R09: 0000000000000000
 R10: ffff8100c228a6c0 R11: ffffffff8002c525 R12: ffffffff803053c0
 R13: 0000000300000000 R14: ffffffff80089214 R15: ffff8100ccf39b18
 FS:  00002ae2ffbb7710(0000) GS:ffff810103967640(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 00002ae2ff5310d0 CR3: 00000001fc6d0000 CR4: 00000000000006e0

 Call Trace:
  [<ffffffff8003d716>] ifind_fast+0x1a/0x83
  [<ffffffff800230fa>] iget_locked+0x59/0x149
  [<ffffffff800475f5>] proc_get_inode+0x89/0x161
  [<ffffffff8002767a>] proc_lookup+0xb3/0xcc
  [<ffffffff80055c87>] proc_root_lookup+0x12/0x30
  [<ffffffff8000cbb4>] do_lookup+0xd3/0x1d4
  [<ffffffff80009f73>] __link_path_walk+0xa01/0xf42
  [<ffffffff8000e782>] link_path_walk+0x5c/0xe5
  [<ffffffff8000c965>] do_path_lookup+0x270/0x2e8
  [<ffffffff800234db>] __path_lookup_intent_open+0x56/0x97
  [<ffffffff8001a954>] open_namei+0x73/0x6d5
  [<ffffffff8006683f>] do_page_fault+0x4eb/0x81d
  [<ffffffff8002732a>] do_filp_open+0x1c/0x38
  [<ffffffff80019720>] do_sys_open+0x44/0xbe
  [<ffffffff8005d116>] system_call+0x7e/0x83

 BUG: soft lockup - CPU#0 stuck for 10s! [dpkg-logger:21022]
 CPU 0:
 Modules linked in: ib_ipoib rdma_ucm rdma_cm iw_cm ib_addr ib_ucm ib_uverbs
ib_umad ib_mthca ib_cm ib_sa ib_mad ib_core ipv6 xfrm_nalgo crypto_api sd_mod sg
dm_mirror dm_multipa
th dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac
parport_pc lp parport joydev usb_storage sata_nv floppy i2c_nforce2 libata
ohci_hcd i2c_core pcspkr scsi_mod k8temp
k8_edac hwmon edac_mc shpchp serio_raw nfs nfs_acl lockd fscache sunrpc e1000
 Pid: 21022, comm: dpkg-logger Not tainted 2.6.18-89.el5 #1
 RIP: 0010:[<ffffffff80064b57>]  [<ffffffff80064b57>] .text.lock.spinlock+0x5/0x30
 RSP: 0018:ffff8101faf17ec0  EFLAGS: 00000286
 RAX: 0000000000008000 RBX: ffffffff8005604d RCX: 00000000c0000100
 RDX: ffff8101febac040 RSI: 0000000000000018 RDI: ffffffff802f40b0
 RBP: ffff8100c939f078 R08: ffff8101faf16000 R09: 0000000000000000
 R10: ffff8100c939f078 R11: ffff8102fff4dc48 R12: ffff8100cefa1480
 R13: ffff8100c8a56a68 R14: ffff8100c8a56a68 R15: ffff81030385d180
 FS:  00002b1625da8e10(0000) GS:ffffffff8039e000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 00002b1625b69760 CR3: 00000002f9324000 CR4: 00000000000006e0

 Call Trace:
  [<ffffffff8002f34c>] generic_delete_inode+0xea/0x143
  [<ffffffff8000d11c>] dput+0xf6/0x114
  [<ffffffff800125f3>] __fput+0x16c/0x198 
  [<ffffffff800239d6>] filp_close+0x5c/0x64
  [<ffffffff8001d9e5>] sys_close+0x88/0xa2
  [<ffffffff8005d116>] system_call+0x7e/0x83


Expected results:
Performance should suck but no softlockup warnings.

Additional info:
From the linked LKML thread:

> Looks like everything is backed up on the inode_lock.  Why? Looks
> like drop_pagecache_sb() is doing something ..... suboptimal.
> 
> static void drop_pagecache_sb(struct super_block *sb)
> {
>         struct inode *inode;
> 
>         spin_lock(&inode_lock);
>         list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
>                 if (inode->i_state & (I_FREEING|I_WILL_FREE))
>                         continue;
>                 __invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
>         }
>         spin_unlock(&inode_lock);
> }
> 
> It holds the inode_lock for an amazingly long time, and calls a
> function that ends up in ->release_page which can issue
> transactions.
> 
> Given that transactions can then mark an inode dirty or the
> kjournald might need to mark an inode dirty while holding
> transaction locks, the implementation of drop_pagecache_sb seems to
> be just a little dangerous....
> 
> Anyone know the reason why drop_pagecache_sb() uses such a brute-force
> mechanism to free up clean page cache pages?

Comment 1 Bryn M. Reeves 2008-05-02 11:48:37 UTC
Created attachment 304374 [details]
Proposed upstream patch to drop inode_lock across __invalidate_mapping_pages

Comment 2 Bryn M. Reeves 2008-05-02 11:52:38 UTC
The patch in comment #1 has been added to 2.6.25-mm1 as
vfs-fix-lock-inversion-in-drop_pagecache_sb.patch:

From: Jan Kara <jack>

Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock
before calling __invalidate_mapping_pages().  We just have to make sure inode
won't go away from under us by keeping reference to it and putting the
reference only after we have safely resumed the scan of the inode list.  A bit
tricky but not too bad...

Signed-off-by: Jan Kara <jack>
Cc: Fengguang Wu <wfg.edu.cn>
Cc: David Chinner <dgc>
Signed-off-by: Andrew Morton <akpm>

Comment 7 Bryn M. Reeves 2008-07-24 16:14:55 UTC
Went into mainline back in April:

commit eccb95cee4f0d56faa46ef22fb94dd4a3578d3eb
Author: Jan Kara <jack>
Date:   Tue Apr 29 00:59:37 2008 -0700

    vfs: fix lock inversion in drop_pagecache_sb()
    
    Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock
    before calling __invalidate_mapping_pages().  We just have to make sure inode
    won't go away from under us by keeping reference to it and putting the
    reference only after we have safely resumed the scan of the inode list.  A bit
    tricky but not too bad...
    
    Signed-off-by: Jan Kara <jack>
    Cc: Fengguang Wu <wfg.edu.cn>
    Cc: David Chinner <dgc>
    Signed-off-by: Andrew Morton <akpm>
    Signed-off-by: Linus Torvalds <torvalds>


Comment 10 Don Zickus 2008-08-08 21:08:16 UTC
in kernel-2.6.18-103.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 17 errata-xmlrpc 2009-01-20 20:08:30 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html


Note You need to log in before you can comment on or make changes to this bug.