BUG: write-lock lockup on CPU#4, cc1/13482, d5f92abc (Not tainted)
BUG: write-lock lockup on CPU#3, cc1/13502, d5f92abc (Not tainted)
 [<c01df047>] __write_lock_debug+0xb4/0xdd
 [<c01df0af>] _raw_write_lock+0x3f/0x7c
 [<c0144aed>] add_to_page_cache+0x34/0xaf
 [<c0185dc6>] mpage_readpages+0xed/0x13d
 [<c011a476>] activate_task+0x8f/0x9e
 [<c011aa80>] try_to_wake_up+0x6e/0x343
 [<f9cd21ec>] linvfs_readpages+0x0/0x15 [xfs]
 [<c014b57a>] read_pages+0x2a/0xf7
 [<f9cd1fd2>] linvfs_get_block+0x0/0x35 [xfs]
 [<c0149245>] __alloc_pages+0x109/0x469
 [<c014b7b0>] __do_page_cache_readahead+0x169/0x16e
 [<c0145e26>] filemap_nopage+0x31d/0x39b
 [<c0154bf7>] do_no_page+0x96/0x30b
 [<c0155089>] __handle_mm_fault+0x13e/0x1e5
 [<c031ede4>] do_page_fault+0x274/0x700
 [<c011625c>] smp_apic_timer_interrupt+0xc1/0xca
 [<c031eb70>] do_page_fault+0x0/0x700
 [<c010457f>] error_code+0x4f/0x54
 [<c01df047>] __write_lock_debug+0xb4/0xdd
 [<c01df0af>] _raw_write_lock+0x3f/0x7c
 [<c0144aed>] add_to_page_cache+0x34/0xaf
 [<c0185dc6>] mpage_readpages+0xed/0x13d
 [<f9cd1fd2>] linvfs_get_block+0x0/0x35 [xfs]
 [<c0148bd0>] rmqueue_bulk+0x77/0x81
 [<f9cd21ec>] linvfs_readpages+0x0/0x15 [xfs]
 [<c014b57a>] read_pages+0x2a/0xf7
 [<f9cd1fd2>] linvfs_get_block+0x0/0x35 [xfs]
 [<c0149245>] __alloc_pages+0x109/0x469
 [<c014b7b0>] __do_page_cache_readahead+0x169/0x16e
 [<c0145e26>] filemap_nopage+0x31d/0x39b
 [<c0154bf7>] do_no_page+0x96/0x30b
 [<c0155089>] __handle_mm_fault+0x13e/0x1e5
 [<c031ede4>] do_page_fault+0x274/0x700
 [<c011625c>] smp_apic_timer_interrupt+0xc1/0xca
 [<c031eb70>] do_page_fault+0x0/0x700
 [<c010457f>] error_code+0x4f/0x54

$ uname -a
Linux rhlx01.fht-esslingen.de 2.6.14-1.1637_FC4smp #1 SMP Wed Nov 9 18:34:11 EST 2005 i686 i686 i386 GNU/Linux

The server froze completely for about 30 minutes; even the clock didn't advance. Now it seems to be back to normal.
Created attachment 121461 [details] dmesg dump
and again:

Nov 30 18:05:10 rhlx01 kernel: BUG: write-lock lockup on CPU#4, msgfmt/9352, e4115510 (Not tainted)
Nov 30 18:05:10 rhlx01 kernel: BUG: write-lock lockup on CPU#2, msgfmt/9353, e4115510 (Not tainted)
Nov 30 18:05:10 rhlx01 kernel: [<c01df047>] __write_lock_debug+0xb4/0xdd
Nov 30 18:05:10 rhlx01 kernel: [<c01df0af>] _raw_write_lock+0x3f/0x7c
Nov 30 18:05:10 rhlx01 kernel: [<c0144aed>] add_to_page_cache+0x34/0xaf
Nov 30 18:05:10 rhlx01 kernel: [<c0185dc6>] mpage_readpages+0xed/0x13d
Nov 30 18:05:10 rhlx01 kernel: [<c01bf58c>] avc_has_perm_noaudit+0x26/0xd1
Nov 30 18:05:10 rhlx01 kernel: [<f9cd21ec>] linvfs_readpages+0x0/0x15 [xfs]
Nov 30 18:05:10 rhlx01 kernel: [<c014b57a>] read_pages+0x2a/0xf7
Nov 30 18:05:10 rhlx01 kernel: [<f9cd1fd2>] linvfs_get_block+0x0/0x35 [xfs]
Nov 30 18:05:10 rhlx01 kernel: [<c0149245>] __alloc_pages+0x109/0x469
Nov 30 18:05:10 rhlx01 kernel: [<f9cda405>] vn_revalidate+0x4c/0x58 [xfs]
Nov 30 18:05:10 rhlx01 kernel: [<c014b7b0>] __do_page_cache_readahead+0x169/0x16e
Nov 30 18:05:11 rhlx01 kernel: [<c0145e26>] filemap_nopage+0x31d/0x39b
Nov 30 18:05:11 rhlx01 kernel: [<c0154bf7>] do_no_page+0x96/0x30b
Nov 30 18:05:11 rhlx01 kernel: [<c0155089>] __handle_mm_fault+0x13e/0x1e5
Nov 30 18:05:12 rhlx01 kernel: [<c031ede4>] do_page_fault+0x274/0x700
Nov 30 18:05:12 rhlx01 kernel: [<c031eb70>] do_page_fault+0x0/0x700
Nov 30 18:05:12 rhlx01 kernel: [<c010457f>] error_code+0x4f/0x54
Nov 30 18:05:13 rhlx01 kernel: [<c01df047>] __write_lock_debug+0xb4/0xdd
Nov 30 18:05:13 rhlx01 kernel: [<c01df0af>] _raw_write_lock+0x3f/0x7c
Nov 30 18:05:13 rhlx01 kernel: [<c0144aed>] add_to_page_cache+0x34/0xaf
Nov 30 18:05:13 rhlx01 kernel: [<c0185dc6>] mpage_readpages+0xed/0x13d
Nov 30 18:05:13 rhlx01 kernel: [<c0148bd0>] rmqueue_bulk+0x77/0x81
Nov 30 18:05:13 rhlx01 kernel: [<f9cd21ec>] linvfs_readpages+0x0/0x15 [xfs]
Nov 30 18:05:13 rhlx01 kernel: [<c014b57a>] read_pages+0x2a/0xf7
Nov 30 18:05:13 rhlx01 kernel: [<f9cd1fd2>] linvfs_get_block+0x0/0x35 [xfs]
Nov 30 18:05:13 rhlx01 kernel: [<c0149245>] __alloc_pages+0x109/0x469
Nov 30 18:05:13 rhlx01 kernel: [<f9cda405>] vn_revalidate+0x4c/0x58 [xfs]
Nov 30 18:05:13 rhlx01 kernel: [<c014b7b0>] __do_page_cache_readahead+0x169/0x16e
Nov 30 18:05:14 rhlx01 kernel: [<c0145e26>] filemap_nopage+0x31d/0x39b
Nov 30 18:05:14 rhlx01 kernel: [<c0154bf7>] do_no_page+0x96/0x30b
Nov 30 18:05:14 rhlx01 kernel: [<c0155089>] __handle_mm_fault+0x13e/0x1e5
Nov 30 18:05:14 rhlx01 kernel: [<c031ede4>] do_page_fault+0x274/0x700
Nov 30 18:05:14 rhlx01 kernel: [<c031eb70>] do_page_fault+0x0/0x700
Nov 30 18:05:14 rhlx01 kernel: [<c010457f>] error_code+0x4f/0x54

The system crashed 3 hours later without any message. This is still 2.6.14-1.1637_FC4smp. We will now reboot with 2.6.14-1.1644_FC4smp and see if this BUG still appears.
Please report XFS problems to its upstream maintainer. <nathans> As an unsupported filesystem in Fedora, it'll get fixed faster that way.
I am reopening this bug because it happened again without xfs. I will attach a dmesg copy.
Created attachment 122168 [details]
dmesg

I just saw that we are not using the newest version (2.6.14-1.1644_FC4smp) but still 2.6.14-1.1637_FC4smp.
There's not much to go on here. Things locked up because 5 tasks are stuck in add_to_page_cache() at write_lock_irq(&mapping->tree_lock); but there's no sign of who owns that lock or why it has not been released. Some things that might help would be enabling NMI lockup detection, or obtaining a full alt-sysrq-t after the lockup. It still has XFS mounted, by the looks of things; it's certainly possible that it's still an XFS bug and we're just hitting a lock that XFS has not released. What sort of workload are you running? Have there been any other suspicious log messages that might help?
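To check from a saved log that all the stuck tasks are waiting on the same lock, the "BUG: ... lockup" header lines can be grouped by their last field. The following is only a minimal sketch: it assumes the header format seen in this report (CPU number, comm/pid, then a hex value taken here to be the lock address), and the helper name group_lockups is made up for illustration.

```python
import re
from collections import defaultdict

# Matches header lines such as:
#   BUG: write-lock lockup on CPU#4, cc1/13482, d5f92abc (Not tainted)
# A syslog prefix ("Nov 30 18:05:10 rhlx01 kernel: ") in front is tolerated.
# Assumption: the last hex field is the address of the contended lock.
LOCKUP_RE = re.compile(
    r"BUG: (?P<kind>[\w-]+) lockup on CPU#(?P<cpu>\d+), "
    r"(?P<comm>[^/\s]+)/(?P<pid>\d+), (?P<lock>[0-9a-f]+)"
)


def group_lockups(log_text):
    """Return {lock_address: [(cpu, comm, pid), ...]} for every lockup header."""
    by_lock = defaultdict(list)
    for m in LOCKUP_RE.finditer(log_text):
        waiter = (int(m.group("cpu")), m.group("comm"), int(m.group("pid")))
        by_lock[m.group("lock")].append(waiter)
    return dict(by_lock)


# Header lines copied from the first two reports in this bug.
sample = """\
BUG: write-lock lockup on CPU#4, cc1/13482, d5f92abc (Not tainted)
BUG: write-lock lockup on CPU#3, cc1/13502, d5f92abc (Not tainted)
Nov 30 18:05:10 rhlx01 kernel: BUG: write-lock lockup on CPU#4, msgfmt/9352, e4115510 (Not tainted)
Nov 30 18:05:10 rhlx01 kernel: BUG: write-lock lockup on CPU#2, msgfmt/9353, e4115510 (Not tainted)
"""

for lock, waiters in group_lockups(sample).items():
    print(lock, waiters)
```

Several waiters listed under one address would indicate contention on a single lock. It does not identify the owner, which is why the NMI watchdog or a full task dump is still needed.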
There is no XFS mounted anymore. There used to be, but we moved the data around and reformatted the XFS partition as ext3. However, the machine has not been rebooted since the last XFS partition was unmounted. The server mostly serves mirrors for various projects. We usually have over 150 Mbit/s on our network interface and not much else is happening. We will probably reboot the server today with 2.6.14-1.1644_FC4smp and see if this happens again. As all cases happened when nobody was around, it is hard to do alt-sysrq-t.
OK, thanks; let me know how that kernel goes. Did this behaviour suddenly start with one particular kernel?
We have rebooted and 2.6.14-1.1644_FC4smp is now running; I will post here if we see this bug again. This has only happened with 2.6.14-1.1637_FC4smp and was never seen before.
Created attachment 122954 [details]
output of dmesg

The same bug has happened again. See attachment. We are currently running 2.6.14-1.1644_FC4smp with the following modules loaded:

Module                  Size  Used by
loop                   20937  0
ipv6                  271009  396
ipt_REJECT              9921  3
iptable_filter          7105  1
ip_tables              25665  2 ipt_REJECT,iptable_filter
dm_mod                 61533  0
video                  20293  0
button                 10705  0
battery                13509  0
ac                      8901  0
ohci_hcd               26721  0
cfi_probe              10049  0
gen_probe               7617  1 cfi_probe
scb2_flash              8781  0
mtdcore                11913  1 scb2_flash
chipreg                 7489  2 cfi_probe,scb2_flash
map_funcs               5953  1 scb2_flash
e100                   41281  0
mii                     9409  1 e100
e1000                 108077  0
floppy                 66181  0
qla2300               128705  0
qla2xxx               129053  2 qla2300
scsi_transport_fc      32321  1 qla2xxx
ext3                  135753  11
jbd                    62037  1 ext3
aic7xxx               154741  0
scsi_transport_spi     25153  1 aic7xxx
i2o_block              17485  13
i2o_core               47721  1 i2o_block
sd_mod                 22849  2
scsi_mod              140009  5 qla2xxx,scsi_transport_fc,aic7xxx,scsi_transport_spi,sd_mod
This is a mass-update to all currently open kernel bugs. A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. Thank you.
Closing due to previous comment, and note by reporter that 2.6.14-1.1644_FC4smp solved it.