Created attachment 512869 [details]
soft lockup log

Clone for a new soft lockup; I have seen this soft lockup only once so far. The kernel version is 2.6.18-238.19.1.el5. The full log is uploaded.

+++ This bug was initially created as a clone of Bug #650122 +++

Description of problem:
When testing xfs with xfstests, test 104 triggers the hung_task_timeout. Test 104 cannot make progress, but the system otherwise remains usable.

INFO: task pdflush:16694 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
pdflush       D ffff810003347120     0 16694     43  22136 28115 (L-TLB)
 ffff81005c457690 0000000000000046 0000000000200200 ffff810049047900
 ffff810049047a80 000000000000000a ffff8100bcbe9820 ffff810037c02080
 000001da8202f96d 0000000000009401 ffff8100bcbe9a08 000000023beaa500
Call Trace:
 [<ffffffff8006466c>] __down_read+0x7a/0x92
 [<ffffffff88680055>] :xfs:xfs_bmap_btalloc+0x27c/0x8c1
 [<ffffffff80046fbc>] try_to_wake_up+0x472/0x484
 [<ffffffff8867a2ef>] :xfs:xfs_bmap_search_multi_extents+0x9d/0xda
 [<ffffffff88680d8a>] :xfs:xfs_bmapi+0x6eb/0xe79
 [<ffffffff886a1bae>] :xfs:xfs_log_reserve+0xad/0xc9
 [<ffffffff8869d104>] :xfs:xfs_iomap_write_allocate+0x201/0x328
 [<ffffffff8869db49>] :xfs:xfs_iomap+0x22a/0x2a5
 [<ffffffff886b270a>] :xfs:xfs_map_blocks+0x2d/0x63
 [<ffffffff886b334a>] :xfs:xfs_page_state_convert+0x2af/0x544
 [<ffffffff886b372b>] :xfs:xfs_vm_writepage+0xa7/0xe0
 [<ffffffff8001d0e6>] mpage_writepages+0x1bf/0x37d
 [<ffffffff886b3684>] :xfs:xfs_vm_writepage+0x0/0xe0
 [<ffffffff8008cf69>] find_busiest_group+0x20d/0x621
 [<ffffffff8005ac3b>] do_writepages+0x20/0x2f
 [<ffffffff8002fc91>] __writeback_single_inode+0x19e/0x318
 [<ffffffff80021039>] sync_sb_inodes+0x1b5/0x26f
 [<ffffffff800a267e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80051143>] writeback_inodes+0x82/0xd8
 [<ffffffff800cbb57>] wb_kupdate+0xd4/0x14e
 [<ffffffff8005662e>] pdflush+0x0/0x1fb
 [<ffffffff8005677f>] pdflush+0x151/0x1fb
 [<ffffffff800cba83>] wb_kupdate+0x0/0x14e
 [<ffffffff80032968>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff800a267e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800e4629>] generic_write_end+0x0/0x58
 [<ffffffff886b1f04>] :xfs:xfs_vm_write_begin+0x0/0x1e
 [<ffffffff8003286a>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

INFO: task fsstress:22324 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress      D 0000000000000001     0 22324  22322 22325 (NOTLB)
 ffff81001adfba98 0000000000000082 ffff810086381698 ffffffff886a070c
 ffff81002e1efa98 0000000000000007 ffff8100088770c0 ffff8100ba9da820
 000001d360dd41e8 000000000000da95 ffff8100088772a8 000000004f8efca8
Call Trace:
 [<ffffffff886a070c>] :xfs:xlog_state_get_iclog_space+0x31/0x221
 [<ffffffff8006381f>] schedule_timeout+0x1e/0xad
 [<ffffffff8006466c>] __down_read+0x7a/0x92
 [<ffffffff8869432e>] :xfs:xfs_ialloc_ag_select+0xc8/0x271
 [<ffffffff886a1712>] :xfs:_xfs_log_force+0x63/0x68
 [<ffffffff8869450a>] :xfs:xfs_dialloc+0x33/0x809
 [<ffffffff886b1d31>] :xfs:kmem_zone_zalloc+0x1e/0x2f
 [<ffffffff886a1adc>] :xfs:xlog_ticket_get+0xc1/0xe6
 [<ffffffff8869ab79>] :xfs:xfs_ialloc+0x5f/0x57f
 [<ffffffff886ac6cb>] :xfs:xfs_dir_ialloc+0x86/0x2b7
 [<ffffffff886a1074>] :xfs:xlog_grant_log_space+0x204/0x25c
 [<ffffffff886af198>] :xfs:xfs_create+0x237/0x45c
 [<ffffffff88674d4f>] :xfs:xfs_attr_get+0x8e/0x9f
 [<ffffffff886b8ab4>] :xfs:xfs_vn_mknod+0x144/0x215
 [<ffffffff800ea838>] vfs_mknod+0x105/0x176
 [<ffffffff800ead25>] sys_mknodat+0x144/0x188
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Job link:
https://beaker.engineering.redhat.com/recipes/57248

Console log:
https://beaker.engineering.redhat.com/logs/2010/11/287/28779/57248///console.log

Version-Release number of selected component (if applicable):
RHEL5.6-Server-20101029.0, kernel 2.6.18-230.el5

How reproducible:
I have made this happen several times.

Steps to Reproduce:
1.
2.
3.
Actual results:
Test 104 hangs and the average load of the system is high.

Expected results:
Test 104 passes with a sane system.

Additional info:

--- Additional comment from yugzhang on 2010-11-11 03:22:11 EST ---

To clarify this bug: I use xfstests from /kernel/filesystems/xfs/xfstests to test xfs automatically in beaker. By viewing the logs, I saw this problem once, but I could not reproduce it manually.

How reproducible:
Rarely

Steps to Reproduce:
1. Install and configure xfstests (see the README in the xfstests directory)
2. ./check 104
3.

--- Additional comment from yugzhang on 2010-11-11 03:41:01 EST ---

More attempts: now I can catch this occasionally when testing manually.

--- Additional comment from esandeen on 2010-11-11 11:47:02 EST ---

There's a fix for this upstream; it's a known problem (thanks Christoph), and we'll have to decide how critical it is for RHEL 5.6, I think.

--- Additional comment from dchinner on 2010-11-14 23:21:03 EST ---

It is not exactly a simple fix - it touches a lot of code because it changes the way we access the perag structures. However, this deadlock is actually a very rare problem in the real world, so I'd be inclined to ignore it until there is a real need to fix it.

Cheers,

Dave.
Why clone an existing open bug for a recurrence of the same problem? I'll just mark this as a dup of the original bug if there isn't any reason to track the issue separately....
I'm just not sure whether this new soft lockup shares the same root cause as the original bug. If it does, I can mark it as a dup; sorry for the noise...
(In reply to comment #2)
> I'm just not sure whether this new soft lockup shares the same root cause with
> the original bug.

In that case, shouldn't you raise a completely new bug, not clone a previous one?

> If so I can mark it as a dup and sorry for the noise...

The soft lockup output is not conclusive. Do you have sysrq-w output?
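For anyone hitting this again: a minimal sketch of how sysrq-w output can be captured on the affected machine, using the standard procfs sysrq interface (the script itself is illustrative, not part of this bug's logs; it must be run as root):

```shell
#!/bin/sh
# Dump backtraces of all uninterruptible (D-state) tasks via sysrq-w,
# so the blocked-task stacks can be attached to the bug.

if [ "$(id -u)" -ne 0 ]; then
    # /proc/sysrq-trigger is writable by root only
    echo "run as root to trigger sysrq" >&2
    exit 0
fi

# Enable all sysrq functions; ignore failures on locked-down kernels
echo 1 > /proc/sys/kernel/sysrq 2>/dev/null || true

# 'w' dumps the stacks of tasks in uninterruptible (D) state
echo w > /proc/sysrq-trigger 2>/dev/null || true

# The dump lands in the kernel ring buffer / serial console
dmesg 2>/dev/null | tail -n 200
```

On a serial console the same dump can be triggered with the break sequence followed by `w`, which is handy when the box is too wedged for a login shell.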
This bug/component is not included in scope for RHEL-5.11.0, which is the last RHEL 5 minor release. This Bugzilla will soon be CLOSED as WONTFIX at the end of the RHEL 5.11 development phase (Apr 22, 2014). Please contact your account manager or support representative in case you need to escalate this bug.
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in the RHEL 5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).