Bug 721359 - [xfs/xfstests] 104 causes the system hung_task_timeout and soft lockup
Summary: [xfs/xfstests] 104 causes the system hung_task_timeout and soft lockup
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.8
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On: 650122
Blocks:
 
Reported: 2011-07-14 11:36 UTC by Eryu Guan
Modified: 2015-12-28 12:59 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 650122
Environment:
Last Closed: 2014-06-02 13:22:39 UTC
Target Upstream Version:
Embargoed:


Attachments
soft lockup log (28.72 KB, text/plain)
2011-07-14 11:36 UTC, Eryu Guan

Description Eryu Guan 2011-07-14 11:36:41 UTC
Created attachment 512869 [details]
soft lockup log

Cloning for a new soft lockup; I have seen this soft lockup only once so far.
The kernel version is 2.6.18-238.19.1.el5.
The full log is attached.

+++ This bug was initially created as a clone of Bug #650122 +++

Description of problem:
When testing xfs with xfstests, test 104 triggers hung_task_timeout messages. Test 104 cannot make progress, but the system otherwise remains usable.
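
For reference, the hung-task watchdog that prints the messages below is controlled through /proc/sys/kernel; a minimal sketch of inspecting and silencing it (silencing the warning does not unblock the stuck task):

  # show the current hung-task timeout (120 seconds, per the messages below)
  cat /proc/sys/kernel/hung_task_timeout_secs

  # silence the warnings, as the message itself suggests; this does not fix the hang
  echo 0 > /proc/sys/kernel/hung_task_timeout_secs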

INFO: task pdflush:16694 blocked for more than 120 seconds. 
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
pdflush       D ffff810003347120     0 16694     43         22136 28115 (L-TLB) 
 ffff81005c457690 0000000000000046 0000000000200200 ffff810049047900 
 ffff810049047a80 000000000000000a ffff8100bcbe9820 ffff810037c02080 
 000001da8202f96d 0000000000009401 ffff8100bcbe9a08 000000023beaa500 
Call Trace: 
 [<ffffffff8006466c>] __down_read+0x7a/0x92 
 [<ffffffff88680055>] :xfs:xfs_bmap_btalloc+0x27c/0x8c1 
 [<ffffffff80046fbc>] try_to_wake_up+0x472/0x484 
 [<ffffffff8867a2ef>] :xfs:xfs_bmap_search_multi_extents+0x9d/0xda 
 [<ffffffff88680d8a>] :xfs:xfs_bmapi+0x6eb/0xe79 
 [<ffffffff886a1bae>] :xfs:xfs_log_reserve+0xad/0xc9 
 [<ffffffff8869d104>] :xfs:xfs_iomap_write_allocate+0x201/0x328 
 [<ffffffff8869db49>] :xfs:xfs_iomap+0x22a/0x2a5 
 [<ffffffff886b270a>] :xfs:xfs_map_blocks+0x2d/0x63 
 [<ffffffff886b334a>] :xfs:xfs_page_state_convert+0x2af/0x544 
 [<ffffffff886b372b>] :xfs:xfs_vm_writepage+0xa7/0xe0 
 [<ffffffff8001d0e6>] mpage_writepages+0x1bf/0x37d 
 [<ffffffff886b3684>] :xfs:xfs_vm_writepage+0x0/0xe0 
 [<ffffffff8008cf69>] find_busiest_group+0x20d/0x621 
 [<ffffffff8005ac3b>] do_writepages+0x20/0x2f 
 [<ffffffff8002fc91>] __writeback_single_inode+0x19e/0x318 
 [<ffffffff80021039>] sync_sb_inodes+0x1b5/0x26f 
 [<ffffffff800a267e>] keventd_create_kthread+0x0/0xc4 
 [<ffffffff80051143>] writeback_inodes+0x82/0xd8 
 [<ffffffff800cbb57>] wb_kupdate+0xd4/0x14e 
 [<ffffffff8005662e>] pdflush+0x0/0x1fb 
 [<ffffffff8005677f>] pdflush+0x151/0x1fb 
 [<ffffffff800cba83>] wb_kupdate+0x0/0x14e 
 [<ffffffff80032968>] kthread+0xfe/0x132 
 [<ffffffff8005dfb1>] child_rip+0xa/0x11 
 [<ffffffff800a267e>] keventd_create_kthread+0x0/0xc4 
 [<ffffffff800e4629>] generic_write_end+0x0/0x58 
 [<ffffffff886b1f04>] :xfs:xfs_vm_write_begin+0x0/0x1e 
 [<ffffffff8003286a>] kthread+0x0/0x132 
 [<ffffffff8005dfa7>] child_rip+0x0/0x11 
 
INFO: task fsstress:22324 blocked for more than 120 seconds. 
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
fsstress      D 0000000000000001     0 22324  22322         22325       (NOTLB) 
 ffff81001adfba98 0000000000000082 ffff810086381698 ffffffff886a070c 
 ffff81002e1efa98 0000000000000007 ffff8100088770c0 ffff8100ba9da820 
 000001d360dd41e8 000000000000da95 ffff8100088772a8 000000004f8efca8 
Call Trace: 
 [<ffffffff886a070c>] :xfs:xlog_state_get_iclog_space+0x31/0x221 
 [<ffffffff8006381f>] schedule_timeout+0x1e/0xad 
 [<ffffffff8006466c>] __down_read+0x7a/0x92 
 [<ffffffff8869432e>] :xfs:xfs_ialloc_ag_select+0xc8/0x271 
 [<ffffffff886a1712>] :xfs:_xfs_log_force+0x63/0x68 
 [<ffffffff8869450a>] :xfs:xfs_dialloc+0x33/0x809 
 [<ffffffff886b1d31>] :xfs:kmem_zone_zalloc+0x1e/0x2f 
 [<ffffffff886a1adc>] :xfs:xlog_ticket_get+0xc1/0xe6 
 [<ffffffff8869ab79>] :xfs:xfs_ialloc+0x5f/0x57f 
 [<ffffffff886ac6cb>] :xfs:xfs_dir_ialloc+0x86/0x2b7 
 [<ffffffff886a1074>] :xfs:xlog_grant_log_space+0x204/0x25c 
 [<ffffffff886af198>] :xfs:xfs_create+0x237/0x45c 
 [<ffffffff88674d4f>] :xfs:xfs_attr_get+0x8e/0x9f 
 [<ffffffff886b8ab4>] :xfs:xfs_vn_mknod+0x144/0x215 
 [<ffffffff800ea838>] vfs_mknod+0x105/0x176 
 [<ffffffff800ead25>] sys_mknodat+0x144/0x188 
 [<ffffffff8005d229>] tracesys+0x71/0xe0 
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0 

Job link:
https://beaker.engineering.redhat.com/recipes/57248

Console log:
https://beaker.engineering.redhat.com/logs/2010/11/287/28779/57248///console.log

Version-Release number of selected component (if applicable):
RHEL5.6-Server-20101029.0
kernel 2.6.18-230.el5

How reproducible:
I have made this happen several times.

Steps to Reproduce:
1.
2.
3.
  
Actual results:
Test 104 hangs there and the average load of the system is high.

Expected results:
Test 104 passes and the system remains healthy.

Additional info:

--- Additional comment from yugzhang on 2010-11-11 03:22:11 EST ---

To clarify this bug:
I use xfstests from /kernel/filesystems/xfs/xfstests to test xfs automatically in Beaker.
While reviewing the logs, I saw this problem once, but I could not reproduce it manually.

How reproducible:
Rarely

Steps to Reproduce:
1. Install and configure xfstests (see the README in the xfstests directory); a minimal configuration sketch follows below.
2. ./check 104
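
A configuration sketch for step 1, assuming hypothetical spare devices /dev/sdb1 and /dev/sdc1 and mount points /mnt/test and /mnt/scratch; adjust to the local setup as described in the xfstests README:

  # hypothetical devices and mount points; substitute your own
  export TEST_DEV=/dev/sdb1
  export TEST_DIR=/mnt/test
  export SCRATCH_DEV=/dev/sdc1
  export SCRATCH_MNT=/mnt/scratch
  export FSTYP=xfs

  mkfs.xfs -f $TEST_DEV
  mkdir -p $TEST_DIR $SCRATCH_MNT
  mount $TEST_DEV $TEST_DIR

  # run test 104 from the xfstests directory
  ./check 104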

--- Additional comment from yugzhang on 2010-11-11 03:41:01 EST ---

After more attempts, I can now catch this occasionally when testing manually.
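
Since the hang only shows up occasionally, a simple loop of this sort (a sketch, not part of the original test run) can be left running to rerun the test until the problem reproduces:

  # rerun test 104 until it fails; a hang will simply stop the loop from progressing
  while ./check 104; do
      echo "test 104 passed, retrying"
  done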

--- Additional comment from esandeen on 2010-11-11 11:47:02 EST ---

There's a fix for this upstream; it's a known problem (thanks, Christoph), and we'll have to decide how critical it is for RHEL 5.6, I think.

--- Additional comment from dchinner on 2010-11-14 23:21:03 EST ---

It is not exactly a simple fix; it touches a lot of code because it changes the way we access the perag structures. However, this deadlock is actually a very rare problem in the real world, so I'd be inclined to ignore it until there is a real need to fix it.

Cheers,

Dave.

Comment 1 Dave Chinner 2011-07-15 00:58:32 UTC
Why clone an existing open bug for a re-occurrence of the same problem? I'll just mark this as a dup of the original bug if there isn't any reason to track the issue separately....

Comment 2 Eryu Guan 2011-07-15 02:53:01 UTC
I'm just not sure whether this new soft lockup shares the same root cause as the original bug. If so, I can mark it as a dup; sorry for the noise...

Comment 3 Dave Chinner 2011-07-15 06:20:17 UTC
(In reply to comment #2)
> I'm just not sure whether this new soft lockup shares the same root cause as
> the original bug.

In that case, shouldn't you raise a completely new bug, not clone a previous one?

> If so, I can mark it as a dup; sorry for the noise...

The soft lockup output is not conclusive. Do you have sysrq-w output?
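
For reference, sysrq-w output is typically captured as follows (a sketch, assuming root access and that the magic SysRq interface is available on the test machine):

  # enable the magic SysRq interface if needed
  echo 1 > /proc/sys/kernel/sysrq

  # dump the stacks of all blocked (uninterruptible) tasks to the kernel log
  echo w > /proc/sysrq-trigger

  # save the resulting output from the kernel ring buffer
  dmesg > sysrq-w.txt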

Comment 4 RHEL Program Management 2014-03-07 13:54:51 UTC
This bug/component is not included in scope for RHEL-5.11.0, which is the last RHEL 5 minor release. This Bugzilla will soon be CLOSED as WONTFIX at the end of the RHEL 5.11 development phase (Apr 22, 2014). Please contact your account manager or support representative if you need to escalate this bug.

Comment 5 RHEL Program Management 2014-06-02 13:22:39 UTC
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We have carefully evaluated the request but are unable to include it in the RHEL 5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).

