Bug 650122

Summary: [xfs/xfstests] 104 causes the system hung_task_timeout
Product: Red Hat Enterprise Linux 5
Reporter: Igor Zhang <yugzhang>
Component: kernel
Assignee: fs-maint
Status: CLOSED WONTFIX
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Docs Contact:
Priority: medium
Version: 5.6
CC: baumanmo, branto, dchinner, eguan, esandeen, rwheeler, yugzhang
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 721359 (view as bug list) Environment:
Last Closed: 2014-04-24 20:31:50 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 721359    

Description Igor Zhang 2010-11-05 09:47:05 UTC
Description of problem:
When testing XFS with xfstests, test 104 triggers hung_task_timeout warnings. Test 104 makes no further progress, but the rest of the system remains usable as normal.

INFO: task pdflush:16694 blocked for more than 120 seconds. 
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
pdflush       D ffff810003347120     0 16694     43         22136 28115 (L-TLB) 
 ffff81005c457690 0000000000000046 0000000000200200 ffff810049047900 
 ffff810049047a80 000000000000000a ffff8100bcbe9820 ffff810037c02080 
 000001da8202f96d 0000000000009401 ffff8100bcbe9a08 000000023beaa500 
Call Trace: 
 [<ffffffff8006466c>] __down_read+0x7a/0x92 
 [<ffffffff88680055>] :xfs:xfs_bmap_btalloc+0x27c/0x8c1 
 [<ffffffff80046fbc>] try_to_wake_up+0x472/0x484 
 [<ffffffff8867a2ef>] :xfs:xfs_bmap_search_multi_extents+0x9d/0xda 
 [<ffffffff88680d8a>] :xfs:xfs_bmapi+0x6eb/0xe79 
 [<ffffffff886a1bae>] :xfs:xfs_log_reserve+0xad/0xc9 
 [<ffffffff8869d104>] :xfs:xfs_iomap_write_allocate+0x201/0x328 
 [<ffffffff8869db49>] :xfs:xfs_iomap+0x22a/0x2a5 
 [<ffffffff886b270a>] :xfs:xfs_map_blocks+0x2d/0x63 
 [<ffffffff886b334a>] :xfs:xfs_page_state_convert+0x2af/0x544 
 [<ffffffff886b372b>] :xfs:xfs_vm_writepage+0xa7/0xe0 
 [<ffffffff8001d0e6>] mpage_writepages+0x1bf/0x37d 
 [<ffffffff886b3684>] :xfs:xfs_vm_writepage+0x0/0xe0 
 [<ffffffff8008cf69>] find_busiest_group+0x20d/0x621 
 [<ffffffff8005ac3b>] do_writepages+0x20/0x2f 
 [<ffffffff8002fc91>] __writeback_single_inode+0x19e/0x318 
 [<ffffffff80021039>] sync_sb_inodes+0x1b5/0x26f 
 [<ffffffff800a267e>] keventd_create_kthread+0x0/0xc4 
 [<ffffffff80051143>] writeback_inodes+0x82/0xd8 
 [<ffffffff800cbb57>] wb_kupdate+0xd4/0x14e 
 [<ffffffff8005662e>] pdflush+0x0/0x1fb 
 [<ffffffff8005677f>] pdflush+0x151/0x1fb 
 [<ffffffff800cba83>] wb_kupdate+0x0/0x14e 
 [<ffffffff80032968>] kthread+0xfe/0x132 
 [<ffffffff8005dfb1>] child_rip+0xa/0x11 
 [<ffffffff800a267e>] keventd_create_kthread+0x0/0xc4 
 [<ffffffff800e4629>] generic_write_end+0x0/0x58 
 [<ffffffff886b1f04>] :xfs:xfs_vm_write_begin+0x0/0x1e 
 [<ffffffff8003286a>] kthread+0x0/0x132 
 [<ffffffff8005dfa7>] child_rip+0x0/0x11 
 
INFO: task fsstress:22324 blocked for more than 120 seconds. 
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
fsstress      D 0000000000000001     0 22324  22322         22325       (NOTLB) 
 ffff81001adfba98 0000000000000082 ffff810086381698 ffffffff886a070c 
 ffff81002e1efa98 0000000000000007 ffff8100088770c0 ffff8100ba9da820 
 000001d360dd41e8 000000000000da95 ffff8100088772a8 000000004f8efca8 
Call Trace: 
 [<ffffffff886a070c>] :xfs:xlog_state_get_iclog_space+0x31/0x221 
 [<ffffffff8006381f>] schedule_timeout+0x1e/0xad 
 [<ffffffff8006466c>] __down_read+0x7a/0x92 
 [<ffffffff8869432e>] :xfs:xfs_ialloc_ag_select+0xc8/0x271 
 [<ffffffff886a1712>] :xfs:_xfs_log_force+0x63/0x68 
 [<ffffffff8869450a>] :xfs:xfs_dialloc+0x33/0x809 
 [<ffffffff886b1d31>] :xfs:kmem_zone_zalloc+0x1e/0x2f 
 [<ffffffff886a1adc>] :xfs:xlog_ticket_get+0xc1/0xe6 
 [<ffffffff8869ab79>] :xfs:xfs_ialloc+0x5f/0x57f 
 [<ffffffff886ac6cb>] :xfs:xfs_dir_ialloc+0x86/0x2b7 
 [<ffffffff886a1074>] :xfs:xlog_grant_log_space+0x204/0x25c 
 [<ffffffff886af198>] :xfs:xfs_create+0x237/0x45c 
 [<ffffffff88674d4f>] :xfs:xfs_attr_get+0x8e/0x9f 
 [<ffffffff886b8ab4>] :xfs:xfs_vn_mknod+0x144/0x215 
 [<ffffffff800ea838>] vfs_mknod+0x105/0x176 
 [<ffffffff800ead25>] sys_mknodat+0x144/0x188 
 [<ffffffff8005d229>] tracesys+0x71/0xe0 
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0 
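
To compare reports like the two above across several console logs, a small parser can help. The following is only a sketch, assuming the log uses the same line layout as the excerpts in this description; the function names `parse_hung_tasks` and `tasks_with_frame` are hypothetical helpers, not part of xfstests or beaker.

```python
import re
from typing import Dict, List

# Header the kernel prints for each blocked task, e.g.
# "INFO: task pdflush:16694 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds"
)
# One stack frame, e.g. " [<ffffffff8006466c>] __down_read+0x7a/0x92"
# or " [<ffffffff88680055>] :xfs:xfs_bmap_btalloc+0x27c/0x8c1".
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?::(\w+):)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_tasks(log_text: str) -> List[Dict]:
    """Return one dict per hung-task report found in log_text."""
    reports: List[Dict] = []
    current = None
    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = {
                "task": header.group(1),
                "pid": int(header.group(2)),
                "timeout_secs": int(header.group(3)),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            # Frames without a ":module:" prefix belong to the core kernel.
            module = frame.group(1) or "kernel"
            current["frames"].append((module, frame.group(2)))
    return reports


def tasks_with_frame(reports: List[Dict], symbol: str) -> List[str]:
    """Names of tasks whose trace contains the given function name."""
    return [
        r["task"]
        for r in reports
        if any(func == symbol for _, func in r["frames"])
    ]
```

Run on the two traces above, `tasks_with_frame(reports, "__down_read")` would list both pdflush and fsstress, since each trace contains a `__down_read` frame. Frames in these dumps can be stale stack entries, so a match is a hint rather than proof of where a task is blocked.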

Job link:
https://beaker.engineering.redhat.com/recipes/57248

Console log:
https://beaker.engineering.redhat.com/logs/2010/11/287/28779/57248///console.log

Version-Release number of selected component (if applicable):
RHEL5.6-Server-20101029.0
kernel 2.6.18-230.el5

How reproducible:
I have reproduced this several times.

Steps to Reproduce:
1.
2.
3.
  
Actual results:
Test 104 hangs and the system load average is high.

Expected results:
Test 104 passes and the system stays healthy.

Additional info:

Comment 1 Igor Zhang 2010-11-11 08:22:11 UTC
To clarify this bug further:
I use xfstests from /kernel/filesystems/xfs/xfstests to test XFS automatically in beaker.
I saw this problem once while reviewing the logs, but I couldn't reproduce it manually.

How reproducible:
Rarely

Steps to Reproduce:
1. Install and configure xfstests (see the README in the xfstests directory).
2. Run ./check 104
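
Because the hang shows up only rarely, it can help to repeat step 2 in a loop. The following is a rough sketch under stated assumptions: the helper names, the one-hour timeout, and the use of `dmesg` output as the hang signal are all hypothetical choices, not something xfstests provides. It also assumes the kernel ring buffer holds no older hung-task messages when the loop starts.

```python
import subprocess
from typing import Callable, Optional


def log_has_hung_task(log_text: str) -> bool:
    """True if the text contains a hung_task_timeout warning."""
    return "blocked for more than" in log_text


def run_check_once(xfstests_dir: str, test: str = "104",
                   timeout_secs: int = 3600) -> bool:
    """Run ./check once; return True if the run looks hung.

    Assumption: a run that exceeds timeout_secs counts as hung.
    Tasks stuck in uninterruptible sleep may survive the kill.
    """
    try:
        subprocess.run(["./check", test], cwd=xfstests_dir,
                       timeout=timeout_secs)
    except subprocess.TimeoutExpired:
        return True
    dmesg = subprocess.run(["dmesg"], capture_output=True, text=True)
    return log_has_hung_task(dmesg.stdout)


def repeat_until_hang(run_once: Callable[[], bool],
                      max_runs: int = 50) -> Optional[int]:
    """Call run_once until it reports a hang.

    Returns the 1-based attempt number of the first hang,
    or None if no hang was seen within max_runs attempts.
    """
    for attempt in range(1, max_runs + 1):
        if run_once():
            return attempt
    return None
```

A call such as `repeat_until_hang(lambda: run_check_once("/path/to/xfstests"))` would then run the test up to 50 times; the path is a placeholder for wherever xfstests is installed.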

Comment 2 Igor Zhang 2010-11-11 08:41:01 UTC
After more attempts, I can now reproduce this occasionally when testing manually.

Comment 3 Eric Sandeen 2010-11-11 16:47:02 UTC
There's a fix for this upstream; it's a known problem (thanks, Christoph). I think we'll have to decide how critical it is for RHEL5.6.

Comment 4 Dave Chinner 2010-11-15 04:21:03 UTC
It is not exactly a simple fix - it touches a lot of code because it changes the way we access the perag structures. However, this deadlock is actually a very rare problem in the real world so I'd be inclined to ignore it until there is a real need to fix it.

Cheers,

Dave.

Comment 5 RHEL Program Management 2014-03-07 13:44:18 UTC
This bug/component is not included in scope for RHEL-5.11.0 which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of RHEL5.11 development phase (Apr 22, 2014)). Please contact your account manager or support representative in case you need to escalate this bug.

Comment 7 Red Hat Bugzilla 2023-09-14 01:22:35 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days