Bug 721357 - [xfs/xfstests 013] possible deadlock when fsstressing
Summary: [xfs/xfstests 013] possible deadlock when fsstressing
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.6
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 5.8
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-07-14 11:28 UTC by Eryu Guan
Modified: 2015-12-28 12:59 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-02 13:08:42 UTC
Target Upstream Version:


Attachments
console log (206.50 KB, text/plain)
2011-07-14 11:28 UTC, Eryu Guan

Description Eryu Guan 2011-07-14 11:28:11 UTC
Created attachment 512868
console log

Description of problem:
Running xfstests test 013 produced hung task warnings for multiple fsstress processes:

INFO: task fsstress:412 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress      D ffffffff801538c1     0   412    411           413       (NOTLB)
 ffff81005c49bed8 0000000000000086 0000000002441fff ffffffff800481d4
 ffff81005c49be48 0000000000000007 ffff81007c5aa0c0 ffff810002983100
 000008f2f501c8b9 00000000000db424 ffff81007c5aa2a8 0000000700000000
Call Trace:
 [<ffffffff800481d4>] pagevec_lookup_tag+0x1a/0x21
 [<ffffffff80063c4f>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff80022357>] __up_read+0x19/0x7f
 [<ffffffff80063c99>] .text.lock.mutex+0xf/0x14
 [<ffffffff800f59b0>] __sync_inodes+0x85/0xaa
 [<ffffffff800e67e5>] sync_filesystems+0x13/0x102
 [<ffffffff800e34d3>] do_sync+0x28/0x5a
 [<ffffffff800e3513>] sys_sync+0xe/0x12
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

INFO: task fsstress:413 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress      D ffffffff801538c1     0   413    411           414   412 (NOTLB)
 ffff810047567ed8 0000000000000086 0000000002441fff ffffffff800481d4
 ffff810047567e48 0000000000000007 ffff81007f47f080 ffff810037c60080
 000008f2baf95176 00000000000bf1eb ffff81007f47f268 000000068873b1e6
Call Trace:
 [<ffffffff800481d4>] pagevec_lookup_tag+0x1a/0x21
 [<ffffffff80063c4f>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff80022357>] __up_read+0x19/0x7f
 [<ffffffff80063c99>] .text.lock.mutex+0xf/0x14
 [<ffffffff800f59b0>] __sync_inodes+0x85/0xaa
 [<ffffffff800e67e5>] sync_filesystems+0x13/0x102
 [<ffffffff800e34d3>] do_sync+0x28/0x5a
 [<ffffffff800e3513>] sys_sync+0xe/0x12
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

INFO: task fsstress:414 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress      D ffffffff801538c1     0   414    411           415   413 (NOTLB)
 ffff81007dd7ded8 0000000000000082 0000000002441fff ffffffff800481d4
 ffff81007dd7de48 0000000000000007 ffff8100256a8080 ffff81007f44c100
 000008f28526742e 000000000038ffa0 ffff8100256a8268 0000000300000000
Call Trace:
 [<ffffffff800481d4>] pagevec_lookup_tag+0x1a/0x21
 [<ffffffff80063c4f>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff80064604>] __down_read+0x12/0x92
 [<ffffffff80063c99>] .text.lock.mutex+0xf/0x14
 [<ffffffff800e67e5>] sync_filesystems+0x13/0x102
 [<ffffffff800e34d3>] do_sync+0x28/0x5a
 [<ffffffff800e3513>] sys_sync+0xe/0x12
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

INFO: task fsstress:416 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress      D ffffffff801538c1     0   416    411           417   415 (NOTLB)
 ffff8100501bfed8 0000000000000086 00000000000065f1 ffff81000c2e6800
 ffff8100501bfe48 0000000000000007 ffff81003c71e820 ffff810037c51100
 000008f2e25c2fd0 000000000000507f ffff81003c71ea08 000000050151d708
Call Trace:
 [<ffffffff80063c4f>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff80064604>] __down_read+0x12/0x92
 [<ffffffff80063c99>] .text.lock.mutex+0xf/0x14
 [<ffffffff800e67e5>] sync_filesystems+0x13/0x102
 [<ffffffff800e34d3>] do_sync+0x28/0x5a
 [<ffffffff800e3513>] sys_sync+0xe/0x12
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

INFO: task fsstress:417 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress      D ffffffff801538c1     0   417    411           418   416 (NOTLB)
 ffff810044397ed8 0000000000000082 0000000010801040 ffff810075d25c00
 ffff810044397ef8 0000000000000007 ffff81005a297820 ffff810037c01820
 000008f2f4fb1e57 0000000000000d20 ffff81005a297a08 0000000110d790c0
Call Trace:
 [<ffffffff80047172>] try_to_wake_up+0x472/0x484
 [<ffffffff80063c4f>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff80022357>] __up_read+0x19/0x7f
 [<ffffffff80063c99>] .text.lock.mutex+0xf/0x14
 [<ffffffff800f59b0>] __sync_inodes+0x85/0xaa
 [<ffffffff800e67e5>] sync_filesystems+0x13/0x102
 [<ffffffff800e34d3>] do_sync+0x28/0x5a
 [<ffffffff800e3513>] sys_sync+0xe/0x12
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

INFO: task fsstress:418 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress      D ffffffff801538c1     0   418    411           419   417 (NOTLB)
 ffff81005a67fed8 0000000000000082 00000000000065f1 ffff81000c2e6800
 ffff81005a67fe48 0000000000000007 ffff810020119100 ffff81007875f820
 000008f2a1ec8603 0000000000001894 ffff8100201192e8 000000045a67fee8
Call Trace:
 [<ffffffff8002af8f>] iput+0x4b/0x84
 [<ffffffff80021095>] sync_sb_inodes+0x146/0x26f
 [<ffffffff80063c4f>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff80064604>] __down_read+0x12/0x92
 [<ffffffff80063c99>] .text.lock.mutex+0xf/0x14
 [<ffffffff800e67e5>] sync_filesystems+0x13/0x102
 [<ffffffff800e34d3>] do_sync+0x28/0x5a
 [<ffffffff800e3513>] sys_sync+0xe/0x12
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

INFO: task fsstress:419 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress      D ffffffff801538c1     0   419    411           420   418 (NOTLB)
 ffff8100666c5ed8 0000000000000082 0000000002441fff ffffffff800481d4
 ffff8100666c5e48 0000000000000007 ffff810071f447e0 ffff810037c60080
 000008f2aff3d5ec 0000000000006411 ffff810071f449c8 000000067eced820
Call Trace:
 [<ffffffff800481d4>] pagevec_lookup_tag+0x1a/0x21
 [<ffffffff80063c4f>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff80064604>] __down_read+0x12/0x92
 [<ffffffff80063c99>] .text.lock.mutex+0xf/0x14
 [<ffffffff800e67e5>] sync_filesystems+0x13/0x102
 [<ffffffff800e34d3>] do_sync+0x28/0x5a
 [<ffffffff800e3513>] sys_sync+0xe/0x12
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

INFO: task fsstress:420 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress      D ffffffff801538c1     0   420    411           421   419 (NOTLB)
 ffff81000ca83ed8 0000000000000082 00000000000065f1 ffff81000c2e6800
 ffff81000ca83e48 0000000000000007 ffff8100144960c0 ffff810037c60080
 000008f2f4ffca70 000000000000c4ce ffff8100144962a8 000000068873b1f9
Call Trace:
 [<ffffffff80063c4f>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff80022357>] __up_read+0x19/0x7f
 [<ffffffff80063c99>] .text.lock.mutex+0xf/0x14
 [<ffffffff800f59b0>] __sync_inodes+0x85/0xaa
 [<ffffffff800e67e5>] sync_filesystems+0x13/0x102
 [<ffffffff800e34d3>] do_sync+0x28/0x5a
 [<ffffffff800e3513>] sys_sync+0xe/0x12
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

INFO: task fsstress:421 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress      D ffffffff801538c1     0   421    411           422   420 (NOTLB)
 ffff810066ec3ed8 0000000000000086 00000000000065f1 ffff81000c2e6800
 ffff810066ec3e48 0000000000000007 ffff810047b78040 ffff810002983100
 000008f2a1d6cc9d 00000000001a5949 ffff810047b78228 0000000700000000
Call Trace:
 [<ffffffff80063c4f>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff80064604>] __down_read+0x12/0x92
 [<ffffffff80063c99>] .text.lock.mutex+0xf/0x14
 [<ffffffff800e67e5>] sync_filesystems+0x13/0x102
 [<ffffffff800e34d3>] do_sync+0x28/0x5a
 [<ffffffff800e3513>] sys_sync+0xe/0x12
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

INFO: task fsstress:422 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress      D ffffffff801538c1     0   422    411           423   421 (NOTLB)
 ffff810065fbded8 0000000000000086 0000000002441fff ffffffff800481d4
 ffff810065fbde48 0000000000000007 ffff81004fea57e0 ffff810037c60080
 000008f2f579d9f5 000000000000498e ffff81004fea59c8 0000000672fc4408
Call Trace:
 [<ffffffff800481d4>] pagevec_lookup_tag+0x1a/0x21
 [<ffffffff80063c4f>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff80022357>] __up_read+0x19/0x7f
 [<ffffffff80063c99>] .text.lock.mutex+0xf/0x14
 [<ffffffff800f59b0>] __sync_inodes+0x85/0xaa
 [<ffffffff800e67e5>] sync_filesystems+0x13/0x102
 [<ffffffff800e34d3>] do_sync+0x28/0x5a
 [<ffffffff800e3513>] sys_sync+0xe/0x12
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Version-Release number of selected component (if applicable):
kernel-2.6.18-238.19.1.el5

How reproducible:
Only once so far.
I've tried running 013 in a loop, but still cannot reproduce it.

Steps to Reproduce:
1. Run xfstests test 013 (see the sketch below)
2.
3.
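
For reference, a rough sketch of how such a run is typically set up and driven (the device names, mount points, and loop count below are assumptions, not taken from this report); xfstests of this era picks up TEST_DEV/TEST_DIR and SCRATCH_DEV/SCRATCH_MNT from the environment (or local.config) and runs individual tests by number:

  # Hypothetical configuration -- devices and mount points are placeholders.
  export TEST_DEV=/dev/sdb1 TEST_DIR=/mnt/test
  export SCRATCH_DEV=/dev/sdb2 SCRATCH_MNT=/mnt/scratch
  mkdir -p /mnt/test /mnt/scratch
  mkfs.xfs -f $TEST_DEV          # 013 runs fsstress against the test filesystem
  ./check 013                    # run the single test from the xfstests directory
  # Retrying the reproduction in a loop:
  for i in $(seq 1 50); do ./check 013 || break; done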
  
Actual results:
The fsstress tasks hang.

Expected results:
The test passes.

Additional info:
Full log is attached.

Comment 1 Ric Wheeler 2011-07-14 11:51:17 UTC
Which file system are you using here?

thanks!

Comment 2 Eryu Guan 2011-07-14 12:07:12 UTC
(In reply to comment #1)
> Which file system are you using here?
> 
> thanks!

Sorry, I shouldn't have hidden it in the summary. It's XFS.

Comment 3 Dave Chinner 2011-07-15 01:20:48 UTC
I'm not so sure this is an XFS problem. kjournald is in the middle of waiting for IO during a transaction commit like this:

kjournald     D ffff81007f398040     0   717     51           742   678 (L-TLB)
 ffff81007d6bbcf0 0000000000000046 ffff81007e27d928 ffff81007b527928
 ffffffff800155a2 000000000000000a ffff81007e0e0860 ffff81007f398040
 00001c3a5732230e 0000000000005594 ffff81007e0e0a48 000000068807b407
Call Trace:
 [<ffffffff800155a2>] sync_buffer+0x0/0x3f
 [<ffffffff8006ec4e>] do_gettimeofday+0x40/0x90
 [<ffffffff8005aa30>] getnstimeofday+0x10/0x28
 [<ffffffff800155a2>] sync_buffer+0x0/0x3f
 [<ffffffff800637ca>] io_schedule+0x3f/0x67
 [<ffffffff800155dd>] sync_buffer+0x3b/0x3f
 [<ffffffff800639f6>] __wait_on_bit+0x40/0x6e
 [<ffffffff800155a2>] sync_buffer+0x0/0x3f
 [<ffffffff80063a90>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff800a2929>] wake_bit_function+0x0/0x23
 [<ffffffff880312e8>] :jbd:__journal_file_buffer+0x13e/0x243
 [<ffffffff880339a5>] :jbd:journal_commit_transaction+0x543/0x1066
 [<ffffffff8003dd98>] lock_timer_base+0x1b/0x3c
 [<ffffffff8004b432>] try_to_del_timer_sync+0x7f/0x88
 [<ffffffff880375d3>] :jbd:kjournald+0xc1/0x213
 [<ffffffff800a28fb>] autoremove_wake_function+0x0/0x2e
 [<ffffffff88037512>] :jbd:kjournald+0x0/0x213
 [<ffffffff800a26e3>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032b26>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff800a26e3>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032a28>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

and syslogd is blocked on an fsync waiting for a journal commit:

syslogd       D ffffffff801538c1     0  2785      1          2788  2777 (NOTLB)
 ffff81007396bd88 0000000000000082 0000000600000296 0000000000000001
 000000037396bd18 000000000000000a ffff81007e2b4100 ffff81007f44c100
 00001c3a5731c376 000000000000cd33 ffff81007e2b42e8 0000000300000096
Call Trace:
 [<ffffffff88036d8a>] :jbd:log_wait_commit+0xa3/0xf5
 [<ffffffff800a28fb>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
 [<ffffffff8002fe73>] __writeback_single_inode+0x1d9/0x318
 [<ffffffff800e2be6>] do_readv_writev+0x26e/0x291
 [<ffffffff800f5a22>] sync_inode+0x24/0x33
 [<ffffffff8804c370>] :ext3:ext3_sync_file+0xcc/0xf8
 [<ffffffff80050508>] do_fsync+0x52/0xa4
 [<ffffffff800e346b>] __do_fsync+0x23/0x36
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Then there are pcscd, acpid, sshd, and other system daemons stuck waiting for ext3 journal commits.

Then there's a sys_sync call from fsstress that is stuck waiting for an ext3 journal commit to complete, and then a bunch more sys_sync calls stuck behind that first one. It's that bunch of sys_sync() calls that triggered the hung task warning.
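
(As an illustration of that symptom only, not of the kernel internals: concurrent sync() callers can be observed piling up from userspace. The mount point and sizes below are assumptions.)

  # Dirty a lot of pages, then issue several concurrent sync() calls.
  dd if=/dev/zero of=/mnt/test/dirty.img bs=1M count=4096 &
  for i in $(seq 1 8); do sync & done
  # While writeback is still in progress, the sync processes typically sit in
  # D state (uninterruptible sleep), one behind another:
  ps -o pid,stat,wchan:25,cmd -C sync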

So this looks like a problem with your root filesystem (ext3), not XFS. It is possibly a manifestation of the well-known fsync-the-world problem if you are using loopback devices hosted on the root ext3 filesystem to provide the TEST_DEV/SCRATCH_DEV devices for xfstests.
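
(For context, a loopback-backed xfstests configuration of the kind described above would look roughly like this; all paths below are assumptions, not taken from this report. Because both loop devices are backed by files on the root ext3 filesystem, every sync() issued during the test must also wait for the ext3 journal.)

  # Hypothetical loop-device setup -- file and device paths are placeholders.
  dd if=/dev/zero of=/var/tmp/test.img bs=1M count=4096
  dd if=/dev/zero of=/var/tmp/scratch.img bs=1M count=4096
  losetup /dev/loop0 /var/tmp/test.img      # backing files live on the root ext3 fs
  losetup /dev/loop1 /var/tmp/scratch.img
  mkfs.xfs -f /dev/loop0
  export TEST_DEV=/dev/loop0 TEST_DIR=/mnt/test
  export SCRATCH_DEV=/dev/loop1 SCRATCH_MNT=/mnt/scratch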

Comment 4 Eryu Guan 2011-07-15 02:57:08 UTC
Thanks for the explanation. Yes, this test was run on a loop device hosted on ext3.

(Perhaps QE should improve z-stream testing to avoid using loop devices when running xfstests, as we've already done for y-stream testing.)

Is there a bug number for the sync-the-world problem?

Comment 5 RHEL Program Management 2014-03-07 13:40:19 UTC
This bug/component is not included in scope for RHEL-5.11.0, which is the last RHEL 5 minor release. This Bugzilla will soon be CLOSED as WONTFIX at the end of the RHEL 5.11 development phase (Apr 22, 2014). Please contact your account manager or support representative in case you need to escalate this bug.

Comment 6 RHEL Program Management 2014-06-02 13:08:42 UTC
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in the RHEL 5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).

