Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
For bugs related to Red Hat Enterprise Linux 5 product line. The current stable release is 5.10. For Red Hat Enterprise Linux 6 and above, please visit Red Hat JIRA https://issues.redhat.com/secure/CreateIssue!default.jspa?pid=12332745 to report new issues.

Bug 657336

Summary: [xfstests 232] fsstress blocked for more than 120 seconds
Product: Red Hat Enterprise Linux 5 Reporter: Boris Ranto <branto>
Component: kernelAssignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED DUPLICATE QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium Docs Contact:
Priority: low    
Version: 5.6CC: rwheeler
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-12-21 15:27:32 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Boris Ranto 2010-11-25 14:06:01 UTC
Description of problem:
xfstests no. 232 will get stuck in fsstress part of the test (probably) forever when it is executed(I've let it run for whole night but nothing changed). The test itself can be killed but the fsstress processes that it generates cannot be killed.

Version-Release number of selected component (if applicable):
kernel-2.6.18-232.el5

How reproducible:
Always

Steps to Reproduce:
1. Run test no. 232 from xfstests for ext4 or ext3, e.g.:
TEST_PARAM_RUNTESTS="232" TEST_PARAM_FSTYPE="ext4" make
  
Actual results:
Test is stuck forever. Although the make can be killed, fsstress processes that remain in background cannot.

Expected results:
Test won't get stuck.

Additional info:
Reproducible on ext3 and ext4 in rhel5. Xfs works ok, rhel6 works ok. Latest rhel5 kernel (2.6.18-233.el5) is also vulnerable. Reproducible on i686 and x86_64 using two different machines therefore this shouldn't be machine-specific.
After a while, following output is generated into dmesg (more than once, for more than one fsstress/pdflush process):
[root@dell-pesc1420-01 ~]# dmesg |tail -45
INFO: task fsstress:4191 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress      D 0000009E  1988  4191   4180                4190 (NOTLB)
       e0cfbe24 00000082 facf82e8 0000009e f4ee052c 4cee6738 01e0bd66 00000007 
       f7fa4aa0 fae20201 0000009e 00127f19 00000002 f7fa4bac c1814b08 f359ee40 
       e0cfbdf8 00000000 c18154a8 c181b94c 00000020 00000000 e0cfbe40 c1a78c38 
Call Trace:
 [<c0436cc7>] prepare_to_wait+0x24/0x46
 [<f8c51a5a>] start_this_handle+0x220/0x337 [jbd2]
 [<c0436b7b>] autoremove_wake_function+0x0/0x2d
 [<f8c51bfc>] jbd2_journal_start+0x8b/0xba [jbd2]
 [<f8cbda19>] ext4_acquire_dquot+0x44/0x6f [ext4]
 [<c04a1e4e>] dqget+0x210/0x236
 [<c04a3790>] dquot_transfer+0xf8/0x375
 [<c048ef07>] notify_change+0x1b1/0x2ae
 [<c0475cd5>] chown_common+0x8d/0x9f
 [<c0475d6d>] sys_lchown+0x2f/0x40
 [<c044cb0f>] audit_syscall_entry+0x18f/0x1b9
 [<c048c7db>] dput+0x32/0xed
 [<c0407f5f>] do_syscall_trace+0xab/0xb1
 [<c0404f4b>] syscall_call+0x7/0xb
 =======================
INFO: task pdflush:3176 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
pdflush       D 000000A6  2196  3176     19          3977  2237 (L-TLB)
       f7bc9e00 00000046 c357abd0 000000a6 f7bc9df4 7fffffff f7bc9f64 0000000a 
       c196baa0 c357e6a7 000000a6 00003ad7 00000000 c196bbac c1806e80 f346e900 
       eb73916c c1807820 00000000 00000000 00000000 0000003f f7bc9e1c ffffffff 
Call Trace:
 [<f8c51a5a>] start_this_handle+0x220/0x337 [jbd2]
 [<c0436b7b>] autoremove_wake_function+0x0/0x2d
 [<f8c51bfc>] jbd2_journal_start+0x8b/0xba [jbd2]
 [<f8cb3680>] ext4_da_writepages+0x2ab/0x4d4 [ext4]
 [<c045edb1>] do_writepages+0x20/0x32
 [<c0495b33>] __writeback_single_inode+0x164/0x2a3
 [<c0495f54>] sync_sb_inodes+0x17c/0x21f
 [<c04961a3>] writeback_inodes+0x6a/0xb0
 [<c045f246>] wb_kupdate+0xaa/0x10f
 [<c045f661>] pdflush+0x0/0x1a3
 [<c045f76c>] pdflush+0x10b/0x1a3
 [<c045f19c>] wb_kupdate+0x0/0x10f
 [<c0436ab7>] kthread+0xc0/0xed
 [<c04369f7>] kthread+0x0/0xed
 [<c0405c87>] kernel_thread_helper+0x7/0x10
 =======================

Comment 1 Boris Ranto 2010-11-25 14:43:12 UTC
Little correction. In 2.6.18-233 on x86_64, the process that is blocked for more than 120 seconds is not fsstress but syslogd:
INFO: task syslogd:10105 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syslogd       D ffffffff801536d0     0 10105      1         10108 10095 (NOTLB)
 ffff810279fb3d88 0000000000000082 ffff810279fb3d98 ffffffff80062ff0
 ffff8101086e24d8 0000000000000009 ffff81027ecf0820 ffff8102871e1860
 000004422d2279e2 0000000000000af3 ffff81027ecf0a08 0000000800005d0d
Call Trace:
 [<ffffffff80062ff0>] thread_return+0x62/0xfe
 [<ffffffff88036d8a>] :jbd:log_wait_commit+0xa3/0xf5
 [<ffffffff800a28a0>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80099840>] process_timeout+0x0/0x5
 [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
 [<ffffffff8002fccc>] __writeback_single_inode+0x1d9/0x318
 [<ffffffff800e2b4d>] do_readv_writev+0x26e/0x291
 [<ffffffff800f5900>] sync_inode+0x24/0x33
 [<ffffffff8804c370>] :ext3:ext3_sync_file+0xcc/0xf8
 [<ffffffff8005030c>] do_fsync+0x52/0xa4
 [<ffffffff800e33d2>] __do_fsync+0x23/0x36
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Comment 2 Boris Ranto 2010-12-21 15:27:32 UTC

*** This bug has been marked as a duplicate of bug 650813 ***