Bug 681439 - [ext4/xfstests] 133 task blocked for more than 120 seconds
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: s390x
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Josef Bacik
QA Contact: Eryu Guan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-03-02 07:22 UTC by Eryu Guan
Modified: 2011-05-19 12:43 UTC

Fixed In Version: kernel-2.6.32-130.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-05-19 12:43:26 UTC
Target Upstream Version:


Attachments
patch to fix the problem (560 bytes, patch)
2011-03-02 15:44 UTC, Josef Bacik


Links
System: Red Hat Product Errata
ID: RHSA-2011:0542
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 6.1 kernel security, bug fix and enhancement update
Last Updated: 2011-05-19 11:58:07 UTC

Description Eryu Guan 2011-03-02 07:22:56 UTC
Description of problem:
The system hung while running xfstests 133 against ext4 on s390x. The block size was 1024.

Version-Release number of selected component (if applicable):
kernel-2.6.32-118.el6

How reproducible:
Every time (3 tries)

Steps to Reproduce:
1. install rh-tests-kernel-filesystems-xfs-xfstests
2. cd /mnt/tests/kernel/filesystems/xfs/xfstests
3. TEST_PARAM_FSTYPE=ext4 TEST_PARAM_RUNTESTS=133 TEST_PARAM_BLKSIZE=1024 make run
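
Since the hang shows up within a few tries, step 3 can be repeated in a loop. The following is a minimal sketch, not part of the test suite: `run_repeatedly` and `XFSTESTS_ENV` are hypothetical names, and the timeout is only best-effort, because a process stuck in uninterruptible sleep (state D) may not respond to being killed.

```python
import os
import subprocess

# Environment variables taken from step 3 of the reproduction steps.
XFSTESTS_ENV = {
    "TEST_PARAM_FSTYPE": "ext4",
    "TEST_PARAM_RUNTESTS": "133",
    "TEST_PARAM_BLKSIZE": "1024",
}


def run_repeatedly(cmd, extra_env, attempts, timeout_s, cwd=None,
                   runner=subprocess.run):
    """Run cmd up to `attempts` times.

    Returns the 1-based number of the first attempt that exited non-zero
    or exceeded the timeout, or None if every attempt succeeded.
    `runner` is injectable so the loop can be exercised without running make.
    """
    env = dict(os.environ, **extra_env)
    for attempt in range(1, attempts + 1):
        try:
            result = runner(cmd, env=env, timeout=timeout_s, cwd=cwd)
        except subprocess.TimeoutExpired:
            return attempt  # treat a timeout as a suspected hang
        if result.returncode != 0:
            return attempt
    return None
```

For example, `run_repeatedly(["make", "run"], XFSTESTS_ENV, attempts=10, timeout_s=3600, cwd="/mnt/tests/kernel/filesystems/xfs/xfstests")` would repeat step 3 ten times; the one-hour timeout is an arbitrary assumption.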
  
Actual results:
 [<0000000000109de4>] kernel_thread_starter+0x0/0xc
INFO: task jbd2/loop0-8:59759 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jbd2/loop0-8  D 000003c000861038     0 59759      2 0x00000200
000000000bff7a50 00000000010f4e00 000000000bff7a50 000000000bff7a78
       000000001f6cb318 00000000008a5e00 00000000010f4e00 000000001f6cb318
       000000001f6cb318 000000001d3dc540 0000000000000000 000000000080ee98
       00000000008a5e00 000000001d3dc9d8 000000001f6cb2e0 00000000010f4e00
       00000000004c6c78 00000000004bcd9e 000000000bff7ab0 000000000bff7c68
Call Trace:
([<00000000004bcd9e>] schedule+0x5aa/0xf84)
 [<000003c000861038>] jbd2_journal_commit_transaction+0x1c8/0x1a94 [jbd2]
 [<000003c000869562>] kjournald2+0xde/0x2c0 [jbd2]
 [<000000000016cf48>] kthread+0xa4/0xac
 [<0000000000109dea>] kernel_thread_starter+0x6/0xc
 [<0000000000109de4>] kernel_thread_starter+0x0/0xc
INFO: task xfs_io:59991 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xfs_io        D 00000000004be222     0 59991  59768 0x00000200
0000000000000000 0000000000000000 0000000000000000 0000000000000400
       0000000000000400 0000000000000001 0000000000000400 0000000000000001
       0000000000000000 0000000000000000 000000001d17d0e0 000000000080ee98
       00000000008a5e00 000000001d17d578 000000001f97e040 00000000010f4e00
       00000000004c6c78 00000000004bcd9e 0000000001a7f9f8 0000000001a7fbb0
Call Trace:
([<00000000004bcd9e>] schedule+0x5aa/0xf84)
 [<00000000004be222>] __mutex_lock_slowpath+0xa6/0x148
 [<00000000004be31e>] mutex_lock+0x5a/0x60
 [<00000000001effd0>] generic_file_aio_write+0x58/0xf4
 [<000003c00097e2ae>] ext4_file_write+0x7e/0x21c [ext4]
 [<000000000024fb74>] do_sync_write+0xf0/0x154
 [<00000000002509c8>] vfs_write+0xa0/0x1a0
 [<0000000000250b5e>] SyS_pwrite64+0x96/0xa8
 [<0000000000118644>] sysc_tracego+0xe/0x14
 [<0000004fb2d2bbfc>] 0x4fb2d2bbfc
       00000000010f4e00 00000000010e4e00 00000000000058ae 00000000008b3cf8
       0000000000000000 0000000000000000 000000001ce0c140 000000000080ee98
       00000000008a5e00 000000001ce0c5d8 000000001f97e040 00000000010f4e00
       00000000004c6c78 00000000004bcd9e 000000001f23b890 000000001f23ba48
Call Trace:
([<00000000004bcd9e>] schedule+0x5aa/0xf84)
 [<000003c0008606dc>] start_this_handle+0x308/0x5e0 [jbd2]
 [<000003c000860bcc>] jbd2_journal_start+0xd8/0x118 [jbd2]
 [<000003c000983d64>] ext4_dirty_inode+0x38/0x74 [ext4]
 [<000000000027b07e>] __mark_inode_dirty+0x46/0x198
 [<000000000026a6d4>] touch_atime+0x138/0x170
 [<00000000001f07a0>] generic_file_aio_read+0x418/0x7ac
 [<000000000024fcc8>] do_sync_read+0xf0/0x154
 [<0000000000250cbc>] vfs_read+0xa0/0x1a0
 [<0000000000250ebe>] SyS_read+0x5a/0xac
 [<0000000000118644>] sysc_tracego+0xe/0x14
 [<0000020000466460>] 0x20000466460
INFO: task jbd2/loop0-8:59759 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jbd2/loop0-8  D 000003c000861038     0 59759      2 0x00000200
000000000bff7a50 00000000010f4e00 000000000bff7a50 000000000bff7a78
       000000001f6cb318 00000000008a5e00 00000000010f4e00 000000001f6cb318
       000000001f6cb318 000000001d3dc540 0000000000000000 000000000080ee98
       00000000008a5e00 000000001d3dc9d8 000000001f6cb2e0 00000000010f4e00
       00000000004c6c78 00000000004bcd9e 000000000bff7ab0 000000000bff7c68
Call Trace:
([<00000000004bcd9e>] schedule+0x5aa/0xf84)
 [<000003c000861038>] jbd2_journal_commit_transaction+0x1c8/0x1a94 [jbd2]
 [<000003c000869562>] kjournald2+0xde/0x2c0 [jbd2]
 [<000000000016cf48>] kthread+0xa4/0xac
 [<0000000000109dea>] kernel_thread_starter+0x6/0xc
 [<0000000000109de4>] kernel_thread_starter+0x0/0xc
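
To see at a glance which tasks are blocked and where, the hung-task reports in a console log like the one above can be reduced to one entry per call trace. This is a rough sketch that assumes only the message layout visible in this report; `summarize_hung_tasks` is a hypothetical helper, and the frame pattern accepts both `[<addr>]` and the mis-encoded `Ý<addr>¨` bracket form seen in some s390x console captures.

```python
import re

# Hung-task watchdog line, e.g.
# "INFO: task xfs_io:59991 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than \d+ seconds"
)

# One call-trace frame with a resolved kernel symbol. Frames that show only a
# raw address (user-space return addresses) do not match and are skipped.
FRAME_RE = re.compile(r"[\[Ý]<[0-9a-f]+>[\]¨]\s+(?P<sym>[A-Za-z_][\w.]*)\+0x")


def summarize_hung_tasks(log_text):
    """Return a list of (comm, pid, [symbols]) tuples, one per 'Call Trace:'.

    A trace with no preceding INFO line (for example, because the log was
    truncated) is reported as ("<unknown>", None, [...]).
    """
    reports = []
    header = None   # (comm, pid) from the most recent INFO line
    frames = None   # symbol list of the trace currently being read
    for line in log_text.splitlines():
        m = HUNG_RE.search(line)
        if m:
            header = (m.group("comm"), int(m.group("pid")))
            frames = None
            continue
        if "Call Trace:" in line:
            comm, pid = header if header else ("<unknown>", None)
            frames = []
            reports.append((comm, pid, frames))
            header = None  # an INFO line labels only the next trace
            continue
        f = FRAME_RE.search(line)
        if f and frames is not None:
            frames.append(f.group("sym"))
    return reports
```

Each tuple lists the innermost frame first, so the second symbol (after `schedule`) is usually the function the task is waiting in.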

Expected results:
Test passed

Additional info:
Here is a failed task in beaker
https://beaker.engineering.redhat.com/recipes/116608
http://beaker-archive.app.eng.bos.redhat.com/beaker-logs/2011/02/571/57102/116608///console.log

I also saw one hang on an i386 host, but I cannot reproduce it:
https://beaker.engineering.redhat.com/recipes/115224
http://beaker-archive.app.eng.bos.redhat.com/beaker-logs/2011/02/564/56473/115224///console.log

Comment 1 Eric Sandeen 2011-03-02 15:33:25 UTC
Does this seem to be a regression?

Comment 2 Eric Sandeen 2011-03-02 15:35:41 UTC
Also, could you do a sysrq-W to see what all is stuck?

Thanks,
-Eric

Comment 3 Josef Bacik 2011-03-02 15:44:53 UTC
Created attachment 481887 [details]
patch to fix the problem

Could you try with this patch please, it should fix the problem.

Comment 4 Eryu Guan 2011-03-03 06:38:38 UTC
(In reply to comment #1)
> Does this seem to be a regression?

I'm afraid it is a regression. I tried the 6.0 GA kernel (-71) more than 10 times and every run passed.

I can also reproduce it on an i386 host now, and the 6.0 GA kernel does not show the issue there either.

Comment 5 Eryu Guan 2011-03-03 06:39:34 UTC
(In reply to comment #2)
> Also, could you do a sysrq-W to see what all is stuck?
> 
> Thanks,
> -Eric

Here is the sysrq-w output

[root@ibm-z10-25 ~]# echo w > /proc/sysrq-trigger
  .min_vruntime                  : 57350.449026
  .max_vruntime                  : 57350.487409
  .spread                        : 40.038383
  .spread0                       : -139846.319760
  .nr_running                    : 3
  .load                          : 3072
  .nr_spread_over                : 5
  .shares                        : 0

rt_rq[1]:/
  .rt_nr_running                 : 1
  .rt_throttled                  : 0
  .rt_time                       : 0.000000
  .rt_runtime                    : 0.000001

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
     migration/1  5404    195462.980152         2     0    195462.980152         0.050415         0.000000 /
        events/1  5418     57310.449026        37   120     57310.449026         2.705469     36153.255552 /
R         xfs_io  5420    640690.437219       449   120    640690.437219    583975.535754     31362.931872 /
       rhsmcertd  5424     57350.487409         1   120     57350.487409         0.114585         0.000000 /

Comment 6 Eryu Guan 2011-03-03 06:41:16 UTC
(In reply to comment #3)
> Created attachment 481887 [details]
> patch to fix the problem
> 
> Could you try with this patch please, it should fix the problem.

Sure, I'll try it and update bz once I get results.

Comment 7 Eryu Guan 2011-03-04 07:37:33 UTC
(In reply to comment #3)
> Created attachment 481887 [details]
> patch to fix the problem
> 
> Could you try with this patch please, it should fix the problem.

It seems to fix the bug: I ran the test on s390x more than 10 times and found no issues.
I'll try an i386 host as well and update the bug later.

Comment 8 Eryu Guan 2011-03-07 03:53:40 UTC
No issue found on the i386 host either. The patch fixes this issue.

Comment 9 RHEL Program Management 2011-03-08 20:00:40 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 11 Aristeu Rozanski 2011-04-07 13:50:21 UTC
Patch(es) available on kernel-2.6.32-130.el6

Comment 14 Eryu Guan 2011-04-11 06:16:11 UTC
Verified on -130 kernel

Ran xfstests 133 in a loop more than 50 times on s390x and i386 hosts; no issue found.

Set it to VERIFIED.

Comment 15 errata-xmlrpc 2011-05-19 12:43:26 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html

