Description of problem:
System got stuck when testing ext4 with xfstests 133 on s390x. Block size is 1024.

Version-Release number of selected component (if applicable):
kernel-2.6.32-118.el6

How reproducible:
Every time (3 tries)

Steps to Reproduce:
1. install rh-tests-kernel-filesystems-xfs-xfstests
2. cd /mnt/tests/kernel/filesystems/xfs/xfstests
3. TEST_PARAM_FSTYPE=ext4 TEST_PARAM_RUNTESTS=133 TEST_PARAM_BLKSIZE=1024 make run

Actual results:
 [<0000000000109de4>] kernel_thread_starter+0x0/0xc
INFO: task jbd2/loop0-8:59759 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jbd2/loop0-8 D 000003c000861038 0 59759 2 0x00000200
 000000000bff7a50 00000000010f4e00 000000000bff7a50 000000000bff7a78
 000000001f6cb318 00000000008a5e00 00000000010f4e00 000000001f6cb318
 000000001f6cb318 000000001d3dc540 0000000000000000 000000000080ee98
 00000000008a5e00 000000001d3dc9d8 000000001f6cb2e0 00000000010f4e00
 00000000004c6c78 00000000004bcd9e 000000000bff7ab0 000000000bff7c68
Call Trace:
([<00000000004bcd9e>] schedule+0x5aa/0xf84)
 [<000003c000861038>] jbd2_journal_commit_transaction+0x1c8/0x1a94 [jbd2]
 [<000003c000869562>] kjournald2+0xde/0x2c0 [jbd2]
 [<000000000016cf48>] kthread+0xa4/0xac
 [<0000000000109dea>] kernel_thread_starter+0x6/0xc
 [<0000000000109de4>] kernel_thread_starter+0x0/0xc
INFO: task xfs_io:59991 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xfs_io D 00000000004be222 0 59991 59768 0x00000200
 0000000000000000 0000000000000000 0000000000000000 0000000000000400
 0000000000000400 0000000000000001 0000000000000400 0000000000000001
 0000000000000000 0000000000000000 000000001d17d0e0 000000000080ee98
 00000000008a5e00 000000001d17d578 000000001f97e040 00000000010f4e00
 00000000004c6c78 00000000004bcd9e 0000000001a7f9f8 0000000001a7fbb0
Call Trace:
([<00000000004bcd9e>] schedule+0x5aa/0xf84)
 [<00000000004be222>] __mutex_lock_slowpath+0xa6/0x148
 [<00000000004be31e>] mutex_lock+0x5a/0x60
 [<00000000001effd0>] generic_file_aio_write+0x58/0xf4
 [<000003c00097e2ae>] ext4_file_write+0x7e/0x21c [ext4]
 [<000000000024fb74>] do_sync_write+0xf0/0x154
 [<00000000002509c8>] vfs_write+0xa0/0x1a0
 [<0000000000250b5e>] SyS_pwrite64+0x96/0xa8
 [<0000000000118644>] sysc_tracego+0xe/0x14
 [<0000004fb2d2bbfc>] 0x4fb2d2bbfc
 00000000010f4e00 00000000010e4e00 00000000000058ae 00000000008b3cf8
 0000000000000000 0000000000000000 000000001ce0c140 000000000080ee98
 00000000008a5e00 000000001ce0c5d8 000000001f97e040 00000000010f4e00
 00000000004c6c78 00000000004bcd9e 000000001f23b890 000000001f23ba48
Call Trace:
([<00000000004bcd9e>] schedule+0x5aa/0xf84)
 [<000003c0008606dc>] start_this_handle+0x308/0x5e0 [jbd2]
 [<000003c000860bcc>] jbd2_journal_start+0xd8/0x118 [jbd2]
 [<000003c000983d64>] ext4_dirty_inode+0x38/0x74 [ext4]
 [<000000000027b07e>] __mark_inode_dirty+0x46/0x198
 [<000000000026a6d4>] touch_atime+0x138/0x170
 [<00000000001f07a0>] generic_file_aio_read+0x418/0x7ac
 [<000000000024fcc8>] do_sync_read+0xf0/0x154
 [<0000000000250cbc>] vfs_read+0xa0/0x1a0
 [<0000000000250ebe>] SyS_read+0x5a/0xac
 [<0000000000118644>] sysc_tracego+0xe/0x14
 [<0000020000466460>] 0x20000466460
INFO: task jbd2/loop0-8:59759 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jbd2/loop0-8 D 000003c000861038 0 59759 2 0x00000200
 000000000bff7a50 00000000010f4e00 000000000bff7a50 000000000bff7a78
 000000001f6cb318 00000000008a5e00 00000000010f4e00 000000001f6cb318
 000000001f6cb318 000000001d3dc540 0000000000000000 000000000080ee98
 00000000008a5e00 000000001d3dc9d8 000000001f6cb2e0 00000000010f4e00
 00000000004c6c78 00000000004bcd9e 000000000bff7ab0 000000000bff7c68
Call Trace:
([<00000000004bcd9e>] schedule+0x5aa/0xf84)
 [<000003c000861038>] jbd2_journal_commit_transaction+0x1c8/0x1a94 [jbd2]
 [<000003c000869562>] kjournald2+0xde/0x2c0 [jbd2]
 [<000000000016cf48>] kthread+0xa4/0xac
 [<0000000000109dea>] kernel_thread_starter+0x6/0xc
 [<0000000000109de4>] kernel_thread_starter+0x0/0xc

Expected results:
Test passed

Additional info:
Here is a failed task in beaker:
https://beaker.engineering.redhat.com/recipes/116608
http://beaker-archive.app.eng.bos.redhat.com/beaker-logs/2011/02/571/57102/116608///console.log

I've also got one stuck on an i386 host, but I cannot reproduce it:
https://beaker.engineering.redhat.com/recipes/115224
http://beaker-archive.app.eng.bos.redhat.com/beaker-logs/2011/02/564/56473/115224///console.log
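For triage, it can help to see at a glance which tasks the hung-task detector reported and how often. The following is a minimal sketch, not part of the test suite: the function name blocked_tasks is hypothetical, and the pattern assumes the "INFO: task NAME:PID blocked for more than N seconds" message format seen in the console log above, with no spaces in the task name.

```shell
#!/bin/sh
# blocked_tasks (hypothetical helper): read a console log on stdin and print
# each NAME:PID reported by the hung-task detector, prefixed by its count.
blocked_tasks() {
    # Keep only the hung-task report lines, then take the NAME:PID field.
    grep -o 'INFO: task [^ ]* blocked for more than [0-9]* seconds' \
        | awk '{ print $3 }' \
        | sort \
        | uniq -c
}

# Example usage (assumes a locally saved copy of the console log):
#   blocked_tasks < console.log
```

Run against the excerpt above, this would list jbd2/loop0-8:59759 and xfs_io:59991; the third stuck task does not appear because its report header is missing from the pasted log.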
Does this seem to be a regression?
Also, could you do a sysrq-W to see what all is stuck? Thanks, -Eric
Created attachment 481887
patch to fix the problem

Could you try this patch, please? It should fix the problem.
(In reply to comment #1)
> Does this seem to be a regression?

I'm afraid it is a regression. I ran the test more than 10 times on the 6.0 GA kernel (-71) and every run passed. I can also reproduce the hang on an i386 host now, and the 6.0 GA kernel shows no such issue there either.
(In reply to comment #2)
> Also, could you do a sysrq-W to see what all is stuck?
>
> Thanks,
> -Eric

Here is the sysrq-w output:

[root@ibm-z10-25 ~]# echo w > /proc/sysrq-trigger
  .min_vruntime : 57350.449026
  .max_vruntime : 57350.487409
  .spread : 40.038383
  .spread0 : -139846.319760
  .nr_running : 3
  .load : 3072
  .nr_spread_over : 5
  .shares : 0

rt_rq[1]:/
  .rt_nr_running : 1
  .rt_throttled : 0
  .rt_time : 0.000000
  .rt_runtime : 0.000001

runnable tasks:
 task PID tree-key switches prio exec-runtime sum-exec sum-sleep
-------------------------------------------------------------------------------- --------------------------
 migration/1 5404 195462.980152 2 0 195462.980152 0.050415 0.000000 /
 events/1 5418 57310.449026 37 120 57310.449026 2.705469 36153.255552 /
R xfs_io 5420 640690.437219 449 120 640690.437219 5839 75.535754 31362.931872 /
 rhsmcertd 5424 57350.487409 1 120 57350.487409 0.114585 0.000000 /
(In reply to comment #3)
> Created attachment 481887 [details]
> patch to fix the problem
>
> Could you try with this patch please, it should fix the problem.

Sure, I'll try it and update bz once I get results.
(In reply to comment #3)
> Created attachment 481887 [details]
> patch to fix the problem
>
> Could you try with this patch please, it should fix the problem.

It seems to fix the bug. I ran the test on s390x more than 10 times and found no issue. I'll try on the i386 host as well and update bz later.
No issue found on the i386 host either. The patch fixes this issue.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Patch(es) available on kernel-2.6.32-130.el6
Verified on the -130 kernel. Ran xfstests 133 on s390x and i386 hosts more than 50 times in a loop; no issue found. Setting to VERIFIED.
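A repeated run like the one described can be sketched as below. This is only an illustration: the helper name repeat_until_fail is hypothetical, and the commented usage reuses the directory and make invocation from the original report. The actual loop used for verification is not recorded in this bug.

```shell
#!/bin/sh
# repeat_until_fail N CMD... (hypothetical helper): run CMD up to N times
# and stop at the first non-zero exit status.
repeat_until_fail() {
    n=$1
    shift
    i=1
    while [ "$i" -le "$n" ]; do
        echo "=== iteration $i of $n ==="
        "$@" || { echo "failed on iteration $i" >&2; return 1; }
        i=$((i + 1))
    done
    echo "all $n iterations passed"
}

# Example usage, with the parameters from the original report:
#   cd /mnt/tests/kernel/filesystems/xfs/xfstests
#   repeat_until_fail 50 env TEST_PARAM_FSTYPE=ext4 TEST_PARAM_RUNTESTS=133 \
#       TEST_PARAM_BLKSIZE=1024 make run
```

Because the original failure is a hang rather than an error exit, a reproduction would show up as the loop stalling on one iteration instead of printing a failure message. Wrapping the command in coreutils timeout is one way to turn such a stall into a non-zero exit.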
An advisory has been issued which should help resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0542.html