681439 – [ext4/xfstests] 133 task blocked for more than 120 seconds

Bug 681439 - [ext4/xfstests] 133 task blocked for more than 120 seconds

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.1
Hardware:	s390x
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Josef Bacik
QA Contact:	Eryu Guan
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2011-03-02 07:22 UTC by Eryu Guan
Modified:	2011-05-19 12:43 UTC
CC List:	5 users
Fixed In Version:	kernel-2.6.32-130.el6
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-05-19 12:43:26 UTC
Target Upstream Version:
Embargoed:

Attachments
patch to fix the problem (560 bytes, patch) 2011-03-02 15:44 UTC, Josef Bacik	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2011:0542	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 6.1 kernel security, bug fix and enhancement update	2011-05-19 11:58:07 UTC

Description Eryu Guan 2011-03-02 07:22:56 UTC

Description of problem:
The system got stuck when testing ext4 with xfstests 133 on s390x. Block size is 1024.

Version-Release number of selected component (if applicable):
kernel-2.6.32-118.el6

How reproducible:
Every time (3 tries)

Steps to Reproduce:
1. install rh-tests-kernel-filesystems-xfs-xfstests
2. cd /mnt/tests/kernel/filesystems/xfs/xfstests
3. TEST_PARAM_FSTYPE=ext4 TEST_PARAM_RUNTESTS=133 TEST_PARAM_BLKSIZE=1024 make run
  
Actual results:
 [<0000000000109de4>] kernel_thread_starter+0x0/0xc
INFO: task jbd2/loop0-8:59759 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jbd2/loop0-8  D 000003c000861038     0 59759      2 0x00000200
000000000bff7a50 00000000010f4e00 000000000bff7a50 000000000bff7a78
       000000001f6cb318 00000000008a5e00 00000000010f4e00 000000001f6cb318
       000000001f6cb318 000000001d3dc540 0000000000000000 000000000080ee98
       00000000008a5e00 000000001d3dc9d8 000000001f6cb2e0 00000000010f4e00
       00000000004c6c78 00000000004bcd9e 000000000bff7ab0 000000000bff7c68
Call Trace:
([<00000000004bcd9e>] schedule+0x5aa/0xf84)
 [<000003c000861038>] jbd2_journal_commit_transaction+0x1c8/0x1a94 [jbd2]
 [<000003c000869562>] kjournald2+0xde/0x2c0 [jbd2]
 [<000000000016cf48>] kthread+0xa4/0xac
 [<0000000000109dea>] kernel_thread_starter+0x6/0xc
 [<0000000000109de4>] kernel_thread_starter+0x0/0xc
INFO: task xfs_io:59991 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xfs_io        D 00000000004be222     0 59991  59768 0x00000200
0000000000000000 0000000000000000 0000000000000000 0000000000000400
       0000000000000400 0000000000000001 0000000000000400 0000000000000001
       0000000000000000 0000000000000000 000000001d17d0e0 000000000080ee98
       00000000008a5e00 000000001d17d578 000000001f97e040 00000000010f4e00
       00000000004c6c78 00000000004bcd9e 0000000001a7f9f8 0000000001a7fbb0
Call Trace:
([<00000000004bcd9e>] schedule+0x5aa/0xf84)
 [<00000000004be222>] __mutex_lock_slowpath+0xa6/0x148
 [<00000000004be31e>] mutex_lock+0x5a/0x60
 [<00000000001effd0>] generic_file_aio_write+0x58/0xf4
 [<000003c00097e2ae>] ext4_file_write+0x7e/0x21c [ext4]
 [<000000000024fb74>] do_sync_write+0xf0/0x154
 [<00000000002509c8>] vfs_write+0xa0/0x1a0
 [<0000000000250b5e>] SyS_pwrite64+0x96/0xa8
 [<0000000000118644>] sysc_tracego+0xe/0x14
 [<0000004fb2d2bbfc>] 0x4fb2d2bbfc
       00000000010f4e00 00000000010e4e00 00000000000058ae 00000000008b3cf8
       0000000000000000 0000000000000000 000000001ce0c140 000000000080ee98
       00000000008a5e00 000000001ce0c5d8 000000001f97e040 00000000010f4e00
       00000000004c6c78 00000000004bcd9e 000000001f23b890 000000001f23ba48
Call Trace:
([<00000000004bcd9e>] schedule+0x5aa/0xf84)
 [<000003c0008606dc>] start_this_handle+0x308/0x5e0 [jbd2]
 [<000003c000860bcc>] jbd2_journal_start+0xd8/0x118 [jbd2]
 [<000003c000983d64>] ext4_dirty_inode+0x38/0x74 [ext4]
 [<000000000027b07e>] __mark_inode_dirty+0x46/0x198
 [<000000000026a6d4>] touch_atime+0x138/0x170
 [<00000000001f07a0>] generic_file_aio_read+0x418/0x7ac
 [<000000000024fcc8>] do_sync_read+0xf0/0x154
 [<0000000000250cbc>] vfs_read+0xa0/0x1a0
 [<0000000000250ebe>] SyS_read+0x5a/0xac
 [<0000000000118644>] sysc_tracego+0xe/0x14
 [<0000020000466460>] 0x20000466460
INFO: task jbd2/loop0-8:59759 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jbd2/loop0-8  D 000003c000861038     0 59759      2 0x00000200
000000000bff7a50 00000000010f4e00 000000000bff7a50 000000000bff7a78
       000000001f6cb318 00000000008a5e00 00000000010f4e00 000000001f6cb318
       000000001f6cb318 000000001d3dc540 0000000000000000 000000000080ee98
       00000000008a5e00 000000001d3dc9d8 000000001f6cb2e0 00000000010f4e00
       00000000004c6c78 00000000004bcd9e 000000000bff7ab0 000000000bff7c68
Call Trace:
([<00000000004bcd9e>] schedule+0x5aa/0xf84)
 [<000003c000861038>] jbd2_journal_commit_transaction+0x1c8/0x1a94 [jbd2]
 [<000003c000869562>] kjournald2+0xde/0x2c0 [jbd2]
 [<000000000016cf48>] kthread+0xa4/0xac
 [<0000000000109dea>] kernel_thread_starter+0x6/0xc
 [<0000000000109de4>] kernel_thread_starter+0x0/0xc

Expected results:
Test passed

Additional info:
Here is a failed task in beaker
https://beaker.engineering.redhat.com/recipes/116608
http://beaker-archive.app.eng.bos.redhat.com/beaker-logs/2011/02/571/57102/116608///console.log

I've also got one stuck on i386 host, but I cannot reproduce it
https://beaker.engineering.redhat.com/recipes/115224
http://beaker-archive.app.eng.bos.redhat.com/beaker-logs/2011/02/564/56473/115224///console.log
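
When a console log contains many of these hung-task reports, it can help to list which tasks were reported and how often. The following is a minimal sketch, not part of the original report: the function name hung_tasks is hypothetical, and it assumes the log is a local file whose report lines have exactly the "INFO: task NAME:PID blocked for more than N seconds." shape shown above.

```shell
#!/bin/sh
# Hypothetical helper: summarize hung-task reports in a saved console log.
# Usage: hung_tasks console.log
hung_tasks() {
    # Extract NAME:PID from lines such as:
    #   INFO: task jbd2/loop0-8:59759 blocked for more than 120 seconds.
    # then count how many times each task was reported, most frequent first.
    sed -n 's/^INFO: task \(.*\) blocked for more than [0-9]* seconds\.$/\1/p' "$1" |
        sort | uniq -c | sort -rn
}
```

Each output line is a count followed by the task name and PID. Lines with a different prefix or with trailing carriage returns would not match and would need the pattern adjusted.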

Comment 1 Eric Sandeen 2011-03-02 15:33:25 UTC

Does this seem to be a regression?

Comment 2 Eric Sandeen 2011-03-02 15:35:41 UTC

Also, could you do a sysrq-W to see what all is stuck?

Thanks,
-Eric

Comment 3 Josef Bacik 2011-03-02 15:44:53 UTC

Created attachment 481887
patch to fix the problem

Could you try with this patch please, it should fix the problem.

Comment 4 Eryu Guan 2011-03-03 06:38:38 UTC

(In reply to comment #1)
> Does this seem to be a regression?

I'm afraid it is a regression. I tried on 6.0 GA kernel (-71) more than 10 times and all went well. 

Also I can reproduce it on i386 host now, and there is no such issue on 6.0 GA kernel as well.

Comment 5 Eryu Guan 2011-03-03 06:39:34 UTC

(In reply to comment #2)
> Also, could you do a sysrq-W to see what all is stuck?
> 
> Thanks,
> -Eric

Here is the sysrq-w output

[root@ibm-z10-25 ~]# echo w > /proc/sysrq-trigger
  .min_vruntime                  : 57350.449026
  .max_vruntime                  : 57350.487409
  .spread                        : 40.038383
  .spread0                       : -139846.319760
  .nr_running                    : 3
  .load                          : 3072
  .nr_spread_over                : 5
  .shares                        : 0

rt_rq[1]:/
  .rt_nr_running                 : 1
  .rt_throttled                  : 0
  .rt_time                       : 0.000000
  .rt_runtime                    : 0.000001

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
     migration/1  5404    195462.980152         2     0    195462.980152         0.050415         0.000000 /
        events/1  5418     57310.449026        37   120     57310.449026         2.705469     36153.255552 /
R         xfs_io  5420    640690.437219       449   120    640690.437219    583975.535754     31362.931872 /
       rhsmcertd  5424     57350.487409         1   120     57350.487409         0.114585         0.000000 /

Comment 6 Eryu Guan 2011-03-03 06:41:16 UTC

(In reply to comment #3)
> Created attachment 481887
> patch to fix the problem
> 
> Could you try with this patch please, it should fix the problem.

Sure, I'll try it and update bz once I get results.

Comment 7 Eryu Guan 2011-03-04 07:37:33 UTC

(In reply to comment #3)
> Created attachment 481887
> patch to fix the problem
> 
> Could you try with this patch please, it should fix the problem.

It seems to fix the bug, I tried on s390x more than 10 times, no issue found.
I'll try on i386 host as well and update bz later.

Comment 8 Eryu Guan 2011-03-07 03:53:40 UTC

No issue found on i386 host as well. The patch fixed this issue.

Comment 9 RHEL Program Management 2011-03-08 20:00:40 UTC

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 11 Aristeu Rozanski 2011-04-07 13:50:21 UTC

Patch(es) available on kernel-2.6.32-130.el6

Comment 14 Eryu Guan 2011-04-11 06:16:11 UTC

Verified on -130 kernel

Ran xfstests 133 on s390x and i386 hosts for more than 50 times in loop, no issue found.

Set it to VERIFIED.
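
A repeated run like the one described above could be driven by a small loop. This is only a sketch under stated assumptions, not the script actually used for verification: the function name run_loop is hypothetical, and it assumes the test command exits non-zero on failure.

```shell
#!/bin/sh
# Hypothetical helper: run a command COUNT times and stop at the first failure.
# Usage: run_loop COUNT COMMAND [ARGS...]
run_loop() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        # Run the command exactly as given; abort on a non-zero exit status.
        if ! "$@"; then
            echo "iteration $i failed" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $count iterations passed"
}

# Example, run from the test directory in the steps to reproduce
# (assumes "make run" reports failure through its exit status):
# run_loop 50 env TEST_PARAM_FSTYPE=ext4 TEST_PARAM_RUNTESTS=133 TEST_PARAM_BLKSIZE=1024 make run
```

Note that a hang of the kind reported in this bug would block the loop rather than fail it, so the console still has to be watched for hung-task messages.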

Comment 15 errata-xmlrpc 2011-05-19 12:43:26 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html
