Bug 697750 - [xfs] concurrent aio/dio got stuck
Summary: [xfs] concurrent aio/dio got stuck
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Bill O'Donnell
QA Contact: Zorro Lang
URL:
Whiteboard:
Keywords:
Depends On:
Blocks: 824796
Reported: 2011-04-19 06:01 UTC by Eryu Guan
Modified: 2016-05-10 21:46 UTC (History)
CC: 7 users

Clone Of:
Cloned to: 824796
Last Closed: 2016-05-10 21:46:01 UTC


Attachments (Terms of Use)
dmesg and sysrq-w output (178.77 KB, application/octet-stream)
2011-04-19 06:01 UTC, Eryu Guan


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2016:0855 normal SHIPPED_LIVE Moderate: kernel security, bug fix, and enhancement update 2016-05-10 22:43:57 UTC

Description Eryu Guan 2011-04-19 06:01:47 UTC
Created attachment 493086 [details]
dmesg and sysrq-w output

Description of problem:
The reproducer for bug 439917 takes forever to finish. I have attached the full dmesg log and sysrq-w output.

INFO: task fio:5127 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fio           D 0000000000000005     0  5127   5123 0x00000000
 ffff8800bab6de58 0000000000000082 ffff8800bab6ddb8 ffffffffa034aa0f
 ffff8800bab6dee8 ffffffff8117256a ffff8800bab6dde8 ffffffff81050be3
 ffff8800bcb16638 ffff8800bab6dfd8 000000000000f598 ffff8800bcb16638
Call Trace:
 [<ffffffffa034aa0f>] ? xfs_file_aio_read+0x5f/0x70 [xfs]
 [<ffffffff8117256a>] ? do_sync_read+0xfa/0x140
 [<ffffffff81050be3>] ? enqueue_task_fair+0x43/0x90
 [<ffffffff810571ee>] ? activate_task+0x2e/0x40
 [<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814dc55b>] mutex_lock+0x2b/0x50
 [<ffffffff811728e6>] generic_file_llseek+0x36/0x70
 [<ffffffff811712ea>] vfs_llseek+0x3a/0x40
 [<ffffffff81172a96>] sys_lseek+0x66/0x80
 [<ffffffff814de17e>] ? do_device_not_available+0xe/0x10
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
INFO: task fio:5128 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fio           D 0000000000000003     0  5128   5123 0x00000000
 ffff8800baab3e58 0000000000000086 0000000000000000 ffff8800baab3dd0
 ffffffff81103c76 ffff8800bcbc4040 ffff8801a81c0b40 0000000100274b2f
 ffff8800bcbc45f8 ffff8800baab3fd8 000000000000f598 ffff8800bcbc45f8
Call Trace:
 [<ffffffff81103c76>] ? __perf_event_task_sched_out+0x36/0x50
 [<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814dc55b>] mutex_lock+0x2b/0x50
 [<ffffffff811728e6>] generic_file_llseek+0x36/0x70
 [<ffffffff811712ea>] vfs_llseek+0x3a/0x40
 [<ffffffff81172a96>] sys_lseek+0x66/0x80
 [<ffffffff814de17e>] ? do_device_not_available+0xe/0x10
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
INFO: task fio:5131 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fio           D 0000000000000003     0  5131   5123 0x00000000
 ffff8800bab75e58 0000000000000082 ffff8800bab75db8 ffffffffa034aa0f
 ffff8800bab75ee8 ffffffff8117256a ffff8800bab75de8 ffffffff81050be3
 ffff8800bb26daf8 ffff8800bab75fd8 000000000000f598 ffff8800bb26daf8
Call Trace:
 [<ffffffffa034aa0f>] ? xfs_file_aio_read+0x5f/0x70 [xfs]
 [<ffffffff8117256a>] ? do_sync_read+0xfa/0x140
 [<ffffffff81050be3>] ? enqueue_task_fair+0x43/0x90
 [<ffffffff810571ee>] ? activate_task+0x2e/0x40
 [<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814dc55b>] mutex_lock+0x2b/0x50
 [<ffffffff811728e6>] generic_file_llseek+0x36/0x70
 [<ffffffff811712ea>] vfs_llseek+0x3a/0x40
 [<ffffffff81172a96>] sys_lseek+0x66/0x80
 [<ffffffff814de17e>] ? do_device_not_available+0xe/0x10
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
INFO: task fio:5132 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fio           D 0000000000000003     0  5132   5123 0x00000000
 ffff8800bb129e58 0000000000000086 ffff8800bb129df8 ffffffff81138318
 0000000000000062 ffff8800a99656b0 ffff8800bb129e28 0000000000000008
 ffff8800baa83078 ffff8800bb129fd8 000000000000f598 ffff8800baa83078
Call Trace:
 [<ffffffff81138318>] ? handle_mm_fault+0x1d8/0x2a0
 [<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814dc55b>] mutex_lock+0x2b/0x50
 [<ffffffff811728e6>] generic_file_llseek+0x36/0x70
 [<ffffffff811712ea>] vfs_llseek+0x3a/0x40
 [<ffffffff81172a96>] sys_lseek+0x66/0x80
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b

Version-Release number of selected component (if applicable):
2.6.32-131.0.1.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Get fio job file from bug 439917 comment 5
2. Run fio on xfs
  
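The actual job file lives in bug 439917 comment 5 and is not reproduced here. Purely as an illustration, a fio job that drives concurrent AIO/DIO readers and writers against the same file (the kind of workload that hits this path) might look like the following; all values are assumptions, not the real reproducer:

```ini
; Illustrative only -- the real reproducer is in bug 439917 comment 5.
[global]
directory=/mnt/xfs      ; XFS mount point (assumption)
filename=testfile
size=1g
direct=1                ; O_DIRECT I/O
ioengine=libaio         ; Linux native AIO
iodepth=16
runtime=300

[readers]
rw=randread
numjobs=4

[writers]
rw=randwrite
numjobs=4
```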
Actual results:
task fio: blocked for more than 120 seconds.

Expected results:
The test should pass within 20 minutes.

Additional info:
This is not a regression.
The same workload finishes within 20 minutes on extN filesystems.

Comment 1 Dave Chinner 2011-04-19 23:27:51 UTC
fio           D 0000000000000004     0 18570  18508 0x00000000
 ffff880207d6fd18 0000000000000086 ffff880207d6fce0 ffff880207d6fcdc
 ffff880207d4b038 ffff88022fc24900 ffff880237415f80 0000000103c36f79
 ffff880207d4b038 ffff880207d6ffd8 000000000000f598 ffff880207d4b038
Call Trace:
 [<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814dc55b>] mutex_lock+0x2b/0x50
 [<ffffffff8119e692>] generic_file_splice_write+0xa2/0x150
 [<ffffffffa03c2499>] xfs_splice_write+0x109/0x290 [xfs]
 [<ffffffff81211c6b>] ? selinux_file_permission+0xfb/0x150
 [<ffffffffa03bd85d>] xfs_file_splice_write+0x4d/0x50 [xfs]
 [<ffffffff8119e072>] do_splice_from+0x72/0xa0
 [<ffffffff8119f3c6>] sys_splice+0x296/0x5b0
 [<ffffffff814e0aae>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b

generic_file_splice_write() taking the i_mutex causes a lock-order inversion with the xfs_iolock: the splice path acquires the iolock first (in xfs_file_splice_write) and then i_mutex, while paths such as generic_file_llseek() take i_mutex first. There needs to be a variant that doesn't take i_mutex, because the XFS caller already holds the necessary locks.

Comment 2 RHEL Product and Program Management 2011-10-07 15:31:42 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 3 XuWang 2014-11-10 03:25:23 UTC
Nov  4 22:18:42 ibm-x3655-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov  4 22:18:42 ibm-x3655-01 kernel: fio             D ffff88048dc94600     0 31711  31709 0x00000080
Nov  4 22:18:42 ibm-x3655-01 kernel: ffff8808780abcb8 0000000000000082 ffff8808868a8b60 ffff8808780abfd8
Nov  4 22:18:42 ibm-x3655-01 kernel: ffff8808780abfd8 ffff8808780abfd8 ffff8808868a8b60 ffff8808868a8b60
Nov  4 22:18:42 ibm-x3655-01 kernel: ffff88087d0748b0 ffffffffffffffff ffff88087d0748b8 0000000000000004
Nov  4 22:18:42 ibm-x3655-01 kernel: Call Trace:
Nov  4 22:18:42 ibm-x3655-01 kernel: [<ffffffff81608c49>] schedule+0x29/0x70
Nov  4 22:18:42 ibm-x3655-01 kernel: [<ffffffff8160a725>] rwsem_down_read_failed+0xf5/0x165
Nov  4 22:18:42 ibm-x3655-01 kernel: [<ffffffffa05d72ac>] ? xfs_file_aio_read+0x11c/0x310 [xfs]
Nov  4 22:18:42 ibm-x3655-01 kernel: [<ffffffff812e1d14>] call_rwsem_down_read_failed+0x14/0x30
Nov  4 22:18:42 ibm-x3655-01 kernel: [<ffffffff81608000>] ? down_read+0x20/0x30
Nov  4 22:18:42 ibm-x3655-01 kernel: [<ffffffffa061d414>] xfs_ilock+0xd4/0xe0 [xfs]
Nov  4 22:18:42 ibm-x3655-01 kernel: [<ffffffffa05d72ac>] xfs_file_aio_read+0x11c/0x310 [xfs]
Nov  4 22:18:42 ibm-x3655-01 kernel: [<ffffffff811c564d>] do_sync_read+0x8d/0xd0
Nov  4 22:18:42 ibm-x3655-01 kernel: [<ffffffff811c5d2c>] vfs_read+0x9c/0x170
Nov  4 22:18:42 ibm-x3655-01 kernel: [<ffffffff811c6858>] SyS_read+0x58/0xb0
Nov  4 22:18:42 ibm-x3655-01 kernel: [<ffffffff816137a9>] system_call_fastpath+0x16/0x1b

This can also be reproduced on RHEL 7.1 (RHEL-7.1-20141030.n.0 Server x86_64). Should this bug be cloned to RHEL 7 for tracking?

Comment 4 Eryu Guan 2014-11-10 03:35:06 UTC
Hi Xu, please see bug 824796.

Comment 9 RHEL Product and Program Management 2015-12-11 20:40:27 UTC
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 11 Aristeu Rozanski 2016-01-14 14:50:43 UTC
Patch(es) available on kernel-2.6.32-601.el6

Comment 17 errata-xmlrpc 2016-05-10 21:46:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-0855.html

