Hide Forgot
Created attachment 493086 [details] dmesg and sysrq-w output Description of problem: Reproducer for bug 439917 takes forever to finish. I upload the full log and sysrq-w output. INFO: task fio:5127 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. fio D 0000000000000005 0 5127 5123 0x00000000 ffff8800bab6de58 0000000000000082 ffff8800bab6ddb8 ffffffffa034aa0f ffff8800bab6dee8 ffffffff8117256a ffff8800bab6dde8 ffffffff81050be3 ffff8800bcb16638 ffff8800bab6dfd8 000000000000f598 ffff8800bcb16638 Call Trace: [<ffffffffa034aa0f>] ? xfs_file_aio_read+0x5f/0x70 [xfs] [<ffffffff8117256a>] ? do_sync_read+0xfa/0x140 [<ffffffff81050be3>] ? enqueue_task_fair+0x43/0x90 [<ffffffff810571ee>] ? activate_task+0x2e/0x40 [<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff814dc55b>] mutex_lock+0x2b/0x50 [<ffffffff811728e6>] generic_file_llseek+0x36/0x70 [<ffffffff811712ea>] vfs_llseek+0x3a/0x40 [<ffffffff81172a96>] sys_lseek+0x66/0x80 [<ffffffff814de17e>] ? do_device_not_available+0xe/0x10 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b INFO: task fio:5128 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. fio D 0000000000000003 0 5128 5123 0x00000000 ffff8800baab3e58 0000000000000086 0000000000000000 ffff8800baab3dd0 ffffffff81103c76 ffff8800bcbc4040 ffff8801a81c0b40 0000000100274b2f ffff8800bcbc45f8 ffff8800baab3fd8 000000000000f598 ffff8800bcbc45f8 Call Trace: [<ffffffff81103c76>] ? __perf_event_task_sched_out+0x36/0x50 [<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff814dc55b>] mutex_lock+0x2b/0x50 [<ffffffff811728e6>] generic_file_llseek+0x36/0x70 [<ffffffff811712ea>] vfs_llseek+0x3a/0x40 [<ffffffff81172a96>] sys_lseek+0x66/0x80 [<ffffffff814de17e>] ? do_device_not_available+0xe/0x10 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b INFO: task fio:5131 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. fio D 0000000000000003 0 5131 5123 0x00000000 ffff8800bab75e58 0000000000000082 ffff8800bab75db8 ffffffffa034aa0f ffff8800bab75ee8 ffffffff8117256a ffff8800bab75de8 ffffffff81050be3 ffff8800bb26daf8 ffff8800bab75fd8 000000000000f598 ffff8800bb26daf8 Call Trace: [<ffffffffa034aa0f>] ? xfs_file_aio_read+0x5f/0x70 [xfs] [<ffffffff8117256a>] ? do_sync_read+0xfa/0x140 [<ffffffff81050be3>] ? enqueue_task_fair+0x43/0x90 [<ffffffff810571ee>] ? activate_task+0x2e/0x40 [<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff814dc55b>] mutex_lock+0x2b/0x50 [<ffffffff811728e6>] generic_file_llseek+0x36/0x70 [<ffffffff811712ea>] vfs_llseek+0x3a/0x40 [<ffffffff81172a96>] sys_lseek+0x66/0x80 [<ffffffff814de17e>] ? do_device_not_available+0xe/0x10 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b INFO: task fio:5132 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. fio D 0000000000000003 0 5132 5123 0x00000000 ffff8800bb129e58 0000000000000086 ffff8800bb129df8 ffffffff81138318 0000000000000062 ffff8800a99656b0 ffff8800bb129e28 0000000000000008 ffff8800baa83078 ffff8800bb129fd8 000000000000f598 ffff8800baa83078 Call Trace: [<ffffffff81138318>] ? handle_mm_fault+0x1d8/0x2a0 [<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff814dc55b>] mutex_lock+0x2b/0x50 [<ffffffff811728e6>] generic_file_llseek+0x36/0x70 [<ffffffff811712ea>] vfs_llseek+0x3a/0x40 [<ffffffff81172a96>] sys_lseek+0x66/0x80 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b Version-Release number of selected component (if applicable): 2.6.32-131.0.1.el6.x86_64 How reproducible: 100% Steps to Reproduce: 1. Get fio job file from bug 439917 comment 5 2. Run fio on xfs Actual results: task fio: blocked for more than 120 seconds. Expected results: Pass test within 20 minutes Additional info: This is not a regression The same workload will finish within 20 minutes on extN fs.
fio D 0000000000000004 0 18570 18508 0x00000000 ffff880207d6fd18 0000000000000086 ffff880207d6fce0 ffff880207d6fcdc ffff880207d4b038 ffff88022fc24900 ffff880237415f80 0000000103c36f79 ffff880207d4b038 ffff880207d6ffd8 000000000000f598 ffff880207d4b038 Call Trace: [<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff814dc55b>] mutex_lock+0x2b/0x50 [<ffffffff8119e692>] generic_file_splice_write+0xa2/0x150 [<ffffffffa03c2499>] xfs_splice_write+0x109/0x290 [xfs] [<ffffffff81211c6b>] ? selinux_file_permission+0xfb/0x150 [<ffffffffa03bd85d>] xfs_file_splice_write+0x4d/0x50 [xfs] [<ffffffff8119e072>] do_splice_from+0x72/0xa0 [<ffffffff8119f3c6>] sys_splice+0x296/0x5b0 [<ffffffff814e0aae>] ? do_page_fault+0x3e/0xa0 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b generic_file_splice_write taking the i_mutex causes a lock order inversion with the xfs_iolock. There needs to be a version that doesn't take locks because the caller already holds the necessary locks.
Since RHEL 6.2 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
Nov 4 22:18:42 ibm-x3655-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 4 22:18:42 ibm-x3655-01 kernel: fio D ffff88048dc94600 0 31711 31709 0x00000080 Nov 4 22:18:42 ibm-x3655-01 kernel: ffff8808780abcb8 0000000000000082 ffff8808868a8b60 ffff8808780abfd8 Nov 4 22:18:42 ibm-x3655-01 kernel: ffff8808780abfd8 ffff8808780abfd8 ffff8808868a8b60 ffff8808868a8b60 Nov 4 22:18:42 ibm-x3655-01 kernel: ffff88087d0748b0 ffffffffffffffff ffff88087d0748b8 0000000000000004 Nov 4 22:18:42 ibm-x3655-01 kernel: Call Trace: Nov 4 22:18:42 ibm-x3655-01 kernel: [<ffffffff81608c49>] schedule+0x29/0x70 Nov 4 22:18:42 ibm-x3655-01 kernel: [<ffffffff8160a725>] rwsem_down_read_failed+0xf5/0x165 Nov 4 22:18:42 ibm-x3655-01 kernel: [<ffffffffa05d72ac>] ? xfs_file_aio_read+0x11c/0x310 [xfs] Nov 4 22:18:42 ibm-x3655-01 kernel: [<ffffffff812e1d14>] call_rwsem_down_read_failed+0x14/0x30 Nov 4 22:18:42 ibm-x3655-01 kernel: [<ffffffff81608000>] ? down_read+0x20/0x30 Nov 4 22:18:42 ibm-x3655-01 kernel: [<ffffffffa061d414>] xfs_ilock+0xd4/0xe0 [xfs] Nov 4 22:18:42 ibm-x3655-01 kernel: [<ffffffffa05d72ac>] xfs_file_aio_read+0x11c/0x310 [xfs] Nov 4 22:18:42 ibm-x3655-01 kernel: [<ffffffff811c564d>] do_sync_read+0x8d/0xd0 Nov 4 22:18:42 ibm-x3655-01 kernel: [<ffffffff811c5d2c>] vfs_read+0x9c/0x170 Nov 4 22:18:42 ibm-x3655-01 kernel: [<ffffffff811c6858>] SyS_read+0x58/0xb0 Nov 4 22:18:42 ibm-x3655-01 kernel: [<ffffffff816137a9>] system_call_fastpath+0x16/0x1b Also be reproduced on RHEL7.1(RHEL-7.1-20141030.n.0 Server x86_64), may this bug be duplicated to RHEL7 for tracker?
Hi Xu, please see bug 824796.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
Patch(es) available on kernel-2.6.32-601.el6
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-0855.html