Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
This project is now read‑only. Starting Monday, February 2, please use https://ibm-ceph.atlassian.net/ for all bug tracking management.

Bug 1495579

Summary: [iSCSI]: Kernel hang "INFO: task wb_throttle:13677 blocked for more than 120 seconds."
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Tejas <tchandra>
Component: iSCSIAssignee: Mike Christie <mchristi>
Status: CLOSED NOTABUG QA Contact: Tejas <tchandra>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.0CC: ceph-eng-bugs, ceph-qe-bugs, jdillama, tchandra
Target Milestone: rc   
Target Release: 3.0   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-09-29 03:18:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Tejas 2017-09-26 10:37:20 UTC
Description of problem:
     Mapped 10 LUNs to a RHEL initiator. Started dd to all the 10 LUNS on a 4 GW setup. Seeing a kernel hang on one of the GWs(catalina) only.
Logs below.
We are seeing login failures on all the GWs:
 5255.122834] Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:ceph-igw
[ 5255.123787] iSCSI Login negotiation failed.

Version-Release number of selected component (if applicable):
3.10.0-714.el7.test.x86_64 #1 SMP Mon Sep 25 01:00:30 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

System still in the same state.

Additional info:

5521.099080] INFO: task wb_throttle:13677 blocked for more than 120 seconds.
[ 5521.099107] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5521.099131] wb_throttle     D ffff88203a635ee0     0 13677      1 0x00000080
[ 5521.099134]  ffff8820273cfc00 0000000000000086 ffff8820273cffd8 ffff8820273cffd8
[ 5521.099136]  ffff8820273cffd8 0000000000016cc0 ffff880169478fd0 ffff88103e696cc0
[ 5521.099137]  0000000000000000 7fffffffffffffff ffff88107ff8f2e8 ffffffff816b0c70
[ 5521.099139] Call Trace:
[ 5521.099146]  [<ffffffff816b0c70>] ? bit_wait+0x50/0x50
[ 5521.099148]  [<ffffffff816b29d9>] schedule+0x29/0x70
[ 5521.099149]  [<ffffffff816b0659>] schedule_timeout+0x239/0x2c0
[ 5521.099153]  [<ffffffff81183701>] ? find_get_pages_tag+0xe1/0x1a0
[ 5521.099156]  [<ffffffff810e9c1c>] ? ktime_get_ts64+0x4c/0xf0
[ 5521.099158]  [<ffffffff816b0c70>] ? bit_wait+0x50/0x50
[ 5521.099159]  [<ffffffff816b21cd>] io_schedule_timeout+0xad/0x130
[ 5521.099161]  [<ffffffff816b2268>] io_schedule+0x18/0x20
[ 5521.099162]  [<ffffffff816b0c81>] bit_wait_io+0x11/0x50
[ 5521.099164]  [<ffffffff816b07a5>] __wait_on_bit+0x65/0x90
[ 5521.099166]  [<ffffffff811824c1>] wait_on_page_bit+0x81/0xa0
[ 5521.099169]  [<ffffffff810b1d10>] ? wake_bit_function+0x40/0x40
[ 5521.099170]  [<ffffffff811825f1>] __filemap_fdatawait_range+0x111/0x190
[ 5521.099172]  [<ffffffff81182684>] filemap_fdatawait_range+0x14/0x30
[ 5521.099174]  [<ffffffff81184606>] filemap_write_and_wait_range+0x56/0x90
[ 5521.099208]  [<ffffffffc0367e36>] xfs_file_fsync+0x66/0x1e0 [xfs]
[ 5521.099211]  [<ffffffff81234c05>] do_fsync+0x65/0xa0
[ 5521.099212]  [<ffffffff81234f03>] SyS_fdatasync+0x13/0x20
[ 5521.099214]  [<ffffffff816be5c9>] system_call_fastpath+0x16/0x1b

5521.099216] INFO: task filestore_sync:13678 blocked for more than 120 seconds.
[ 5521.099241] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5521.099269] filestore_sync  D ffff88203a631fa0     0 13678      1 0x00000080
[ 5521.099271]  ffff882027b8fbf0 0000000000000086 ffff882027b8ffd8 ffff882027b8ffd8
[ 5521.099272]  ffff882027b8ffd8 0000000000016cc0 ffff8810e9f72f70 ffff88203ded6cc0
[ 5521.099274]  0000000000000000 7fffffffffffffff ffff88107ff8f2e8 ffffffff816b0c70
[ 5521.099275] Call Trace:
[ 5521.099277]  [<ffffffff816b0c70>] ? bit_wait+0x50/0x50
[ 5521.099279]  [<ffffffff816b29d9>] schedule+0x29/0x70
[ 5521.099281]  [<ffffffff816b0659>] schedule_timeout+0x239/0x2c0
[ 5521.099282]  [<ffffffff810e9c1c>] ? ktime_get_ts64+0x4c/0xf0
[ 5521.099285]  [<ffffffff8113dbbf>] ? delayacct_end+0x8f/0xb0
[ 5521.099286]  [<ffffffff810e9c1c>] ? ktime_get_ts64+0x4c/0xf0
[ 5521.099288]  [<ffffffff816b0c70>] ? bit_wait+0x50/0x50
[ 5521.099290]  [<ffffffff816b21cd>] io_schedule_timeout+0xad/0x130
[ 5521.099291]  [<ffffffff816b2268>] io_schedule+0x18/0x20
[ 5521.099293]  [<ffffffff816b0c81>] bit_wait_io+0x11/0x50
[ 5521.099295]  [<ffffffff816b07a5>] __wait_on_bit+0x65/0x90
[ 5521.099296]  [<ffffffff811824c1>] wait_on_page_bit+0x81/0xa0
[ 5521.099298]  [<ffffffff810b1d10>] ? wake_bit_function+0x40/0x40
[ 5521.099300]  [<ffffffff811825f1>] __filemap_fdatawait_range+0x111/0x190
[ 5521.099302]  [<ffffffff81185427>] filemap_fdatawait_keep_errors+0x27/0x30
[ 5521.099304]  [<ffffffff8122f46d>] sync_inodes_sb+0x16d/0x1f0
[ 5521.099305]  [<ffffffff81234cd3>] sync_filesystem+0x63/0xb0
[ 5521.099307]  [<ffffffff81234e84>] SyS_syncfs+0x54/0xa0
[ 5521.099308]  [<ffffffff816be5c9>] system_call_fastpath+0x16/0x1b
[ 5533.389313] tcmu: command timed out waiting for ring space.

 5558.442398] iSCSI Login negotiation failed.
[ 5563.019294] sd 0:0:3:0: [sdc] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 5563.019310] sd 0:0:3:0: [sdc] tag#1 Sense Key : Medium Error [current]
[ 5563.019316] sd 0:0:3:0: [sdc] tag#1 Add. Sense: Write error - auto reallocation failed
[ 5563.019323] sd 0:0:3:0: [sdc] tag#1 CDB: Write(10) 2a 00 23 e3 cd 88 00 02 00 00
[ 5563.019328] blk_update_request: I/O error, dev sdc, sector 602131851
[ 5565.493721] sd 0:0:3:0: [sdc] tag#11 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 5565.493726] sd 0:0:3:0: [sdc] tag#11 Sense Key : Medium Error [current]
[ 5565.493728] sd 0:0:3:0: [sdc] tag#11 Add. Sense: Write error - auto reallocation failed
[ 5565.493730] sd 0:0:3:0: [sdc] tag#11 CDB: Write(10) 2a 00 23 e3 cb 88 00 02 00 00
[ 5565.493732] blk_update_request: I/O error, dev sdc, sector 602131336
[ 5570.364858] sd 0:0:3:0: [sdc] tag#22 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 5570.364871] sd 0:0:3:0: [sdc] tag#22 Sense Key : Medium Error [current]
[ 5570.364876] sd 0:0:3:0: [sdc] tag#22 Add. Sense: Write error - auto reallocation failed
[ 5570.364882] sd 0:0:3:0: [sdc] tag#22 CDB: Write(10) 2a 00 23 e3 c9 88 00 02 00 00
[ 5570.364886] blk_update_request: I/O error, dev sdc, sector 602131009
[ 5585.607515] sd 0:0:3:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 5585.607530] sd 0:0:3:0: [sdc] tag#0 Sense Key : Medium Error [current]
[ 5585.607537] sd 0:0:3:0: [sdc] tag#0 Add. Sense: Write error - auto reallocation failed
[ 5585.607543] sd 0:0:3:0: [sdc] tag#0 CDB: Write(10) 2a 00 23 e3 d3 88 00 02 00 00
[ 5585.607548] blk_update_request: I/O error, dev sdc, sector 602133690
[ 5585.607609] Buffer I/O error on dev dm-3, logical block 73951665, lost async page write
[ 5585.607636] Buffer I/O error on dev dm-3, logical block 73951666, lost async page write
[ 5585.607667] Buffer I/O error on dev dm-3, logical block 73951667, lost async page write
[ 5585.607724] Buffer I/O error on dev dm-3, logical block 73951668, lost async page write

Comment 3 Mike Christie 2017-09-27 18:42:32 UTC
It looks like we are working as expected.

It looks like /dev/sdc is starting to go bad. sdc is used by OSDs, so when the iscsi target tries to do IO, it will hang and timeout. This leads to the target not allowing relogins until we can recover (clean up IO, reopen the image, etc).

The hung task warnings are from IOs to the bad disk and are expected.

Comment 4 Mike Christie 2017-09-27 18:45:34 UTC
Tejas,

Do you want to keep this one open or can I close it?

Comment 5 Tejas 2017-09-29 03:18:00 UTC
Mike,

   We can close it, since its expected.