Bug 1495579 - [iSCSI]: Kernel hang "INFO: task wb_throttle:13677 blocked for more than 120 seconds."
Summary: [iSCSI]: Kernel hang "INFO: task wb_throttle:13677 blocked for more than 120 ...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: iSCSI
Version: 3.0
Hardware: Unspecified
OS: Linux
unspecified
high
Target Milestone: rc
: 3.0
Assignee: Mike Christie
QA Contact: Tejas
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-09-26 10:37 UTC by Tejas
Modified: 2017-09-29 03:18 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-29 03:18:00 UTC
Embargoed:


Attachments (Terms of Use)

Description Tejas 2017-09-26 10:37:20 UTC
Description of problem:
     Mapped 10 LUNs to a RHEL initiator. Started dd to all the 10 LUNS on a 4 GW setup. Seeing a kernel hang on one of the GWs(catalina) only.
Logs below.
We are seeing login failures on all the GWs:
 5255.122834] Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:ceph-igw
[ 5255.123787] iSCSI Login negotiation failed.

Version-Release number of selected component (if applicable):
3.10.0-714.el7.test.x86_64 #1 SMP Mon Sep 25 01:00:30 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

System still in the same state.

Additional info:

5521.099080] INFO: task wb_throttle:13677 blocked for more than 120 seconds.
[ 5521.099107] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5521.099131] wb_throttle     D ffff88203a635ee0     0 13677      1 0x00000080
[ 5521.099134]  ffff8820273cfc00 0000000000000086 ffff8820273cffd8 ffff8820273cffd8
[ 5521.099136]  ffff8820273cffd8 0000000000016cc0 ffff880169478fd0 ffff88103e696cc0
[ 5521.099137]  0000000000000000 7fffffffffffffff ffff88107ff8f2e8 ffffffff816b0c70
[ 5521.099139] Call Trace:
[ 5521.099146]  [<ffffffff816b0c70>] ? bit_wait+0x50/0x50
[ 5521.099148]  [<ffffffff816b29d9>] schedule+0x29/0x70
[ 5521.099149]  [<ffffffff816b0659>] schedule_timeout+0x239/0x2c0
[ 5521.099153]  [<ffffffff81183701>] ? find_get_pages_tag+0xe1/0x1a0
[ 5521.099156]  [<ffffffff810e9c1c>] ? ktime_get_ts64+0x4c/0xf0
[ 5521.099158]  [<ffffffff816b0c70>] ? bit_wait+0x50/0x50
[ 5521.099159]  [<ffffffff816b21cd>] io_schedule_timeout+0xad/0x130
[ 5521.099161]  [<ffffffff816b2268>] io_schedule+0x18/0x20
[ 5521.099162]  [<ffffffff816b0c81>] bit_wait_io+0x11/0x50
[ 5521.099164]  [<ffffffff816b07a5>] __wait_on_bit+0x65/0x90
[ 5521.099166]  [<ffffffff811824c1>] wait_on_page_bit+0x81/0xa0
[ 5521.099169]  [<ffffffff810b1d10>] ? wake_bit_function+0x40/0x40
[ 5521.099170]  [<ffffffff811825f1>] __filemap_fdatawait_range+0x111/0x190
[ 5521.099172]  [<ffffffff81182684>] filemap_fdatawait_range+0x14/0x30
[ 5521.099174]  [<ffffffff81184606>] filemap_write_and_wait_range+0x56/0x90
[ 5521.099208]  [<ffffffffc0367e36>] xfs_file_fsync+0x66/0x1e0 [xfs]
[ 5521.099211]  [<ffffffff81234c05>] do_fsync+0x65/0xa0
[ 5521.099212]  [<ffffffff81234f03>] SyS_fdatasync+0x13/0x20
[ 5521.099214]  [<ffffffff816be5c9>] system_call_fastpath+0x16/0x1b

5521.099216] INFO: task filestore_sync:13678 blocked for more than 120 seconds.
[ 5521.099241] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5521.099269] filestore_sync  D ffff88203a631fa0     0 13678      1 0x00000080
[ 5521.099271]  ffff882027b8fbf0 0000000000000086 ffff882027b8ffd8 ffff882027b8ffd8
[ 5521.099272]  ffff882027b8ffd8 0000000000016cc0 ffff8810e9f72f70 ffff88203ded6cc0
[ 5521.099274]  0000000000000000 7fffffffffffffff ffff88107ff8f2e8 ffffffff816b0c70
[ 5521.099275] Call Trace:
[ 5521.099277]  [<ffffffff816b0c70>] ? bit_wait+0x50/0x50
[ 5521.099279]  [<ffffffff816b29d9>] schedule+0x29/0x70
[ 5521.099281]  [<ffffffff816b0659>] schedule_timeout+0x239/0x2c0
[ 5521.099282]  [<ffffffff810e9c1c>] ? ktime_get_ts64+0x4c/0xf0
[ 5521.099285]  [<ffffffff8113dbbf>] ? delayacct_end+0x8f/0xb0
[ 5521.099286]  [<ffffffff810e9c1c>] ? ktime_get_ts64+0x4c/0xf0
[ 5521.099288]  [<ffffffff816b0c70>] ? bit_wait+0x50/0x50
[ 5521.099290]  [<ffffffff816b21cd>] io_schedule_timeout+0xad/0x130
[ 5521.099291]  [<ffffffff816b2268>] io_schedule+0x18/0x20
[ 5521.099293]  [<ffffffff816b0c81>] bit_wait_io+0x11/0x50
[ 5521.099295]  [<ffffffff816b07a5>] __wait_on_bit+0x65/0x90
[ 5521.099296]  [<ffffffff811824c1>] wait_on_page_bit+0x81/0xa0
[ 5521.099298]  [<ffffffff810b1d10>] ? wake_bit_function+0x40/0x40
[ 5521.099300]  [<ffffffff811825f1>] __filemap_fdatawait_range+0x111/0x190
[ 5521.099302]  [<ffffffff81185427>] filemap_fdatawait_keep_errors+0x27/0x30
[ 5521.099304]  [<ffffffff8122f46d>] sync_inodes_sb+0x16d/0x1f0
[ 5521.099305]  [<ffffffff81234cd3>] sync_filesystem+0x63/0xb0
[ 5521.099307]  [<ffffffff81234e84>] SyS_syncfs+0x54/0xa0
[ 5521.099308]  [<ffffffff816be5c9>] system_call_fastpath+0x16/0x1b
[ 5533.389313] tcmu: command timed out waiting for ring space.

 5558.442398] iSCSI Login negotiation failed.
[ 5563.019294] sd 0:0:3:0: [sdc] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 5563.019310] sd 0:0:3:0: [sdc] tag#1 Sense Key : Medium Error [current]
[ 5563.019316] sd 0:0:3:0: [sdc] tag#1 Add. Sense: Write error - auto reallocation failed
[ 5563.019323] sd 0:0:3:0: [sdc] tag#1 CDB: Write(10) 2a 00 23 e3 cd 88 00 02 00 00
[ 5563.019328] blk_update_request: I/O error, dev sdc, sector 602131851
[ 5565.493721] sd 0:0:3:0: [sdc] tag#11 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 5565.493726] sd 0:0:3:0: [sdc] tag#11 Sense Key : Medium Error [current]
[ 5565.493728] sd 0:0:3:0: [sdc] tag#11 Add. Sense: Write error - auto reallocation failed
[ 5565.493730] sd 0:0:3:0: [sdc] tag#11 CDB: Write(10) 2a 00 23 e3 cb 88 00 02 00 00
[ 5565.493732] blk_update_request: I/O error, dev sdc, sector 602131336
[ 5570.364858] sd 0:0:3:0: [sdc] tag#22 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 5570.364871] sd 0:0:3:0: [sdc] tag#22 Sense Key : Medium Error [current]
[ 5570.364876] sd 0:0:3:0: [sdc] tag#22 Add. Sense: Write error - auto reallocation failed
[ 5570.364882] sd 0:0:3:0: [sdc] tag#22 CDB: Write(10) 2a 00 23 e3 c9 88 00 02 00 00
[ 5570.364886] blk_update_request: I/O error, dev sdc, sector 602131009
[ 5585.607515] sd 0:0:3:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 5585.607530] sd 0:0:3:0: [sdc] tag#0 Sense Key : Medium Error [current]
[ 5585.607537] sd 0:0:3:0: [sdc] tag#0 Add. Sense: Write error - auto reallocation failed
[ 5585.607543] sd 0:0:3:0: [sdc] tag#0 CDB: Write(10) 2a 00 23 e3 d3 88 00 02 00 00
[ 5585.607548] blk_update_request: I/O error, dev sdc, sector 602133690
[ 5585.607609] Buffer I/O error on dev dm-3, logical block 73951665, lost async page write
[ 5585.607636] Buffer I/O error on dev dm-3, logical block 73951666, lost async page write
[ 5585.607667] Buffer I/O error on dev dm-3, logical block 73951667, lost async page write
[ 5585.607724] Buffer I/O error on dev dm-3, logical block 73951668, lost async page write

Comment 3 Mike Christie 2017-09-27 18:42:32 UTC
It looks like we are working as expected.

It looks like /dev/sdc is starting to go bad. sdc is used by OSDs, so when the iscsi target tries to do IO, it will hang and timeout. This leads to the target not allowing relogins until we can recover (clean up IO, reopen the image, etc).

The hung task warnings are from IOs to the bad disk and are expected.

Comment 4 Mike Christie 2017-09-27 18:45:34 UTC
Tejas,

Do you want to keep this one open or can I close it?

Comment 5 Tejas 2017-09-29 03:18:00 UTC
Mike,

   We can close it, since its expected.


Note You need to log in before you can comment on or make changes to this bug.