Bug 1623874 - IO errors on block device post rebooting one brick node
Summary: IO errors on block device post rebooting one brick node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: core
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.z Batch Update 1
Assignee: Xavi Hernandez
QA Contact: Sweta Anandpara
URL:
Whiteboard:
Depends On:
Blocks: 1623438 1619264 1624698
Reported: 2018-08-30 11:20 UTC by Prasanna Kumar Kalever
Modified: 2018-10-31 08:47 UTC (History)

Fixed In Version: glusterfs-3.12.2-20
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1623438
Environment:
Last Closed: 2018-10-31 08:46:14 UTC



Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2018:3432 | None | None | None | 2018-10-31 08:47:58 UTC

Comment 11 Pranith Kumar K 2018-09-04 04:20:16 UTC
>--- Additional comment from Prasanna Kumar Kalever on 2018-08-30 16:01:40 IST ---

> # dmesg -T
> [...]
> [Wed Aug 29 14:08:15 2018]  connection6:0: detected conn error (1021)
> [Wed Aug 29 14:08:15 2018]  connection6:0: detected conn error (1021)
> [Wed Aug 29 14:08:20 2018]  session6: session recovery timed out after 5 secs
> [Wed Aug 29 14:08:20 2018] sd 38:0:0:0: rejecting I/O to offline device
> [Wed Aug 29 14:08:20 2018] sd 38:0:0:0: Device offlined - not ready after error recovery
> [Wed Aug 29 14:08:20 2018] sd 38:0:0:0: Device offlined - not ready after error recovery
> [Wed Aug 29 14:08:20 2018] sd 38:0:0:0: Device offlined - not ready after error recovery
> [Wed Aug 29 14:08:20 2018] sd 38:0:0:0: [sdi] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
> [Wed Aug 29 14:08:20 2018] sd 38:0:0:0: [sdi] CDB: Write(10) 2a 00 00 00 11 80 00 00 80 00

When a brick is rebooted, I/O stalls for network.ping-timeout, which is 42 seconds. Could you explain what the 5-second session recovery timeout in the logs above signifies? There was a failover timeout of 120 seconds, so I am wondering what this one is.
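
To make the timing concrete, here is a minimal sketch that reads two of the quoted dmesg lines and compares the gap between them with the 42-second network.ping-timeout. The helper names (parse_stamp, seconds_between) are made up for illustration, and the timestamp format is assumed to be the one printed by dmesg -T in the excerpt above.

```python
import re
from datetime import datetime

# Two lines copied from the dmesg excerpt quoted above (quote markers dropped).
DMESG_LINES = [
    "[Wed Aug 29 14:08:15 2018]  connection6:0: detected conn error (1021)",
    "[Wed Aug 29 14:08:20 2018]  session6: session recovery timed out after 5 secs",
]

# network.ping-timeout value mentioned in this comment, in seconds.
PING_TIMEOUT_SECS = 42

# Matches the bracketed human-readable timestamp at the start of a line.
STAMP = re.compile(r"^\[([^\]]+)\]")


def parse_stamp(line):
    """Return the bracketed timestamp of a dmesg -T line as a datetime."""
    match = STAMP.match(line)
    if match is None:
        raise ValueError("no dmesg -T timestamp in line: %r" % line)
    return datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")


def seconds_between(first_line, second_line):
    """Return the number of seconds elapsed between two dmesg -T lines."""
    delta = parse_stamp(second_line) - parse_stamp(first_line)
    return delta.total_seconds()


gap = seconds_between(DMESG_LINES[0], DMESG_LINES[1])
print("session recovery timed out %.0f s after the conn error" % gap)
print("I/O stall on brick reboot can last %d s" % PING_TIMEOUT_SECS)
print("recovery timeout shorter than stall:", gap < PING_TIMEOUT_SECS)
```

If this reading of the log is right, the initiator stops waiting well before the stall caused by the brick reboot would end, which would match the "Device offlined" lines that follow.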

Comment 16 Worker Ant 2018-09-13 14:16:50 UTC
REVISION POSTED: https://review.gluster.org/21170 (socket: set 42 as default tpc-user-timeout) posted (#2) for review on master by Xavi Hernandez
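
For readers unfamiliar with the option, here is a rough sketch of what a 42-second user timeout looks like on a plain socket. It assumes the patch title refers to the Linux TCP_USER_TIMEOUT socket option; it is not the glusterfs change itself, and make_socket_with_user_timeout is a hypothetical helper name. The option takes its value in milliseconds and is only available on Linux, so the sketch skips it where Python does not expose the constant.

```python
import socket

# 42 seconds, the default named in the patch title.
# TCP_USER_TIMEOUT is expressed in milliseconds.
USER_TIMEOUT_MS = 42 * 1000


def make_socket_with_user_timeout(timeout_ms=USER_TIMEOUT_MS):
    """Create a TCP socket and, where supported, set TCP_USER_TIMEOUT on it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    option = getattr(socket, "TCP_USER_TIMEOUT", None)  # Linux only
    if option is not None:
        # Limits how long transmitted data may stay unacknowledged
        # before the kernel closes the connection.
        sock.setsockopt(socket.IPPROTO_TCP, option, timeout_ms)
    return sock


sock = make_socket_with_user_timeout()
option = getattr(socket, "TCP_USER_TIMEOUT", None)
if option is not None:
    print("TCP_USER_TIMEOUT (ms):", sock.getsockopt(socket.IPPROTO_TCP, option))
else:
    print("TCP_USER_TIMEOUT is not available on this platform")
sock.close()
```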

Comment 26 errata-xmlrpc 2018-10-31 08:46:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3432

