Bug 1623874

Summary: IO errors on block device post rebooting one brick node
Product: Red Hat Gluster Storage
Reporter: Prasanna Kumar Kalever <prasanna.kalever>
Component: core
Assignee: Xavi Hernandez <jahernan>
Status: CLOSED ERRATA
QA Contact: Sweta Anandpara <sanandpa>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.4
CC: amukherj, apaladug, atumball, bgoyal, hchiramm, jahernan, kramdoss, madam, pkarampu, pprakash, prasanna.kalever, rgeorge, rhs-bugs, rtalur, sanandpa, sankarshan, sarumuga, sheggodu, storage-qa-internal, vbellur, xiubli
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.4.z Batch Update 1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.12.2-20
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1623438
Environment:
Last Closed: 2018-10-31 08:46:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1619264, 1623438, 1624698

Comment 11 Pranith Kumar K 2018-09-04 04:20:16 UTC
>--- Additional comment from Prasanna Kumar Kalever on 2018-08-30 16:01:40 IST ---

> # dmesg -T
> [...]
> [Wed Aug 29 14:08:15 2018]  connection6:0: detected conn error (1021)
> [Wed Aug 29 14:08:15 2018]  connection6:0: detected conn error (1021)
> [Wed Aug 29 14:08:20 2018]  session6: session recovery timed out after 5 secs
> [Wed Aug 29 14:08:20 2018] sd 38:0:0:0: rejecting I/O to offline device
> [Wed Aug 29 14:08:20 2018] sd 38:0:0:0: Device offlined - not ready after error recovery
> [Wed Aug 29 14:08:20 2018] sd 38:0:0:0: Device offlined - not ready after error recovery
> [Wed Aug 29 14:08:20 2018] sd 38:0:0:0: Device offlined - not ready after error recovery
> [Wed Aug 29 14:08:20 2018] sd 38:0:0:0: [sdi] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
> [Wed Aug 29 14:08:20 2018] sd 38:0:0:0: [sdi] CDB: Write(10) 2a 00 00 00 11 80 00 00 80 00

When a brick is rebooted, I/O stalls for network.ping-timeout, which is 42 seconds. Could you explain what the 5-second session recovery timeout in the logs above signifies? There was also a failover timeout of 120 seconds, so I am wondering what this one is.
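
To make the question concrete, here is a rough sketch of the comparison I have in mind: if I/O stalls for the full ping-timeout, any initiator-side timer shorter than that stall would fire first. The numbers come from the comment and the log above. The function name and the simple "shorter timer fires first" model are my own assumptions for illustration and may not match how these timers really interact.

```python
def timers_expiring_during_stall(stall_s, timers_s):
    """Return names of timers shorter than the stall, shortest first.

    Assumption: every timer starts when the stall starts. This is a
    simplification for illustration, not a model of the iSCSI stack.
    """
    expired = [(secs, name) for name, secs in timers_s.items() if secs < stall_s]
    return [name for secs, name in sorted(expired)]


# Values taken from this bug: 42 s ping-timeout, 5 s session recovery
# timeout seen in dmesg, 120 s failover timeout mentioned above.
fired = timers_expiring_during_stall(
    stall_s=42,
    timers_s={"session recovery": 5, "failover": 120},
)
print(fired)
```

Under that assumption only the 5-second timer expires inside a 42-second stall, which would be consistent with the device being offlined in the log.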

Comment 16 Worker Ant 2018-09-13 14:16:50 UTC
REVISION POSTED: https://review.gluster.org/21170 (socket: set 42 as default tpc-user-timeout) posted (#2) for review on master by Xavi Hernandez
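
For readers unfamiliar with the setting named in the patch title: on Linux, a TCP user timeout is normally applied through the TCP_USER_TIMEOUT socket option, which takes a value in milliseconds. The following is a minimal Python sketch of that mechanism only, assuming a Linux host. It is not the gluster socket code, and the function name is hypothetical.

```python
import socket

# Python exposes socket.TCP_USER_TIMEOUT on Linux; 18 is the Linux
# constant, used here as a fallback if the attribute is missing.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def make_tcp_socket(user_timeout_s=42):
    """Create a TCP socket with TCP_USER_TIMEOUT set (hypothetical helper).

    The option value is in milliseconds: it bounds how long transmitted
    data may stay unacknowledged before the kernel closes the connection.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, user_timeout_s * 1000)
    return sock


sock = make_tcp_socket(42)
print(sock.getsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT))
sock.close()
```

The default of 42 here simply mirrors the value in the patch title.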

Comment 26 errata-xmlrpc 2018-10-31 08:46:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3432