Bug 1623874

Summary:	IO errors on block device post rebooting one brick node
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Prasanna Kumar Kalever <prasanna.kalever>
Component:	core	Assignee:	Xavi Hernandez <jahernan>
Status:	CLOSED ERRATA	QA Contact:	Sweta Anandpara <sanandpa>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	rhgs-3.4	CC:	amukherj, apaladug, atumball, bgoyal, hchiramm, jahernan, kramdoss, madam, pkarampu, pprakash, prasanna.kalever, rgeorge, rhs-bugs, rtalur, sanandpa, sankarshan, sarumuga, sheggodu, storage-qa-internal, vbellur, xiubli
Target Milestone:	---	Keywords:	ZStream
Target Release:	RHGS 3.4.z Batch Update 1
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	glusterfs-3.12.2-20	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1623438	Environment:
Last Closed:	2018-10-31 08:46:14 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1619264, 1623438, 1624698

Comment 11 Pranith Kumar K 2018-09-04 04:20:16 UTC

>--- Additional comment from Prasanna Kumar Kalever on 2018-08-30 16:01:40 IST ---

> # dmesg -T
> [...]
> [Wed Aug 29 14:08:15 2018]  connection6:0: detected conn error (1021)
> [Wed Aug 29 14:08:15 2018]  connection6:0: detected conn error (1021)
> [Wed Aug 29 14:08:20 2018]  session6: session recovery timed out after 5 secs
> [Wed Aug 29 14:08:20 2018] sd 38:0:0:0: rejecting I/O to offline device
> [Wed Aug 29 14:08:20 2018] sd 38:0:0:0: Device offlined - not ready after error recovery
> [Wed Aug 29 14:08:20 2018] sd 38:0:0:0: Device offlined - not ready after error recovery
> [Wed Aug 29 14:08:20 2018] sd 38:0:0:0: Device offlined - not ready after error recovery
> [Wed Aug 29 14:08:20 2018] sd 38:0:0:0: [sdi] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
> [Wed Aug 29 14:08:20 2018] sd 38:0:0:0: [sdi] CDB: Write(10) 2a 00 00 00 11 80 00 00 80 00

When a brick is rebooted, I/O stalls for network.ping-timeout, which is 42 seconds. Could you explain what the 5-second session recovery timeout in the logs above signifies? There was also a failover timeout of 120 seconds, so I am wondering what this one is.
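
To make the timing question concrete, here is a minimal sketch that compares the three values quoted in this report. It assumes (this is not confirmed anywhere in the thread) that the stall lasts the full network.ping-timeout and that the 5-second figure is a timer on the initiator side that offlines the device when it expires. The names first_to_expire and device_survives_stall are hypothetical helpers used only for illustration.

```python
# Timeout values quoted in this bug report, in seconds.
TIMERS = {
    "network.ping-timeout": 42,  # how long I/O stalls after a brick reboot
    "session recovery": 5,       # "session recovery timed out after 5 secs" in dmesg
    "failover timeout": 120,     # value mentioned in the comment above
}


def first_to_expire(timers):
    """Return the name of the timer with the smallest timeout."""
    return min(timers, key=timers.get)


def device_survives_stall(stall_seconds, recovery_seconds):
    """Assumed model: the device stays online only if the recovery
    timer is longer than the I/O stall it has to ride out."""
    return recovery_seconds > stall_seconds


if __name__ == "__main__":
    print(first_to_expire(TIMERS))
    print(device_survives_stall(42, 5))
    print(device_survives_stall(42, 120))
```

Under that assumed model, a 5-second timer expires long before a 42-second stall ends, which would be consistent with the "Device offlined" lines in the dmesg output, while a 120-second timer would outlast the stall.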

Comment 16 Worker Ant 2018-09-13 14:16:50 UTC

REVISION POSTED: https://review.gluster.org/21170 (socket: set 42 as default tpc-user-timeout) posted (#2) for review on master by Xavi Hernandez
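
For readers unfamiliar with the option named in the patch title, the following is a small sketch of how the Linux TCP_USER_TIMEOUT socket option can be set on a plain socket. It is not the GlusterFS change itself. It assumes that the 42 in the title is a value in seconds chosen to match network.ping-timeout; the kernel option takes milliseconds. The helper names user_timeout_ms and set_tcp_user_timeout are hypothetical.

```python
import socket

# Linux option number for TCP_USER_TIMEOUT (from <linux/tcp.h>), used as a
# fallback in case this Python build does not export the constant.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def user_timeout_ms(seconds):
    """Convert a timeout in seconds to the milliseconds the kernel expects."""
    return int(seconds * 1000)


def set_tcp_user_timeout(sock, seconds):
    """Ask the kernel to abort the connection when transmitted data stays
    unacknowledged for longer than `seconds`. Linux only."""
    sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, user_timeout_ms(seconds))
    return sock.getsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Assumed value: 42 seconds, as in the patch title.
        print(set_tcp_user_timeout(s, 42))
```

With this option set, a connection to an unreachable peer is torn down by the kernel once the timeout elapses, rather than waiting on TCP retransmission backoff.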

Comment 26 errata-xmlrpc 2018-10-31 08:46:14 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3432