Bug 1285200

Summary:	Dist-geo-rep : geo-rep worker crashed while init with [Errno 34] Numerical result out of range.
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Aravinda VK <avishwan>
Component:	geo-replication	Assignee:	Aravinda VK <avishwan>
Status:	CLOSED ERRATA	QA Contact:	Rahul Hinduja <rhinduja>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	rhgs-3.1	CC:	asrivast, avishwan, chrisw, csaba, david.macdonald, nlevinki, rhinduja, rhs-bugs, rwheeler, sankarshan, smohan, storage-qa-internal, vkoppad
Target Milestone:	---	Keywords:	EasyFix, ZStream
Target Release:	RHGS 3.1.3
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	glusterfs-3.7.9-1	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:	1026780
Clones:	1294588		Environment:
Last Closed:	2016-06-23 04:57:37 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1026780
Bug Blocks:	1294588, 1299184, 1313311

Comment 3 Aravinda VK 2015-12-29 05:28:57 UTC

Listing xattrs with llistxattr takes two syscalls instead of one: the first returns the required buffer size, the second fetches the data.

SIZE = llistxattr(PATH, &VALUE, 0);
_ = llistxattr(PATH, &VALUE, SIZE);

So if another worker adds a new xattr just after the first call, the second syscall fails with an ERANGE error because SIZE is now too small.

For Geo-replication this is not critical: the Geo-rep worker goes to Faulty and restarts automatically.

The fix is to be done in $SRC/geo-replication/syncdaemon/libcxattr.py.
Handle the ERANGE error in listxattr, llistxattr, getxattr, lgetxattr, setxattr and lsetxattr, and retry 2-3 times when it occurs.
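
The retry idea above could look roughly like the sketch below. This is only an illustration, not the code from the actual patch: call_with_erange_retry, sized_call and FakeXattrList are hypothetical names, and the fake stands in for the real llistxattr wrapper so the race can be reproduced without touching a filesystem.

```python
import errno


def call_with_erange_retry(sized_call, max_attempts=3):
    """Run the size-query / fetch pair, retrying if the data grew in between.

    sized_call(size) is a hypothetical wrapper around one xattr syscall:
    with size == 0 it returns the required buffer size; otherwise it
    returns the data as bytes, or raises OSError(errno.ERANGE) when the
    given size is too small.
    """
    for attempt in range(max_attempts):
        size = sized_call(0)  # first syscall: ask for the required size
        try:
            return sized_call(size)  # second syscall: fetch the data
        except OSError as e:
            # Re-raise anything that is not ERANGE, and ERANGE itself
            # once the last attempt has been used up.
            if e.errno != errno.ERANGE or attempt == max_attempts - 1:
                raise


class FakeXattrList:
    """Stand-in for llistxattr whose result grows after the first size query."""

    def __init__(self):
        self.data = b"user.a\0"
        self.grown = False

    def __call__(self, size):
        if size == 0:
            return len(self.data)
        if not self.grown:
            # Simulate another worker adding an xattr between the two calls.
            self.data += b"user.b\0"
            self.grown = True
        if size < len(self.data):
            raise OSError(errno.ERANGE, "Numerical result out of range")
        return self.data


names = call_with_erange_retry(FakeXattrList())
```

Here the first attempt fails with ERANGE because the list grew after the size query, and the second attempt succeeds with the larger size. With max_attempts=1 the same race would still surface as [Errno 34].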

Comment 4 Aravinda VK 2015-12-29 06:00:29 UTC

Upstream patch sent: http://review.gluster.org/#/c/13106/

Comment 6 Aravinda VK 2016-03-23 06:23:25 UTC

The patch for this bug is available in the rhgs-3.1.3 branch as part of the rebase from upstream release-3.7.9.

Comment 8 Rahul Hinduja 2016-04-18 14:41:01 UTC

Verified with the build: glusterfs-3.7.9-1

Ran the automated geo-rep cases on Tiered and Non-Tiered volumes, and also ran the mountbroker cases. Did not see the worker crash with "Numerical result out of range". Other crashes (IO error) were seen during IO at the slave, not during INIT of geo-replication. Moving this bug to the verified state; will revisit if the crash is seen after the other bz is fixed.

Comment 10 errata-xmlrpc 2016-06-23 04:57:37 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240