1285200 – Dist-geo-rep : geo-rep worker crashed while init with [Errno 34] Numerical result out of range.

Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	geo-replication
Version:	rhgs-3.1
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	RHGS 3.1.3
Assignee:	Aravinda VK
QA Contact:	Rahul Hinduja
Depends On:	1026780
Blocks:	1294588 1299184 1313311

Reported:	2015-11-25 08:39 UTC by Aravinda VK
Modified:	2016-06-23 04:57 UTC
CC List:	13 users
Fixed In Version:	glusterfs-3.7.9-1
Doc Type:	Bug Fix
Clone Of:	1026780
Clones:	1294588
Last Closed:	2016-06-23 04:57:37 UTC

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2016:1240	0	normal	SHIPPED_LIVE	Red Hat Gluster Storage 3.1 Update 3	2016-06-23 08:51:28 UTC

Comment 3 Aravinda VK 2015-12-29 05:28:57 UTC

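llistxattr requires two syscalls instead of one:

size = llistxattr(path, NULL, 0);     /* 1st call: returns the buffer size needed for the name list */
list = malloc(size);
ret = llistxattr(path, list, size);   /* 2nd call: copies the names into the buffer */

If another worker adds a new xattr between the first and second calls, the name list no longer fits in the buffer and the second call fails with ERANGE (errno 34, "Numerical result out of range").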
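The two-call pattern can be sketched in Python with ctypes. This is a minimal sketch assuming Linux with glibc; the helper name llistxattr_two_calls is hypothetical and the code is not taken from libcxattr.py. The comment between the two calls marks the window in which a concurrent xattr update makes the second call fail with ERANGE.

```python
import ctypes
import os

# Assumption: Linux with glibc. CDLL(None) exposes the libc symbols already
# loaded into the Python process; use_errno=True lets us read errno afterwards.
libc = ctypes.CDLL(None, use_errno=True)
libc.llistxattr.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_size_t]
libc.llistxattr.restype = ctypes.c_ssize_t


def _raise_errno(path):
    err = ctypes.get_errno()
    raise OSError(err, os.strerror(err), path)


def llistxattr_two_calls(path):
    """Hypothetical helper: list xattr names of `path` without following symlinks."""
    bpath = os.fsencode(path)

    # 1st call: size 0 only asks how many bytes the NUL-separated name list needs.
    size = libc.llistxattr(bpath, None, 0)
    if size == -1:
        _raise_errno(path)
    if size == 0:
        return []

    # Race window: another worker can add an xattr here, so the list grows.

    # 2nd call: copy the names into a buffer of exactly `size` bytes. If the
    # list grew in the meantime, this call fails with ERANGE.
    buf = ctypes.create_string_buffer(size)
    ret = libc.llistxattr(bpath, buf, size)
    if ret == -1:
        _raise_errno(path)

    return [name.decode() for name in buf.raw[:ret].split(b"\0") if name]
```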

For geo-replication this is not critical: the geo-rep worker goes to Faulty state and restarts automatically.

The fix is to be made in $SRC/geo-replication/syncdaemon/libcxattr.py: handle the ERANGE error in listxattr, llistxattr, getxattr, lgetxattr, setxattr and lsetxattr, and retry 2-3 times when ERANGE is returned.
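The proposed retry can be sketched as a small wrapper. This is a minimal sketch, not the upstream patch; retry_on_erange and make_flaky_lister are hypothetical names, and the fake lister only simulates the race so the retry logic can be exercised without a real filesystem.

```python
import errno


def retry_on_erange(fn, *args, retries=3):
    """Call fn(*args), retrying when it fails with ERANGE.

    Hypothetical helper. Makes at most `retries` attempts in total; the last
    ERANGE is re-raised. Any other OSError propagates immediately.
    """
    if retries < 1:
        raise ValueError("retries must be >= 1")
    for attempt in range(1, retries + 1):
        try:
            return fn(*args)
        except OSError as e:
            if e.errno != errno.ERANGE or attempt == retries:
                raise


def make_flaky_lister(failures):
    """Hypothetical stand-in for an xattr call that hits the race `failures` times."""
    remaining = [failures]

    def fake_llistxattr(path):
        if remaining[0] > 0:
            remaining[0] -= 1
            raise OSError(errno.ERANGE, "Numerical result out of range", path)
        return ["user.a", "user.b"]

    return fake_llistxattr


print(retry_on_erange(make_flaky_lister(2), "/some/path"))  # ['user.a', 'user.b']
```

The wrapped callable should perform both llistxattr calls, so that each retry re-queries the size; retrying only the second call with the stale size would hit the same ERANGE again.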

Comment 4 Aravinda VK 2015-12-29 06:00:29 UTC

Upstream patch sent: http://review.gluster.org/#/c/13106/

Comment 6 Aravinda VK 2016-03-23 06:23:25 UTC

The patch for this bug is available in the rhgs-3.1.3 branch as part of the rebase from upstream release-3.7.9.

Comment 8 Rahul Hinduja 2016-04-18 14:41:01 UTC

Verified with the build: glusterfs-3.7.9-1

Ran the automated geo-rep cases on Tiered and Non-Tiered volumes, and also carried out the mountbroker cases. Have not seen the worker crash with "Numerical result". Other crashes (IO errors) were seen during IO at the slave, not during INIT of geo-replication. Moving this bug to the verified state; will revisit if it is seen again after the other bz is fixed.

Comment 10 errata-xmlrpc 2016-06-23 04:57:37 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240