Bug 1285200

Summary: Dist-geo-rep : geo-rep worker crashed while init with [Errno 34] Numerical result out of range.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Aravinda VK <avishwan>
Component: geo-replication
Assignee: Aravinda VK <avishwan>
Status: CLOSED ERRATA
QA Contact: Rahul Hinduja <rhinduja>
Severity: medium
Docs Contact:
Priority: medium
Version: rhgs-3.1
CC: asrivast, avishwan, chrisw, csaba, david.macdonald, nlevinki, rhinduja, rhs-bugs, rwheeler, sankarshan, smohan, storage-qa-internal, vkoppad
Target Milestone: ---
Keywords: EasyFix, ZStream
Target Release: RHGS 3.1.3
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.7.9-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1026780
: 1294588
Environment:
Last Closed: 2016-06-23 04:57:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1026780
Bug Blocks: 1294588, 1299184, 1313311

Comment 3 Aravinda VK 2015-12-29 05:28:57 UTC
Listing xattrs with llistxattr takes two syscalls instead of one:

SIZE = llistxattr(PATH, &VALUE, 0);
_ = llistxattr(PATH, &VALUE, SIZE);

So if another worker adds new xattrs just after the first call, the second syscall will fail with an ERANGE error.

For Geo-replication this is not critical: the Geo-rep worker goes to Faulty and restarts automatically.

The fix is to be made in $SRC/geo-replication/syncdaemon/libcxattr.py.
Handle the ERANGE error in listxattr, llistxattr, getxattr, lgetxattr, setxattr and lsetxattr, retrying 2-3 times when ERANGE is returned.
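
The retry described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the code from the upstream patch: read_with_erange_retry, FakeListxattr and the max_attempts parameter are hypothetical names, and the fake callable stands in for the real libc call so that the sketch runs without an xattr-capable filesystem.

```python
import errno
import os


def read_with_erange_retry(syscall, max_attempts=3):
    """Run the size-query / read pair, retrying when the read hits ERANGE.

    `syscall(buf, size)` is a hypothetical stand-in for a call such as
    llistxattr: called with (None, 0) it returns the required size; called
    with a buffer it fills the buffer and returns the number of bytes
    written, or raises OSError(ERANGE) if the buffer is too small.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        size = syscall(None, 0)          # first call: ask for the size
        buf = bytearray(size)
        try:
            length = syscall(buf, size)  # second call: read the data
        except OSError as exc:
            # The data grew between the two calls; ask for the size again.
            if exc.errno == errno.ERANGE and attempt < max_attempts:
                continue
            raise
        return bytes(buf[:length])


class FakeListxattr:
    """Simulates another worker adding an xattr between the two calls."""

    def __init__(self):
        self.names = b"user.a\0"
        self.calls = 0

    def __call__(self, buf, size):
        self.calls += 1
        if buf is None:
            return len(self.names)
        if self.calls == 2:
            # Concurrent writer: the list grows just before the first read.
            self.names += b"user.b\0"
        if size < len(self.names):
            raise OSError(errno.ERANGE, os.strerror(errno.ERANGE))
        buf[:len(self.names)] = self.names
        return len(self.names)
```

With the fake above, the first read fails with ERANGE and the second attempt returns both names. Any error other than ERANGE, or an ERANGE on the last attempt, is re-raised, so the worker would still go to Faulty in that case.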

Comment 4 Aravinda VK 2015-12-29 06:00:29 UTC
Upstream patch sent: http://review.gluster.org/#/c/13106/

Comment 6 Aravinda VK 2016-03-23 06:23:25 UTC
The patch for this bug is available in the rhgs-3.1.3 branch as part of the rebase from upstream release-3.7.9.

Comment 8 Rahul Hinduja 2016-04-18 14:41:01 UTC
Verified with the build: glusterfs-3.7.9-1

Ran the automated geo-rep cases on Tiered and Non-Tiered volumes. Also carried out the mountbroker cases. Haven't seen a worker crash with "Numerical result". Other crashes (IO error) were seen during IO at the slave, not during INIT of geo-replication. Moving this bug to the verified state. Will revisit if it is seen after the other bz is fixed.

Comment 10 errata-xmlrpc 2016-06-23 04:57:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240