Red Hat Bugzilla – Bug 1285200
Dist-geo-rep : geo-rep worker crashed while init with [Errno 34] Numerical result out of range.
Last modified: 2016-06-23 00:57:37 EDT
Listing xattrs with llistxattr takes two syscalls instead of one:
SIZE = llistxattr(PATH, &VALUE, 0);
_ = llistxattr(PATH, &VALUE, SIZE);
So if another worker adds a new xattr just after the first call, the buffer sized from SIZE is too small and the second syscall fails with an ERANGE error.
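The race can be reproduced without touching a real filesystem. The sketch below is only an illustration: FakeInode and the xattr names user.a and user.b are hypothetical stand-ins, not GlusterFS code, and the fake only mimics the size-then-read behaviour of llistxattr described above.

```python
import errno


class FakeInode:
    """Hypothetical stand-in for a file's xattr list; not GlusterFS code."""

    def __init__(self, names):
        self.names = list(names)

    def llistxattr(self, bufsize):
        # Names are NUL-terminated and packed back to back, as llistxattr returns them.
        packed = b"".join(n.encode() + b"\0" for n in self.names)
        if bufsize == 0:
            return len(packed), b""  # size query only, nothing is copied
        if bufsize < len(packed):
            raise OSError(errno.ERANGE, "Numerical result out of range")
        return len(packed), packed


inode = FakeInode(["user.a"])
size, _ = inode.llistxattr(0)   # first call: ask for the required size
inode.names.append("user.b")    # another worker adds an xattr in between
try:
    inode.llistxattr(size)      # second call: the buffer is now too small
    race_errno = None
except OSError as e:
    race_errno = e.errno        # ERANGE, i.e. [Errno 34]
```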
For Geo-replication this is not critical: the Geo-rep worker goes to Faulty and restarts automatically.
Fix to be done in $SRC/geo-replication/syncdaemon/libcxattr.py
Handle the ERANGE error in listxattr, llistxattr, getxattr, lgetxattr, setxattr and lsetxattr. Retry 2-3 times when ERANGE is returned.
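A minimal sketch of the retry idea, assuming a hypothetical helper with_erange_retry; this is not the code from the upstream patch. The callable passed in is expected to repeat the whole sequence (size query plus read), so that each retry picks up the new size.

```python
import errno


def with_erange_retry(call, retries=3):
    """Run call(); if it fails with ERANGE, try again, up to `retries` attempts in total."""
    for attempt in range(retries):
        try:
            return call()
        except OSError as e:
            # Re-raise anything that is not ERANGE, and ERANGE on the last attempt.
            if e.errno != errno.ERANGE or attempt == retries - 1:
                raise
```

A caller would wrap each xattr operation, for example with_erange_retry(lambda: list_xattrs(path)), where list_xattrs is whatever function performs both syscalls. Errors other than ERANGE are passed through unchanged on the first failure.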
Upstream patch sent: http://review.gluster.org/#/c/13106/
The patch for this bug is available in the rhgs-3.1.3 branch as part of the rebase from upstream release-3.7.9.
Verified with the build: glusterfs-3.7.9-1
Ran the automated geo-rep cases on Tiered and Non-Tiered volumes, and also ran the mountbroker cases. Haven't seen the worker crash with "Numerical result out of range". Other crashes (IO error) were seen during IO at the slave, not during INIT of geo-replication. Moving this bug to verified state. Will revisit if the crash is seen after the other bz is fixed.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.