llistxattr takes two syscalls instead of one:
SIZE = llistxattr(PATH, &VALUE, 0);
_ = llistxattr(PATH, &VALUE, SIZE);
So if another worker adds new xattrs just after the first call, the second syscall fails with an ERANGE error. For Geo-replication this is not critical: the Geo-rep worker goes to Faulty and restarts automatically.
Fix to be done in $SRC/geo-replication/syncdaemon/libcxattr.py: handle the ERANGE error in listxattr, llistxattr, getxattr, lgetxattr, setxattr and lsetxattr, retrying 2-3 times when ERANGE is returned.
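The retry described above could be sketched roughly as follows. This is only an illustration of the idea, not the actual libcxattr.py change: `list_with_retry` and the `raw_list` callable are hypothetical names, with `raw_list(path, size)` standing in for the llistxattr call (returns the required size when `size` is 0, otherwise returns the NUL-separated name list or raises OSError with ERANGE if the list no longer fits).

```python
import errno


def list_with_retry(raw_list, path, max_attempts=3):
    """Size-then-read listing that retries if the list grows in between.

    raw_list is a hypothetical stand-in for llistxattr (see the note above);
    max_attempts=3 follows the "retry 2-3 times" suggestion in this bug.
    """
    for attempt in range(max_attempts):
        size = raw_list(path, 0)           # first call: ask for the size
        try:
            data = raw_list(path, size)    # second call: read the names
        except OSError as e:
            # A new xattr was added between the two calls, so the buffer
            # size is stale. Re-query the size unless attempts are used up.
            if e.errno == errno.ERANGE and attempt < max_attempts - 1:
                continue
            raise
        # Names come back as a NUL-separated byte string.
        return [name for name in data.split(b"\0") if name]
```

With this shape, ERANGE only reaches the caller (and sends the worker to Faulty) if every attempt loses the race; any other error is raised immediately.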
Upstream patch sent: http://review.gluster.org/#/c/13106/
The patch for this bug is available in the rhgs-3.1.3 branch as part of the rebase from upstream release-3.7.9.
Verified with the build glusterfs-3.7.9-1. Ran the automated geo-rep cases on Tiered and Non-Tiered volumes, and also carried out the mountbroker cases. Haven't seen the worker crashing with "Numerical result". Other crashes (IO error) were seen during IO at the slave, not during INIT of geo-replication. Moving this bug to verified state. Will revisit if this is seen again after the other bz is fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240