Description of problem:
After adding a subvolume, directories fail with EAGAIN, which leads to rebalance failure.
I filed an issue with similar steps earlier, which showed permission denied warnings. I think this MIGHT be a repercussion of that issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1399504

Version-Release number of selected component (if applicable):
[root@dhcp47-175 ~]# rpm -qa | grep gluster
glusterfs-fuse-3.8.4-6.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-6.el7rhgs.x86_64
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-5.el7rhgs.x86_64
glusterfs-libs-3.8.4-6.el7rhgs.x86_64
glusterfs-3.8.4-6.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-6.el7rhgs.x86_64
glusterfs-api-3.8.4-6.el7rhgs.x86_64
glusterfs-server-3.8.4-6.el7rhgs.x86_64
python-gluster-3.8.4-6.el7rhgs.noarch
vdsm-gluster-4.17.33-1.el7rhgs.noarch
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-cli-3.8.4-6.el7rhgs.x86_64

How reproducible:
2/2

Logs are placed at rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/sosreports/<bug>

Steps to Reproduce:
1. Create a 1x(2+1) arbiter volume.
2. Create 1000 directories and 1000 files.
3. Replace a data brick in the volume.
4. Let the heals complete; check using gluster volume heal <volname> info.
5. Add 3 bricks at once to form a 2x(2+1) volume.
6. Start a rebalance of the volume using gluster volume rebalance <volname> start.
7. Output: rebalance failed.

Actual results:
There are two different outputs shown:

[root@dhcp47-197 ~]# gluster v status
Status of volume: testvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.142:/bricks/brick1/testvol_b
rick0                                       49153     0          Y       10271
Brick 10.70.47.197:/bricks/brick0/testvol_b
rick1                                       49152     0          Y       10163
Brick 10.70.47.175:/bricks/brick0/testvol_b
rick2                                       49152     0          Y       9876
Brick 10.70.46.142:/bricks/brick0/testvol_b
rick0                                       49152     0          Y       10407
Brick 10.70.47.197:/bricks/brick1/testvol_b
rick1                                       49153     0          Y       10441
Brick 10.70.47.175:/bricks/brick1/testvol_b
rick2                                       49153     0          Y       10095
Self-heal Daemon on localhost               N/A       N/A        Y       10461
Self-heal Daemon on dhcp46-142.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       10427
Self-heal Daemon on 10.70.47.175            N/A       N/A        Y       10115

Task Status of Volume testvol
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 80843ccb-0391-4ad6-af5b-4662d8fced43
Status               : failed

******************************************************************************

[root@dhcp47-175 ~]# gluster volume rebalance testvol status
                             Node  Rebalanced-files    size  scanned  failures  skipped  status  run time in h:m:s
                        ---------  ----------------  ------  -------  --------  -------  ------  -----------------
                        localhost                 0  0Bytes        0       135        0  failed             0:0:38
dhcp46-142.lab.eng.blr.redhat.com               492  0Bytes     1001       163        0  failed             0:0:37
                     10.70.47.197                 0  0Bytes        0         1        0  failed              0:0:0
volume rebalance: testvol: success
[root@dhcp47-175 ~]#

Expected results:
There should be no rebalance failure.
No errors should be reported.

Additional info:

[root@dhcp47-175 ~]# getfattr -d -m . -e hex /bricks/brick?/*//dir998
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/testvol_brick2/dir998
security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
trusted.afr.testvol-client-0=0x000000000000000000000000
trusted.gfid=0x3098ee30bfac4cc6a512087008811ae1
trusted.glusterfs.dht=0x00000001000000007fffffffffffffff

# file: bricks/brick1/testvol_brick2/dir998
security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
trusted.gfid=0x3098ee30bfac4cc6a512087008811ae1
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe

#############################################################

[root@dhcp46-142 ~]# getfattr -d -m . -e hex /bricks/brick?/*//dir998
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/testvol_brick0/dir998
security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
trusted.gfid=0x3098ee30bfac4cc6a512087008811ae1
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe

# file: bricks/brick1/testvol_brick0/dir998
security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
trusted.gfid=0x3098ee30bfac4cc6a512087008811ae1
trusted.glusterfs.dht=0x00000001000000007fffffffffffffff

#############################################################

[root@dhcp47-197 ~]# getfattr -d -m . -e hex /bricks/brick?/*//dir998
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/testvol_brick1/dir998
security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
trusted.afr.testvol-client-0=0x000000000000000000000000
trusted.gfid=0x3098ee30bfac4cc6a512087008811ae1
trusted.glusterfs.dht=0x00000001000000007fffffffffffffff

# file: bricks/brick1/testvol_brick1/dir998
security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
trusted.gfid=0x3098ee30bfac4cc6a512087008811ae1
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe

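
For anyone reading the trusted.glusterfs.dht values by hand, here is a rough Python sketch that decodes them into hash ranges. It assumes the xattr is four big-endian 32-bit words and that the last two words are the start and stop of the directory's hash range on that brick; the first two words are treated as opaque. The function names decode_dht_layout and covers_full_range are made up for this note and are not part of any gluster tooling.

```python
import struct


def decode_dht_layout(hex_value):
    """Return (start, stop) from a trusted.glusterfs.dht value as printed
    by `getfattr -e hex`.

    Assumption: the value is 16 bytes, four big-endian uint32 words, with
    the range start and stop in the third and fourth words.
    """
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 16:
        raise ValueError("expected 16 bytes, got %d" % len(raw))
    _word0, _word1, start, stop = struct.unpack(">IIII", raw)
    return start, stop


def covers_full_range(ranges):
    """True if the distinct ranges tile 0..0xffffffff with no gap or overlap.

    Bricks of the same replica set carry the same range, so duplicates
    are collapsed before checking.
    """
    expected = 0
    for start, stop in sorted(set(ranges)):
        if start != expected or stop < start:
            return False
        expected = stop + 1
    return expected == 0x100000000


if __name__ == "__main__":
    # The two distinct values seen for dir998 in the getfattr output above.
    values = [
        "0x00000001000000007fffffffffffffff",
        "0x0000000100000000000000007ffffffe",
    ]
    ranges = [decode_dht_layout(v) for v in values]
    for start, stop in ranges:
        print("0x%08x - 0x%08x" % (start, stop))
    print("covers full hash range:", covers_full_range(ranges))
```

Under that assumption, the two values decode to 0x00000000-0x7ffffffe and 0x7fffffff-0xffffffff, so the on-disk ranges for dir998 do not show a gap or overlap by themselves.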
I was able to recreate the issue downstream consistently. After applying https://code.engineering.redhat.com/gerrit/#/c/92316/ (sent against BZ 1393694), I was no longer able to hit it.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html