************** This bug is a clone of bug 1333341 (Upstream)

Description of problem:
Rebooted brick2 and started renaming the files on brick1, which is full. brick2 did not come online after the reboot, and errors were seen in the brick logs: "Creation of unlink directory failed".

sosreport kept at rhsqe-repo.lab.eng.blr.redhat.com://var/www/html/sosreports/<bugid>

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a replica 3 volume and mount the volume on the client using FUSE.
2. Create files using:
   for (( i=1; i <= 50; i++ ))
   do
       dd if=/dev/zero of=file$i count=1000 bs=5M status=progress
   done
3. After the creation is done, reboot the second brick.
4. Start renaming the files to test$i..n.
5. When the second brick comes up, it fails with the errors below.
(A scripted sketch of these steps is included under "Additional info" below.)

[2016-05-05 14:37:45.826772] E [MSGID: 113096] [posix.c:6443:posix_create_unlink_dir] 0-arbiter-posix: Creating directory /rhs/brick1/arbiter/.glusterfs/unlink failed [No space left on device]
[2016-05-05 14:37:45.826856] E [MSGID: 113096] [posix.c:6866:init] 0-arbiter-posix: Creation of unlink directory failed
[2016-05-05 14:37:45.826880] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-arbiter-posix: Initialization of volume 'arbiter-posix' failed, review your volfile again
[2016-05-05 14:37:45.826925] E [graph.c:322:glusterfs_graph_init] 0-arbiter-posix: initializing translator failed
[2016-05-05 14:37:45.826943] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
[2016-05-05 14:37:45.828349] W [glusterfsd.c:1251:cleanup_and_exit] (-->/usr/sbin/glusterfsd(mgmt_getspec_cbk+0x331) [0x7f6ba63797d1] -->/usr/sbin/glusterfsd(glusterfs_process_volfp+0x120) [0x7f6ba6374150] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x69) [0x7f6ba6373739] ) 0-: received signum (0), shutting down

Actual results:

[root@dhcp43-167 arbiter]# gluster volume status
Status of volume: arbiter
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs
/brick1/arbiter                             N/A       N/A        N       N/A
Brick dhcp43-142.lab.eng.blr.redhat.com:/rh
s/brick1/arbiter                            49157     0          Y       2120
Brick dhcp43-167.lab.eng.blr.redhat.com:/rh
s/brick1/arbiter                            49156     0          Y       2094
NFS Server on localhost                     2049      0          Y       2679
Self-heal Daemon on localhost               N/A       N/A        Y       3172
NFS Server on dhcp42-58.lab.eng.blr.redhat.
com                                         2049      0          Y       2195
Self-heal Daemon on dhcp42-58.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       2816
NFS Server on dhcp43-142.lab.eng.blr.redhat
.com                                        2049      0          Y       3072
Self-heal Daemon on dhcp43-142.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       3160

Task Status of Volume arbiter
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: arbiternfs
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs
/brick2/arbiternfs                          N/A       N/A        N       N/A
Brick dhcp43-142.lab.eng.blr.redhat.com:/rh
s/brick2/arbiternfs                         49158     0          Y       2128
Brick dhcp43-167.lab.eng.blr.redhat.com:/rh
s/brick2/arbiternfs                         49157     0          Y       2109
NFS Server on localhost                     2049      0          Y       2679
Self-heal Daemon on localhost               N/A       N/A        Y       3172
NFS Server on dhcp42-58.lab.eng.blr.redhat.
com                                         2049      0          Y       2195
Self-heal Daemon on dhcp42-58.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       2816
NFS Server on dhcp43-142.lab.eng.blr.redhat
.com                                        2049      0          Y       3072
Self-heal Daemon on dhcp43-142.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       3160

Task Status of Volume arbiternfs
------------------------------------------------------------------------------
There are no active volume tasks

*************************************************************

[root@dhcp43-142 arbiter]# gluster volume heal arbiternfs info
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs/brick2/arbiternfs
Status: Transport endpoint is not connected
Number of entries: -

Brick dhcp43-142.lab.eng.blr.redhat.com:/rhs/brick2/arbiternfs
/file4
/file5
Status: Connected
Number of entries: 2

Brick dhcp43-167.lab.eng.blr.redhat.com:/rhs/brick2/arbiternfs
/file4
/file5
Status: Connected
Number of entries: 2

[root@dhcp43-142 arbiter]# gluster volume heal arbiter info
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs/brick1/arbiter
Status: Transport endpoint is not connected
Number of entries: -

Brick dhcp43-142.lab.eng.blr.redhat.com:/rhs/brick1/arbiter
/ - Possibly undergoing heal
Status: Connected
Number of entries: 1

Brick dhcp43-167.lab.eng.blr.redhat.com:/rhs/brick1/arbiter
/
Status: Connected
Number of entries: 1

Expected results:
The bricks should be up and running and the files should have been renamed.

Additional info:
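The "Creation of unlink directory failed" error above is the posix translator failing to create <brickpath>/.glusterfs/unlink on a brick file system that has no free space, which then aborts xlator init and leaves the brick offline. A minimal way to confirm the condition on the affected node, using the brick path from the log (only standard coreutils commands, nothing here is specific to the fix):

df -h /rhs/brick1/arbiter                       # brick file system expected to show 100% used
ls -ld /rhs/brick1/arbiter/.glusterfs/unlink    # the directory the posix xlator could not create at init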
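For reference, a rough scripted form of the reproduction steps above, assuming a three-node setup; the volume name "testvol", the hostnames server1..server3, the brick paths and the mount point are placeholders, not the values from this report:

# On one server: create and start a plain replica 3 volume
gluster volume create testvol replica 3 server1:/rhs/brick1/testvol server2:/rhs/brick1/testvol server3:/rhs/brick1/testvol
gluster volume start testvol

# On the client: FUSE-mount the volume and fill it with large files
mount -t glusterfs server1:/testvol /mnt/testvol
cd /mnt/testvol
for (( i=1; i <= 50; i++ ))
do
    dd if=/dev/zero of=file$i count=1000 bs=5M status=progress
done

# Reboot the node hosting the second brick (server2), then rename the files from the client
for (( i=1; i <= 50; i++ ))
do
    mv file$i test$i
done

# Once server2 is back, check whether its brick process came online
gluster volume status testvol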
*** Bug 1361517 has been marked as a duplicate of this bug. ***
Upstream mainline patch http://review.gluster.org/15030 is merged.
Upstream mainline : http://review.gluster.org/15030
Upstream 3.8 : http://review.gluster.org/15093

The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.
Verified the bug on:

[root@dhcp47-144 brick0]# gluster --version
glusterfs 3.8.4 built on Sep 29 2016 12:20:30
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

Steps:
1. Create a replica 3 volume and mount the volume on the client using FUSE.
2. Create files using:
   for (( i=1; i <= 50; i++ ))
   do
       dd if=/dev/zero of=file$i count=1000 bs=5M status=progress
   done
3. After the creation is done, reboot the second brick.
4. Start renaming the files to test$i..n.
5. When the second brick comes up, check the brick logs for the errors reported above.

No ERROR or Warning messages were observed, and the bricks functioned as expected.
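For context, a sketch of the checks behind that result; the volume name is taken from the original report, and the brick log location assumes the default /var/log/glusterfs/bricks/ naming on this setup:

# On the rebooted node, once it is back up
gluster volume status arbiter                                 # the local brick should show Online = Y
grep -e ' E \[' -e ' W \[' /var/log/glusterfs/bricks/*.log    # expect no new error/warning entries like the ones reported above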
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html