Description of problem:
Self-heal on a volume stops at a particular point and does not resume for a long time.

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-3.el6rhs.x86_64
nfs-ganesha-2.2.0-3.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a 1x2 dist-rep volume and mount it using nfs-ganesha with vers=3.
2. Create directories and files.
3. Bring down one brick of the replica pair.
4. Rename all the files and directories.
5. Force start the volume.
6. The self-heal process starts and then appears to hang.

Actual results:
The number of entries in self-heal info reaches a specific number and then stays there.

Expected results:
The number of entries in self-heal info must become 0.

Additional info:
[root@nfs2 ~]# gluster v heal testvol info
Brick nfs1:/rhs/brick1/brick1/testvol_brick0/
/x1/b1
/x1/b2
/x1/b3
/x1/b4
...
/x15/b19
/x15/b20
Number of entries: 300

Brick nfs2:/rhs/brick1/brick1/testvol_brick1/
Number of entries: 0

Self-heal eventually completes after a few hours.
Created attachment 1042308 [details] sosreports 1
Created attachment 1042309 [details] sosreports 2
On the client, used the following scripts:

1. To create files/directories:

   for i in {1..15}; do
       mkdir /mnt/testvol/a$i
       mkdir /mnt/testvol/x$i
       for j in {1..20}; do
           mkdir /mnt/testvol/a$i/b$j
           mkdir /mnt/testvol/x$i/y$j
           for k in {1..30}; do
               touch /mnt/testvol/a$i/b$j/c$k
           done
       done
   done

2. To rename files/directories:

   for i in {1..15}; do
       for j in {1..20}; do
           mv /mnt/testvol/a$i/b$j /mnt/testvol/x$i/b$j
           for k in {1..30}; do
               mv /mnt/testvol/x$i/b$j/c$k /mnt/testvol/x$i/y$j/c$k
           done
       done
   done
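The create + rename workload above can be dry-run locally as a single self-contained script. Note this is only a sketch: TESTVOL_ROOT and the small loop bounds (2x3x4 instead of 15x20x30) are assumptions for a quick local run, not part of the original reproducer; on a real setup, point TESTVOL_ROOT at the mounted gluster volume and restore the original bounds.

```shell
#!/bin/sh
# Minimal sketch of the create + rename workload from this report.
# TESTVOL_ROOT and the reduced loop bounds are assumptions for a
# quick local dry run, not the original reproduction parameters.
TESTVOL_ROOT="${TESTVOL_ROOT:-/tmp/testvol_sim}"
I_MAX=2; J_MAX=3; K_MAX=4

# Start from a clean simulation directory so reruns stay consistent.
rm -rf "$TESTVOL_ROOT"
mkdir -p "$TESTVOL_ROOT"

# Phase 1: create a$i/b$j/c$k files plus empty x$i/y$j directories.
for i in $(seq 1 "$I_MAX"); do
    mkdir -p "$TESTVOL_ROOT/a$i" "$TESTVOL_ROOT/x$i"
    for j in $(seq 1 "$J_MAX"); do
        mkdir -p "$TESTVOL_ROOT/a$i/b$j" "$TESTVOL_ROOT/x$i/y$j"
        for k in $(seq 1 "$K_MAX"); do
            touch "$TESTVOL_ROOT/a$i/b$j/c$k"
        done
    done
done

# Phase 2: rename each b$j under x$i, then move every c$k into x$i/y$j.
for i in $(seq 1 "$I_MAX"); do
    for j in $(seq 1 "$J_MAX"); do
        mv "$TESTVOL_ROOT/a$i/b$j" "$TESTVOL_ROOT/x$i/b$j"
        for k in $(seq 1 "$K_MAX"); do
            mv "$TESTVOL_ROOT/x$i/b$j/c$k" "$TESTVOL_ROOT/x$i/y$j/c$k"
        done
    done
done

# After the renames, every c$k lives under x$i/y$j.
find "$TESTVOL_ROOT" -type f | wc -l   # prints 24 with the default bounds (2*3*4)
```

Running this against a mounted volume while one brick of the replica pair is down recreates the rename-heavy state the self-heal daemon then has to reconcile.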
Update:
==============
Build used: glusterfs-3.12.2-6.el7rhgs.x86_64

Verified the scenario below for both 1x2 and 2x3 volumes:

1. Create a volume and mount it.
2. Create directories and files using:

   for i in {1..15}; do
       mkdir /mnt/testvol/a$i
       mkdir /mnt/testvol/x$i
       for j in {1..20}; do
           mkdir /mnt/testvol/a$i/b$j
           mkdir /mnt/testvol/x$i/y$j
           for k in {1..30}; do
               touch /mnt/testvol/a$i/b$j/c$k
           done
       done
   done

3. Bring down one brick of the replica pair (for 2x3, bring down one brick from each replica set).
4. Rename all the files and directories using:

   for i in {1..15}; do
       for j in {1..20}; do
           mv /mnt/testvol/a$i/b$j /mnt/testvol/x$i/b$j
           for k in {1..30}; do
               mv /mnt/testvol/x$i/b$j/c$k /mnt/testvol/x$i/y$j/c$k
           done
       done
   done

5. Force start the volume.

Healing completed without any issues:

[root@dhcp35-163 ~]# gluster vol heal 23 info
Brick 10.70.35.61:/bricks/brick0/testvol_distributed-replicated_brick0
Status: Connected
Number of entries: 0

Brick 10.70.35.174:/bricks/brick0/testvol_distributed-replicated_brick1
Status: Connected
Number of entries: 0

Brick 10.70.35.17:/bricks/brick0/testvol_distributed-replicated_brick2
Status: Connected
Number of entries: 0

Brick 10.70.35.163:/bricks/brick0/testvol_distributed-replicated_brick3
Status: Connected
Number of entries: 0

Brick 10.70.35.136:/bricks/brick0/testvol_distributed-replicated_brick4
Status: Connected
Number of entries: 0

Brick 10.70.35.214:/bricks/brick0/testvol_distributed-replicated_brick5
Status: Connected
Number of entries: 0

[root@dhcp35-163 ~]#

Also verified with the steps provided in comment 7.

Changing status to Verified.
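When scripting a verification like the one above, heal progress can be tracked by summing the "Number of entries:" counters that `gluster volume heal <vol> info` prints per brick; heal is complete when the total reaches 0. A minimal parsing sketch (the helper name `heal_entries_total` and the inlined sample output are illustrative, not from the original report):

```shell
# Sum the "Number of entries:" counters from `gluster volume heal <vol> info`
# output read on stdin. Heal is complete when the printed total is 0.
# (heal_entries_total is an illustrative helper name, not a gluster command.)
heal_entries_total() {
    awk '/^Number of entries:/ { total += $4 } END { print total + 0 }'
}

# Example with captured output shaped like the one in this report:
heal_entries_total <<'EOF'
Brick nfs1:/rhs/brick1/brick1/testvol_brick0/
/x1/b1
/x1/b2
Number of entries: 300
Brick nfs2:/rhs/brick1/brick1/testvol_brick1/
Number of entries: 0
EOF
# prints 300
```

In practice one would pipe live output into it, e.g. `gluster volume heal testvol info | heal_entries_total`, and poll until it prints 0.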
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607