Log-file rotation on a Disperse Volume with a failed brick results in files that cannot be healed.

Simple log-file rotation on a Disperse Volume while there is a failed brick results in files that cannot later be healed when the brick is restored. The rotation here is simply closing and renaming the old file, then creating a new file with the old name in the same directory. What seems to happen is that, on the failed and then restored brick, the new file and the renamed old file end up hard-linked together.

One way to manually get the file to heal is to copy the file to a temporary name, delete the original, and move the temporary file back to the original name (sketched after the test plan below). However, this is not an acceptable procedure when an application has the file open. Another manual procedure that seems to work is to delete both the old-name and new-name files from the previously failed brick and then run a heal (demonstrated at the end of the test plan below).

This problem is likely related to Bug 1359681 – [disperse] Data gain while brick is down and rename a file (https://bugzilla.redhat.com/show_bug.cgi?id=1359681), with the added issues of persistent heal failures and the need for manual intervention.

Test plan below.

# gluster --version
glusterfs 3.12.14

##### Create the disperse volume.
# mkdir brick1 brick2 brick3
# gluster volume create disperse-vol disperse-data 2 redundancy 1 transport tcp \
  10.10.60.181:/exports/nas-segment-0002/brick1 \
  10.10.60.181:/exports/nas-segment-0002/brick2 \
  10.10.60.181:/exports/nas-segment-0002/brick3 force
volume create: disperse-vol: success: please start the volume to access data
# gluster volume start disperse-vol
volume start: disperse-vol: success
# mkdir /mnt/disperse-vol
# mount -t glusterfs 10.10.60.181:/disperse-vol /mnt/disperse-vol/

##### Place a test file, verify the contents.
# cp -i /var/log/messages /mnt/disperse-vol/
# sum /mnt/disperse-vol/messages
36906  4082

# gluster volume status disperse-vol
Status of volume: disperse-vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.60.181:/exports/nas-segment-000
2/brick1                                    62001     0          Y       24636
Brick 10.10.60.181:/exports/nas-segment-000
2/brick2                                    62008     0          Y       24656
Brick 10.10.60.181:/exports/nas-segment-000
2/brick3                                    62009     0          Y       24676
Self-heal Daemon on localhost               N/A       N/A        Y       24717

Task Status of Volume disperse-vol
------------------------------------------------------------------------------
There are no active volume tasks

##### Simulate a brick failure by killing the brick process for brick3.
# kill 24676

##### Verify the contents.
# sum /mnt/disperse-vol/messages
36906  4082

##### Manually simulate a logfile rotation.
# mv /mnt/disperse-vol/messages /mnt/disperse-vol/messages-1
# cp -i /var/log/boot.log /mnt/disperse-vol/messages

##### Verify the contents.
# sum /mnt/disperse-vol/messages
58391     5
# sum /mnt/disperse-vol/messages-1
36906  4082

##### Fix the brick failure by force starting disperse-vol.
# gluster volume start disperse-vol force
volume start: disperse-vol: success

##### Force healing, and note that the new logfile is not healthy.
# gluster volume heal disperse-vol full
# gluster volume heal disperse-vol info
Brick 10.10.60.181:/exports/nas-segment-0002/brick1
/
/messages
Status: Connected
Number of entries: 2

Brick 10.10.60.181:/exports/nas-segment-0002/brick2
/
/messages
Status: Connected
Number of entries: 2

Brick 10.10.60.181:/exports/nas-segment-0002/brick3
Status: Connected
Number of entries: 0

##### "Not able to heal" messages in the GlusterFS log file.
[2018-10-24 20:19:25.611393] I [MSGID: 122059] [ec-heald.c:407:ec_shd_full_healer] 0-disperse-vol-disperse-0: starting full sweep on subvol disperse-vol-client-0
[2018-10-24 20:19:25.611824] I [MSGID: 122059] [ec-heald.c:407:ec_shd_full_healer] 0-disperse-vol-disperse-0: starting full sweep on subvol disperse-vol-client-1
[2018-10-24 20:19:25.612487] I [MSGID: 122059] [ec-heald.c:407:ec_shd_full_healer] 0-disperse-vol-disperse-0: starting full sweep on subvol disperse-vol-client-2
[2018-10-24 20:19:25.619637] I [MSGID: 122002] [ec-heal.c:1343:__ec_heal_name] 0-disperse-vol-disperse-0: 00000000-0000-0000-0000-000000000001/messages: Not able to heal
[2018-10-24 20:19:25.628517] I [MSGID: 122059] [ec-heald.c:416:ec_shd_full_healer] 0-disperse-vol-disperse-0: finished full sweep on subvol disperse-vol-client-0
[2018-10-24 20:19:25.629298] I [MSGID: 122002] [ec-heal.c:1343:__ec_heal_name] 0-disperse-vol-disperse-0: 00000000-0000-0000-0000-000000000001/messages: Not able to heal
[2018-10-24 20:19:25.633801] I [MSGID: 122059] [ec-heald.c:416:ec_shd_full_healer] 0-disperse-vol-disperse-0: finished full sweep on subvol disperse-vol-client-2
[2018-10-24 20:19:25.636178] I [MSGID: 122002] [ec-heal.c:1343:__ec_heal_name] 0-disperse-vol-disperse-0: 00000000-0000-0000-0000-000000000001/messages: Not able to heal
[2018-10-24 20:19:25.637871] I [MSGID: 122059] [ec-heald.c:416:ec_shd_full_healer] 0-disperse-vol-disperse-0: finished full sweep on subvol disperse-vol-client-1

##### Verify the contents.
# sum /mnt/disperse-vol/messages
58391     5
# sum /mnt/disperse-vol/messages-1
36906  4082

##### Dump the 'messages' file attributes. Note that the previously failed brick has 2 gfid2paths.
# getfattr --absolute-names -m '.*' -d -e hex /exports/nas-segment-0002/brick*/messages
# file: /exports/nas-segment-0002/brick1/messages
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.size=0x000000000000128e
trusted.ec.version=0x00000000000000010000000000000001
trusted.gfid=0x4d041ae05fd649e0a9ee6fc502899c73
trusted.gfid2path.f07d8a47ec444a18=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6d65737361676573
trusted.gfid2path.f07d8a47ec444a18="00000000-0000-0000-0000-000000000001/messages" (in text)

# file: /exports/nas-segment-0002/brick2/messages
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.size=0x000000000000128e
trusted.ec.version=0x00000000000000010000000000000001
trusted.gfid=0x4d041ae05fd649e0a9ee6fc502899c73
trusted.gfid2path.f07d8a47ec444a18=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6d65737361676573
trusted.gfid2path.f07d8a47ec444a18="00000000-0000-0000-0000-000000000001/messages" (in text)

# file: /exports/nas-segment-0002/brick3/messages
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000003fc5d1
trusted.ec.version=0x00000000000000200000000000000020
trusted.gfid=0x1d288eab6fca4561a162f0a6b13a6d3a
trusted.gfid2path.084a6f56cf401310=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6d657373616765732d31
trusted.gfid2path.f07d8a47ec444a18=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6d65737361676573
trusted.gfid2path.084a6f56cf401310="00000000-0000-0000-0000-000000000001/messages-1" (in text)
trusted.gfid2path.f07d8a47ec444a18="00000000-0000-0000-0000-000000000001/messages" (in text)
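The "(in text)" lines above appear to be hand-decoded copies of the hex gfid2path values rather than getfattr output. A minimal decoding sketch, assuming xxd is available, using one of the values from the dump above:

# echo 30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6d65737361676573 | xxd -r -p; echo
00000000-0000-0000-0000-000000000001/messages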
##### Dump the 'messages-1' file attributes. Note that the previously failed brick has 2 gfid2paths.
# getfattr --absolute-names -m '.*' -d -e hex /exports/nas-segment-0002/brick*/messages-1
# file: /exports/nas-segment-0002/brick1/messages-1
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000003fc5d1
trusted.ec.version=0x00000000000000200000000000000020
trusted.gfid=0x1d288eab6fca4561a162f0a6b13a6d3a
trusted.gfid2path.084a6f56cf401310=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6d657373616765732d31
trusted.gfid2path.084a6f56cf401310="00000000-0000-0000-0000-000000000001/messages-1" (in text)

# file: /exports/nas-segment-0002/brick2/messages-1
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000003fc5d1
trusted.ec.version=0x00000000000000200000000000000020
trusted.gfid=0x1d288eab6fca4561a162f0a6b13a6d3a
trusted.gfid2path.084a6f56cf401310=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6d657373616765732d31
trusted.gfid2path.084a6f56cf401310="00000000-0000-0000-0000-000000000001/messages-1" (in text)

# file: /exports/nas-segment-0002/brick3/messages-1
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000003fc5d1
trusted.ec.version=0x00000000000000200000000000000020
trusted.gfid=0x1d288eab6fca4561a162f0a6b13a6d3a
trusted.gfid2path.084a6f56cf401310=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6d657373616765732d31
trusted.gfid2path.f07d8a47ec444a18=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6d65737361676573
trusted.gfid2path.084a6f56cf401310="00000000-0000-0000-0000-000000000001/messages-1" (in text)
trusted.gfid2path.f07d8a47ec444a18="00000000-0000-0000-0000-000000000001/messages" (in text)

##### File 'messages' brick fragment stats.
# stat /exports/nas-segment-0002/brick*/messages
  File: `/exports/nas-segment-0002/brick1/messages'
  Size: 2560            Blocks: 8          IO Block: 4096   regular file
Device: 880h/2176d      Inode: 92          Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-10-24 13:18:37.935361600 -0700
Modify: 2018-10-24 13:18:37.953361595 -0700
Change: 2018-10-24 13:18:37.954361598 -0700
  File: `/exports/nas-segment-0002/brick2/messages'
  Size: 2560            Blocks: 8          IO Block: 4096   regular file
Device: 880h/2176d      Inode: 268435548   Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-10-24 13:18:37.935361600 -0700
Modify: 2018-10-24 13:18:37.953361595 -0700
Change: 2018-10-24 13:18:37.954361598 -0700
  File: `/exports/nas-segment-0002/brick3/messages'
  Size: 2089984         Blocks: 4088       IO Block: 4096   regular file
Device: 880h/2176d      Inode: 537347155   Links: 3
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-10-24 13:17:16.301361665 -0700
Modify: 2018-10-24 13:17:16.438361664 -0700
Change: 2018-10-24 13:18:59.853361581 -0700
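The gfid divergence is easier to spot without dumping every attribute; a minimal check that pulls out only trusted.gfid for the same brick paths (a sketch, not part of the original test run):

# getfattr --absolute-names -n trusted.gfid -e hex /exports/nas-segment-0002/brick*/messages

Per the dumps above, brick1 and brick2 report 0x4d041ae05fd649e0a9ee6fc502899c73 for 'messages', while brick3 still reports the renamed file's gfid 0x1d288eab6fca4561a162f0a6b13a6d3a for that name.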
##### File 'messages-1' brick fragment stats.
# stat /exports/nas-segment-0002/brick*/messages-1
  File: `/exports/nas-segment-0002/brick1/messages-1'
  Size: 2089984         Blocks: 4088       IO Block: 4096   regular file
Device: 880h/2176d      Inode: 89          Links: 2
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-10-24 13:17:16.301361665 -0700
Modify: 2018-10-24 13:17:16.438361664 -0700
Change: 2018-10-24 13:18:59.872361580 -0700
  File: `/exports/nas-segment-0002/brick2/messages-1'
  Size: 2089984         Blocks: 4088       IO Block: 4096   regular file
Device: 880h/2176d      Inode: 268435544   Links: 2
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-10-24 13:17:16.301361665 -0700
Modify: 2018-10-24 13:17:16.438361664 -0700
Change: 2018-10-24 13:18:59.872361580 -0700
  File: `/exports/nas-segment-0002/brick3/messages-1'
  Size: 2089984         Blocks: 4088       IO Block: 4096   regular file
Device: 880h/2176d      Inode: 537347155   Links: 3
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-10-24 13:17:16.301361665 -0700
Modify: 2018-10-24 13:17:16.438361664 -0700
Change: 2018-10-24 13:18:59.853361581 -0700

##### Find the 'messages' file fragments on previously failed brick3.
# find /exports/nas-segment-0002/brick3 -samefile /exports/nas-segment-0002/brick3/messages -ls
537347155 2044 -rw-------   3 root     root      2089984 Oct 24 13:17 /exports/nas-segment-0002/brick3/.glusterfs/1d/28/1d288eab-6fca-4561-a162-f0a6b13a6d3a
537347155 2044 -rw-------   3 root     root      2089984 Oct 24 13:17 /exports/nas-segment-0002/brick3/messages
537347155 2044 -rw-------   3 root     root      2089984 Oct 24 13:17 /exports/nas-segment-0002/brick3/messages-1

##### Find the 'messages-1' file fragments on previously failed brick3.
# find /exports/nas-segment-0002/brick3 -samefile /exports/nas-segment-0002/brick3/messages-1 -ls
537347155 2044 -rw-------   3 root     root      2089984 Oct 24 13:17 /exports/nas-segment-0002/brick3/.glusterfs/1d/28/1d288eab-6fca-4561-a162-f0a6b13a6d3a
537347155 2044 -rw-------   3 root     root      2089984 Oct 24 13:17 /exports/nas-segment-0002/brick3/messages
537347155 2044 -rw-------   3 root     root      2089984 Oct 24 13:17 /exports/nas-segment-0002/brick3/messages-1

##### Find the 'messages' file fragments on always good brick1.
# find /exports/nas-segment-0002/brick1 -samefile /exports/nas-segment-0002/brick1/messages -ls
       92    4 -rw-r--r--   2 root     root         2560 Oct 24 13:18 /exports/nas-segment-0002/brick1/.glusterfs/4d/04/4d041ae0-5fd6-49e0-a9ee-6fc502899c73
       92    4 -rw-r--r--   2 root     root         2560 Oct 24 13:18 /exports/nas-segment-0002/brick1/messages

##### Find the 'messages-1' file fragments on always good brick1.
# find /exports/nas-segment-0002/brick1 -samefile /exports/nas-segment-0002/brick1/messages-1 -ls
       89 2044 -rw-------   2 root     root      2089984 Oct 24 13:17 /exports/nas-segment-0002/brick1/.glusterfs/1d/28/1d288eab-6fca-4561-a162-f0a6b13a6d3a
       89 2044 -rw-------   2 root     root      2089984 Oct 24 13:17 /exports/nas-segment-0002/brick1/messages-1

##### "Fix" the healing by deleting all of the unhealable file 'messages' from the previously failed brick3.
# find /exports/nas-segment-0002/brick3 -samefile /exports/nas-segment-0002/brick3/messages -delete
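Because all three names on brick3 share inode 537347155, the single find -delete above removes 'messages', 'messages-1', and the .glusterfs gfid hard link in one pass. A more explicit equivalent (a sketch, assuming the same brick path; confirm the targets with the find -ls output above before deleting anything) would be:

# rm -f /exports/nas-segment-0002/brick3/messages \
        /exports/nas-segment-0002/brick3/messages-1 \
        /exports/nas-segment-0002/brick3/.glusterfs/1d/28/1d288eab-6fca-4561-a162-f0a6b13a6d3a
# gluster volume heal disperse-vol full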
##### Disperse Volume healing now works on the 'messages' file.
# gluster volume heal disperse-vol full
# gluster volume heal disperse-vol info
Brick 10.10.60.181:/exports/nas-segment-0002/brick1
Status: Connected
Number of entries: 0

Brick 10.10.60.181:/exports/nas-segment-0002/brick2
Status: Connected
Number of entries: 0

Brick 10.10.60.181:/exports/nas-segment-0002/brick3
Status: Connected
Number of entries: 0

# sum /mnt/disperse-vol/messages
58391     5
# sum /mnt/disperse-vol/messages-1
36906  4082
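For completeness, the first workaround mentioned in the description (copy to a temporary name, delete the original, rename back) is performed from the FUSE mount rather than on a brick. A minimal sketch, assuming the same mount point and a hypothetical temporary name, usable only when no application holds the file open:

# cp /mnt/disperse-vol/messages /mnt/disperse-vol/messages.tmp
# rm /mnt/disperse-vol/messages
# mv /mnt/disperse-vol/messages.tmp /mnt/disperse-vol/messages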
Thanks for all the information. This bug is the same as bug #1359681. The only difference is that here you are recreating a file with the same original name, which causes a self-heal failure instead of data gain, but the underlying problem is the same: failure to correctly detect renamed entries before healing the original name. Since this also happens on the mainline version, I'll keep the older bug open and close this one as a duplicate.

*** This bug has been marked as a duplicate of bug 1359681 ***