Red Hat Bugzilla – Bug 1462210
[RFE][GSS] Geo-replication skips the deletion of files if a slave subvolume was down
Last modified: 2017-07-12 06:52:59 EDT
Description of problem:
Geo-replication does not sync file deletions when a subvolume of the slave volume (distribute-replicate) goes down and comes back up.
Version-Release number of selected component (if applicable):
Red Hat Gluster Storage Server 3.3.0
Steps to Reproduce:
- Create 2x2 master and slave volumes.
- Kill the bricks of one replica set on the slave side and keep the other replica set running.
For example, on a 2x2 volume, kill brick1 and brick2 (one replica set) and keep brick4 and brick5 up.
- Geo-replication status shows the sessions as Active and Passive (no Faulty sessions).
- Delete the contents of the master volume. The deletion succeeds.
- Bring both replica bricks on the slave side back up.
- The files in the slave volume that resided on the bricks that were down are not deleted.
This causes the master and slave volumes to go out of sync: the slave volume retains stale data.
The master and slave volumes should be in sync.
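To confirm the stale data after the slave bricks are back up, one option is to compare file listings from a master mount and a slave mount. The following is a minimal sketch that assumes both volumes are mounted locally (the mount paths and the function name `find_stale` are hypothetical) and that file names contain no newlines; it prints paths that exist on the slave but not on the master.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list regular files present under the slave mount
# but missing under the master mount, i.e. entries whose deletion was
# never replayed on the slave. Requires bash (process substitution).
find_stale() {
    local master="$1" slave="$2"
    # comm -13 keeps only lines unique to the second (slave) listing.
    # Both listings are sorted with the C collation that comm expects.
    LC_ALL=C comm -13 \
        <(cd "$master" && find . -type f | LC_ALL=C sort) \
        <(cd "$slave" && find . -type f | LC_ALL=C sort)
}
```

For example, `find_stale /mnt/master /mnt/slave` (hypothetical mount points) would print every stale path left behind on the slave, and print nothing when the two volumes are in sync.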
Customers can hit this issue easily: if a brick process gets killed, geo-replication status does not report any errors, yet the master and slave volumes go out of sync.
I think this is expected behavior if all the bricks of a subvolume are down and the file hashes to that subvolume.
gluster volume create gv1 node1:/bricks/b1 node2:/bricks/b2 force
gluster volume start gv1
mount -t glusterfs localhost:gv1 /mnt/gv1
echo "Hello World" > /mnt/gv1/f1
Check the backend to find the brick the file hashes to, and kill that brick. Then try to access or delete the file. We always get "rm: cannot remove 'f1': No such file or directory". Geo-replication therefore assumes the file is already deleted and proceeds without logging anything.
Geo-replication has no way to tell whether the file was already deleted or its subvolume is down. I think DHT could be enhanced to return a different error code when the subvolume is down.
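If DHT did return a distinct error for a down subvolume, a consumer could branch on it instead of treating every failure as "already deleted". The sketch below is hypothetical: it assumes the down-subvolume case would surface as ENOTCONN ("Transport endpoint is not connected"), which is not current behavior, and the function name `classify_unlink` is illustrative only. Today both cases produce "No such file or directory", so the `subvol-down` branch would never be taken.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete a path and report why the deletion
# succeeded or failed, instead of silently treating ENOENT as success.
classify_unlink() {
    local path="$1" err
    # LC_ALL=C keeps rm's error messages in English so they can be matched.
    if err=$(LC_ALL=C rm -- "$path" 2>&1); then
        echo "deleted"
    else
        case "$err" in
            *"No such file or directory"*)
                echo "already-gone" ;;   # safe to skip
            *"Transport endpoint is not connected"*)
                echo "subvol-down" ;;    # ASSUMED future DHT error: retry later
            *)
                echo "error" ;;          # anything else: log and investigate
        esac
    fi
}
```

With such a distinction, the `subvol-down` case could be retried or reported as Faulty rather than dropped, which would avoid the stale data described above.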
Adding Raghavendra to check whether DHT can differentiate these errors.