Description of problem:
After the /var partition filled up and was subsequently cleaned up, one geo-replication session is showing Faulty for one pair of nodes.

Version-Release number of selected component (if applicable):
RHGS 3.1.3 on RHEL 6.8

How reproducible:
Ongoing

[root@rhssp1 ~]# gluster v geo-replication node8_dir status

MASTER NODE    MASTER VOL    MASTER BRICK    SLAVE USER    SLAVE                            SLAVE NODE    STATUS     CRAWL STATUS       LAST_SYNCED
-----------------------------------------------------------------------------------------------------------------------------------------------------------
node1          vol8_dir      /vol8b1/dir     root          ssh://node11::vol8_dir_slave     node11        Active     Changelog Crawl    2016-08-08 13:24:52
node3          vol8_dir      /vol8b3/dir     root          ssh://node11::vol8_dir_slave     N/A           Faulty     N/A                N/A
node4          vol8_dir      /vol8b4/dir     root          ssh://node11::gvol8_dir_slave    N/A           Faulty     N/A                N/A
node2          vol8_dir      /vol8b2/dir     root          ssh://node11::vol8_dir_slave     node11        Passive    N/A                N/A
Checked the log files. Observations:

gsyncd.conf file corruption:
When the /var partition is full, glusterd corrupts gsyncd.conf (the geo-rep session conf) on restart. As a workaround, the conf file was copied from a good peer node to the affected nodes. To fix this in the future, we should avoid regenerating the conf file on every glusterd start and handle write failures properly.

Python traceback causing Geo-rep status Faulty:
We found the following traceback in the slave log file.

Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 772, in entry_ops
    os.unlink(entry)
OSError: [Errno 21] Is a directory: '.gfid/12711ebf-7fdc-4f4b-9850-2d75581eb452/New folder'

During a rename, if the source and target have the same inode, Geo-rep deletes the source since no rename is required. This path does not handle directories: when it tries to delete the source directory with os.unlink(), it fails with the error above. We will work on this fix (a sketch of a directory-aware delete follows at the end of this comment).

As a workaround, on all brick backends of the slave, run:

    ls .glusterfs/12/71/12711ebf-7fdc-4f4b-9850-2d75581eb452/New folder

If "New folder" is empty, delete it on the slave so that Geo-rep can continue past this error. If it is not empty, back up the files and then delete "New folder". We can trigger a sync for the files from this directory if needed.
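To make the proposed fix concrete, here is a minimal sketch of a directory-aware delete for the rename source. It assumes the change would live around the os.unlink(entry) call in entry_ops() in syncdaemon/resource.py; the helper name remove_rename_source is made up for illustration and is not part of the existing gsyncd code.

import errno
import os
import stat


def remove_rename_source(entry):
    # Hypothetical helper: delete the rename source, which may be a file
    # or a directory. os.unlink() fails with EISDIR on directories (the
    # traceback above), so directories need os.rmdir() instead.
    try:
        mode = os.lstat(entry).st_mode
    except OSError as e:
        if e.errno == errno.ENOENT:
            return  # source already gone, nothing to clean up
        raise

    if stat.S_ISDIR(mode):
        try:
            os.rmdir(entry)
        except OSError as e:
            if e.errno != errno.ENOTEMPTY:
                raise
            # Non-empty directory: leave it in place; for now this case
            # still needs the manual backup-and-delete workaround above.
    else:
        os.unlink(entry)

Whether a non-empty source directory should be skipped or synced recursively is a separate decision; the sketch only removes the EISDIR failure that marks the session Faulty.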
Geo-replication support has been added to the Glusterd2 project, which will be available with the Gluster upstream 4.0 and 4.1 releases. Most of the issues are already fixed under https://github.com/gluster/glusterd2/issues/271, and the remaining fixes are tracked in https://github.com/gluster/glusterd2/issues/557. We can close these issues since we are not planning any fixes for the 3.x series.
I see the issues are fixed upstream right now, and the customer case is closed too. When GD2 comes to the product, this gets automatically closed. Please re-open the issue if the ask is to get this into GD1 itself, in which case we would have to rescope the effort and see what can be done.