Description of problem:
=======================
After setting up a non-root geo-rep session between 1 master and 2 slaves (slave1 and slave2), a directory rename creates the new directory on the second slave but does not unlink the old one. File renames work as expected:

Master Side Operation:
======================
[root@wingo master]# cp -rf /etc/host* .
[root@wingo master]# ls
host.conf  hosts  hosts.allow  hosts.deny  ifcfg-eth0  ifcfg-lo
[root@wingo master]# mv host.conf conf
[root@wingo master]# mkdir temp1
[root@wingo master]# mv temp1 temp2
[root@wingo master]#
[root@wingo master]# ls
conf  hosts  hosts.allow  hosts.deny  ifcfg  ifcfg-lo  temp2
[root@wingo master]#

Slave Side Entry:
================
[root@wingo ~]# ls /mnt/slave*
/mnt/slave:
conf  hosts  hosts.allow  hosts.deny  ifcfg  ifcfg-lo  temp2

/mnt/slave2:
conf  hosts  hosts.allow  hosts.deny  ifcfg  ifcfg-lo  temp1  temp2

/mnt/slave2_nfs:

/mnt/slave_nfs:
conf  hosts  hosts.allow  hosts.deny  ifcfg  ifcfg-lo  temp2

/mnt/slave_nfs2:
conf  hosts  hosts.allow  hosts.deny  ifcfg  ifcfg-lo  temp1  temp2

[root@wingo ~]#
[root@wingo ~]# mount | grep slave
10.70.46.101:/slave1 on /mnt/slave type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
10.70.46.101:/slave2 on /mnt/slave2 type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
10.70.46.101:/slave1 on /mnt/slave_nfs type nfs (rw,vers=3,addr=10.70.46.101)
10.70.46.101:/slave2 on /mnt/slave_nfs2 type nfs (rw,vers=3,addr=10.70.46.101)
[root@wingo ~]#

Note the temp1 => temp2 rename: it is applied on one slave volume but not on the other.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.1-7.el6rhs.x86_64

How reproducible:
=================
Tried once on a fanout/non-root setup.

Steps Carried:
==============
1. Create a master cluster (2 nodes) and a slave cluster (4 nodes)
2. Create and start the master volume and 2 slave volumes (each 2x2)
3. Create a mountbroker geo-rep session between the master and the 2 slave volumes (see the sketch after this report)
4. Mount the master and slave volumes (NFS and FUSE)
5. Create dirs/files on the master and let them sync to the slaves
6. Rename files and directories

Actual results:
===============
Both the original and the renamed directory are present on slave 2. The rename synced properly to slave 1.

Expected results:
=================
The rename should be synced to both slaves.
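For reference, a minimal sketch of step 3 above, assuming the master volume is named 'master' and the unprivileged user is 'geoaccount' as in the logs in the next comment; exact mountbroker setup varies by release, so treat this as an outline rather than the exact commands used:

# On each slave node, register the mountbroker in /etc/glusterfs/glusterd.vol
# and restart glusterd:
#   option mountbroker-root /var/mountbroker-root
#   option mountbroker-geo-replication.geoaccount slave1,slave2
#   option geo-replication-log-group geogroup

# On a master node, create one session per slave volume:
gluster volume geo-replication master geoaccount@10.70.46.101::slave1 create push-pem
gluster volume geo-replication master geoaccount@10.70.46.101::slave2 create push-pem

# On the slave node, as root, propagate the pem keys for the unprivileged user:
/usr/libexec/glusterfs/set_geo_rep_pem_keys.sh geoaccount master slave1
/usr/libexec/glusterfs/set_geo_rep_pem_keys.sh geoaccount master slave2

# Back on the master, start and check both sessions:
gluster volume geo-replication master geoaccount@10.70.46.101::slave1 start
gluster volume geo-replication master geoaccount@10.70.46.101::slave2 start
gluster volume geo-replication master geoaccount@10.70.46.101::slave1 status
gluster volume geo-replication master geoaccount@10.70.46.101::slave2 status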
RCA: Geo-rep replays a directory RENAME with just an os.rename() and nothing else. Since we see the RENAME entry in the changelogs of both master active bricks (.processed/.processing), it is clear that os.rename was issued on temp1. From the logs below, we also see that a DHT directory self-heal happened on 'temp1', so it seems 'temp1' got re-created as part of that DHT self-heal. I discussed this with Du (rgowdapp); it seems this is a known issue.

LOGS:

[root@georep1 ssh%3A%2F%2Fgeoaccount%4010.70.46.101%3Agluster%3A%2F%2F127.0.0.1%3Aslave2]# find . | xargs grep RENAME 2>/dev/null | grep temp
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1436013576:E 66f8112d-1857-4413-a9d2-bd6a7416162d RENAME 00000000-0000-0000-0000-000000000001%2Ftemp1 00000000-0000-0000-0000-000000000001%2Ftemp2
./764586b145d7206a154a778f64bd2f50/.processed/CHANGELOG.1436013576:E 66f8112d-1857-4413-a9d2-bd6a7416162d RENAME 00000000-0000-0000-0000-000000000001%2Ftemp1 00000000-0000-0000-0000-000000000001%2Ftemp2

[root@georep2 ssh%3A%2F%2Fgeoaccount%4010.70.46.101%3Agluster%3A%2F%2F127.0.0.1%3Aslave2]# find . | xargs grep RENAME 2>/dev/null | grep temp
./c19b89ac45352ab8c894d210d136dd56/.processed/CHANGELOG.1436013567:E 66f8112d-1857-4413-a9d2-bd6a7416162d RENAME 00000000-0000-0000-0000-000000000001%2Ftemp1 00000000-0000-0000-0000-000000000001%2Ftemp2
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1436013567:E 66f8112d-1857-4413-a9d2-bd6a7416162d RENAME 00000000-0000-0000-0000-000000000001%2Ftemp1 00000000-0000-0000-0000-000000000001%2Ftemp2

And the DHT logs say this:

[root@georep3 geo-replication-slaves]# grep temp1 *
d01c04e8-09af-4c02-920f-f4bd60433a9e:gluster%3A%2F%2F127.0.0.1%3Aslave2.gluster.log:[2015-07-04 12:39:29.566451] I [MSGID: 109066] [dht-rename.c:1410:dht_rename] 0-slave2-dht: renaming /.gfid/00000000-0000-0000-0000-000000000001/temp1 (hash=slave2-replicate-0/cache=slave2-replicate-1) => /.gfid/00000000-0000-0000-0000-000000000001/temp2 (hash=slave2-replicate-1/cache=<nul>)
grep: mbr: Is a directory

[root@georep4 geo-replication-slaves]# grep temp1 *
d01c04e8-09af-4c02-920f-f4bd60433a9e:gluster%3A%2F%2F127.0.0.1%3Aslave2.gluster.log:[2015-07-04 12:39:23.566466] I [MSGID: 109036] [dht-common.c:7106:dht_log_new_layout_for_dir_selfheal] 0-slave2-dht: Setting layout of /temp1 with [Subvol_name: slave2-replicate-0, Err: -1 , Start: 2147475392 , Stop: 4294967295 , Hash: 1 ], [Subvol_name: slave2-replicate-1, Err: -1 , Start: 0 , Stop: 2147475391 , Hash: 1 ],
d01c04e8-09af-4c02-920f-f4bd60433a9e:gluster%3A%2F%2F127.0.0.1%3Aslave2.gluster.log:[2015-07-04 12:39:30.284235] I [MSGID: 109036] [dht-common.c:7106:dht_log_new_layout_for_dir_selfheal] 0-slave2-dht: Setting layout of /temp1 with [Subvol_name: slave2-replicate-0, Err: -1 , Start: 2147475392 , Stop: 4294967295 , Hash: 1 ], [Subvol_name: slave2-replicate-1, Err: -1 , Start: 0 , Stop: 2147475391 , Hash: 1 ],

Du, please provide the existing DHT bugs on the rename/self-heal race and more info on this.
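A hypothetical way to cross-check this RCA from the slave side, assuming the slave2 fuse mount from the report: glusterfs exposes each file's GFID as a virtual xattr, so the two directories can be compared against the GFID recorded in the RENAME changelog entry above.

# Hypothetical check on the slave2 fuse mount. The renamed directory should
# carry the changelog's GFID 66f8112d-1857-4413-a9d2-bd6a7416162d; comparing
# temp1 against it shows whether temp1 is a self-heal artifact of the same
# directory or an independent entry.
getfattr -n glusterfs.gfid.string /mnt/slave2/temp1
getfattr -n glusterfs.gfid.string /mnt/slave2/temp2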
The doc text has been edited. Please sign off so it can be included in Known Issues.
This issue is similar to the one tracked by bz 1118762, hence marking this as a dependent bug. Also note that bz 1118762 is fixed in RHGS 3.1.3, so I think the current bz is also fixed.
Carried out the same scenario on build:

glusterfs-server-3.8.4-2.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-2.el7rhgs.x86_64

Did not hit the issue even after running the scenario multiple times (re-test sketched below). As mentioned in comment 9, this seems to be fixed with the latest downstream 3.2.0 build.
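A sketch of the re-test, assuming the same master mount and slave mounts as in the original report:

# On the master mount:
mkdir temp1 && mv temp1 temp2
# After the changelogs are processed, temp1 should be gone and only temp2
# present on every slave mount:
ls /mnt/slave /mnt/slave2 /mnt/slave_nfs /mnt/slave_nfs2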
Moving this BZ to On_QA.
Based on comments 9 and 10, moving this bug to the Verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html