Description of problem:
After upgrade testing from the Anshi U5 build to the latest 2.1 build, files are not getting synced even after more than 12 hours. To upgrade geo-rep, I ran the slave-upgrade.sh script, which makes the gfids on the slave the same as on the master. However, it did not change the gfids of the symbolic links. I then ran rm -rf on the master mount point and untarred the Linux kernel. Neither the deletes nor the creates are synced to the slave volume. After more than 18 hours, the changelogs are still in .processing.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.30rhs-2.el6rhs.x86_64

How reproducible:
Hit 1/1. Haven't tried reproducing it.

Steps to Reproduce:
1. On the 3.3.0.11rhs build, create and start a geo-rep session between a 2*2 dist-rep master and a 2*2 dist-rep slave.
2. Create some files on the mount point by copying /etc into it about 5 times.
3. Wait for these to get synced to the slave volume via geo-rep.
4. Stop the geo-rep session and stop both the master and slave volumes.
5. Install the latest 3.4.0.30rhs-2 gluster build and start the volumes.
6. Run the geo-rep upgrade scripts, following the proper steps to upgrade the slave to the latest build.
7. After running slave-upgrade.sh, the gfids of the symlinks differed between master and slave.
8. Create and start the geo-rep session between master and slave again.
9. Execute: rm -rf /mnt/master/* && tar -xzvf linux*.tar.gz -C /mnt/master

Actual results:
Even after more than 18 hours, files are not synced to the slave volume. The status detail still shows a bunch of files pending sync.
[root@spitfire ~]# gluster v geo master falcon::slave status detail

MASTER: master    SLAVE: falcon::slave

NODE                       HEALTH    UPTIME      FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING
-----------------------------------------------------------------------------------------------------------------
spitfire.blr.redhat.com    Stable    20:34:17    7323           352              2.7MB            2387
mustang.blr.redhat.com     Stable    20:34:13    0              0                0Bytes           0
harrier.blr.redhat.com     Stable    20:34:13    7223           359              2.2MB            2209
typhoon.blr.redhat.com     Stable    20:34:13    0              0                0Bytes           0

[root@spacex ~]# ls /mnt/master/
linux-3.10
[root@spacex ~]# ls /mnt/slave/
etc etc.1 etc.2 etc.3 etc.4 etc.5 gfid

Expected results:
The deletes and creates should be synced to the slave.

Additional info:
In the working directory, the .processing directory had a bunch of changelog entries for files still to be synced.

[root@spitfire ~]# ls -lrt /var/run/gluster/master/ssh%3A%2F%2Froot%4010.70.43.152%3Agluster%3A%2F%2F127.0.0.1%3Aslave/59ddf777397e52a13ba1333653d63854/.processing/
total 5384
-rw-r--r-- 1 root root 387830 Sep 3 18:33 CHANGELOG.1378213414
-rw-r--r-- 1 root root 328662 Sep 3 18:34 CHANGELOG.1378213474
-rw-r--r-- 1 root root 324667 Sep 3 18:35 CHANGELOG.1378213535
-rw-r--r-- 1 root root 319931 Sep 3 18:36 CHANGELOG.1378213595
-rw-r--r-- 1 root root 203135 Sep 3 18:37 CHANGELOG.1378213655
-rw-r--r-- 1 root root 221602 Sep 3 18:38 CHANGELOG.1378213715
-rw-r--r-- 1 root root 209204 Sep 3 18:39 CHANGELOG.1378213775
-rw-r--r-- 1 root root 211050 Sep 3 18:40 CHANGELOG.1378213835
-rw-r--r-- 1 root root 199732 Sep 3 18:41 CHANGELOG.1378213895
-rw-r--r-- 1 root root 216232 Sep 3 18:42 CHANGELOG.1378213955
-rw-r--r-- 1 root root 193036 Sep 3 18:43 CHANGELOG.1378214015
-rw-r--r-- 1 root root 188544 Sep 3 18:44 CHANGELOG.1378214075
-rw-r--r-- 1 root root 186787 Sep 3 18:45 CHANGELOG.1378214136
-rw-r--r-- 1 root root 187200 Sep 3 18:46 CHANGELOG.1378214196
-rw-r--r-- 1 root root 185567 Sep 3 18:47 CHANGELOG.1378214256
-rw-r--r-- 1 root root 205367 Sep 3 18:48 CHANGELOG.1378214316
-rw-r--r-- 1 root root 182104 Sep 3 18:49 CHANGELOG.1378214376
-rw-r--r-- 1 root root 177566 Sep 3 18:50 CHANGELOG.1378214436
-rw-r--r-- 1 root root 180512 Sep 3 18:51 CHANGELOG.1378214496
-rw-r--r-- 1 root root 180398 Sep 3 18:52 CHANGELOG.1378214556
-rw-r--r-- 1 root root 197543 Sep 3 18:53 CHANGELOG.1378214616
-rw-r--r-- 1 root root 198779 Sep 3 18:54 CHANGELOG.1378214676
-rw-r--r-- 1 root root 203346 Sep 3 18:55 CHANGELOG.1378214736
-rw-r--r-- 1 root root 202392 Sep 3 18:56 CHANGELOG.1378214796
-rw-r--r-- 1 root root 156091 Sep 3 18:57 CHANGELOG.1378214857

I will archive all the logs.
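The changelog names in .processing appear to encode their creation time as a Unix epoch, which makes it easy to gauge how old the stuck backlog is. Below is a minimal Python sketch, assuming the CHANGELOG.<epoch-seconds> naming seen in the listing; `changelog_times` and `backlog_summary` are hypothetical helper names, not part of GlusterFS. For instance, CHANGELOG.1378213414 decodes to 2013-09-03 13:03:34 UTC, i.e. 18:33:34 at UTC+05:30, which matches the 18:33 mtime in the listing if the hosts run on IST.

```python
import re
from datetime import datetime, timezone

# Assumption: changelog files are named CHANGELOG.<unix-epoch-seconds>,
# as in the .processing listing in this report.
CHANGELOG_RE = re.compile(r"^CHANGELOG\.(\d+)$")


def changelog_times(names):
    """Decode CHANGELOG.<epoch> names into sorted UTC datetimes; skip other names."""
    times = []
    for name in names:
        m = CHANGELOG_RE.match(name)
        if m:
            times.append(datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc))
    return sorted(times)


def backlog_summary(names):
    """Summarise a changelog backlog: count, oldest, newest, and span in seconds."""
    times = changelog_times(names)
    if not times:
        return {"count": 0, "oldest": None, "newest": None, "span_s": 0}
    return {
        "count": len(times),
        "oldest": times[0],
        "newest": times[-1],
        "span_s": int((times[-1] - times[0]).total_seconds()),
    }


# First and last entries from the .processing listing in this report.
summary = backlog_summary(["CHANGELOG.1378213414", "CHANGELOG.1378214857"])
print(summary["oldest"].isoformat(), summary["span_s"])
# prints: 2013-09-03T13:03:34+00:00 1443
```

To apply it to a live working directory, pass `os.listdir()` of the .processing path to `backlog_summary`; names that do not match the pattern are ignored rather than treated as errors.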
As a workaround, I tried restarting the geo-replication session, but that doesn't fix it completely: after the restart, newly created files are synced, but the deletes are still not synced.
Was the 'rm -rf' done during the xsync crawl? If yes, it will not be synced to the slave side.
> After doing upgrade testing from Anshi U5 build to latest 2.1 build, files are not getting synced even after more than 12 hours. For upgrading the geo-rep, I ran slave-upgrade.sh script which makes the gfid of the slave same as master. But this had not changed the gfid of the symbolic links.

Considering bug 1001089 is fixed, the above issue should not be happening again. Can we test it with glusterfs-3.4.0.33rhs?
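One way to check for the symlink gfid mismatch left behind by slave-upgrade.sh is to read the gfid extended attribute of each symlink directly on the brick backends of both volumes and compare them. This is a hypothetical sketch, not a GlusterFS tool: it assumes the gfid is stored in the `trusted.gfid` xattr on each brick, that it runs as root on Linux (reading `trusted.*` xattrs normally requires it), and that the `.glusterfs` directory at the brick root is internal and should be skipped. The helper names `symlink_gfids` and `gfid_mismatches` are illustrative.

```python
import os

# Assumption: the gfid of each entry is kept in this xattr on the brick backend.
GFID_XATTR = "trusted.gfid"


def symlink_gfids(brick_root):
    """Map brick-relative symlink paths to their gfid as a hex string.

    Reads the xattr of the link itself (follow_symlinks=False), not of its
    target. Links whose xattr cannot be read are skipped.
    """
    gfids = {}
    for dirpath, dirnames, filenames in os.walk(brick_root):
        if dirpath == brick_root and ".glusterfs" in dirnames:
            # Do not descend into the internal gfid store at the brick root.
            dirnames.remove(".glusterfs")
        # os.walk lists symlinks to directories in dirnames (without
        # descending into them) and all other symlinks in filenames.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            try:
                raw = os.getxattr(path, GFID_XATTR, follow_symlinks=False)
            except OSError:
                continue
            gfids[os.path.relpath(path, brick_root)] = raw.hex()
    return gfids


def gfid_mismatches(master, slave):
    """Return the sorted paths present on both sides whose gfids differ."""
    return sorted(p for p in master.keys() & slave.keys() if master[p] != slave[p])
```

Because both volumes are distributed, a given symlink lives only on some bricks, so merge the per-brick maps for each side first (for example `{**symlink_gfids(brick_a), **symlink_gfids(brick_b)}`, with hypothetical brick paths) and then call `gfid_mismatches` on the two merged maps. An empty result suggests the symlink gfids agree.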
(In reply to Amar Tumballi from comment #4)
> > After doing upgrade testing from Anshi U5 build to latest 2.1 build, files are not getting synced even after more than 12 hours. For upgrading the geo-rep, I ran slave-upgrade.sh script which makes the gfid of the slave same as master. But this had not changed the gfid of the symbolic links.
>
> Considering bug 1001089 is fixed, the above issue should not be happening
> again. Can we test it with glusterfs-3.4.0.33rhs ?

No, it doesn't happen with glusterfs-3.4.0.33rhs. rm -rf propagates to the slave properly. Can I directly move it to verified?
It's working now. Moving to verified. Tested in version: glusterfs-3.4.0.33rhs-1.el6rhs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1769.html