Bug 1004235 - Dist-geo-rep: Both creates and deletes are not synced to slave after rpm upgrade from 2.0 to 2.1
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
2.1
x86_64 Linux
high Severity high
Assigned To: Venky Shankar
Sudhir D
: ZStream
Depends On:
Blocks:
 
Reported: 2013-09-04 05:18 EDT by M S Vishwanath Bhat
Modified: 2016-05-31 21:56 EDT (History)
7 users

See Also:
Fixed In Version: glusterfs-3.4.0.33rhs-1
Doc Type: Bug Fix
Doc Text:
With new enhancements for better performance and distribution in the geo-replication feature, the logic for syncing files from the master cluster to the slave has been modified. As a result, geo-replication sessions set up before the upgrade failed to sync files to the slave. An upgrade script has now been included in this update; running this script after the upgrade ensures that syncing is handled properly.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-27 10:36:43 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments

  None
Description M S Vishwanath Bhat 2013-09-04 05:18:27 EDT
Description of problem:
After upgrade testing from an Anshi U5 build to the latest 2.1 build, files are not getting synced even after more than 12 hours. To upgrade geo-rep, I ran the slave-upgrade.sh script, which makes the gfids on the slave match the master. However, it did not change the gfids of the symbolic links. I then ran rm -rf on the master mount point and untarred a Linux kernel tarball there. Neither the deletes nor the creates were synced to the slave volume. After more than 18 hours, the entries are still in .processing.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.30rhs-2.el6rhs.x86_64

How reproducible:
Have hit 1/1. Haven't tried reproducing it.

Steps to Reproduce:
1. Create and start a geo-rep session between a 2x2 dist-rep master and a 2x2 dist-rep slave, on the 3.3.0.11rhs build.
2. Create some files by copying /etc into the mount point about 5 times.
3. Wait for this to get synced to slave volume via geo-rep.
4. Now stop the geo-rep session and stop both the master and slave volume.
5. Install the latest 3.4.0.30rhs-2 gluster build and start the volumes.
6. Run the geo-rep upgrade scripts, which perform the proper steps to upgrade the slave to the latest version.
7. After running slave-upgrade.sh, the gfids of the symlinks still differed between master and slave.
8. Now create and start the geo-rep session between master and slave again.
9. Execute rm -rf /mnt/master/* && tar -xzvf linux*.tar.gz -C /mnt/master
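For reference, the upgrade-and-reproduce sequence above can be sketched as a shell session. The volume names (master, slave) and slave host (falcon) are taken from this report; the exact package set, mount paths, and the way slave-upgrade.sh is invoked are assumptions, so treat this as an outline rather than the exact procedure.

```shell
# Steps 4-5: stop geo-rep and both volumes before upgrading.
gluster volume geo-replication master falcon::slave stop
gluster volume stop master
gluster volume stop slave        # run on a slave-cluster node

# Step 5: install the newer glusterfs build, then restart the volumes.
yum update glusterfs glusterfs-server glusterfs-geo-replication
gluster volume start master
gluster volume start slave       # run on a slave-cluster node

# Step 6: run the provided upgrade scripts (including slave-upgrade.sh)
# per the upgrade documentation, to align slave gfids with the master.

# Step 8: re-create and start the session on the new build.
gluster volume geo-replication master falcon::slave create push-pem
gluster volume geo-replication master falcon::slave start

# Step 9: churn data on the master mount point.
rm -rf /mnt/master/* && tar -xzvf linux*.tar.gz -C /mnt/master
```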



Actual results:
Even after more than 18 hours, files are not synced to the slave volume. The status detail still shows a bunch of files pending.

[root@spitfire ~]# gluster v geo master falcon::slave status detail
 
                                        MASTER: master  SLAVE: falcon::slave
 
NODE                         HEALTH    UPTIME      FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING   
--------------------------------------------------------------------------------------------------------------------
spitfire.blr.redhat.com      Stable    20:34:17    7323           352              2.7MB            2387              
mustang.blr.redhat.com       Stable    20:34:13    0              0                0Bytes           0                 
harrier.blr.redhat.com       Stable    20:34:13    7223           359              2.2MB            2209              
typhoon.blr.redhat.com       Stable    20:34:13    0              0                0Bytes           0                 


[root@spacex ~]# ls /mnt/master/
linux-3.10
[root@spacex ~]# ls /mnt/slave/
etc  etc.1  etc.2  etc.3  etc.4  etc.5  gfid



Expected results:
The deletes and creates should be synced to slave.


Additional info:

In the working dir, the .processing directory had a bunch of changelog entries for files yet to be synced.

[root@spitfire ~]# ls -lrt /var/run/gluster/master/ssh%3A%2F%2Froot%4010.70.43.152%3Agluster%3A%2F%2F127.0.0.1%3Aslave/59ddf777397e52a13ba1333653d63854/.processing/
total 5384
-rw-r--r-- 1 root root 387830 Sep  3 18:33 CHANGELOG.1378213414
-rw-r--r-- 1 root root 328662 Sep  3 18:34 CHANGELOG.1378213474
-rw-r--r-- 1 root root 324667 Sep  3 18:35 CHANGELOG.1378213535
-rw-r--r-- 1 root root 319931 Sep  3 18:36 CHANGELOG.1378213595
-rw-r--r-- 1 root root 203135 Sep  3 18:37 CHANGELOG.1378213655
-rw-r--r-- 1 root root 221602 Sep  3 18:38 CHANGELOG.1378213715
-rw-r--r-- 1 root root 209204 Sep  3 18:39 CHANGELOG.1378213775
-rw-r--r-- 1 root root 211050 Sep  3 18:40 CHANGELOG.1378213835
-rw-r--r-- 1 root root 199732 Sep  3 18:41 CHANGELOG.1378213895
-rw-r--r-- 1 root root 216232 Sep  3 18:42 CHANGELOG.1378213955
-rw-r--r-- 1 root root 193036 Sep  3 18:43 CHANGELOG.1378214015
-rw-r--r-- 1 root root 188544 Sep  3 18:44 CHANGELOG.1378214075
-rw-r--r-- 1 root root 186787 Sep  3 18:45 CHANGELOG.1378214136
-rw-r--r-- 1 root root 187200 Sep  3 18:46 CHANGELOG.1378214196
-rw-r--r-- 1 root root 185567 Sep  3 18:47 CHANGELOG.1378214256
-rw-r--r-- 1 root root 205367 Sep  3 18:48 CHANGELOG.1378214316
-rw-r--r-- 1 root root 182104 Sep  3 18:49 CHANGELOG.1378214376
-rw-r--r-- 1 root root 177566 Sep  3 18:50 CHANGELOG.1378214436
-rw-r--r-- 1 root root 180512 Sep  3 18:51 CHANGELOG.1378214496
-rw-r--r-- 1 root root 180398 Sep  3 18:52 CHANGELOG.1378214556
-rw-r--r-- 1 root root 197543 Sep  3 18:53 CHANGELOG.1378214616
-rw-r--r-- 1 root root 198779 Sep  3 18:54 CHANGELOG.1378214676
-rw-r--r-- 1 root root 203346 Sep  3 18:55 CHANGELOG.1378214736
-rw-r--r-- 1 root root 202392 Sep  3 18:56 CHANGELOG.1378214796
-rw-r--r-- 1 root root 156091 Sep  3 18:57 CHANGELOG.1378214857
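A quick way to check whether this backlog is draining is to count the queued changelogs over time; if the count never drops, the session is stuck. The glob below is an assumption standing in for the session-specific working directory shown above.

```shell
# Count queued changelogs in .processing; rerun after a minute or two --
# a stuck session shows a flat or growing count.
ls /var/run/gluster/master/*/*/.processing/ | wc -l
```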


I will archive all the logs.
Comment 2 M S Vishwanath Bhat 2013-09-04 06:44:04 EDT
I tried restarting the geo-replication session as a workaround, but that doesn't fix it completely. After the restart, newly created files are synced, but the deletes are still not synced.
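The workaround attempted here amounts to a session restart, which on this build would look roughly like the following (volume and host names taken from this report). Note that, per the comment, a restart only helps newly created files; pending deletes remain unsynced.

```shell
# Restart the geo-replication session between master and falcon::slave.
gluster volume geo-replication master falcon::slave stop
gluster volume geo-replication master falcon::slave start

# Check whether the pending-file counts start moving afterwards.
gluster volume geo-replication master falcon::slave status detail
```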
Comment 3 Amar Tumballi 2013-09-05 01:56:13 EDT
Was the 'rm -rf' done during the xsync crawl? If yes, it will not be synced to the slave side.
Comment 4 Amar Tumballi 2013-09-11 09:28:23 EDT
> After doing upgrade testing from Anshi U5 build to latest 2.1 build, files are not getting synced even after more than 12 hours. For upgrading the geo-rep, I ran slave-upgrade.sh script which makes the gfid of the slave same as master. But this had not changed the gfid of the symbolic links. 

Considering bug 1001089 is fixed, the above issue should not be happening again. Can we test it with glusterfs-3.4.0.33rhs ?
Comment 5 M S Vishwanath Bhat 2013-09-12 07:10:46 EDT
(In reply to Amar Tumballi from comment #4)
> > After doing upgrade testing from Anshi U5 build to latest 2.1 build, files are not getting synced even after more than 12 hours. For upgrading the geo-rep, I ran slave-upgrade.sh script which makes the gfid of the slave same as master. But this had not changed the gfid of the symbolic links. 
> 
> Considering bug 1001089 is fixed, the above issue should not be happening
> again. Can we test it with glusterfs-3.4.0.33rhs ?

No it doesn't happen with glusterfs-3.4.0.33rhs. rm -rf propagates to the slave properly.

Can I directly move it to verified?
Comment 6 M S Vishwanath Bhat 2013-09-12 07:16:19 EDT
It's working now. Moving to verified.

Tested in version: glusterfs-3.4.0.33rhs-1.el6rhs.x86_64
Comment 8 errata-xmlrpc 2013-11-27 10:36:43 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html
