Bug 1139156 - dist-geo-rep: Few files are not synced to slave when files are being created during geo-rep start
Summary: dist-geo-rep: Few files are not synced to slave when files are being created during geo-rep start
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.3
Assignee: Kotresh HR
QA Contact: M S Vishwanath Bhat
URL:
Whiteboard:
Depends On:
Blocks: 1139196 1159205 1162694
 
Reported: 2014-09-08 09:04 UTC by M S Vishwanath Bhat
Modified: 2016-06-01 01:56 UTC
CC List: 11 users

Fixed In Version: glusterfs-3.6.0.31-1.el6rhs
Doc Type: Bug Fix
Doc Text:
Previously, geo-replication missed synchronizing a few files to the slave when I/O happened during geo-replication start. With this fix, the slave does not miss any files if I/O happens during geo-replication start.
Clone Of:
: 1139196 (view as bug list)
Environment:
Last Closed: 2015-01-15 13:39:34 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:0038 0 normal SHIPPED_LIVE Red Hat Storage 3.0 enhancement and bug fix update #3 2015-01-15 18:35:28 UTC

Description M S Vishwanath Bhat 2014-09-08 09:04:49 UTC
Description of problem:
If I/O is running while geo-replication is being started, a few of the files are not synced to the slave.


Version-Release number of selected component (if applicable):
glusterfs-3.6.0.28-1.el6rhs.x86_64

How reproducible:
Seems to be reproducible frequently.

Steps to Reproduce:
1. Create the master and slave volumes and start file creation on the master.
2. Create and start the geo-replication session.
3. Wait for the files to sync to the slave (a minimal sketch of this scenario follows below).
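
For illustration, here is a minimal sketch of the scenario being reproduced: a background writer keeps creating files on the master mount while the geo-replication session is started. The mount path, volume name and slave URL below are placeholders, not values taken from this setup.

#!/usr/bin/env python
# Sketch only: file creation racing with geo-rep start.
# MASTER_MNT and the volume/slave names are assumptions.
import os
import subprocess
import threading
import time

MASTER_MNT = "/mnt/master"
GEO_REP_START = ["gluster", "volume", "geo-replication",
                 "master", "slavehost::slave", "start"]

def create_files(count=1000):
    # Continuously create small files on the master mount.
    for i in range(count):
        with open(os.path.join(MASTER_MNT, "file%d" % i), "w") as f:
            f.write("data\n")

writer = threading.Thread(target=create_files)
writer.start()                       # I/O workload running...
time.sleep(1)                        # ...when geo-rep is started
subprocess.check_call(GEO_REP_START)
writer.join()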

Actual results:
A few files are skipped while syncing. As a result, the slave does not have all the files present on the master.

Expected results:
All the files on the master should be synced to the slave in every case.

Additional info:


Log excerpt from one of the slave logs:

[2014-09-08 12:51:40.558691] W [syncdutils(slave):480:errno_wrap] <top>: reached maximum retries (['.gfid/9611d805-4647-4647-b969-0f5596f8591c', 'glusterfs.gfid.newfile', '\x00\x00\x00\x00\x00\x00\x00\x0046a7f526-fbac-41f8-ac18-2c46b05bfddc\x00\x00\x00\x81\x80tc-actions-env-rules.txt\x00\x00\x00\x01\x80\x00\x00\x00\x00\x00\x00\x00\x00'])...
[2014-09-08 12:51:45.628182] W [syncdutils(slave):480:errno_wrap] <top>: reached maximum retries (['.gfid/9611d805-4647-4647-b969-0f5596f8591c', 'glusterfs.gfid.newfile', '\x00\x00\x00\x00\x00\x00\x00\x004abd9cb1-a7dd-4228-aa47-873c343ebcfd\x00\x00\x00A\xc0timestamping\x00\x00\x00\x01\xc0\x00\x00\x00\x00'])...
[2014-09-08 12:51:50.994118] W [syncdutils(slave):480:errno_wrap] <top>: reached maximum retries (['.gfid/4abd9cb1-a7dd-4228-aa47-873c343ebcfd', 'glusterfs.gfid.newfile', '\x00\x00\x00\x00\x00\x00\x00\x0037d26f46-d8d3-4cec-90b1-87c5efd45eb3\x00\x00\x00\x81\x80.gitignore\x00\x00\x00\x01\x80\x00\x00\x00\x00\x00\x00\x00\x00'])...
[2014-09-08 12:51:56.281146] W [syncdutils(slave):480:errno_wrap] <top>: reached maximum retries (['.gfid/4abd9cb1-a7dd-4228-aa47-873c343ebcfd', 'glusterfs.gfid.newfile', '\x00\x00\x00\x00\x00\x00\x00\x000a373591-3499-4642-94b3-7858227a9de0\x00\x00\x00\x81\x80timestamping.c\x00\x00\x00\x01\x80\x00\x00\x00\x00\x00\x00\x00\x00'])...
[2014-09-08 12:52:01.956902] W [syncdutils(slave):480:errno_wrap] <top>: reached maximum retries (['.gfid/9611d805-4647-4647-b969-0f5596f8591c', 'glusterfs.gfid.newfile', '\x00\x00\x00\x00\x00\x00\x00\x0064cecf23-d45e-4976-aa99-c75eb23c2f90\x00\x00\x00\x81\x80tlan.txt\x00\x00\x00\x01\x80\x00\x00\x00\x00\x00\x00\x00\x00'])...
[2014-09-08 12:52:07.236882] W [syncdutils(slave):480:errno_wrap] <top>: reached maximum retries (['.gfid/9611d805-4647-4647-b969-0f5596f8591c', 'glusterfs.gfid.newfile', '\x00\x00\x00\x00\x00\x00\x00\x002c7f5d32-ece4-4288-bf00-aeb79316138b\x00\x00\x00\x81\x80vxge.txt\x00\x00\x00\x01\x80\x00\x00\x00\x00\x00\x00\x00\x00'])...
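
The warnings above come from the slave-side retry wrapper in the geo-rep syncdaemon: an entry operation that keeps failing is retried a bounded number of times and then logged as "reached maximum retries", after which the entry is skipped. A simplified sketch of that retry pattern is shown below; it is not the actual syncdutils.errno_wrap implementation, and the retry count, delay and retriable errnos are assumptions.

import errno
import logging
import time

def errno_wrap_sketch(call, args=[], retries=5, delay=1):
    # Retry `call` while it fails with a retriable errno (e.g. ESTALE);
    # after `retries` attempts, log the warning seen above and give up.
    for _ in range(retries):
        try:
            return call(*args)
        except OSError as e:
            if e.errno not in (errno.ESTALE, errno.EBUSY):
                raise                # non-retriable errors propagate
            time.sleep(delay)
    logging.warning("reached maximum retries (%s)...", args)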

Comment 2 Kotresh HR 2014-09-08 11:22:36 UTC
Patch sent upstream:
http://review.gluster.org/#/c/8650/

Comment 5 Kotresh HR 2014-10-28 09:23:59 UTC
Upstream Patch: (Status: Merged)
http://review.gluster.org/#/c/8650/

Downstream Patch:
https://code.engineering.redhat.com/gerrit/#/c/35561/

Comment 7 shilpa 2014-11-21 11:58:36 UTC
Verified on glusterfs-3.6.0.33-1.el6rhs.x86_64

All data created during geo-rep start was synced to the slave successfully. No files/dirs were skipped.

# gluster v geo master acdc::slave stop
Stopping geo-replication session between master & acdc::slave has been successful
# gluster v geo master acdc::slave start
Starting geo-replication session between master & acdc::slave has been successful
# for i in {1..100};do touch f$i;done
# ls
dir1    dir2   dir30  dir41  dir52  dir63  dir74  dir85  dir96  f16  f27  f38  f49  f6   f70  f81  f92
dir10   dir20  dir31  dir42  dir53  dir64  dir75  dir86  dir97  f17  f28  f39  f5   f60  f71  f82  f93
dir100  dir21  dir32  dir43  dir54  dir65  dir76  dir87  dir98  f18  f29  f4   f50  f61  f72  f83  f94
dir11   dir22  dir33  dir44  dir55  dir66  dir77  dir88  dir99  f19  f3   f40  f51  f62  f73  f84  f95
dir12   dir23  dir34  dir45  dir56  dir67  dir78  dir89  f1     f2   f30  f41  f52  f63  f74  f85  f96
dir13   dir24  dir35  dir46  dir57  dir68  dir79  dir9   f10    f20  f31  f42  f53  f64  f75  f86  f97
dir14   dir25  dir36  dir47  dir58  dir69  dir8   dir90  f100   f21  f32  f43  f54  f65  f76  f87  f98
dir15   dir26  dir37  dir48  dir59  dir7   dir80  dir91  f11    f22  f33  f44  f55  f66  f77  f88  f99
dir16   dir27  dir38  dir49  dir6   dir70  dir81  dir92  f12    f23  f34  f45  f56  f67  f78  f89
dir17   dir28  dir39  dir5   dir60  dir71  dir82  dir93  f13    f24  f35  f46  f57  f68  f79  f9
dir18   dir29  dir4   dir50  dir61  dir72  dir83  dir94  f14    f25  f36  f47  f58  f69  f8   f90
dir19   dir3   dir40  dir51  dir62  dir73  dir84  dir95  f15    f26  f37  f48  f59  f7   f80  f91
[root@ccr master]# gluster v geo master acdc::slave status
 
MASTER NODE                 MASTER VOL    MASTER BRICK      SLAVE               STATUS             CHECKPOINT STATUS    CRAWL STATUS        
-------------------------------------------------------------------------------------------------------------------------------------
abc          master        /bricks/brick0    nirvana::slave      Initializing...    N/A                  N/A                 
dfg   master        /bricks/brick1    acdc::slave         Initializing...    N/A                  N/A                 
hij      master        /bricks/brick3    rammstein::slave    Initializing...    N/A                  N/A                 
klm    master        /bricks/brick2    led::slave          Initializing...    N/A                  N/A                 

# ls -l /mnt/master | wc -l
201

# ls -l /mnt/slave | wc -l
201

arequal-checksum of master is : 
 
Entry counts
Regular files   : 200
Directories     : 111
Symbolic links  : 0
Other           : 0
Total           : 311

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 00
Directories     : 30313130312e00
Symbolic links  : 0
Other           : 0
Total           : 30313130312e00

arequal-checksum of geo_rep_slave slave: 
 
Entry counts
Regular files   : 200
Directories     : 111
Symbolic links  : 0
Other           : 0
Total           : 311

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 00
Directories     : 30313130312e00
Symbolic links  : 0
Other           : 0
Total           : 30313130312e00

Successfully synced all the files from master to the slave
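
As an additional cross-check beyond the file counts and arequal-checksum output above, a small sketch that diffs the file listings of the two mounts can flag any entry missing on the slave. The mount paths are assumptions:

import os

MASTER_MNT = "/mnt/master"   # assumed master mount point
SLAVE_MNT = "/mnt/slave"     # assumed slave mount point

def listing(root):
    # Collect every directory and file path relative to the mount root.
    paths = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.add(os.path.relpath(os.path.join(dirpath, name), root))
    return paths

missing = listing(MASTER_MNT) - listing(SLAVE_MNT)
if missing:
    print("Missing on slave: %d entries" % len(missing))
    for p in sorted(missing):
        print("  " + p)
else:
    print("Slave has every entry present on the master.")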

Comment 8 Shalaka 2015-01-09 09:30:14 UTC
Please review and sign off on the edited doc text.

Comment 9 Kotresh HR 2015-01-12 09:05:41 UTC
Doc text looks fine to me.

Comment 11 errata-xmlrpc 2015-01-15 13:39:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0038.html

