Bug 1219412 - Geo-Replication - Fails to handle file renaming correctly between master and slave
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 3.7.0
Hardware: All
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1141379
Blocks: 1002026 1154307
 
Reported: 2015-05-07 09:30 UTC by Saravanakumar
Modified: 2015-05-14 17:35 UTC
CC List: 6 users

Fixed In Version: glusterfs-3.7.0beta2
Doc Type: Bug Fix
Doc Text:
Clone Of: 1141379
Environment:
Last Closed: 2015-05-14 17:27:33 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Saravanakumar 2015-05-07 09:30:22 UTC
+++ This bug was initially created as a clone of Bug #1141379 +++

Description of problem:
Geo-Replication fails to handle the rename correctly when a file or folder is created and renamed in quick succession.

This is especially noticeable on Samba shares mounted from Windows when a new folder is created. Windows immediately creates the folder as "New Folder" and then offers to rename it. If a new name is typed, both "New Folder" and the renamed folder end up on the remote volume.

Version-Release number of selected component (if applicable):
3.5.2

How reproducible:
Consistent

Steps to Reproduce:
1. Standard 3.5 distributed geo-rep setup 
2. Run the following as a script:

mkdir test
mv test test2
chown 100:100 test2

3. Let Geo Rep sync the directories
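
For convenience, steps 2 and 3 can be wrapped in one small script run from a client mount of the master volume. This is only a sketch; the mount point /mnt/master is an assumption, not a path from this setup:

#!/bin/bash
# Reproducer sketch - assumes the master volume is mounted at /mnt/master (placeholder path)
cd /mnt/master || exit 1
mkdir test            # create a directory...
mv test test2         # ...rename it immediately
chown 100:100 test2   # ...and change ownership of the renamed directory
# After geo-rep has synced, the slave should contain only test2;
# with this bug, both test and test2 appear on the slave volume.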

Actual results:
Node 1 is the master
Node 2 is the slave

[root@node1 gluster]# ls -la
total 12
drwxr-xr-x. 4 root root  4096 Sep 12 22:13 .
drwxr-xr-x. 5 root root  4096 Aug 23 17:57 ..
drwxr-xr-x. 2  100 users 4096 Sep 12 21:54 test2

[root@node2 gluster]# ls -la
total 28
drwxr-xr-x. 5 root root  8192 Aug 23 18:03 .
drwxr-xr-x. 6 root root  4096 Aug 23 17:41 ..
drwxr-xr-x. 2  100 users 8192 Aug 23 17:45 test
drwxr-xr-x. 2  100 users 8192 Aug 23 17:45 test2


Expected results:

[root@node2 gluster]# ls -la
total 28
drwxr-xr-x. 5 root root  8192 Aug 23 18:03 .
drwxr-xr-x. 6 root root  4096 Aug 23 17:41 ..
drwxr-xr-x. 2  100 users 8192 Aug 23 17:45 test2

Additional info:
None

--- Additional comment from James Payne on 2014-09-12 17:27:19 EDT ---

Testing this on files gives the same result: two files are created on the remote volume instead of one.

--- Additional comment from James Payne on 2014-09-12 17:28:45 EDT ---

[root@node1 gluster]# gluster volume geo-rep archive0 192.168.0.16::archive0 status detail

MASTER NODE    MASTER VOL    MASTER BRICK    SLAVE                     STATUS    CHECKPOINT STATUS    CRAWL STATUS       FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
node1          archive0      /mnt/brick0     192.168.0.16::archive0    Active    N/A                  Changelog Crawl    7              0                0                0                  0
node1          archive0      /mnt/brick1     192.168.0.16::archive0    Active    N/A                  Changelog Crawl    7              0                0                0                  0

--- Additional comment from James Payne on 2014-09-15 17:07:59 EDT ---

I have installed a 3.4.5 cluster today and verified that the behavior does not exist in that release, if that helps.

--- Additional comment from James Payne on 2014-10-05 17:58:57 EDT ---

Note that this bug is still present in 3.5.3 beta 1.

More worryingly, I have also noticed the following:

Running the script below:

touch filejp1 
mv filejp1 jp1
rm -f jp1

touch filejp2
mv filejp2 jp2
rm -f jp2


Causes empty files filejp1 and filejp2 to be synced to the remote node. 

Once the above has synced (with the two incorrect files present), run:

echo content > filejp1

More concerning is that the content which should now be in filejp1 is never synced at all; I'm guessing this is because filejp1 is now 'new' to the local side. The logs suggest GFID issues?

Logs as follows:
[2014-10-05 22:48:21.731467] W [master(/mnt/brick0):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:21.731796] W [master(/mnt/brick0):877:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1412545699
[2014-10-05 22:48:21.732918] W [master(/mnt/brick1):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:21.733189] W [master(/mnt/brick1):877:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1412545699
[2014-10-05 22:48:22.585704] W [master(/mnt/brick0):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:22.586113] W [master(/mnt/brick0):877:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1412545699
[2014-10-05 22:48:22.606767] W [master(/mnt/brick1):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:22.607155] W [master(/mnt/brick1):877:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1412545699
[2014-10-05 22:48:23.355896] W [master(/mnt/brick1):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:23.356179] W [master(/mnt/brick1):877:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1412545699
[2014-10-05 22:48:23.356673] W [master(/mnt/brick0):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:23.356923] W [master(/mnt/brick0):877:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1412545699
[2014-10-05 22:48:24.94445] W [master(/mnt/brick0):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:24.94724] W [master(/mnt/brick0):877:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1412545699
[2014-10-05 22:48:24.113476] W [master(/mnt/brick1):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:24.113760] W [master(/mnt/brick1):877:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1412545699
[2014-10-05 22:48:24.972117] W [master(/mnt/brick0):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:24.972403] W [master(/mnt/brick0):877:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1412545699
[2014-10-05 22:48:24.973083] W [master(/mnt/brick1):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:24.973351] W [master(/mnt/brick1):877:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1412545699
[2014-10-05 22:48:25.692605] W [master(/mnt/brick0):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:25.692920] W [master(/mnt/brick0):877:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1412545699
[2014-10-05 22:48:25.711666] W [master(/mnt/brick1):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:25.711941] W [master(/mnt/brick1):877:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1412545699
[2014-10-05 22:48:26.587243] W [master(/mnt/brick0):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:26.587522] W [master(/mnt/brick0):877:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1412545699
[2014-10-05 22:48:26.588766] W [master(/mnt/brick1):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:26.589200] W [master(/mnt/brick1):877:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1412545699
[2014-10-05 22:48:27.196441] W [master(/mnt/brick0):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:27.196727] W [master(/mnt/brick0):877:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1412545699
[2014-10-05 22:48:27.216696] W [master(/mnt/brick1):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:27.216976] W [master(/mnt/brick1):877:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1412545699
[2014-10-05 22:48:28.204376] W [master(/mnt/brick0):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:28.204658] W [master(/mnt/brick0):877:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1412545699
[2014-10-05 22:48:28.205504] W [master(/mnt/brick1):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:28.205862] W [master(/mnt/brick1):877:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1412545699
[2014-10-05 22:48:28.811841] W [master(/mnt/brick1):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:28.812125] W [master(/mnt/brick1):860:process] _GMaster: changelogs CHANGELOG.1412545699 could not be processed - moving on...
[2014-10-05 22:48:28.813575] W [master(/mnt/brick1):862:process] _GMaster: SKIPPED GFID =
[2014-10-05 22:48:29.299180] W [master(/mnt/brick0):250:regjob] <top>: Rsync: .gfid/96b35cb5-96cc-43cc-8cd3-11dace030d68 [errcode: 23]
[2014-10-05 22:48:29.299433] W [master(/mnt/brick0):860:process] _GMaster: changelogs CHANGELOG.1412545699 could not be processed - moving on...
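
A rough way to confirm the divergence described above, once geo-rep has settled, is to compare the two sides directly. The mount points /mnt/master and /mnt/slave below are placeholders for the master and slave volume mounts, not paths from this setup:

# Compare the file lists on the master and slave volume mounts
diff <(cd /mnt/master && find . -type f | sort) <(cd /mnt/slave && find . -type f | sort)

# Compare the content of the file whose later write never syncs
md5sum /mnt/master/filejp1 /mnt/slave/filejp1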

--- Additional comment from Aravinda VK on 2014-10-15 01:52:24 EDT ---

Please try the 3.6 beta 3 version. Two patches landed related to the rename and retry issues.

http://review.gluster.org/#/c/8879/
http://review.gluster.org/#/c/8880/

--- Additional comment from Kingsley Tart on 2014-10-15 06:10:18 EDT ---

I have tested geo-replication on 3.6.0 beta3 on CentOS 6.5 and have managed to run into what appears to be the same issue.

I posted this to the gluster-users mailing list; my original email is below. It's worth adding that since geo-replication broke, querying the volume status (in this instance, on my test1 server) gives this:

test1# gluster volume status
Another transaction is in progress. Please try again after sometime.

It's still giving this error, 24 hours later.

Original message posted to list:

I have a small script to simulate file activity for an application we
have. It breaks geo-replication within about 15 - 20 seconds when I try
it.

This is on a small Gluster test environment running in VMs on CentOS 6.5
with Gluster 3.6.0 beta3. I have 6 VMs - test1, test2, test3, test4, test5
and test6. test1, test2, test3 and test4 are Gluster servers, while test5
and test6 are the clients. test3 is not actually used in this test.


Before the test, I had a single gluster volume as follows:

test1# gluster volume status
Status of volume: gv0
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick test1:/data/brick/gv0                             49168   Y       12017
Brick test2:/data/brick/gv0                             49168   Y       11835
NFS Server on localhost                                 2049    Y       12032
Self-heal Daemon on localhost                           N/A     Y       12039
NFS Server on test4                                     2049    Y       7934
Self-heal Daemon on test4                               N/A     Y       7939
NFS Server on test3                                     2049    Y       11768
Self-heal Daemon on test3                               N/A     Y       11775
NFS Server on test2                                     2049    Y       11849
Self-heal Daemon on test2                               N/A     Y       11855

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks


I created a new volume and set up geo-replication as follows (as these
are test machines I only have one file system on each, hence using
"force" to create the bricks in the root FS):

test4# date ; gluster volume create gv0-slave test4:/data/brick/gv0-slave force; date
Mon Oct 13 15:03:14 BST 2014
volume create: gv0-slave: success: please start the volume to access data
Mon Oct 13 15:03:15 BST 2014

test4# date ; gluster volume start gv0-slave; date
Mon Oct 13 15:03:36 BST 2014
volume start: gv0-slave: success
Mon Oct 13 15:03:39 BST 2014

test4# date ; gluster volume geo-replication gv0 test4::gv0-slave create push-pem force ; date
Mon Oct 13 15:05:59 BST 2014
Creating geo-replication session between gv0 & test4::gv0-slave has been successful
Mon Oct 13 15:06:11 BST 2014


I then mount volume gv0 on one of the client machines. I can create
files within the gv0 volume and can see the changes being replicated to
the gv0-slave volume, so I know that geo-replication is working at the
start.
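
As a sanity check at this point, the geo-replication session status can also be queried from a master node (volume and slave names as in the setup above), for example:

test1# gluster volume geo-replication gv0 test4::gv0-slave status detail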

When I run my script (which quickly creates, deletes and renames files),
geo-replication breaks within a very short time. The test script output
is in
http://gluster.dogwind.com/files/georep20141013/test6_script-output.log
(I interrupted the script once I saw that geo-replication was broken).
Note that when it deletes a file, it renames any later-numbered file so
that the file numbering remains sequential with no gaps; this simulates
a real world application that we use.

If you want a copy of the test script, it's here:
http://gluster.dogwind.com/files/georep20141013/test_script.tar.gz


The various gluster log files can be downloaded from here:
http://gluster.dogwind.com/files/georep20141013/ - each log file has the
actual log file path at the top of the file.

If you want to run the test script on your own system, edit test.pl so
that @mailstores contains a directory path to a gluster volume.

My systems' timezone is BST (GMT+1 / UTC+1) so any timestamps outside of
gluster logs are in this timezone.

Let me know if you need any more info.

--- Additional comment from James Payne on 2014-10-15 17:17:57 EDT ---

(In reply to Aravinda VK from comment #5)
> Please try the 3.6 beta 3 version. Two patches landed related to the rename
> and retry issues.
> 
> http://review.gluster.org/#/c/8879/
> http://review.gluster.org/#/c/8880/

I can confirm the issue still exists in beta3.

It is very easy to replicate:

touch file && mv file file2

This will create two files on the geo-replica and only one on the primary cluster.
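
For reference, the mismatch can be seen by simply listing both sides once geo-rep has caught up (the mount points below are placeholders for the master and slave volume mounts):

ls /mnt/master    # only file2 is present on the primary cluster
ls /mnt/slave     # both file and file2 are present on the geo-replica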

--- Additional comment from James Payne on 2014-10-15 17:37:15 EDT ---

Not sure if it helps:

Debug output for the following command:

touch file && mv file JAMESFILE

[2014-10-15 22:31:09.581840] D [repce(/mnt/brick1):209:__call__] RepceClient: call 22162:140250622097152:1413408669.58 getchanges -> ['/var/lib/misc/glusterfsd/archive0/ssh%3A%2F%2Froot%40192.168.0.16%3Agluster%3A%2F%2F127.0.0.1%3Aarchive0/63c9ed499e50ba44b851fa8a63110d6e/.processing/CHANGELOG.1413408667']
[2014-10-15 22:31:09.581922] I [master(/mnt/brick1):1182:crawl] _GMaster: slave's time: (1413408621, 0)
[2014-10-15 22:31:09.581996] D [master(/mnt/brick1):1193:crawl] _GMaster: processing changes ['/var/lib/misc/glusterfsd/archive0/ssh%3A%2F%2Froot%40192.168.0.16%3Agluster%3A%2F%2F127.0.0.1%3Aarchive0/63c9ed499e50ba44b851fa8a63110d6e/.processing/CHANGELOG.1413408667']
[2014-10-15 22:31:09.582066] D [master(/mnt/brick1):945:process] _GMaster: processing change /var/lib/misc/glusterfsd/archive0/ssh%3A%2F%2Froot%40192.168.0.16%3Agluster%3A%2F%2F127.0.0.1%3Aarchive0/63c9ed499e50ba44b851fa8a63110d6e/.processing/CHANGELOG.1413408667
[2014-10-15 22:31:09.582959] D [master(/mnt/brick1):905:process_change] _GMaster: entries: [{'uid': 0, 'gfid': 'ecdf997c-0092-452f-914e-1a1ccad3ebc0', 'gid': 0, 'mode': 33188, 'entry': '.gfid/00000000-0000-0000-0000-000000000001/file', 'op': 'CREATE'}, {'gfid': 'ecdf997c-0092-452f-914e-1a1ccad3ebc0', 'entry': '.gfid/00000000-0000-0000-0000-000000000001/file', 'stat': {'gid': 0, 'uid': 0, 'mode': 33188}, 'entry1': '.gfid/00000000-0000-0000-0000-000000000001/JAMESFILE', 'op': 'RENAME'}]
[2014-10-15 22:31:09.584343] D [repce(/mnt/brick1):191:push] RepceClient: call 22162:140250622097152:1413408669.58 entry_ops([{'uid': 0, 'gfid': 'ecdf997c-0092-452f-914e-1a1ccad3ebc0', 'gid': 0, 'mode': 33188, 'entry': '.gfid/00000000-0000-0000-0000-000000000001/file', 'op': 'CREATE'}, {'gfid': 'ecdf997c-0092-452f-914e-1a1ccad3ebc0', 'entry': '.gfid/00000000-0000-0000-0000-000000000001/file', 'stat': {'gid': 0, 'uid': 0, 'mode': 33188}, 'entry1': '.gfid/00000000-0000-0000-0000-000000000001/JAMESFILE', 'op': 'RENAME'}],) ...
[2014-10-15 22:31:09.586972] D [repce(/mnt/brick1):209:__call__] RepceClient: call 22162:140250622097152:1413408669.58 entry_ops -> None
[2014-10-15 22:31:09.587193] D [repce(/mnt/brick1):191:push] RepceClient: call 22162:140250622097152:1413408669.59 done('/var/lib/misc/glusterfsd/archive0/ssh%3A%2F%2Froot%40192.168.0.16%3Agluster%3A%2F%2F127.0.0.1%3Aarchive0/63c9ed499e50ba44b851fa8a63110d6e/.processing/CHANGELOG.1413408667',) ...

--- Additional comment from Aravinda VK on 2014-10-16 00:43:42 EDT ---

(In reply to James Payne from comment #7)
> (In reply to Aravinda VK from comment #5)
> > Please try the 3.6 beta 3 version. Two patches landed related to the rename
> > and retry issues.
> > 
> > http://review.gluster.org/#/c/8879/
> > http://review.gluster.org/#/c/8880/
> 
> Can confirm the issue still exists in beta3. 
> 
> v easy to replicate 
> 
> touch file && mv file file2
> 
> Will create 2 files on the geo-replica and only one on the primary cluster

With the above patches, the GFID mismatch and rsync retry issues are resolved. Do you see any data loss on the slave? Are both file and file2 present on the slave, and is the data intact in file2?

Geo-rep has a known issue with rename handling when the renamed file falls on a different brick from the one on which it was created.

The RENAME issues are summarized here: http://supercolony.gluster.org/pipermail/gluster-devel/2014-September/042316.html

Development is in progress to fix all these RENAME issues. Will update the bug with progress.

--- Additional comment from James Payne on 2014-10-16 16:22:34 EDT ---

(In reply to Aravinda VK from comment #9)
> (In reply to James Payne from comment #7)
> > (In reply to Aravinda VK from comment #5)
> > > Please try the 3.6 beta 3 version. Two patches landed related to the rename
> > > and retry issues.
> > > 
> > > http://review.gluster.org/#/c/8879/
> > > http://review.gluster.org/#/c/8880/
> > 
> > Can confirm the issue still exists in beta3. 
> > 
> > v easy to replicate 
> > 
> > touch file && mv file file2
> > 
> > Will create 2 files on the geo-replica and only one on the primary cluster
> 
> With the above patches, GFID mismatch and rsync retry issues are resolved,
> Do you see any data loss in Slave? both file and file2 present in slave, Is
> data intact in file2?
> 
> Geo-rep has a known issue with Rename handling when renamed file fall into
> separate brick other than the brick in which it is created.
> 
> Summarized RENAME issues here,
> http://supercolony.gluster.org/pipermail/gluster-devel/2014-September/042316.
> html
> 
> Development is in progress to fix all these RENAME issues. Will update the
> bug with progress.



You can get data loss on slave in the following scenario:

touch file1 && mv file1 file2

** Wait for replication

echo contents > file1


You will find on the geo-replica that file1 is no longer consistent with the local copy. The same goes for changes of attributes as well. I believe the contents of file2 in the case above will be correct, but file1 will never be consistent with the local copy.
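
A simple check once replication has caught up is to compare file1 on both sides (the mount points below are placeholder assumptions):

cat /mnt/master/file1                                   # shows "contents" on the master side
cat /mnt/slave/file1                                    # remains empty - the later write is never synced
stat -c '%u:%g %a' /mnt/master/file1 /mnt/slave/file1   # attribute changes diverge the same way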

Comment 1 Anand Avati 2015-05-07 09:46:11 UTC
REVIEW: http://review.gluster.org/10628 (geo-rep: rename handling in dht volume) posted (#1) for review on release-3.7 by Saravanakumar Arumugam (sarumuga)

Comment 2 Anand Avati 2015-05-07 11:39:26 UTC
REVIEW: http://review.gluster.org/10633 (geo-rep: rename handling in dht volume) posted (#1) for review on release-3.7 by Saravanakumar Arumugam (sarumuga)

Comment 3 Anand Avati 2015-05-07 12:31:00 UTC
REVIEW: http://review.gluster.org/10633 (geo-rep: rename handling in dht volume) posted (#2) for review on release-3.7 by Kotresh HR (khiremat)

Comment 4 Anand Avati 2015-05-08 06:04:33 UTC
REVIEW: http://review.gluster.org/10633 (geo-rep: rename handling in dht volume(changelog changes)) posted (#3) for review on release-3.7 by Saravanakumar Arumugam (sarumuga)

Comment 5 Anand Avati 2015-05-08 13:46:55 UTC
REVIEW: http://review.gluster.org/10633 (geo-rep: rename handling in dht volume(changelog changes)) posted (#4) for review on release-3.7 by Saravanakumar Arumugam (sarumuga)

Comment 6 Anand Avati 2015-05-09 04:53:46 UTC
REVIEW: http://review.gluster.org/10633 (geo-rep: rename handling in dht volume(changelog changes)) posted (#5) for review on release-3.7 by Kotresh HR (khiremat)

Comment 7 Anand Avati 2015-05-09 09:08:12 UTC
COMMIT: http://review.gluster.org/10633 committed in release-3.7 by Vijay Bellur (vbellur) 
------
commit c466b137b0cabb844ce7a1f92549ff9b72369830
Author: Saravanakumar Arumugam <sarumuga>
Date:   Tue May 5 17:03:39 2015 +0530

    geo-rep: rename handling in dht volume(changelog changes)
    
    Background:
    
    GlusterFS changelogs are stored per brick, and each records the changes
    that happened on that brick. Geo-rep runs on all the master nodes and processes
    changelogs "independently". Changelogs are processed at the brick level, but
    all the fops are replayed on the "slave mount" point.
    
    Problem:
    
    With a DHT volume, "internal fops" are NOT recorded in the changelog.
    In the rename case, the RENAME is recorded in the "hashed" brick's changelog
    (DHT's internal fops, such as creating the linkto file and the unlink, are NOT recorded).
    This leads to inconsistent rename operations.
    
    For example,
    Distribute volume created with Two bricks B1, B2.
    
    //Consider master volume mounted @ /mnt/master and following operations
    executed:
    cd /mnt/master
    touch f1 // f1 falls on B1 Hash
    mv f1 f2 // f2 falls on B2 Hash
    
    // Here, Changelogs are recorded as below:
    @B1
    CREATE f1
    
    @B2
    RENAME f1 f2
    
    Here, a race exists between bricks B1 and B2; say B2 gets executed first.
    The source file f1 itself is "NOT PRESENT", so it will go ahead and create
    f2 (current implementation).
    
    We have this problem when the rename falls on another brick and the
    file is unlinked on the master.
    
    A similar kind of issue exists in the following case too (multiple renames):
    CREATE f1
    RENAME f1 f2
    RENAME f2 f1
    
    Solution:
    
    Instead of carrying out "changelogging" at the "HASHED volume",
    carry it out at the "CACHED volume".
    This way, rename operations are carried out where the actual files are present.
    
    So, the changelog is recorded as:
    @B1
    CREATE f1
    RENAME f1 f2
    
    Note:
    This patch is dependent on the DHT changes from this patch:
    http://review.gluster.org/10410/
    The changelog-related changes are separated out for review.
    
    In the changelog, the xdata passed from DHT is handled as follows:
    
    1. In case of an unlink (internal operation as part of a rename), the xdata
    value is set; it is treated as a RENAME and recorded accordingly.
    2. In case of a rename (hashed and cached bricks differ), the xdata value is
    NOT set, and recording of the rename operation is SKIPPED.
    
    BUG: 1219412
    Change-Id: I7691166c84991482b2cfe073df64e2317c935b13
    Reviewed-On: http://review.gluster.org/#/c/10220/
    Signed-off-by: Saravanakumar Arumugam <sarumuga>
    Reviewed-on: http://review.gluster.org/10633
    Reviewed-by: Aravinda VK <avishwan>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 8 Aravinda VK 2015-05-10 03:12:03 UTC
DHT Patch http://review.gluster.org/#/c/10628/ and Changelog patch http://review.gluster.org/#/c/10633/ are merged.

Comment 9 Niels de Vos 2015-05-14 17:27:33 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

Comment 10 Niels de Vos 2015-05-14 17:28:57 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

Comment 11 Niels de Vos 2015-05-14 17:35:21 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

