Bug 1512483

Summary: Not all files synced using geo-replication
Product: [Community] GlusterFS
Reporter: Kotresh HR <khiremat>
Component: geo-replication
Assignee: Kotresh HR <khiremat>
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: unspecified
Version: mainline
CC: amukherj, bugs, dimitri.ars, khiremat, moagrawa
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-4.0.0
Clone Of: 1510342
Last Closed: 2018-03-15 11:20:54 UTC
Type: Bug
Bug Depends On: 1510342    

Description Kotresh HR 2017-11-13 10:42:08 UTC
+++ This bug was initially created as a clone of Bug #1510342 +++

Description of problem:
When using Sonatype Nexus3 as a Docker repository on GlusterFS and geo-replicating that volume, at least the .bytes files, which contain the Docker layer data, do not get synced. The files are created on the geo-replicated site but remain 0 bytes. Other files, like the .properties files, are synced properly.
The moment you manually append a character to a .bytes file (echo >>), its data does get synced. It seems Gluster does not detect data being written to the file in some cases, at least the way Nexus3 writes those .bytes files. We suspect more applications / files are affected, resulting in corrupt / incomplete data on the geo-replicated site.

Version-Release number of selected component (if applicable):
3.12.1-2

How reproducible:
100%

Steps to Reproduce:
1. Run Sonatype Nexus3 with a hosted docker repository and its data (/nexus-data) on a glusterfs volume which is geo-replicated.
2. docker push an arbitrary image into this nexus docker repo
3. ls -laRf blobs/default/content | grep .bytes
   on both the main site and the geo-replicated site; the files on the main site are non-zero bytes, while those on the geo-replicated site are 0 bytes

Actual results:
main:
-rw-r--r--. 2 200 200  529 Nov  3 15:27 e39d9b3a-53e0-44bc-b4a2-31d145aeec81.bytes
-rw-r--r--. 1 200 200 1991435 Nov  3 19:00 613953bb-b542-4db7-ba18-01a331361994.bytes


geo-replicated:
-rw-r--r--. 0 root root    0 Nov  3 19:01 613953bb-b542-4db7-ba18-01a331361994.bytes
-rw-r--r--. 0 root root    0 Nov  3 15:28 e39d9b3a-53e0-44bc-b4a2-31d145aeec81.bytes

Expected results:
main:
-rw-r--r--. 2 200 200  529 Nov  3 15:27 e39d9b3a-53e0-44bc-b4a2-31d145aeec81.bytes
-rw-r--r--. 1 200 200 1991435 Nov  3 19:00 613953bb-b542-4db7-ba18-01a331361994.bytes


geo-replicated:
-rw-r--r--. 2 200 200  529 Nov  3 15:27 e39d9b3a-53e0-44bc-b4a2-31d145aeec81.bytes
-rw-r--r--. 1 200 200 1991435 Nov  3 19:00 613953bb-b542-4db7-ba18-01a331361994.bytes

Additional info:
Tried both rsync and use-tarssh; same issue.
Date/time is the same on the main and geo-replicated site servers.
An initial sync does sync the .bytes files correctly.
Maybe related to https://bugzilla.redhat.com/show_bug.cgi?id=1437244

--- Additional comment from Mohit Agrawal on 2017-11-08 23:06:21 EST ---

Hi,

 Can you please share the brick logs from the master and slave nodes?

Regards
Mohit Agrawal

--- Additional comment from Dimitri Ars on 2017-11-09 02:52:40 EST ---

Logs attached; they don't contain the files from the first comment, but others which have the same problem, for example:

[root@X94pabgluster0 chap-24]# ls -al
total 9
drwxr-sr-x. 2  200 200 4096 Nov  8 20:04 .
drwxr-sr-x. 3  200 200 4096 Nov  8 20:04 ..
-rw-r--r--. 0 root 200    0 Nov  8 20:04 d327a69b-f9fb-455e-80b6-5bec8df0b6b9.bytes
-rw-r--r--. 1  200 200  356 Nov  8 20:04 d327a69b-f9fb-455e-80b6-5bec8df0b6b9.properties
[root@X94pabgluster0 chap-24]# getfattr -n glusterfs.gfid2path /mnt/blobs/default/content/vol-05/chap-24/d327a69b-f9fb-455e-80b6-5bec8df0b6b9.properties
getfattr: Removing leading '/' from absolute path names
# file: mnt/blobs/default/content/vol-05/chap-24/d327a69b-f9fb-455e-80b6-5bec8df0b6b9.properties
glusterfs.gfid2path="/blobs/default/content/vol-05/chap-24/d327a69b-f9fb-455e-80b6-5bec8df0b6b9.properties"

[root@X94pabgluster0 chap-24]# getfattr -n glusterfs.gfid2path /mnt/blobs/default/content/vol-05/chap-24/d327a69b-f9fb-455e-80b6-5bec8df0b6b9.bytes
getfattr: Removing leading '/' from absolute path names
# file: mnt/blobs/default/content/vol-05/chap-24/d327a69b-f9fb-455e-80b6-5bec8df0b6b9.bytes
glusterfs.gfid2path="/blobs/default/content/vol-29/chap-39/9ba4d149-3d54-4988-912a-cf3e81a4854d.bytes"


So there is a 0-byte file with 0 links, owned by root (but the group is already the "destination owner").

--- Additional comment from Kotresh HR on 2017-11-09 05:31:34 EST ---

Hi,

We are interested in the I/O pattern that gets recorded in the changelog.
Could you please share the changelogs from the path below? It will be a tar file.

/var/lib/glusterfsd/misc/<master-vol-name>/<ssh....>/<md5sum of brickpath>/.processed/archive<date>.tar

--- Additional comment from Dimitri Ars on 2017-11-09 07:47 EST ---

Changelog for Nexus3 adding docker images and anything else it does. We didn't use Nexus for other transactions, so hopefully it isn't too cluttered. As stated, the .properties files created are fine; the .bytes files end up as 0 bytes on site B.
Don't know about the strange gfid2path values on site A and B for the .bytes files.

--- Additional comment from Dimitri Ars on 2017-11-09 16:05:38 EST ---

Did some more testing. Reproducing the problem is easily done with, for example, the following shell commands:

cp /etc/group f1 ; \
mv f1 f2 ; \
ln f2 f3 ; \
mv f3 f4 ; \
unlink f2

File f4 should contain the /etc/group contents, and it does on site A, but on site B it is 0 bytes:

A:
-rw-r--r--.  1 nexus nexus  395 Nov  9 21:00 f4

B:
-rw-r--r--.  0 root 200    0 Nov  9 21:00 f4

--- Additional comment from Dimitri Ars on 2017-11-09 16:32:32 EST ---

Reduced it even further:
echo testing > f1 ; \
ln f1 f2 ; \
mv f2 f3 ; \
unlink f1

Then f3 is the problem file.
If you take out the unlink, things are a bit better but still not correct: we then have 2 linked files (f1 and f3) on site A, and 2 separate files (f1 and f3) on site B. Both have the correct content, but site B was expected to have 2 linked files as well.
A:
-rw-r--r--.  2 nexus nexus    8 Nov  9 21:16 f1
-rw-r--r--.  2 nexus nexus    8 Nov  9 21:16 f3
B:
-rw-r--r--.  1 200 200    8 Nov  9 21:16 f1
-rw-r--r--.  1 200 200    8 Nov  9 21:16 f3

If I also leave out the rename of the hardlink, things go fine. If I rename f2 to f3 after waiting 15 seconds or so (changelog rollover), the rename goes fine as well, and everything is correct on both A and B.
It looks kind of like https://bugzilla.redhat.com/show_bug.cgi?id=1448914 which had this issue for extended attributes. Also looks like https://bugzilla.redhat.com/show_bug.cgi?id=1296175 was somewhat related.
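
The minimal failing sequence can be replayed as a small self-contained script (illustrative only; run it in a scratch directory, since the breakage itself only shows up on the geo-replicated slave):

```shell
# Replay the minimal failing I/O pattern in a scratch directory.
# Locally (and on site A) f3 keeps the data; on the affected
# geo-replicated slave it ends up as a 0-byte file.
dir=$(mktemp -d)
cd "$dir"
echo testing > f1   # create a file with data
ln f1 f2            # hardlink it
mv f2 f3            # rename the hardlink within the same changelog window
unlink f1           # remove the original name
ls -l f3            # locally: link count 1, non-zero size
```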

--- Additional comment from Dimitri Ars on 2017-11-10 06:17:46 EST ---

Although it's easy to reproduce and observe, this is the related error logging on site B's geo-replication slave log (4a734e2c-202a-4b11-8676-9a3219dc2101:192.168.5.7.%2Fvar%2Flib%2Fheketi%2Fmounts%2Fvg_1f8ca94513acde49ebe3167b58004159%2Fbrick_fe0308e73bb49c5a30f504ef853a731d%2Fbrick.vol_20e37cc674b396d041691341d69b81a6.gluster.log):

[2017-11-10 10:06:06.718657] W [MSGID: 114031] [client-rpc-fops.c:493:client3_3_stat_cbk] 0-vol_eeaf4e18532e9769aed04199eda0d1bd-client-2: remote operation failed [No such file or directory]
[2017-11-10 10:06:06.719246] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-vol_eeaf4e18532e9769aed04199eda0d1bd-client-2: remote operation failed. Path: /.gfid/5cbc6952-f50c-4040-af3b-cd9dfb2f8596 (5cbc6952-f50c-4040-af3b-cd9dfb2f8596) [No such file or directory]
[2017-11-10 10:06:06.719300] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-vol_eeaf4e18532e9769aed04199eda0d1bd-client-1: remote operation failed. Path: /.gfid/5cbc6952-f50c-4040-af3b-cd9dfb2f8596 (5cbc6952-f50c-4040-af3b-cd9dfb2f8596) [No such file or directory]
[2017-11-10 10:06:06.719322] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-vol_eeaf4e18532e9769aed04199eda0d1bd-client-0: remote operation failed. Path: /.gfid/5cbc6952-f50c-4040-af3b-cd9dfb2f8596 (5cbc6952-f50c-4040-af3b-cd9dfb2f8596) [No such file or directory]
[2017-11-10 10:06:06.721119] E [MSGID: 109040] [dht-helper.c:1378:dht_migration_complete_check_task] 0-vol_eeaf4e18532e9769aed04199eda0d1bd-dht: /.gfid/5cbc6952-f50c-4040-af3b-cd9dfb2f8596: failed to lookup the file on vol_eeaf4e18532e9769aed04199eda0d1bd-dht [No such file or directory]
[2017-11-10 10:06:06.721213] W [fuse-bridge.c:874:fuse_attr_cbk] 0-glusterfs-fuse: 2510782: STAT() /.gfid/5cbc6952-f50c-4040-af3b-cd9dfb2f8596 => -1 (No such file or directory)

Comment 1 Worker Ant 2017-11-13 10:46:44 UTC
REVIEW: https://review.gluster.org/18731 (geo-rep: Fix data sync issue during hardlink, rename) posted (#1) for review on master by Kotresh HR

Comment 2 Worker Ant 2017-11-14 11:14:24 UTC
COMMIT: https://review.gluster.org/18731 committed in master by "Kotresh HR" <khiremat> with the commit message: geo-rep: Fix data sync issue during hardlink, rename

Problem:
The data does not get synced if the master witnessed
I/O as below:

1. echo "test_data" > f1
2. ln f1 f2
3. mv f2 f3
4. unlink f1

On master, 'f3' exists with data "test_data" but on
slave, only f3 exists with zero byte file without
backend gfid link.

Cause:
On the master, since 'f2' no longer exists, the hardlink
is skipped during processing. Later, while syncing the
rename, since the source ('f2') doesn't exist on the
slave, the destination ('f3') is created with the same
gfid. The create succeeds, but the backend gfid does not
get linked to 'f3' because 'f1' already exists with that
gfid. rsync then fails with ENOENT, since the backend
gfid is not linked with 'f3' and 'f1' has been unlinked.

Fix:
While processing a rename, if the source doesn't exist
on the slave, don't blindly create the destination with
the same gfid. The gfid needs to be checked first: if it
already exists, a hardlink must be created instead of a
mknod.

Thanks Aravinda for helping in RCA :)

Change-Id: I5af4f99798ed1bcb297598a4bc796b701d1e0130
Signed-off-by: Kotresh HR <khiremat>
BUG: 1512483
Reporter: dimitri.ars
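
The fix can be modeled as a small standalone sketch (hedged and hypothetical: the real change lives in gsyncd's entry handling, and the `.gfid-0001` file below merely stands in for the slave's backend gfid hardlink):

```shell
# Slave-side state before the RENAME is replayed: f1 was synced with its
# data and the backend gfid link exists, but the hardlink f2 was skipped
# because it no longer existed on the master.
dir=$(mktemp -d)
cd "$dir"
echo test_data > f1
ln f1 .gfid-0001              # stand-in for the backend gfid link

src=f2 dst=f3 gfid=.gfid-0001
if [ -e "$src" ]; then
    mv "$src" "$dst"          # normal case: source still present
elif [ -e "$gfid" ]; then
    ln "$gfid" "$dst"         # the fix: gfid already exists -> hardlink
else
    : > "$dst"                # fallback: nothing with this gfid exists yet
fi

unlink f1                     # replay the final unlink from the master
cat f3                        # f3 keeps "test_data" instead of going 0-byte
```

Without the `elif` branch (the old behaviour), f3 would be created as a fresh zero-byte file with no connection to the already-synced data.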

Comment 3 Shyamsundar 2018-03-15 11:20:54 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-4.0.0, please open a new bug report.

glusterfs-4.0.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-March/000092.html
[2] https://www.gluster.org/pipermail/gluster-users/