Bug 1510342 - Not all files synced using geo-replication
Summary: Not all files synced using geo-replication
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 3.12
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kotresh HR
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1510994 1512483 1512496
Reported: 2017-11-07 09:23 UTC by Dimitri Ars
Modified: 2018-03-05 07:14 UTC

Fixed In Version: glusterfs-3.12.6
Doc Text:
Clone Of:
Clones: 1510994 1512483 1512496
Environment:
Last Closed: 2018-03-05 07:14:08 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
master logs (662.32 KB, application/x-gzip)
2017-11-09 07:29 UTC, Dimitri Ars
slave logs (4.63 KB, application/x-gzip)
2017-11-09 07:30 UTC, Dimitri Ars
changelog (148.28 KB, application/x-gzip)
2017-11-09 12:47 UTC, Dimitri Ars

Description Dimitri Ars 2017-11-07 09:23:59 UTC
Description of problem:
When using Sonatype Nexus3 as a docker repository on glusterfs and geo-replicating the volume, at least the .bytes files, which contain the docker layer data, do not get synced. The files are created on the geo-replicated site but remain 0 bytes. Other files, such as the .properties files, are synced properly.
As soon as you manually append a character to a .bytes file (echo >>), its data does get synced. It seems gluster doesn't detect data being written to the file in some cases, at least not the way Nexus3 writes those .bytes files. We suspect that more applications and files are affected by this, resulting in corrupt or incomplete data on the geo-replicated site.

Version-Release number of selected component (if applicable):
3.12.1-2

How reproducible:
100%

Steps to Reproduce:
1. Run Sonatype Nexus3 with a hosted docker repository and its data (/nexus-data) on a glusterfs volume which is geo-replicated.
2. docker push an arbitrary image into this nexus docker repo
3. ls -laRf blobs/default/content | grep .bytes
   on both the main site and the geo-replicated site. The files on the main site are non-zero in size, while those on the geo-replicated site are 0 bytes.
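
The check in step 3 can be narrowed to just the suspicious files. This is a minimal sketch, assuming a find that supports -size 0 and a locally mounted volume; the function name list_empty_bytes and the directory passed to it are illustrative and not part of Nexus or gluster.

```shell
#!/bin/sh
# list_empty_bytes DIR
# Print every zero-length *.bytes file below DIR, one path per line, sorted.
# DIR is wherever the blob store lives on the local mount (an assumption).
list_empty_bytes() {
    find "$1" -type f -name '*.bytes' -size 0 | sort
}

# Example (path is an assumption, adjust per site):
#   list_empty_bytes /nexus-data/blobs/default/content
```

Running it on both sites and comparing the output should show empty .bytes files only on the geo-replicated site if the problem is present.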

Actual results:
main:
-rw-r--r--. 2 200 200  529 Nov  3 15:27 e39d9b3a-53e0-44bc-b4a2-31d145aeec81.bytes
-rw-r--r--. 1 200 200 1991435 Nov  3 19:00 613953bb-b542-4db7-ba18-01a331361994.bytes


geo-replicated:
-rw-r--r--. 0 root root    0 Nov  3 19:01 613953bb-b542-4db7-ba18-01a331361994.bytes
-rw-r--r--. 0 root root    0 Nov  3 15:28 e39d9b3a-53e0-44bc-b4a2-31d145aeec81.bytes

Expected results:
main:
-rw-r--r--. 2 200 200  529 Nov  3 15:27 e39d9b3a-53e0-44bc-b4a2-31d145aeec81.bytes
-rw-r--r--. 1 200 200 1991435 Nov  3 19:00 613953bb-b542-4db7-ba18-01a331361994.bytes


geo-replicated:
-rw-r--r--. 2 200 200  529 Nov  3 15:27 e39d9b3a-53e0-44bc-b4a2-31d145aeec81.bytes
-rw-r--r--. 1 200 200 1991435 Nov  3 19:00 613953bb-b542-4db7-ba18-01a331361994.bytes

Additional info:
Tried both rsync and use-tarssh, same issue.
Date/time is the same on main and geo-replicated site servers
An initial sync does correctly sync the .bytes files.
Maybe related to https://bugzilla.redhat.com/show_bug.cgi?id=1437244

Comment 1 Mohit Agrawal 2017-11-09 04:06:21 UTC
Hi,

 Can you please share the brick logs from master and slave nodes?

Regards
Mohit Agrawal

Comment 2 Dimitri Ars 2017-11-09 07:29:37 UTC
Created attachment 1349776
master logs

Comment 3 Dimitri Ars 2017-11-09 07:30:01 UTC
Created attachment 1349777
slave logs

Comment 4 Dimitri Ars 2017-11-09 07:52:40 UTC
Logs attached. They do not contain the files from the first comment, but others which have the same problem, for example:

[root@X94pabgluster0 chap-24]# ls -al
total 9
drwxr-sr-x. 2  200 200 4096 Nov  8 20:04 .
drwxr-sr-x. 3  200 200 4096 Nov  8 20:04 ..
-rw-r--r--. 0 root 200    0 Nov  8 20:04 d327a69b-f9fb-455e-80b6-5bec8df0b6b9.bytes
-rw-r--r--. 1  200 200  356 Nov  8 20:04 d327a69b-f9fb-455e-80b6-5bec8df0b6b9.properties
[root@X94pabgluster0 chap-24]# getfattr -n glusterfs.gfid2path /mnt/blobs/default/content/vol-05/chap-24/d327a69b-f9fb-455e-80b6-5bec8df0b6b9.properties
getfattr: Removing leading '/' from absolute path names
# file: mnt/blobs/default/content/vol-05/chap-24/d327a69b-f9fb-455e-80b6-5bec8df0b6b9.properties
glusterfs.gfid2path="/blobs/default/content/vol-05/chap-24/d327a69b-f9fb-455e-80b6-5bec8df0b6b9.properties"

[root@X94pabgluster0 chap-24]# getfattr -n glusterfs.gfid2path /mnt/blobs/default/content/vol-05/chap-24/d327a69b-f9fb-455e-80b6-5bec8df0b6b9.bytes
getfattr: Removing leading '/' from absolute path names
# file: mnt/blobs/default/content/vol-05/chap-24/d327a69b-f9fb-455e-80b6-5bec8df0b6b9.bytes
glusterfs.gfid2path="/blobs/default/content/vol-29/chap-39/9ba4d149-3d54-4988-912a-cf3e81a4854d.bytes"


So there's a 0-byte file with 0 links owned by root (but the group is already that of the "destination owner").
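
The comparison done by eye above can be scripted. This is only a sketch: the helper name gfid2path_matches is made up, it parses getfattr output in the form shown in this comment, and it assumes the xattr holds a single path (a file with several hard links may list more than one, which this does not handle).

```shell
# gfid2path_matches MOUNT FILE
# Reads the output of `getfattr -n glusterfs.gfid2path FILE` on stdin and
# succeeds only if the recorded path equals FILE relative to MOUNT.
gfid2path_matches() {
    mount=${1%/}                 # drop a trailing slash from the mount point
    expected=${2#"$mount"}       # FILE with the mount prefix removed
    recorded=$(sed -n 's/^glusterfs\.gfid2path="\(.*\)"$/\1/p')
    [ "$recorded" = "$expected" ]
}

# Example:
#   f=/mnt/blobs/default/content/vol-05/chap-24/d327a69b-f9fb-455e-80b6-5bec8df0b6b9.bytes
#   getfattr -n glusterfs.gfid2path "$f" 2>/dev/null |
#       gfid2path_matches /mnt "$f" || echo "gfid2path mismatch: $f"
```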

Comment 5 Kotresh HR 2017-11-09 10:31:34 UTC
Hi,

We are interested in the I/O pattern that gets recorded in the changelog.
Could you please share the changelogs from the path below? It will be a tar file.

/var/lib/glusterfsd/misc/<master-vol-name>/<ssh....>/<md5sum of brickpath>/.processed/archive<date>.tar

Comment 6 Dimitri Ars 2017-11-09 12:47:16 UTC
Created attachment 1349918
changelog

Changelog for Nexus3 adding docker images and anything else it does. We didn't use Nexus for any other transactions, so hopefully it isn't too cluttered. As stated, the .properties files that get created sync fine, while the .bytes files end up as 0 bytes on site B.
I don't know what to make of the strange gfid2path values on sites A and B for the .bytes files.

Comment 7 Dimitri Ars 2017-11-09 21:05:38 UTC
Did some more testing. The problem is easily reproduced with, for example, the following shell commands:

cp /etc/group f1 ; \
mv f1 f2 ; \
ln f2 f3 ; \
mv f3 f4 ; \
unlink f2

File f4 should contain the /etc/group contents, and it does on site A, but on site B it is 0 bytes.

A:
-rw-r--r--.  1 nexus nexus  395 Nov  9 21:00 f4

B:
-rw-r--r--.  0 root 200    0 Nov  9 21:00 f4

Comment 8 Dimitri Ars 2017-11-09 21:32:32 UTC
Reduced even further:
echo testing > f1 ; \
ln f1 f2 ; \
mv f2 f3 ; \
unlink f1

Then f3 is the problem file.
If you take out the unlink, things are a bit better but still not correct: we then have 2 linked files (f1 and f3) on site A and 2 separate files (f1 and f3) on site B. Both have the correct content, but I expected site B to have 2 linked files as well.
A:
-rw-r--r--.  2 nexus nexus    8 Nov  9 21:16 f1
-rw-r--r--.  2 nexus nexus    8 Nov  9 21:16 f3
B:
-rw-r--r--.  1 200 200    8 Nov  9 21:16 f1
-rw-r--r--.  1 200 200    8 Nov  9 21:16 f3

If I leave out the rename of the hardlink as well, things go fine. If I then rename f2 to f3 after waiting for 15 seconds or so (changelog rollover) then the rename goes fine as well, all correct on both A and B.
It looks kind of like https://bugzilla.redhat.com/show_bug.cgi?id=1448914 which had this issue for extended attributes. Also looks like https://bugzilla.redhat.com/show_bug.cgi?id=1296175 was somewhat related.
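
For repeated testing, the reduced sequence can be wrapped in a function that also prints the link count and size of f3. This is a minimal sketch assuming GNU coreutils (stat -c, unlink); the function name repro_hardlink_rename is made up.

```shell
# repro_hardlink_rename DIR
# Runs the reduced create/link/rename/unlink sequence inside DIR, then prints
# "<link count> <size in bytes> f3" for the surviving file.
repro_hardlink_rename() {
    (
        cd "$1" || exit 1
        echo testing > f1
        ln f1 f2
        mv f2 f3
        unlink f1
        stat -c '%h %s %n' f3
    )
}
```

On an ordinary local filesystem this prints "1 8 f3". Run it in a directory on the site A mount, give geo-replication time to sync, then run `stat -c '%h %s %n' f3` in the same directory on site B and compare.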

Comment 9 Dimitri Ars 2017-11-10 11:17:46 UTC
Although the problem is easy to reproduce and see, here is the related error logging from the geo-replication-slaves log on site B (4a734e2c-202a-4b11-8676-9a3219dc2101:192.168.5.7.%2Fvar%2Flib%2Fheketi%2Fmounts%2Fvg_1f8ca94513acde49ebe3167b58004159%2Fbrick_fe0308e73bb49c5a30f504ef853a731d%2Fbrick.vol_20e37cc674b396d041691341d69b81a6.gluster.log):

[2017-11-10 10:06:06.718657] W [MSGID: 114031] [client-rpc-fops.c:493:client3_3_stat_cbk] 0-vol_eeaf4e18532e9769aed04199eda0d1bd-client-2: remote operation failed [No such file or directory]
[2017-11-10 10:06:06.719246] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-vol_eeaf4e18532e9769aed04199eda0d1bd-client-2: remote operation failed. Path: /.gfid/5cbc6952-f50c-4040-af3b-cd9dfb2f8596 (5cbc6952-f50c-4040-af3b-cd9dfb2f8596) [No such file or directory]
[2017-11-10 10:06:06.719300] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-vol_eeaf4e18532e9769aed04199eda0d1bd-client-1: remote operation failed. Path: /.gfid/5cbc6952-f50c-4040-af3b-cd9dfb2f8596 (5cbc6952-f50c-4040-af3b-cd9dfb2f8596) [No such file or directory]
[2017-11-10 10:06:06.719322] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-vol_eeaf4e18532e9769aed04199eda0d1bd-client-0: remote operation failed. Path: /.gfid/5cbc6952-f50c-4040-af3b-cd9dfb2f8596 (5cbc6952-f50c-4040-af3b-cd9dfb2f8596) [No such file or directory]
[2017-11-10 10:06:06.721119] E [MSGID: 109040] [dht-helper.c:1378:dht_migration_complete_check_task] 0-vol_eeaf4e18532e9769aed04199eda0d1bd-dht: /.gfid/5cbc6952-f50c-4040-af3b-cd9dfb2f8596: failed to lookup the file on vol_eeaf4e18532e9769aed04199eda0d1bd-dht [No such file or directory]
[2017-11-10 10:06:06.721213] W [fuse-bridge.c:874:fuse_attr_cbk] 0-glusterfs-fuse: 2510782: STAT() /.gfid/5cbc6952-f50c-4040-af3b-cd9dfb2f8596 => -1 (No such file or directory)
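
To see which gfids are affected without reading the whole log, the failing lookups can be filtered out. This is a sketch that assumes log lines in the format shown above and a grep that supports -o; the function name missing_gfids is made up.

```shell
# missing_gfids
# Reads a geo-replication slave log on stdin and prints each gfid that shows
# up as /.gfid/<gfid> in a "No such file or directory" line, once, sorted.
missing_gfids() {
    grep 'No such file or directory' |
        grep -o '/\.gfid/[0-9a-f-]\{36\}' |
        sed 's|^/\.gfid/||' |
        sort -u
}

# Example ("$LOG" stands for the slave log file named above):
#   missing_gfids < "$LOG"
```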

Comment 10 Kotresh HR 2017-11-13 10:47:51 UTC
Patch posted for master branch:
https://review.gluster.org/18731

Comment 11 Kotresh HR 2017-11-13 10:49:04 UTC
    Problem:
    Data does not get synced if the master sees I/O
    such as the following.
    
    1. echo "test_data" > f1
    2. ln f1 f2
    3. mv f2 f3
    4. unlink f1
    
    On master, 'f3' exists with data "test_data", but on
    slave, 'f3' exists only as a zero-byte file without
    a backend gfid link.
    
    Cause:
    On master, since 'f2' no longer exists, the hardlink
    is skipped during processing. Later, when syncing the
    rename, since the source ('f2') doesn't exist on the
    slave, the dst ('f3') is created with the same gfid.
    In this use case the create succeeds, but the backend
    gfid is not linked to 'f3' because 'f1' already exists
    with the same gfid. Once 'f1' is unlinked, rsync fails
    with ENOENT because the backend gfid is not linked
    to 'f3'.
    
    Fix:
    On processing rename, if src doesn't exist on slave,
    don't blindly create dst with the same gfid. Check the
    gfid first: if it already exists on the slave, create
    a hardlink instead of doing a mknod.
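
The decision described under "Fix" can be illustrated with a toy model. This is not gsyncd code: a plain directory stands in for the slave volume, SLAVE/.gfid/<gfid> stands in for the backend gfid link, and the function name sync_rename and its arguments are invented for the illustration. Unlike real geo-replication, where data is synced by gfid in a later step, the toy keeps the data in place so the effect of the link is visible directly.

```shell
# sync_rename SLAVE GFID SRC DST
# Toy model of replaying a rename of SRC to DST for a file with GFID on the
# slave. SLAVE/.gfid/GFID is a hard link standing in for the backend gfid link.
sync_rename() {
    slave=$1 gfid=$2 src=$3 dst=$4
    if [ -e "$slave/$src" ]; then
        # Normal case: the source name exists on the slave, so rename it.
        mv "$slave/$src" "$slave/$dst"
    elif [ -e "$slave/.gfid/$gfid" ]; then
        # Behaviour described in the fix: the gfid already exists under
        # another name, so DST becomes a hard link to it.
        ln "$slave/.gfid/$gfid" "$slave/$dst"
    else
        # The gfid is unknown on the slave: create an empty file and
        # register it (stand-in for mknod with the given gfid).
        : > "$slave/$dst"
        ln "$slave/$dst" "$slave/.gfid/$gfid"
    fi
}
```

With 'f1' present on the toy slave and linked as .gfid/G1, calling `sync_rename "$slave" G1 f2 f3` and then removing 'f1' leaves 'f3' with the original contents, which is the outcome the fix aims for.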

Comment 12 Kotresh HR 2017-11-13 10:50:44 UTC
Thanks, Dimitri Ars, for providing the test case. It really helped in fixing the issue faster. Keep using Gluster and reporting bugs!

Comment 13 Dimitri Ars 2017-11-13 22:39:41 UTC
Thanks Kotresh HR! I did an initial test with the patch and the replication of .bytes files of the Nexus3 application looks good now! Will test some more this week.

Comment 14 Dimitri Ars 2017-11-23 21:13:42 UTC
Hi Kotresh HR,

I tested some more and conclude that the patch fixes the issue, so this can be closed (do you do that, or do I?). One final question / request though: can this be backported to the 3.12.x release, e.g. the next release on Dec 10?

Thanks,

Dimitri

Comment 15 Kotresh HR 2018-01-17 11:03:56 UTC
Sorry, I missed this; I thought I had already backported it. I will do it now, and you will get it in the next release.

Thanks,
Kotresh HR

Comment 16 Worker Ant 2018-01-17 11:05:40 UTC
REVIEW: https://review.gluster.org/19217 (geo-rep: Fix data sync issue during hardlink, rename) posted (#1) for review on release-3.12 by Kotresh HR

Comment 17 Worker Ant 2018-02-02 06:46:57 UTC
COMMIT: https://review.gluster.org/19217 committed in release-3.12 by "jiffin tony Thottan" <jthottan> with a commit message- geo-rep: Fix data sync issue during hardlink, rename

Problem:
The data is not getting synced if master witnessed
IO as below.

1. echo "test_data" > f1
2. ln f1 f2
3. mv f2 f3
4. unlink f1

On master, 'f3' exists with data "test_data" but on
slave, only f3 exists with zero byte file without
backend gfid link.

Cause:
On master, since 'f2' no longer exists, the hardlink
is skipped during processing. Later, on trying to sync
rename, since source ('f2') doesn't exist, dst ('f3')
is created with same gfid. But in this use case, it
succeeds but backend gfid would not have linked as 'f1'
exists with the same gfid. So, rsync would fail with
ENOENT as backend gfid is not linked with 'f3' and 'f1'
is unlinked.

Fix:
On processing rename, if src doesn't exist on slave,
don't blindly create dst with same gfid. The gfid
needs to be checked, if it exists, hardlink needs
to be created instead of mknod.

Thanks Aravinda for helping in RCA :)

> Change-Id: I5af4f99798ed1bcb297598a4bc796b701d1e0130
> Signed-off-by: Kotresh HR <khiremat>
> BUG: 1512483
> Reporter: dimitri.ars
(cherry picked from commit 6e2ce37341e5d600d8fd5648b39eec0dbdbe45ad)

Change-Id: I5af4f99798ed1bcb297598a4bc796b701d1e0130
Signed-off-by: Kotresh HR <khiremat>
BUG: 1510342

Comment 18 Jiffin 2018-03-05 07:14:08 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.6, please open a new bug report.

glusterfs-3.12.6 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2018-February/033552.html
[2] https://www.gluster.org/pipermail/gluster-users/

