Bug 1332080
Summary: | [geo-rep+shard]: Files which were synced to slave before enabling shard don't get synced/removed upon modification | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rahul Hinduja <rhinduja> |
Component: | geo-replication | Assignee: | Kotresh HR <khiremat> |
Status: | CLOSED ERRATA | QA Contact: | Rahul Hinduja <rhinduja> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | rhgs-3.1 | CC: | amukherj, avishwan, chrisw, csaba, khiremat, nlevinki, rcyriac |
Target Milestone: | --- | ||
Target Release: | RHGS 3.2.0 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.8.4-1 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-03-23 05:29:00 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1344908 | ||
Bug Blocks: | 1351522 |
Description (Rahul Hinduja, 2016-05-02 07:01:55 UTC)
Moving this bug out of 3.1.3 since it is not always reproducible and it is not applicable to the hyperconvergence use case.

Hit this issue again with build glusterfs-3.7.9-4.

Master:
```
[root@dj master]# du -sh *
279K    file
8.2M    new_file
[root@dj master]# pwd
/mnt/master
```

Slave:
```
[root@dj slave]# du -sh *
279K    file
2.8M    new_file
[root@dj slave]# pwd
/mnt/slave
```

Shard feature is enabled on both master and slave:

Master:
```
[root@dhcp37-182 scripts]# gluster volume info po | grep shard
features.shard: enable
```

Slave:
```
[root@dhcp37-122 scripts]# gluster volume info shifu | grep shard
features.shard: enable
```

RCA Update:

I have verified the following:

1. The issue is not related to quota or USS, which were enabled on the volume.
2. The changelog records the DATA entry for the problematic file.
3. Geo-replication picks up the changelog and processes it. However, before syncing, geo-rep does an 'lstat' on .gfid/<gfid> on the master volume to check that the file is present. This lstat fails on the master, so the data sync is skipped. The lstat fails because the lookup on the '.gfid' virtual directory fails with ESTALE. Why the lookup fails the first time after enabling sharding on the master needs further debugging. A sketch of this presence check is shown after the log excerpt below.

The lookup errors are:

```
[2016-05-20 12:01:32.538645] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-00000000000d: failed to resolve (Stale file handle)
[2016-05-20 12:01:32.538663] E [fuse-bridge.c:564:fuse_lookup_resume] 0-fuse: failed to resolve path (null)
[2016-05-20 12:01:32.539294] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-00000000000d: failed to resolve (Stale file handle)
[2016-05-20 12:01:32.539319] E [fuse-bridge.c:564:fuse_lookup_resume] 0-fuse: failed to resolve path (null)
[2016-05-20 12:01:32.583722] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-00000000000d: failed to resolve (Stale file handle)
[2016-05-20 12:01:32.583761] E [fuse-bridge.c:564:fuse_lookup_resume] 0-fuse: failed to resolve path (null)
[2016-05-20 12:00:24.235843] I [MSGID: 101173] [graph.c:269:gf_add_cmdline_options] 0-master-md-cache: adding option 'cache-posix-acl' for volume 'master-md-cache' with value 'true'
```
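For illustration only, the presence check described above amounts to an lstat through the .gfid virtual directory of an aux-gfid mount of the master volume. This is a minimal sketch, not gsyncd's actual code; the mount point /mnt/master_aux and the GFID value are hypothetical.

```python
import errno
import os

# Hypothetical aux-gfid mount point of the master volume; in reality
# gsyncd manages its own mount and path handling.
AUX_MOUNT = "/mnt/master_aux"

def entry_exists_on_master(gfid):
    """Return True if the file identified by 'gfid' is reachable on the
    master volume, mimicking the lstat on .gfid/<gfid> that geo-rep
    performs before syncing a DATA changelog entry."""
    path = os.path.join(AUX_MOUNT, ".gfid", gfid)
    try:
        os.lstat(path)
        return True
    except OSError as e:
        if e.errno in (errno.ENOENT, errno.ESTALE):
            # ESTALE here is the failure seen in this bug: the lookup on
            # the '.gfid' virtual directory goes stale after sharding is
            # enabled, so the data sync for the file is skipped.
            return False
        raise

# Example with a made-up GFID, purely for illustration.
print(entry_exists_on_master("f2c6f8ad-1a8a-4dfc-9b34-7f7e2d3a1c00"))
```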
Upstream Patch: http://review.gluster.org/14773 (master)

Upstream mainline: http://review.gluster.org/14773
Upstream 3.8: http://review.gluster.org/14776

The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.

Verified with build glusterfs-geo-replication-3.8.4-13.el7rhgs.x86_64. After enabling shard, files are properly synced and removed. Moving this bug to verified state. (A small script that automates the master/slave comparison used below is sketched after the closing note.)

Appending to an existing file:

Master:
```
[root@dj master]# ls -l
total 1960
-rw-r--r--. 1 root root  223741 Feb 7 2017 file1
-rw-r--r--. 1 root root 1783293 Feb 7 2017 new_file
[root@dj master]# cat /root/files/new_file >> new_file
[root@dj master]# ls -l
total 5590
-rw-r--r--. 1 root root  223741 Feb 7 2017 file1
-rw-r--r--. 1 root root 5499686 Feb 7 2017 new_file
```

Slave:
```
[root@dj slave]# ls -l
total 1960
-rw-r--r--. 1 root root  223741 Feb 7 2017 file1
-rw-r--r--. 1 root root 1783293 Feb 7 2017 new_file
[root@dj slave]# ls -l
total 5590
-rw-r--r--. 1 root root  223741 Feb 7 2017 file1
-rw-r--r--. 1 root root 5499686 Feb 7 2017 new_file
```

Creating a new file:

Master:
```
[root@dj master]# cp new_file after_shard
[root@dj master]# ls -l
total 10961
-rw-r--r--. 1 root root 5499686 Feb 7 2017 after_shard
-rw-r--r--. 1 root root  223741 Feb 7 2017 file1
-rw-r--r--. 1 root root 5499686 Feb 7 2017 new_file
```

Slave:
```
[root@dj slave]# ls -l
total 5590
-rw-r--r--. 1 root root  223741 Feb 7 2017 file1
-rw-r--r--. 1 root root 5499686 Feb 7 2017 new_file
[root@dj slave]# ls -l
total 10961
-rw-r--r--. 1 root root 5499686 Feb 7 2017 after_shard
-rw-r--r--. 1 root root  223741 Feb 7 2017 file1
-rw-r--r--. 1 root root 5499686 Feb 7 2017 new_file
```

Removing files:

Master:
```
[root@dj master]# rm -rf after_shard
[root@dj master]# ls -l
total 5590
-rw-r--r--. 1 root root  223741 Feb 7 2017 file1
-rw-r--r--. 1 root root 5499686 Feb 7 2017 new_file
[root@dj master]# rm -rf new_file
[root@dj master]# ls
file1
[root@dj master]# ls -l
total 219
-rw-r--r--. 1 root root 223741 Feb 7 2017 file1
```

Slave:
```
[root@dj slave]# ls -l
total 5590
-rw-r--r--. 1 root root  223741 Feb 7 2017 file1
-rw-r--r--. 1 root root 5499686 Feb 7 2017 new_file
[root@dj slave]# ls -l
total 219
-rw-r--r--. 1 root root 223741 Feb 7 2017 file1
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
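For anyone re-running the verification above, the manual master/slave comparison can be scripted. This is a minimal sketch, assuming the same /mnt/master and /mnt/slave mount points used in the transcripts; it only compares top-level file sizes, and because geo-replication is asynchronous a transient mismatch may simply mean the slave has not caught up yet.

```python
import os

# Mount points matching the transcripts above (assumed, adjust as needed).
MASTER_MNT = "/mnt/master"
SLAVE_MNT = "/mnt/slave"

def compare_trees(master_root, slave_root):
    """Report files whose size differs between master and slave, or which
    exist on only one side -- the symptom described in this bug."""
    master_files = {
        name: os.path.getsize(os.path.join(master_root, name))
        for name in os.listdir(master_root)
    }
    slave_files = {
        name: os.path.getsize(os.path.join(slave_root, name))
        for name in os.listdir(slave_root)
    }
    for name in sorted(set(master_files) | set(slave_files)):
        m = master_files.get(name)  # None if missing on master
        s = slave_files.get(name)   # None if missing on slave
        if m != s:
            print(f"MISMATCH {name}: master={m} slave={s}")

if __name__ == "__main__":
    compare_trees(MASTER_MNT, SLAVE_MNT)
```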