Bug 1414456

Summary: [GSS] Entry heal pending for directories that have symlinks to a different replica set
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Riyas Abdulrasak <rnalakka>
Component: quota Assignee: Sanoj Unnikrishnan <sunnikri>
Status: CLOSED ERRATA QA Contact: Vinayak Papnoi <vpapnoi>
Severity: medium Docs Contact:
Priority: medium    
Version: rhgs-3.1 CC: amukherj, bkunal, nchilaka, ravishankar, rcyriac, rhinduja, rhs-bugs, sheggodu, srmukher, storage-qa-internal, sunnikri
Target Milestone: ---   
Target Release: RHGS 3.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: rebase
Fixed In Version: glusterfs-3.12.2-1 Doc Type: Bug Fix
Doc Text:
The path ancestry was not populated accurately when a symbolic link file had multiple hard links to it, which left entry heals pending. This fix populates the ancestry correctly by handling the case of a symbolic link file with multiple hard links.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-09-04 06:32:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1429198, 1429402, 1429405, 1436673    
Bug Blocks: 1408949, 1472361, 1503135    
Attachments:
Description                         Flags
Brick logs from the n6 server       none
glustershd.log from n7 server       none
tcpdump from the source server      none
gdb script to print ancestry        none

Description Riyas Abdulrasak 2017-01-18 14:39:19 UTC
Description of problem:

Entry heal is pending for directories that have symlinks to a different replica set. The customer noticed this after a rebalance failure.

~~~~
[2016-12-23 02:35:36.425400] I [MSGID: 109028] [dht-rebalance.c:3872:gf_defrag_status_get] 0-nfs-vol1-dht: Rebalance is completed. Time taken is 594617.00 secs
[2016-12-23 02:35:36.425418] I [MSGID: 109028] [dht-rebalance.c:3876:gf_defrag_status_get] 0-nfs-vol1-dht: Files migrated: 942363, size: 863115246048, lookups: 6538531, failures: 18981, skipped: 1281102
~~~~

* More than 1000 directories are shown as pending heal from n7-gluster1-qh2 to n6-gluster1-qh2

~~~~
<snip> from gluster v heal info
Brick n6-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Status: Connected
Number of entries: 0

Brick n7-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
/6000-science/6040-RWJ44/space/cchung/cchung/ldg/gt4.0.7-all-source-installer/source-trees-thr/gsi/proxy/proxy_core/source/autom4te.cache 
/6000-science/6040-RWJ44/space/scarassou/anaconda/pkgs/openssl-1.0.1c-0/lib 
/6000-science/6040-RWJ44/space/cchung/cchung/tmp/fftw-3.0.1/.libs 
/6000-science/6040-RWJ44/space/cchung/cchung/ldg/gt4.0.7-all-source-installer/source-trees-thr/gsi/proxy/proxy_ssl/source/autom4te.cache 
/6000-science/6040-RWJ44/space/cchung/cchung/ldg/gt4.0.7-all-source-installer/source-trees-thr/gsi/proxy/proxy_ssl/source/doxygen 
/6000-science/6040-RWJ44/space/cchung/cchung/ldg/gt4.0.7-all-source-installer/source-trees-thr/gsi/sasl/gssplugins 
/6000-science/6040-RWJ44/space/scarassou/anaconda/pkgs/opencv-2.4.2-np17py27_1/lib 
[.....]

/6000-science/6040-RWJ44/space/cmagoulas/LAPTOP/Library/Application Support/iDVD/Installed Themes/iDVD 6/Travel-Main+.theme/Contents/Resources 
Status: Connected
Number of entries: 1027

Brick n8-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Status: Connected
Number of entries: 0
~~~~


* The glustershd log shows messages like the ones below:

~~~~
[2016-12-31 16:51:15.007765] I [MSGID: 108026] [afr-self-heal-entry.c:589:afr_selfheal_entry_do] 0-nfs-vol1-replicate-3: performing entry selfheal on f1f3a846-f07d-49bb-999f-d3ab78568cce
[2016-12-31 16:51:15.011284] W [MSGID: 114031] [client-rpc-fops.c:2812:client3_3_link_cbk] 0-nfs-vol1-client-6: remote operation failed: (<gfid:f3c7f4ee-9db8-4126-b9f7-14de175c5f02> -> (null)) [Invalid argument]
~~~~

* The gfid in the above error points to a symlink that exists on "n7-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1". The symlink (file) does not exist on its replica, but the gfid link does.

* The brick logs on n6-gluster1-qh2 show the errors below:

~~~~
[2017-01-08 16:13:56.011298] I [MSGID: 115062] [server-rpc-fops.c:1208:server_link_cbk] 0-nfs-vol1-server: 211804157: LINK <gfid:f3c7f4ee-9db8-4126-b9f7-14de175c5f02> (f3c7f4ee-9db8-4126-b9f7-14de175c5f02) -> f1f3a846-f07d-49bb-999f-d3ab78568cce/output.0 ==> (Invalid argument) [Invalid argument]
~~~~


Version-Release number of selected component (if applicable):

glusterfs-3.7.9-10.el7rhgs.x86_64


How reproducible:

Happened once for the customer. 

Actual results:

A large number of directories are shown as pending heal.

Expected results:

Engineering help is needed to resolve the pending heal issue.

Additional info:

Volume Name: nfs-vol1
Type: Distributed-Replicate
Volume ID: 3c0b3e98-ef93-4502-a0e4-63d5da5963f6
Status: Started
Number of Bricks: 10 x 2 = 20
Transport-type: tcp
Bricks:
Brick1: n0-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick2: n1-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick3: n2-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick4: n3-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick5: n4-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick6: n5-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick7: n6-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick8: n7-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick9: n8-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick10: n9-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick11: n10-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick12: n11-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick13: n10-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick14: n11-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick15: n8-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick16: n9-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick17: n6-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick18: n7-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick19: n4-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick20: n5-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Options Reconfigured:
diagnostics.client-log-level: INFO
cluster.quorum-type: auto
cluster.server-quorum-type: server
performance.readdir-ahead: on
performance.cache-size: 1GB
features.cache-invalidation: off
ganesha.enable: on
nfs.disable: on
performance.read-ahead-page-count: 8
cluster.read-hash-mode: 2
client.event-threads: 4
server.event-threads: 4
server.outstanding-rpc-limit: 256
performance.io-thread-count: 64
network.ping-timeout: 42
features.uss: disable
features.barrier: disable
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
cluster.self-heal-daemon: on
nfs.outstanding-rpc-limit: 16
diagnostics.brick-log-level: INFO
nfs-ganesha: enable
cluster.enable-shared-storage: enable
snap-activate-on-create: enable
auto-delete: enable
cluster.server-quorum-ratio: 51%

Comment 2 Riyas Abdulrasak 2017-01-18 15:09:40 UTC
Created attachment 1242190 [details]
Brick logs from the n6 server

Comment 3 Riyas Abdulrasak 2017-01-18 15:13:12 UTC
Created attachment 1242197 [details]
glustershd.log from n7 server

Comment 7 Riyas Abdulrasak 2017-01-27 06:53:40 UTC
Created attachment 1245019 [details]
tcpdump from the source server

Comment 22 Sanoj Unnikrishnan 2017-04-06 06:18:42 UTC
Created attachment 1269167 [details]
gdb script to print ancestry

I was not able to do this with SystemTap, so I am attaching a gdb script for the same purpose.

The script prints the dentry list and the gfid on which the inode->parent call failed.
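
As a purely illustrative aside, here is a minimal C sketch of the kind of walk such a script performs: follow each inode's dentry to its parent, print one "name<GFID:...>" line per step, and report the gfid at which the parent lookup stops. The sk_inode_t/sk_dentry_t types and print_ancestry() below are hypothetical stand-ins, not the attached gdb script and not the real GlusterFS structures.

~~~~
/* Hypothetical sketch only: simplified stand-ins for the inode/dentry
 * structures, not the attached gdb script or real GlusterFS code. */
#include <stdio.h>

typedef struct sk_inode sk_inode_t;

typedef struct sk_dentry {
    const char *name;    /* basename under which the inode is linked  */
    sk_inode_t *parent;  /* parent directory inode, NULL if unlinked  */
} sk_dentry_t;

struct sk_inode {
    const char  *gfid;   /* GFID rendered as a hex string             */
    sk_dentry_t *dentry; /* first dentry of this inode, NULL if none  */
};

/* Walk from an inode towards the root, printing one "name<GFID:...>"
 * line per step, and report where the parent lookup stops. */
static void print_ancestry(const sk_inode_t *inode)
{
    while (inode) {
        const sk_dentry_t *d = inode->dentry;
        printf(" --> [%p]%s<GFID:%s>\n",
               (const void *)inode, d ? d->name : "(no dentry)", inode->gfid);
        if (!d || !d->parent) {
            printf("inode->parent lookup stopped at <GFID:%s>\n", inode->gfid);
            return;
        }
        inode = d->parent;
    }
}

int main(void)
{
    /* Example chain using gfids from this bug: sym3 -> 2 -> /
     * (the root's dentry has no parent, so the walk stops there). */
    sk_inode_t root = { "00000000000000000000000000000001", NULL };
    sk_inode_t dir2 = { "04b5d81439b045cf9824d1f2adadd4ef", NULL };
    sk_inode_t sym3 = { "1c613a545ed341f7853ffe2f184fd783", NULL };
    sk_dentry_t d_root = { "/",    NULL  };
    sk_dentry_t d_dir2 = { "2",    &root };
    sk_dentry_t d_sym3 = { "sym3", &dir2 };

    root.dentry = &d_root;
    dir2.dentry = &d_dir2;
    sym3.dentry = &d_sym3;

    print_ancestry(&sym3);
    return 0;
}
~~~~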

Comment 25 Sanoj Unnikrishnan 2017-04-07 05:15:34 UTC
An update on the RCA of the issue.
The issue was seen when a hard link is attempted to a symlink file.

Attempting the same scenario with the above script:
sym1 is a symlink under "/1", and sym3, sym4, sym9, sym11, ... were hard links created for sym1 under "/2"

[0x7eff987a5a40]
 --> [0x7effa00f8a30]/<GFID:00000000000000000000000000000001>
 --> [0x7effa00f2020]2<GFID:04b5d81439b045cf9824d1f2adadd4ef>
 --> [0x7effa00f03b0]sym3<GFID:1c613a545ed341f7853ffe2f184fd783>
 --> [0x7effa00e7de0]sym4<GFID:1c613a545ed341f7853ffe2f184fd783>
 --> [0x7effa00ad1e0]sym9<GFID:1c613a545ed341f7853ffe2f184fd783>
 --> [0x7effa00f2e40]sym11<GFID:1c613a545ed341f7853ffe2f184fd783>
 --> [0x7effa00d9550]sym12<GFID:1c613a545ed341f7853ffe2f184fd783>
 --> [0x7effa0003ab0]sym13<GFID:1c613a545ed341f7853ffe2f184fd783>
 --> [0x7effa00d8fb0]sym17<GFID:1c613a545ed341f7853ffe2f184fd783>
 --> [0x7effa00f1250]/<GFID:00000000000000000000000000000001>
 --> [0x7effa00f1350]sym1<GFID:1c613a545ed341f7853ffe2f184fd783>
 --> [0x7effa0001990]sym2<GFID:1c613a545ed341f7853ffe2f184fd783>

The quota_build_ancestry_cbk callback expects successive entries to be ancestors along the path and attempts to link them. So, in the above case, we attempt to link sym4 to sym3, sym9 to sym4, and so on.

In the inode_link code we have,
...
                if (parent->ia_type != IA_IFDIR) {
                        GF_ASSERT (!"link attempted on non-directory parent");
                        return NULL;
                }
...
So the parent is not linked. 
This seems to be an issue that needs to be fixed.
However, this must have errored out in the quota_build_ancestry_cbk code and not reached the statement where "parent is null" is logged. Looking further into this.
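
To make the failure mode concrete, here is a minimal, self-contained sketch of what happens when the ancestry list returned for a hard-linked symlink contains sibling names instead of true directory ancestors. The entry_t type and the link_entry() helper are hypothetical stand-ins, not the real quota_build_ancestry_cbk or inode_link code; link_entry() only mirrors the non-directory-parent check quoted above.

~~~~
/* Minimal sketch only: entry_t and link_entry() are illustrative
 * stand-ins, not the real GlusterFS inode_t or inode_link(). */
#include <stdio.h>

typedef enum { IA_IFDIR, IA_IFLNK } ia_type_t;

typedef struct entry {
    const char   *name;
    ia_type_t     type;
    struct entry *next;   /* next entry in the ancestry list */
} entry_t;

/* Mirrors the check quoted above from inode_link(): linking a child
 * under a non-directory parent is rejected. */
static int link_entry(entry_t *child, entry_t *parent)
{
    if (parent->type != IA_IFDIR) {
        fprintf(stderr, "link of %s under non-directory %s rejected\n",
                child->name, parent->name);
        return -1;   /* the parent is not linked */
    }
    printf("linked %s under %s\n", child->name, parent->name);
    return 0;
}

int main(void)
{
    /* Ancestry list as reported by the gdb script: "/", "2", and then
     * the sibling hard links sym3, sym4, ... instead of true ancestors. */
    entry_t sym4 = { "sym4", IA_IFLNK, NULL  };
    entry_t sym3 = { "sym3", IA_IFLNK, &sym4 };
    entry_t dir2 = { "2",    IA_IFDIR, &sym3 };
    entry_t root = { "/",    IA_IFDIR, &dir2 };

    /* The callback treats each entry as the parent of the next one, so
     * "/" and "2" link fine, but sym4 is then linked under sym3 and the
     * non-directory check fires. */
    for (entry_t *parent = &root; parent->next != NULL; parent = parent->next)
        link_entry(parent->next, parent);

    return 0;
}
~~~~

Running the sketch links "/" and "2" normally and rejects everything from sym4 onward, which is the same non-directory-parent condition the quoted inode_link check enforces.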

Comment 30 Sanoj Unnikrishnan 2017-08-04 09:55:17 UTC
The issue is solved by https://review.gluster.org/#/c/17730/, which has been merged upstream.

Comment 34 Vinayak Papnoi 2018-04-19 10:02:52 UTC
Build Number : glusterfs-3.12.2-7.el7rhgs.x86_64

With quota enabled on a distribute-replicate volume, volume heal is successful when files have symlinks and those symlinks have hard links.

Hence, moving the bug to VERIFIED.

Comment 36 errata-xmlrpc 2018-09-04 06:32:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

Comment 37 Red Hat Bugzilla 2023-09-14 03:52:26 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days