Bug 1414456 - [GSS]Entry heal pending for directories which has symlinks to a different replica set [NEEDINFO]
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: quota
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Sanoj Unnikrishnan
QA Contact: Vinayak Papnoi
Whiteboard: rebase
Depends On: 1429198 1429402 1429405 1436673
Blocks: 1408949 RHGS-3.4-GSS-proposed-tracker 1503135
Reported: 2017-01-18 14:39 UTC by Riyas Abdulrasak
Modified: 2018-09-04 06:33 UTC

Fixed In Version: glusterfs-3.12.2-1
Doc Type: Bug Fix
Doc Text:
The path ancestry was not accurately populated when a symbolic link file had multiple hard links to it. This left entry heals pending. This fix populates the ancestry correctly by handling the case of a symbolic link file with multiple hard links.
Clone Of:
Last Closed: 2018-09-04 06:32:03 UTC
srmukher: needinfo? (sunnikri)

Attachments
Brick logs from the n6 server (10.93 MB, application/x-xz)
2017-01-18 15:09 UTC, Riyas Abdulrasak
glustershd.log from n7 server (5.72 MB, application/x-xz)
2017-01-18 15:13 UTC, Riyas Abdulrasak
tcpdump from the source server (11.32 MB, application/x-xz)
2017-01-27 06:53 UTC, Riyas Abdulrasak
gdb script to print ancestry (1.07 KB, text/plain)
2017-04-06 06:18 UTC, Sanoj Unnikrishnan

System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2607 None None None 2018-09-04 06:33:46 UTC

Description Riyas Abdulrasak 2017-01-18 14:39:19 UTC
Description of problem:

Entry heal is pending for directories that have symlinks to a different replica set. The customer noticed this after a rebalance failure.

[2016-12-23 02:35:36.425400] I [MSGID: 109028] [dht-rebalance.c:3872:gf_defrag_status_get] 0-nfs-vol1-dht: Rebalance is completed. Time taken is 594617.00 secs
[2016-12-23 02:35:36.425418] I [MSGID: 109028] [dht-rebalance.c:3876:gf_defrag_status_get] 0-nfs-vol1-dht: Files migrated: 942363, size: 863115246048, lookups: 6538531, failures: 18981, skipped: 1281102

* More than 1000 directories are listed as needing heal from n7-gluster1-qh2 to n6-gluster1-qh2

<snip> from gluster v heal info
Brick n6-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Status: Connected
Number of entries: 0

Brick n7-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1

/6000-science/6040-RWJ44/space/cmagoulas/LAPTOP/Library/Application Support/iDVD/Installed Themes/iDVD 6/Travel-Main+.theme/Contents/Resources 
Status: Connected
Number of entries: 1027

Brick n8-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Status: Connected
Number of entries: 0

* The glustershd log shows messages like the following

[2016-12-31 16:51:15.007765] I [MSGID: 108026] [afr-self-heal-entry.c:589:afr_selfheal_entry_do] 0-nfs-vol1-replicate-3: performing entry selfheal on f1f3a846-f07d-49bb-999f-d3ab78568cce
[2016-12-31 16:51:15.011284] W [MSGID: 114031] [client-rpc-fops.c:2812:client3_3_link_cbk] 0-nfs-vol1-client-6: remote operation failed: (<gfid:f3c7f4ee-9db8-4126-b9f7-14de175c5f02> -> (null)) [Invalid argument]

* The gfid in the above error points to a symlink that exists on "n7-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1". The symlink file itself does not exist on its replica, but the gfid link does.
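
To check whether the gfid link is present on a brick, it helps to know where to look. The sketch below assumes the usual brick backend layout, in which the gfid entry lives under .glusterfs/<first two hex digits>/<next two hex digits>/<full gfid>. The function name gfid_backend_path is illustrative only and is not part of Gluster.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the expected backend path of a gfid entry on a brick.
 * ASSUMPTION: the brick uses the layout
 *   <brick>/.glusterfs/<gfid[0..1]>/<gfid[2..3]>/<gfid>
 * Returns 0 on success, -1 if the gfid is malformed or the buffer is too small.
 */
int gfid_backend_path(const char *brick, const char *gfid,
                      char *out, size_t outlen)
{
    assert(brick != NULL && gfid != NULL && out != NULL);

    /* A canonical gfid string is 36 characters (32 hex digits, 4 hyphens). */
    if (strlen(gfid) != 36)
        return -1;

    int n = snprintf(out, outlen, "%s/.glusterfs/%.2s/%.2s/%s",
                     brick, gfid, gfid + 2, gfid);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}
```

Running lstat on the resulting path on each replica brick would then show whether the gfid entry exists and how many links it has.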

* The brick logs of n6-gluster1-qh2 show the following errors

[2017-01-08 16:13:56.011298] I [MSGID: 115062] [server-rpc-fops.c:1208:server_link_cbk] 0-nfs-vol1-server: 211804157: LINK <gfid:f3c7f4ee-9db8-4126-b9f7-14de175c5f02> (f3c7f4ee-9db8-4126-b9f7-14de175c5f02) -> f1f3a846-f07d-49bb-999f-d3ab78568cce/output.0 ==> (Invalid argument) [Invalid argument]

Version-Release number of selected component (if applicable):


How reproducible:

Happened once for the customer. 

Actual results:

A large number of directories are listed as pending heal.

Expected results:

Need engineering help for resolving the heal issue. 

Additional info:

Volume Name: nfs-vol1
Type: Distributed-Replicate
Volume ID: 3c0b3e98-ef93-4502-a0e4-63d5da5963f6
Status: Started
Number of Bricks: 10 x 2 = 20
Transport-type: tcp
Brick1: n0-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick2: n1-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick3: n2-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick4: n3-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick5: n4-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick6: n5-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick7: n6-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick8: n7-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick9: n8-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick10: n9-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick11: n10-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick12: n11-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick13: n10-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick14: n11-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick15: n8-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick16: n9-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick17: n6-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick18: n7-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick19: n4-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick20: n5-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Options Reconfigured:
diagnostics.client-log-level: INFO
cluster.quorum-type: auto
cluster.server-quorum-type: server
performance.readdir-ahead: on
performance.cache-size: 1GB
features.cache-invalidation: off
ganesha.enable: on
nfs.disable: on
performance.read-ahead-page-count: 8
cluster.read-hash-mode: 2
client.event-threads: 4
server.event-threads: 4
server.outstanding-rpc-limit: 256
performance.io-thread-count: 64
network.ping-timeout: 42
features.uss: disable
features.barrier: disable
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
cluster.self-heal-daemon: on
nfs.outstanding-rpc-limit: 16
diagnostics.brick-log-level: INFO
nfs-ganesha: enable
cluster.enable-shared-storage: enable
snap-activate-on-create: enable
auto-delete: enable
cluster.server-quorum-ratio: 51%

Comment 2 Riyas Abdulrasak 2017-01-18 15:09:40 UTC
Created attachment 1242190 [details]
Brick logs from the n6 server

Comment 3 Riyas Abdulrasak 2017-01-18 15:13:12 UTC
Created attachment 1242197 [details]
glustershd.log from n7 server

Comment 7 Riyas Abdulrasak 2017-01-27 06:53:40 UTC
Created attachment 1245019 [details]
tcpdump from the source server

Comment 22 Sanoj Unnikrishnan 2017-04-06 06:18:42 UTC
Created attachment 1269167 [details]
gdb script to print ancestry

I was not able to do this with systemtap, so I am attaching a gdb script instead.

The script prints the dentry list and the gfid on which the inode->parent call failed.

Comment 25 Sanoj Unnikrishnan 2017-04-07 05:15:34 UTC
An update on the RCA of the issue.
The issue is seen when a hard link to a symlink file is attempted.

Reproducing the same scenario with the above script:
sym1 is a symlink under "/1", and sym3, sym4, sym9, sym11... are hard links to sym1 created under "/2".

 --> [0x7effa00f8a30]/<GFID:00000000000000000000000000000001>
 --> [0x7effa00f2020]2<GFID:04b5d81439b045cf9824d1f2adadd4ef>
 --> [0x7effa00f03b0]sym3<GFID:1c613a545ed341f7853ffe2f184fd783>
 --> [0x7effa00e7de0]sym4<GFID:1c613a545ed341f7853ffe2f184fd783>
 --> [0x7effa00ad1e0]sym9<GFID:1c613a545ed341f7853ffe2f184fd783>
 --> [0x7effa00f2e40]sym11<GFID:1c613a545ed341f7853ffe2f184fd783>
 --> [0x7effa00d9550]sym12<GFID:1c613a545ed341f7853ffe2f184fd783>
 --> [0x7effa0003ab0]sym13<GFID:1c613a545ed341f7853ffe2f184fd783>
 --> [0x7effa00d8fb0]sym17<GFID:1c613a545ed341f7853ffe2f184fd783>
 --> [0x7effa00f1250]/<GFID:00000000000000000000000000000001>
 --> [0x7effa00f1350]sym1<GFID:1c613a545ed341f7853ffe2f184fd783>
 --> [0x7effa0001990]sym2<GFID:1c613a545ed341f7853ffe2f184fd783>

quota_build_ancestry_cbk expects successive entries to be ancestors along the path and attempts to link each one under the previous one. So, in the above case, it will attempt to link sym4 to sym3, sym9 to sym4, and so on.

In the inode_link code we have,
                if (parent->ia_type != IA_IFDIR) {
                        GF_ASSERT (!"link attempted on non-directory parent");
                        return NULL;
So the link is rejected because the parent is not a directory.
This looks like an issue that needs to be fixed.
However, this should have errored out in the quota_build_ancestry_cbk code and not reached the statement where "parent is null" is logged. I am looking into this further.
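
The failure mode described in this comment can be illustrated with a toy model. This is not Gluster code: struct toy_entry, toy_link, and toy_count_rejected_links are hypothetical names, and the only behaviour borrowed from the source is the rule quoted from inode_link that a non-directory cannot be a parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one entry in the ancestry list. */
struct toy_entry {
    const char *name;
    bool is_dir;
};

/* Mirrors the quoted inode_link check: only a directory may be a parent. */
static bool toy_link(const struct toy_entry *parent,
                     const struct toy_entry *child)
{
    (void)child;
    return parent != NULL && parent->is_dir;
}

/*
 * Walk the list the way the comment describes: treat every entry as the
 * parent of the next one. Returns how many link attempts are rejected.
 */
size_t toy_count_rejected_links(const struct toy_entry *list, size_t n)
{
    size_t rejected = 0;

    assert(list != NULL || n == 0);

    for (size_t i = 1; i < n; i++) {
        if (!toy_link(&list[i - 1], &list[i]))
            rejected++;
    }
    return rejected;
}
```

With a list such as "/", "2", "sym3", "sym4", "sym9", the first two links succeed and every later one is rejected, because sym3 and sym4 are hard-link siblings rather than directories on the path.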

Comment 30 Sanoj Unnikrishnan 2017-08-04 09:55:17 UTC
The issue is fixed by https://review.gluster.org/#/c/17730/, which has been merged upstream.

Comment 34 Vinayak Papnoi 2018-04-19 10:02:52 UTC
Build Number : glusterfs-3.12.2-7.el7rhgs.x86_64

With quota enabled on a distribute-replicate volume, volume heal succeeds for files that have symlinks, where those symlinks in turn have hard links.

Hence, moving bug to verified.

Comment 36 errata-xmlrpc 2018-09-04 06:32:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

