Description of problem:

Entry heal is pending for directories which have symlinks to a different replica set. The customer noticed this after a rebalance failure.

~~~~
[2016-12-23 02:35:36.425400] I [MSGID: 109028] [dht-rebalance.c:3872:gf_defrag_status_get] 0-nfs-vol1-dht: Rebalance is completed. Time taken is 594617.00 secs
[2016-12-23 02:35:36.425418] I [MSGID: 109028] [dht-rebalance.c:3876:gf_defrag_status_get] 0-nfs-vol1-dht: Files migrated: 942363, size: 863115246048, lookups: 6538531, failures: 18981, skipped: 1281102
~~~~

* Around 1000+ directories are shown as pending heal from n7-gluster1-qh2 to n6-gluster1-qh2:

~~~~
<snip> from gluster v heal info

Brick n6-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Status: Connected
Number of entries: 0

Brick n7-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
/6000-science/6040-RWJ44/space/cchung/cchung/ldg/gt4.0.7-all-source-installer/source-trees-thr/gsi/proxy/proxy_core/source/autom4te.cache
/6000-science/6040-RWJ44/space/scarassou/anaconda/pkgs/openssl-1.0.1c-0/lib
/6000-science/6040-RWJ44/space/cchung/cchung/tmp/fftw-3.0.1/.libs
/6000-science/6040-RWJ44/space/cchung/cchung/ldg/gt4.0.7-all-source-installer/source-trees-thr/gsi/proxy/proxy_ssl/source/autom4te.cache
/6000-science/6040-RWJ44/space/cchung/cchung/ldg/gt4.0.7-all-source-installer/source-trees-thr/gsi/proxy/proxy_ssl/source/doxygen
/6000-science/6040-RWJ44/space/cchung/cchung/ldg/gt4.0.7-all-source-installer/source-trees-thr/gsi/sasl/gssplugins
/6000-science/6040-RWJ44/space/scarassou/anaconda/pkgs/opencv-2.4.2-np17py27_1/lib
[.....]
/6000-science/6040-RWJ44/space/cmagoulas/LAPTOP/Library/Application Support/iDVD/Installed Themes/iDVD 6/Travel-Main+.theme/Contents/Resources
Status: Connected
Number of entries: 1027

Brick n8-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Status: Connected
Number of entries: 0
~~~~

* The glustershd log shows messages like the following:

~~~~
[2016-12-31 16:51:15.007765] I [MSGID: 108026] [afr-self-heal-entry.c:589:afr_selfheal_entry_do] 0-nfs-vol1-replicate-3: performing entry selfheal on f1f3a846-f07d-49bb-999f-d3ab78568cce
[2016-12-31 16:51:15.011284] W [MSGID: 114031] [client-rpc-fops.c:2812:client3_3_link_cbk] 0-nfs-vol1-client-6: remote operation failed: (<gfid:f3c7f4ee-9db8-4126-b9f7-14de175c5f02> -> (null)) [Invalid argument]
~~~~

* The gfid in the error above points to a symlink that exists on "n7-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1". The symlink (file) does not exist on its replica, but the gfid link does.

* The brick logs on n6-gluster1-qh2 show the errors below:

~~~~
[2017-01-08 16:13:56.011298] I [MSGID: 115062] [server-rpc-fops.c:1208:server_link_cbk] 0-nfs-vol1-server: 211804157: LINK <gfid:f3c7f4ee-9db8-4126-b9f7-14de175c5f02> (f3c7f4ee-9db8-4126-b9f7-14de175c5f02) -> f1f3a846-f07d-49bb-999f-d3ab78568cce/output.0 ==> (Invalid argument) [Invalid argument]
~~~~

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-10.el7rhgs.x86_64

How reproducible:
Happened once for the customer.

Actual results:
A large number of directories are shown as pending heal.

Expected results:
Need engineering help to resolve the heal issue.
Additional info:

Volume Name: nfs-vol1
Type: Distributed-Replicate
Volume ID: 3c0b3e98-ef93-4502-a0e4-63d5da5963f6
Status: Started
Number of Bricks: 10 x 2 = 20
Transport-type: tcp
Bricks:
Brick1: n0-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick2: n1-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick3: n2-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick4: n3-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick5: n4-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick6: n5-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick7: n6-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick8: n7-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick9: n8-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick10: n9-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick11: n10-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick12: n11-gluster1-qh2:/rhgs/bricks/brick1/bricksrv1
Brick13: n10-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick14: n11-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick15: n8-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick16: n9-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick17: n6-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick18: n7-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick19: n4-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Brick20: n5-gluster1-qh2:/rhgs/bricks/brick4/bricksrv4
Options Reconfigured:
diagnostics.client-log-level: INFO
cluster.quorum-type: auto
cluster.server-quorum-type: server
performance.readdir-ahead: on
performance.cache-size: 1GB
features.cache-invalidation: off
ganesha.enable: on
nfs.disable: on
performance.read-ahead-page-count: 8
cluster.read-hash-mode: 2
client.event-threads: 4
server.event-threads: 4
server.outstanding-rpc-limit: 256
performance.io-thread-count: 64
network.ping-timeout: 42
features.uss: disable
features.barrier: disable
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
cluster.self-heal-daemon: on
nfs.outstanding-rpc-limit: 16
diagnostics.brick-log-level: INFO
nfs-ganesha: enable
cluster.enable-shared-storage: enable
snap-activate-on-create: enable
auto-delete: enable
cluster.server-quorum-ratio: 51%
Created attachment 1242190 [details]
Brick logs from the n6 server

Created attachment 1242197 [details]
glustershd.log from the n7 server

Created attachment 1245019 [details]
tcpdump from the source server
Created attachment 1269167 [details]
gdb script to print ancestry

I was not able to do this with SystemTap, so I am attaching a gdb script for the same. The script prints the dentry list and the gfid on which the inode->parent call failed.
An update on the RCA of the issue.

The issue is seen when a hard link is created to a symlink file. Attempting the same scenario with the above script: sym1 is a symlink under "/1", and sym3, sym4, sym9, sym11... are hard links created to sym1 under "/2":

~~~~
0x7eff987a5a40]
--> [0x7effa00f8a30]/<GFID:00000000000000000000000000000001>
--> [0x7effa00f2020]2<GFID:04b5d81439b045cf9824d1f2adadd4ef>
--> [0x7effa00f03b0]sym3<GFID:1c613a545ed341f7853ffe2f184fd783>
--> [0x7effa00e7de0]sym4<GFID:1c613a545ed341f7853ffe2f184fd783>
--> [0x7effa00ad1e0]sym9<GFID:1c613a545ed341f7853ffe2f184fd783>
--> [0x7effa00f2e40]sym11<GFID:1c613a545ed341f7853ffe2f184fd783>
--> [0x7effa00d9550]sym12<GFID:1c613a545ed341f7853ffe2f184fd783>
--> [0x7effa0003ab0]sym13<GFID:1c613a545ed341f7853ffe2f184fd783>
--> [0x7effa00d8fb0]sym17<GFID:1c613a545ed341f7853ffe2f184fd783>
--> [0x7effa00f1250]/<GFID:00000000000000000000000000000001>
--> [0x7effa00f1350]sym1<GFID:1c613a545ed341f7853ffe2f184fd783>
--> [0x7effa0001990]sym2<GFID:1c613a545ed341f7853ffe2f184fd783>
~~~~

quota_build_ancestry_cbk expects successive entries to be ancestors along the path and attempts to link them. So, in the above case, we will attempt to link sym4 under sym3, sym9 under sym4, and so on. In the inode_link code we have:

~~~~
...
if (parent->ia_type != IA_IFDIR) {
        GF_ASSERT (!"link attempted on non-directory parent");
        return NULL;
}
...
~~~~

So the parent is not linked. This seems to be an issue that needs to be fixed. However, this should have errored out in the quota_build_ancestry_cbk code and not reached the statement where "parent is null" is logged. Looking further into this.
The issue is fixed by https://review.gluster.org/#/c/17730/, which has been merged upstream.
Build Number: glusterfs-3.12.2-7.el7rhgs.x86_64

With quota enabled on a distribute-replicate volume, volume heal is successful for files having symlinks and for hardlinks created to those symlinks. Hence, moving the bug to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days