Expanded my volume from 10*(4+2) to 12*(4+2) while running Linux kernel untars and lookups from 6 gNFS mounts, then triggered rebalance. I/O errored out on 3/3 clients with ESTALE.

On gqac007 (mounted via gqas008.sbu.lab.eng.bos.redhat.com):

<snip>
tar: linux-4.13-rc3/arch/arm64/boot/dts: Cannot stat: Stale file handle
tar: linux-4.13-rc3/arch/arm64/boot: Cannot stat: Stale file handle
tar: linux-4.13-rc3/arch/arm64: Cannot stat: Stale file handle
tar: linux-4.13-rc3/arch: Cannot stat: Stale file handle
tar: linux-4.13-rc3: Cannot stat: Stale file handle
tar: Exiting with failure status due to previous errors
</snip>

On gqac024 (mounted via gqas013.sbu.lab.eng.bos.redhat.com):

<snip>
tar: linux-4.13-rc3/arch/arm64/boot: Cannot stat: Stale file handle
tar: linux-4.13-rc3/arch/arm64: Cannot stat: Stale file handle
tar: linux-4.13-rc3/arch: Cannot stat: Stale file handle
tar: linux-4.13-rc3: Cannot stat: Stale file handle
tar: Exiting with failure status due to previous errors
</snip>

On gqac008 (mounted via gqas007.sbu.lab.eng.bos.redhat.com):

<snip>
tar: linux-4.13-rc3/arch/arm64/boot: Cannot stat: Stale file handle
tar: linux-4.13-rc3/arch/arm64: Cannot stat: Stale file handle
tar: linux-4.13-rc3/arch: Cannot stat: Stale file handle
tar: linux-4.13-rc3: Cannot stat: Stale file handle
tar: Exiting with failure status due to previous errors
</snip>

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
3.8.4-38

How reproducible:
-----------------
1/1

Actual results:
---------------
The application was affected: I/O errored out with ESTALE.

Expected results:
-----------------
Successful I/O completion.

Additional info:

Volume Name: butcher
Type: Distributed-Disperse
Volume ID: dbd4e36f-28de-445e-a5cd-c78d0a467fb6
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick5: gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick6: gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick7: gqas013.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick8: gqas005.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick9: gqas006.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick10: gqas008.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick11: gqas003.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick12: gqas007.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick13: gqas013.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick14: gqas005.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick15: gqas006.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick16: gqas008.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick17: gqas003.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick18: gqas007.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick19: gqas013.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick20: gqas005.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick21: gqas006.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick22: gqas008.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick23: gqas003.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick24: gqas007.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick25: gqas013.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick26: gqas005.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick27: gqas006.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick28: gqas008.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick29: gqas003.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick30: gqas007.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick31: gqas013.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick32: gqas005.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick33: gqas006.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick34: gqas008.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick35: gqas003.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick36: gqas007.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick37: gqas013.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick38: gqas005.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick39: gqas006.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick40: gqas008.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick41: gqas003.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick42: gqas007.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick43: gqas013.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick44: gqas005.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick45: gqas006.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick46: gqas008.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick47: gqas003.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick48: gqas007.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick49: gqas013.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick50: gqas005.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick51: gqas006.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick52: gqas008.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick53: gqas003.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick54: gqas007.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick55: gqas013.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick56: gqas005.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick57: gqas006.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick58: gqas008.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick59: gqas003.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick60: gqas007.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick61: gqas013.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick62: gqas005.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick63: gqas006.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick64: gqas008.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick65: gqas003.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick66: gqas007.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick67: gqas013.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick68: gqas005.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick69: gqas006.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick70: gqas008.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick71: gqas003.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick72: gqas007.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
features.uss: enable
client.event-threads: 4
server.event-threads: 4
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
transport.address-family: inet
nfs.disable: off

Below are a few filters I used to analyze the packet trace:

(tcp.dstport != 49162) && (glusterfs.bname == "dir2")
(glusterfs) && (glusterfs.proc == 27)
((glusterfs) && (glusterfs.proc == 27)) && (gluster.op_errno != 0)

Since I couldn't find any LOOKUP failures (for 'dir2') sent from the brick servers, I suspect the issue is in gf_svc_lookup: there could be cases in which it forwards the LOOKUP to the virtual namespace instead of first trying the regular graph. Request Rafi to comment.
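For context, this is how the "Cannot stat: Stale file handle" failures above surface to the application: stat() on a path under the gNFS mount returns -1 with errno set to ESTALE. A minimal sketch follows; the mount path used here is a placeholder, not taken from this setup.

/* Illustration only: how tar's stat() calls see ESTALE on the NFS mount. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

int
main (void)
{
        struct stat st;
        /* hypothetical path under the gNFS mount point */
        const char *path = "/mnt/butcher/linux-4.13-rc3/arch";

        if (stat (path, &st) == -1) {
                if (errno == ESTALE)
                        fprintf (stderr, "Cannot stat %s: %s\n",
                                 path, strerror (errno));
                else
                        perror ("stat");
                return 1;
        }

        printf ("stat succeeded: inode %llu\n",
                (unsigned long long) st.st_ino);
        return 0;
}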
From code inspection so far, the checks in place may not be sufficient, e.g.:

static int32_t
gf_svc_lookup (call_frame_t *frame, xlator_t *this, loc_t *loc,
               dict_t *xdata)
{
        ...
        ...
        ret = svc_inode_ctx_get (this, loc->inode, &inode_type);

        if (!__is_root_gfid (loc->gfid)) {
                if (loc->parent) {
                        parent = inode_ref (loc->parent);
                        ret = svc_inode_ctx_get (this, loc->parent,
                                                 &parent_type);
                } else {
                        parent = inode_parent (loc->inode, loc->pargfid, NULL);
                        if (parent)
                                ret = svc_inode_ctx_get (this, parent,
                                                         &parent_type);

>>> We do not seem to be checking the 'ret' value here. What if parent_type is not yet set in the inode ctx?

                }
        }
        ...
        ...
        if (!loc->name) {
                if (gf_uuid_is_null (loc->inode->gfid)) {
                        subvolume = FIRST_CHILD (this);
                        local->subvolume = subvolume;
                        wind = _gf_true;
                        goto out;
                } else {
                        if (inode_type >= 0)
                                subvolume = svc_get_subvolume (this,
                                                               inode_type);
                        else
                                subvolume = FIRST_CHILD (this);
                        local->subvolume = subvolume;
                        wind = _gf_true;
                        goto out;
                }
        }

        if (strcmp (loc->name, priv->path)) {
                if (parent_type == NORMAL_INODE) {
                        subvolume = FIRST_CHILD (this);
                        local->subvolume = subvolume;
                } else {

>>> Do we need to check whether parent_type is actually set to VIRTUAL_INODE?

                        subvolume = SECOND_CHILD (this);
                        local->subvolume = subvolume;
                }
        } else {
                subvolume = SECOND_CHILD (this);
                local->subvolume = subvolume;
                if (parent_type == NORMAL_INODE) {
                        /* Indication of whether the lookup is happening
                           on the entry point or not, to the
                           snapview-server. */
                        SVC_ENTRY_POINT_SET (this, xdata, op_ret, op_errno,
                                             new_xdata, priv, ret, out);
                }
        }

--- Additional comment from Soumya Koduri on 2017-08-10 05:03:37 EDT ---

From the nfs.log provided in comment#14 - http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1478716/Aug09_Debug/

Around the time the nfs xlator logged ESTALE errors:

[2017-08-09 14:32:22.109572] D [MSGID: 0] [client-rpc-fops.c:2948:client3_3_lookup_cbk] 0-stack-trace: stack-address: 0x7f88d007dd00, butcher-snapd-client returned -1 error: Stale file handle [Stale file handle]
[2017-08-09 14:32:22.109605] D [snapview-client.c:289:gf_svc_lookup_cbk] 0-butcher-snapview-client: Lookup failed on snapview graph with error Stale file handle

>>>> If the lookup had been sent to the regular graph first, we should see the corresponding "Lookup failed on normal graph with error ..." message recorded as well.
>>>> But that does not seem to be the case.

[2017-08-09 14:32:22.109634] D [MSGID: 0] [snapview-client.c:329:gf_svc_lookup_cbk] 0-stack-trace: stack-address: 0x7f88d007dd00, butcher-snapview-client returned -1 error: Stale file handle [Stale file handle]
[2017-08-09 14:32:22.109664] D [MSGID: 0] [io-stats.c:2191:io_stats_lookup_cbk] 0-stack-trace: stack-address: 0x7f88d007dd00, butcher returned -1 error: Stale file handle [Stale file handle]
[2017-08-09 14:32:22.109685] E [MSGID: 112198] [nfs3-helpers.c:3691:nfs3_fh_resolve_inode_lookup_cbk] 0-nfs-nfsv3: Lookup failed: /d4/dir1: Stale file handle [Stale file handle]
[2017-08-09 14:32:22.109719] E [MSGID: 112069] [nfs3.c:1692:nfs3_access_resume] 0-nfs-nfsv3: Stale file handle: (192.168.79.149:992) butcher : 8694b0d6-8fba-40e3-83b9-c45d0f1ca515
[2017-08-09 14:32:22.109747] W [MSGID: 112199] [nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: /d4/dir1 => (XID: 6be86, ACCESS: NFS: 70(Invalid file handle), POSIX: 14(Bad address))
[2017-08-09 14:32:22.110010] D [MSGID: 0] [nfs3-helpers.c:1646:nfs3_log_common_call] 0-nfs-nfsv3: XID: 106be86, ACCESS: args: FH: exportid 1ff111d1-32be-4590-9b9b-b1b756e29ba9, gfid 8694b0d6-8fba-40e3-83b9-c45d0f1ca515, mountid e4706d5b-0000-0000-0000-000000000000
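To make the concern above concrete, here is a small standalone model of that branch. This is not GlusterFS source: svc_inode_ctx_get is stubbed out based on how the quoted code uses it, assuming it returns non-zero and leaves the out parameter untouched when no ctx has been set on the inode (i.e. the parent was never looked up through this graph). With the parent's type still unknown, the strcmp path falls into the else branch and picks the second child, i.e. the snapview graph, which would explain an ESTALE for an entry that exists only on the normal graph.

#include <stdio.h>

enum { NORMAL_INODE = 1, VIRTUAL_INODE = 2 };

/* Stub modelling an inode whose ctx was never set: the out parameter
 * is not written, and a non-zero value is returned. (Assumption based
 * on the usage in the quoted gf_svc_lookup code.) */
static int
svc_inode_ctx_get_stub (int *type)
{
        (void) type;
        return -1;              /* "no ctx found" */
}

int
main (void)
{
        int parent_type = -1;   /* unknown until a ctx-get succeeds */
        int ret;

        ret = svc_inode_ctx_get_stub (&parent_type);
        printf ("ctx-get returned %d, parent_type = %d\n", ret, parent_type);

        /* The quoted code does not check 'ret' before using parent_type,
         * so an unresolved parent takes the else branch even though it
         * was never classified as VIRTUAL_INODE. */
        if (parent_type == NORMAL_INODE)
                printf ("wind to FIRST_CHILD (normal graph)\n");
        else
                printf ("wind to SECOND_CHILD (snapview graph) "
                        "-> ESTALE if the entry exists only on the normal graph\n");

        return 0;
}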
REVIEW: https://review.gluster.org/18028 (uss/svc: Send the first lookup to the normal graph) posted (#1) for review on master by mohammed rafi kc (rkavunga)
COMMIT: https://review.gluster.org/18028 committed in master by mohammed rafi kc (rkavunga)
------
commit d396d358d4f0cfe87693179cfd13eb2a84ce62c2
Author: Mohammed Rafi KC <rkavunga>
Date:   Fri Aug 11 17:50:52 2017 +0530

    uss/svc: Send the first lookup to the normal graph

    When parent inode is not looked up before, the inode type won't be
    set. In this scenario we should send the lookup to the normal graph
    first.

    Change-Id: I5d3e53f5c1c33a17be32204f3114749d9b1e6db8
    BUG: 1480591
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: https://review.gluster.org/18028
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra Bhat <raghavendra>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: soumya k <skoduri>
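As a rough illustration of what that commit message describes (this is not the patch itself; see the Gerrit change above for the actual diff): when the parent's inode type cannot be read from the inode ctx, the lookup should default to the normal graph first rather than being wound to the virtual graph. A minimal sketch of that selection logic, with hypothetical names:

#include <stdio.h>

enum { UNKNOWN_INODE = -1, NORMAL_INODE = 1, VIRTUAL_INODE = 2 };

/* Pick the lookup target, defaulting to the normal graph whenever the
 * parent's type could not be determined (ctx-get failed or type unset). */
static const char *
pick_subvolume (int ctx_ret, int parent_type)
{
        if (ctx_ret != 0 || parent_type == UNKNOWN_INODE)
                return "FIRST_CHILD (normal graph)";

        if (parent_type == NORMAL_INODE)
                return "FIRST_CHILD (normal graph)";

        return "SECOND_CHILD (snapview graph)";
}

int
main (void)
{
        /* Parent never looked up before: go to the normal graph first
         * instead of failing with ESTALE on the snapview graph. */
        printf ("unknown parent : %s\n", pick_subvolume (-1, UNKNOWN_INODE));

        /* Parent already known to belong to a snapshot: the snapview
         * graph remains the right target. */
        printf ("virtual parent : %s\n", pick_subvolume (0, VIRTUAL_INODE));

        return 0;
}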
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.13.0, please open a new bug report.

glusterfs-3.13.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-December/000087.html
[2] https://www.gluster.org/pipermail/gluster-users/