| Summary: | [Tiering]: IO failure with 'Remote I/O error' on NFS-Ganesha -V4 mount on a tiered volume | ||
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | krishnaram Karthick <kramdoss> |
| Component: | tier | Assignee: | Jiffin <jthottan> |
| Status: | CLOSED WONTFIX | QA Contact: | krishnaram Karthick <kramdoss> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rhgs-3.1 | CC: | kkeithle, mzywusko, nbalacha, ndevos, rhs-bugs, rkavunga, skoduri |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-02-06 17:52:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
sosreports are available here --> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1331493/ There is an existing bug for 'inode ctx is NULL' error message --> https://bugzilla.redhat.com/show_bug.cgi?id=1324338 This error msgs can be ignored. Thank you for your bug report. We are not further root causing this bug, as a result this bug is being closed as WONTFIX. Please reopen if the problem continues to be observed after upgrading to a latest version. |
Description of problem: On a tiered volume configured over 4 node gluster cluster, all IO operations halted and errored with 'Remote I/O error' from ganesha-v4 mount. Any lookup/read/write operation on the mountpoint fails with 'Remote I/O error'. gluster v info ganesha-tier Volume Name: ganesha-tier Type: Tier Volume ID: 5dc054c0-b15c-49dc-9494-38bd04d05819 Status: Started Number of Bricks: 8 Transport-type: tcp Hot Tier : Hot Tier Type : Replicate Number of Bricks: 1 x 2 = 2 Brick1: 10.70.47.156:/bricks/brick1/l1 Brick2: 10.70.47.156:/bricks/brick0/l1 Cold Tier: Cold Tier Type : Disperse Number of Bricks: 1 x (4 + 2) = 6 Brick3: 10.70.47.192:/bricks/brick0/l1 Brick4: 10.70.47.178:/bricks/brick0/l1 Brick5: 10.70.47.160:/bricks/brick0/l1 Brick6: 10.70.47.192:/bricks/brick1/l1 Brick7: 10.70.47.178:/bricks/brick1/l1 Brick8: 10.70.47.160:/bricks/brick1/l1 Options Reconfigured: cluster.watermark-hi: 10 cluster.watermark-low: 5 cluster.tier-mode: cache features.ctr-enabled: on performance.readdir-ahead: on nfs.disable: on features.cache-invalidation: on ganesha.enable: on features.quota: off features.inode-quota: off cluster.enable-shared-storage: enable nfs-ganesha: enable log snippet from brick logs of hot tier ========================================= [2016-04-28 20:30:29.005256] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-ganesha-tier-posix: null gfid for path (null) [2016-04-28 20:30:29.005306] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-ganesha-tier-posix: lstat on null failed [Invalid argument] The message "E [MSGID: 113091] [posix.c:178:posix_lookup] 0-ganesha-tier-posix: null gfid for path (null)" repeated 3 times between [2016-04-28 20:30:29.005256] and [2016-04-28 20:30:29.008873] The message "E [MSGID: 113018] [posix.c:196:posix_lookup] 0-ganesha-tier-posix: lstat on null failed [Invalid argument]" repeated 3 times between [2016-04-28 20:30:29.005306] and [2016-04-28 20:30:29.008876] [2016-04-28 20:40:30.002214] I [dict.c:473:dict_get] (-->/lib64/libglusterfs.so.0(default_getxattr_cbk+0xac) [0x7f136ea53d1c] -->/usr/lib64/glusterfs/3.7.9/xlator/features/marker.so(marker_getxattr_cbk+0xa7) [0x7f135af54917] -->/lib64/libglusterfs.so.0(dict_get+0xac) [0x7f136ea4415c] ) 0-dict: !this || key=() [Invalid argument] [2016-04-28 20:40:30.005899] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-ganesha-tier-posix: null gfid for path (null) [2016-04-28 20:40:30.005956] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-ganesha-tier-posix: lstat on null failed [Invalid argument] The message "E [MSGID: 113091] [posix.c:178:posix_lookup] 0-ganesha-tier-posix: null gfid for path (null)" repeated 3 times between [2016-04-28 20:40:30.005899] and [2016-04-28 20:40:30.009422] The message "E [MSGID: 113018] [posix.c:196:posix_lookup] 0-ganesha-tier-posix: lstat on null failed [Invalid argument]" repeated 3 times between [2016-04-28 20:40:30.005956] and [2016-04-28 20:40:30.009428] log snippet from bricks of cold tier ====================================== l_get+0x5e)[0x7fd89d85790e] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(posix_acl_lookup_cbk+0x1e5)[0x7fd89d85b515] ))))) 0-ganesha-tier-access-control: inode ctx is NULL for 00000000-0000-0000-0000-000000000000 [2016-04-28 21:43:00.174049] E [posix-acl.c:357:posix_acl_ctx_get] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7fd8aca36b42] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(posix_acl_ctx_get+0x88)[0x7fd89d8573c8] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(__posix_acl_get+0x11)[0x7fd89d857541] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(posix_acl_get+0x5e)[0x7fd89d85790e] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(posix_acl_lookup_cbk+0x1e5)[0x7fd89d85b515] ))))) 0-ganesha-tier-access-control: inode ctx is NULL for 00000000-0000-0000-0000-000000000000 [2016-04-28 21:43:00.761623] E [posix-acl.c:357:posix_acl_ctx_get] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7fd8aca36b42] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(posix_acl_ctx_get+0x88)[0x7fd89d8573c8] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(__posix_acl_get+0x11)[0x7fd89d857541] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(posix_acl_get+0x5e)[0x7fd89d85790e] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(posix_acl_lookup_cbk+0x1e5)[0x7fd89d85b515] ))))) 0-ganesha-tier-access-control: inode ctx is NULL for 00000000-0000-0000-0000-000000000000 [2016-04-28 21:43:01.354477] E [posix-acl.c:357:posix_acl_ctx_get] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7fd8aca36b42] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(posix_acl_ctx_get+0x88)[0x7fd89d8573c8] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(__posix_acl_get+0x11)[0x7fd89d857541] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(posix_acl_get+0x5e)[0x7fd89d85790e] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(posix_acl_lookup_cbk+0x1e5)[0x7fd89d85b515] ))))) 0-ganesha-tier-access-control: inode ctx is NULL for 00000000-0000-0000-0000-000000000000 [2016-04-28 21:43:01.361863] E [posix-acl.c:357:posix_acl_ctx_get] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7fd8aca36b42] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(posix_acl_ctx_get+0x88)[0x7fd89d8573c8] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(__posix_acl_get+0x11)[0x7fd89d857541] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(posix_acl_get+0x5e)[0x7fd89d85790e] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(posix_acl_lookup_cbk+0x1e5)[0x7fd89d85b515] ))))) 0-ganesha-tier-access-control: inode ctx is NULL for 00000000-0000-0000-0000-000000000000 [2016-04-28 21:43:03.989918] E [posix-acl.c:357:posix_acl_ctx_get] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7fd8aca36b42] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(posix_acl_ctx_get+0x88)[0x7fd89d8573c8] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(__posix_acl_get+0x11)[0x7fd89d857541] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(posix_acl_get+0x5e)[0x7fd89d85790e] (--> /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so(posix_acl_lookup_cbk+0x1e5)[0x7fd89d85b515] ))))) 0-ganesha-tier-access-control: inode ctx is NULL for 00000000-0000-0000-0000-000000000000 log snippet from /var/log/ganesha.log ======================================= 28/04/2016 20:39:51 : epoch 572216da : dhcp47-192.lab.eng.blr.redhat.com : ganesha.nfsd-15184[work-4] cache_inode_readdir :NFS READDIR :CRIT :cache_inode_lock_trust_attrs returned CACHE_INODE_SERVERFAULT for file-1 - bailing out 28/04/2016 20:39:54 : epoch 572216da : dhcp47-192.lab.eng.blr.redhat.com : ganesha.nfsd-15184[work-5] cache_inode_readdir :NFS READDIR :CRIT :cache_inode_lock_trust_attrs returned CACHE_INODE_SERVERFAULT for file-1 - bailing out 28/04/2016 20:41:43 : epoch 572216da : dhcp47-192.lab.eng.blr.redhat.com : ganesha.nfsd-15184[work-13] cache_inode_readdir :NFS READDIR :CRIT :cache_inode_lock_trust_attrs returned CACHE_INODE_SERVERFAULT for file-1 - bailing out 28/04/2016 20:41:49 : epoch 572216da : dhcp47-192.lab.eng.blr.redhat.com : ganesha.nfsd-15184[work-8] cache_inode_readdir :NFS READDIR :CRIT :cache_inode_lock_trust_attrs returned CACHE_INODE_SERVERFAULT for file-1 - bailing out 28/04/2016 20:42:00 : epoch 572216da : dhcp47-192.lab.eng.blr.redhat.com : ganesha.nfsd-15184[work-7] cache_inode_readdir :NFS READDIR :CRIT :cache_inode_lock_trust_attrs returned CACHE_INODE_SERVERFAULT for file-1 - bailing out Version-Release number of selected component (if applicable): glusterfs-server-3.7.9-2.el7rhgs.x86_64 How reproducible: Yet to determine Steps to Reproduce: 1. create a 4+2 dispersed volume from 3 nodes [2 bricks from each node] 2. Mount the volume using NFS-ganesha v4 protocol 3. create 50k files, 1k directories & perform few rename operations 4. untar linux kernel from another v4 mount 5. Attach a 1x2 hot tier 6. write 10 x 1Gb files from the mount point 7. Perform a lookup on all the dirs and sub-dirs on this volume 8. Allow fix layout to complete on the hot tier Actual results: File writes initiated in step 6, fails with 'Remote IO error'. Expected results: No IO error should be seen Additional info: sosreports and statedumps shall be attached