Before you record your issue, ensure you are using the latest version of Gluster.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux release 7.7
glusterfs-fuse-3.12.2-47.5.el7rhgs.x86_64   Fri Nov 8 05:26:38 2019
glusterfs-server-3.12.2-47.5.el7rhgs.x86_64

Have you searched the Bugzilla archives for same/similar issues reported?
Yes

Did you run an SoS report with the Insights tool?
sos_report and abrt logs are attached to the case by the customer.

Have you discovered any workarounds? If not, read the troubleshooting documentation to help solve your issue:
https://mojo.redhat.com/groups/gss-gluster (Gluster features and their troubleshooting)
https://access.redhat.com/articles/1365073 (Specific debug data that needs to be collected for GlusterFS to help troubleshooting)

Please provide the below mandatory information:
1 - gluster v <volname> info
2 - gluster v <volname> heal info
3 - gluster v <volname> status
4 - Fuse mount/SMB/nfs-ganesha/OCS ???

Describe the issue (please be as detailed as possible, provide log snippets, and include the timestamp when the issue is seen):
The glusterfsd process is killed by SIGSEGV on all the RHV hosts when an attempt is made to mount "rhvh1.infra.ul.pmrlabs.airbus.com:/data". This is the master storage domain and it fails to activate due to this problem.

Is this issue reproducible? If yes, share more details:
Yes. Unmounting, restarting the vdsm service, and mounting again hits the same problem in VDSM.

Steps to Reproduce:
1. Unmount the volume
2. Restart the vdsm service
3. Remount the volume

Actual results:
The error returned by VDSM was: "Problem while trying to mount target"

Expected results:
Volume mounts with no issues.

Any additional info:
We tried to manually mount the volume and it works. We checked the gluster daemons, bricks and volumes and could not find any issues.
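For reference, the remount test we ran was along these lines; the local mount point below is only illustrative, and we assume the standard vdsmd systemd unit on the RHV hosts (the VDSM-managed mount itself lives under /rhev/data-center/mnt/glusterSD/):

  # umount <VDSM-managed glusterSD mount of the data volume>
  # systemctl restart vdsmd
  # mount -t glusterfs rhvh1.infra.ul.pmrlabs.airbus.com:/data /mnt/gluster-test

The manual mount in the last step succeeds; only the VDSM-driven activation of the storage domain fails.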
Hi,

As per the logs, this appears to be a known issue; it is most probably similar to bug https://bugzilla.redhat.com/show_bug.cgi?id=1917488. I can confirm more after checking the coredump once the setup is available.

[2022-09-20 11:33:05.591614] E [MSGID: 133010] [shard.c:2299:shard_common_lookup_shards_cbk] 0-data-shard: Lookup on shard 1729 failed. Base file gfid = 98f326c2-6a81-48c1-81e5-d93b41edb543 [Stale file handle]
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2022-09-20 11:33:05
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x9d)[0x7f6df6b11bdd]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f6df6b1c154]
/lib64/libc.so.6(+0x363f0)[0x7f6df514b3f0]
/lib64/libuuid.so.1(+0x2570)[0x7f6df6272570]
/lib64/libuuid.so.1(+0x2606)[0x7f6df6272606]
/lib64/libglusterfs.so.0(uuid_utoa+0x1c)[0x7f6df6b1b2ec]

This is a known issue and we have already backported the patch to a downstream release (6.0.57). The fuse process crashes due to a bug in write-behind while truncating a file. The patch is merged in the downstream build (glusterfs-fuse-6.0-57, from bug https://bugzilla.redhat.com/show_bug.cgi?id=1917488). Either the user has to upgrade to the latest downstream release, or we can suggest disabling write-behind to avoid the crash.

Thanks,
Mohit Agrawal
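(For reference, if the write-behind workaround is chosen, it can be applied per volume from the gluster CLI. The volume name "data" below is inferred from the "0-data-shard" log prefix and the mount spec in the report, and should be confirmed against "gluster v info" first:

  # gluster volume set data performance.write-behind off
  # gluster volume get data performance.write-behind

Note this only avoids the write-behind crash; it does not require remounting but also does not cover the shard crash analysed below.)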
Hi,

Thanks for sharing the environment to debug the core. The client process is crashing because the shard xlator is trying to access an inode that has already been unlinked while shard re-attempts its background cleanup during the remount. It is a known issue and is already fixed in release glusterfs-6.0.35 (https://bugzilla.redhat.com/show_bug.cgi?id=1836233).

(gdb) bt
#0  0x00007f916ca2e570 in uuid_unpack () from /lib64/libuuid.so.1
#1  0x00007f916ca2e606 in uuid_unparse_x () from /lib64/libuuid.so.1
#2  0x00007f916d2d72ec in gf_uuid_unparse (out=0x7f9130006cd0 "98f326c2-6a81-48c1-81e5-d93b41edb543", uuid=0x8 <Address 0x8 out of bounds>) at compat-uuid.h:57
#3  uuid_utoa (uuid=0x8 <Address 0x8 out of bounds>) at common-utils.c:2852
#4  0x00007f915e805596 in shard_post_lookup_shards_unlink_handler (frame=<optimized out>, this=0x7f915801e8d0) at shard.c:2915
#5  0x00007f915e803fa5 in shard_common_lookup_shards (frame=frame@entry=0x7f914801b598, this=this@entry=0x7f915801e8d0, inode=<optimized out>, handler=handler@entry=0x7f915e805540 <shard_post_lookup_shards_unlink_handler>) at shard.c:2458
#6  0x00007f915e80561c in shard_post_resolve_unlink_handler (frame=frame@entry=0x7f914801b598, this=this@entry=0x7f915801e8d0) at shard.c:2939
#7  0x00007f915e801b47 in shard_common_resolve_shards (frame=frame@entry=0x7f914801b598, this=this@entry=0x7f915801e8d0, post_res_handler=post_res_handler@entry=0x7f915e8055f0 <shard_post_resolve_unlink_handler>) at shard.c:1069
#8  0x00007f915e805721 in shard_regulated_shards_deletion (cleanup_frame=cleanup_frame@entry=0x7f914801b598, this=this@entry=0x7f915801e8d0, now=now@entry=100, first_block=first_block@entry=1701, entry=entry@entry=0x7f914c021c30) at shard.c:3178
#9  0x00007f915e805d84 in __shard_delete_shards_of_entry (cleanup_frame=cleanup_frame@entry=0x7f914801b598, this=this@entry=0x7f915801e8d0, entry=entry@entry=0x7f914c021c30, inode=inode@entry=0x7f914c00f888) at shard.c:3339
#10 0x00007f915e806196 in shard_delete_shards_of_entry (cleanup_frame=cleanup_frame@entry=0x7f914801b598, this=this@entry=0x7f915801e8d0, entry=entry@entry=0x7f914c021c30, inode=inode@entry=0x7f914c00f888) at shard.c:3395
#11 0x00007f915e80687f in shard_delete_shards (opaque=0x7f914801b598) at shard.c:3619
#12 0x00007f916d307840 in synctask_wrap () at syncop.c:375
#13 0x00007f916b919180 in ?? () from /lib64/libc.so.6
#14 0x0000000000000000 in ?? ()
(gdb) f 4
#4  0x00007f915e805596 in shard_post_lookup_shards_unlink_handler (frame=<optimized out>, this=0x7f915801e8d0) at shard.c:2915
2915            gf_msg (this->name, GF_LOG_ERROR, local->op_errno,
(gdb) l
2910        shard_local_t *local = NULL;
2911
2912        local = frame->local;
2913
2914        if ((local->op_ret < 0) && (local->op_errno != ENOENT)) {
2915                gf_msg (this->name, GF_LOG_ERROR, local->op_errno,
2916                        SHARD_MSG_FOP_FAILED, "failed to delete shards of %s",
2917                        uuid_utoa (local->resolver_base_inode->gfid));
2918                return 0;
2919        }
(gdb) p local->resolver_base_inode
$2 = (inode_t *) 0x0
(gdb) p local->resolver_base_inode->gfid
Cannot access memory at address 0x8

Can we ask the customer to upgrade the environment to avoid the crash? The earlier suggested workaround (disabling write-behind) will not help in this case. They were also facing that issue (its traceback was captured in the logs, although no coredump is available for it), so we should suggest upgrading the environment to release 6.0.57 or later to avoid both issues.

Thanks,
Mohit Agrawal
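To make the failure mode concrete: in the core, local->resolver_base_inode is NULL, so the uuid_utoa (local->resolver_base_inode->gfid) call in frame #4 dereferences offset 8 of a NULL pointer, which is exactly the "Cannot access memory at address 0x8" above. Below is a minimal, self-contained sketch of that pattern; it is not GlusterFS source and not the actual patch from bug 1836233 (which resolves the base inode correctly instead of guarding the log call), and the struct names and layout are simplified stand-ins.

  /* build with: gcc sketch.c -luuid */
  #include <stdio.h>
  #include <stdint.h>
  #include <uuid/uuid.h>

  typedef struct {
          uint64_t ref;   /* first member, so gfid lands at offset 8 */
          uuid_t   gfid;  /* matches the faulting address 0x8 in the core */
  } fake_inode_t;

  typedef struct {
          int           op_ret;
          int           op_errno;
          fake_inode_t *resolver_base_inode; /* NULL when shard retries cleanup after remount */
  } fake_shard_local_t;

  /* Guarded version of the logging pattern at shard.c:2914-2918; the guard
   * only illustrates why the unpatched code segfaults. */
  static const char *
  gfid_or_placeholder (fake_inode_t *inode, char *buf)
  {
          if (!inode)
                  return "<base inode not resolved>";
          uuid_unparse (inode->gfid, buf); /* same libuuid path that crashed via uuid_utoa() */
          return buf;
  }

  int
  main (void)
  {
          fake_shard_local_t local = {
                  .op_ret = -1,
                  .op_errno = 116,             /* ESTALE, as in the "Stale file handle" log */
                  .resolver_base_inode = NULL, /* the condition found in the coredump */
          };
          char buf[37];

          if (local.op_ret < 0 && local.op_errno != 2 /* ENOENT */)
                  fprintf (stderr, "failed to delete shards of %s\n",
                           gfid_or_placeholder (local.resolver_base_inode, buf));
          return 0;
  }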
*** This bug has been marked as a duplicate of bug 1836233 ***