Bug 1694595
| Summary: | gluster fuse mount crashed, when deleting 2T image file from RHV Manager UI | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | SATHEESARAN <sasundar> |
| Component: | sharding | Assignee: | Krutika Dhananjay <kdhananj> |
| Status: | CLOSED ERRATA | QA Contact: | SATHEESARAN <sasundar> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | rhgs-3.4 | CC: | amukherj, bkunal, jahernan, kdhananj, pasik, rhs-bugs, sabose, sheggodu, storage-qa-internal, ykaul |
| Target Milestone: | --- | | |
| Target Release: | RHGS 3.5.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-6.0-5 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1694604 1696136 (view as bug list) | Environment: | |
| Last Closed: | 2019-10-30 12:20:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1694604, 1696136, 1696807 | | |
Description
SATHEESARAN
2019-04-01 08:32:45 UTC
    frame : type(0) op(0)
    frame : type(0) op(0)
    patchset: git://git.gluster.org/glusterfs.git
    signal received: 11
    time of crash: 2019-04-01 07:57:53
    configuration details:
    argp 1
    backtrace 1
    dlfcn 1
    libpthread 1
    llistxattr 1
    setfsid 1
    spinlock 1
    epoll.h 1
    xattr.h 1
    st_atim.tv_nsec 1
    package-string: glusterfs 3.12.2
    /lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x9d)[0x7fc72c186b9d]
    /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fc72c191114]
    /lib64/libc.so.6(+0x36280)[0x7fc72a7c2280]
    /usr/lib64/glusterfs/3.12.2/xlator/features/shard.so(+0x9627)[0x7fc71f8ba627]
    /usr/lib64/glusterfs/3.12.2/xlator/features/shard.so(+0x9ef1)[0x7fc71f8baef1]
    /usr/lib64/glusterfs/3.12.2/xlator/cluster/distribute.so(+0x3ae9c)[0x7fc71fb15e9c]
    /usr/lib64/glusterfs/3.12.2/xlator/cluster/replicate.so(+0x9e8c)[0x7fc71fd88e8c]
    /usr/lib64/glusterfs/3.12.2/xlator/cluster/replicate.so(+0xb79b)[0x7fc71fd8a79b]
    /usr/lib64/glusterfs/3.12.2/xlator/cluster/replicate.so(+0xc226)[0x7fc71fd8b226]
    /usr/lib64/glusterfs/3.12.2/xlator/protocol/client.so(+0x17cbc)[0x7fc72413fcbc]
    /lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7fc72bf2ca00]
    /lib64/libgfrpc.so.0(rpc_clnt_notify+0x26b)[0x7fc72bf2cd6b]
    /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fc72bf28ae3]
    /usr/lib64/glusterfs/3.12.2/rpc-transport/socket.so(+0x7586)[0x7fc727043586]
    /usr/lib64/glusterfs/3.12.2/rpc-transport/socket.so(+0x9bca)[0x7fc727045bca]
    /lib64/libglusterfs.so.0(+0x8a870)[0x7fc72c1e5870]
    /lib64/libpthread.so.0(+0x7dd5)[0x7fc72afc2dd5]
    /lib64/libc.so.6(clone+0x6d)[0x7fc72a889ead]

Upstream patches - https://review.gluster.org/q/topic:%22ref-1696136%22+(status:open%20OR%20status:merged)

Patch https://review.gluster.org/c/glusterfs/+/22517 fixes the original bug reported by Satheesaran. I also identified another bug while debugging this original issue, which is fixed here: https://review.gluster.org/c/glusterfs/+/22507. The commit message and the .t should explain what it fixes and how to hit this crash.

There's also a third crash I found while reading the code that is harder to hit: it will be hit only when the lru list is filled with a mix of shards from more than 160-170 different VM images per hypervisor, each of them larger than 6GB, and they are all created with preallocation in parallel and immediately deleted in parallel. I have yet to fix it because it's a harder problem to solve: the very shards that are required in a deletion operation could end up evicting and inode_unlink()ing the other participant shards of the same image, leading to incorrect behavior. In my tests, at best I could see a crash, but the unlink succeeded. I'm surprised unlink even worked; I need to debug why. Moving the bz to POST in any case.
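The third crash described above is essentially a bounded LRU cache evicting entries that an in-flight, multi-step delete still needs. The toy Python sketch below illustrates only that hazard; it is not the shard translator's C code, and the cache limit, shard names, and helper functions are invented for illustration.

```python
from collections import OrderedDict

class InodeLRU:
    """Toy bounded cache of resolved shard "inodes" (conceptual only)."""

    def __init__(self, limit):
        self.limit = limit
        self.entries = OrderedDict()

    def add(self, name):
        # Insert/refresh the entry, then drop least-recently-used entries
        # once the cap is exceeded (loosely analogous to the translator
        # inode_unlink()-ing shards it evicts from its lru list).
        self.entries[name] = object()
        self.entries.move_to_end(name)
        evicted = []
        while len(self.entries) > self.limit:
            victim, _ = self.entries.popitem(last=False)
            evicted.append(victim)
        return evicted


def delete_image(image, shard_count, cache):
    """Resolve every shard of `image`, as a background delete would."""
    resolved = []
    for i in range(1, shard_count + 1):
        shard = f"{image}.{i}"
        lost = cache.add(shard)   # resolving shard i may evict an earlier shard
        resolved.append(shard)
        for victim in lost:
            if victim in resolved:
                # The delete evicted a shard it has already resolved and still
                # needs -- in the real translator this is where a stale inode
                # can later be dereferenced and crash.
                print(f"hazard: {victim} evicted while {image} is still being deleted")


cache = InodeLRU(limit=16)        # deliberately tiny; the real lru limit is far larger
delete_image("vm-disk-01", shard_count=40, cache=cache)
```

Once the number of shards touched by a single deletion exceeds the cache limit, the operation starts evicting its own working set, which matches the "shards from many large images created and deleted in parallel" precondition described above.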
Krutika - we need to get this bug fixed in the early stage of development. I see that there's one patch which is pending review. Can we please ensure this patch is merged and backports are done so that this BZ can move to ON_QA in the next build of RHGS 3.5.0?

(In reply to Atin Mukherjee from comment #5)
> Krutika - we need to get this bug fixed in the early stage of development. I see that there's one patch which is pending review. Can we please ensure this patch is merged and backports are done so that this BZ can move to ON_QA in the next build of RHGS 3.5.0?

Ack. Will ping Xavi for review.

Tested with RHVH 4.3.5 based on RHEL 7.7 with glusterfs-6.0-7, with 2 test scenarios:

1. Created multiple preallocated raw images with their aggregate size exceeding 2TB and deleted them all together (concurrently).
2. Created multiple 2TB preallocated raw images and deleted them concurrently.

In both of the above scenarios, the deletion of VM images was smooth, no issues were seen, all hosts remained operational, and the DC was fully functional.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249
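As a closing note, the verification scenarios above could be approximated outside the RHV Manager UI by preallocating several large raw files on the FUSE mount and removing them concurrently. The sketch below is one assumed way to drive that pattern from a script; the mount path, image count, and sizes are placeholders, not the values QE actually used.

```python
import os
from concurrent.futures import ThreadPoolExecutor

MOUNT = "/mnt/glustervol"          # placeholder FUSE mount point of the sharded volume
COUNT = 8                          # number of images created/deleted in parallel
SIZE = 64 * 1024 ** 3              # 64 GiB each here; the original test used 2TB images

def create_preallocated(path):
    """Create a fully preallocated raw file, like a preallocated RHV disk."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        os.posix_fallocate(fd, 0, SIZE)
    finally:
        os.close(fd)

paths = [os.path.join(MOUNT, f"image-{i}.raw") for i in range(COUNT)]

# Create the preallocated images in parallel...
with ThreadPoolExecutor(max_workers=COUNT) as pool:
    list(pool.map(create_preallocated, paths))

# ...then delete them all concurrently, the pattern that exercises the
# shard translator's background deletion path.
with ThreadPoolExecutor(max_workers=COUNT) as pool:
    list(pool.map(os.unlink, paths))
```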