Bug 1662059
Field | Value
---|---
Summary | [RHV-RHGS] Fuse mount crashed while deleting a 1 TB image file from RHV
Product | [Red Hat Storage] Red Hat Gluster Storage
Component | sharding
Status | CLOSED ERRATA
Severity | high
Priority | high
Version | rhgs-3.4
Target Release | RHGS 3.4.z Batch Update 3
Fixed In Version | glusterfs-3.12.2-37
Hardware | x86_64
OS | Linux
Reporter | SATHEESARAN <sasundar>
Assignee | Krutika Dhananjay <kdhananj>
QA Contact | SATHEESARAN <sasundar>
CC | abhishku, rcyriac, rhs-bugs, sabose, sankarshan, sasundar, sheggodu, storage-qa-internal
Keywords | ZStream
Environment | RHV-RHGS Integration
Type | Bug
Clones (view as bug list) | 1662368, 1663208
Bug Depends On | 1662368, 1665803
Bug Blocks | 1663208
Last Closed | 2019-02-04 07:41:44 UTC
Description

SATHEESARAN 2018-12-25 17:54:03 UTC

Created attachment 1516688 [details]
Fuse mount log 1

Created attachment 1516689 [details]
Fuse mount log 2
So there is no core dump, and I can't tell much from just the logs.

From:

[root@dhcp37-127 ~]# cat /proc/sys/kernel/core_pattern
|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t %e %i

it seems this should be set to a valid path for us to get the core dump. It would be great if you could change this value to a meaningful path and recreate the issue.

-Krutika

(In reply to Krutika Dhananjay from comment #5)
> So there is no core dump, and I can't tell much from just the logs.
>
> From:
> [root@dhcp37-127 ~]# cat /proc/sys/kernel/core_pattern
> |/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t %e %i
>
> it seems this should be set to a valid path for us to get the core dump.
> It would be great if you could change this value to a meaningful path and
> recreate the issue.
>
> -Krutika

I could reproduce the issue consistently outside of the RHV-RHGS setup, with 3 RHGS servers and 1 client:

1. Create 5 VM image files on the fuse-mounted gluster volume using the qemu-img command:
# qemu-img create -f qcow2 -o preallocation=full /mnt/test/vm1.img 10G
# qemu-img create -f qcow2 -o preallocation=full /mnt/test/vm2.img 7G
# qemu-img create -f qcow2 -o preallocation=full /mnt/test/vm3.img 5G
# qemu-img create -f qcow2 -o preallocation=full /mnt/test/vm4.img 4G

2. Delete the files from the mount:
# rm -rf /mnt/testdata/*

The above step hits the crash almost every time.

I will reinstall the required debug packages and will provide the setup for debugging.
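For anyone re-running the reporter's steps, a minimal bash sketch of the reproduction loop from comment #6 is below. It is an illustration only: the single mount point, the uniform image size, and the post-deletion mount check are assumptions, not part of the original report (the reporter confirms in a later comment that one mount point was used for both steps).

```bash
#!/bin/bash
# Hypothetical reproduction sketch based on comment #6.
# Assumes the sharded gluster volume is already fuse-mounted at $MOUNT.
MOUNT=/mnt/test

# 1. Create preallocated qcow2 images on the fuse mount.
for i in 1 2 3 4; do
    qemu-img create -f qcow2 -o preallocation=full "$MOUNT/vm$i.img" 10G
done

# 2. Delete them all; this is the step reported to trigger the crash.
rm -f "$MOUNT"/vm*.img

# 3. If the fuse client crashed, the mount point becomes unusable
#    ("Transport endpoint is not connected"), so stat() on it fails.
if ! stat "$MOUNT" > /dev/null 2>&1; then
    echo "FUSE mount appears to have crashed"
fi
```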
(In reply to SATHEESARAN from comment #6)
> I could reproduce the issue consistently outside of the RHV-RHGS setup,
> with 3 RHGS servers and 1 client:
>
> 1. Create 5 VM image files on the fuse-mounted gluster volume using the
> qemu-img command:
> # qemu-img create -f qcow2 -o preallocation=full /mnt/test/vm1.img 10G
> # qemu-img create -f qcow2 -o preallocation=full /mnt/test/vm2.img 7G
> # qemu-img create -f qcow2 -o preallocation=full /mnt/test/vm3.img 5G
> # qemu-img create -f qcow2 -o preallocation=full /mnt/test/vm4.img 4G
>
> 2. Delete the files from the mount:
> # rm -rf /mnt/testdata/*
>
> The above step hits the crash almost every time.

Is the mount point in step 1 different from the one used in step 2? In step 1, the files are created under /mnt/test/, but the rm -rf is done from /mnt/testdata/.

-Krutika

(In reply to Krutika Dhananjay from comment #8)
> Is the mount point in step 1 different from the one used in step 2? In
> step 1, the files are created under /mnt/test/, but the rm -rf is done
> from /mnt/testdata/.

I did it from the same mount; there were no different mounts.

I think I found more issues in the code:

1. 'No such file or directory' logs for shards that don't exist.
2. Misleading logs of the kind "Failed to clean up shards of gfid %s" (msgid: 133021) shortly after logging the fact that deletion actually succeeded (msgid: 133022).

Still working on root-causing the crash.

Meanwhile, Sas, since you said you were able to recreate the problem of hosts going into an unresponsive state during deletion, could you capture a volume profile for a run that recreates this behavior?

-Krutika

I've sent https://review.gluster.org/c/glusterfs/+/21946 and https://review.gluster.org/c/glusterfs/+/21957 to fix the issues mentioned in comment #10. Oddly enough, the second patch also fixed the crash.

I need to analyze the code a bit more to see the correlation between the crash and this fix.

-Krutika

(In reply to Krutika Dhananjay from comment #11)
> I've sent https://review.gluster.org/c/glusterfs/+/21946 and
> https://review.gluster.org/c/glusterfs/+/21957 to fix the issues mentioned
> in comment #10. Oddly enough, the second patch also fixed the crash.
>
> I need to analyze the code a bit more to see the correlation between the
> crash and this fix.

I found the problem. The entrylk that is taken to perform deletion of shards atomically did not have its lk-owner initialized. As a result, multiple background deletion tasks from the same client can end up in the critical section at the same time and attempt parallel deletion of shards, until the refcount on the base shard drops to 0 once all of its related shards have been deleted. At that point the other task, by virtue of attempting the same operation again, crashes upon accessing the now-destroyed base inode.

The fix at https://review.gluster.org/c/glusterfs/+/21957 eliminates this race by never launching more than one cleanup task per client. For the sake of completeness, I will still go ahead and change the patch to initialize the lk-owner.

-Krutika

Oh, and nice catch, Sas! :)

-Krutika
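Comment #10 above asks for a volume profile of a run that reproduces the slowness. As a hedged aside, capturing one with the standard gluster CLI would look roughly like the sketch below; the volume name is a placeholder, not taken from this report.

```bash
#!/bin/bash
# Hypothetical volume-profile capture for the run requested in comment #10.
# Run on one of the gluster servers; VOLNAME is a placeholder.
VOLNAME=testvol

gluster volume profile "$VOLNAME" start   # enable per-brick FOP statistics
# ... reproduce the deletion workload from the client here ...
gluster volume profile "$VOLNAME" info > /tmp/profile-during-delete.txt
gluster volume profile "$VOLNAME" stop    # turn profiling back off
```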
(In reply to Krutika Dhananjay from comment #10)
> I think I found more issues in the code:
>
> 1. 'No such file or directory' logs for shards that don't exist.
> 2. Misleading logs of the kind "Failed to clean up shards of gfid %s"
> (msgid: 133021) shortly after logging the fact that deletion actually
> succeeded (msgid: 133022).
>
> Still working on root-causing the crash.
>
> Meanwhile, Sas, since you said you were able to recreate the problem of
> hosts going into an unresponsive state during deletion, could you capture
> a volume profile for a run that recreates this behavior?
>
> -Krutika

Thanks, Krutika. I have raised another bug - https://bugzilla.redhat.com/show_bug.cgi?id=1663367 - for the latency-related issue and will provide the volume profile details on that bug.

QA ack is provided for this bug for the following reasons:

1. This bug is a blocker, as the crash is evident on parallel deletion.
2. It may also affect the block store use case, as gluster block storage uses sharding too.
3. It can be verified by RHHI-V QE with 3 days of effort and can be accommodated in the RHHI-V regression test cycle.

Patches posted downstream:
https://code.engineering.redhat.com/gerrit/#/c/160437/
https://code.engineering.redhat.com/gerrit/#/c/160436/

Tests to be done:

1. Delete a large file (on the order of a TB).
2. Delete multiple large files simultaneously from the same mount point.
3. Delete multiple large files simultaneously from different mount points.
4. Start deletion of a large file that is around 1 TB in size, kill the gluster fuse client doing the deletion midway, remount, and verify that cleanup resumes from where it left off. This is the step that ensures any kind of outage doesn't leave ghost shards behind forever.

In all these runs, also monitor the logs to make sure there is no excessive logging. Some warning logs from protocol/{client,server} might be inevitable, but there should be no excessive logging from shard. Also make sure there is only one occurrence of MSGID 133022 for every gfid, and only one occurrence of MSGID 133021 as well. Multiple, sometimes conflicting messages for the same gfid would indicate something is not right.

Please capture a volume profile in all of these cases. Profiling itself has little overhead, so it is safe to enable it before each run.

-Krutika

Tested with RHV 4.2.8 and RHGS 3.4.3 (glusterfs-3.12.2-38.el7rhgs) with the following tests:

Test 1:
1. Create an image larger than 1 TB (preallocated), then delete it.
2. Repeat the test 20 times.

Test 2:
1. Create 4 images, each larger than 1 TB (preallocated).
2. Delete the images in parallel, from the same client or from different clients.
3. Repeat the test 20 times.

Test 3:
1. Create 4 images, each larger than 1 TB (preallocated).
2. Delete the images from the client and immediately power off the host.
3. Power the host back on.

With all of these scenarios, no issues were seen. However, while creating the preallocated images, the fuse mount process crashed in some scenarios; a new bug - BZ 1668304 - was created to track that issue.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0263
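As a practical aid for the log check described in the verification notes above (one MSGID 133022 and one MSGID 133021 entry per gfid), the bash sketch below counts repeated gfids in the fuse mount log. The log path and the exact message format are assumptions; adjust them to the actual client log.

```bash
#!/bin/bash
# Hypothetical log check: each gfid should appear at most once per MSGID.
# The fuse mount log path is an assumption (on real deployments it is
# derived from the mount point).
LOG=/var/log/glusterfs/mnt-test.log

for msgid in 133021 133022; do
    echo "== MSGID $msgid: gfids logged more than once =="
    grep "MSGID: $msgid" "$LOG" \
        | grep -oE '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' \
        | sort | uniq -c \
        | awk '$1 > 1 {print "repeated:", $2, "("$1" times)"}'
done
```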