Bug 1813925

Summary: remove-brick causes SIGSEGV when trash translator is enabled
Product: [Community] GlusterFS
Component: trash-xlator
Version: 6
Status: CLOSED UPSTREAM
Severity: urgent
Priority: unspecified
Hardware: x86_64
OS: Linux
Reporter: Pavel Znamensky <kompastver>
Assignee: bugs <bugs>
CC: bugs
Type: Bug
Last Closed: 2020-03-17 03:37:36 UTC
Attachments:
  gdb thread apply all bt

Description Pavel Znamensky 2020-03-16 13:52:49 UTC
Created attachment 1670549
gdb thread apply all bt

Description of problem:
Glusterd crashes during rebalancing after `remove-brick` when the trash translator is enabled.
If the trash translator is disabled, it doesn't crash.

Version-Release number of selected component (if applicable):
6.8

How reproducible:
Always in our production environment, although the crash does not happen immediately.


Steps to Reproduce:
1. Set up and start a distributed-replicated volume
2. Mount it and copy some files
3. Enable trash (features.trash + features.trash-internal-op)
4. Remove bricks from the volume
5. Wait
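
For reference, a sketch of these steps with example gluster CLI commands; the host names, brick paths, volume name, and mount point below are placeholders, not taken from the affected setup:

# 2x3 distributed-replicated volume
gluster volume create testvol replica 3 \
    host1:/bricks/b1 host2:/bricks/b1 host3:/bricks/b1 \
    host1:/bricks/b2 host2:/bricks/b2 host3:/bricks/b2
gluster volume start testvol

# mount and copy some data
mount -t glusterfs host1:/testvol /mnt/testvol
cp -a /some/data /mnt/testvol/

# enable the trash translator
gluster volume set testvol features.trash on
gluster volume set testvol features.trash-internal-op on

# remove one replica set; this starts migrating its data off the removed bricks
gluster volume remove-brick testvol \
    host1:/bricks/b2 host2:/bricks/b2 host3:/bricks/b2 start
gluster volume remove-brick testvol \
    host1:/bricks/b2 host2:/bricks/b2 host3:/bricks/b2 status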

Actual results:
After some time, all glusterd processes in the same replication group crash.

Expected results:
glusterd doesn't crash :)

Additional info:

Please see the full gdb output in the attachment; the cause seems to be in this thread:

Thread 1 (Thread 0x7fd5b4574700 (LWP 3450)):
#0  0x00007fd5e8520cb4 in __strchr_sse42 () from /lib64/libc.so.6
#1  0x00007fd5dac67d24 in remove_trash_path () from /usr/lib64/glusterfs/6.8/xlator/features/trash.so
#2  0x00007fd5dac684ae in trash_unlink_mkdir_cbk () from /usr/lib64/glusterfs/6.8/xlator/features/trash.so
#3  0x00007fd5db4a6302 in posix_mkdir () from /usr/lib64/glusterfs/6.8/xlator/storage/posix.so
#4  0x00007fd5dac6f49f in trash_unlink_rename_cbk () from /usr/lib64/glusterfs/6.8/xlator/features/trash.so
#5  0x00007fd5db4ab466 in posix_rename () from /usr/lib64/glusterfs/6.8/xlator/storage/posix.so
#6  0x00007fd5dac70773 in trash_unlink_stat_cbk () from /usr/lib64/glusterfs/6.8/xlator/features/trash.so
#7  0x00007fd5db4b1431 in posix_stat () from /usr/lib64/glusterfs/6.8/xlator/storage/posix.so
#8  0x00007fd5dac71807 in trash_unlink () from /usr/lib64/glusterfs/6.8/xlator/features/trash.so
#9  0x00007fd5daa29b0e in changelog_unlink () from /usr/lib64/glusterfs/6.8/xlator/features/changelog.so
#10 0x00007fd5da5ea452 in br_stub_unlink () from /usr/lib64/glusterfs/6.8/xlator/features/bitrot-stub.so
#11 0x00007fd5da3d5f68 in posix_acl_unlink () from /usr/lib64/glusterfs/6.8/xlator/features/access-control.so
#12 0x00007fd5da1992ce in pl_unlink () from /usr/lib64/glusterfs/6.8/xlator/features/locks.so
#13 0x00007fd5d9f869c8 in worm_unlink () from /usr/lib64/glusterfs/6.8/xlator/features/worm.so
#14 0x00007fd5d9d751f7 in ro_unlink () from /usr/lib64/glusterfs/6.8/xlator/features/read-only.so
#15 0x00007fd5d9b63d62 in leases_unlink () from /usr/lib64/glusterfs/6.8/xlator/features/leases.so
#16 0x00007fd5d9950b90 in up_unlink () from /usr/lib64/glusterfs/6.8/xlator/features/upcall.so
#17 0x00007fd5ea0bde03 in default_unlink_resume () from /lib64/libglusterfs.so.0
#18 0x00007fd5ea034f15 in call_resume () from /lib64/libglusterfs.so.0
#19 0x00007fd5d9730e28 in iot_worker () from /usr/lib64/glusterfs/6.8/xlator/performance/io-threads.so
#20 0x00007fd5e8c1ae65 in start_thread () from /lib64/libpthread.so.0
#21 0x00007fd5e84e088d in clone () from /lib64/libc.so.6
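
Frames #0 and #1 show the fault happening inside strchr() called from remove_trash_path() in trash.so. As an illustration only (this is not the actual trash.c source; the function and paths below are hypothetical), the following standalone C sketch shows how chaining strchr() calls over a path with fewer '/' separators than expected ends up reading from NULL + 1, the kind of fault reported in __strchr_sse42:

#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the failure mode: strip the first two path
 * components by chaining strchr() without checking for NULL.  If the
 * incoming path has only one component, the first strchr() returns NULL
 * and the second call reads from NULL + 1, faulting inside strchr(). */
static const char *
skip_two_components_unchecked(const char *path)
{
    const char *rem = strchr(path + 1, '/');  /* NULL if only one component */
    return strchr(rem + 1, '/');              /* reads NULL + 1 -> SIGSEGV   */
}

int
main(void)
{
    /* Enough separators: prints "/file". */
    printf("%s\n", skip_two_components_unchecked("/vol/.trashcan/file"));
    /* A shallower path such as "/somedir" would make the unchecked
     * second strchr() above fault. */
    return 0;
}

A NULL check after each strchr() would avoid this particular fault; whether that matches the actual bug and its fix is tracked in the GitHub issue linked in comment 1.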


The only error message from that time I've found is in *-rebalance.log:

[2020-03-11 09:24:10.353206] E [dht-rebalance.c:3829:gf_defrag_fix_layout] 0-glusteraudio1-dht: /.trashcan/somedir gfid not present

Comment 1 Worker Ant 2020-03-17 03:37:36 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/1120 and will be tracked there from now on. Visit the GitHub issue for further details.