Created attachment 1670549 [details]
gdb thread apply all bt

Description of problem:
glusterd crashes during rebalancing after `remove-brick` when the trash translator is enabled. With the trash translator disabled, it does not crash.

Version-Release number of selected component (if applicable):
6.8

How reproducible:
Always in our production environment, although the crash does not happen immediately.

Steps to Reproduce:
1. Set up and start a distributed-replicated volume
2. Mount it and copy some files
3. Enable trash (features.trash + features.trash-internal-op)
4. Remove bricks from the volume
5. Wait

Actual results:
After some time, all glusterd processes in the same replication group crash.

Expected results:
glusterd doesn't crash :)

Additional info:
Please see the full gdb output in the attachment. The cause appears to be in this thread:

Thread 1 (Thread 0x7fd5b4574700 (LWP 3450)):
#0  0x00007fd5e8520cb4 in __strchr_sse42 () from /lib64/libc.so.6
#1  0x00007fd5dac67d24 in remove_trash_path () from /usr/lib64/glusterfs/6.8/xlator/features/trash.so
#2  0x00007fd5dac684ae in trash_unlink_mkdir_cbk () from /usr/lib64/glusterfs/6.8/xlator/features/trash.so
#3  0x00007fd5db4a6302 in posix_mkdir () from /usr/lib64/glusterfs/6.8/xlator/storage/posix.so
#4  0x00007fd5dac6f49f in trash_unlink_rename_cbk () from /usr/lib64/glusterfs/6.8/xlator/features/trash.so
#5  0x00007fd5db4ab466 in posix_rename () from /usr/lib64/glusterfs/6.8/xlator/storage/posix.so
#6  0x00007fd5dac70773 in trash_unlink_stat_cbk () from /usr/lib64/glusterfs/6.8/xlator/features/trash.so
#7  0x00007fd5db4b1431 in posix_stat () from /usr/lib64/glusterfs/6.8/xlator/storage/posix.so
#8  0x00007fd5dac71807 in trash_unlink () from /usr/lib64/glusterfs/6.8/xlator/features/trash.so
#9  0x00007fd5daa29b0e in changelog_unlink () from /usr/lib64/glusterfs/6.8/xlator/features/changelog.so
#10 0x00007fd5da5ea452 in br_stub_unlink () from /usr/lib64/glusterfs/6.8/xlator/features/bitrot-stub.so
#11 0x00007fd5da3d5f68 in posix_acl_unlink () from /usr/lib64/glusterfs/6.8/xlator/features/access-control.so
#12 0x00007fd5da1992ce in pl_unlink () from /usr/lib64/glusterfs/6.8/xlator/features/locks.so
#13 0x00007fd5d9f869c8 in worm_unlink () from /usr/lib64/glusterfs/6.8/xlator/features/worm.so
#14 0x00007fd5d9d751f7 in ro_unlink () from /usr/lib64/glusterfs/6.8/xlator/features/read-only.so
#15 0x00007fd5d9b63d62 in leases_unlink () from /usr/lib64/glusterfs/6.8/xlator/features/leases.so
#16 0x00007fd5d9950b90 in up_unlink () from /usr/lib64/glusterfs/6.8/xlator/features/upcall.so
#17 0x00007fd5ea0bde03 in default_unlink_resume () from /lib64/libglusterfs.so.0
#18 0x00007fd5ea034f15 in call_resume () from /lib64/libglusterfs.so.0
#19 0x00007fd5d9730e28 in iot_worker () from /usr/lib64/glusterfs/6.8/xlator/performance/io-threads.so
#20 0x00007fd5e8c1ae65 in start_thread () from /lib64/libpthread.so.0
#21 0x00007fd5e84e088d in clone () from /lib64/libc.so.6

The only error message from around that time is in *-rebalance.log:

[2020-03-11 09:24:10.353206] E [dht-rebalance.c:3829:gf_defrag_fix_layout] 0-glusteraudio1-dht: /.trashcan/somedir gfid not present
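To make the suspected failure mode concrete, here is a minimal C sketch. It is not the actual trash.c code; strip_trash_prefix() and its exact behavior are assumptions inferred from the backtrace (remove_trash_path() crashing inside __strchr_sse42) and from the "gfid not present" rebalance error. The guess is that stripping the leading /.trashcan (and internal-op) components with strchr() ends up dereferencing an invalid pointer when the incoming path does not contain the expected components, e.g. for an entry rebalance touches directly under /.trashcan.

/*
 * Illustrative sketch only -- NOT the actual GlusterFS trash.c code.
 * strip_trash_prefix() is a hypothetical stand-in for what
 * remove_trash_path() appears to do, based on the backtrace.
 */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: strip "/.trashcan" (and "/internal_op" for
 * internal operations) from the front of `path`, returning the rest. */
static const char *
strip_trash_prefix(const char *path, int internal_op)
{
        const char *rem = strchr(path + 1, '/');   /* skip "/.trashcan"   */
        if (internal_op && rem != NULL)
                rem = strchr(rem + 1, '/');        /* skip "/internal_op" */
        /* Without the NULL check above, a path such as "/.trashcan"
         * (no further component) would make the second strchr() read
         * from an invalid address -- which would mirror the crash in
         * __strchr_sse42 shown in the backtrace. */
        return rem;
}

int
main(void)
{
        printf("%s\n", strip_trash_prefix("/.trashcan/internal_op/dir/file", 1));

        const char *r = strip_trash_prefix("/.trashcan", 1);
        printf("%s\n", r ? r : "(no remainder -- would crash without the NULL check)");
        return 0;
}

If this guess is right, a NULL check after each strchr() in remove_trash_path() would turn the crash into a handled error for such paths, though the proper fix would have to come from the trash xlator maintainers.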
This bug has been moved to https://github.com/gluster/glusterfs/issues/1120 and will be tracked there from now on. Visit the GitHub issue URL for further details.