One brick on one server has crashed, and all attempts to bring it back online have failed. The corresponding brick on the other server (in a replica 2 configuration) is OK, as are the other bricks. The brick process crashes while running a lookup operation. After analyzing the core dump, we found that the crash was caused by huge xattrs created on a file. On the brick side, every posix operation runs under an iot_worker thread, and the per-thread stack size is around 256k. In the posix layer, the function posix_get_ancestry_non_directory calls alloca with a size based on the backend xattr size; because that size was bigger than 256k, the allocation overflowed the thread stack and the process crashed.
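The failure mode described above can be illustrated with a minimal sketch. This is not the actual glusterfs code or the upstream fix; the function name process_xattr_list and the threshold STACK_BUF_MAX are hypothetical, chosen only for illustration. It shows one common way to avoid sizing a stack allocation from on-disk data: use a small fixed stack buffer for the usual case and fall back to the heap when the size is larger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold: anything larger than this goes to the heap. */
#define STACK_BUF_MAX 4096

/*
 * Hypothetical helper, not a glusterfs function.
 * Copies a NUL-separated xattr name list of `size` bytes into a scratch
 * buffer and counts the NUL terminators as a stand-in for real processing.
 *
 * Returns 0 if the stack buffer was used, 1 if the heap was used,
 * and -1 if the heap allocation failed.
 */
int process_xattr_list(const char *src, size_t size, size_t *name_count)
{
    char stack_buf[STACK_BUF_MAX];
    char *buf = stack_buf;
    int used_heap = 0;
    size_t count = 0;

    assert(src != NULL && name_count != NULL);

    /* The size comes from the backend, so it must not drive a stack
     * allocation directly: a worker thread stack may be only ~256k. */
    if (size > sizeof(stack_buf)) {
        buf = malloc(size);
        if (buf == NULL)
            return -1;
        used_heap = 1;
    }

    memcpy(buf, src, size);
    for (size_t i = 0; i < size; i++) {
        if (buf[i] == '\0')
            count++;
    }
    *name_count = count;

    if (used_heap)
        free(buf);
    return used_heap;
}
```

With this pattern a 512k xattr list is handled on the heap regardless of the thread stack size, while small lists still avoid a malloc call.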
The issue is already fixed upstream: https://github.com/gluster/glusterfs/issues/1699
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:1462