Copying from https://bugzilla.redhat.com/show_bug.cgi?id=1400780#c9:

From the backtraces of core 1 in bz#1400780 and core 2 in bz#1401160, it is clear that the issue is hit only when ganesha is trying to remove an entry from its LRU list. By default, the LRU limit for ganesha's MD_CACHE is 25000, and in the gfapi layer it is 131072. We suspect the crash occurs when there is a race between the removal of an entry in the ganesha layer and in the gluster layer.

I tried to reproduce a similar issue with 3 volumes (two 1x2 and one 1x1) and the number of clients varying from 4 to 7. I also tried lowering the LRU limit to 20 for ganesha and 100 for gluster. However, I never hit the crash with ongoing I/O (dd and a Linux untar running from different clients). In my setup the I/O ran continuously for at least 4 hours, then errored out with "no space left on the device".

During cleanup, however (rm -rf on the same directories from different mounts), I consistently got a crash with a similar backtrace during LRU cleanup. The crash is easier to reproduce with a lower LRU limit. When I increased the LRU limit to 150000 in ganesha, the crash was not seen (though it may still occur eventually).
Devel ack is provided as the crash is consistently reproducible.
The reported issue was not reproducible on Ganesha 2.4.1-6 and Gluster 3.8.4-12 in two attempts. Will reopen if it is hit again during regression testing.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2017-0493.html