Description of problem:
-----------------------
2-node cluster. 3 clients mounted a 2x2 volume via NFSv4 and were running Bonnie++ in a separate working directory each. Not sure if this is related, but I was also collecting sosreports (which may run heal info and other gluster commands on the backend). Other than this, nothing was done on the nodes.

Ganesha crashed on one of my nodes and dumped a core:

(gdb) bt
#0  0x00007fccfa7661f7 in raise () from /lib64/libc.so.6
#1  0x00007fccfa7678e8 in abort () from /lib64/libc.so.6
#2  0x0000562bd0b1ff1a in nfs_dupreq_rele (req=0x7fc8de517818, func=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/RPCAL/nfs_dupreq.c:1256
#3  0x0000562bd0aa48e1 in nfs_rpc_execute (reqdata=reqdata@entry=0x7fc8de5177f0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_worker_thread.c:1405
#4  0x0000562bd0aa618a in worker_run (ctx=0x562bd24b3e90)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_worker_thread.c:1548
#5  0x0000562bd0b2f889 in fridgethr_start_routine (arg=0x562bd24b3e90)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/support/fridgethr.c:550
#6  0x00007fccfb15be25 in start_thread () from /lib64/libpthread.so.0
#7  0x00007fccfa82934d in clone () from /lib64/libc.so.6
(gdb)

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
[root@gqas009 tmp]# rpm -qa|grep ganesha
nfs-ganesha-2.4.4-8.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-25.el7rhgs.x86_64

How reproducible:
----------------
1/1

Additional info:
----------------
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 3b04b36a-1837-48e8-b437-fbc091b2f992
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas007.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas009.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas007.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas009.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
server.allow-insecure: on
performance.stat-prefetch: off
transport.address-family: inet
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
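To make the backtrace easier to read: the abort() comes out of nfs_dupreq_rele(), the routine that drops the duplicate request cache (DRC) entry once a worker thread is done with a request. Below is a minimal, hypothetical sketch of the kind of refcount sanity check in a release path that ends in abort(); the names and layout are illustrative only, not the nfs-ganesha 2.4.4 source.

    /*
     * Hypothetical sketch (not the nfs-ganesha 2.4.4 source): a DRC entry
     * whose reference count is dropped once too often, or after the entry
     * was already freed, trips a sanity check in the release path. The
     * check calls abort(), matching frames #0-#2 of the backtrace above.
     */
    #include <pthread.h>
    #include <stdint.h>
    #include <stdlib.h>

    struct drc_entry {
            pthread_mutex_t lock;
            int32_t refcnt;       /* references held by in-flight requests */
    };

    static void drc_entry_rele(struct drc_entry *dv)
    {
            pthread_mutex_lock(&dv->lock);
            if (--dv->refcnt < 0)
                    abort();      /* double release: die loudly rather than
                                     keep running with corrupt cache state */
            if (dv->refcnt == 0) {
                    pthread_mutex_unlock(&dv->lock);
                    pthread_mutex_destroy(&dv->lock);
                    free(dv);
                    return;
            }
            pthread_mutex_unlock(&dv->lock);
    }

    int main(void)
    {
            struct drc_entry *dv = calloc(1, sizeof(*dv));

            pthread_mutex_init(&dv->lock, NULL);
            dv->refcnt = 1;       /* ref taken when the request was admitted */
            drc_entry_rele(dv);   /* correct single release frees the entry */
            /* a second rele here would be the use-after-free/double-release
               class of bug that produces an abort like the one above */
            return 0;
    }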
Okay, so there are nine DRC fixes in 2.5, which address issues like use-after-free, refcounting errors, and NULL dereferences. So this is very likely fixed in 2.5. Can we defer this until the 2.5 rebase, or do all of these patches need to be backported?
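For reference, the usual shape of fixes in that class is to take the reference while the cache-wide lock is still held, so a racing release cannot free the entry between lookup and use. The sketch below shows that generic pattern under assumed names (drc_cache_lock, drc_cache_lookup_locked(), drc_entry_ref() are hypothetical); it is not the actual content of the 2.5 patches.

    /*
     * Generic fix pattern, not the actual nfs-ganesha 2.5 patches: the
     * lookup side takes its reference before the cache lock is dropped,
     * so a concurrent drc_entry_rele() cannot free the entry in the
     * window between lookup and use.
     */
    #include <pthread.h>
    #include <stdint.h>
    #include <stddef.h>

    struct drc_entry;                          /* as in the sketch above */

    extern pthread_mutex_t drc_cache_lock;     /* hypothetical cache-wide lock */
    struct drc_entry *drc_cache_lookup_locked(uint64_t xid);  /* hypothetical */
    void drc_entry_ref(struct drc_entry *dv);                 /* hypothetical */

    struct drc_entry *drc_lookup_and_ref(uint64_t xid)
    {
            struct drc_entry *dv;

            pthread_mutex_lock(&drc_cache_lock);
            dv = drc_cache_lookup_locked(xid);
            if (dv != NULL)
                    drc_entry_ref(dv);         /* ref taken under the lock */
            pthread_mutex_unlock(&drc_cache_lock);
            return dv;                         /* caller owns one reference */
    }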
The fix is in nfs-ganesha-2.5.x.
POST with rebase to nfs-ganesha-2.5.x
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2610