Description of problem:
-----------------------
Ganesha crashed on 2/4 nodes during multithreaded iozone reads from 4 clients with 16 threads.

Exact workload:
iozone -+m <config file> -+h <hostname> -C -w -c -e -i 1 -+n -r 64k -s 8g -t 16

The same issue is reproducible once you create files on the mount point using the smallfile tool and then read them in a multithreaded, distributed way.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
[root@gqas015 ~]# rpm -qa|grep ganesha
glusterfs-ganesha-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
nfs-ganesha-2.4.0-0.14dev26.el7.centos.x86_64
nfs-ganesha-gluster-2.4.0-0.14dev26.el7.centos.x86_64
nfs-ganesha-debuginfo-2.4.0-0.14dev26.el7.centos.x86_64

How reproducible:
-----------------
2/4 nodes hit the crash.

Steps to Reproduce:
-------------------
1. Set up 4 clients and 4 servers. Mount the gluster volume via NFSv3, with each client mounting from one server.
2. Run multithreaded iozone sequential writes in a distributed way (a sketch of the -+m client list file appears at the end of this report):
   iozone -+m <config file> -+h <hostname> -C -w -c -e -i 0 -+n -r 64k -s 8g -t 16
3. Run sequential reads the same way:
   iozone -+m <config file> -+h <hostname> -C -w -c -e -i 1 -+n -r 64k -s 8g -t 16

Actual results:
---------------
Ganesha crashed on 2/4 nodes.

Expected results:
-----------------
Ganesha should not crash.

Additional info:
----------------
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 9e8d9c1a-33da-4645-a6ad-630df25cb654
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas001.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas014.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas015.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas016.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
ganesha.enable: on
features.cache-invalidation: off
nfs.disable: on
performance.readdir-ahead: on
performance.stat-prefetch: off
server.allow-insecure: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@gqas015 ~]#
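For reference, the actual -+m client list file was not attached; below is a minimal sketch of how the distributed run is driven. Hostnames, mount points, and file paths are placeholders/assumptions, not the lab values used in the test.

# Hypothetical reproduction sketch (client names, mount point, and iozone
# binary path are assumptions):
cat > /root/iozone_clients.cfg <<'EOF'
client1.example.com /mnt/testvol /usr/bin/iozone
client2.example.com /mnt/testvol /usr/bin/iozone
client3.example.com /mnt/testvol /usr/bin/iozone
client4.example.com /mnt/testvol /usr/bin/iozone
EOF

# v3 mount on each client (one server per client), then the write pass
# followed by the read pass that triggers the crash:
mount -t nfs -o vers=3 gqas001.sbu.lab.eng.bos.redhat.com:/testvol /mnt/testvol
iozone -+m /root/iozone_clients.cfg -+h <hostname> -C -w -c -e -i 0 -+n -r 64k -s 8g -t 16
iozone -+m /root/iozone_clients.cfg -+h <hostname> -C -w -c -e -i 1 -+n -r 64k -s 8g -t 16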
I see nfs-ganesha re-exporting the volume:

02/08/2016 06:28:15 : epoch 57a071e6 : gqas014.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-19047[dbus_heartbeat] mdcache_lru_clean :INODE LRU :CRIT :Error closing file in cleanup: Operation not supported
02/08/2016 06:28:24 : epoch 57a071e6 : gqas014.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-19047[dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume testvol exported at : '/'

Aug 2 06:28:55 gqas014 kernel: ganesha.nfsd[19062]: segfault at 7f2484946084 ip 00007f24bfb92210 sp 00007f24b4f94428 error 6 in libpthread-2.17.so[7f24bfb86000+16000]
Aug 2 06:28:57 gqas014 systemd: nfs-ganesha.service: main process exited, code=killed, status=11/SEGV
Aug 2 06:28:57 gqas014 systemd: Unit nfs-ganesha.service entered failed state.
Aug 2 06:28:57 gqas014 systemd: nfs-ganesha.service failed.

The crash happened around the same time. So the volume is somehow being re-exported, which results in the crash; this is being addressed as part of bug 1361520. Jiffin is building RPMs with the fix applied. Please re-test once those are available.
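If the crash still reproduces with the fixed RPMs, a full backtrace from the core would help confirm whether it is the same re-export path. A minimal sketch, assuming the default /usr/bin/ganesha.nfsd install path and an abrt-managed core dump (the exact dump directory below is a placeholder; nfs-ganesha-debuginfo is already installed per the rpm list above):

# Hypothetical core inspection on the crashed node; the core path is a
# placeholder (on RHEL 7 the dump usually lands under /var/spool/abrt/):
gdb /usr/bin/ganesha.nfsd /var/spool/abrt/<ccpp-dump-dir>/coredump
(gdb) set pagination off
(gdb) thread apply all bt full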
If the issue is not present with the new RPMs, can you please close this bug as a duplicate of BZ1361520?