| Summary: | [Ganesha]: Ganesha crashes when nfs-ganesha is restarted amidst continuous I/O from heterogeneous clients. | | |
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Ambarish <asoman> |
| Component: | nfs-ganesha | Assignee: | Kaleb KEITHLEY <kkeithle> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Ambarish <asoman> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.2 | CC: | amukherj, asoman, bturner, dang, ffilz, jthottan, mbenjamin, rhinduja, rhs-bugs, skoduri, storage-qa-internal |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | rhgs-3.2.0 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-23 12:30:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
This BZ has been taken out of rhgs-3.2.0 as per today's triage exercise.
Description of problem:
------------------------
4-node Ganesha cluster. 4 clients mounted a 2*2 volume, 2 via v3 and 2 via v4.

Workload: tarball untar, small-file creates.

Restarted nfs-ganesha on all 4 nodes. Ganesha crashed on 2 of the 4 nodes. The signatures of the two BTs are different:

**************
BT from Node 1
**************

(gdb) #0  0x00007fdbdb7fe3b0 in GlusterFS () from /usr/lib64/ganesha/libfsalgluster.so.4.2.0
#1  0x00007fdc6c7ec4c3 in mdcache_lru_clean (entry=0x7fd8f4139570) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:421
#2  mdcache_lru_unref (entry=entry@entry=0x7fd8f4139570, flags=flags@entry=0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1464
#3  0x00007fdc6c7e9e11 in mdcache_put (entry=0x7fd8f4139570) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.h:186
#4  mdcache_unexport (exp_hdl=0x7fdbdc0d2130) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_export.c:152
#5  0x00007fdc6c7cc473 in clean_up_export (export=0x7fdbdc002e28) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/exports.c:2266
#6  unexport (export=export@entry=0x7fdbdc002e28) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/exports.c:2287
#7  0x00007fdc6c7dc8d7 in remove_all_exports () at /usr/src/debug/nfs-ganesha-2.4.1/src/support/export_mgr.c:761
#8  0x00007fdc6c75014a in do_shutdown () at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_admin_thread.c:433
#9  admin_thread (UnusedArg=<optimized out>) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_admin_thread.c:466
#10 0x00007fdc6acaedc5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007fdc6a37d73d in clone () from /lib64/libc.so.6
(gdb)

**************
BT from Node 2
**************

(gdb) #0  0x00007f230e17bc05 in __gf_free (free_ptr=0x7f2054008340) at mem-pool.c:314
#1  0x00007f230e1793ae in fd_destroy (bound=_gf_true, fd=0x7f20540d5b3c) at fd.c:523
#2  fd_unref (fd=0x7f20540d5b3c) at fd.c:568
#3  0x00007f22f7b4cd78 in client_local_wipe (local=local@entry=0x7f22f006b260) at client-helpers.c:131
#4  0x00007f22f7b52b38 in client3_3_flush_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f2304e43490) at client-rpc-fops.c:921
#5  0x00007f230df1f680 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f22f00728d0, pollin=pollin@entry=0x7f22f0446d70) at rpc-clnt.c:791
#6  0x00007f230df1f95f in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f22f0072900, event=<optimized out>, data=0x7f22f0446d70) at rpc-clnt.c:962
#7  0x00007f230df1b883 in rpc_transport_notify (this=this@entry=0x7f22f0082600, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f22f0446d70) at rpc-transport.c:537
#8  0x00007f22fca20eb4 in socket_event_poll_in (this=this@entry=0x7f22f0082600) at socket.c:2267
#9  0x00007f22fca23365 in socket_event_handler (fd=<optimized out>, idx=1, data=0x7f22f0082600, poll_in=1, poll_out=0, poll_err=0) at socket.c:2397
#10 0x00007f230e1af3d0 in event_dispatch_epoll_handler (event=0x7f22fca18540, event_pool=0x7f2308086710) at event-epoll.c:571
#11 event_dispatch_epoll_worker (data=0x7f22f8000920) at event-epoll.c:674
#12 0x00007f2399d39dc5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f239940873d in clone () from /lib64/libc.so.6
(gdb)

Once again, these cores were dumped when the ganesha restart stopped the ganesha process; the process was alive and running after the crash.

Also, this is a different use case from the one reported in BZ#1393526, and the BTs are different as well.
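For reference, a minimal sketch of how such a backtrace can be pulled from a dumped core with gdb; the core path below is a placeholder, and the matching nfs-ganesha/glusterfs debuginfo packages are assumed to be installed:

    # Placeholder core path; adjust to the core actually dumped on the crashing node.
    gdb -batch -ex 'bt' -ex 'thread apply all bt' /usr/bin/ganesha.nfsd /path/to/core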
Version-Release number of selected component (if applicable):
-------------------------------------------------------------
nfs-ganesha-gluster-2.4.1-1.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64

How reproducible:
-----------------
2/2.

Steps to Reproduce:
------------------
1. Mount the volume via v3 and v4.
2. Run different types of workload from the application side on the v3 as well as the v4 mounts (smallfile, kernel untar, etc.).
3. Restart the ganesha service on all nodes.

(A rough shell sketch of these steps is included at the end of this comment.)

Actual results:
---------------
Ganesha crashed and dumped core.

Expected results:
------------------
No crashes.

Additional info:
-----------------
OS: RHEL 7.3

*Vol Config*:

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: aeab0f8a-1e34-4681-bdf4-5b1416e46f27
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
server.allow-insecure: on
performance.stat-prefetch: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

[root@gqas013 tmp]#
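A rough shell sketch of the reproduction steps above. The Ganesha VIP/hostname, mount points and tarball path are placeholders, and the untar plus touch loop merely stand in for the smallfile/kernel-untar workloads mentioned:

    # On two clients: NFSv3 mount of the exported volume (placeholder VIP and mount point)
    mount -t nfs -o vers=3 <ganesha-vip>:/testvol /mnt/testvol_v3
    # On the other two clients: NFSv4 mount
    mount -t nfs -o vers=4 <ganesha-vip>:/testvol /mnt/testvol_v4

    # Keep mixed I/O running from all clients, e.g. a kernel-tarball untar and small-file creates
    tar xf /tmp/kernel.tar.xz -C /mnt/testvol_v3 &
    for i in $(seq 1 10000); do touch /mnt/testvol_v4/smallfile.$i; done &

    # While the I/O is in flight, restart ganesha on every node of the cluster
    systemctl restart nfs-ganesha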