Description of problem:
-----------------------
4-node cluster. 4 clients mount a 2x2 export via NFSv4. Ran iozone random writes in a distributed-multithreaded way. Ganesha crashed and dumped core on one of the nodes.

BT from crash:

(gdb) bt
#0  0x00007fa3b4c5c1d7 in raise () from /lib64/libc.so.6
#1  0x00007fa3b4c5d8c8 in abort () from /lib64/libc.so.6
#2  0x00007fa3b715f80a in nfs_dupreq_rele (req=0x7f9ff4fb3348, func=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/RPCAL/nfs_dupreq.c:1234
#3  0x00007fa3b70e3ef1 in nfs_rpc_execute (reqdata=reqdata@entry=0x7f9ff4fb3320)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1405
#4  0x00007fa3b70e578a in worker_run (ctx=0x7fa3b8537400)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548
#5  0x00007fa3b716f189 in fridgethr_start_routine (arg=0x7fa3b8537400)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550
#6  0x00007fa3b564fdc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007fa3b4d1e73d in clone () from /lib64/libc.so.6
(gdb)

Version-Release number of selected component (if applicable):
------------------------------------------------------------
nfs-ganesha-gluster-2.4.1-1.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64
openssl-1.0.1e-60.el7.x86_64

How reproducible:
-----------------
Reporting the first occurrence.

Steps to Reproduce:
-------------------
1. TLS-authenticate all 4 nodes and the clients.
2. Mount the volume via Ganesha with vers=4. Create the data set using iozone sequential writes.
3. Run iozone random writes (iozone -+m <conf file> -+h <host> -C -w -c -e -i 2 -J 3 -+n -r 64k -s 2g -t 16).

Actual results:
---------------
Ganesha crashes on 1 of the nodes.

Expected results:
-----------------
No crashes.
Additional info:
----------------
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: c3d28ba6-98a2-4e66-b412-2a6580030ffd
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
server.ssl: on
client.ssl: on
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.stat-prefetch: off
server.allow-insecure: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
One thing I forgot to mention in the description is that both I/O and management encryption are enabled:

[2016-11-26 12:41:45.479309] I [socket.c:4021:socket_init] 0-testvol-client-2: SSL support on the I/O path is ENABLED
[2016-11-26 12:41:45.479330] I [socket.c:4024:socket_init] 0-testvol-client-2: SSL support for glusterd is ENABLED
(gdb) bt
#0  0x00007fa3b4c5c1d7 in raise () from /lib64/libc.so.6
#1  0x00007fa3b4c5d8c8 in abort () from /lib64/libc.so.6
#2  0x00007fa3b715f80a in nfs_dupreq_rele (req=0x7f9ff4fb3348, func=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/RPCAL/nfs_dupreq.c:1234
#3  0x00007fa3b70e3ef1 in nfs_rpc_execute (reqdata=reqdata@entry=0x7f9ff4fb3320)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1405
#4  0x00007fa3b70e578a in worker_run (ctx=0x7fa3b8537400)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548
#5  0x00007fa3b716f189 in fridgethr_start_routine (arg=0x7fa3b8537400)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550
#6  0x00007fa3b564fdc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007fa3b4d1e73d in clone () from /lib64/libc.so.6
(gdb) f 2
#2  0x00007fa3b715f80a in nfs_dupreq_rele (req=0x7f9ff4fb3348, func=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/RPCAL/nfs_dupreq.c:1234
1234            PTHREAD_MUTEX_lock(&dv->mtx);
(gdb) l
1229                    func->free_function(req->rq_u2);
1230                    free_nfs_res(req->rq_u2);
1231                    goto out;
1232            }
1233
1234            PTHREAD_MUTEX_lock(&dv->mtx);
1235
1236            LogFullDebug(COMPONENT_DUPREQ,
1237                         "releasing dv=%p xid=%u on DRC=%p state=%s, refcnt=%d",
1238                         dv, dv->hin.tcp.rq_xid, dv->hin.drc,
(gdb) p &dv
Address requested for identifier "dv" which is in register $rbp
(gdb) p dv
$1 = (dupreq_entry_t *) 0x7fa264002460
(gdb) p dv->mtx
$2 = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = -1, __spins = 0,
    __list = {__prev = 0x0, __next = 0x0}},
  __size = '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>, __align = 0}
(gdb) p &dv->mtx
$3 = (pthread_mutex_t *) 0x7fa264002490
(gdb) p *$
$4 = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = -1, __spins = 0,
    __list = {__prev = 0x0, __next = 0x0}},
  __size = '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>, __align = 0}
(gdb)

/var/log/messages:

Nov 26 09:14:33 gqas011 systemd: nfs-ganesha.service: main process exited, code=killed, status=6/ABRT
Nov 26 09:14:33 gqas011 systemd: Unit nfs-ganesha.service entered failed state.
Nov 26 09:14:33 gqas011 systemd: nfs-ganesha.service failed.

/var/log/ganesha.log:

26/11/2016 08:55:19 : epoch 58200000 : gqas011.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-16848[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
26/11/2016 09:14:28 : epoch 58200000 : gqas011.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-16848[work-56] nfs_dupreq_rele :RW LOCK :CRIT :Error 22, acquiring mutex 0x7fa264002490 (&dv->mtx) at /builddir/build/BUILD/nfs-ganesha-2.4.1/src/RPCAL/nfs_dupreq.c:1234

pthread_mutex_lock failed with Error 22 (EINVAL), which suggests the mutex was not initialized properly, or was no longer valid, when it was accessed. From the code, it looks like we do not abort if init of this mutex fails:

static inline dupreq_entry_t *alloc_dupreq(void)
{
	dupreq_entry_t *dv;

	dv = pool_alloc(dupreq_pool);
	gsh_mutex_init(&dv->mtx, NULL);
	TAILQ_INIT_ENTRY(dv, fifo_q);

	return dv;
}

Not sure if that is the actual cause. Requesting Frank/Dan/Matt to comment. Attaching bt of all threads.
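One detail worth noting: the gdb dump of dv->mtx shows __kind = -1, which is the value glibc's pthread_mutex_destroy() writes into the mutex. That points at a destroyed (or freed and reused) mutex rather than a failed init. A minimal, glibc-specific sketch (hypothetical helper name, not ganesha code) that reproduces the same Error 22:

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical reproducer for the EINVAL seen in ganesha.log. On glibc,
 * pthread_mutex_destroy() sets the internal __kind field to -1 (matching
 * the "__kind = -1" in the gdb dump of dv->mtx), and a subsequent
 * pthread_mutex_lock() on that memory fails with EINVAL (Error 22).
 * POSIX leaves locking a destroyed mutex undefined; the EINVAL here is
 * glibc behavior, which is what these RHEL 7 nodes run. */
static int lock_after_destroy(void)
{
	pthread_mutex_t m;

	pthread_mutex_init(&m, NULL);
	pthread_mutex_destroy(&m);	/* leaves __kind == -1 on glibc */
	return pthread_mutex_lock(&m);	/* glibc returns EINVAL here */
}
```

This is only suggestive, of course: the same bit pattern could also come from freed pool memory being scribbled on.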
Created attachment 1225880 [details] thread_all_bt.txt
This smells like a double-free to me, rather than a failure to initialize. I don't know the dupreq code very well, but it looks like maybe both nfs_dupreq_finish() and nfs_dupreq_rele() can be called on the same dupreq? Maybe there's a refcount bug here. Frank?
Initialization failure is more or less impossible, so yes, I'd guess there was an attempt to access a destroyed mutex (or freed memory). I don't see an easy way for the glusterfs setup (e.g., SSL) to be influencing this, unless we somehow scribbled on req->rq_u1 (Ganesha private data). Moving the trace statement at line 1236 out of the critical section, and logging, could be helpful.
(In reply to Matt Benjamin (redhat) from comment #8)

addl notes (debugging the core):

1. dv looks vaguely sane, and it has refcnt==1 and state=DUPREQ_COMPLETE, which happens in nfs_dupreq_finish

2. one thing that jumps out at me is that it looks like nfs_dupreq_rele() and nfs_dupreq_finish() could maybe free the same entry if dv==ov in nfs_dupreq_finish(); worth checking for that case and asserting?
(In reply to Matt Benjamin (redhat) from comment #9)
> (In reply to Matt Benjamin (redhat) from comment #8)
> addl notes (debugging the core)
>
> 1. dv looks vaguely sane, and it has refcnt==1 and state=DUPREQ_COMPLETE,
> which happens in nfs_dupreq_finish
>
> 2. one thing that jumps out at me is that it looks like nfs_dupreq_rele()
> and nfs_dupreq_finish() could maybe free the same entry if dv==ov in
> nfs_dupreq_finish(); worth checking for that case and asserting?

(Not really that trivial: in nfs_dupreq_finish() we check ov's refcnt and don't decrement it; the decrement doesn't happen until nfs_dupreq_rele(). As well, we have refcnt==1, as above, so I don't see how that call path could have happened.)
This BZ has been accepted to be fixed in rhgs-3.2.0 as per today's triage exercise.
I think the answer is the one deduced by Malahal Naineni upstream. Basically, the retire logic in nfs_dupreq_finish() races with the logic in nfs_dupreq_start() to re-use an entry. The existing code incorrectly tries to serialize on a mutex in the drc entry itself, which does not control the entry's reachability. Malahal fixes this by taking the rbt partition lock and rechecking refcnt in the retire path.

The upstream change (not yet merged to nfs-ganesha master) is here:

git fetch ssh://mattbenjamin.io:29419/ffilz/nfs-ganesha refs/changes/01/304701/2 && git cherry-pick FETCH_HEAD

At the same time, working through this code, I found 3 issues that appear to make the caches grow too large. Proposed changes for early review here:

https://review.gerrithub.io/#/c/304819/
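For anyone following along, the serialization pattern described above can be sketched in a few lines. This is a hypothetical simplification (none of these names are ganesha's): the point is that the retire path must recheck refcnt under the same partition lock that gates whether a start-path lookup can still find and revive the entry.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical sketch of the fix: an entry's reachability is governed by
 * the hash-partition lock, not by the entry's own mutex, so the retire
 * path must take the partition lock and recheck refcnt before freeing.
 * Otherwise a concurrent start-path can revive the entry being freed. */

struct partition {
	pthread_mutex_t lock;	/* serializes lookup/insert/retire */
};

struct entry {
	int refcnt;		/* guarded by the partition lock in this sketch */
	bool retired;		/* stands in for "unhashed and freed" */
};

/* start-path: revive a cached entry, taking a reference while it is
 * still reachable under the partition lock */
static bool try_reuse(struct partition *p, struct entry *e)
{
	bool ok;

	pthread_mutex_lock(&p->lock);
	ok = !e->retired;
	if (ok)
		e->refcnt++;
	pthread_mutex_unlock(&p->lock);
	return ok;
}

static void put_ref(struct partition *p, struct entry *e)
{
	pthread_mutex_lock(&p->lock);
	e->refcnt--;
	pthread_mutex_unlock(&p->lock);
}

/* retire-path: free the entry only if the refcnt recheck, done under the
 * lock that controls reachability, still shows no users */
static bool try_retire(struct partition *p, struct entry *e)
{
	bool ok;

	pthread_mutex_lock(&p->lock);
	ok = (e->refcnt == 0 && !e->retired);
	if (ok)
		e->retired = true;	/* real code would unhash + free here */
	pthread_mutex_unlock(&p->lock);
	return ok;
}

/* single-threaded walk-through of the racy interleaving: once the
 * start-path has revived the entry, retire must observe the reference
 * and back off; returns 1 if every step behaves as intended */
static int demo(void)
{
	struct partition p;
	struct entry e = { 0, false };

	pthread_mutex_init(&p.lock, NULL);
	if (!try_reuse(&p, &e))		/* revive: refcnt 0 -> 1 */
		return 0;
	if (try_retire(&p, &e))		/* must refuse while referenced */
		return 0;
	put_ref(&p, &e);		/* last reference dropped */
	if (!try_retire(&p, &e))	/* now retiring is safe */
		return 0;
	if (try_reuse(&p, &e))		/* a retired entry can't be revived */
		return 0;
	pthread_mutex_destroy(&p.lock);
	return 1;
}
```

In the buggy code, the retire-side check ran under the entry's own dv->mtx, which a start-side thread does not need to hold to find the entry, so the "refcnt == 0" decision could go stale between check and free.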
Could not reproduce on 2.4.1-6/3.8.4-13. Verified on the latest bits.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2017-0493.html