Bug 1467896 - [GANESHA] Ganesha crashed while running diskfill utility on nfs share mounted on windows client
Status: VERIFIED
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: nfs-ganesha
Version: 3.3
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.4.0
Assigned To: Frank Filz
QA Contact: Manisha Saini
Depends On: 1562766 1562774
Blocks: 1503134
Reported: 2017-07-05 08:43 EDT by Manisha Saini
Modified: 2018-05-11 09:55 EDT

Fixed In Version: nfs-ganesha-2.5.4-1
Type: Bug

Attachments
Windows snippet (148.35 KB, image/png)
2018-05-07 13:20 EDT, Manisha Saini

Description Manisha Saini 2017-07-05 08:43:01 EDT
Description of problem:
Ganesha crashed while running the diskfill utility on an NFS share mounted on a Windows client.

Version-Release number of selected component (if applicable):

# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.4.4-10.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.4.4-10.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-31.el7rhgs.x86_64
nfs-ganesha-2.4.4-10.el7rhgs.x86_64


How reproducible:
3/3 times

Steps to Reproduce:
1. Create a 4-node ganesha cluster.
2. Create a 6*2 distributed-replicate volume and enable ganesha on it.
3. Mount the volume on a Windows client (NFSv3).
4. Run the diskfill utility from the Windows client.

Actual results:
Ganesha crashed on the node from which the volume was mounted on the Windows client.

Expected results:
Ganesha should not crash.

Additional info:


Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fc16d772700 (LWP 26131)]
dec_nlm_state_ref (state=0x0) at /usr/src/debug/nfs-ganesha-2.4.4/src/SAL/nlm_state.c:276
276		refcount = atomic_dec_int32_t(&state->state_refcount);
(gdb) bt
#0  dec_nlm_state_ref (state=0x0) at /usr/src/debug/nfs-ganesha-2.4.4/src/SAL/nlm_state.c:276
#1  0x0000556f2173596e in nlm4_Lock (args=<optimized out>, req=<optimized out>, res=0x7fc1fc008770)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/Protocols/NLM/nlm_Lock.c:206
#2  0x0000556f216f6b1c in nfs_rpc_execute (reqdata=reqdata@entry=0x7fc1cc015090)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_worker_thread.c:1281
#3  0x0000556f216f818a in worker_run (ctx=0x556f2310cdb0) at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_worker_thread.c:1548
#4  0x0000556f21781889 in fridgethr_start_routine (arg=0x556f2310cdb0) at /usr/src/debug/nfs-ganesha-2.4.4/src/support/fridgethr.c:550
#5  0x00007fc213805e25 in start_thread () from /lib64/libpthread.so.0
#6  0x00007fc212ed334d in clone () from /lib64/libc.so.6
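
The backtrace shows dec_nlm_state_ref() being called with state=0x0 and faulting on the refcount decrement at nlm_state.c:276. The failure mode can be sketched in isolation as below. This is only an illustration under stated assumptions: the struct, the function names, and the use of C11 atomics are hypothetical stand-ins, not the nfs-ganesha source and not the fix that was eventually applied.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the NLM state object; only the refcount matters. */
struct nlm_state_sketch {
	atomic_int state_refcount;
};

/*
 * Mirrors the crashing pattern: the pointer is dereferenced unconditionally,
 * so passing NULL faults exactly like frame #0 of the backtrace.
 */
static int32_t dec_ref_unchecked(struct nlm_state_sketch *state)
{
	/* atomic_fetch_sub returns the value held before the subtraction. */
	return (int32_t)(atomic_fetch_sub(&state->state_refcount, 1) - 1);
}

/*
 * Guarded variant: a NULL state means no reference was ever taken, so there
 * is nothing to release. Returns -1 in that case, else the new refcount.
 */
static int32_t dec_ref_guarded(struct nlm_state_sketch *state)
{
	int32_t refcount;

	if (state == NULL)
		return -1;

	refcount = dec_ref_unchecked(state);
	assert(refcount >= 0);	/* a negative count would mean an extra release */
	return refcount;
}
```

A caller on a cleanup path where no state may have been obtained would use the guarded form. Whether the right remedy is a guard at the call site or ensuring that a state is always allocated is a design question this sketch does not settle.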


Ran the diskfill utility on a RHEL client with both V3 and V4 mounts.
It runs fine with RHEL clients.


Will attach sosreports and the core file shortly.
Comment 5 Jiffin 2017-07-06 02:00:13 EDT
(In reply to Daniel Gryniewicz from comment #4)
> This may be
> https://github.com/nfs-ganesha/nfs-ganesha/commit/52e0e125322fb0cc5c608be4cd43b90a702d88e2

The issue in the above case was that "state" was not being allocated by get_nlm_state(), whereas the above change addresses a refcount issue. Correct me if I am wrong: will the above change prevent that scenario (that is, guarantee that the state entry will be present in the hashtable)?
Comment 6 Daniel Gryniewicz 2017-07-06 09:08:48 EDT
Hmmm... I missed that nsm_state_applies is false.  This should *not* be possible for a LOCK op.  The only way to call nlm4_Lock() from that call path is to have req->rq_proc == NLMPROC4_NM_LOCK, which makes nsm_state_applies true.  So something else is very very wrong here, I think.
Comment 7 Frank Filz 2017-07-06 09:18:02 EDT
(In reply to Daniel Gryniewicz from comment #6)
> Hmmm... I missed that nsm_state_applies is false.  This should *not* be
> possible for a LOCK op.  The only way to call nlm4_Lock() from that call
> path is to have req->rq_proc == NLMPROC4_NM_LOCK, which makes
> nsm_state_applies true.  So something else is very very wrong here, I think.

Windows client uses NLMPROC4_NM_LOCK...

I probably messed something up for NM_LOCK...

Crud, we really need access to Windows client in development...
Comment 9 Manisha Saini 2017-07-11 15:21:42 EDT
Tested the same use case with 3.2.

Ganesha crashed while extracting a zip folder:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f3331db0700 (LWP 22701)]
0x0000560f118a49c5 in dec_nlm_state_ref ()
(gdb) bt
#0  0x0000560f118a49c5 in dec_nlm_state_ref ()
#1  0x0000560f11873cbe in nlm4_Lock ()
#2  0x0000560f11834eec in nfs_rpc_execute ()
#3  0x0000560f1183655a in worker_run ()
#4  0x0000560f118c02f9 in fridgethr_start_routine ()
#5  0x00007f334299ce25 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f334206a34d in clone () from /lib64/libc.so.6


However, I did not hit the crash while running the diskfill utility on the Windows mount point.
Comment 10 Manisha Saini 2017-07-11 15:24:13 EDT
Accidentally changed the assignee. Resetting it again.
Comment 11 Frank Filz 2017-07-12 11:43:03 EDT
(In reply to Manisha Saini from comment #9)
> Tested the same use case with 3.2.
> 
> Ganesha crashed while extracting a zip folder:
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7f3331db0700 (LWP 22701)]
> 0x0000560f118a49c5 in dec_nlm_state_ref ()
> (gdb) bt
> #0  0x0000560f118a49c5 in dec_nlm_state_ref ()
> #1  0x0000560f11873cbe in nlm4_Lock ()
> #2  0x0000560f11834eec in nfs_rpc_execute ()
> #3  0x0000560f1183655a in worker_run ()
> #4  0x0000560f118c02f9 in fridgethr_start_routine ()
> #5  0x00007f334299ce25 in start_thread () from /lib64/libpthread.so.0
> #6  0x00007f334206a34d in clone () from /lib64/libc.so.6
> 
> 
> However, I did not hit the crash while running the diskfill utility on the
> Windows mount point.

Yes, this fix needs backporting to 3.2 also.
Comment 16 Daniel Gryniewicz 2017-08-29 13:33:18 EDT
I confirm that this commit is in 2.5.2.
Comment 22 Manisha Saini 2018-05-07 13:20 EDT
Created attachment 1432751
Windows snippet