Bug 1242756 - nfs-ganesha: nfs-ganesha process crashes while executing cthon lock test on vers=3 in a loop
Summary: nfs-ganesha: nfs-ganesha process crashes while executing cthon lock test on vers=3 in a loop
Keywords:
Status: CLOSED DUPLICATE of bug 1257957
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nfs-ganesha
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Soumya Koduri
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1244792
 
Reported: 2015-07-14 07:09 UTC by Saurabh
Modified: 2016-01-19 06:14 UTC
CC: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1244792
Environment:
Last Closed: 2015-07-20 13:21:15 UTC
Embargoed:


Attachments
nfs11 nfs-ganesha coredump (2.78 MB, application/x-xz)
2015-07-14 07:25 UTC, Saurabh

Description Saurabh 2015-07-14 07:09:55 UTC
Description of problem:
While executing the cthon lock test with vers=3 in a loop, the nfs-ganesha process crashed.
The volume also has ACLs enabled.

[root@nfs11 ~]# gluster volume status vol4
Status of volume: vol4
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.8:/rhs/brick1/d1r14          49156     0          Y       16720
Brick 10.70.46.27:/rhs/brick1/d1r24         49155     0          Y       31744
Brick 10.70.46.25:/rhs/brick1/d2r14         49157     0          Y       30081
Brick 10.70.46.29:/rhs/brick1/d2r24         49156     0          Y       22951
Brick 10.70.46.8:/rhs/brick1/d3r14          49157     0          Y       16738
Brick 10.70.46.27:/rhs/brick1/d3r24         49156     0          Y       31762
Brick 10.70.46.25:/rhs/brick1/d4r14         49158     0          Y       30099
Brick 10.70.46.29:/rhs/brick1/d4r24         49157     0          Y       22969
Brick 10.70.46.8:/rhs/brick1/d5r14          49158     0          Y       16756
Brick 10.70.46.27:/rhs/brick1/d5r24         49157     0          Y       31780
Brick 10.70.46.25:/rhs/brick1/d6r14         49159     0          Y       30117
Brick 10.70.46.29:/rhs/brick1/d6r24         49158     0          Y       22987
Self-heal Daemon on localhost               N/A       N/A        Y       10581
Quota Daemon on localhost                   N/A       N/A        Y       22205
Self-heal Daemon on 10.70.46.25             N/A       N/A        Y       21878
Quota Daemon on 10.70.46.25                 N/A       N/A        Y       31886
Self-heal Daemon on 10.70.46.27             N/A       N/A        Y       24236
Quota Daemon on 10.70.46.27                 N/A       N/A        Y       2719 
Self-heal Daemon on 10.70.46.29             N/A       N/A        Y       14763
Quota Daemon on 10.70.46.29                 N/A       N/A        Y       26234
Self-heal Daemon on 10.70.46.22             N/A       N/A        Y       1465 
Quota Daemon on 10.70.46.22                 N/A       N/A        Y       15541
Self-heal Daemon on 10.70.46.39             N/A       N/A        Y       20442
Quota Daemon on 10.70.46.39                 N/A       N/A        Y       1841 
 
Task Status of Volume vol4
------------------------------------------------------------------------------
There are no active volume tasks


Version-Release number of selected component (if applicable):
glusterfs-3.7.1-9.el6rhs.x86_64
nfs-ganesha-2.2.0-5.el6rhs.x86_64

How reproducible:
Happened for the first time.

Steps to Reproduce:
1. Create a 6x2 (distribute-replicate) volume and start it.
2. Configure nfs-ganesha and enable ACLs for the volume (a setup sketch for steps 1 and 2 follows below).
3. Execute the cthon lock test with vers=3, using the below command:
time ./server -l -o vers=3 -p /vol4 -m /mnt -N 3 <host-IP>
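
For steps 1 and 2, a minimal setup sketch, reusing the brick paths from the volume status above; the export file location, Export_Id, and the exact HA/refresh tooling vary by deployment, so treat those as placeholders:

# Step 1: create and start the 6x2 distribute-replicate volume
gluster volume create vol4 replica 2 \
    10.70.46.8:/rhs/brick1/d1r14  10.70.46.27:/rhs/brick1/d1r24 \
    10.70.46.25:/rhs/brick1/d2r14 10.70.46.29:/rhs/brick1/d2r24 \
    10.70.46.8:/rhs/brick1/d3r14  10.70.46.27:/rhs/brick1/d3r24 \
    10.70.46.25:/rhs/brick1/d4r14 10.70.46.29:/rhs/brick1/d4r24 \
    10.70.46.8:/rhs/brick1/d5r14  10.70.46.27:/rhs/brick1/d5r24 \
    10.70.46.25:/rhs/brick1/d6r14 10.70.46.29:/rhs/brick1/d6r24
gluster volume start vol4

# Step 2: enable ACLs in the ganesha export for vol4
# (file path and Export_Id below are placeholders)
cat >> /etc/ganesha/exports/export.vol4.conf <<'EOF'
EXPORT {
    Export_Id = 4;
    Path = "/vol4";
    Pseudo = "/vol4";
    Disable_ACL = false;
    FSAL {
        Name = GLUSTER;
        Hostname = "localhost";
        Volume = "vol4";
    }
}
EOF
# restart/refresh nfs-ganesha afterwards so the export change takes effect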

Actual results:
The -N 3 option runs the lock test three times in a loop. The first two iterations passed, but on the third an UNLOCK operation failed because the nfs-ganesha process crashed. The backtrace shows the crash in nlm_send_async, called from nlm4_send_grant_msg on an async worker (fridge) thread:
(gdb) bt
#0  0x00000000004939fa in nlm_send_async ()
#1  0x00000000004950dc in nlm4_send_grant_msg ()
#2  0x000000000049f1d8 in state_async_func_caller ()
#3  0x000000000050d836 in fridgethr_start_routine ()
#4  0x0000003e96a07a51 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003e966e896d in clone () from /lib64/libc.so.6


PS: no failover was triggered.

Expected results:
The cthon lock test with vers=3 should pass.

Additional info:

Comment 2 Saurabh 2015-07-14 07:18:18 UTC
The issue is seen even with ACLs disabled.

Comment 3 Saurabh 2015-07-14 07:25:41 UTC
Created attachment 1051659 [details]
nfs11 nfs-ganesha coredump

Comment 4 Soumya Koduri 2015-07-14 10:01:02 UTC
I ran the cthon lock tests on two machines in a loop of about 10 iterations.
Haven't seen the crash.

[root@clus1 ~]# showmount -e localhost
Export list for localhost:
/vol1 (everyone)
/vol2 (everyone)
[root@clus1 ~]# 
[root@clus1 ~]# getenforce
Enforcing
[root@clus1 ~]# 

[root@dhcp42-219 cthon04]# ./server -l -p /vol1 -m /tmp/mnt 10.70.42.141 -N 10
...........
.........
	Parent: Truncated testfile.
	Parent: Truncated testfile.
	Parent: Truncated testfile.
	Parent: Truncated testfile.
	Parent: Truncated testfile.
	Parent: Wrote and read 256 KB file 10 times; [7420.29 +/- 51.49 KB/s].
	Parent: 14.1  - F_ULOCK [               0,          ENDING] PASSED.

Test #15 - Test 2nd open and I/O after lock and close.
	Parent: Second open succeeded.
	Parent: 15.0  - F_LOCK  [               0,          ENDING] PASSED.
	Parent: 15.1  - F_ULOCK [               0,          ENDING] PASSED.
	Parent: Closed testfile.
	Parent: Wrote 'abcdefghij' to testfile [ 0, 11 ].
	Parent: Read 'abcdefghij' from testfile [ 0, 11 ].
	Parent: 15.2  - COMPARE [               0,               b] PASSED.

** PARENT pass 1 results: 49/49 pass, 1/1 warn, 0/0 fail (pass/total).

**  CHILD pass 1 results: 64/64 pass, 0/0 warn, 0/0 fail (pass/total).
Congratulations, you passed the locking tests!

All tests completed


Requesting a setup from QE to reproduce and further analyse the bug. Meanwhile, trying to get all the required debuginfo packages to analyse the core.
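
A sketch of that symbol-resolution step, assuming the debuginfo yum repos are enabled (debuginfo-install comes from yum-utils; the core file path is a placeholder):

# pull debuginfo matching the installed nfs-ganesha and glusterfs builds
debuginfo-install nfs-ganesha glusterfs
# re-open the core with symbols and get a full backtrace
gdb /usr/bin/ganesha.nfsd /path/to/core
(gdb) bt full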

Comment 6 Soumya Koduri 2015-07-14 13:46:05 UTC
Tried the below tests on the QE setup:
* Ran the cthon tests about 6 times on 2 different volumes, using two different clients, multiple times.
* Restarted nfs-ganesha, statd and other services in between (a sketch of these restarts follows below).

But I was unable to reproduce the issue.
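
On el6 the restarts in the second bullet would be along these lines (a sketch; the nfs-ganesha init script name is an assumption for this build, and on RHEL 6 rpc.statd is managed by the nfslock service):

service nfs-ganesha restart
service nfslock restart        # restarts rpc.statd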

Comment 10 Soumya Koduri 2015-07-20 13:21:15 UTC
Closing this bug as the issue doesn't seem reproducible. We shall check whether the nfs-ganesha debuginfo package is appropriate in a separate bug.

Comment 11 Niels de Vos 2015-09-03 15:50:12 UTC
Bug 1244792 is used for tracking and fixing the debuginfo problem.

*** This bug has been marked as a duplicate of bug 1257957 ***

