Bug 1369674 - Ganesha process crashes on all the nodes with segfault error with latest libntirpc packages
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: nfs-ganesha
Classification: Retired
Component: ntirpc
Version: 2.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Matt Benjamin
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-08-24 06:36 UTC by Shashank Raj
Modified: 2016-11-08 03:52 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-01 13:35:02 UTC



Description Shashank Raj 2016-08-24 06:36:22 UTC
Description of problem:

With the latest libntirpc packages installed, the ganesha process crashes with a segfault on all the nodes when nfs-ganesha is enabled on the cluster.

Version-Release number of selected component (if applicable):

[root@dhcp43-116 ~]# rpm -qa|grep ganesha
nfs-ganesha-next.20160813.2f47e8a-1.el7.centos.x86_64
glusterfs-ganesha-3.8.2-0.1.gitd33aa0b.el7rhgs.x86_64
nfs-ganesha-gluster-next.20160813.2f47e8a-1.el7.centos.x86_64

[root@dhcp43-116 ~]# rpm -qa|grep libntirpc
libntirpc-duplex13.20160726.d375195-1.el7.centos.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Install the latest nfs-ganesha and libntirpc packages from the following locations:

http://artifacts.ci.centos.org/nfs-ganesha/nightly/next/7/x86_64/x86_64/
http://artifacts.ci.centos.org/nfs-ganesha/nightly/libntirpc/duplex13/7/x86_64/

2. Try to set up ganesha on the cluster.
3. Observe that after ganesha is enabled on the cluster, the ganesha process crashes with a segfault on all the nodes.

[root@dhcp43-116 ~]# gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue?
 (y/n) y
This will take a few minutes to complete. Please wait ..
nfs-ganesha : success 


[root@dhcp43-116 ~]# service nfs-ganesha status
Redirecting to /bin/systemctl status  nfs-ganesha.service
● nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled; vendor preset: disabled)
   Active: failed (Result: signal) since Wed 2016-08-24 17:14:34 IST; 4min 18s ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
 Main PID: 9943 (code=killed, signal=SEGV)

Aug 24 17:14:34 dhcp43-116.lab.eng.blr.redhat.com systemd[1]: Starting NFS-Ganes...
Aug 24 17:14:34 dhcp43-116.lab.eng.blr.redhat.com systemd[1]: Started NFS-Ganesh...
Aug 24 17:14:34 dhcp43-116.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha.servic...
Aug 24 17:14:34 dhcp43-116.lab.eng.blr.redhat.com systemd[1]: Unit nfs-ganesha.s...
Aug 24 17:14:34 dhcp43-116.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha.servic...
Hint: Some lines were ellipsized, use -l to show in full.


The following segfault messages are seen in the kernel log, one from each of the nodes:

[592907.834912] ganesha.nfsd[9943]: segfault at 36 ip 00007f0e91926783 sp 00007ffcc871c810 error 6 in libntirpc.so.1.4.0[7f0e91913000+3f000]

[592900.525640] ganesha.nfsd[6192]: segfault at 36 ip 00007f723fbee783 sp 00007ffce0a17150 error 6 in libntirpc.so.1.4.0[7f723fbdb000+3f000]

[592894.520603] ganesha.nfsd[30173]: segfault at 36 ip 00007fd4302fc783 sp 00007ffeae644a70 error 6 in libntirpc.so.1.4.0[7fd4302e9000+3f000]

[582583.186948] ganesha.nfsd[31148]: segfault at 36 ip 00007fe01f583783 sp 00007fff8e71f350 error 6 in libntirpc.so.1.4.0[7fe01f570000+3f000]
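The four crash lines above can be compared mechanically. The following is a minimal Python sketch, not part of the original report. It assumes the usual Linux x86 segfault log layout and the standard x86 page-fault error-code bits (bit 0: page present, bit 1: write access, bit 2: user mode). The names `decode_segfault` and `SEGFAULT_LINES` are hypothetical. The sketch computes the offset of the faulting instruction inside libntirpc.so.1.4.0 (instruction pointer minus mapping base), which could then be resolved with a tool such as addr2line against matching debuginfo.

```python
import re

# Assumed layout of the kernel message:
# "<comm>[<pid>]: segfault at <addr> ip <ip> sp <sp> error <code> in <lib>[<base>+<size>]"
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_segfault(line):
    """Parse one segfault line; hypothetical helper for illustration."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    err = int(m["err"])
    return {
        "pid": int(m["pid"]),
        "lib": m["lib"],
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction within the mapped library.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }

# The four lines quoted in this report.
SEGFAULT_LINES = [
    "[592907.834912] ganesha.nfsd[9943]: segfault at 36 ip 00007f0e91926783 sp 00007ffcc871c810 error 6 in libntirpc.so.1.4.0[7f0e91913000+3f000]",
    "[592900.525640] ganesha.nfsd[6192]: segfault at 36 ip 00007f723fbee783 sp 00007ffce0a17150 error 6 in libntirpc.so.1.4.0[7f723fbdb000+3f000]",
    "[592894.520603] ganesha.nfsd[30173]: segfault at 36 ip 00007fd4302fc783 sp 00007ffeae644a70 error 6 in libntirpc.so.1.4.0[7fd4302e9000+3f000]",
    "[582583.186948] ganesha.nfsd[31148]: segfault at 36 ip 00007fe01f583783 sp 00007fff8e71f350 error 6 in libntirpc.so.1.4.0[7fe01f570000+3f000]",
]

for line in SEGFAULT_LINES:
    info = decode_segfault(line)
    print(info["pid"], info["lib"], hex(info["offset"]), hex(info["fault_addr"]))
```

For these four lines the computed offset is 0x13783 in every case, the faulting address is 0x36, and error code 6 decodes to a user-mode write to a page that is not present. That pattern is consistent with every node crashing at the same instruction, possibly while writing through a near-NULL pointer, but the report contains no core dump or backtrace to confirm it.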


pcs status:

[root@dhcp43-116 ~]# pcs status
Cluster name: G1471855029.47
Last updated: Wed Aug 24 17:21:11 2016		Last change: Wed Aug 24 17:16:28 2016 by root via cibadmin on dhcp43-116.lab.eng.blr.redhat.com
Stack: corosync
Current DC: dhcp43-88.lab.eng.blr.redhat.com (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
4 nodes and 16 resources configured

Online: [ dhcp42-237.lab.eng.blr.redhat.com dhcp42-47.lab.eng.blr.redhat.com dhcp43-116.lab.eng.blr.redhat.com dhcp43-88.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ dhcp42-237.lab.eng.blr.redhat.com dhcp42-47.lab.eng.blr.redhat.com dhcp43-116.lab.eng.blr.redhat.com dhcp43-88.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp42-237.lab.eng.blr.redhat.com dhcp42-47.lab.eng.blr.redhat.com dhcp43-116.lab.eng.blr.redhat.com dhcp43-88.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Stopped: [ dhcp42-237.lab.eng.blr.redhat.com dhcp42-47.lab.eng.blr.redhat.com dhcp43-116.lab.eng.blr.redhat.com dhcp43-88.lab.eng.blr.redhat.com ]
 dhcp43-116.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Stopped
 dhcp43-88.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Stopped
 dhcp42-47.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Stopped
 dhcp42-237.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Stopped

Failed Actions:
* nfs-grace_monitor_0 on dhcp43-88.lab.eng.blr.redhat.com 'unknown error' (1): call=18, status=complete, exitreason='none',
    last-rc-change='Wed Aug 24 17:15:51 2016', queued=0ms, exec=336ms
* nfs-grace_monitor_0 on dhcp42-47.lab.eng.blr.redhat.com 'unknown error' (1): call=18, status=complete, exitreason='none',
    last-rc-change='Wed Aug 24 17:15:51 2016', queued=0ms, exec=354ms
* nfs-grace_monitor_0 on dhcp43-116.lab.eng.blr.redhat.com 'unknown error' (1): call=18, status=complete, exitreason='none',
    last-rc-change='Wed Aug 24 17:15:51 2016', queued=0ms, exec=322ms
* nfs-grace_monitor_0 on dhcp42-237.lab.eng.blr.redhat.com 'unknown error' (1): call=18, status=complete, exitreason='none',
    last-rc-change='Wed Aug 24 17:15:51 2016', queued=0ms, exec=365ms


PCSD Status:
  dhcp43-116.lab.eng.blr.redhat.com: Online
  dhcp43-88.lab.eng.blr.redhat.com: Online
  dhcp42-47.lab.eng.blr.redhat.com: Online
  dhcp42-237.lab.eng.blr.redhat.com: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled
[root@dhcp43-116 ~]# 
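The "Failed Actions" entries in the output above can be summarised per node. The following is a minimal Python sketch, not part of the original report. It assumes the text layout shown here (pacemaker 1.1.13), which may differ in other versions. The names `failed_actions` and `PCS_FAILED` are hypothetical.

```python
import re

# Assumed layout: "* <operation> on <node> '<reason>' (<rc>): call=..., ..."
FAILED_RE = re.compile(
    r"^\* (?P<op>\S+) on (?P<node>\S+) '(?P<reason>[^']*)' \((?P<rc>\d+)\)",
    re.MULTILINE,
)

def failed_actions(text):
    """Return (operation, node, reason, return code) for each failed action."""
    return [
        (m["op"], m["node"], m["reason"], int(m["rc"]))
        for m in FAILED_RE.finditer(text)
    ]

# Excerpt copied from the pcs status output in this report.
PCS_FAILED = """\
Failed Actions:
* nfs-grace_monitor_0 on dhcp43-88.lab.eng.blr.redhat.com 'unknown error' (1): call=18, status=complete, exitreason='none',
    last-rc-change='Wed Aug 24 17:15:51 2016', queued=0ms, exec=336ms
* nfs-grace_monitor_0 on dhcp42-47.lab.eng.blr.redhat.com 'unknown error' (1): call=18, status=complete, exitreason='none',
    last-rc-change='Wed Aug 24 17:15:51 2016', queued=0ms, exec=354ms
* nfs-grace_monitor_0 on dhcp43-116.lab.eng.blr.redhat.com 'unknown error' (1): call=18, status=complete, exitreason='none',
    last-rc-change='Wed Aug 24 17:15:51 2016', queued=0ms, exec=322ms
* nfs-grace_monitor_0 on dhcp42-237.lab.eng.blr.redhat.com 'unknown error' (1): call=18, status=complete, exitreason='none',
    last-rc-change='Wed Aug 24 17:15:51 2016', queued=0ms, exec=365ms
"""

for op, node, reason, rc in failed_actions(PCS_FAILED):
    print(f"{node}: {op} failed with rc={rc} ({reason})")
```

On this excerpt the sketch reports the same failed operation, nfs-grace_monitor_0 with return code 1, on each of the four nodes.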

Actual results:

The ganesha process crashes with a segfault on all the nodes with the latest libntirpc packages.

Expected results:

There should not be any crashes.

Additional info:

No core was generated. The sosreport and ganesha logs will be attached.

Comment 1 Shashank Raj 2016-08-24 06:47:57 UTC
sosreport and logs can be found under http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1369674

Comment 2 Shashank Raj 2016-09-01 13:28:03 UTC
This issue is not seen with the latest nfs-ganesha and libntirpc packages:

[root@dhcp43-116 ~]# rpm -qa|grep ganesha
nfs-ganesha-gluster-next.20160827.7641daf-1.el7.centos.x86_64
glusterfs-ganesha-3.8.3-0.6.git7956718.el7.centos.x86_64
nfs-ganesha-debuginfo-next.20160827.7641daf-1.el7.centos.x86_64
nfs-ganesha-next.20160827.7641daf-1.el7.centos.x86_64

[root@dhcp43-116 ~]# rpm -qa|grep libntirpc
libntirpc-duplex13.20160825.d375195-1.el7.centos.x86_64

This bug can be closed. I will reopen it if I hit the issue again.
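The package sets from the failing and the working setups can be compared directly. The following is a minimal Python sketch, not part of the original comment. It assumes each `rpm -qa` line has the form name-version-release.arch. The helper names `split_nvra` and `version_changes` are hypothetical.

```python
def split_nvra(pkg):
    """Split 'name-version-release.arch' as printed by rpm -qa."""
    nvr, arch = pkg.rsplit(".", 1)
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release, arch

def version_changes(before, after):
    """Map package name to ((old version, old release), (new version, new release))."""
    old = {split_nvra(p)[0]: split_nvra(p)[1:3] for p in before}
    new = {split_nvra(p)[0]: split_nvra(p)[1:3] for p in after}
    return {n: (old[n], new[n]) for n in old if n in new and old[n] != new[n]}

# Packages listed in the description (crashing setup).
BEFORE = [
    "nfs-ganesha-next.20160813.2f47e8a-1.el7.centos.x86_64",
    "glusterfs-ganesha-3.8.2-0.1.gitd33aa0b.el7rhgs.x86_64",
    "nfs-ganesha-gluster-next.20160813.2f47e8a-1.el7.centos.x86_64",
    "libntirpc-duplex13.20160726.d375195-1.el7.centos.x86_64",
]

# Packages listed in this comment (working setup).
AFTER = [
    "nfs-ganesha-gluster-next.20160827.7641daf-1.el7.centos.x86_64",
    "glusterfs-ganesha-3.8.3-0.6.git7956718.el7.centos.x86_64",
    "nfs-ganesha-debuginfo-next.20160827.7641daf-1.el7.centos.x86_64",
    "nfs-ganesha-next.20160827.7641daf-1.el7.centos.x86_64",
    "libntirpc-duplex13.20160825.d375195-1.el7.centos.x86_64",
]

for name, (old, new) in sorted(version_changes(BEFORE, AFTER).items()):
    print(f"{name}: {'-'.join(old)} -> {'-'.join(new)}")
```

All four packages present in both lists changed. The libntirpc version strings differ only in the date field (20160726 versus 20160825) and carry the same d375195 suffix, while nfs-ganesha moved from 2f47e8a to 7641daf. The report does not say which of these changes resolved the crash.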

Comment 3 Soumya Koduri 2016-09-01 13:35:02 UTC
Thanks Shashank. Closing this bug.

