Bug 1128421 - gluster nfs server process crashed multiple times while mounting volume and starting volume using force option [NEEDINFO]
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: gluster-nfs
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Brad Hubbard
QA Contact:
URL:
Whiteboard:
Depends On: 1196520
Blocks:
Reported: 2014-08-10 11:34 UTC by Rachana Patel
Modified: 2015-08-10 07:43 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1196520 (view as bug list)
Environment:
Last Closed: 2015-08-10 07:43:27 UTC
Target Upstream Version:
ndevos: needinfo? (saujain)



Description Rachana Patel 2014-08-10 11:34:05 UTC
Description of problem:
=======================
While mounting the volume using NFS, the mount hung; the NFS server process was found to have crashed and a core was generated.
After that, some brick processes were killed, and during 'start force' of that volume the crash was seen again.


Version-Release number of selected component (if applicable):
=============================================================
3.4.0.59rhs-1.2.toyota.hotfix.el6rhs.x86_64

How reproducible:
=================
Not always, but hit multiple times.

Steps to Reproduce:
===================
--> start the volume with the 'start force' option
--> mount the volume using NFS
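
The two steps above can be sketched as command lines. This is only an illustration: the volume name, server name and mount point are hypothetical placeholders, and `vers=3` is an assumption based on Gluster NFS being an NFSv3 server. The helper `repro_commands` builds the commands but does not run them.

```python
import shlex

def repro_commands(volume, server, mount_point):
    """Build (but do not run) the two commands from the steps to reproduce."""
    return [
        # Step 1: start the volume with the force option.
        ["gluster", "volume", "start", volume, "force"],
        # Step 2: mount the volume over NFS (NFSv3 assumed).
        ["mount", "-t", "nfs", "-o", "vers=3", f"{server}:/{volume}", mount_point],
    ]

# Hypothetical names, for illustration only.
for cmd in repro_commands("testvol", "server1", "/mnt/testvol"):
    print(shlex.join(cmd))
```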

Actual results:
===============
Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000003cc180f807 in ?? () from /lib64/libgcc_s.so.1
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6_4.5.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.10.3-10.el6_4.6.x86_64 libcom_err-1.41.12-14.el6_4.4.x86_64 libgcc-4.4.7-3.el6.x86_64 libselinux-2.0.94-5.3.el6_4.1.x86_64 openssl-1.0.1e-16.el6_5.4.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x0000003cc180f807 in ?? () from /lib64/libgcc_s.so.1
#1  0x0000003cc18100b9 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
#2  0x0000003cbf8fe93e in backtrace () from /lib64/libc.so.6
#3  0x00007f489a50585f in gf_print_trace (signum=11, ctx=0x233a010) at common-utils.c:588
#4  <signal handler called>
#5  0x00007f4894fda635 in ?? ()
#6  0x00007f488b5c7700 in ?? ()
#7  0x0000003cbfc07851 in start_thread () from /lib64/libpthread.so.0
#8  0x0000003cbf8e894d in clone () from /lib64/libc.so.6


nfs log snippet:-
[2014-08-09 03:25:04.297623] E [nfs.c:341:nfs_init_versions] 0-nfs: Program  NLM4 registration failed
[2014-08-09 03:25:04.297642] E [nfs.c:1327:init] 0-nfs: Failed to initialize protocols
[2014-08-09 03:25:04.297654] E [xlator.c:423:xlator_init] 0-nfs-server: Initialization of volume 'nfs-server' failed, review your volfile again
[2014-08-09 03:25:04.297684] E [graph.c:292:glusterfs_graph_init] 0-nfs-server: initializing translator failed
[2014-08-09 03:25:04.297698] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
pending frames:
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-08-09 03:25:04
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.59rhs
[2014-08-09 03:31:32.264927] I [glusterfsd.c:2026:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.0.59rhs (/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/6c17a0c3d4fe4a812fb26d22741782c8.socket)
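
For triage, the chain of E-level messages leading up to the crash can be pulled out of nfs.log with a small script. This is a minimal sketch, assuming the line layout seen in the snippet above (`[timestamp] LEVEL [file:line:function] domain: message`). The names `LOG_LINE`, `error_chain` and `LOG_SAMPLE` are illustrative and not part of GlusterFS.

```python
import re

# Assumed layout, taken from the log lines quoted in this report:
# [timestamp] LEVEL [file:line:function] domain: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+(?P<msg>.*)$"
)

def error_chain(log_text):
    """Return (function, message) pairs for every E-level line, in log order."""
    chain = []
    for raw in log_text.splitlines():
        m = LOG_LINE.match(raw.strip())
        if m and m.group("level") == "E":
            chain.append((m.group("func"), m.group("msg")))
    return chain

# Lines copied from the report (the last one shortened).
LOG_SAMPLE = """\
[2014-08-09 03:25:04.297623] E [nfs.c:341:nfs_init_versions] 0-nfs: Program  NLM4 registration failed
[2014-08-09 03:25:04.297642] E [nfs.c:1327:init] 0-nfs: Failed to initialize protocols
[2014-08-09 03:25:04.297654] E [xlator.c:423:xlator_init] 0-nfs-server: Initialization of volume 'nfs-server' failed, review your volfile again
[2014-08-09 03:31:32.264927] I [glusterfsd.c:2026:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.0.59rhs
"""

for func, msg in error_chain(LOG_SAMPLE):
    print(f"{func}: {msg}")
```

Crash-dump lines such as "signal received: 11" do not follow this layout and are skipped by the pattern.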



Expected results:
================


Additional info:
================

Comment 4 santosh pradhan 2014-08-13 08:59:12 UTC

1) The root cause:
NLM was not able to register with the portmapper, which prevented NFS from starting.

(Log snippet):

[2014-08-09 10:44:48.216245] E [rpcsvc.c:1260:rpcsvc_program_register_portmap] 0-rpc-service: Could not register with portmap
[2014-08-09 10:44:48.216278] E [nfs.c:341:nfs_init_versions] 0-nfs: Program  NLM4 registration failed
[2014-08-09 10:44:48.216291] E [nfs.c:1327:init] 0-nfs: Failed to initialize protocols

2) I was not able to reproduce the issue despite several attempts.

Hence, I am closing the bug for now. Please feel free to reopen it if you see it again. But please check why NLM (or ACL, MOUNT or NFS) fails to register with the portmapper, without which NFS can't work.
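
One way to check the registrations is to look at `rpcinfo -p` output on the server and see which of these programs are absent. The sketch below parses such output. The program numbers are the standard ONC RPC assignments (NFS 100003, MOUNT 100005, NLM 100021, NFS ACL 100227); the sample text is made up for illustration and was not captured from the affected system, and `missing_programs` is a hypothetical helper name.

```python
# Standard ONC RPC program numbers for the services named above.
REQUIRED = {
    100003: "NFS",
    100005: "MOUNT",
    100021: "NLM",
    100227: "ACL",
}

def missing_programs(rpcinfo_output):
    """Return the names of required programs not listed in `rpcinfo -p` output."""
    registered = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows start with the numeric program number; the header does not.
        if fields and fields[0].isdigit():
            registered.add(int(fields[0]))
    return sorted(name for num, name in REQUIRED.items() if num not in registered)

# Illustrative sample only (not from the affected system).
RPCINFO_SAMPLE = """\
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100005    3   tcp  38465  mountd
    100003    3   tcp   2049  nfs
    100227    3   tcp   2049  nfs_acl
"""

print(missing_programs(RPCINFO_SAMPLE))
```

In this made-up sample only NLM is reported as missing, which would match the "Program  NLM4 registration failed" message in the log.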

Thanks,
Santosh

Comment 5 Brad Hubbard 2015-02-28 09:49:16 UTC
Reopening this as I can reproduce it with the reproducer in bz1196520. See comments 6,7 and 8.

[2015-02-28 09:35:51.873805] I [socket.c:3537:socket_init] 0-socket.NLM: using system polling thread
[2015-02-28 09:35:51.900285] E [nfs.c:341:nfs_init_versions] 0-nfs: Program  NLM4 registration failed
[2015-02-28 09:35:51.900347] E [nfs.c:1327:init] 0-nfs: Failed to initialize protocols
[2015-02-28 09:35:51.900367] E [xlator.c:423:xlator_init] 0-nfs-server: Initialization of volume 'nfs-server' failed, review your volfile again
[2015-02-28 09:35:51.900386] E [graph.c:292:glusterfs_graph_init] 0-nfs-server: initializing translator failed
[2015-02-28 09:35:51.900404] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
pending frames:
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2015-02-28 09:35:51
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.72rhs

Comment 11 nchilaka 2015-07-20 12:46:13 UTC
I did the following as part of QE validation for the fix:
1) Had a 3-node cluster: A, B, C
2) Had a client
3) Created a volume with 2 bricks, one each on nodes A and B (distribute only)
4) Started the volume
5) Killed the NFS process on one of the nodes (A) and one brick process (B)
6) Did a force restart and mounted the volume on node A and node C using NFS, and on node C using FUSE. The mount was successful without any crash.

Also did the following:
1-4) Same as above
5) Mounted the volume on the NFS client using node A's server IP, and did a FUSE mount using C
6) Killed the brick and NFS processes of node A
7) The NFS mount point was not responding because A's NFS process was down; FUSE was responding
8) NFS mount using node C worked fine
9) Restarted the volume using force
10) The mount point using node A was stuck for about 3-4 minutes, but then started to respond


Did the following on a dist-rep volume:
1-4) As above
5) Mounted using NFS of node A
6) Kept appending to a file
7) Killed the NFS processes of A and B and the brick of A
8) The writes stopped (append stopped)
9) Mounted using FUSE with C's IP
10) Saw the contents of the file; the append from the NFS mount of A had stopped, as expected
11) Did a force start
12) Mounted using NFS from node C on the client and saw that the append resumed from where it had stopped, and both the node A and node C mounts were responding



Hence moving the bug to verified.


Server version:
[root@nchilaka-nfsv3-6 yum.repos.d]# rpm -qa|grep gluster
gluster-nagios-common-0.2.0-1.el6rhs.noarch
glusterfs-3.7.1-9.el6rhs.x86_64
glusterfs-cli-3.7.1-9.el6rhs.x86_64
gluster-nagios-addons-0.2.4-4.el6rhs.x86_64
glusterfs-libs-3.7.1-9.el6rhs.x86_64
glusterfs-client-xlators-3.7.1-9.el6rhs.x86_64
glusterfs-api-3.7.1-9.el6rhs.x86_64
glusterfs-server-3.7.1-9.el6rhs.x86_64
glusterfs-rdma-3.7.1-9.el6rhs.x86_64
vdsm-gluster-4.16.20-1.2.el6rhs.noarch
python-gluster-3.7.1-8.el6rhs.x86_64
glusterfs-fuse-3.7.1-9.el6rhs.x86_64
glusterfs-geo-replication-3.7.1-9.el6rhs.x86_64
[root@nchilaka-nfsv3-6 yum.repos.d]# cat /etc/redhat-*
Red Hat Enterprise Linux Server release 6.7 (Santiago)
Red Hat Gluster Storage Server 3.1
[root@nchilaka-nfsv3-6 yum.repos.d]# gluster --version
glusterfs 3.7.1 built on Jul 12 2015 22:27:42
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@nchilaka-nfsv3-6 yum.repos.d]# 



NFS client version:
[root@nchilaka-nfs-client-6 distrep]# cat /etc/redhat-*
cat: /etc/redhat-access-insights: Is a directory
cat: /etc/redhat-lsb: Is a directory
Red Hat Enterprise Linux Server release 6.7 (Santiago)
[root@nchilaka-nfs-client-6 distrep]# rpm -qa|grep gluster



fuse mount client version:
[root@nchilaka-fuse-client-6 distrep]# cat /etc/redhat-
cat: /etc/redhat-: No such file or directory
[root@nchilaka-fuse-client-6 distrep]# cat /etc/redhat-*
Red Hat Enterprise Linux Server release 6.7 (Santiago)
Red Hat Gluster Storage Server 3.1
[root@nchilaka-fuse-client-6 distrep]# rpm -qa|grep gluster
gluster-nagios-common-0.2.0-1.el6rhs.noarch
gluster-nagios-addons-0.2.4-4.el6rhs.x86_64
glusterfs-3.7.1-11.el6rhs.x86_64
glusterfs-fuse-3.7.1-11.el6rhs.x86_64
glusterfs-devel-3.7.1-11.el6rhs.x86_64
glusterfs-geo-replication-3.7.1-11.el6rhs.x86_64
python-gluster-3.7.1-9.el6rhs.x86_64
glusterfs-libs-3.7.1-11.el6rhs.x86_64
glusterfs-client-xlators-3.7.1-11.el6rhs.x86_64
glusterfs-cli-3.7.1-11.el6rhs.x86_64
glusterfs-api-devel-3.7.1-11.el6rhs.x86_64
glusterfs-rdma-3.7.1-11.el6rhs.x86_64
vdsm-gluster-4.16.20-1.2.el6rhs.noarch
glusterfs-api-3.7.1-11.el6rhs.x86_64
glusterfs-server-3.7.1-11.el6rhs.x86_64
[root@nchilaka-fuse-client-6 distrep]# ifconfig
eth0      Link encap:Ethernet  HWaddr 52:54:00:11:06:C8  
          inet addr:10.70.43.157  Bcast:10.70.43.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:151875 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3857 errors:0 dropped:0 overruns:0 carrier:0

