Bug 982181 - DHT: NFS process crashed on a node in a cluster when another storage node in the cluster went offline
Summary: DHT: NFS process crashed on a node in a cluster when another storage node in the cluster went offline
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Amar Tumballi
QA Contact: spandura
URL:
Whiteboard:
Depends On:
Blocks: 983431 1139996
 
Reported: 2013-07-08 10:54 UTC by spandura
Modified: 2014-09-10 07:05 UTC
CC List: 5 users

Fixed In Version: glusterfs-3.4.0.12rhs.beta5
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 983431 (view as bug list)
Environment:
Last Closed: 2013-09-23 22:35:42 UTC
Embargoed:


Attachments
SOS Reports, BT of core, commands execution history on nodes (7.75 MB, application/x-gzip)
2013-07-08 11:42 UTC, spandura

Description spandura 2013-07-08 10:54:17 UTC
Description of problem:
=======================
In a 4-node cluster with a 1 x 2 replicate volume (created using bricks on 2 of the storage nodes), when one of the storage nodes hosting a brick of the replicate volume goes offline, the NFS server process on another storage node that is not part of the replicate volume crashes.

Version-Release number of selected component (if applicable):
============================================================
root@king [Jul-08-2013-16:05:04] >gluster --version
glusterfs 3.4.0.12rhs.beta3 built on Jul  6 2013 14:35:18

root@king [Jul-08-2013-16:05:07] >rpm -qa | grep glusterfs-server


How reproducible:


Steps to Reproduce:
====================
1. Have a cluster with 4 storage nodes: {node1, node2, node3, node4}.

2. Create a 1 x 2 replicate volume with bricks on 2 of the storage nodes {node1, node2}. Start the volume. Set self-heal-daemon to off.

3. Create an NFS mount with node3 as the server.

4. Bring down node2 by killing all the gluster processes: { killall glusterfs ; killall glusterfsd ; killall glusterd }

5. Create files and directories from the mount point.

6. Bring node2 back online by restarting glusterd: "service glusterd start"

7. Bring down node1 by killing all the gluster processes: { killall glusterfs ; killall glusterfsd ; killall glusterd } (a consolidated command sketch of these steps follows below)
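
For convenience, a minimal command sketch of the steps above, assuming the hostnames node1-node4 and the brick paths /rhs/brick1/rep_brick0 and /rhs/brick1/rep_brick1 from this report; the client mount point /mnt/vol_rep and the file-creation loop are illustrative placeholders, not taken from the reporter's command history:

# On node1: create and start the 1 x 2 replicate volume, then disable the self-heal daemon
gluster volume create vol_rep replica 2 node1:/rhs/brick1/rep_brick0 node2:/rhs/brick1/rep_brick1
gluster volume start vol_rep
gluster volume set vol_rep cluster.self-heal-daemon off

# On a client: NFS-mount the volume using node3 as the server (gluster NFS serves NFSv3)
mount -t nfs -o vers=3 node3:/vol_rep /mnt/vol_rep

# On node2: bring the node down by killing its gluster processes
killall glusterfs ; killall glusterfsd ; killall glusterd

# On the client: create files and directories from the mount point (illustrative workload)
mkdir /mnt/vol_rep/dir1 ; for i in $(seq 1 10); do touch /mnt/vol_rep/file.$i ; done

# On node2: bring glusterd back online
service glusterd start

# On node1: bring the node down, then watch the NFS server process on node3
killall glusterfs ; killall glusterfsd ; killall glusterd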

Actual results:
=================
The NFS server process on node3 crashed.

Expected results:
================
The NFS server process on node3 should not crash.

Additional info:
==================

root@king [Jul-08-2013-14:50:58] >gluster v info vol_rep
 
Volume Name: vol_rep
Type: Replicate
Volume ID: 57495d90-1797-43fd-86c0-496333cab217
Status: Created
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: king:/rhs/brick1/rep_brick0
Brick2: hicks:/rhs/brick1/rep_brick1

root@king [Jul-08-2013-12:39:54] >gluster peer status
Number of Peers: 3

Hostname: hicks
Uuid: 894e73be-7db0-42fb-a62c-80545e4aeb15
State: Peer in Cluster (Connected)

Hostname: luigi
Uuid: c8d38d47-c483-4e10-994f-95e28424b1d8
State: Peer in Cluster (Connected)

Hostname: lizzie
Uuid: 64fc3606-0c67-448a-8003-1597f150123f
State: Peer in Cluster (Connected)

Comment 2 spandura 2013-07-08 11:42:36 UTC
Created attachment 770453 [details]
SOS Reports, BT of core, commands execution history on nodes

Comment 3 santosh pradhan 2013-07-08 13:43:35 UTC

- Could you clarify: why is a 4-node cluster needed for a 1x2 dist-rep volume?

- Is the core file included in the attachment?

Comment 4 spandura 2013-07-11 06:00:42 UTC
The necessary information has been provided to Santosh, hence removing the needinfo flag.

Comment 5 santosh pradhan 2013-07-11 07:14:53 UTC
Thanks, Shwetha, for sharing the machine; I had a quick look at the core file.

The stack trace (from core) looks like:

#0  _gf_log (domain=0x1cae510 "vol_rep-replicate-0", file=0x7f85fe9aec3f "afr-common.c", function=0x7f85fe9b1350 "afr_get_call_child", line=745, level=GF_LOG_DEBUG, 
    fmt=0x7f85fe9af268 "Returning %d, call_child: %d, last_index: %d") at logging.c:442
#1  0x00007f85fe99d012 in afr_get_call_child (this=0x1caf160, child_up=0x26248800 "", read_child=<value optimized out>, fresh_children=<value optimized out>, call_child=<value optimized out>, 
    last_index=0x26247970) at afr-common.c:744
#2  0x00007f85fe96d9bf in afr_access (frame=0x262463cc, this=0x1caf160, loc=0x7f85f6afafec, mask=7, xdata=0x0) at afr-inode-read.c:136
#3  0x00007f85fe73fe6e in dht_access_cbk (frame=0x7f8602255434, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, 
    xdata=<value optimized out>) at dht-inode-read.c:516

.....
.....
.....
.....

#130898 0x00007f85fe96dbda in afr_access (frame=0x7f8602255184, this=<value optimized out>, loc=<value optimized out>, mask=<value optimized out>, xdata=0x0) at afr-inode-read.c:157
#130899 0x00007f85fe73fe6e in dht_access_cbk (frame=0x7f8602255434, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, xdata=<value optimized out>) at dht-inode-read.c:516
#130900 0x00007f85fe96dbda in afr_access (frame=0x7f86022554e0, this=<value optimized out>, loc=<value optimized out>, mask=<value optimized out>, xdata=0x0) at afr-inode-read.c:157
#130901 0x00007f85fe73fe6e in dht_access_cbk (frame=0x7f8602255434, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, xdata=<value optimized out>) at dht-inode-read.c:516
#130902 0x00007f85fe96dbda in afr_access (frame=0x7f86022556e4, this=<value optimized out>, loc=<value optimized out>, mask=<value optimized out>, xdata=0x0) at afr-inode-read.c:157
#130903 0x00007f85fe73fe6e in dht_access_cbk (frame=0x7f8602255434, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, xdata=<value optimized out>) at dht-inode-read.c:516
#130904 0x00007f85fe96dbda in afr_access (frame=0x7f8602255388, this=<value optimized out>, loc=<value optimized out>, mask=<value optimized out>, xdata=0x0) at afr-inode-read.c:157
#130905 0x00007f85fe73fe6e in dht_access_cbk (frame=0x7f8602255434, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, xdata=<value optimized out>) at dht-inode-read.c:516
#130906 0x00007f85fe96dbda in afr_access (frame=0x7f8602255638, this=<value optimized out>, loc=<value optimized out>, mask=<value optimized out>, xdata=0x0) at afr-inode-read.c:157
#130907 0x00007f85fe73d44d in dht_access (frame=0x7f8602255434, this=<value optimized out>, loc=0x7f85f7678414, mask=7, xdata=0x0) at dht-inode-read.c:593
#130908 0x00007f860340e228 in default_access (frame=0x7f8602255434, this=0x1cb12f0, loc=0x7f85f7678414, mask=7, xdata=<value optimized out>) at defaults.c:1066
#130909 0x00007f85fe2e91b4 in io_stats_access (frame=0x7f86022558e8, this=0x1cb2580, loc=0x7f85f7678414, mask=7, xdata=0x0) at io-stats.c:2344
#130910 0x00007f85fe0a6a31 in nfs_fop_access (nfsx=<value optimized out>, xl=0x1cb2580, nfu=<value optimized out>, loc=0x7f85f7678414, accesstest=<value optimized out>, cbk=<value optimized out>, local=0x7f85f767802c) at nfs-fops.c:467
#130911 0x00007f85fe0b9efb in nfs3_access_resume (carg=0x7f85f767802c) at nfs3.c:1545
#130912 0x00007f85fe0c0f0b in nfs3_fh_resolve_inode_done (cs=0x7f85f767802c, inode=<value optimized out>) at nfs3-helpers.c:3544
#130913 0x00007f85fe0c0fdb in nfs3_fh_resolve_inode (cs=0x7f85f767802c) at nfs3-helpers.c:3735
#130914 0x00007f85fe0c1085 in nfs3_fh_resolve_resume (cs=0x7f85f767802c) at nfs3-helpers.c:3767
#130915 0x00007f85fe0c1198 in nfs3_fh_resolve_root (cs=0x7f85f767802c) at nfs3-helpers.c:3821
#130916 0x00007f85fe0b9a61 in nfs3_access (req=0x7f85fdd1002c, fh=0x7fffd1d573b0, accbits=31) at nfs3.c:1582
#130917 0x00007f85fe0b9d95 in nfs3svc_access (req=0x7f85fdd1002c) at nfs3.c:1616
#130918 0x00007f86031e6655 in rpcsvc_handle_rpc_call (svc=<value optimized out>, trans=<value optimized out>, msg=0x1d39930) at rpcsvc.c:549
#130919 0x00007f86031e67a3 in rpcsvc_notify (trans=0x1d2b8d0, mydata=<value optimized out>, event=<value optimized out>, data=0x1d39930) at rpcsvc.c:643
#130920 0x00007f86031e78d8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:497
#130921 0x00007f85ffa1dbd6 in socket_event_poll_in (this=0x1d2b8d0) at socket.c:2119
#130922 0x00007f85ffa1f4ed in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x1d2b8d0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2231
#130923 0x00007f860344d407 in event_dispatch_epoll_handler (event_pool=0x1c89eb0) at event-epoll.c:384
#130924 event_dispatch_epoll (event_pool=0x1c89eb0) at event-epoll.c:445
#130925 0x00000000004067c6 in main (argc=11, argv=0x7fffd1d58ba8) at glusterfsd.c:1962


The number of frames in the stack (~131,000) speaks for itself, i.e. a stack overflow.

Something goes wrong in afr_access() or dht_access_cbk() and they end up in a loop: afr_access() fails somewhere (return value < 0) and unwinds the stack with AFR_STACK_UNWIND(), but dht_access_cbk() issues the call again.
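
As a rough way to confirm the repeating afr_access()/dht_access_cbk() pattern from a core like this (a sketch only; the binary path and core file name below are placeholders, not taken from the attachment):

gdb -batch -ex 'bt' /usr/sbin/glusterfs /path/to/nfs-core > bt.txt   # dump the full backtrace of the crashed NFS process
grep -c 'afr_access' bt.txt                                          # count afr_access frames
grep -c 'dht_access_cbk' bt.txt                                      # count dht_access_cbk frames; tens of thousands of each points to the loop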

I don't know the AFR/DHT internals well; maybe Pranith or Shishir will be able to help.

This does not look like an NFS issue.

Thanks,
Santosh

Comment 7 spandura 2013-08-05 11:37:31 UTC
Verified the fix on the build:
============================
glusterfs 3.4.0.15rhs built on Aug  4 2013 22:34:17


Bug is fixed.

Comment 8 Scott Haines 2013-09-23 22:35:42 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

