Bug 1139812
| Summary: | DHT: Rebalance process crash after add-brick and `rebalance start` operation | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Shyamsundar <srangana> |
| Component: | distribute | Assignee: | Shyamsundar <srangana> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | mainline | CC: | bugs, gluster-bugs, saujain, sdharane, srangana, ssamanta, surs, vbellur |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.7.0 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1139213 | | |
| : | 1140549 1142409 1151343 1151397 | Environment: | |
| Last Closed: | 2015-05-14 17:27:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1139213, 1140549, 1142409, 1142654, 1151343, 1151397 | | |
Description (Shyamsundar, 2014-09-09 16:59:08 UTC)
Another symptom of the same issue:
==================================
Description of problem:
I tried to resolve a gfid mismatch for a file and then ran "ls" on that same file. The ls command hung and the NFS server reported a crash.
gluster volume info:
[root@nfs4 ~]# gluster volume info dist-rep
Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 9a80c514-ddd4-45cb-b2b6-d745fd438db8
Status: Started
Snap Volume: no
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: 10.70.37.156:/bricks/d1r1
Brick2: 10.70.37.81:/bricks/d1r2
Brick3: 10.70.37.50:/bricks/d2r1
Brick4: 10.70.37.95:/bricks/d2r2
Brick5: 10.70.37.156:/bricks/d3r1
Brick6: 10.70.37.81:/bricks/d3r2
Brick7: 10.70.37.50:/bricks/d4r1
Brick8: 10.70.37.95:/bricks/d4r2
Brick9: 10.70.37.156:/bricks/d5r1
Brick10: 10.70.37.81:/bricks/d5r2
Brick11: 10.70.37.50:/bricks/d6r1
Brick12: 10.70.37.95:/bricks/d6r2
Brick13: 10.70.37.156:/bricks/d1r1-add3
Brick14: 10.70.37.81:/bricks/d1r2-add3
Options Reconfigured:
features.quota-deem-statfs: on
features.quota: on
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
The procedure followed to resolve the gfid mismatch was:
1. Find the gfid of the file on both bricks of the replica pair using the getfattr command.
2. If the gfids differ for the file, then on one brick of the pair:
   a. first delete the gfid entry from the .glusterfs/indices/xattrop directory;
   b. then delete the file itself from the same brick where step 2a was executed.
3. Check the gfid again using getfattr; the gfids now match.
After the above steps I ran ls <filename> on the mount point (a sketch of the equivalent gfid check is shown below).
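For steps 1 and 3, the gfid on each brick can be read with getfattr as in the report, or with a small program. Below is a minimal C sketch, not part of the original report, that prints the 16-byte trusted.gfid xattr of a brick-side file in hex so the values from the two replicas can be compared; the paths and usage are illustrative only.

```c
/* Minimal sketch (illustrative, not from the original report): prints the
 * trusted.gfid xattr of a brick-side file in hex. Run it as root on each
 * brick host against the same file (e.g. /bricks/d1r1/<path> and
 * /bricks/d1r2/<path>) and compare the output. */
#include <stdio.h>
#include <sys/xattr.h>

int main(int argc, char **argv)
{
    unsigned char gfid[16];

    if (argc != 2) {
        fprintf(stderr, "usage: %s <brick-side file path>\n", argv[0]);
        return 1;
    }

    /* trusted.gfid is stored as a 16-byte binary value on the brick. */
    if (getxattr(argv[1], "trusted.gfid", gfid, sizeof(gfid)) != 16) {
        perror(argv[1]);
        return 1;
    }

    for (int i = 0; i < 16; i++)
        printf("%02x", gfid[i]);
    printf("\n");
    return 0;
}
```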
How reproducible:
Seen once so far.
Actual results:
From nfs.log:
signal received: 11
time of crash:
2014-09-08 20:11:48
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.28
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x391c21ff06]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x391c23a59f]
/lib64/libc.so.6[0x3c6be326b0]
/usr/lib64/glusterfs/3.6.0.28/xlator/nfs/server.so(nfs_fop_lookup_cbk+0x225)[0x7fc08287f2e5]
/usr/lib64/glusterfs/3.6.0.28/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x113)[0x7fc082ad5e73]
/usr/lib64/glusterfs/3.6.0.28/xlator/cluster/distribute.so(dht_lookup_everywhere_done+0x668)[0x7fc082f1cba8]
/usr/lib64/glusterfs/3.6.0.28/xlator/cluster/distribute.so(dht_lookup_everywhere_cbk+0x403)[0x7fc082f215b3]
/usr/lib64/glusterfs/3.6.0.28/xlator/cluster/replicate.so(afr_lookup_cbk+0x558)[0x7fc0831a4438]
/usr/lib64/glusterfs/3.6.0.28/xlator/protocol/client.so(client3_3_lookup_cbk+0x647)[0x7fc0833e3267]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x391ca0e9c5]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)[0x391ca0fe4f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x391ca0b668]
/usr/lib64/glusterfs/3.6.0.28/rpc-transport/socket.so(+0x9275)[0x7fc08839e275]
/usr/lib64/glusterfs/3.6.0.28/rpc-transport/socket.so(+0xac5d)[0x7fc08839fc5d]
/usr/lib64/libglusterfs.so.0[0x391c276367]
/usr/sbin/glusterfs(main+0x603)[0x407e93]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3c6be1ed5d]
/usr/sbin/glusterfs[0x4049a9]
(gdb) bt
#0 nfs_fop_lookup_cbk (frame=0x7fc09085b740, cookie=0x1f38790, this=<value optimized out>, op_ret=<value optimized out>,
op_errno=<value optimized out>, inode=<value optimized out>, buf=0x7fc07fd7916c, xattr=0x2919ecc, postparent=0x7fc07fd7939c) at nfs-fops.c:427
#1 0x00007fc082ad5e73 in io_stats_lookup_cbk (frame=0x7fc090a2635c, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=0,
inode=0x0, buf=0x7fc07fd7916c, xdata=0x2919ecc, postparent=0x7fc07fd7939c) at io-stats.c:1505
#2 0x00007fc082f1cba8 in dht_lookup_everywhere_done (frame=0x7fc090a243c8, this=<value optimized out>) at dht-common.c:1370
#3 0x00007fc082f215b3 in dht_lookup_everywhere_cbk (frame=0x7fc090a243c8, cookie=<value optimized out>, this=0x1f34bc0,
op_ret=<value optimized out>, op_errno=<value optimized out>, inode=0x7fc081be3b98, buf=0x7fc07f36ba58, xattr=0x2919ecc,
postparent=0x7fc07f36bac8) at dht-common.c:1577
#4 0x00007fc0831a4438 in afr_lookup_done (frame=0x7fbfffffffc6, cookie=0x7fff5390a5f8, this=0x1f332e0, op_ret=<value optimized out>,
op_errno=43234112, inode=0x8, buf=0x7fff5390a850, xattr=0x2919ecc, postparent=0x7fff5390a7e0) at afr-common.c:2319
#5 afr_lookup_cbk (frame=0x7fbfffffffc6, cookie=0x7fff5390a5f8, this=0x1f332e0, op_ret=<value optimized out>, op_errno=43234112, inode=0x8,
buf=0x7fff5390a850, xattr=0x2919ecc, postparent=0x7fff5390a7e0) at afr-common.c:2560
#6 0x00007fc0833e3267 in client3_3_lookup_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>,
myframe=0x7fc090a37424) at client-rpc-fops.c:2767
#7 0x000000391ca0e9c5 in rpc_clnt_handle_reply (clnt=0x1feefd0, pollin=0x29091b0) at rpc-clnt.c:763
#8 0x000000391ca0fe4f in rpc_clnt_notify (trans=<value optimized out>, mydata=0x1fef000, event=<value optimized out>, data=<value optimized out>)
at rpc-clnt.c:891
#9 0x000000391ca0b668 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>)
at rpc-transport.c:512
#10 0x00007fc08839e275 in socket_event_poll_in (this=0x1ffebd0) at socket.c:2127
#11 0x00007fc08839fc5d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x1ffebd0, poll_in=1, poll_out=0,
poll_err=0) at socket.c:2240
#12 0x000000391c276367 in event_dispatch_epoll_handler (event_pool=0x1efb4d0) at event-epoll.c:384
#13 event_dispatch_epoll (event_pool=0x1efb4d0) at event-epoll.c:445
#14 0x0000000000407e93 in main (argc=11, argv=0x7fff5390bce8) at glusterfsd.c:2023
Expected results:
The NFS server should not have crashed.
REVIEW: http://review.gluster.org/8666 (cluster/dht: Fixed double UNWIND in lookup everywhere code) posted (#1) for review on master by Shyamsundar Ranganathan (srangana)

COMMIT: http://review.gluster.org/8666 committed in master by Vijay Bellur (vbellur)
------
commit b3314ea6e820fb659255d0e6e9a32ea259b7526d
Author: Shyam <srangana>
Date: Tue Sep 9 13:31:51 2014 -0400

    cluster/dht: Fixed double UNWIND in lookup everywhere code

    In dht_lookup_everywhere_done, at line 1194 we call DHT_STACK_UNWIND
    and, within the same if condition, go on to call
    "goto unwind_hashed_and_cached;", which at line 1371 calls another
    UNWIND. As a result, higher frames could clean up their locals and,
    on receiving the next unwind, cause a core dump of the process.

    Fixed by issuing the required return after the first unwind.

    Change-Id: Ic5d57da98255b8616a65b4caaedabeba9144fd49
    BUG: 1139812
    Signed-off-by: Shyam <srangana>
    Reviewed-on: http://review.gluster.org/8666
    Reviewed-by: N Balachandran <nbalacha>
    Reviewed-by: Vijay Bellur <vbellur>
    Reviewed-by: Jeff Darcy <jdarcy>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: susant palai <spalai>

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.0, please open a new bug report. glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
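To illustrate the fix, here is a self-contained toy model; it is not the actual dht-common.c code, and the names stack_unwind and lookup_everywhere_done are simplified stand-ins. It shows why unwinding the same frame twice is fatal and how returning immediately after the first unwind, as the fix does, prevents the second unwind.

```c
/* Toy model of the double-UNWIND bug (illustrative only, not GlusterFS code). */
#include <stdio.h>
#include <stdlib.h>

struct frame {
    int   unwound;   /* set once the frame has been handed back to the parent */
    char *local;     /* stands in for parent-owned state freed on first unwind */
};

/* Stand-in for DHT_STACK_UNWIND: hands the result to the parent frame,
 * which then frees its locals. A second call touches freed state. */
static void stack_unwind(struct frame *f, int op_ret)
{
    if (f->unwound) {
        /* In the real process this is a use-after-free leading to SIGSEGV;
         * here we just flag it and abort. */
        fprintf(stderr, "BUG: frame unwound twice (op_ret=%d)\n", op_ret);
        abort();
    }
    f->unwound = 1;
    free(f->local);
    f->local = NULL;
    printf("unwound once with op_ret=%d\n", op_ret);
}

/* Simplified shape of the fixed code path: the "return" immediately after
 * the error-path unwind keeps control from falling through to the second
 * unwind, which is what the original code did. */
static void lookup_everywhere_done(struct frame *f, int error)
{
    if (error) {
        stack_unwind(f, -1);
        return;   /* without this return, the unwind below would also run */
    }
    stack_unwind(f, 0);
}

int main(void)
{
    struct frame f = { 0, malloc(16) };
    lookup_everywhere_done(&f, 1);
    return 0;
}
```

With the return in place the program prints a single "unwound once" line; removing it reproduces the double unwind, which the toy model reports as a bug and which in the real NFS process showed up as the SIGSEGV in nfs_fop_lookup_cbk above.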