Bug 1139812
| Summary: | DHT: Rebalance process crash after add-brick and `rebalance start` operation | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Shyamsundar <srangana> |
| Component: | distribute | Assignee: | Shyamsundar <srangana> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | mainline | CC: | bugs, gluster-bugs, saujain, sdharane, srangana, ssamanta, surs, vbellur |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.7.0 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1139213 | | |
| : | 1140549 1142409 1151343 1151397 | Environment: | |
| Last Closed: | 2015-05-14 17:27:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1139213, 1140549, 1142409, 1142654, 1151343, 1151397 | | |
Description (Shyamsundar, 2014-09-09 16:59:08 UTC)
Another symptom of the same issue:
==================================
Description of problem:
I tried to resolve a gfid mismatch for a file and then ran "ls" on that same file. The ls command hung and the NFS server reported a crash.
gluster volume info:
[root@nfs4 ~]# gluster volume info dist-rep
Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 9a80c514-ddd4-45cb-b2b6-d745fd438db8
Status: Started
Snap Volume: no
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: 10.70.37.156:/bricks/d1r1
Brick2: 10.70.37.81:/bricks/d1r2
Brick3: 10.70.37.50:/bricks/d2r1
Brick4: 10.70.37.95:/bricks/d2r2
Brick5: 10.70.37.156:/bricks/d3r1
Brick6: 10.70.37.81:/bricks/d3r2
Brick7: 10.70.37.50:/bricks/d4r1
Brick8: 10.70.37.95:/bricks/d4r2
Brick9: 10.70.37.156:/bricks/d5r1
Brick10: 10.70.37.81:/bricks/d5r2
Brick11: 10.70.37.50:/bricks/d6r1
Brick12: 10.70.37.95:/bricks/d6r2
Brick13: 10.70.37.156:/bricks/d1r1-add3
Brick14: 10.70.37.81:/bricks/d1r2-add3
Options Reconfigured:
features.quota-deem-statfs: on
features.quota: on
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
The procedure followed to resolve the gfid mismatch was:
1. Find the gfid of the file on both bricks of the replica pair using the getfattr command.
2. If the gfids differ for the file, then on one brick of the pair:
   a. first delete the gfid entry from the .glusterfs/indices/xattrop directory;
   b. then delete the file itself from the same brick where step 2a was executed.
3. Check the gfid again using getfattr; the gfids now match.
After the above steps I ran ls <filename> on the mount point (a sketch of the equivalent gfid check is shown below).
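For steps 1 and 3, the gfid on each brick can be read with getfattr as in the report, or with a small program. Below is a minimal C sketch, not part of the original report, that prints the 16-byte trusted.gfid xattr of a brick-side file in hex so the values from the two replicas can be compared; the paths and usage are illustrative only.

```c
/* Minimal sketch (illustrative, not from the original report): prints the
 * trusted.gfid xattr of a brick-side file in hex. Run it as root on each
 * brick host against the same file (e.g. /bricks/d1r1/<path> and
 * /bricks/d1r2/<path>) and compare the output. */
#include <stdio.h>
#include <sys/xattr.h>

int main(int argc, char **argv)
{
    unsigned char gfid[16];

    if (argc != 2) {
        fprintf(stderr, "usage: %s <brick-side file path>\n", argv[0]);
        return 1;
    }

    /* trusted.gfid is stored as a 16-byte binary value on the brick. */
    if (getxattr(argv[1], "trusted.gfid", gfid, sizeof(gfid)) != 16) {
        perror(argv[1]);
        return 1;
    }

    for (int i = 0; i < 16; i++)
        printf("%02x", gfid[i]);
    printf("\n");
    return 0;
}
```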
How reproducible:
Seen once so far.
Actual results:
From nfs.log:
signal received: 11
time of crash:
2014-09-08 20:11:48
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.28
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x391c21ff06]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x391c23a59f]
/lib64/libc.so.6[0x3c6be326b0]
/usr/lib64/glusterfs/3.6.0.28/xlator/nfs/server.so(nfs_fop_lookup_cbk+0x225)[0x7fc08287f2e5]
/usr/lib64/glusterfs/3.6.0.28/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x113)[0x7fc082ad5e73]
/usr/lib64/glusterfs/3.6.0.28/xlator/cluster/distribute.so(dht_lookup_everywhere_done+0x668)[0x7fc082f1cba8]
/usr/lib64/glusterfs/3.6.0.28/xlator/cluster/distribute.so(dht_lookup_everywhere_cbk+0x403)[0x7fc082f215b3]
/usr/lib64/glusterfs/3.6.0.28/xlator/cluster/replicate.so(afr_lookup_cbk+0x558)[0x7fc0831a4438]
/usr/lib64/glusterfs/3.6.0.28/xlator/protocol/client.so(client3_3_lookup_cbk+0x647)[0x7fc0833e3267]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x391ca0e9c5]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)[0x391ca0fe4f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x391ca0b668]
/usr/lib64/glusterfs/3.6.0.28/rpc-transport/socket.so(+0x9275)[0x7fc08839e275]
/usr/lib64/glusterfs/3.6.0.28/rpc-transport/socket.so(+0xac5d)[0x7fc08839fc5d]
/usr/lib64/libglusterfs.so.0[0x391c276367]
/usr/sbin/glusterfs(main+0x603)[0x407e93]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3c6be1ed5d]
/usr/sbin/glusterfs[0x4049a9]
(gdb) bt
#0 nfs_fop_lookup_cbk (frame=0x7fc09085b740, cookie=0x1f38790, this=<value optimized out>, op_ret=<value optimized out>,
op_errno=<value optimized out>, inode=<value optimized out>, buf=0x7fc07fd7916c, xattr=0x2919ecc, postparent=0x7fc07fd7939c) at nfs-fops.c:427
#1 0x00007fc082ad5e73 in io_stats_lookup_cbk (frame=0x7fc090a2635c, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=0,
inode=0x0, buf=0x7fc07fd7916c, xdata=0x2919ecc, postparent=0x7fc07fd7939c) at io-stats.c:1505
#2 0x00007fc082f1cba8 in dht_lookup_everywhere_done (frame=0x7fc090a243c8, this=<value optimized out>) at dht-common.c:1370
#3 0x00007fc082f215b3 in dht_lookup_everywhere_cbk (frame=0x7fc090a243c8, cookie=<value optimized out>, this=0x1f34bc0,
op_ret=<value optimized out>, op_errno=<value optimized out>, inode=0x7fc081be3b98, buf=0x7fc07f36ba58, xattr=0x2919ecc,
postparent=0x7fc07f36bac8) at dht-common.c:1577
#4 0x00007fc0831a4438 in afr_lookup_done (frame=0x7fbfffffffc6, cookie=0x7fff5390a5f8, this=0x1f332e0, op_ret=<value optimized out>,
op_errno=43234112, inode=0x8, buf=0x7fff5390a850, xattr=0x2919ecc, postparent=0x7fff5390a7e0) at afr-common.c:2319
#5 afr_lookup_cbk (frame=0x7fbfffffffc6, cookie=0x7fff5390a5f8, this=0x1f332e0, op_ret=<value optimized out>, op_errno=43234112, inode=0x8,
buf=0x7fff5390a850, xattr=0x2919ecc, postparent=0x7fff5390a7e0) at afr-common.c:2560
#6 0x00007fc0833e3267 in client3_3_lookup_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>,
myframe=0x7fc090a37424) at client-rpc-fops.c:2767
#7 0x000000391ca0e9c5 in rpc_clnt_handle_reply (clnt=0x1feefd0, pollin=0x29091b0) at rpc-clnt.c:763
#8 0x000000391ca0fe4f in rpc_clnt_notify (trans=<value optimized out>, mydata=0x1fef000, event=<value optimized out>, data=<value optimized out>)
at rpc-clnt.c:891
#9 0x000000391ca0b668 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>)
at rpc-transport.c:512
#10 0x00007fc08839e275 in socket_event_poll_in (this=0x1ffebd0) at socket.c:2127
#11 0x00007fc08839fc5d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x1ffebd0, poll_in=1, poll_out=0,
poll_err=0) at socket.c:2240
#12 0x000000391c276367 in event_dispatch_epoll_handler (event_pool=0x1efb4d0) at event-epoll.c:384
#13 event_dispatch_epoll (event_pool=0x1efb4d0) at event-epoll.c:445
#14 0x0000000000407e93 in main (argc=11, argv=0x7fff5390bce8) at glusterfsd.c:2023
Expected results:
The NFS server should not have crashed.
REVIEW: http://review.gluster.org/8666 (cluster/dht: Fixed double UNWIND in lookup everywhere code) posted (#1) for review on master by Shyamsundar Ranganathan (srangana)

COMMIT: http://review.gluster.org/8666 committed in master by Vijay Bellur (vbellur)
------
commit b3314ea6e820fb659255d0e6e9a32ea259b7526d
Author: Shyam <srangana>
Date: Tue Sep 9 13:31:51 2014 -0400

    cluster/dht: Fixed double UNWIND in lookup everywhere code

    In dht_lookup_everywhere_done, at line 1194 we call DHT_STACK_UNWIND
    and, within the same if condition, go on to call
    "goto unwind_hashed_and_cached;", which at line 1371 calls another
    UNWIND. As a result, higher frames could clean up their locals and,
    on receiving the next unwind, cause a core dump of the process.

    Fixed by issuing the required return after the first unwind.

    Change-Id: Ic5d57da98255b8616a65b4caaedabeba9144fd49
    BUG: 1139812
    Signed-off-by: Shyam <srangana>
    Reviewed-on: http://review.gluster.org/8666
    Reviewed-by: N Balachandran <nbalacha>
    Reviewed-by: Vijay Bellur <vbellur>
    Reviewed-by: Jeff Darcy <jdarcy>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: susant palai <spalai>

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.0, please open a new bug report. glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
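To illustrate the fix, here is a self-contained toy model; it is not the actual dht-common.c code, and the names stack_unwind and lookup_everywhere_done are simplified stand-ins. It shows why unwinding the same frame twice is fatal and how returning immediately after the first unwind, as the fix does, prevents the second unwind.

```c
/* Toy model of the double-UNWIND bug (illustrative only, not GlusterFS code). */
#include <stdio.h>
#include <stdlib.h>

struct frame {
    int   unwound;   /* set once the frame has been handed back to the parent */
    char *local;     /* stands in for parent-owned state freed on first unwind */
};

/* Stand-in for DHT_STACK_UNWIND: hands the result to the parent frame,
 * which then frees its locals. A second call touches freed state. */
static void stack_unwind(struct frame *f, int op_ret)
{
    if (f->unwound) {
        /* In the real process this is a use-after-free leading to SIGSEGV;
         * here we just flag it and abort. */
        fprintf(stderr, "BUG: frame unwound twice (op_ret=%d)\n", op_ret);
        abort();
    }
    f->unwound = 1;
    free(f->local);
    f->local = NULL;
    printf("unwound once with op_ret=%d\n", op_ret);
}

/* Simplified shape of the fixed code path: the "return" immediately after
 * the error-path unwind keeps control from falling through to the second
 * unwind, which is what the original code did. */
static void lookup_everywhere_done(struct frame *f, int error)
{
    if (error) {
        stack_unwind(f, -1);
        return;   /* without this return, the unwind below would also run */
    }
    stack_unwind(f, 0);
}

int main(void)
{
    struct frame f = { 0, malloc(16) };
    lookup_everywhere_done(&f, 1);
    return 0;
}
```

With the return in place the program prints a single "unwound once" line; removing it reproduces the double unwind, which the toy model reports as a bug and which in the real NFS process showed up as the SIGSEGV in nfs_fop_lookup_cbk above.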