Bug 1310677
Summary: | glusterd crashed when probing a node with firewall enabled on only one node | |
---|---|---|---
Product: | [Community] GlusterFS | Reporter: | SATHEESARAN <sasundar>
Component: | glusterd | Assignee: | Satish Mohan <smohan>
Status: | CLOSED CURRENTRELEASE | QA Contact: |
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | mainline | CC: | amukherj, bugs, mselvaga
Target Milestone: | --- | Keywords: | Triaged
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | glusterfs-3.8rc2 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1314391 1315626 (view as bug list) | Environment: |
Last Closed: | 2016-06-16 13:58:25 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1314391, 1315626 | |
Description SATHEESARAN 2016-02-22 13:49:29 UTC
```
[root@node2 ~]# gdb -c /core.9717
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
[New LWP 9720]
[New LWP 9725]
[New LWP 9718]
[New LWP 9726]
[New LWP 9719]
[New LWP 9732]
[New LWP 9724]
[New LWP 9723]
[New LWP 9717]
[New LWP 9721]
warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/17/a121b1f7bbb010f54735ffde3347b27b33884d
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0  pthread_spin_lock () at ../nptl/sysdeps/x86_64/pthread_spin_lock.S:24
24	1:	LOCK
(gdb) bt
#0  pthread_spin_lock () at ../nptl/sysdeps/x86_64/pthread_spin_lock.S:24
#1  0x00007fdd8397a45d in __gf_free (free_ptr=0x7fdd6c000cb0) at mem-pool.c:316
#2  0x00007fdd8393ee55 in data_destroy (data=<optimized out>) at dict.c:235
#3  0x00007fdd83941b79 in dict_get_str (this=<optimized out>, key=<optimized out>, str=<optimized out>) at dict.c:2213
#4  0x00007fdd784adce9 in glusterd_xfer_cli_probe_resp (req=req@entry=0x7fdd85c6811c, op_ret=op_ret@entry=-1, op_errno=0, op_errstr=op_errstr@entry=0x0, hostname=0x7fdd6c000d80 "dhcp37-152", port=24007, dict=0x7fdd83c17be4) at glusterd-handler.c:3894
#5  0x00007fdd784aea57 in __glusterd_handle_cli_probe (req=req@entry=0x7fdd85c6811c) at glusterd-handler.c:1220
#6  0x00007fdd784a7540 in glusterd_big_locked_handler (req=0x7fdd85c6811c, actor_fn=0x7fdd784ae590 <__glusterd_handle_cli_probe>) at glusterd-handler.c:83
#7  0x00007fdd83988e32 in synctask_wrap (old_task=<optimized out>) at syncop.c:380
#8  0x00007fdd82047110 in ?? () from /usr/lib64/libc-2.17.so
#9  0x0000000000000000 in ?? ()
```
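The backtrace shows `__gf_free` faulting inside `pthread_spin_lock` while releasing data belonging to the probe response dict: the allocation header it locks has already been freed once, so the second release of the same payload walks into garbage. Below is a minimal standalone sketch of that double-free pattern; all names in it are hypothetical stand-ins, not glusterd's actual API.

```c
/* Hypothetical sketch of the failure pattern implied by the backtrace:
 * answering the CLI twice releases the same heap payload twice, and the
 * allocator crashes (or aborts) on the second free.
 * Build and run to see it die: gcc -o double_resp double_resp.c && ./double_resp */
#include <stdlib.h>
#include <string.h>

struct probe_resp {
    char *hostname;          /* heap-allocated response payload */
};

static void send_probe_resp(struct probe_resp *resp)
{
    /* ... serialize and send resp back to the CLI ... */
    free(resp->hostname);    /* the response path owns and frees the payload */
    free(resp);
}

int main(void)
{
    struct probe_resp *resp = malloc(sizeof(*resp));
    resp->hostname = strdup("dhcp37-152");

    send_probe_resp(resp);   /* first answer: fine */
    send_probe_resp(resp);   /* second answer on re-probe: use-after-free
                                and double free, as in frames #1-#4 above */
    return 0;
}
```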
Coredump error messages as seen in glusterd logs:

```
<snip>
The message "I [MSGID: 106004] [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: Peer <dhcp37-152.lab.eng.blr.redhat.com> (<4d46cc7a-6d17-460e-82ba-7f5624436fb0>), in state <Accepted peer request>, has disconnected from glusterd." repeated 4 times between [2016-02-22 15:50:38.204058] and [2016-02-22 15:50:50.235773]
[2016-02-22 15:50:51.106009] I [MSGID: 106487] [glusterd-handler.c:1178:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req dhcp37-152 24007
The message "I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: already stopped" repeated 4 times between [2016-02-22 15:50:16.093916] and [2016-02-22 15:50:16.093939]
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2016-02-22 15:50:51
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.6
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7fdd83947012]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7fdd839634dd]
/lib64/libc.so.6(+0x35670)[0x7fdd82035670]
/lib64/libpthread.so.0(pthread_spin_lock+0x0)[0x7fdd827b4210]
---------
</snip>
```

Console output from node1:

```
[root@node1 ~]# gluster peer probe node2
peer probe: success.

[root@node1 ~]# gluster peer status
Number of Peers: 1

Hostname: node2
Uuid: df339e12-c30f-4a86-9977-ef4ac6d5a190
State: Accepted peer request (Connected)
```

Console output from node2:

```
[root@node2 ~]# gluster peer status
Number of Peers: 1

Hostname: node1
Uuid: 4d46cc7a-6d17-460e-82ba-7f5624436fb0
State: Accepted peer request (Disconnected)

[root@node2 ~]# gluster peer probe node1
peer probe: success. Host dhcp37-152 port 24007 already in peer list

[root@node2 ~]# gluster peer status
Connection failed. Please check if gluster daemon is operational.
peer status: failed
```

I could hit this issue consistently.

The upstream patch for this bug is available: http://review.gluster.org/#/c/13546/

REVIEW: http://review.gluster.org/13546 (glusterd: glusterd was crashing when peer probing of disconnect node of cluster) posted (#2) for review on master by Gaurav Kumar Garg (ggarg)

REVIEW: http://review.gluster.org/13546 (glusterd: upon peer probe glusterd should not return response to CLI two times) posted (#3) for review on master by Gaurav Kumar Garg (ggarg)

REVIEW: http://review.gluster.org/13546 (glusterd: upon re-peer probe glusterd should not return response to CLI two times) posted (#4) for review on master by Gaurav Kumar Garg (ggarg)

REVIEW: http://review.gluster.org/13546 (glusterd: upon re-peer probe glusterd should not return response to CLI two times) posted (#5) for review on master by Atin Mukherjee (amukherj)

REVIEW: http://review.gluster.org/13546 (glusterd: upon re-peer probe glusterd should not return response to CLI two times) posted (#6) for review on master by Gaurav Kumar Garg (ggarg)

COMMIT: http://review.gluster.org/13546 committed in master by Atin Mukherjee (amukherj)

commit f44232e6a18a4b79e680ea0b6322269b84fa6813
Author: Gaurav Kumar Garg <garg.gaurav52>
Date: Mon Feb 29 15:48:58 2016 +0530

glusterd: upon re-peer probe glusterd should not return response to CLI two times

If nodes N1 and N2 are part of a cluster and N2 tries to re-probe N1 while N1 is disconnected by any means (for example, the server is down, glusterd is not running, there is a network outage, or a firewall is blocking port 24007, on which glusterd listens), glusterd tried to send two responses back to the CLI, resulting in a double free and a glusterd crash.

With this fix glusterd sends the response to the CLI only once, preventing the crash.

Note: glusterd was crashing only when the user did the first peer probe with a hostname and re-probed with an IP address, or vice versa.

Change-Id: I92012b147091cf9129f1fbc17834b3f4d7cb46a0
BUG: 1310677
Signed-off-by: Gaurav Kumar Garg <ggarg>
Reviewed-on: http://review.gluster.org/13546
Smoke: Gluster Build System <jenkins.com>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.com>
Reviewed-by: Atin Mukherjee <amukherj>
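Per the commit message, the shape of the fix is to make the CLI response path run exactly once per request. A hedged sketch of that idea follows; the context struct and function names are hypothetical illustrations, not the actual patch.

```c
/* Hypothetical sketch of the fix's shape: remember that this request has
 * already been answered and return early instead of building, sending,
 * and freeing a second response for the same request. */
#include <stdbool.h>
#include <stdio.h>

struct cli_req_ctx {
    bool resp_sent;                /* true once the CLI has been answered */
};

static int xfer_cli_probe_resp(struct cli_req_ctx *ctx)
{
    if (ctx->resp_sent)
        return 0;                  /* already answered: skip send and free */

    puts("response sent to CLI");  /* stand-in for serialize + submit + free */
    ctx->resp_sent = true;
    return 0;
}

int main(void)
{
    struct cli_req_ctx ctx = { .resp_sent = false };
    xfer_cli_probe_resp(&ctx);     /* probe path answers the CLI */
    xfer_cli_probe_resp(&ctx);     /* re-probe path: now a harmless no-op */
    return 0;
}
```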
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user