Created attachment 1281161
Comments cannot be longer than 65535 characters, hence attaching

+++ This bug was initially created as a clone of Bug #1447523 +++

Description of problem:

Issuing a peer probe results in a glusterd segmentation fault. Once in this state, if the peer is removed from /var/lib/glusterd/peers, glusterd will start. Probing a peer again leads to the same problem.

Problematic peer entry:

cat /var/lib/glusterd/peers/ip-10-0-50-25.us-west-1.compute.internal
uuid=00000000-0000-0000-0000-000000000000
state=0
hostname1=ip-10-0-50-25.us-west-1.compute.internal

Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level=TRACE --log-buf-size=0'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 x86_64_fallback_frame_state (context=0x7ffe5d9a3b50, context=0x7ffe5d9a3b50, fs=0x7ffe5d9a3c40) at ./md-unwind-support.h:58
58 ./md-unwind-support.h: No such file or directory.
(gdb) bt
#0 x86_64_fallback_frame_state (context=0x7ffe5d9a3b50, context=0x7ffe5d9a3b50, fs=0x7ffe5d9a3c40) at ./md-unwind-support.h:58
#1 uw_frame_state_for (context=context@entry=0x7ffe5d9a3b50, fs=fs@entry=0x7ffe5d9a3c40) at ../../../src/libgcc/unwind-dw2.c:1253
#2 0x00007f6371b2f6d8 in _Unwind_Backtrace (trace=0x7f6378bc2440 <backtrace_helper>, trace_argument=0x7ffe5d9a3e00) at ../../../src/libgcc/unwind.inc:290
#3 0x00007f6378bc25b6 in __GI___backtrace (array=array@entry=0x7ffe5d9a3e40, size=size@entry=200) at ../sysdeps/x86_64/backtrace.c:109
#4 0x00007f63796f3f42 in _gf_msg_backtrace_nomem (level=level@entry=GF_LOG_ALERT, stacksize=stacksize@entry=200) at logging.c:1094
#5 0x00007f63796fd494 in gf_print_trace (signum=11, ctx=0x7f637a3ac010) at common-utils.c:737
#6 <signal handler called>
#7 0x00000001725cc6c8 in ?? ()
#8 0x0000000000000000 in ?? ()

Version-Release number of selected component (if applicable):

$ glusterd --version
glusterfs 3.8.11 from package glusterfs-server 3.8.11-ubuntu1~trusty1

How reproducible: 1:1

Steps to Reproduce:
1. Install gluster on Ubuntu 14.04
2. sudo /usr/sbin/gluster --log-level=TRACE peer probe ip-10-0-50-25.us-west-1.compute.internal
Connection failed. Please check if gluster daemon is operational.

Actual results: Glusterd crashes on peer probe.

Expected results: Glusterd should not crash on peer probe.

Additional info:

There's another issue which may be related. I noticed that glusterd.info was not self-populating. As a workaround I issue 'gluster pool list', which triggers glusterd to generate and store a UUID:

cat /var/lib/glusterd/glusterd.info
UUID=ad7b8337-ec4d-4917-ad6b-ca0e4d0eba42
operating-version=30800

This looks a lot like https://bugzilla.redhat.com/show_bug.cgi?id=1293594

Gaurav, I can grant you access to EC2 instances that are in this state. Is that acceptable? If so, please send me your SSH public key.

Please look at https://bugzilla.redhat.com/attachment.cgi?id=1276539 and check out Stacktrace, StacktraceSource, and ThreadStacktrace.

--- Additional comment from Kaushal on 2017-05-15 03:46:47 EDT ---

To make it easier to debug, please install the `glusterfs-dbg` package, which should provide better information in the backtraces. Also, try to start glusterd with debug logs, either directly by running `glusterd -LDEBUG` or by modifying the init script. Doing the above should give better logs and stack traces, which will help you get to the cause faster.

--- Additional comment from Ben Werthmann on 2017-05-15 10:14:04 EDT ---

Kaushal, 'glusterfs-dbg' is already installed and I've already modified the init scripts (upstart job in this case) to use DEBUG level logging.

--- Additional comment from Gaurav Yadav on 2017-05-17 01:32:38 EDT ---

Ben, the logs you attached don't help much. I am not able to see a proper backtrace.
To do the RCA I need either a reproducer or access to your host. Here is my SSH public key:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDWFZqzFVo7orVZx2ODZyok46VI6EqLg16uP2Z1pkMrEQGu50i3Ye16V5I63UMrHjDwdr4hxtvkW9UfhckQpgBwjsVg9xoyl9tuYt1h9au8G0hH2UL1XYWmbQt82N9VbeYGStg3n0VoefHNZ4LH/VINg0gBWtIK7iTQxWR6XOvs2QqOJnUnM+Fgu5b9kS9vPoDr93BxGLya2ijASkRxsi5dUN4qm7LgFX7Hsyh14G+BBouF5wDZ6frR/UPpqocBVJ5/n4f9OkhwMOShlkWm0m/JDcu6L0phL+Dqm9KxPHBEA/PFW3atjvJW70Iun+j1i72SCcMccQjHSPB6J5QYSeQb gyadav.eng.blr.redhat.com

--- Additional comment from Ben Werthmann on 2017-05-17 12:09:50 EDT ---

Gaurav, I've provided the connection info to you in a direct email.

--- Additional comment from Ben Werthmann on 2017-05-19 20:37:46 EDT ---

Running this command before peer probe reproduces this problem (leading to the backtrace handler problem) in all cases:

sysctl net.ipv4.ip_local_reserved_ports="49152-49156"

or

sysctl net.ipv4.ip_local_reserved_ports="24007-24008,32765-32768,49152-49156"

The issue appears to be with parsing the contents of '/proc/sys/net/ipv4/ip_local_reserved_ports' here:

https://github.com/gluster/glusterfs/blob/master/libglusterfs/src/common-utils.c#L3038

This option appears to defer to the kernel for source port selection. Is there a known issue with kernel port selection?

https://github.com/gluster/glusterfs/blob/master/configure.ac#L312-L320

I'm going to build and test with the above configure option.
--- Additional comment from Ben Werthmann on 2017-05-22 11:20:27 EDT ---

(In reply to Ben Werthmann from comment #22)
> Running this command before peer probe reproduces this problem (leading to
> the backtrace handler problem) in all cases:
>
> sysctl net.ipv4.ip_local_reserved_ports="49152-49156"
>
> or
>
> sysctl net.ipv4.ip_local_reserved_ports="24007-24008,32765-32768,49152-49156"
>
> The issue appears to be with parsing the contents of
> '/proc/sys/net/ipv4/ip_local_reserved_ports' here:
>
> https://github.com/gluster/glusterfs/blob/master/libglusterfs/src/common-utils.c#L3038
>
> This option appears to defer to the kernel for source port selection. Is
> there a known issue with kernel port selection?
>
> https://github.com/gluster/glusterfs/blob/master/configure.ac#L312-L320

This option is not in 3.8.

>
> I'm going to build and test with the above configure option.

--- Additional comment from Gaurav Yadav on 2017-05-22 12:34:54 EDT ---

Thanks, Ben, for providing the additional info. It helped me find the root cause of the issue. While parsing the ports we are not handling the MIN/MAX range properly, hence glusterd is crashing.
REVIEW: https://review.gluster.org/17359 (glusterd : Fix crash in glusterd while peer probing) posted (#1) for review on master by Gaurav Yadav (gyadav)
Can you please elaborate on your findings?
REVIEW: https://review.gluster.org/17359 (glusterd : Fix crash in glusterd while peer probing) posted (#2) for review on master by Gaurav Yadav (gyadav)
REVIEW: https://review.gluster.org/17359 (libglusterfs : Fix crash in glusterd while peer probing) posted (#3) for review on master by Gaurav Yadav (gyadav)
I built glusterfs without the backtrace handler by commenting out HAVE_BACKTRACE:

/* define if found backtrace */
/* #undef HAVE_BACKTRACE */

Here's the "real" backtrace from a debug build:

(gdb) bt
#0 0x00007f9bf14fa941 in BIT_SET (array=0x7fffe8e7b0c0 "", index=4294967295) at common-utils.h:247
#1 0x00007f9bf15015d1 in gf_ports_reserved (blocked_port=0x2635e58 "49152", ports=0x7fffe8e7b0c0 "", ceiling=49152) at common-utils.c:3098
#2 0x00007f9bf1501248 in gf_process_reserved_ports (ports=0x7fffe8e7b0c0 "", ceiling=49152) at common-utils.c:3021
#3 0x00007f9bec3b7d20 in af_inet_bind_to_port_lt_ceiling (fd=9, sockaddr=0x2634230, sockaddr_len=16, ceiling=49152) at name.c:55
#4 0x00007f9bec3b8d53 in client_bind (this=0x2633f80, sockaddr=0x2634230, sockaddr_len=0x26342b0, sock=9) at name.c:478
#5 0x00007f9bec3b3f6a in socket_connect (this=0x2633f80, port=0) at socket.c:3232
#6 0x00007f9bf12b595f in rpc_transport_connect (this=0x2633f80, port=0) at rpc-transport.c:418
#7 0x00007f9bf12b8eb4 in rpc_clnt_reconnect (conn_ptr=0x2633d80) at rpc-clnt.c:407
#8 0x00007f9bf12ba789 in rpc_clnt_start (rpc=0x2633d50) at rpc-clnt.c:1196
#9 0x0000000000411599 in glusterfs_mgmt_init (ctx=0x25be010) at glusterfsd-mgmt.c:2429
#10 0x000000000040ab50 in glusterfs_volumes_init (ctx=0x25be010) at glusterfsd.c:2387
#11 0x000000000040b076 in main (argc=19, argv=0x7fffe8e7e588) at glusterfsd.c:2518
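A note on the numbers in frames #0 and #1: index=4294967295 is UINT32_MAX, which is the value -1 takes when converted to a 32-bit unsigned integer, and the port string "49152" does not fit in a signed 16-bit integer. The small C sketch below only checks those two arithmetic facts. The helper names are hypothetical and nothing here is taken from common-utils.c; how the bad index actually arises inside gf_ports_reserved is not something this sketch demonstrates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: does the value fit in a signed 16-bit integer? */
static int fits_in_int16(long value)
{
    return value >= INT16_MIN && value <= INT16_MAX;
}

/* Hypothetical helper: what a signed value looks like when it is used
 * as a 32-bit unsigned bitmap index. */
static uint32_t to_unsigned_index(int32_t value)
{
    return (uint32_t)value;
}

/* Self-check against the values seen in the backtrace. */
static void check_backtrace_values(void)
{
    assert(!fits_in_int16(49152));               /* 49152 > INT16_MAX (32767) */
    assert(to_unsigned_index(-1) == UINT32_MAX); /* 4294967295 */
}
```

An index that large points far outside any per-port bitmap, which is consistent with the SIGSEGV in BIT_SET.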
REVIEW: https://review.gluster.org/17359 (libglusterfs : Fix crash in glusterd while peer probing) posted (#4) for review on master by Gaurav Yadav (gyadav)
COMMIT: https://review.gluster.org/17359 committed in master by Jeff Darcy (jeff.us)
------
commit 23930326e0378edace9c8c41e8ae95931a2f68ba
Author: Gaurav Yadav <gyadav>
Date: Mon May 22 23:25:47 2017 +0530

    libglusterfs : Fix crash in glusterd while peer probing

    glusterd crashes when a port is explicitly set to a range that lies
    outside (greater than) the range of the short data type.
    E.g. sysctl net.ipv4.ip_local_reserved_ports="49152-49156"
    In the above case glusterd crashes while parsing the port.

    With this fix glusterd will be able to handle port ranges between
    INT_MIN and INT_MAX.

    Change-Id: I7c75ee67937b0e3384502973d96b1c36c89e0fe1
    BUG: 1454418
    Signed-off-by: Gaurav Yadav <gyadav>
    Reviewed-on: https://review.gluster.org/17359
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Samikshan Bairagya <samikshan>
    Reviewed-by: Atin Mukherjee <amukherj>
    Reviewed-by: Niels de Vos <ndevos>
    Reviewed-by: Jeff Darcy <jeff.us>
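The kind of range handling the commit message describes can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the code from review 17359: the names PORT_MAX, port_bitmap_set, port_bitmap_test, parse_port and reserve_port_entry are invented for this example, and it assumes that valid ports are 0 to 65535 and that each entry looks like "N" or "N-M", as in the sysctl examples in this report.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define PORT_MAX 65535u
#define PORT_BITMAP_BYTES ((PORT_MAX + 1u) / 8u)

/* Set one bit; the caller must already have validated the port. */
static void port_bitmap_set(uint8_t *bitmap, unsigned int port)
{
    assert(port <= PORT_MAX);
    bitmap[port / 8u] |= (uint8_t)(1u << (port % 8u));
}

/* Return 1 if the bit for this port is set, 0 otherwise. */
static int port_bitmap_test(const uint8_t *bitmap, unsigned int port)
{
    return (bitmap[port / 8u] >> (port % 8u)) & 1u;
}

/* Parse a decimal port with strtol, which is wide enough for 49152,
 * and reject anything outside 0..PORT_MAX. Returns 0 or -1. */
static int parse_port(const char *text, char **end, unsigned int *out)
{
    errno = 0;
    long value = strtol(text, end, 10);
    if (*end == text || errno == ERANGE || value < 0 || value > (long)PORT_MAX)
        return -1;
    *out = (unsigned int)value;
    return 0;
}

/* Mark an "N" or "N-M" entry in the bitmap, ignoring ports above the
 * ceiling. Returns 0 on success and -1 on malformed input, in which
 * case the bitmap is left untouched. */
static int reserve_port_entry(const char *entry, uint8_t *bitmap,
                              unsigned int ceiling)
{
    char *end = NULL;
    unsigned int first, last;

    if (parse_port(entry, &end, &first) != 0)
        return -1;
    last = first;
    if (*end == '-' && parse_port(end + 1, &end, &last) != 0)
        return -1;
    if (*end != '\0' || last < first)
        return -1;

    if (ceiling > PORT_MAX)
        ceiling = PORT_MAX;
    for (unsigned int p = first; p <= last && p <= ceiling; p++)
        port_bitmap_set(bitmap, p);
    return 0;
}
```

With a zeroed bitmap of PORT_BITMAP_BYTES bytes, calling reserve_port_entry("49152-49156", bitmap, 65535) marks those five ports, while an entry such as "70000" or "-1" is rejected before any bit is touched. The point of the sketch is simply that every index is validated before the bitmap write, whatever integer type the parser uses.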
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days