Bug 1454418 - Glusterd segmentation fault in ' _Unwind_Backtrace' while running peer probe
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: x86_64
OS: Linux
unspecified
urgent
Target Milestone: ---
Assignee: Gaurav Yadav
QA Contact:
URL:
Whiteboard:
Depends On: 1447523 1459759 1459760
Blocks:
 
Reported: 2017-05-22 16:57 UTC by Gaurav Yadav
Modified: 2023-09-14 03:57 UTC (History)

Fixed In Version: glusterfs-3.12.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1447523
Environment:
Last Closed: 2017-09-05 17:31:32 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments:
Comments cannot be longer than 65535 characters, hence attaching (331.88 KB, text/plain), 2017-05-22 16:57 UTC, Gaurav Yadav

Description Gaurav Yadav 2017-05-22 16:57:06 UTC
Created attachment 1281161
Comments cannot be longer than 65535 characters, hence attaching

+++ This bug was initially created as a clone of Bug #1447523 +++

Description of problem:

Issuing a peer probe results in a glusterd segmentation fault. Once in this state, if the peer is removed from /var/lib/glusterd/peers, glusterd will start.  Probing a peer again leads to the same problem.

Problematic peer entry:
cat /var/lib/glusterd/peers/ip-10-0-50-25.us-west-1.compute.internal 
uuid=00000000-0000-0000-0000-000000000000
state=0
hostname1=ip-10-0-50-25.us-west-1.compute.internal


Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level=TRACE --log-buf-size=0'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  x86_64_fallback_frame_state (context=0x7ffe5d9a3b50, context=0x7ffe5d9a3b50, fs=0x7ffe5d9a3c40) at ./md-unwind-support.h:58
58      ./md-unwind-support.h: No such file or directory.
(gdb) bt
#0  x86_64_fallback_frame_state (context=0x7ffe5d9a3b50, context=0x7ffe5d9a3b50, fs=0x7ffe5d9a3c40) at ./md-unwind-support.h:58
#1  uw_frame_state_for (context=context@entry=0x7ffe5d9a3b50, fs=fs@entry=0x7ffe5d9a3c40) at ../../../src/libgcc/unwind-dw2.c:1253
#2  0x00007f6371b2f6d8 in _Unwind_Backtrace (trace=0x7f6378bc2440 <backtrace_helper>, trace_argument=0x7ffe5d9a3e00) at ../../../src/libgcc/unwind.inc:290
#3  0x00007f6378bc25b6 in __GI___backtrace (array=array@entry=0x7ffe5d9a3e40, size=size@entry=200) at ../sysdeps/x86_64/backtrace.c:109
#4  0x00007f63796f3f42 in _gf_msg_backtrace_nomem (level=level@entry=GF_LOG_ALERT, stacksize=stacksize@entry=200) at logging.c:1094
#5  0x00007f63796fd494 in gf_print_trace (signum=11, ctx=0x7f637a3ac010) at common-utils.c:737
#6  <signal handler called>
#7  0x00000001725cc6c8 in ?? ()
#8  0x0000000000000000 in ?? ()



Version-Release number of selected component (if applicable):

$ glusterd --version 
glusterfs 3.8.11

from package glusterfs-server 3.8.11-ubuntu1~trusty1

How reproducible:

1:1


Steps to Reproduce:
1. Install gluster on Ubuntu 14.04
2. sudo /usr/sbin/gluster --log-level=TRACE peer probe ip-10-0-50-25.us-west-1.compute.internal
Connection failed. Please check if gluster daemon is operational.

Actual results:

Glusterd crashes on peer probe.

Expected results:

Glusterd should not crash on peer probe.


Additional info:

There's another issue which may be related. I noticed that glusterd.info was not self-populating. As a workaround I issue 'gluster pool list' which triggers glusterd to generate and store a UUID:

cat /var/lib/glusterd/glusterd.info 
UUID=ad7b8337-ec4d-4917-ad6b-ca0e4d0eba42
operating-version=30800

This looks a lot like https://bugzilla.redhat.com/show_bug.cgi?id=1293594


Gaurav,

I can grant you access to EC2 instances that are in this state. Is that acceptable? If so, please send me your SSH public key.

Please look at https://bugzilla.redhat.com/attachment.cgi?id=1276539. Check out Stacktrace, StacktraceSource, and ThreadStacktrace.

--- Additional comment from Kaushal on 2017-05-15 03:46:47 EDT ---

To make it easier to debug, please install the `glusterfs-dbg` package, which should provide better information in the backtraces. Also, try to start glusterd with debug logs, either directly by running `glusterd -LDEBUG` or by modifying the init script.

Doing the above should help get better logs and stacktraces, which will help you get to the cause faster.

--- Additional comment from Ben Werthmann on 2017-05-15 10:14:04 EDT ---

Kaushal,

'glusterfs-dbg' is already installed and I've already modified the init scripts (upstart job in this case) to use DEBUG level logging.

--- Additional comment from Gaurav Yadav on 2017-05-17 01:32:38 EDT ---

Ben,

The logs you attached don't help much; I am not able to see a proper backtrace.

To do an RCA I need either a reproducer or access to your host.

Here is my SSH public key

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDWFZqzFVo7orVZx2ODZyok46VI6EqLg16uP2Z1pkMrEQGu50i3Ye16V5I63UMrHjDwdr4hxtvkW9UfhckQpgBwjsVg9xoyl9tuYt1h9au8G0hH2UL1XYWmbQt82N9VbeYGStg3n0VoefHNZ4LH/VINg0gBWtIK7iTQxWR6XOvs2QqOJnUnM+Fgu5b9kS9vPoDr93BxGLya2ijASkRxsi5dUN4qm7LgFX7Hsyh14G+BBouF5wDZ6frR/UPpqocBVJ5/n4f9OkhwMOShlkWm0m/JDcu6L0phL+Dqm9KxPHBEA/PFW3atjvJW70Iun+j1i72SCcMccQjHSPB6J5QYSeQb gyadav.eng.blr.redhat.com

--- Additional comment from Ben Werthmann on 2017-05-17 12:09:50 EDT ---

Gaurav,

I've provided the connection info to you in a direct email.

--- Additional comment from Ben Werthmann on 2017-05-19 20:37:46 EDT ---

Running this command before peer probe reproduces this problem (leading to the backtrace handler problem) in all cases:

sysctl net.ipv4.ip_local_reserved_ports="49152-49156"

or

sysctl net.ipv4.ip_local_reserved_ports="24007-24008,32765-32768,49152-49156"

The issue appears to be with parsing the contents of '/proc/sys/net/ipv4/ip_local_reserved_ports' here:

https://github.com/gluster/glusterfs/blob/master/libglusterfs/src/common-utils.c#L3038

This option appears to defer to the kernel for source port selection. Is there a known issue with kernel port selection?

https://github.com/gluster/glusterfs/blob/master/configure.ac#L312-L320

I'm going to build and test with the above configure option.
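
For illustration, here is a minimal sketch of how a value like the ones set above in /proc/sys/net/ipv4/ip_local_reserved_ports could be parsed with explicit bounds checks. It is not the glusterfs code at the linked line: `MAX_PORT`, `parse_port` and `mark_reserved_ports` are hypothetical names, and a plain bool array stands in for whatever structure glusterfs actually uses.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup and strtok_r */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PORT 65535 /* assumption: accept the full 16-bit unsigned port space */

/* Hypothetical helper: parse one decimal port into *out.
 * Returns 0 on success, -1 on malformed or out-of-range input. */
static int parse_port(const char *s, long *out)
{
    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return -1;
    if (v < 0 || v > MAX_PORT)
        return -1;
    *out = v;
    return 0;
}

/* Hypothetical helper: mark every port named in a list such as
 * "24007-24008,32765-32768,49152-49156\n" in reserved[].
 * Returns 0 on success, -1 on the first malformed entry (entries
 * before it stay marked). */
static int mark_reserved_ports(const char *list, bool reserved[MAX_PORT + 1])
{
    assert(list != NULL && reserved != NULL);

    char *copy = strdup(list); /* strtok_r modifies its input */
    if (copy == NULL)
        return -1;

    int ret = 0;
    char *save = NULL;
    /* Splitting on "," and "\n" also drops the trailing newline. */
    for (char *tok = strtok_r(copy, ",\n", &save); tok != NULL;
         tok = strtok_r(NULL, ",\n", &save)) {
        long lo = 0, hi = 0;
        char *dash = strchr(tok, '-');
        if (dash != NULL) {
            *dash = '\0'; /* split "lo-hi" in place */
            if (parse_port(tok, &lo) != 0 || parse_port(dash + 1, &hi) != 0 ||
                lo > hi) {
                ret = -1;
                break;
            }
        } else {
            if (parse_port(tok, &lo) != 0) {
                ret = -1;
                break;
            }
            hi = lo;
        }
        for (long p = lo; p <= hi; p++)
            reserved[p] = true;
    }

    free(copy);
    return ret;
}
```

Every value is range-checked before it is used as an array index, so a port such as 49152 is either accepted or rejected outright and never reaches the array as a wrapped or sentinel value.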

--- Additional comment from Ben Werthmann on 2017-05-22 11:20:27 EDT ---

(In reply to Ben Werthmann from comment #22)
> Running this command before peer probe reproduces this problem (leading to
> the backtrace handler problem) in all cases:
> 
> sysctl net.ipv4.ip_local_reserved_ports="49152-49156"
> 
> or
> 
> sysctl net.ipv4.ip_local_reserved_ports="24007-24008,32765-32768,49152-49156"
> 
> The issue appears to be with parsing the contents of
> '/proc/sys/net/ipv4/ip_local_reserved_ports' here:
> 
> https://github.com/gluster/glusterfs/blob/master/libglusterfs/src/common-utils.c#L3038
> 
> This option appears to defer to the kernel for source port selection. Is
> there a known issue with kernel port selection?
> 
> https://github.com/gluster/glusterfs/blob/master/configure.ac#L312-L320

This option is not in 3.8.

> 
> I'm going to build and test with the above configure option.

--- Additional comment from Gaurav Yadav on 2017-05-22 12:34:54 EDT ---

Thanks, Ben, for providing the additional info; it helped me find the root cause of the issue.
While parsing the ports we do not handle the MIN/MAX range properly, so glusterd crashes.
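
To make the range problem concrete: the commit message in comment 7 describes the failing case as a port outside the range of the short data type, and 49152 is indeed greater than 32767 (INT16_MAX). The sketch below contrasts a conversion into a 16-bit signed integer with one into a wider type that is then validated against the port range. `parse_long`, `str_to_int16` and `str_to_port` are hypothetical helpers written for this note, not glusterfs functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: strict decimal parse, no trailing characters. */
static int parse_long(const char *s, long *out)
{
    assert(s != NULL && out != NULL);
    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return -1;
    *out = v;
    return 0;
}

/* Fails for anything outside [-32768, 32767]; *out is left untouched
 * on failure, so a caller's sentinel value survives. */
static int str_to_int16(const char *s, int16_t *out)
{
    long v;
    if (parse_long(s, &v) != 0 || v < INT16_MIN || v > INT16_MAX)
        return -1;
    *out = (int16_t)v;
    return 0;
}

/* Parses into a wider type and validates against the port range. */
static int str_to_port(const char *s, int32_t *out)
{
    long v;
    if (parse_long(s, &v) != 0 || v < 0 || v > 65535)
        return -1;
    *out = (int32_t)v;
    return 0;
}
```

With these helpers, `str_to_int16("49152", &p)` returns -1 and leaves `p` unchanged, while `str_to_port("49152", &q)` succeeds. A caller that ignores the failing return value would go on to use whatever `p` held before the call.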

Comment 1 Worker Ant 2017-05-22 18:09:11 UTC
REVIEW: https://review.gluster.org/17359 (glusterd : Fix crash in glusterd while peer probing) posted (#1) for review on master by Gaurav Yadav (gyadav)

Comment 2 Ben Werthmann 2017-05-22 20:09:08 UTC
Can you please elaborate on your findings?

Comment 3 Worker Ant 2017-05-23 01:24:38 UTC
REVIEW: https://review.gluster.org/17359 (glusterd : Fix crash in glusterd while peer probing) posted (#2) for review on master by Gaurav Yadav (gyadav)

Comment 4 Worker Ant 2017-05-23 10:39:08 UTC
REVIEW: https://review.gluster.org/17359 (libglusterfs : Fix crash in glusterd while peer probing) posted (#3) for review on master by Gaurav Yadav (gyadav)

Comment 5 Ben Werthmann 2017-05-23 14:28:10 UTC
I built glusterfs without the backtrace handler by commenting out HAVE_BACKTRACE: 

/* define if found backtrace */
/* #undef HAVE_BACKTRACE */

Here's the "real" backtrace from a debug build:

(gdb) bt
#0  0x00007f9bf14fa941 in BIT_SET (array=0x7fffe8e7b0c0 "", index=4294967295) at common-utils.h:247
#1  0x00007f9bf15015d1 in gf_ports_reserved (blocked_port=0x2635e58 "49152", ports=0x7fffe8e7b0c0 "", ceiling=49152) at common-utils.c:3098
#2  0x00007f9bf1501248 in gf_process_reserved_ports (ports=0x7fffe8e7b0c0 "", ceiling=49152) at common-utils.c:3021
#3  0x00007f9bec3b7d20 in af_inet_bind_to_port_lt_ceiling (fd=9, sockaddr=0x2634230, sockaddr_len=16, ceiling=49152) at name.c:55
#4  0x00007f9bec3b8d53 in client_bind (this=0x2633f80, sockaddr=0x2634230, sockaddr_len=0x26342b0, sock=9) at name.c:478
#5  0x00007f9bec3b3f6a in socket_connect (this=0x2633f80, port=0) at socket.c:3232
#6  0x00007f9bf12b595f in rpc_transport_connect (this=0x2633f80, port=0) at rpc-transport.c:418
#7  0x00007f9bf12b8eb4 in rpc_clnt_reconnect (conn_ptr=0x2633d80) at rpc-clnt.c:407
#8  0x00007f9bf12ba789 in rpc_clnt_start (rpc=0x2633d50) at rpc-clnt.c:1196
#9  0x0000000000411599 in glusterfs_mgmt_init (ctx=0x25be010) at glusterfsd-mgmt.c:2429
#10 0x000000000040ab50 in glusterfs_volumes_init (ctx=0x25be010) at glusterfsd.c:2387
#11 0x000000000040b076 in main (argc=19, argv=0x7fffe8e7e588) at glusterfsd.c:2518
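
A note on frame #0: index=4294967295 equals 2^32 - 1, which is the value -1 takes when converted to a 32-bit unsigned integer. One plausible reading, not confirmed in this report, is that a port variable still held a -1 sentinel after a failed conversion and was then used as a bitmap index. The sketch below reproduces the arithmetic and shows a bounds-checked bit set that rejects such an index. `PORT_COUNT`, `as_index`, `bit_set_checked` and `bit_is_set` are hypothetical names, not the glusterfs BIT_SET macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PORT_COUNT 65536u /* assumption: one bit per possible port */

/* Converting a negative signed value to uint32_t wraps modulo 2^32. */
static uint32_t as_index(int16_t port)
{
    return (uint32_t)port;
}

/* Set bit `index` in a byte-array bitmap of `nbits` bits.
 * Returns 0 on success, -1 if the index is out of range. */
static int bit_set_checked(unsigned char *bitmap, size_t nbits, uint32_t index)
{
    if (index >= nbits)
        return -1;
    bitmap[index / 8] |= (unsigned char)(1u << (index % 8));
    return 0;
}

/* Read back a bit; the caller must pass an in-range index. */
static int bit_is_set(const unsigned char *bitmap, uint32_t index)
{
    return (bitmap[index / 8] >> (index % 8)) & 1;
}
```

Here `as_index(-1)` yields 4294967295, the same value gdb shows for `index`. Without the range check, the write would land roughly 512 MiB past the start of the bitmap (4294967295 / 8 bytes), which is consistent with a segmentation fault.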

Comment 6 Worker Ant 2017-05-23 18:53:07 UTC
REVIEW: https://review.gluster.org/17359 (libglusterfs : Fix crash in glusterd while peer probing) posted (#4) for review on master by Gaurav Yadav (gyadav)

Comment 7 Worker Ant 2017-05-26 12:08:41 UTC
COMMIT: https://review.gluster.org/17359 committed in master by Jeff Darcy (jeff.us) 
------
commit 23930326e0378edace9c8c41e8ae95931a2f68ba
Author: Gaurav Yadav <gyadav>
Date:   Mon May 22 23:25:47 2017 +0530

    libglusterfs : Fix crash in glusterd while peer probing
    
    glusterd crashes when a port is explicitly set to a
    range that lies outside the short data type range.
    Eg. sysctl net.ipv4.ip_local_reserved_ports="49152-49156"
    In above case glusterd crashes while parsing the port.
    
    With this fix glusterd will be able to handle port ranges
    between INT_MIN and INT_MAX
    
    Change-Id: I7c75ee67937b0e3384502973d96b1c36c89e0fe1
    BUG: 1454418
    Signed-off-by: Gaurav Yadav <gyadav>
    Reviewed-on: https://review.gluster.org/17359
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Samikshan Bairagya <samikshan>
    Reviewed-by: Atin Mukherjee <amukherj>
    Reviewed-by: Niels de Vos <ndevos>
    Reviewed-by: Jeff Darcy <jeff.us>

Comment 8 Shyamsundar 2017-09-05 17:31:32 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 9 Red Hat Bugzilla 2023-09-14 03:57:56 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

