Bug 1459760 - Glusterd segmentation fault in ' _Unwind_Backtrace' while running peer probe
Summary: Glusterd segmentation fault in ' _Unwind_Backtrace' while running peer probe
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.10
Hardware: x86_64
OS: Linux
Priority: medium
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1447523
Blocks: 1454418 glusterfs-3.10.4
 
Reported: 2017-06-08 06:32 UTC by Gaurav Yadav
Modified: 2018-06-20 18:30 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1447523
Environment:
Last Closed: 2018-06-20 18:30:21 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Comments cannot be longer than 65535 characters, hence attaching (331.88 KB, text/plain)
2017-06-08 06:32 UTC, Gaurav Yadav

Description Gaurav Yadav 2017-06-08 06:32:01 UTC
Created attachment 1285987 [details]
Comments cannot be longer than 65535 characters, hence attaching

+++ This bug was initially created as a clone of Bug #1447523 +++

Description of problem:

Issuing a peer probe results in a glusterd segmentation fault. Once in this state, glusterd will start again if the peer entry is removed from /var/lib/glusterd/peers, but probing a peer again leads to the same problem.

Problematic peer entry:
cat /var/lib/glusterd/peers/ip-10-0-50-25.us-west-1.compute.internal 
uuid=00000000-0000-0000-0000-000000000000
state=0
hostname1=ip-10-0-50-25.us-west-1.compute.internal


Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level=TRACE --log-buf-size=0'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  x86_64_fallback_frame_state (context=0x7ffe5d9a3b50, context=0x7ffe5d9a3b50, fs=0x7ffe5d9a3c40) at ./md-unwind-support.h:58
58      ./md-unwind-support.h: No such file or directory.
(gdb) bt
#0  x86_64_fallback_frame_state (context=0x7ffe5d9a3b50, context=0x7ffe5d9a3b50, fs=0x7ffe5d9a3c40) at ./md-unwind-support.h:58
#1  uw_frame_state_for (context=context@entry=0x7ffe5d9a3b50, fs=fs@entry=0x7ffe5d9a3c40) at ../../../src/libgcc/unwind-dw2.c:1253
#2  0x00007f6371b2f6d8 in _Unwind_Backtrace (trace=0x7f6378bc2440 <backtrace_helper>, trace_argument=0x7ffe5d9a3e00) at ../../../src/libgcc/unwind.inc:290
#3  0x00007f6378bc25b6 in __GI___backtrace (array=array@entry=0x7ffe5d9a3e40, size=size@entry=200) at ../sysdeps/x86_64/backtrace.c:109
#4  0x00007f63796f3f42 in _gf_msg_backtrace_nomem (level=level@entry=GF_LOG_ALERT, stacksize=stacksize@entry=200) at logging.c:1094
#5  0x00007f63796fd494 in gf_print_trace (signum=11, ctx=0x7f637a3ac010) at common-utils.c:737
#6  <signal handler called>
#7  0x00000001725cc6c8 in ?? ()
#8  0x0000000000000000 in ?? ()



Version-Release number of selected component (if applicable):

$ glusterd --version 
glusterfs 3.8.11

from package glusterfs-server 3.8.11-ubuntu1~trusty1

How reproducible:

1:1


Steps to Reproduce:
1. Install gluster on Ubuntu 14.04
2. sudo /usr/sbin/gluster --log-level=TRACE peer probe ip-10-0-50-25.us-west-1.compute.internal
Connection failed. Please check if gluster daemon is operational.

Actual results:

Glusterd crashes on peer probe.

Expected results:

Glusterd should not crash on peer probe.


Additional info:

There's another issue which may be related. I noticed that glusterd.info was not self-populating. As a workaround I issue 'gluster pool list' which triggers glusterd to generate and store a UUID:

cat /var/lib/glusterd/glusterd.info 
UUID=ad7b8337-ec4d-4917-ad6b-ca0e4d0eba42
operating-version=30800

This looks a lot like https://bugzilla.redhat.com/show_bug.cgi?id=1293594


Gaurav,

I can grant you access to EC2 instances that are in this state. Is that acceptable? If so, please send me your SSH public key.

Please look at https://bugzilla.redhat.com/attachment.cgi?id=1276539. Check out Stacktrace, StacktraceSource, and ThreadStacktrace.

--- Additional comment from Kaushal on 2017-05-15 03:46:47 EDT ---

To make it easier to debug, please install the `glusterfs-dbg` package, which should provide better information in the backtraces. Also, try to start glusterd with debug logs, either directly by running `glusterd -LDEBUG` or by modifying the init script.

Doing the above should help get better logs and stacktraces, which will help you get to the cause faster.

--- Additional comment from Ben Werthmann on 2017-05-15 10:14:04 EDT ---

Kaushal,

'glusterfs-dbg' is already installed and I've already modified the init scripts (upstart job in this case) to use DEBUG level logging.

--- Additional comment from Gaurav Yadav on 2017-05-17 01:32:38 EDT ---

Ben,

The logs you attached don't help much; I am not able to see a proper backtrace.

To do the RCA I need either a reproducer or access to your host.

Here is my SSH public key

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDWFZqzFVo7orVZx2ODZyok46VI6EqLg16uP2Z1pkMrEQGu50i3Ye16V5I63UMrHjDwdr4hxtvkW9UfhckQpgBwjsVg9xoyl9tuYt1h9au8G0hH2UL1XYWmbQt82N9VbeYGStg3n0VoefHNZ4LH/VINg0gBWtIK7iTQxWR6XOvs2QqOJnUnM+Fgu5b9kS9vPoDr93BxGLya2ijASkRxsi5dUN4qm7LgFX7Hsyh14G+BBouF5wDZ6frR/UPpqocBVJ5/n4f9OkhwMOShlkWm0m/JDcu6L0phL+Dqm9KxPHBEA/PFW3atjvJW70Iun+j1i72SCcMccQjHSPB6J5QYSeQb gyadav.eng.blr.redhat.com

--- Additional comment from Ben Werthmann on 2017-05-17 12:09:50 EDT ---

Gaurav,

I've provided the connection info to you in a direct email.

--- Additional comment from Ben Werthmann on 2017-05-19 20:37:46 EDT ---

Running this command before peer probe reproduces this problem (leading to the backtrace handler problem) in all cases:

sysctl net.ipv4.ip_local_reserved_ports="49152-49156"

or

sysctl net.ipv4.ip_local_reserved_ports="24007-24008,32765-32768,49152-49156"

The issue appears to be with parsing the contents of '/proc/sys/net/ipv4/ip_local_reserved_ports' here:

https://github.com/gluster/glusterfs/blob/master/libglusterfs/src/common-utils.c#L3038

This option appears to defer to the kernel for source port selection. Is there a known issue with kernel port selection?

https://github.com/gluster/glusterfs/blob/master/configure.ac#L312-L320

I'm going to build and test with the above configure option.
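
For reference, the value written by the sysctl commands above is a comma-separated list of ports and inclusive port ranges. The following is a minimal Python sketch of reading that format. It is an illustration only, not the glusterfs parser in common-utils.c, and the function name `parse_reserved_ports` is hypothetical.

```python
def parse_reserved_ports(text):
    """Expand a list such as "24007-24008,49152-49156" into a sorted list of ports.

    Hypothetical helper for illustration; not part of glusterfs.
    """
    ports = set()
    for item in text.strip().split(","):
        item = item.strip()
        if not item:
            continue  # an empty value means no ports are reserved
        low, sep, high = item.partition("-")
        first = int(low)
        last = int(high) if sep else first  # a single port is a range of one
        if not (0 <= first <= last <= 65535):
            raise ValueError(f"invalid port range: {item!r}")
        ports.update(range(first, last + 1))
    return sorted(ports)


if __name__ == "__main__":
    # Same value as the first sysctl command above.
    print(parse_reserved_ports("49152-49156"))
```

Every port in these ranges is a valid 16-bit unsigned value, so the input itself is well-formed; the sketch validates against 0-65535 rather than against any signed type.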

--- Additional comment from Ben Werthmann on 2017-05-22 11:20:27 EDT ---

(In reply to Ben Werthmann from comment #22)
> Running this command before peer probe reproduces this problem (leading to
> the backtrace handler problem) in all cases:
> 
> sysctl net.ipv4.ip_local_reserved_ports="49152-49156"
> 
> or
> 
> sysctl net.ipv4.ip_local_reserved_ports="24007-24008,32765-32768,49152-49156"
> 
> The issue appears to be with parsing the contents of
> '/proc/sys/net/ipv4/ip_local_reserved_ports' here:
> 
> https://github.com/gluster/glusterfs/blob/master/libglusterfs/src/common-utils.c#L3038
> 
> This option appears to defer to the kernel for source port selection. Is
> there a known issue with kernel port selection?
> 
> https://github.com/gluster/glusterfs/blob/master/configure.ac#L312-L320

This option is not in 3.8.

> 
> I'm going to build and test with the above configure option.

--- Additional comment from Gaurav Yadav on 2017-05-22 12:34:54 EDT ---

Thanks, Ben, for providing the additional info. It helped me find the root cause of the issue.
While parsing the reserved ports we do not handle the MIN/MAX range properly, hence glusterd crashes.
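
To illustrate the range problem: the commit message for the fix describes the failing case as a port range beyond the short data type. The Python sketch below uses `ctypes` to show what a port such as 49152 becomes if it is stored in a signed 16-bit integer. How the affected glusterfs versions actually store the parsed value is an assumption here, and `as_int16` and `fits_int16` are hypothetical names used only for this example.

```python
import ctypes

INT16_MAX = 2**15 - 1  # 32767, the largest value a signed 16-bit integer holds


def as_int16(value):
    """Return value as it would read back from a signed 16-bit integer."""
    # ctypes integer types do no overflow checking, so the value wraps.
    return ctypes.c_int16(value).value


def fits_int16(value):
    """True if value can be stored in a signed 16-bit integer unchanged."""
    return -INT16_MAX - 1 <= value <= INT16_MAX


if __name__ == "__main__":
    for port in (24007, 32768, 49152, 49156):
        print(port, fits_int16(port), as_int16(port))
```

Ports above 32767 wrap to negative numbers in this model, so any later comparison or lookup that relies on the stored value would see something other than the port that was written to the sysctl.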

Comment 1 Worker Ant 2017-06-09 05:50:26 UTC
REVIEW: https://review.gluster.org/17494 (libglusterfs : Fix crash in glusterd while peer probing) posted (#1) for review on release-3.10 by Gaurav Yadav (gyadav)

Comment 2 Worker Ant 2017-06-20 04:56:12 UTC
COMMIT: https://review.gluster.org/17494 committed in release-3.10 by Raghavendra Talur (rtalur) 
------
commit 89b55994bf84a489b10b3b40d3a6245681eb4c4c
Author: Gaurav Yadav <gyadav>
Date:   Mon May 22 23:25:47 2017 +0530

    libglusterfs : Fix crash in glusterd while peer probing
    
    glusterd crashes when a port is explicitly set to a
    range that is greater than the short data type range.
    E.g. sysctl net.ipv4.ip_local_reserved_ports="49152-49156"
    In the above case glusterd crashes while parsing the port.
    
    With this fix glusterd will be able to handle a port range
    between INT_MIN and INT_MAX
    
    > Reviewed-on: https://review.gluster.org/17359
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    >  Reviewed-by: Samikshan Bairagya <samikshan>
    > Reviewed-by: Atin Mukherjee <amukherj>
    > Reviewed-by: Niels de Vos <ndevos>
    > Reviewed-by: Jeff Darcy <jeff.us>
    
    Change-Id: I7c75ee67937b0e3384502973d96b1c36c89e0fe1
    BUG: 1459760
    Signed-off-by: Gaurav Yadav <gyadav>
    Reviewed-on: https://review.gluster.org/17494
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Samikshan Bairagya <samikshan>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra Talur <rtalur>

Comment 4 Shyamsundar 2018-06-20 18:30:21 UTC
This bug is reported against a version of Gluster that is no longer maintained
(or has been EOL'd). See https://www.gluster.org/release-schedule/ for the
versions currently maintained.

As a result this bug is being closed.

If the bug persists on a maintained version of gluster or against the mainline
gluster repository, request that it be reopened and the Version field be marked
appropriately.

