Bug 1459759 - Glusterd segmentation fault in '_Unwind_Backtrace' while running peer probe
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.11
Hardware: x86_64 Linux
Priority: medium
Severity: urgent
Assigned To: Gaurav Yadav
Keywords: Triaged
Depends On: 1447523
Blocks: 1454418
Reported: 2017-06-08 02:30 EDT by Gaurav Yadav
Modified: 2017-06-28 14:32 EDT (History)
CC List: 11 users

See Also:
Fixed In Version: glusterfs-3.11.1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1447523
Environment:
Last Closed: 2017-06-28 14:32:26 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Comments cannot be longer than 65535 characters, hence attaching (331.88 KB, text/plain)
2017-06-08 02:30 EDT, Gaurav Yadav
Description Gaurav Yadav 2017-06-08 02:30:33 EDT
Created attachment 1285986 [details]
Comments cannot be longer than 65535 characters, hence attaching

+++ This bug was initially created as a clone of Bug #1447523 +++

Description of problem:

Issuing a peer probe results in a glusterd segmentation fault. Once in this state, if the peer is removed from /var/lib/glusterd/peers, glusterd will start.  Probing a peer again leads to the same problem.

Problematic peer entry:
cat /var/lib/glusterd/peers/ip-10-0-50-25.us-west-1.compute.internal 
uuid=00000000-0000-0000-0000-000000000000
state=0
hostname1=ip-10-0-50-25.us-west-1.compute.internal


Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level=TRACE --log-buf-size=0'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  x86_64_fallback_frame_state (context=0x7ffe5d9a3b50, context=0x7ffe5d9a3b50, fs=0x7ffe5d9a3c40) at ./md-unwind-support.h:58
58      ./md-unwind-support.h: No such file or directory.
(gdb) bt
#0  x86_64_fallback_frame_state (context=0x7ffe5d9a3b50, context=0x7ffe5d9a3b50, fs=0x7ffe5d9a3c40) at ./md-unwind-support.h:58
#1  uw_frame_state_for (context=context@entry=0x7ffe5d9a3b50, fs=fs@entry=0x7ffe5d9a3c40) at ../../../src/libgcc/unwind-dw2.c:1253
#2  0x00007f6371b2f6d8 in _Unwind_Backtrace (trace=0x7f6378bc2440 <backtrace_helper>, trace_argument=0x7ffe5d9a3e00) at ../../../src/libgcc/unwind.inc:290
#3  0x00007f6378bc25b6 in __GI___backtrace (array=array@entry=0x7ffe5d9a3e40, size=size@entry=200) at ../sysdeps/x86_64/backtrace.c:109
#4  0x00007f63796f3f42 in _gf_msg_backtrace_nomem (level=level@entry=GF_LOG_ALERT, stacksize=stacksize@entry=200) at logging.c:1094
#5  0x00007f63796fd494 in gf_print_trace (signum=11, ctx=0x7f637a3ac010) at common-utils.c:737
#6  <signal handler called>
#7  0x00000001725cc6c8 in ?? ()
#8  0x0000000000000000 in ?? ()



Version-Release number of selected component (if applicable):

$ glusterd --version 
glusterfs 3.8.11

from package glusterfs-server 3.8.11-ubuntu1~trusty1

How reproducible:

1:1


Steps to Reproduce:
1. Install gluster on Ubuntu 14.04
2. sudo /usr/sbin/gluster --log-level=TRACE peer probe ip-10-0-50-25.us-west-1.compute.internal
Connection failed. Please check if gluster daemon is operational.

Actual results:

Glusterd crashes on peer probe.

Expected results:

Glusterd should not crash on peer probe.


Additional info:

There's another issue which may be related. I noticed that glusterd.info was not self-populating. As a workaround I issue 'gluster pool list' which triggers glusterd to generate and store a UUID:

cat /var/lib/glusterd/glusterd.info 
UUID=ad7b8337-ec4d-4917-ad6b-ca0e4d0eba42
operating-version=30800

This looks a lot like https://bugzilla.redhat.com/show_bug.cgi?id=1293594


Gaurav,

I can grant you access to EC2 instances that are in this state. Is that acceptable? If so, please send me your SSH public key.

Please look at https://bugzilla.redhat.com/attachment.cgi?id=1276539 and check out Stacktrace, StacktraceSource, and ThreadStacktrace.

--- Additional comment from Kaushal on 2017-05-15 03:46:47 EDT ---

To make it easier to debug, please install the `glusterfs-dbg` package, which should provide better information in the backtraces. Also, try to start glusterd with debug logs, either directly by running `glusterd -LDEBUG` or by modifying the init script.

Doing the above should help get better logs and stacktraces, which will help you get to the cause faster.

--- Additional comment from Ben Werthmann on 2017-05-15 10:14:04 EDT ---

Kaushal,

'glusterfs-dbg' is already installed and I've already modified the init scripts (upstart job in this case) to use DEBUG level logging.

--- Additional comment from Gaurav Yadav on 2017-05-17 01:32:38 EDT ---

Ben,

The logs you attached don't help much; I am not able to see a proper backtrace.

In order to do a root-cause analysis (RCA) I need either a reproducer or access to your host.

Here is my SSH public key

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDWFZqzFVo7orVZx2ODZyok46VI6EqLg16uP2Z1pkMrEQGu50i3Ye16V5I63UMrHjDwdr4hxtvkW9UfhckQpgBwjsVg9xoyl9tuYt1h9au8G0hH2UL1XYWmbQt82N9VbeYGStg3n0VoefHNZ4LH/VINg0gBWtIK7iTQxWR6XOvs2QqOJnUnM+Fgu5b9kS9vPoDr93BxGLya2ijASkRxsi5dUN4qm7LgFX7Hsyh14G+BBouF5wDZ6frR/UPpqocBVJ5/n4f9OkhwMOShlkWm0m/JDcu6L0phL+Dqm9KxPHBEA/PFW3atjvJW70Iun+j1i72SCcMccQjHSPB6J5QYSeQb gyadav@dhcp35-39.lab.eng.blr.redhat.com

--- Additional comment from Ben Werthmann on 2017-05-17 12:09:50 EDT ---

Gaurav,

I've provided the connection info to you in a direct email.

--- Additional comment from Ben Werthmann on 2017-05-19 20:37:46 EDT ---

Running either of these commands before the peer probe reproduces this problem (including the backtrace-handler crash) in all cases:

sysctl net.ipv4.ip_local_reserved_ports="49152-49156"

or

sysctl net.ipv4.ip_local_reserved_ports="24007-24008,32765-32768,49152-49156"

The issue appears to be with parsing the contents of '/proc/sys/net/ipv4/ip_local_reserved_ports' here:

https://github.com/gluster/glusterfs/blob/master/libglusterfs/src/common-utils.c#L3038

This option appears to defer to the kernel for source port selection. Is there a known issue with kernel port selection?

https://github.com/gluster/glusterfs/blob/master/configure.ac#L312-L320

I'm going to build and test with the above configure option.

--- Additional comment from Ben Werthmann on 2017-05-22 11:20:27 EDT ---

(In reply to Ben Werthmann from comment #22)
> Running this command before peer probe reproduces this problem (leading to
> the backtrace handler problem) in all cases:
> 
> sysctl net.ipv4.ip_local_reserved_ports="49152-49156"
> 
> or
> 
> sysctl net.ipv4.ip_local_reserved_ports="24007-24008,32765-32768,49152-49156"
> 
> The issue appears to be with parsing the contents of
> '/proc/sys/net/ipv4/ip_local_reserved_ports' here:
> 
> https://github.com/gluster/glusterfs/blob/master/libglusterfs/src/common-
> utils.c#L3038
> 
> This option appears to defer to the kernel for source port selection. Is
> there a known issue with kernel port selection?
> 
> https://github.com/gluster/glusterfs/blob/master/configure.ac#L312-L320

This option is not in 3.8.

> 
> I'm going to build and test with the above configure option.

--- Additional comment from Gaurav Yadav on 2017-05-22 12:34:54 EDT ---

Thanks, Ben, for providing the additional info; it helped me find the root cause of the issue.
While parsing the ports we are not handling the MIN/MAX range properly, hence glusterd crashes.
Comment 1 Worker Ant 2017-06-09 02:10:34 EDT
REVIEW: https://review.gluster.org/17496 (libglusterfs : Fix crash in glusterd while peer probing) posted (#1) for review on release-3.11 by Gaurav Yadav (gyadav@redhat.com)
Comment 2 Worker Ant 2017-06-13 10:20:36 EDT
COMMIT: https://review.gluster.org/17496 committed in release-3.11 by Shyamsundar Ranganathan (srangana@redhat.com) 
------
commit 169a64f7066a5d079c60e816a81325094ed8ad74
Author: Gaurav Yadav <gyadav@redhat.com>
Date:   Mon May 22 23:25:47 2017 +0530

    libglusterfs : Fix crash in glusterd while peer probing
    
    glusterd crashes when the reserved-port range is explicitly set
    to values greater than the range of the short data type.
    E.g. sysctl net.ipv4.ip_local_reserved_ports="49152-49156"
    In the above case glusterd crashes while parsing the port.
    
    With this fix glusterd is able to handle any port value
    between INT_MIN and INT_MAX.
    
    > Reviewed-on: https://review.gluster.org/17359
    > Smoke: Gluster Build System <jenkins@build.gluster.org>
    > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    > CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    > Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
    > Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
    > Reviewed-by: Niels de Vos <ndevos@redhat.com>
    > Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Change-Id: I7c75ee67937b0e3384502973d96b1c36c89e0fe1
    BUG: 1459759
    Signed-off-by: Gaurav Yadav <gyadav@redhat.com>
    Reviewed-on: https://review.gluster.org/17496
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Comment 3 Shyamsundar 2017-06-28 14:32:26 EDT
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.1, please open a new bug report.

glusterfs-3.11.1 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-June/000074.html
[2] https://www.gluster.org/pipermail/gluster-users/
