Bug 1433276 - glusterd crashes when peering an IP where the address is more than acceptable range (>255) OR with random hostnames
Summary: glusterd crashes when peering an IP where the address is more than acceptable...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Atin Mukherjee
QA Contact: Ambarish
URL:
Whiteboard:
Depends On: 1433578 1434399
Blocks: 1417151 1440162 1449076
 
Reported: 2017-03-17 09:51 UTC by Nag Pavan Chilakam
Modified: 2017-09-21 04:57 UTC (History)
8 users

Fixed In Version: glusterfs-3.8.4-19
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1433578 1440162 (view as bug list)
Environment:
Last Closed: 2017-09-21 04:33:25 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:2774 0 normal SHIPPED_LIVE glusterfs bug fix and enhancement update 2017-09-21 08:16:29 UTC

Description Nag Pavan Chilakam 2017-03-17 09:51:56 UTC
Description of problem:
==============
When we try to peer probe a node where the IP address has an octet outside the acceptable range (>255), glusterd crashes consistently (at least 95% of the time; checked this on 5 different setups).
Issue a gluster peer probe 10.70.35.1221 ===> note that the last octet is 4 digits
glusterd crashes

This is consistent and can easily happen if the admin makes a typo, which is quite possible


On 3.1.3 (3.7.9-10), I couldn't reproduce this.
On 3.8.4-18, any octet above 255 crashes it.


Core details:
[root@dhcp35-138 ~]# file /core.30402 
/core.30402: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/usr/sbin/glusterd', platform: 'x86_64'
[root@dhcp35-138 ~]# gdb /usr/sbin/glusterd /core.30402
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.

warning: core file may not match specified executable file.
[New LWP 29703]
[New LWP 30405]
[New LWP 30403]
[New LWP 30404]
[New LWP 30406]
[New LWP 30402]
[New LWP 30607]
[New LWP 30608]
[New LWP 29704]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fd5da47aea5 in __gf_free (free_ptr=0x7fd5b5620040) at mem-pool.c:314
314	        GF_ASSERT (GF_MEM_TRAILER_MAGIC ==
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 device-mapper-event-libs-1.02.135-1.el7_3.3.x86_64 device-mapper-libs-1.02.135-1.el7_3.3.x86_64 elfutils-libelf-0.166-2.el7.x86_64 elfutils-libs-0.166-2.el7.x86_64 glibc-2.17-157.el7_3.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64 libattr-2.4.46-12.el7.x86_64 libblkid-2.23.2-33.el7.x86_64 libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libselinux-2.5-6.el7.x86_64 libsepol-2.5-6.el7.x86_64 libuuid-2.23.2-33.el7.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64 lvm2-libs-2.02.166-1.el7_3.3.x86_64 openssl-libs-1.0.1e-60.el7_3.1.x86_64 pcre-8.32-15.el7_2.1.x86_64 systemd-libs-219-30.el7_3.7.x86_64 userspace-rcu-0.7.9-2.el7rhgs.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x00007fd5da47aea5 in __gf_free (free_ptr=0x7fd5b5620040) at mem-pool.c:314
#1  0x00007fd5da21c9e7 in saved_frames_destroy (frames=<optimized out>) at rpc-clnt.c:388
#2  0x00007fd5da21e140 in rpc_clnt_connection_cleanup (conn=conn@entry=0x7fd5b53a4390) at rpc-clnt.c:557
#3  0x00007fd5da21ec00 in rpc_clnt_handle_disconnect (conn=0x7fd5b53a4390, clnt=0x7fd5b53a4360) at rpc-clnt.c:900
#4  rpc_clnt_notify (trans=<optimized out>, mydata=0x7fd5b53a4390, event=<optimized out>, data=0x7fd5b5610f30)
    at rpc-clnt.c:953
#5  0x00007fd5da21a9f3 in rpc_transport_notify (this=<optimized out>, event=event@entry=RPC_TRANSPORT_DISCONNECT, 
    data=<optimized out>) at rpc-transport.c:538
#6  0x00007fd5cc032b2d in socket_connect_error_cbk (opaque=0x7fd5b55b2070) at socket.c:2927
#7  0x00007fd5d92b5dc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007fd5d8bfa73d in clone () from /lib64/libc.so.6
(gdb) 

Version-Release number of selected component (if applicable):
===
3.8.4-18

How reproducible:
====
Always (or say 95% of the time)

Steps to Reproduce:
1. Set up a gluster node
2. Issue a peer probe to, say, 10.70.35.x (where x > 255)
3. glusterd crashes
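The backtrace above shows glusterd only failing late, during RPC disconnect teardown (`saved_frames_destroy` -> `__gf_free`), after the bogus address was already accepted. Below is a minimal sketch of the kind of up-front address validation that catches the typo before any connection is attempted. The helper name `peer_addr_ok` is hypothetical; this is not the actual downstream patch, just an illustration of why `10.70.35.1221` is rejectable at parse time.

```c
/* Hypothetical sketch (not the actual glusterd fix): reject a peer
 * address that is neither a valid IPv4/IPv6 literal nor a resolvable
 * hostname, before any RPC connection is attempted. */
#include <assert.h>
#include <stdio.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netdb.h>

/* Returns 1 if 'addr' looks usable as a peer address, 0 otherwise. */
static int peer_addr_ok(const char *addr)
{
        struct in_addr  v4;
        struct in6_addr v6;

        /* Valid numeric literal? inet_pton() is strict: every dotted
         * octet must be in 0-255, so "10.70.35.1221" is rejected. */
        if (inet_pton(AF_INET, addr, &v4) == 1 ||
            inet_pton(AF_INET6, addr, &v6) == 1)
                return 1;

        /* Otherwise it must resolve as a hostname. Note: whether a
         * string like "abcd" resolves depends on the local resolver. */
        struct addrinfo hints = { 0 }, *res = NULL;
        hints.ai_family = AF_UNSPEC;
        if (getaddrinfo(addr, NULL, &hints, &res) != 0)
                return 0;
        freeaddrinfo(res);
        return 1;
}

int main(void)
{
        struct in_addr a;

        /* Deterministic checks on the literal parser only. */
        assert(inet_pton(AF_INET, "10.70.35.122", &a) == 1);  /* valid  */
        assert(inet_pton(AF_INET, "10.70.35.1221", &a) == 0); /* >255   */
        assert(peer_addr_ok("10.70.35.122") == 1);

        printf("validation checks passed\n");
        return 0;
}
```

One caveat with the `getaddrinfo()` fallback: a resolver with wildcard DNS could still "resolve" a typo like `10.70.35.1221` as a hostname, which is why the sketch asserts only on the deterministic `inet_pton()` behavior.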

Comment 2 Ambarish 2017-03-17 09:59:36 UTC
I hit this on my setup as well just now.

[root@localhost bricks]# gluster peer probe 10.70.37.12345
peer probe: failed: Probe returned with Transport endpoint is not connected
[root@localhost bricks]# 


The weird thing is I see this file getting created with the wrong/random hostname:

[root@localhost peers]# ll -h /var/lib/glusterd/peers/
total 12K
-rw-------. 1 root root 73 Mar 17 05:52 02ef4e27-a38e-4e1e-8b75-a0657c2eae6b
-rw-------. 1 root root 75 Mar 17 05:52 10.70.37.12345     -----> BAD
-rw-------. 1 root root 94 Mar 17 05:52 f6384f3a-ab69-4757-8fc8-eda43bd17c2e
[root@localhost peers]# 


[root@localhost peers]# cat 10.70.37.12345 
uuid=00000000-0000-0000-0000-000000000000
state=0
hostname1=10.70.37.12345
[root@localhost peers]# 


Peer status fails on the crashed node as well:

[root@localhost peers]# gluster peer status
peer status: failed
[root@localhost peers]# 



Though it works fine on other nodes:

[root@localhost /]# gluster peer status
Number of Peers: 2

Hostname: 10.70.37.65
Uuid: 32095651-cbda-40e8-941c-6b75c260610e
State: Peer in Cluster (Connected)

Hostname: 10.70.37.116
Uuid: 02ef4e27-a38e-4e1e-8b75-a0657c2eae6b
State: Peer in Cluster (Connected)
[root@localhost /]#

Comment 3 Ambarish 2017-03-17 10:03:30 UTC
The issue is reproducible if I give peer probe "abcd" as well.

Samikshan shared a similar upstream BZ - https://bugzilla.redhat.com/show_bug.cgi?id=770048, which was later closed as WORKSFORME as no one could reproduce it.

But it's very consistent now.

Comment 10 Atin Mukherjee 2017-03-24 10:58:43 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/101366

Comment 15 errata-xmlrpc 2017-09-21 04:33:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

Comment 16 errata-xmlrpc 2017-09-21 04:57:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

