Bug 1687641 - Brick process has coredumped, when starting glusterd
Summary: Brick process has coredumped, when starting glusterd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rpc
Version: rhgs-3.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.5.0
Assignee: Mohit Agrawal
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks: 1687671 1687705 1688218 1696807
 
Reported: 2019-03-12 02:50 UTC by SATHEESARAN
Modified: 2019-12-23 16:19 UTC
CC List: 6 users

Fixed In Version: glusterfs-6.0-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1687671 1687705 1688218 (view as bug list)
Environment:
Last Closed: 2019-10-30 12:20:22 UTC
Embargoed:


Attachments
glusterd.log (1.01 MB, application/octet-stream), 2019-03-12 03:07 UTC, SATHEESARAN
brick.log (352.22 KB, application/octet-stream), 2019-03-12 03:08 UTC, SATHEESARAN
sosreport (13.14 MB, application/octet-stream), 2019-03-12 06:04 UTC, SATHEESARAN
full_backtrace (7.40 KB, application/octet-stream), 2019-03-12 06:06 UTC, SATHEESARAN


Links
Red Hat Product Errata RHEA-2019:3249, last updated 2019-10-30 12:20:46 UTC

Description SATHEESARAN 2019-03-12 02:50:55 UTC
Description of problem:
------------------------
This issue was seen on an RHHI setup. RHHI nodes have at least 2 networks, one dedicated to VM traffic and the other to gluster traffic. 4 Gluster volumes (replica 3 and arbitrated replicate) were created and started. When the RHHI-V node was restarted post upgrade, the interface corresponding to the gluster network did not pick up an IP address, because there was no BOOTPROTO parameter in its network configuration file.

Because of this, the bricks did not come up, although glusterd did. Checking the brick status showed them as down. The network was then fixed and glusterd was restarted.
Bricks corresponding to 3 of the volumes came up, but one brick coredumped.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHGS-3.4.4 nightly ( glusterfs-3.12.2-45.el7rhgs )

How reproducible:
-----------------
1/1

Steps to Reproduce:
--------------------
1. With 2 network interfaces, bring down the network corresponding to gluster (the one used for peer probe and volume creation)
        Hint: remove BOOTPROTO for a DHCP network, or set ONBOOT=no in the interface configuration file
2. Restart the node
3. Post reboot, verify that the gluster brick processes are down while glusterd is up
4. Fix the gluster network so that it comes up
5. Restart glusterd

Actual results:
---------------
Brick process coredumped

Expected results:
-----------------
All brick processes should be up

Comment 1 SATHEESARAN 2019-03-12 02:51:52 UTC
Core was generated by `/usr/sbin/glusterfsd -s rhsqa-grafton12.lab.eng.blr.redhat.com --volfile-id vms'.
Program terminated with signal 11, Segmentation fault.
#0  __GI___pthread_mutex_lock (mutex=mutex@entry=0x40) at ../nptl/pthread_mutex_lock.c:65
65        unsigned int type = PTHREAD_MUTEX_TYPE_ELISION (mutex);
Missing separate debuginfos, use: debuginfo-install openssl-libs-1.0.2k-16.el7_6.1.x86_64
(gdb) bt
#0  __GI___pthread_mutex_lock (mutex=mutex@entry=0x40) at ../nptl/pthread_mutex_lock.c:65
#1  0x00007fa1bf2ef85d in server_rpc_notify (rpc=<optimized out>, xl=<optimized out>, event=<optimized out>, data=0x7fa1ac000c40) at server.c:538
#2  0x00007fa1d4a94685 in rpcsvc_program_notify (listener=0x7fa1b8040bb0, event=event@entry=RPCSVC_EVENT_ACCEPT, data=data@entry=0x7fa1ac000c40) at rpcsvc.c:405
#3  0x00007fa1d4a98985 in rpcsvc_accept (new_trans=0x7fa1ac000c40, listen_trans=0x7fa1b803fd90, svc=<optimized out>) at rpcsvc.c:428
#4  rpcsvc_notify (trans=0x7fa1b803fd90, mydata=<optimized out>, event=<optimized out>, data=0x7fa1ac000c40) at rpcsvc.c:999
#5  0x00007fa1d4a9aae3 in rpc_transport_notify (this=this@entry=0x7fa1b803fd90, event=event@entry=RPC_TRANSPORT_ACCEPT, data=data@entry=0x7fa1ac000c40) at rpc-transport.c:557
#6  0x00007fa1c98c1e77 in socket_server_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7fa1b803fd90, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0,
    event_thread_died=0 '\000') at socket.c:2946
#7  0x00007fa1d4d57870 in event_dispatch_epoll_handler (event=0x7fa1beae3e70, event_pool=0x555ec37eec00) at event-epoll.c:643
#8  event_dispatch_epoll_worker (data=0x7fa1b803d790) at event-epoll.c:759
#9  0x00007fa1d3b34dd5 in start_thread (arg=0x7fa1beae4700) at pthread_create.c:307
#10 0x00007fa1d33fbead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
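
The faulting address in frame #0 (mutex=0x40) points at a structure member being accessed through a NULL base pointer: &base->member with base == NULL evaluates to the member's offset, here 0x40. That suggests, though the core alone cannot prove it, that the server xlator's private configuration was still unset when the connection was accepted. A minimal standalone sketch of that failure pattern (illustrative C, not glusterfs source; the struct name and the 0x40 layout are assumptions chosen only to mirror frame #0):

/* Illustration only -- not glusterfs source.  The struct name and the
 * 0x40 bytes of padding are assumptions chosen to mirror frame #0 above. */
#include <pthread.h>
#include <stdio.h>

struct server_conf {
    char            pad[0x40];   /* stand-in for the members before the lock */
    pthread_mutex_t mutex;       /* lock taken when a connection is accepted */
};

/* Loosely models the RPCSVC_EVENT_ACCEPT path seen in frames #1-#2: if the
 * xlator's private pointer is still NULL when a client connects,
 * &conf->mutex evaluates to 0x40, the exact address reported in frame #0. */
static void on_accept(struct server_conf *conf)
{
    printf("about to lock %p\n", (void *)&conf->mutex); /* prints 0x40 when conf == NULL */
    pthread_mutex_lock(&conf->mutex);                   /* SIGSEGV, matching the core    */
    pthread_mutex_unlock(&conf->mutex);
}

int main(void)
{
    on_accept(NULL);    /* reproduces the crash pattern from the backtrace */
    return 0;
}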

Comment 2 SATHEESARAN 2019-03-12 03:03:10 UTC
I will attach the required logs in subsequent comments.
Note that the time of the crash is: 
2019-03-11 19:24:46

Comment 3 SATHEESARAN 2019-03-12 03:07:43 UTC
Created attachment 1543093 [details]
glusterd.log

Comment 4 SATHEESARAN 2019-03-12 03:08:31 UTC
Created attachment 1543094 [details]
brick.log

Comment 5 SATHEESARAN 2019-03-12 03:10:53 UTC
(In reply to SATHEESARAN from comment #2)
> I will attach the required logs in subsequent comments.
> Note that the time of the crash is: 
> 2019-03-11 19:24:46

[2019-03-11 19:24:46.737551] I [rpcsvc.c:2582:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2019-03-11 19:24:46.737635] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-vmstore-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2019-03-11 19:24:46
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x9d)[0x7fa1d4cf8b9d]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fa1d4d03114]
/lib64/libc.so.6(+0x36280)[0x7fa1d3334280]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7fa1d3b36c30]
/usr/lib64/glusterfs/3.12.2/xlator/protocol/server.so(+0x985d)[0x7fa1bf2ef85d]
/lib64/libgfrpc.so.0(+0x7685)[0x7fa1d4a94685]
/lib64/libgfrpc.so.0(rpcsvc_notify+0x65)[0x7fa1d4a98985]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa1d4a9aae3]
/usr/lib64/glusterfs/3.12.2/rpc-transport/socket.so(+0xce77)[0x7fa1c98c1e77]
/lib64/libglusterfs.so.0(+0x8a870)[0x7fa1d4d57870]
/lib64/libpthread.so.0(+0x7dd5)[0x7fa1d3b34dd5]
/lib64/libc.so.6(clone+0x6d)[0x7fa1d33fbead]
---------

Comment 6 SATHEESARAN 2019-03-12 06:04:25 UTC
Created attachment 1543110 [details]
sosreport

Comment 7 SATHEESARAN 2019-03-12 06:06:24 UTC
Created attachment 1543111 [details]
full_backtrace

Comment 9 Sunil Kumar Acharya 2019-03-13 08:36:41 UTC
Upstream patch: https://review.gluster.org/#/c/glusterfs/+/22339/
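
The patch itself is not reproduced in this report. Purely as an illustration of the class of fix that prevents this kind of crash, and not the actual change in the linked review, a notify path can refuse an early connection while the server's private state is still unset:

/* Hypothetical guard, for illustration only -- NOT the change in the linked
 * review, just the general shape of a fix for this class of crash (NULL
 * private data dereferenced from the accept/notify path). */
#include <pthread.h>
#include <stdio.h>

typedef struct {
    pthread_mutex_t mutex;              /* protects the transport list */
} server_conf_t;

static int notify_accept(server_conf_t *conf)
{
    if (conf == NULL) {
        /* Brick not fully initialised (or being torn down): refuse the
         * connection instead of dereferencing a NULL private pointer. */
        fprintf(stderr, "accept rejected: server not ready\n");
        return -1;
    }
    pthread_mutex_lock(&conf->mutex);
    /* ... track the accepted transport under the lock ... */
    pthread_mutex_unlock(&conf->mutex);
    return 0;
}

int main(void)
{
    server_conf_t conf = { .mutex = PTHREAD_MUTEX_INITIALIZER };
    notify_accept(NULL);     /* connection arriving before init: rejected, no crash */
    notify_accept(&conf);    /* normal path once init has completed */
    return 0;
}

The idea is only that an ACCEPT event arriving before initialisation has finished gets refused, and the client can reconnect later, rather than the brick process going down.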

Comment 16 SATHEESARAN 2019-07-03 13:12:52 UTC
Tested with RHVH 4.3.5 based on RHEL 7.7 with glusterfs-6.0-7

Tried the reboot 3 times without BOOTPROTO=dhcp in the network config file,
so that each time one of the nodes in the cluster came up with no network but
with glusterd still running and no local bricks up.

When the network config files were updated and the network service restarted,
all processes came up normally with no issues.

Comment 19 errata-xmlrpc 2019-10-30 12:20:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249

