Bug 1687671 - Brick process has coredumped, when starting glusterd
Summary: Brick process has coredumped, when starting glusterd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rhhi
Version: rhhiv-1.6
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: RHHI-V 1.7
Assignee: Sahina Bose
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On: 1687641 1687705 1688218
Blocks:
 
Reported: 2019-03-12 06:04 UTC by SATHEESARAN
Modified: 2020-02-13 15:57 UTC
CC List: 5 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 1687641
Environment:
Last Closed: 2020-02-13 15:57:20 UTC
Embargoed:




Links:
Red Hat Product Errata RHBA-2020:0508 (Last Updated: 2020-02-13 15:57:34 UTC)

Description SATHEESARAN 2019-03-12 06:04:56 UTC
Description of problem:
------------------------
This issue is seen in an RHHI setup. RHHI nodes have at least 2 networks, one dedicated to VM traffic and the other to gluster traffic. Four Gluster volumes (replica 3 and arbitrated replicate) were created and online. When the RHHI-V node was restarted post upgrade, the interface corresponding to the gluster network did not pick up an IP address, because there was no BOOTPROTO parameter in its network configuration file.

Because of this, glusterd came up but the bricks did not. Checking the brick status confirmed they were down. The network was then fixed and glusterd was restarted.
Bricks corresponding to 3 volumes came up, but one of the bricks coredumped.
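
For illustration only, an ifcfg file of roughly this shape (the device name ens2f1 is hypothetical) reproduces the misconfiguration: with BOOTPROTO absent, neither DHCP nor a static address is configured when the interface comes up:

# /etc/sysconfig/network-scripts/ifcfg-ens2f1 (hypothetical gluster-network interface)
DEVICE=ens2f1
ONBOOT=yes
# BOOTPROTO is missing here; adding BOOTPROTO=dhcp (or BOOTPROTO=none with a
# static IPADDR/PREFIX) restores address assignment on boot.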

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHGS 3.4.4 nightly (glusterfs-3.12.2-45.el7rhgs)

How reproducible:
-----------------
1/1

Steps to Reproduce:
--------------------
1. With 2 network interfaces, bring down the network corresponding to gluster (the one used for peer probe and volume creation)
        Hint: Remove BOOTPROTO for a DHCP network, or set ONBOOT=no in the interface configuration file
2. Restart the node
3. Post reboot, observe that the gluster brick processes are down while glusterd is up
4. Fix the gluster network so that it comes up
5. Restart glusterd (a shell sketch of these steps follows)
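
A rough shell sketch of the steps above; the interface name eth1 is a placeholder, and the sed/echo edits are just one way to toggle the configuration:

# step 1: drop BOOTPROTO from the gluster-network interface config
sed -i '/^BOOTPROTO/d' /etc/sysconfig/network-scripts/ifcfg-eth1
# step 2: restart the node
reboot
# step 3: after reboot, glusterd is up but the bricks are down
systemctl status glusterd
gluster volume status
# step 4: restore the configuration and bring the interface up
echo 'BOOTPROTO=dhcp' >> /etc/sysconfig/network-scripts/ifcfg-eth1
ifup eth1
# step 5: restart glusterd and watch for brick crashes
systemctl restart glusterd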

Actual results:
---------------
Brick process coredumped

Expected results:
-----------------
All brick processes should be up

Comment 1 SATHEESARAN 2019-03-12 06:05:27 UTC
[2019-03-11 19:24:46.737551] I [rpcsvc.c:2582:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2019-03-11 19:24:46.737635] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-vmstore-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2019-03-11 19:24:46
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x9d)[0x7fa1d4cf8b9d]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fa1d4d03114]
/lib64/libc.so.6(+0x36280)[0x7fa1d3334280]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7fa1d3b36c30]
/usr/lib64/glusterfs/3.12.2/xlator/protocol/server.so(+0x985d)[0x7fa1bf2ef85d]
/lib64/libgfrpc.so.0(+0x7685)[0x7fa1d4a94685]
/lib64/libgfrpc.so.0(rpcsvc_notify+0x65)[0x7fa1d4a98985]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa1d4a9aae3]
/usr/lib64/glusterfs/3.12.2/rpc-transport/socket.so(+0xce77)[0x7fa1c98c1e77]
/lib64/libglusterfs.so.0(+0x8a870)[0x7fa1d4d57870]
/lib64/libpthread.so.0(+0x7dd5)[0x7fa1d3b34dd5]
/lib64/libc.so.6(clone+0x6d)[0x7fa1d33fbead]
---------

Comment 2 SATHEESARAN 2019-03-12 06:05:39 UTC
Core was generated by `/usr/sbin/glusterfsd -s rhsqa-grafton12.lab.eng.blr.redhat.com --volfile-id vms'.
Program terminated with signal 11, Segmentation fault.
#0  __GI___pthread_mutex_lock (mutex=mutex@entry=0x40) at ../nptl/pthread_mutex_lock.c:65
65        unsigned int type = PTHREAD_MUTEX_TYPE_ELISION (mutex);
Missing separate debuginfos, use: debuginfo-install openssl-libs-1.0.2k-16.el7_6.1.x86_64
(gdb) bt
#0  __GI___pthread_mutex_lock (mutex=mutex@entry=0x40) at ../nptl/pthread_mutex_lock.c:65
#1  0x00007fa1bf2ef85d in server_rpc_notify (rpc=<optimized out>, xl=<optimized out>, event=<optimized out>, data=0x7fa1ac000c40) at server.c:538
#2  0x00007fa1d4a94685 in rpcsvc_program_notify (listener=0x7fa1b8040bb0, event=event@entry=RPCSVC_EVENT_ACCEPT, data=data@entry=0x7fa1ac000c40) at rpcsvc.c:405
#3  0x00007fa1d4a98985 in rpcsvc_accept (new_trans=0x7fa1ac000c40, listen_trans=0x7fa1b803fd90, svc=<optimized out>) at rpcsvc.c:428
#4  rpcsvc_notify (trans=0x7fa1b803fd90, mydata=<optimized out>, event=<optimized out>, data=0x7fa1ac000c40) at rpcsvc.c:999
#5  0x00007fa1d4a9aae3 in rpc_transport_notify (this=this@entry=0x7fa1b803fd90, event=event@entry=RPC_TRANSPORT_ACCEPT, data=data@entry=0x7fa1ac000c40) at rpc-transport.c:557
#6  0x00007fa1c98c1e77 in socket_server_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7fa1b803fd90, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0,
    event_thread_died=0 '\000') at socket.c:2946
#7  0x00007fa1d4d57870 in event_dispatch_epoll_handler (event=0x7fa1beae3e70, event_pool=0x555ec37eec00) at event-epoll.c:643
#8  event_dispatch_epoll_worker (data=0x7fa1b803d790) at event-epoll.c:759
#9  0x00007fa1d3b34dd5 in start_thread (arg=0x7fa1beae4700) at pthread_create.c:307
#10 0x00007fa1d33fbead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
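
The faulting address mutex=0x40 suggests server_rpc_notify dereferenced a NULL struct pointer: locking a mutex member that sits at offset 0x40 inside a NULL struct passes 0x40 itself to pthread_mutex_lock. A minimal C sketch of that failure mode (the struct layout is hypothetical, not the gluster source):

/* sketch.c: reproduce a pthread_mutex_lock(mutex=0x40) style crash.
 * Build with: gcc sketch.c -o sketch -pthread */
#include <pthread.h>

struct conf {
    char pad[0x40];          /* hypothetical fields before the mutex */
    pthread_mutex_t mutex;   /* member lands at offset 0x40 */
};

int main(void)
{
    struct conf *conf = NULL;          /* e.g. private data not yet initialised */
    pthread_mutex_lock(&conf->mutex);  /* &NULL->mutex == (void *)0x40 -> SIGSEGV */
    return 0;
}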

Comment 3 SATHEESARAN 2019-03-20 03:09:02 UTC
This bug is not accepted for RHGS 3.4.4, so I am removing the acks

Comment 4 Yaniv Kaul 2019-04-22 06:53:15 UTC
Is there a full core dump available for inspection?

Comment 5 SATHEESARAN 2019-04-26 16:42:26 UTC
(In reply to Yaniv Kaul from comment #4)
> Is there a full core dump available for inspection?

All the logs are attached to the dependent gluster bug.

Looks like the gluster bug is already ON_QA

Comment 7 SATHEESARAN 2019-05-02 12:44:25 UTC
(In reply to SATHEESARAN from comment #5)
> (In reply to Yaniv Kaul from comment #4)
> > Is there a full core dump available for inspection?
> 
> All the logs are attached to the dependent gluster bug.
> 
> Looks like the gluster bug is already ON_QA

Though the fix is available in the RHGS 3.5 interim build, this RHHI bug lacks pm_ack, so I am moving it to POST.

Once all three acks are in place, it can be moved to ON_QA

Comment 11 SATHEESARAN 2020-01-09 06:52:36 UTC
Tested with RHV 4.3.8 and RHGS 3.5.1 (glusterfs-6.0-25.el7rhgs)

1. Shut down all the HC hosts
2. Start the nodes.

After all the HC nodes were up, waited 10 minutes and then checked that the glusterd and glusterfsd (brick) processes were up and running.
No issues were seen.
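
One possible way to script that check (standard gluster/systemd commands; output details vary per setup):

systemctl is-active glusterd    # expect: active
gluster volume status           # every brick should report Online "Y"
pgrep -a glusterfsd             # one glusterfsd process per local brick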

Comment 13 errata-xmlrpc 2020-02-13 15:57:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0508

