1724891 – [RHHI-V] glusterd crashes after upgrade and unable to start it again

Summary: [RHHI-V] glusterd crashes after upgrade and unable to start it again

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	rhhi
Sub Component:
Version:	rhhiv-1.7
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	RHHI-V 1.7
Assignee:	Sahina Bose
QA Contact:	SATHEESARAN
Docs Contact:
URL:
Whiteboard:
Depends On:	1724885
Blocks:

Reported:	2019-06-28 03:21 UTC by SATHEESARAN
Modified:	2020-02-13 15:57 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:	1724885
Environment:
Last Closed:	2020-02-13 15:57:23 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2020:0508	0	None	None	None	2020-02-13 15:57:37 UTC

Description SATHEESARAN 2019-06-28 03:21:00 UTC

Description of problem:
-----------------------
RHHI-V 1.6 async uses glusterfs-3.12.2-47.el7rhgs + RHEL 7.6 + RHVH 4.3.3 async2
When upgrading to RHVH 4.3.5 (the RHEL 7.7 based RHVH), glusterd crashed when the host was rebooted and has refused to start ever since.

A brief note on the upgrade procedure, for clarity:
1. An RHVH node is simply a trimmed-down version of RHEL.
2. An RHVH upgrade happens via an image update, and the host reboots automatically after the upgrade.
3. The latest image doesn't contain glusterfs-6.0-6, so the image is updated and the host rebooted first; the glusterfs packages are then updated from glusterfs-3.12.2-47.2 to glusterfs-6.0-6. Note that the original glusterfs package was glusterfs-3.12.2-47, which was upgraded to glusterfs-3.12.2-47.2 and then to glusterfs-6.0-6. No op-version changes have been made so far.
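The package upgrade path described above can be sanity-checked with a small sketch like the one below. The helper names (parse_nvr, is_upgrade) are hypothetical, and the comparison is a simplified numeric one, not rpm's actual version comparison algorithm; it only looks at the leading numeric part of the release, so a suffix such as ".el7rhgs" is ignored.

```python
import re
from typing import Tuple

Version = Tuple[int, ...]


def parse_nvr(nvr: str) -> Tuple[str, Version, Version]:
    """Split 'name-version-release' into (name, version, release).

    Simplification: only the leading dotted-numeric part of the release
    is kept, so 'glusterfs-3.12.2-47.el7rhgs' yields release (47,).
    """
    name, version, release = nvr.rsplit("-", 2)
    ver = tuple(int(part) for part in version.split("."))
    match = re.match(r"\d+(?:\.\d+)*", release)
    if match is None:
        raise ValueError(f"release has no numeric prefix: {release!r}")
    rel = tuple(int(part) for part in match.group(0).split("."))
    return name, ver, rel


def is_upgrade(old: str, new: str) -> bool:
    """True if 'new' is a strictly later build of the same package as 'old'."""
    old_name, old_ver, old_rel = parse_nvr(old)
    new_name, new_ver, new_rel = parse_nvr(new)
    return old_name == new_name and (old_ver, old_rel) < (new_ver, new_rel)


# The path reported in this bug: 3.12.2-47 -> 3.12.2-47.2 -> 6.0-6
upgrade_path = [
    "glusterfs-3.12.2-47.el7rhgs",
    "glusterfs-3.12.2-47.2",
    "glusterfs-6.0-6",
]
for old, new in zip(upgrade_path, upgrade_path[1:]):
    print(old, "->", new, "upgrade:", is_upgrade(old, new))
```

A check like this only confirms the ordering of the builds; it says nothing about whether the running op-version matches the installed packages.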

Version-Release number of selected component (if applicable):
---------------------------------------------------------------
RHVH 4.3.5 based on RHEL 7.7
glusterfs-6.0-6

How reproducible:
-----------------
4/4

Steps to Reproduce:
-------------------
1. From the RHV Manager UI, upgrade all the RHVH 4.3.3 nodes to RHV 4.3.5 (based on RHEL 7.7).
The initial version of gluster here is glusterfs-3.12.2-47.el7rhgs.
Observation: the upgrade and the reboot were successful on all the nodes.

2. Upgrade the glusterfs packages from glusterfs-3.12.2-47.2 to glusterfs-6.0-6 on one of the nodes, then reboot it.

Actual results:
----------------
glusterd crashed on the node and does not start again

Expected results:
-----------------
glusterd should not crash

--- Additional comment from SATHEESARAN on 2019-06-28 02:56:36 UTC ---

Here is the snippet from glusterd.log

<snip>
[2019-06-28 02:55:05.340989] I [MSGID: 106487] [glusterd-handler.c:1498:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2019-06-28 02:55:06.899818] E [MSGID: 101005] [dict.c:2852:dict_serialized_length_lk] 0-dict: value->len (-1162167622) < 0 [Invalid argument]
[2019-06-28 02:55:06.899848] E [MSGID: 106130] [glusterd-handler.c:2633:glusterd_op_commit_send_resp] 0-management: failed to get serialized length of dict
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2019-06-28 02:55:06
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 6.0
/lib64/libglusterfs.so.0(+0x27240)[0x7f420fbd4240]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f420fbdec64]
/lib64/libc.so.6(+0x363f0)[0x7f420e2103f0]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7f420ea14d00]
/lib64/libglusterfs.so.0(__gf_free+0x12c)[0x7f420fc004cc]
/lib64/libglusterfs.so.0(+0x1b889)[0x7f420fbc8889]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x478f8)[0x7f4203d0f8f8]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x44514)[0x7f4203d0c514]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x1d19e)[0x7f4203ce519e]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x24dce)[0x7f4203cecdce]
/lib64/libglusterfs.so.0(+0x66610)[0x7f420fc13610]
/lib64/libc.so.6(+0x48180)[0x7f420e222180]
</snip>
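For triage, the snippet above can be broken into structured fields. The following is a minimal sketch whose regular expressions are written only against the lines shown in this report; they are an assumption about the layout, not a complete grammar for the glusterd log format, and the function names are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed layout: [timestamp] LEVEL [MSGID: n] [file:line:func] domain: message
LOG_RE = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:.]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<msg>.*)$"
)
# Assumed layout: /path/to/module(symbol+0xoffset)[0xaddress]; symbol may be empty
FRAME_RE = re.compile(
    r"^(?P<module>/[^(]+)\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)
LEN_RE = re.compile(r"value->len \((-?\d+)\)")


def parse_log_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the fields of one log entry, or None if the line does not match."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None


def parse_frame(line: str) -> Optional[Dict[str, str]]:
    """Return module, symbol, offset and address of one backtrace frame."""
    match = FRAME_RE.match(line.strip())
    return match.groupdict() if match else None


def as_uint32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF


entry = parse_log_line(
    "[2019-06-28 02:55:06.899818] E [MSGID: 101005] "
    "[dict.c:2852:dict_serialized_length_lk] 0-dict: "
    "value->len (-1162167622) < 0 [Invalid argument]"
)
frame = parse_frame("/lib64/libglusterfs.so.0(__gf_free+0x12c)[0x7f420fc004cc]")
logged_len = int(LEN_RE.search(entry["msg"]).group(1))

print(entry["file"], entry["line"], entry["func"])
print(frame["symbol"], frame["offset"])
print(hex(as_uint32(logged_len)))  # 0xbabababa
```

Reinterpreted as an unsigned 32-bit value, the logged length is 0xBABABABA, a repeating byte pattern. That suggests the length field did not hold a real size when it was read, but this report does not state the root cause; the analysis is tracked in BZ 1724885. Frames that show only an offset and no symbol name would need matching debuginfo to be resolved.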

Comment 5 SATHEESARAN 2019-06-28 03:25:03 UTC

All the relevant logs are available in the bug this one depends on, BZ 1724885

Comment 6 SATHEESARAN 2019-07-17 07:25:40 UTC

Tested with RHVH 4.3.5 based on RHEL 7.7
1. Upgrade was triggered from RHGS 3.4.4 async ( glusterfs-3.12.2-47.2 ) to RHGS 3.5.0 interim ( glusterfs-6.0-7 )
No crashes observed
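A "no crashes observed" check of this kind can be automated by scanning glusterd.log for the crash dump markers seen in the original report. This is a minimal sketch with a hypothetical function name; it assumes the dump always contains a "signal received:" line followed by a "time of crash:" line whose value sits either on the same line or on the next one, as in the snippet from the description.

```python
from typing import Iterable, List, Optional, Tuple


def find_crashes(lines: Iterable[str]) -> List[Tuple[int, Optional[str]]]:
    """Return (signal number, crash time) for each crash dump in the log lines."""
    crashes: List[Tuple[int, Optional[str]]] = []
    it = iter(lines)
    for line in it:
        line = line.strip()
        if not line.startswith("signal received:"):
            continue
        signal_no = int(line.split(":", 1)[1])
        crash_time: Optional[str] = None
        # Look ahead for the crash time that follows the signal line.
        for follow in it:
            follow = follow.strip()
            if follow.startswith("time of crash:"):
                rest = follow.split(":", 1)[1].strip()
                # In the reported log the timestamp is on the next line.
                crash_time = rest or next(it, "").strip()
                break
        crashes.append((signal_no, crash_time))
    return crashes


sample = [
    "pending frames:",
    "frame : type(0) op(0)",
    "patchset: git://git.gluster.org/glusterfs.git",
    "signal received: 11",
    "time of crash: ",
    "2019-06-28 02:55:06",
    "configuration details:",
]
print(find_crashes(sample))
```

An empty result for the log written after the upgrade would correspond to the outcome reported in this comment. Where glusterd.log lives on a given host is left to the caller.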

Comment 8 Yaniv Kaul 2019-11-25 10:17:08 UTC

Why was it moved to NEW again?

Comment 9 SATHEESARAN 2019-11-27 10:35:32 UTC

(In reply to Yaniv Kaul from comment #8)
> Why was it moved to NEW again?

I just wanted to remove the inflight tracker and accidentally changed the state.

Comment 14 errata-xmlrpc 2020-02-13 15:57:23 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0508
