Bug 1724891

Summary: [RHHI-V] glusterd crashes after upgrade and unable to start it again
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: SATHEESARAN <sasundar>
Component: rhhi    Assignee: Sahina Bose <sabose>
Status: CLOSED ERRATA QA Contact: SATHEESARAN <sasundar>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: rhhiv-1.7CC: amukherj, godas, rhs-bugs, seamurph, storage-qa-internal, vbellur
Target Milestone: ---    Keywords: Regression
Target Release: RHHI-V 1.7   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:    Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1724885 Environment:
Last Closed: 2020-02-13 15:57:23 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1724885    
Bug Blocks:    

Description SATHEESARAN 2019-06-28 03:21:00 UTC
Description of problem:
-----------------------
RHHI-V 1.6 async uses glusterfs-3.12.2-47.el7rhgs + RHEL 7.6 + RHVH 4.3.3 async2
When upgrading to RHVH 4.3.5 (the RHEL 7.7 based RHVH), glusterd crashed on reboot of the host and fails to start from then on.

A brief note on the upgrade procedure, for clarity:
1. An RHVH node is essentially a trimmed-down version of RHEL.
2. Upgrades on RHVH happen via image update, and the host reboots automatically after the upgrade.
3. The latest image does not contain glusterfs-6.0-6, so the image is updated and rebooted first, then the glusterfs packages are updated from glusterfs-3.12.2-47.2 to glusterfs-6.0-6. Note the full package progression: glusterfs-3.12.2-47, then glusterfs-3.12.2-47.2, then glusterfs-6.0-6. No op-version changes have happened so far.
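The upgrade path described above can be sketched roughly as follows. This is illustrative only: the image update is actually driven from the RHV Manager UI rather than run by hand, and the exact package/update names are assumptions.

```shell
# Step 1: image update (normally triggered from the RHV Manager UI),
# followed by an automatic reboot. Package name is illustrative.
yum update redhat-virtualization-host-image-update   # RHVH 4.3.3 -> 4.3.5
reboot

# Step 2: the new image still ships glusterfs-3.12.2-47.2, so the
# glusterfs packages are upgraded separately, then rebooted again.
yum update 'glusterfs*'                               # 3.12.2-47.2 -> 6.0-6
reboot
```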

Version-Release number of selected component (if applicable):
---------------------------------------------------------------
RHVH 4.3.5 based on RHEL 7.7
glusterfs-6.0-6

How reproducible:
-----------------
4/4

Steps to Reproduce:
-------------------
1. Upgrade all the RHVH 4.3.3 nodes to RHVH 4.3.5 (based on RHEL 7.7) from the RHV Manager UI.
The initial version of gluster here is glusterfs-3.12.2-47.el7rhgs.
Observation: upgrade successful on all the nodes, reboot successful.

2. Upgrade the glusterfs packages from glusterfs-3.12.2-47.2 to glusterfs-6.0-6 on one of the nodes and reboot.

Actual results:
----------------
glusterd crashed on the node and does not start up again

Expected results:
-----------------
glusterd should not crash

--- Additional comment from SATHEESARAN on 2019-06-28 02:56:36 UTC ---

Here is a snippet from glusterd.log:

<snip>
[2019-06-28 02:55:05.340989] I [MSGID: 106487] [glusterd-handler.c:1498:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2019-06-28 02:55:06.899818] E [MSGID: 101005] [dict.c:2852:dict_serialized_length_lk] 0-dict: value->len (-1162167622) < 0 [Invalid argument]
[2019-06-28 02:55:06.899848] E [MSGID: 106130] [glusterd-handler.c:2633:glusterd_op_commit_send_resp] 0-management: failed to get serialized length of dict
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2019-06-28 02:55:06
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 6.0
/lib64/libglusterfs.so.0(+0x27240)[0x7f420fbd4240]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f420fbdec64]
/lib64/libc.so.6(+0x363f0)[0x7f420e2103f0]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7f420ea14d00]
/lib64/libglusterfs.so.0(__gf_free+0x12c)[0x7f420fc004cc]
/lib64/libglusterfs.so.0(+0x1b889)[0x7f420fbc8889]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x478f8)[0x7f4203d0f8f8]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x44514)[0x7f4203d0c514]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x1d19e)[0x7f4203ce519e]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x24dce)[0x7f4203cecdce]
/lib64/libglusterfs.so.0(+0x66610)[0x7f420fc13610]
/lib64/libc.so.6(+0x48180)[0x7f420e222180]
</snip>

Comment 5 SATHEESARAN 2019-06-28 03:25:03 UTC
All the relevant logs are available as part of the dependent bug, BZ 1724885.

Comment 6 SATHEESARAN 2019-07-17 07:25:40 UTC
Tested with RHVH 4.3.5 based on RHEL 7.7
1. Upgrade was triggered from RHGS 3.4.4 async (glusterfs-3.12.2-47.2) to RHGS 3.5.0 interim (glusterfs-6.0-7).
No crashes observed.

Comment 8 Yaniv Kaul 2019-11-25 10:17:08 UTC
Why was it moved to NEW again?

Comment 9 SATHEESARAN 2019-11-27 10:35:32 UTC
(In reply to Yaniv Kaul from comment #8)
> Why was it moved to NEW again?

I just wanted to remove the in-flight tracker and accidentally changed the state.

Comment 14 errata-xmlrpc 2020-02-13 15:57:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0508