Bug 1722131

Summary:	[In-service] Post upgrade glusterd is crashing with a backtrace on the upgraded node while issuing gluster volume status from non-upgraded nodes
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Bala Konda Reddy M <bmekala>
Component:	glusterd	Assignee:	Sanju <srakonde>
Status:	CLOSED ERRATA	QA Contact:	Bala Konda Reddy M <bmekala>
Severity:	high	Docs Contact:
Priority:	high
Version:	rhgs-3.5	CC:	amukherj, pasik, rhs-bugs, srakonde, storage-qa-internal, vbellur, vdas
Target Milestone:	---
Target Release:	RHGS 3.5.0
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	glusterfs-6.0-7	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	1723658		Environment:
Last Closed:	2019-10-30 12:22:00 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1723658, 1728126, 1728127
Bug Blocks:	1696809, 1707246

Description Bala Konda Reddy M 2019-06-19 14:11:57 UTC

Description of problem:
During an in-service upgrade, glusterd on the upgraded node crashed with a backtrace when the 'gluster vol status' command was issued from non-upgraded nodes.
The upgrade was from 3.4.4 live (glusterfs-3.12.2-47.2.el7rhgs.x86_64) to the latest 3.5.0 build (glusterfs-6.0-6.el7rhgs.x86_64).


Version-Release number of selected component (if applicable):
glusterfs-6.0-6.el7rhgs.x86_64

How reproducible:
3/3

Steps to Reproduce:
1. On a three-node cluster (N1, N2, N3), created 2020 replicate (1x3) volumes and started them (brick-mux enabled).
2. Mounted 3 volumes and ran continuous IO from 3 different clients.
3. Upgraded node N1.
4. While heal was in progress on node N1, ran 'gluster volume status' on node N2, which was yet to be upgraded.
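
The setup in steps 1 and 4 can be sketched as a small command generator. This is a rough sketch, not the script the reporter used: the host names, brick root, and volume-name pattern are assumptions, and the code only builds and prints gluster CLI command strings rather than running them.

```python
def reproduction_commands(num_volumes, nodes=("N1", "N2", "N3"), brick_root="/bricks"):
    """Build gluster CLI commands for a brick-multiplexed 1x3 replicate setup.

    The host names, brick_root, and volume-name pattern are illustrative
    assumptions; the bug report does not state them.
    """
    # Brick multiplexing is a cluster-wide option, so it is set on "all".
    cmds = ["gluster volume set all cluster.brick-multiplex on"]
    for i in range(num_volumes):
        name = f"testvol_{i}"
        # One brick per node gives a single replica set of three (1x3).
        bricks = " ".join(f"{node}:{brick_root}/{name}" for node in nodes)
        cmds.append(f"gluster volume create {name} replica {len(nodes)} {bricks}")
        cmds.append(f"gluster volume start {name}")
    return cmds


def status_command():
    # Step 4: issued on a node that has not been upgraded yet (N2 in the report).
    return "gluster volume status"


if __name__ == "__main__":
    for cmd in reproduction_commands(2):
        print(cmd)
    print(status_command())
```

The demo call builds commands for only two volumes to keep the output short; the report used 2020.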
 

Actual results:
glusterd crashed with a backtrace.

[2019-06-19 11:13:56.506826] I [MSGID: 106499] [glusterd-handler.c:4497:__glusterd_handle_status_volume] 0-management: Received status volume req for volume testvol_-997
[2019-06-19 11:13:56.512662] I [MSGID: 106499] [glusterd-handler.c:4497:__glusterd_handle_status_volume] 0-management: Received status volume req for volume testvol_-998
[2019-06-19 11:13:56.518409] I [MSGID: 106499] [glusterd-handler.c:4497:__glusterd_handle_status_volume] 0-management: Received status volume req for volume testvol_-999
[2019-06-19 11:14:37.732442] E [MSGID: 101005] [dict.c:2852:dict_serialized_length_lk] 0-dict: value->len (-1162167622) < 0 [Invalid argument]
[2019-06-19 11:14:37.732483] E [MSGID: 106130] [glusterd-handler.c:2633:glusterd_op_commit_send_resp] 0-management: failed to get serialized length of dict
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2019-06-19 11:14:37
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 6.0
/lib64/libglusterfs.so.0(+0x27240)[0x7f7b5c38a240]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f7b5c394c64]
/lib64/libc.so.6(+0x363f0)[0x7f7b5a9c63f0]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7f7b5b1cad00]
/lib64/libglusterfs.so.0(__gf_free+0x12c)[0x7f7b5c3b64cc]
/lib64/libglusterfs.so.0(+0x1b889)[0x7f7b5c37e889]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x478f8)[0x7f7b504c58f8]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x44514)[0x7f7b504c2514]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x1d19e)[0x7f7b5049b19e]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x24dce)[0x7f7b504a2dce]
/lib64/libglusterfs.so.0(+0x66610)[0x7f7b5c3c9610]
/lib64/libc.so.6(+0x48180)[0x7f7b5a9d8180]
---------
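
A note on the negative length in the dict error above: reinterpreted as an unsigned 32-bit value, -1162167622 is 0xBABABABA, a single byte repeated four times. That pattern suggests, without proving it, that value->len was read from memory that did not hold a valid length (for example uninitialized, poisoned, or misinterpreted data) rather than from a real computed size. The sketch below only shows the arithmetic; the helper name is hypothetical and is not part of glusterfs.

```python
def as_uint32_hex(value):
    """Reinterpret a signed 32-bit integer as unsigned and format it as hex."""
    # Masking with 0xFFFFFFFF yields the two's-complement bit pattern.
    return f"0x{value & 0xFFFFFFFF:08X}"


if __name__ == "__main__":
    print(as_uint32_hex(-1162167622))  # prints 0xBABABABA
```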

Expected results:
glusterd should not crash

Additional info:
Will attach the sosreports of the upgraded node where the crash was seen.

Comment 18 errata-xmlrpc 2019-10-30 12:22:00 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249