Bug 1724885 - [RHHI-V] glusterd crashes after upgrade and unable to start it again
Summary: [RHHI-V] glusterd crashes after upgrade and unable to start it again
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.5.0
Assignee: Sanju
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks: 1696809 1724891
 
Reported: 2019-06-28 02:55 UTC by SATHEESARAN
Modified: 2019-10-30 12:22 UTC
CC: 6 users

Fixed In Version: glusterfs-6.0-7
Doc Type: No Doc Update
Doc Text:
Clone Of:
Cloned as: 1724891
Environment:
Last Closed: 2019-10-30 12:22:00 UTC
Embargoed:


Attachments
glusterd.log (957.11 KB, application/octet-stream), attached 2019-06-28 03:03 UTC by SATHEESARAN


Links
Red Hat Product Errata RHEA-2019:3249, last updated 2019-10-30 12:22:22 UTC

Description SATHEESARAN 2019-06-28 02:55:42 UTC
Description of problem:
-----------------------
RHHI-V 1.6 async uses glusterfs-3.12.2-47.el7rhgs + RHEL 7.6 + RHVH 4.3.3 async2
When upgrading to RHVH 4.3.5 (the RHEL 7.7 based RHVH), glusterd crashed on reboot of the host and fails to start from then on.

A brief note on the upgrade procedure, for clarity:
1. An RHVH node is essentially a trimmed-down image of RHEL.
2. RHVH upgrades are delivered as image updates, and the host reboots automatically after the upgrade.
3. The latest image does not yet contain glusterfs-6.0-6, so the image is updated and rebooted first, and the glusterfs packages are then updated from glusterfs-3.12.2-47.2 to glusterfs-6.0-6. Note that glusterfs started out at glusterfs-3.12.2-47, was upgraded to glusterfs-3.12.2-47.2, and then to glusterfs-6.0-6. No op-version changes happened at any point.

Version-Release number of selected component (if applicable):
---------------------------------------------------------------
RHVH 4.3.5 based on RHEL 7.7
glusterfs-6.0-6

How reproducible:
-----------------
4/4

Steps to Reproduce:
-------------------
1. Upgrade all the RHVH 4.3.3 nodes to RHVH 4.3.5 (based on RHEL 7.7) from the RHV Manager UI.
The initial gluster version here is glusterfs-3.12.2-47.el7rhgs.
Observation: the upgrade and reboot succeeded on all nodes.

2. Upgrade the glusterfs packages from glusterfs-3.12.2-47.2 to glusterfs-6.0-6 on one of the nodes and reboot.

Actual results:
----------------
glusterd crashed on the node and does not start up again

Expected results:
-----------------
glusterd should not crash

Comment 1 SATHEESARAN 2019-06-28 02:56:36 UTC
Here is a snippet from glusterd.log:

<snip>
[2019-06-28 02:55:05.340989] I [MSGID: 106487] [glusterd-handler.c:1498:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2019-06-28 02:55:06.899818] E [MSGID: 101005] [dict.c:2852:dict_serialized_length_lk] 0-dict: value->len (-1162167622) < 0 [Invalid argument]
[2019-06-28 02:55:06.899848] E [MSGID: 106130] [glusterd-handler.c:2633:glusterd_op_commit_send_resp] 0-management: failed to get serialized length of dict
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2019-06-28 02:55:06
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 6.0
/lib64/libglusterfs.so.0(+0x27240)[0x7f420fbd4240]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f420fbdec64]
/lib64/libc.so.6(+0x363f0)[0x7f420e2103f0]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7f420ea14d00]
/lib64/libglusterfs.so.0(__gf_free+0x12c)[0x7f420fc004cc]
/lib64/libglusterfs.so.0(+0x1b889)[0x7f420fbc8889]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x478f8)[0x7f4203d0f8f8]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x44514)[0x7f4203d0c514]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x1d19e)[0x7f4203ce519e]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x24dce)[0x7f4203cecdce]
/lib64/libglusterfs.so.0(+0x66610)[0x7f420fc13610]
/lib64/libc.so.6(+0x48180)[0x7f420e222180]
</snip>
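
To make the log above concrete: dict serialization first walks every key/value pair of the dict and sums the bytes needed for the wire format, rejecting any entry whose stored length is negative. The C snippet below is a minimal sketch of that per-entry validation, not the actual glusterfs source; the struct and function names are simplified stand-ins. A data value whose len field has been corrupted (here, to -1162167622) fails the check, serialization aborts, and glusterd cannot send the commit response.

/* Minimal sketch of the per-entry length validation performed during
 * dict serialization. Types and names are simplified stand-ins for
 * the glusterfs dict internals, not the real code. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

typedef struct {
    int32_t len;   /* length of the serialized value */
    char   *data;
} data_t;

/* Returns the on-wire size of one key/value pair, or -1 if the stored
 * value length is corrupt (negative), which is the condition logged
 * as MSGID 101005 above. */
int pair_serialized_length(const char *key, const data_t *value)
{
    if (value->len < 0) {
        fprintf(stderr, "E [dict] value->len (%d) < 0 [Invalid argument]\n",
                value->len);
        return -1;  /* serialization aborted */
    }
    /* 4-byte key length + key bytes + NUL + 4-byte value length + value bytes */
    return 4 + (int)strlen(key) + 1 + 4 + value->len;
}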

Comment 3 SATHEESARAN 2019-06-28 03:03:45 UTC
Created attachment 1585392 [details]
glusterd.log

Comment 6 SATHEESARAN 2019-06-28 03:18:17 UTC
Backtrace:

[root@rhsqa-grafton10 ccpp-2019-06-28-07:19:50-20703]# gdb /usr/sbin/glusterd coredump 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.

warning: core file may not match specified executable file.
[New LWP 20703]
[New LWP 20743]
[New LWP 20700]
[New LWP 20704]
[New LWP 20744]
[New LWP 20699]
[New LWP 20705]
[New LWP 20701]
[New LWP 20702]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0  __GI___pthread_mutex_lock (mutex=0x559b4f2ebed8) at ../nptl/pthread_mutex_lock.c:65
65	  unsigned int type = PTHREAD_MUTEX_TYPE_ELISION (mutex);
(gdb) bt
#0  __GI___pthread_mutex_lock (mutex=0x559b4f2ebed8) at ../nptl/pthread_mutex_lock.c:65
#1  0x00007f9fb786c4cc in __gf_free (free_ptr=0x7f9f98035300) at mem-pool.c:328
#2  0x00007f9fb7834889 in dict_destroy (this=<optimized out>) at dict.c:687
#3  0x00007f9fb78349a5 in dict_unref (this=<optimized out>) at dict.c:739
#4  0x00007f9fab97b8f8 in glusterd_op_ac_commit_op (event=<optimized out>, ctx=<optimized out>) at glusterd-op-sm.c:5907
#5  0x00007f9fab978514 in glusterd_op_sm () at glusterd-op-sm.c:8210
#6  0x00007f9fab95119e in __glusterd_handle_commit_op (req=req@entry=0x7f9f9c00d748) at glusterd-handler.c:1176
#7  0x00007f9fab958dce in glusterd_big_locked_handler (req=0x7f9f9c00d748, actor_fn=0x7f9fab951010 <__glusterd_handle_commit_op>) at glusterd-handler.c:83
#8  0x00007f9fb787f610 in synctask_wrap () at syncop.c:367
#9  0x00007f9fb5e8e180 in ?? () from /lib64/libc.so.6
#10 0x0000000000000000 in ?? ()
(gdb) t a a bt

Thread 9 (Thread 0x7f9fade19700 (LWP 20702)):
#0  0x00007f9fb5f0b84d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f9fb5f0b6e4 in __sleep (seconds=0, seconds@entry=30) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x00007f9fb786c6ad in pool_sweeper (arg=<optimized out>) at mem-pool.c:454
#3  0x00007f9fb667eea5 in start_thread (arg=0x7f9fade19700) at pthread_create.c:307
#4  0x00007f9fb5f448cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 8 (Thread 0x7f9fae61a700 (LWP 20701)):
#0  0x00007f9fb66863c1 in do_sigwait (sig=0x7f9fae619e1c, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:60
#1  __sigwait (set=set@entry=0x7f9fae619e20, sig=sig@entry=0x7f9fae619e1c) at ../sysdeps/unix/sysv/linux/sigwait.c:95
#2  0x0000556bcccd243b in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2370
#3  0x00007f9fb667eea5 in start_thread (arg=0x7f9fae61a700) at pthread_create.c:307
#4  0x00007f9fb5f448cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 7 (Thread 0x7f9fac616700 (LWP 20705)):
#0  0x00007f9fb5f3b993 in select () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f9fb78c0994 in runner (arg=0x556bce259710) at ../../contrib/timer-wheel/timer-wheel.c:186
#2  0x00007f9fb667eea5 in start_thread (arg=0x7f9fac616700) at pthread_create.c:307
#3  0x00007f9fb5f448cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 6 (Thread 0x7f9fb7d3e780 (LWP 20699)):
#0  0x00007f9fb6680017 in pthread_join (threadid=140323627988736, thread_return=thread_return@entry=0x0) at pthread_join.c:90
#1  0x00007f9fb78a5608 in event_dispatch_epoll (event_pool=0x556bce24d7e0) at event-epoll.c:846
#2  0x0000556bcccce9b5 in main (argc=5, argv=<optimized out>) at glusterfsd.c:2866

Thread 5 (Thread 0x7f9fa3fff700 (LWP 20744)):
#0  0x00007f9fb5f44ea3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f9fb78a61f0 in event_dispatch_epoll_worker (data=0x556bce2c8b80) at event-epoll.c:751
#2  0x00007f9fb667eea5 in start_thread (arg=0x7f9fa3fff700) at pthread_create.c:307
#3  0x00007f9fb5f448cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 4 (Thread 0x7f9face17700 (LWP 20704)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f9fb78819a0 in syncenv_task (proc=proc@entry=0x556bce255970) at syncop.c:612
#2  0x00007f9fb7882850 in syncenv_processor (thdata=0x556bce255970) at syncop.c:679
#3  0x00007f9fb667eea5 in start_thread (arg=0x7f9face17700) at pthread_create.c:307
#4  0x00007f9fb5f448cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 3 (Thread 0x7f9faee1b700 (LWP 20700)):
#0  0x00007f9fb6685e9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f9fb784f426 in gf_timer_proc (data=0x556bce254e20) at timer.c:194
#2  0x00007f9fb667eea5 in start_thread (arg=0x7f9faee1b700) at pthread_create.c:307
#3  0x00007f9fb5f448cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 2 (Thread 0x7f9fa8c34700 (LWP 20743)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f9faba226ab in hooks_worker (args=<optimized out>) at glusterd-hooks.c:527
#2  0x00007f9fb667eea5 in start_thread (arg=0x7f9fa8c34700) at pthread_create.c:307
#3  0x00007f9fb5f448cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x7f9fad618700 (LWP 20703)):
---Type <return> to continue, or q <return> to quit---
#0  __GI___pthread_mutex_lock (mutex=0x559b4f2ebed8) at ../nptl/pthread_mutex_lock.c:65
#1  0x00007f9fb786c4cc in __gf_free (free_ptr=0x7f9f98035300) at mem-pool.c:328
#2  0x00007f9fb7834889 in dict_destroy (this=<optimized out>) at dict.c:687
#3  0x00007f9fb78349a5 in dict_unref (this=<optimized out>) at dict.c:739
#4  0x00007f9fab97b8f8 in glusterd_op_ac_commit_op (event=<optimized out>, ctx=<optimized out>) at glusterd-op-sm.c:5907
#5  0x00007f9fab978514 in glusterd_op_sm () at glusterd-op-sm.c:8210
#6  0x00007f9fab95119e in __glusterd_handle_commit_op (req=req@entry=0x7f9f9c00d748) at glusterd-handler.c:1176
#7  0x00007f9fab958dce in glusterd_big_locked_handler (req=0x7f9f9c00d748, actor_fn=0x7f9fab951010 <__glusterd_handle_commit_op>) at glusterd-handler.c:83
#8  0x00007f9fb787f610 in synctask_wrap () at syncop.c:367
#9  0x00007f9fb5e8e180 in ?? () from /lib64/libc.so.6
#10 0x0000000000000000 in ?? ()
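
The first two frames show the mechanics of the segfault: __gf_free (mem-pool.c:328) runs with memory accounting, stepping back from the pointer being freed to a per-allocation header and taking a lock reached through that header. If the dict has already been destroyed once, or its allocation header has been trampled, pthread_mutex_lock receives a wild pointer and faults. The sketch below illustrates that failure mode under stated assumptions; the header layout and field names are hypothetical, not the real glusterfs gf_mem_header.

/* A minimal sketch, assuming glusterfs-style memory accounting.
 * The struct layout and names here are hypothetical illustrations. */
#include <pthread.h>
#include <stdlib.h>

struct mem_header {               /* hypothetical layout */
    size_t           size;        /* user-visible allocation size */
    pthread_mutex_t *acct_lock;   /* accounting lock reached via the header */
};

void sketch_gf_free(void *free_ptr)
{
    if (free_ptr == NULL)
        return;

    /* Step back to the header written at allocation time. */
    struct mem_header *hdr = (struct mem_header *)free_ptr - 1;

    /* If hdr was freed earlier or overwritten, acct_lock is a wild
     * pointer, and this lock call is where SIGSEGV fires, matching
     * frames #0 and #1 in the backtrace above. */
    pthread_mutex_lock(hdr->acct_lock);
    /* ... decrement per-type accounting counters ... */
    pthread_mutex_unlock(hdr->acct_lock);

    free(hdr);
}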

Comment 11 SATHEESARAN 2019-06-28 10:59:25 UTC
Hi Atin,

Repeated the same test with glusterfs-6.0-7 and the issue is resolved.
You can move this bug to ON_QA.

Comment 14 SATHEESARAN 2019-07-03 11:59:06 UTC
Tested with RHVH 4.3.5 based on RHEL 7.7.
1. The upgrade was triggered from RHGS 3.4.4 async (glusterfs-3.12.2-47.2) to the RHGS 3.5.0 interim build (glusterfs-6.0-7).
No crashes observed.

Comment 16 errata-xmlrpc 2019-10-30 12:22:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249

