Description of problem:
-----------------------
RHHI-V 1.6 async uses glusterfs-3.12.2-47.el7rhgs + RHEL 7.6 + RHVH 4.3.3 async2.

When upgrading to RHVH 4.3.5 (RHEL 7.7 based), glusterd crashed on reboot of the host and has refused to start ever since.

Brief overview of the upgrade procedure, for clarity:
1. An RHVH node is a stripped-down version of RHEL.
2. RHVH is upgraded via an image update, and the host reboots automatically after the upgrade.
3. The latest image does not contain glusterfs-6.0-6, so the image is updated and the host rebooted first; the glusterfs packages are then updated from glusterfs-3.12.2-47.2 to glusterfs-6.0-6.

Note that the glusterfs package was originally glusterfs-3.12.2-47, was then upgraded to glusterfs-3.12.2-47.2, and was then upgraded to glusterfs-6.0-6. No op-version changes have been made so far.

Version-Release number of selected component (if applicable):
---------------------------------------------------------------
RHVH 4.3.5 based on RHEL 7.7
glusterfs-6.0-6

How reproducible:
-----------------
4/4

Steps to Reproduce:
-------------------
1. Upgrade all the RHVH 4.3.3 nodes to RHVH 4.3.5 (based on RHEL 7.7) from the RHV Manager UI. The initial gluster version here is glusterfs-3.12.2-47.el7rhgs.
   Observation: the upgrade succeeded on all nodes, and the reboot succeeded.
2. Upgrade the glusterfs packages from glusterfs-3.12.2-47.2 to glusterfs-6.0-6 on one of the nodes and reboot it.

Actual results:
----------------
glusterd crashed on the node and never starts up again.

Expected results:
-----------------
glusterd should not crash.

--- Additional comment from SATHEESARAN on 2019-06-28 02:56:36 UTC ---

Here is the snippet from glusterd.log:

<snip>
[2019-06-28 02:55:05.340989] I [MSGID: 106487] [glusterd-handler.c:1498:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2019-06-28 02:55:06.899818] E [MSGID: 101005] [dict.c:2852:dict_serialized_length_lk] 0-dict: value->len (-1162167622) < 0 [Invalid argument]
[2019-06-28 02:55:06.899848] E [MSGID: 106130] [glusterd-handler.c:2633:glusterd_op_commit_send_resp] 0-management: failed to get serialized length of dict
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 2019-06-28 02:55:06
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 6.0
/lib64/libglusterfs.so.0(+0x27240)[0x7f420fbd4240]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f420fbdec64]
/lib64/libc.so.6(+0x363f0)[0x7f420e2103f0]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7f420ea14d00]
/lib64/libglusterfs.so.0(__gf_free+0x12c)[0x7f420fc004cc]
/lib64/libglusterfs.so.0(+0x1b889)[0x7f420fbc8889]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x478f8)[0x7f4203d0f8f8]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x44514)[0x7f4203d0c514]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x1d19e)[0x7f4203ce519e]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x24dce)[0x7f4203cecdce]
/lib64/libglusterfs.so.0(+0x66610)[0x7f420fc13610]
/lib64/libc.so.6(+0x48180)[0x7f420e222180]
</snip>
All the relevant logs are attached to the dependent bug, BZ 1724885.
Tested with RHVH 4.3.5 based on RHEL 7.7:
1. Upgrade was triggered from RHGS 3.4.4 async (glusterfs-3.12.2-47.2) to RHGS 3.5.0 interim (glusterfs-6.0-7).
No crashes were observed.
Why was it moved to NEW again?
(In reply to Yaniv Kaul from comment #8)
> Why was it moved to NEW again?

I just wanted to remove the inflight tracker and accidentally changed the state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0508