Description of problem:
------------------------
This issue is seen in the RHHI setup. RHHI nodes have at least 2 networks, one dedicated to VM traffic and the other to gluster traffic. 4 gluster volumes (replica 3 and arbitrated replicate) were created and up.

When the RHHI-V node was restarted post upgrade, the interface corresponding to the gluster network did not pick up an IP address. This happened because there was no BOOTPROTO parameter in the network configuration file. As a result, the bricks did not come up, although glusterd did. When checked, the brick status was down.

The network was then set up and glusterd was restarted. Bricks corresponding to 3 volumes came up, but one of the bricks coredumped.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHGS-3.4.4 nightly ( glusterfs-3.12.2-45.el7rhgs )

How reproducible:
-----------------
1/1

Steps to Reproduce:
--------------------
1. With 2 network interfaces, bring down the network used for gluster (the one used for peer probe and volume creation).
   Hint: Remove BOOTPROTO for a DHCP network, or use ONBOOT=no in the interface configuration file
2. Restart the node
3. Post reboot, confirm that the gluster brick processes are down, with glusterd up
4. Fix the gluster network so that it is up
5. Restart glusterd

Actual results:
---------------
Brick process coredumped

Expected results:
-----------------
All brick processes should be up
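To catch the misconfiguration from step 1 before rebooting a node, the interface file can be checked for the two conditions named in the hint. The following is a minimal sketch, not part of the original report: the function names and the sample device name are hypothetical, and it assumes the file uses the plain KEY=value layout of an ifcfg file. It only mirrors the two conditions mentioned above (BOOTPROTO absent, ONBOOT=no) and does not claim to cover every way an interface can fail to get an address.

```python
def parse_ifcfg(text):
    """Parse KEY=value lines of an ifcfg-style file into a dict.

    Blank lines, comment lines and lines without '=' are skipped.
    Surrounding single or double quotes around the value are dropped.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().upper()] = value.strip().strip("\"'")
    return settings


def boot_issues(settings):
    """Return the conditions from the report's hint that apply to this config."""
    issues = []
    if "BOOTPROTO" not in settings:
        issues.append("BOOTPROTO is not set")
    if settings.get("ONBOOT", "").lower() == "no":
        issues.append("ONBOOT=no")
    return issues


if __name__ == "__main__":
    # Hypothetical file content; "eth1" is a placeholder device name.
    sample = "DEVICE=eth1\nONBOOT=yes\n"
    for issue in boot_issues(parse_ifcfg(sample)):
        print("warning:", issue)
```

In practice the text would be read from the interface configuration file of the gluster network; the path is left out here because the report does not name it.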
[2019-03-11 19:24:46.737551] I [rpcsvc.c:2582:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2019-03-11 19:24:46.737635] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-vmstore-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2019-03-11 19:24:46
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x9d)[0x7fa1d4cf8b9d]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fa1d4d03114]
/lib64/libc.so.6(+0x36280)[0x7fa1d3334280]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7fa1d3b36c30]
/usr/lib64/glusterfs/3.12.2/xlator/protocol/server.so(+0x985d)[0x7fa1bf2ef85d]
/lib64/libgfrpc.so.0(+0x7685)[0x7fa1d4a94685]
/lib64/libgfrpc.so.0(rpcsvc_notify+0x65)[0x7fa1d4a98985]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa1d4a9aae3]
/usr/lib64/glusterfs/3.12.2/rpc-transport/socket.so(+0xce77)[0x7fa1c98c1e77]
/lib64/libglusterfs.so.0(+0x8a870)[0x7fa1d4d57870]
/lib64/libpthread.so.0(+0x7dd5)[0x7fa1d3b34dd5]
/lib64/libc.so.6(clone+0x6d)[0x7fa1d33fbead]
---------
Core was generated by `/usr/sbin/glusterfsd -s rhsqa-grafton12.lab.eng.blr.redhat.com --volfile-id vms'.
Program terminated with signal 11, Segmentation fault.
#0  __GI___pthread_mutex_lock (mutex=mutex@entry=0x40) at ../nptl/pthread_mutex_lock.c:65
65      unsigned int type = PTHREAD_MUTEX_TYPE_ELISION (mutex);
Missing separate debuginfos, use: debuginfo-install openssl-libs-1.0.2k-16.el7_6.1.x86_64
(gdb) bt
#0  __GI___pthread_mutex_lock (mutex=mutex@entry=0x40) at ../nptl/pthread_mutex_lock.c:65
#1  0x00007fa1bf2ef85d in server_rpc_notify (rpc=<optimized out>, xl=<optimized out>, event=<optimized out>, data=0x7fa1ac000c40) at server.c:538
#2  0x00007fa1d4a94685 in rpcsvc_program_notify (listener=0x7fa1b8040bb0, event=event@entry=RPCSVC_EVENT_ACCEPT, data=data@entry=0x7fa1ac000c40) at rpcsvc.c:405
#3  0x00007fa1d4a98985 in rpcsvc_accept (new_trans=0x7fa1ac000c40, listen_trans=0x7fa1b803fd90, svc=<optimized out>) at rpcsvc.c:428
#4  rpcsvc_notify (trans=0x7fa1b803fd90, mydata=<optimized out>, event=<optimized out>, data=0x7fa1ac000c40) at rpcsvc.c:999
#5  0x00007fa1d4a9aae3 in rpc_transport_notify (this=this@entry=0x7fa1b803fd90, event=event@entry=RPC_TRANSPORT_ACCEPT, data=data@entry=0x7fa1ac000c40) at rpc-transport.c:557
#6  0x00007fa1c98c1e77 in socket_server_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7fa1b803fd90, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2946
#7  0x00007fa1d4d57870 in event_dispatch_epoll_handler (event=0x7fa1beae3e70, event_pool=0x555ec37eec00) at event-epoll.c:643
#8  event_dispatch_epoll_worker (data=0x7fa1b803d790) at event-epoll.c:759
#9  0x00007fa1d3b34dd5 in start_thread (arg=0x7fa1beae4700) at pthread_create.c:307
#10 0x00007fa1d33fbead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
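A note on reading frame #0: pthread_mutex_lock was called with mutex=0x40. An address that small is what results from taking the address of a struct member through a NULL base pointer, which would be consistent with server_rpc_notify (frame #1, server.c:538) locking a mutex inside an object that had not been set up yet when the ACCEPT event arrived. This is only one plausible reading and is not confirmed anywhere in this report. The sketch below illustrates the arithmetic with a made-up layout; HypotheticalObject and its field sizes are assumptions for illustration and are not the real glusterfs structure.

```python
import ctypes


class HypotheticalObject(ctypes.Structure):
    """Made-up layout: 0x40 bytes of other members, then a lock.

    The sizes are placeholders chosen so that the lock lands at offset 0x40;
    they are NOT taken from the glusterfs sources.
    """
    _fields_ = [
        ("other_members", ctypes.c_char * 0x40),
        ("lock", ctypes.c_char * 40),  # stand-in for a mutex-sized member
    ]


def member_address(base, struct, field):
    """Address of `field` when an instance of `struct` lives at `base`."""
    return base + getattr(struct, field).offset


if __name__ == "__main__":
    # With a NULL base pointer, &obj->lock evaluates to just the member offset.
    print(hex(member_address(0, HypotheticalObject, "lock")))
```

Checking the real offset of the locked member at server.c:538 against 0x40 in gdb (with the glusterfs debuginfo installed) would be one way to test this reading.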
This bug is not accepted for RHGS 3.4.4, so the acks are being removed.
Is there a full core dump available for inspection?
(In reply to Yaniv Kaul from comment #4)
> Is there a full core dump available for inspection?

All the logs are attached to the dependent gluster bug.
Looks like the gluster bug is already ON_QA
(In reply to SATHEESARAN from comment #5)
> (In reply to Yaniv Kaul from comment #4)
> > Is there a full core dump available for inspection?
>
> All the logs are attached to the dependent gluster bug.
>
> Looks like the gluster bug is already ON_QA

Though the fix is available in RHGS 3.5 interim build, this RHHI bug is lacking pm_ack, moving the state of this bug to POST.
Once all three acks are in place, it can be moved to ON_QA
Tested with RHV 4.3.8 with RHGS 3.5.1 ( glusterfs-6.0-25.el7rhgs )

1. Shut down all the HC hosts
2. Start the nodes. After all the HC nodes are up, wait 10 mins, then check that the glusterd and glusterfsd (brick) processes are up and running.

No issues are seen
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0508