Description of problem:
------------------------
This issue is seen in an RHHI setup. Each RHHI node has at least 2 networks, one dedicated to VM traffic and the other to gluster traffic. 4 gluster volumes (replica 3 and arbitrated replicate) were created and started.

When the RHHI-V node was restarted post upgrade, the interface corresponding to the gluster network did not pick up an IP address. This happened because there was no BOOTPROTO parameter in the network configuration file. As a result, glusterd came up but the bricks did not; the brick status showed them as down.

The network was then set up and glusterd was restarted. The bricks corresponding to 3 volumes came up, but one of the bricks coredumped.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHGS-3.4.4 nightly (glusterfs-3.12.2-45.el7rhgs)

How reproducible:
-----------------
1/1

Steps to Reproduce:
--------------------
1. With 2 network interfaces, bring down the network corresponding to gluster (the one used for peer probe and volume creation)
   Hint: Remove BOOTPROTO for a DHCP network, or use ONBOOT=no in the interface configuration file
2. Restart the node
3. Post reboot, verify that the gluster brick processes are down, with glusterd up
4. Fix the gluster network so that it comes up
5. Restart glusterd

Actual results:
---------------
Brick process coredumped

Expected results:
-----------------
All brick processes should be up
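The hint in step 1 names two conditions in the interface configuration file that leave the gluster network down after a reboot: a missing BOOTPROTO parameter, or ONBOOT=no. A minimal sketch of a check for those two conditions follows. The function names and the sample file content are hypothetical and are not taken from the affected node; whether a missing BOOTPROTO actually prevents address assignment also depends on the rest of the file, so treat the result as a heuristic only.

```python
def parse_ifcfg(text):
    """Parse KEY=value lines of an ifcfg-style file into a dict.

    Blank lines, comment lines, and lines without '=' are skipped.
    Surrounding single or double quotes are removed from values.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def boot_problems(settings):
    """Return the conditions from the reproduction hint that are present."""
    problems = []
    if "BOOTPROTO" not in settings:
        problems.append("BOOTPROTO is missing")
    if settings.get("ONBOOT", "").lower() == "no":
        problems.append("ONBOOT=no")
    return problems


if __name__ == "__main__":
    # Hypothetical file content, for illustration only.
    sample = 'DEVICE=eth1\nONBOOT="yes"\n'
    for problem in boot_problems(parse_ifcfg(sample)):
        print(problem)
```

Running such a check before step 2 confirms that the node will indeed reboot without the gluster network, and running it again at step 4 confirms that the configuration has been corrected.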
Core was generated by `/usr/sbin/glusterfsd -s rhsqa-grafton12.lab.eng.blr.redhat.com --volfile-id vms'.
Program terminated with signal 11, Segmentation fault.
#0  __GI___pthread_mutex_lock (mutex=mutex@entry=0x40) at ../nptl/pthread_mutex_lock.c:65
65        unsigned int type = PTHREAD_MUTEX_TYPE_ELISION (mutex);
Missing separate debuginfos, use: debuginfo-install openssl-libs-1.0.2k-16.el7_6.1.x86_64
(gdb) bt
#0  __GI___pthread_mutex_lock (mutex=mutex@entry=0x40) at ../nptl/pthread_mutex_lock.c:65
#1  0x00007fa1bf2ef85d in server_rpc_notify (rpc=<optimized out>, xl=<optimized out>, event=<optimized out>, data=0x7fa1ac000c40) at server.c:538
#2  0x00007fa1d4a94685 in rpcsvc_program_notify (listener=0x7fa1b8040bb0, event=event@entry=RPCSVC_EVENT_ACCEPT, data=data@entry=0x7fa1ac000c40) at rpcsvc.c:405
#3  0x00007fa1d4a98985 in rpcsvc_accept (new_trans=0x7fa1ac000c40, listen_trans=0x7fa1b803fd90, svc=<optimized out>) at rpcsvc.c:428
#4  rpcsvc_notify (trans=0x7fa1b803fd90, mydata=<optimized out>, event=<optimized out>, data=0x7fa1ac000c40) at rpcsvc.c:999
#5  0x00007fa1d4a9aae3 in rpc_transport_notify (this=this@entry=0x7fa1b803fd90, event=event@entry=RPC_TRANSPORT_ACCEPT, data=data@entry=0x7fa1ac000c40) at rpc-transport.c:557
#6  0x00007fa1c98c1e77 in socket_server_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7fa1b803fd90, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2946
#7  0x00007fa1d4d57870 in event_dispatch_epoll_handler (event=0x7fa1beae3e70, event_pool=0x555ec37eec00) at event-epoll.c:643
#8  event_dispatch_epoll_worker (data=0x7fa1b803d790) at event-epoll.c:759
#9  0x00007fa1d3b34dd5 in start_thread (arg=0x7fa1beae4700) at pthread_create.c:307
#10 0x00007fa1d33fbead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
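Frame #0 shows pthread_mutex_lock being called with mutex=0x40. One plausible reading, not confirmed anywhere in this report, is that server_rpc_notify took the address of a mutex member through a NULL structure pointer, so the resulting address is simply the member's offset. The sketch below only illustrates that arithmetic. The structure layout is hypothetical and is not the layout of any real glusterfs structure, and it does not establish which pointer was NULL.

```python
import ctypes


class HypotheticalConf(ctypes.Structure):
    """Made-up layout: 0x40 bytes of other members, then a mutex-sized blob.

    This is NOT a real glusterfs structure; the sizes are chosen only so
    that the mutex member sits at offset 0x40.
    """
    _fields_ = [
        ("other_members", ctypes.c_ubyte * 0x40),
        ("mutex", ctypes.c_ubyte * 40),
    ]


def member_address(base, struct_type, member):
    """Address that &base->member evaluates to, for an integer base address."""
    return base + getattr(struct_type, member).offset


if __name__ == "__main__":
    # With a NULL (0) base pointer, the member address equals its offset.
    print(hex(member_address(0, HypotheticalConf, "mutex")))
```

A small non-zero address in a crashing frame is therefore usually worth checking against the member offsets of the structures that the calling function dereferences.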
I will be attaching the required logs in further comments.
Notice that the time of crash is:
2019-03-11 19:24:46
Created attachment 1543093
glusterd.log
Created attachment 1543094
brick.log
(In reply to SATHEESARAN from comment #2)
> I will be attaching the required logs in further comments.
> Notice that the time of crash is:
> 2019-03-11 19:24:46

[2019-03-11 19:24:46.737551] I [rpcsvc.c:2582:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2019-03-11 19:24:46.737635] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-vmstore-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2019-03-11 19:24:46
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x9d)[0x7fa1d4cf8b9d]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fa1d4d03114]
/lib64/libc.so.6(+0x36280)[0x7fa1d3334280]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7fa1d3b36c30]
/usr/lib64/glusterfs/3.12.2/xlator/protocol/server.so(+0x985d)[0x7fa1bf2ef85d]
/lib64/libgfrpc.so.0(+0x7685)[0x7fa1d4a94685]
/lib64/libgfrpc.so.0(rpcsvc_notify+0x65)[0x7fa1d4a98985]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa1d4a9aae3]
/usr/lib64/glusterfs/3.12.2/rpc-transport/socket.so(+0xce77)[0x7fa1c98c1e77]
/lib64/libglusterfs.so.0(+0x8a870)[0x7fa1d4d57870]
/lib64/libpthread.so.0(+0x7dd5)[0x7fa1d3b34dd5]
/lib64/libc.so.6(clone+0x6d)[0x7fa1d33fbead]
---------
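Several frames in the brick log trace carry only a module path and an offset, for example server.so(+0x985d), whose address 0x7fa1bf2ef85d is the same as frame #1 (server_rpc_notify) in the gdb backtrace. The following is a minimal sketch for splitting such trace lines into module, symbol, offset, and address so that they can be matched against gdb output or looked up against the module. The regular expression is an assumption derived only from the lines quoted in this report, and the function name is hypothetical.

```python
import re

# Matches lines of the form: /path/to/module(symbol+0xOFF)[0xADDR]
# The symbol part may be empty, as in /lib64/libc.so.6(+0x36280)[0x...].
FRAME_RE = re.compile(
    r"^(?P<module>/\S+?)\((?P<symbol>[^()+]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frames(log_text):
    """Return one dict per trace line found in log_text, in log order."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # not a trace line (timestamps, config details, ...)
        frames.append({
            "module": match.group("module"),
            "symbol": match.group("symbol") or None,
            "offset": int(match.group("offset"), 16),
            "address": int(match.group("address"), 16),
        })
    return frames


if __name__ == "__main__":
    sample = (
        "/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7fa1d3b36c30]\n"
        "/usr/lib64/glusterfs/3.12.2/xlator/protocol/server.so(+0x985d)[0x7fa1bf2ef85d]\n"
    )
    for frame in parse_frames(sample):
        print(frame["module"], frame["symbol"], hex(frame["offset"]))
```

Frames whose symbol is None are the ones that need the matching debuginfo, or a comparison with the gdb backtrace by address, to identify the function.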
Created attachment 1543110
sosreport
Created attachment 1543111
full_backtrace
Upstream patch: https://review.gluster.org/#/c/glusterfs/+/22339/
Tested with RHVH 4.3.5 based on RHEL 7.7 with glusterfs-6.0-7.

Rebooted the node 3 times without BOOTPROTO=dhcp in the network config file, so that each time one of the nodes in the cluster came up with no network, with glusterd still running but no local bricks up.

When the network config files were updated and the network service was restarted, all processes came up normally with no issues.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249