Description of problem:
========================
On my non-functional setup, one of the server nodes got rebooted (unable to find the cause), and after that all bricks were online except one. I checked the brick log and found a backtrace, which is why that brick is not online. Unfortunately, I didn't find any cores.

[2019-03-17 23:55:53.369644] I [MSGID: 101190] [event-epoll.c:676:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-03-17 23:56:06.340888] I [MSGID: 101190] [event-epoll.c:676:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2019-03-17 23:56:06.341009] I [MSGID: 101190] [event-epoll.c:676:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
[2019-03-17 23:56:06.341053] I [MSGID: 101190] [event-epoll.c:676:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
[2019-03-17 23:56:06.341168] I [MSGID: 101190] [event-epoll.c:676:event_dispatch_epoll_worker] 0-epoll: Started thread with index 6
[2019-03-17 23:56:06.341197] I [MSGID: 101190] [event-epoll.c:676:event_dispatch_epoll_worker] 0-epoll: Started thread with index 5
[2019-03-17 23:56:06.341268] I [MSGID: 101190] [event-epoll.c:676:event_dispatch_epoll_worker] 0-epoll: Started thread with index 7
[2019-03-17 23:56:06.341747] I [rpcsvc.c:2582:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2019-03-17 23:56:06.341865] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-rpcx3-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
[2019-03-17 23:56:06.343189] W [socket.c:3973:reconfigure] 0-rpcx3-quota: disabling non-blocking IO
time of crash:
2019-03-17 23:56:06
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
[2019-03-17 23:56:06.343330] I [socket.c:2489:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x9d)[0x7fccf0cf9b9d]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fccf0d04114]
/lib64/libc.so.6(+0x36280)[0x7fccef336280]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7fccefb37c30]
/usr/lib64/glusterfs/3.12.2/xlator/protocol/server.so(+0x985d)[0x7fccdb57885d]
/lib64/libgfrpc.so.0(+0x7685)[0x7fccf0a95685]
/lib64/libgfrpc.so.0(rpcsvc_notify+0x65)[0x7fccf0a99985]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccf0a9bae3]
/usr/lib64/glusterfs/3.12.2/rpc-transport/socket.so(+0xce77)[0x7fcce58c3e77]
/lib64/libglusterfs.so.0(+0x8a870)[0x7fccf0d58870]
/lib64/libpthread.so.0(+0x7dd5)[0x7fccefb35dd5]
/lib64/libc.so.6(clone+0x6d)[0x7fccef3fdead]
---------

Version-Release number of selected component (if applicable):
====================
3.12.2-43

How reproducible:
===============
Hit it once on my system setup for rpc tests.

Steps to Reproduce:
====================
More details @ https://docs.google.com/spreadsheets/d/17Yf9ZRWnWOpbRyFQ2ZYxAAlp9I_yarzKZdjN8idBJM0/edit#gid=1472913705
1. Was running system tests for about 3 weeks.
2. In the current state, a rebalance has still been going on for more than 2 weeks (refer bz#1686425).
3. Apart from the above, I set client and server event threads to 8 as part of https://bugzilla.redhat.com/show_bug.cgi?id=1409568#c31.
4. IOs going on from the clients are as below:
   a) 4 clients: just appending to a file whose name is the same as the host name (all different).
   b) Another client: only on this client, I remounted the volume after setting the event threads. From this client, running IOs as explained in https://bugzilla.redhat.com/show_bug.cgi?id=1409568#c31 and previous comments.
   c) From another 2 clients, running the below IOs:
      2109.lookup (Detached) ---> find * | xargs stat from root of mount
      1074.top (Detached)    ---> top and free output captured every minute to a file on the mount, in append mode
      1058.rm-rf (Detached)  ---> removal of old untarred linux directories
      801.kernel (Detached)  ---> linux untar into new directories, under the same parent dir as above
      du -sh                 ---> on root of volume from only one of the clients, not yet over even after a week

Additional info:
===============
[root@rhs-client19 glusterfs]# gluster v status
Status of volume: rpcx3
Gluster process                                                       TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhs-client19.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3       49152     0          Y       10824
Brick rhs-client25.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3       49152     0          Y       5232
Brick rhs-client32.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3       49152     0          Y       10898
Brick rhs-client25.lab.eng.blr.redhat.com:/gluster/brick2/rpcx3       49153     0          Y       5253
Brick rhs-client32.lab.eng.blr.redhat.com:/gluster/brick2/rpcx3       49153     0          Y       10904
Brick rhs-client38.lab.eng.blr.redhat.com:/gluster/brick2/rpcx3       N/A       N/A        N       N/A
Brick rhs-client32.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3       49154     0          Y       10998
Brick rhs-client38.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3       49153     0          Y       8999
Brick rhs-client19.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3       49153     0          Y       10826
Brick rhs-client38.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3-newb  49154     0          Y       8984
Brick rhs-client19.lab.eng.blr.redhat.com:/gluster/brick2/rpcx3-newb  49155     0          Y       29805
Brick rhs-client25.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3-newb  49155     0          Y       30021
Brick rhs-client19.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3-newb  49156     0          Y       29826
Brick rhs-client25.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3-newb  49156     0          Y       30042
Brick rhs-client32.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3-newb  49156     0          Y       1636
Snapshot Daemon on localhost                                          49154     0          Y       10872
Self-heal Daemon on localhost                                         N/A       N/A        Y       29849
Quota Daemon on localhost                                             N/A       N/A        Y       29860
Snapshot Daemon on rhs-client25.lab.eng.blr.redhat.com                49154     0          Y       9833
Self-heal Daemon on rhs-client25.lab.eng.blr.redhat.com               N/A       N/A        Y       30065
Quota Daemon on rhs-client25.lab.eng.blr.redhat.com                   N/A       N/A        Y       30076
Snapshot Daemon on rhs-client38.lab.eng.blr.redhat.com                49155     0          Y       9214
Self-heal Daemon on rhs-client38.lab.eng.blr.redhat.com               N/A       N/A        Y       8958
Quota Daemon on rhs-client38.lab.eng.blr.redhat.com                   N/A       N/A        Y       8969
Snapshot Daemon on rhs-client32.lab.eng.blr.redhat.com                49155     0          Y       11221
Self-heal Daemon on rhs-client32.lab.eng.blr.redhat.com               N/A       N/A        Y       1658
Quota Daemon on rhs-client32.lab.eng.blr.redhat.com                   N/A       N/A        Y       1668

Task Status of Volume rpcx3
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 2cd252ed-3202-4c7f-99bd-6326058c797f
Status               : in progress

[root@rhs-client19 glusterfs]# gluster v info

Volume Name: rpcx3
Type: Distributed-Replicate
Volume ID: f7532c65-63d0-4e4a-a5b5-c95238635eff
Status: Started
Snapshot Count: 0
Number of Bricks: 5 x 3 = 15
Transport-type: tcp
Bricks:
Brick1: rhs-client19.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3
Brick2: rhs-client25.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3
Brick3: rhs-client32.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3
Brick4: rhs-client25.lab.eng.blr.redhat.com:/gluster/brick2/rpcx3
Brick5: rhs-client32.lab.eng.blr.redhat.com:/gluster/brick2/rpcx3
Brick6: rhs-client38.lab.eng.blr.redhat.com:/gluster/brick2/rpcx3
Brick7: rhs-client32.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3
Brick8: rhs-client38.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3
Brick9: rhs-client19.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3
Brick10: rhs-client38.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3-newb
Brick11: rhs-client19.lab.eng.blr.redhat.com:/gluster/brick2/rpcx3-newb
Brick12: rhs-client25.lab.eng.blr.redhat.com:/gluster/brick3/rpcx3-newb
Brick13: rhs-client19.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3-newb
Brick14: rhs-client25.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3-newb
Brick15: rhs-client32.lab.eng.blr.redhat.com:/gluster/brick1/rpcx3-newb
Options Reconfigured:
client.event-threads: 8
server.event-threads: 8
cluster.rebal-throttle: aggressive
diagnostics.client-log-level: INFO
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
features.uss: enable
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on

#########################
sosreports and logs to follow
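Note for whoever picks this up: even without a core, the anonymous frame offsets in the backtrace above can be mapped to functions and source lines once the matching debuginfo is on hand. A rough sketch, assuming the glusterfs-debuginfo package matching 3.12.2-43 is installed on the node that logged the crash (the paths and offsets are copied from the trace above; actual function names will only resolve correctly against the exact same build):

# Resolve the protocol/server frame (offset +0x985d) to a function/line:
addr2line -f -C -e /usr/lib64/glusterfs/3.12.2/xlator/protocol/server.so 0x985d

# Same for the unnamed libgfrpc.so.0 frame (offset +0x7685):
addr2line -f -C -e /lib64/libgfrpc.so.0 0x7685

The crashing frame sits under pthread_mutex_lock called from server.so via rpcsvc_notify/rpc_transport_notify, so resolving that one offset should show which server-side notify path dereferenced a bad pointer.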
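Also, since no core was found for this SIGSEGV, it may be worth confirming that core dumps are actually enabled on the brick nodes before the next occurrence. A minimal check, assuming stock RHEL 7 defaults (these are generic commands, not specific to gluster):

# Where the kernel would write a core, or whether it is piped to abrt/systemd-coredump:
cat /proc/sys/kernel/core_pattern

# Core size limit in the current shell; daemons started by glusterd/systemd carry their own limit,
# and a value of 0 means no core file is written:
ulimit -c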
Has anyone looked at this?
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days