Created attachment 1612173 [details] bricks log file Description of problem: The gluster client is somehow crash after running for a period (1~2 weeks), I would like to know is it a bug exist in GlusterFS 6.1 or other problem. Firstly, The glusterfs is installed by using the rpm from http://mirror.centos.org/centos/7/storage/x86_64/gluster-4.0/ client rpm: glusterfs-6.1-1.el7.x86_64.rpm server rpm: glusterfs-server-6.1-1.el7.x86_64.rpm Below is the structure of the GlusterFS in our servers: Volume Name: k8s-volume Type: Replicate Volume ID: d5d673d6-a1bb-4d14-bc91-1ceab7ad761d Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: hk2hp057:/gfs/k8s_pv Brick2: hk2hp058:/gfs/k8s_pv Brick3: hk2hp059:/gfs/k8s_pv Options Reconfigured: auth.reject: 172.20.117.144 auth.allow: 172.20.117.* transport.address-family: inet nfs.disable: on performance.client-io-threads: on As shown above, there are total three servers using the "Replicated Volume". It is created three replicas in pursuit of better reliability and data redundancy with regard to the official recommendation. Version-Release number of selected component (if applicable): However, the gluster process go down after running for a period. it is easily to see by executing the "gluster" command - $ gluster volume status Status of volume: k8s-volume Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick hk2hp057:/gfs/k8s_pv 49152 0 Y 25141 Brick hk2hp058:/gfs/k8s_pv N/A N/A N N/A Brick hk2hp059:/gfs/k8s_pv 49152 0 Y 6031 Self-heal Daemon on localhost N/A N/A Y 6065 Self-heal Daemon on hk2hp057 N/A N/A Y 25150 Self-heal Daemon on hk2hp059 N/A N/A Y 6048 Task Status of Volume k8s-volume ------------------------------------------------------------------------------ There are no active volume tasks the brick under "hk2hp058" is offline. When i check the log file under "/var/log/glusterfs/bricks/", the message as below: [2019-09-04 07:33:50.004661] W [MSGID: 113117] [posix-metadata.c:627:posix_set_ctime] 0-k8s-volume-posix: posix set mdata failed, No ctime : /gfs/k8s_pv/.glusterfs/dd/56/dd56eca4-e7e8-4208-b589-763a428408e1 gfid:dd56eca4-e7e8-4208-b589-763a428408e1 [2019-09-04 07:33:50.004776] W [MSGID: 113117] [posix-metadata.c:627:posix_set_ctime] 0-k8s-volume-posix: posix set mdata failed, No ctime : /gfs/k8s_pv/.glusterfs/dd/56/dd56eca4-e7e8-4208-b589-763a428408e1 gfid:dd56eca4-e7e8-4208-b589-763a428408e1 pending frames: patchset: git://git.gluster.org/glusterfs.git signal received: 11 time of crash: 2019-09-04 10:53:02 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 6.1 /lib64/libglusterfs.so.0(+0x26db0)[0x7fa1fa2d1db0] /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fa1fa2dc7b4] /lib64/libc.so.6(+0x36340)[0x7fa1f8911340] /usr/lib64/glusterfs/6.1/rpc-transport/socket.so(+0xa4cc)[0x7fa1ee6964cc] /lib64/libglusterfs.so.0(+0x8c286)[0x7fa1fa337286] /lib64/libpthread.so.0(+0x7dd5)[0x7fa1f9111dd5] /lib64/libc.so.6(clone+0x6d)[0x7fa1f89d902d] --------- The glusterfs process is resumed after manually restart the process - "glusterd". [root@hk2hp058 /var/log/glusterfs/bricks] systemctl restart glusterd [root@hk2hp058 /var/log/glusterfs/bricks] gluster volume status Status of volume: k8s-volume Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick hk2hp057:/gfs/k8s_pv 49152 0 Y 25141 Brick hk2hp058:/gfs/k8s_pv 49152 0 Y 18814 Brick hk2hp059:/gfs/k8s_pv 49152 0 Y 6031 Self-heal Daemon on localhost N/A N/A Y 18846 Self-heal Daemon on hk2hp059 N/A N/A Y 6048 Self-heal Daemon on hk2hp057 N/A N/A Y 25150 Task Status of Volume k8s-volume ------------------------------------------------------------------------------ There are no active volume tasks Thus, is it a bug or other reason lead to this issue?
Created attachment 1612174 [details] glusterd.log
Created attachment 1612175 [details] glustershd.log
Do you have a coredump? If yes, please install the debuginfo, and send send the bt for the core.
Thank you for your reply. Unfortunately, I cannot find any coredump file in the servers with pattern "core.*"
In that case please use the steps in https://github.com/gluster/glusterfs/blob/master/extras/devel-tools/print-backtrace.sh and the backtrace in the log file to get more information as to what happened.
(In reply to Nithya Balachandran from comment #5) > In that case please use the steps in > https://github.com/gluster/glusterfs/blob/master/extras/devel-tools/print- > backtrace.sh and the backtrace in the log file to get more information as > to what happened. Thank you for your suggestion, however, I have encountered the error while running the corresponding script. Is there something wrong? [hk2hp060@projadm /home/projadm/user/joechan/gfs] ./print-backtrace_v2.sh /home/projadm/user/joechan/gfs/glusterfs-debuginfo-6.1-1.el7.x86_64.rpm glusterfs/bricks/20190912.log eu-addr2line --functions --exe=/home/projadm/user/joechan/gfs/glusterfs-debuginfo-6.1-1.el7.x86_64/usr/lib/debug/usr/sbin/glusterfsd.debug /usr/sbin/glusterfsd eu-addr2line: cannot find symbol '/usr/sbin/glusterfsd' /usr/sbin/glusterfsd eu-addr2line --functions --exe=/home/projadm/user/joechan/gfs/glusterfs-debuginfo-6.1-1.el7.x86_64/usr/lib/debug/usr/sbin/glusterfsd.debug /usr/sbin/glusterfsd eu-addr2line: cannot find symbol '/usr/sbin/glusterfsd' /usr/sbin/glusterfsd
(In reply to joe.chan from comment #6) > (In reply to Nithya Balachandran from comment #5) > > In that case please use the steps in > > https://github.com/gluster/glusterfs/blob/master/extras/devel-tools/print- > > backtrace.sh and the backtrace in the log file to get more information as > > to what happened. > > Thank you for your suggestion, however, I have encountered the error while > running the corresponding script. Is there something wrong? > > [hk2hp060@projadm /home/projadm/user/joechan/gfs] ./print-backtrace_v2.sh > /home/projadm/user/joechan/gfs/glusterfs-debuginfo-6.1-1.el7.x86_64.rpm > glusterfs/bricks/20190912.log > eu-addr2line --functions > --exe=/home/projadm/user/joechan/gfs/glusterfs-debuginfo-6.1-1.el7.x86_64/ > usr/lib/debug/usr/sbin/glusterfsd.debug /usr/sbin/glusterfsd > eu-addr2line: cannot find symbol '/usr/sbin/glusterfsd' > /usr/sbin/glusterfsd > eu-addr2line --functions > --exe=/home/projadm/user/joechan/gfs/glusterfs-debuginfo-6.1-1.el7.x86_64/ > usr/lib/debug/usr/sbin/glusterfsd.debug /usr/sbin/glusterfsd > eu-addr2line: cannot find symbol '/usr/sbin/glusterfsd' > /usr/sbin/glusterfsd Sorry, it is a mistake. Below is the backtrace in log file: [hk2hp060@projadm /home/projadm/user/joechan/gfs] ./print-backtrace.sh /home/projadm/user/joechan/gfs/glusterfs-debuginfo-6.1-1.el7.x86_64.rpm glusterfs/bricks/20190912.log /usr/lib64/glusterfs/6.1/rpc-transport/socket.so(+0xa4cc)[0x7f5061bdf4cc] socket_is_connected inlined at /usr/src/debug/glusterfs-6.1/rpc/rpc-transport/socket/src/socket.c:2887 in socket_event_handler /usr/src/debug/glusterfs-6.1/rpc/rpc-transport/socket/src/socket.c:2619
Can you confirm that you have copied the stacktrace alone to a different file and run the script on thta file? You should be seeing one line per line in the stacktrace.
(In reply to Nithya Balachandran from comment #8) > Can you confirm that you have copied the stacktrace alone to a different > file and run the script on thta file? You should be seeing one line per line > in the stacktrace. Yes it is confirmed that the procedure is correct. stacktrace from log: /lib64/libglusterfs.so.0(+0x26db0)[0x7f506d81adb0] /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f506d8257b4] /lib64/libc.so.6(+0x36340)[0x7f506be5a340] /usr/lib64/glusterfs/6.1/rpc-transport/socket.so(+0xa4cc)[0x7f5061bdf4cc] /lib64/libglusterfs.so.0(+0x8c286)[0x7f506d880286] /lib64/libpthread.so.0(+0x7dd5)[0x7f506c65add5] /lib64/libc.so.6(clone+0x6d)[0x7f506bf2202d] only one line while running the script: /usr/lib64/glusterfs/6.1/rpc-transport/socket.so(+0xa4cc)[0x7f5061bdf4cc] socket_is_connected inlined at /usr/src/debug/glusterfs-6.1/rpc/rpc-transport/socket/src/socket.c:2887 in socket_event_handler /usr/src/debug/glusterfs-6.1/rpc/rpc-transport/socket/src/socket.c:2619
attached more debug result by manually execute the "eu-addr2line" command: /lib64/libglusterfs.so.0(+0x26db0)[0x7f506d81adb0] _gf_msg_backtrace_nomem /usr/src/debug/glusterfs-6.1/libglusterfs/src/logging.c:1124 /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f506d8257b4] sprintf inlined at /usr/src/debug/glusterfs-6.1/libglusterfs/src/common-utils.c:954 in gf_print_trace /usr/include/bits/stdio2.h:33 /lib64/libglusterfs.so.0(+0x8c286)[0x7f506d880286] event_dispatch_epoll_handler inlined at /usr/src/debug/glusterfs-6.1/libglusterfs/src/event-epoll.c:761 in event_dispatch_epoll_worker /usr/src/debug/glusterfs-6.1/libglusterfs/src/event-epoll.c:648 I am not able to find the "/lib64/libc.so.6" and "/lib64/libpthread.so.0" under "glusterfs-debuginfo-6.1-1.el7.x86_64/usr/lib/"
libc and libpthread are std libraries and not part of gluster. I've moving this to the rpc component for someone to look at.
Created attachment 1618042 [details] 2 coredump files attached the coredump files
Program terminated with signal 11, Segmentation fault. #0 socket_is_connected (this=0x7f90d80018a0) at socket.c:2619 2619 if (priv->use_ssl) { Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7_6.6.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libacl-2.2.51-14.el7.x86_64 libaio-0.3.109-13.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libcom_err-1.42.9-13.el7.x86_64 libgcc-4.8.5-36.el7_6.2.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-59.el7_6.1.x86_64 openssl-libs-1.0.2k-16.el7_6.1.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64 (gdb) bt #0 socket_is_connected (this=0x7f90d80018a0) at socket.c:2619 #1 socket_event_handler (fd=5, idx=11, gen=16, data=0x7f90d80018a0, poll_in=1, poll_out=0, poll_err=0, event_thread_died=0 '\000') at socket.c:2887 #2 0x00007f90ec34c286 in event_dispatch_epoll_handler (event=0x7f90d7ffee70, event_pool=0x5632b0727560) at event-epoll.c:648 #3 event_dispatch_epoll_worker (data=0x5632b077a420) at event-epoll.c:761 #4 0x00007f90eb126dd5 in start_thread () from /lib64/libpthread.so.0 #5 0x00007f90ea9ee02d in clone () from /lib64/libc.so.6
@joe, do you have any port scanning tool that could try to probe gluster ports ?
(In reply to Xavi Hernandez from comment #14) > @joe, do you have any port scanning tool that could try to probe gluster > ports ? Yes, I have use the security scanner called Rapid7 to ensure the security hardening of the whole system. I think the root cause of this crash is most likely due to the scanning as the case can be reproduced. Thank you.
REVIEW: https://review.gluster.org/23861 (socket: fix error handling) posted (#1) for review on master by Xavi Hernandez
I've found the problem. I've sent a patch to fix it. As soon as it is accepted I will backport it so that it will be fixed in next release.
REVISION POSTED: https://review.gluster.org/23861 (socket: fix error handling) posted (#2) for review on master by Xavi Hernandez
Thank you for your support, however, I have blocked the corresponding port and only allow the GlusterFS member to access for the current workaround.
REVIEW: https://review.gluster.org/23872 (socket: fix error handling) posted (#1) for review on release-6 by Xavi Hernandez
REVIEW: https://review.gluster.org/23872 (socket: fix error handling) merged (#2) on release-6 by Xavi Hernandez
*** Bug 1740413 has been marked as a duplicate of this bug. ***
*** Bug 1739884 has been marked as a duplicate of this bug. ***