Description of problem:
After glusterd starts, running "gluster peer probe invalid_hostname" behaves differently on different machines. On some machines glusterd crashes with a segmentation fault and produces a core file; on others glusterd keeps running but accumulates many extra child threads.

Version-Release number of selected component (if applicable):
release-6

How reproducible:

Steps to Reproduce:
Case 1
1. glusterd
2. gluster peer probe invalid_hostname

Case 2
1. glusterd
2. gluster peer probe invalid_hostname
3. gluster peer probe invalid_hostname
4. gluster peer probe invalid_hostname (repeat a few more times)
5. ps -aux|grep glusterd
6. gdb attach glusterd-pid
7. info thr (many child threads are blocked in "__lll_lock_wait()")

Actual results:
Case 1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Core was generated by `glusterd'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fef4bd208ff in rpc_clnt_handle_disconnect (conn=0x7fef34007890, clnt=0x7fef34007860) at rpc-clnt.c:832
832         if (!conn->rpc_clnt->disabled && (conn->reconnect == NULL)) {
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.166-2.el7.x86_64 elfutils-libs-0.166-2.el7.x86_64 glibc-2.17-157.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libselinux-2.5-6.el7.x86_64 libuuid-2.23.2-33.el7.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64 openssl-libs-1.0.1e-60.el7.x86_64 pcre-8.32-15.el7_2.1.x86_64 systemd-libs-219-30.el7.x86_64 userspace-rcu-0.7.16-1.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x00007fef4bd208ff in rpc_clnt_handle_disconnect (conn=0x7fef34007890, clnt=0x7fef34007860) at rpc-clnt.c:832
#1  rpc_clnt_notify (trans=0x7fef34007be0, mydata=0x7fef34007890, event=<optimized out>, data=<optimized out>) at rpc-clnt.c:878
#2  0x00007fef4bd1d4e3 in rpc_transport_notify (this=<optimized out>, event=event@entry=RPC_TRANSPORT_DISCONNECT, data=<optimized out>) at rpc-transport.c:542
#3  0x00007fef3f3634d7 in socket_connect_error_cbk (opaque=0x7fef34007190) at socket.c:3239
#4  0x00007fef4adb6dc5 in start_thread () from /usr/lib64/libpthread.so.0
#5  0x00007fef4a6fb73d in clone () from /usr/lib64/libc.so.6
(gdb) p conn->rpc_clnt
$1 = (struct rpc_clnt *) 0x14860
(gdb) p conn->rpc_clnt->disabled
Cannot access memory at address 0x149a0

Case 2
(gdb) info thr
  Id   Target Id         Frame
  16   Thread 0x7ff384f45700 (LWP 18259) "glfs_timer" 0x00007ff38c728bdd in nanosleep () from /usr/lib64/libpthread.so.0
  15   Thread 0x7ff384744700 (LWP 18260) "glfs_sigwait" 0x00007ff38c729101 in sigwait () from /usr/lib64/libpthread.so.0
  14   Thread 0x7ff383f43700 (LWP 18261) "glfs_memsweep" 0x00007ff38c02d66d in nanosleep () from /usr/lib64/libc.so.6
  13   Thread 0x7ff383742700 (LWP 18262) "glfs_sproc0" 0x00007ff38c725a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  12   Thread 0x7ff382f41700 (LWP 18263) "glfs_sproc1" 0x00007ff38c725a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  11   Thread 0x7ff382740700 (LWP 18264) "glusterd" 0x00007ff38c05dba3 in select () from /usr/lib64/libc.so.6
  10   Thread 0x7ff37f2c1700 (LWP 18290) "glfs_gdhooks" 0x00007ff38c7256d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  9    Thread 0x7ff37eac0700 (LWP 18291) "glfs_epoll000" 0x00007ff38c066d13 in epoll_wait () from /usr/lib64/libc.so.6
  8    Thread 0x7ff37d216700 (LWP 18306) "glfs_scleanup" 0x00007ff38c7281bd in __lll_lock_wait () from /usr/lib64/libpthread.so.0
  7    Thread 0x7ff37ca15700 (LWP 18307) "glfs_scleanup" 0x00007ff38c060bf9 in syscall () from /usr/lib64/libc.so.6
  6    Thread 0x7ff367fff700 (LWP 18315) "glfs_scleanup" 0x00007ff38c7281bd in __lll_lock_wait () from /usr/lib64/libpthread.so.0
  5    Thread 0x7ff3677fe700 (LWP 18323) "glfs_scleanup" 0x00007ff38c7281bd in __lll_lock_wait () from /usr/lib64/libpthread.so.0
  4    Thread 0x7ff366ffd700 (LWP 18331) "glfs_scleanup" 0x00007ff38c7281bd in __lll_lock_wait () from /usr/lib64/libpthread.so.0
  3    Thread 0x7ff3667fc700 (LWP 18339) "glfs_scleanup" 0x00007ff38c7281bd in __lll_lock_wait () from /usr/lib64/libpthread.so.0
  2    Thread 0x7ff365ffb700 (LWP 18347) "glfs_scleanup" 0x00007ff38c7281bd in __lll_lock_wait () from /usr/lib64/libpthread.so.0
* 1    Thread 0x7ff38de22480 (LWP 18258) "glusterd" 0x00007ff38c722ef7 in pthread_join () from /usr/lib64/libpthread.so.0

Expected results:

Additional info:
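When re-checking Case 2, it may be easier to tally the "info thr" rows by thread name and innermost frame than to count them by eye. The following is only a minimal Python sketch: tally_frames, THREAD_ROW and SAMPLE are hypothetical names introduced here (they are not part of GlusterFS or gdb), and the regular expression assumes the row layout of the listing above.

```python
import re
from collections import Counter

# Assumed layout of one "info threads" row, e.g.
#   8 Thread 0x7ff37d216700 (LWP 18306) "glfs_scleanup" 0x... in __lll_lock_wait () from ...
# The current thread is prefixed with "*".
THREAD_ROW = re.compile(
    r'^\*?\s*(?P<id>\d+)\s+Thread\s+0x[0-9a-f]+\s+\(LWP\s+(?P<lwp>\d+)\)\s+'
    r'"(?P<name>[^"]*)"\s+0x[0-9a-f]+\s+in\s+(?P<func>\S+)\s+\('
)


def tally_frames(info_threads_output):
    """Count threads per (thread name, innermost function) pair."""
    counts = Counter()
    for line in info_threads_output.splitlines():
        match = THREAD_ROW.match(line.strip())
        if match:  # header and blank lines simply do not match
            counts[(match.group("name"), match.group("func"))] += 1
    return counts


# Rows 9 through 1 copied from the Case 2 listing in this report.
SAMPLE = """
  9 Thread 0x7ff37eac0700 (LWP 18291) "glfs_epoll000" 0x00007ff38c066d13 in epoll_wait () from /usr/lib64/libc.so.6
  8 Thread 0x7ff37d216700 (LWP 18306) "glfs_scleanup" 0x00007ff38c7281bd in __lll_lock_wait () from /usr/lib64/libpthread.so.0
  7 Thread 0x7ff37ca15700 (LWP 18307) "glfs_scleanup" 0x00007ff38c060bf9 in syscall () from /usr/lib64/libc.so.6
  6 Thread 0x7ff367fff700 (LWP 18315) "glfs_scleanup" 0x00007ff38c7281bd in __lll_lock_wait () from /usr/lib64/libpthread.so.0
  5 Thread 0x7ff3677fe700 (LWP 18323) "glfs_scleanup" 0x00007ff38c7281bd in __lll_lock_wait () from /usr/lib64/libpthread.so.0
  4 Thread 0x7ff366ffd700 (LWP 18331) "glfs_scleanup" 0x00007ff38c7281bd in __lll_lock_wait () from /usr/lib64/libpthread.so.0
  3 Thread 0x7ff3667fc700 (LWP 18339) "glfs_scleanup" 0x00007ff38c7281bd in __lll_lock_wait () from /usr/lib64/libpthread.so.0
  2 Thread 0x7ff365ffb700 (LWP 18347) "glfs_scleanup" 0x00007ff38c7281bd in __lll_lock_wait () from /usr/lib64/libpthread.so.0
* 1 Thread 0x7ff38de22480 (LWP 18258) "glusterd" 0x00007ff38c722ef7 in pthread_join () from /usr/lib64/libpthread.so.0
"""

if __name__ == "__main__":
    # Print the most common (name, function) pairs first.
    for (name, func), n in tally_frames(SAMPLE).most_common():
        print(f"{n:3d}  {name:15s} {func}")
```

On the rows copied from this report, the tally shows six "glfs_scleanup" threads in __lll_lock_wait and one in syscall. Comparing that count before and after each additional failed probe would show whether the blocked threads keep accumulating.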
I can no longer reproduce this on the latest master head after a couple of tries, so I am closing this as fixed in the current release.