Description of problem:
Glusterd does not respond to any request on its port 24007.

From gdb on the glusterd process, thread 8 is stuck at LOCK (&iobref->lock), and thread 9 is stuck waiting for the signal priv->notify.cond, which only thread 8 will send. As a result, glusterd cannot respond to any request, such as a mount command, a CLI request, or glusterfsd startup.

Thread 9 (Thread 0x7f0855a25700 (LWP 1991)):
#0 0x00007f085c0485bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f085702edab in socket_event_poll_err (this=0x7f084c045bf0, gen=7, idx=7) at socket.c:1201
#2 0x00007f085703399c in socket_event_handler (fd=13, idx=7, gen=7, data=0x7f084c045bf0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2480
#3 0x00007f085d2f65e9 in event_dispatch_epoll_handler (event_pool=0xf1db00, event=0x7f0855a24e84) at event-epoll.c:587
#4 0x00007f085d2f68c0 in event_dispatch_epoll_worker (data=0xf2c0c0) at event-epoll.c:663
#5 0x00007f085c0425da in start_thread () from /lib64/libpthread.so.0
#6 0x00007f085b918eaf in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f0856226700 (LWP 1990)):
#0 0x00007f085c04b85c in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f085c04e657 in __lll_lock_elision () from /lib64/libpthread.so.0
#2 0x00007f085d2c0ae6 in iobref_unref (iobref=0x7f08480033e0) at iobuf.c:944
#3 0x00007f085d046f29 in rpc_transport_pollin_destroy (pollin=0x7f0848047f10) at rpc-transport.c:123
#4 0x00007f0857033319 in socket_event_poll_in (this=0x7f084c045bf0, notify_handled=_gf_true) at socket.c:2322
#5 0x00007f0857033932 in socket_event_handler (fd=13, idx=7, gen=7, data=0x7f084c045bf0, poll_in=1, poll_out=0, poll_err=0)

Steps to reproduce:
Restart the gluster server and clients at the same time. The issue reproduces roughly once in several hundred restarts.

Analysis:
The function socket_event_poll_in can be called for the same socket fd from more than one thread at the same time. That could be why thread 8 is stuck at "LOCK (&iobref->lock)": the iobref had just been freed by another thread.
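To make the suspected failure mode concrete, here is a minimal sketch of the invariant that the analysis implies was violated: every thread that will release a lock-protected, reference-counted buffer must hold its own reference, so that the object is destroyed exactly once and nobody locks freed memory. The names refbuf, refbuf_ref, refbuf_unref and run_demo are hypothetical and used only for illustration. This is not the GlusterFS iobref code and not the fix posted for review.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lock-protected, reference-counted buffer.
 * This is NOT the GlusterFS iobref implementation. */
struct refbuf {
    pthread_mutex_t lock;
    int ref;
};

/* Number of times a refbuf was destroyed. Only the thread that drops the
 * last reference writes it, and it is read after all threads are joined. */
static int destroy_count;

static struct refbuf *refbuf_new(void)
{
    struct refbuf *b = calloc(1, sizeof(*b));
    if (!b)
        return NULL;
    pthread_mutex_init(&b->lock, NULL);
    b->ref = 1; /* the creator owns the first reference */
    return b;
}

static struct refbuf *refbuf_ref(struct refbuf *b)
{
    pthread_mutex_lock(&b->lock);
    b->ref++;
    pthread_mutex_unlock(&b->lock);
    return b;
}

static void refbuf_unref(struct refbuf *b)
{
    int ref;

    /* If b were already freed by another thread, this lock call would
     * touch freed memory and could block forever, as seen in thread 8. */
    pthread_mutex_lock(&b->lock);
    ref = --b->ref;
    pthread_mutex_unlock(&b->lock);

    if (ref == 0) {
        pthread_mutex_destroy(&b->lock);
        free(b);
        destroy_count++;
    }
}

static void *worker(void *arg)
{
    refbuf_unref(arg); /* each worker releases only its own reference */
    return NULL;
}

/* Runs two workers against one buffer and returns how many times the
 * buffer was destroyed (1 when references are balanced), or -1 if the
 * buffer could not be allocated. */
int run_demo(void)
{
    pthread_t t[2];
    int created[2] = {0, 0};
    int i;
    struct refbuf *b = refbuf_new();

    if (!b)
        return -1;
    destroy_count = 0;
    refbuf_ref(b); /* ref == 2: one reference per worker */

    for (i = 0; i < 2; i++) {
        if (pthread_create(&t[i], NULL, worker, b) == 0)
            created[i] = 1;
        else
            refbuf_unref(b); /* drop that worker's reference ourselves */
    }
    for (i = 0; i < 2; i++) {
        if (created[i])
            pthread_join(t[i], NULL);
    }
    return destroy_count;
}
```

Calling run_demo() should report a single destroy. If the refbuf_ref() call were removed, the two workers would share one reference, and the second refbuf_unref() would lock an already freed object, which is the kind of behaviour the backtrace of thread 8 suggests.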
Mohit or Raghavendra G will be looking into this. I believe this is the same issue that was highlighted on the users mailing list yesterday.
I have tested my patch, and after thousands of restarts the error has not occurred again. I would like to submit my fix.
REVIEW: https://review.gluster.org/22535 (fix glusterd stuck during restart) posted (#1) for review on master by None
Is this still relevant?
So far, the issue has not been seen in Gluster 7.
As per comment 6, the issue is not reproducible in the latest release (glusterfs-7), so I am closing the bug for now. Please reopen it if you face the same issue again.