Bug 1697756 - Glusterd does not respond to any request on its 24007 port
Summary: Glusterd does not respond to any request on its 24007 port
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-09 05:19 UTC by Hunang Shujun
Modified: 2020-02-24 04:39 UTC
6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2020-02-24 04:39:00 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Gluster.org Gerrit 22535 0 None Abandoned rpc:fix glusterd stuck during restart 2019-09-24 19:32:09 UTC

Description Hunang Shujun 2019-04-09 05:19:13 UTC
Description of problem:
Glusterd does not respond to any request on its 24007 port.

From a gdb backtrace of the glusterd process, it can be seen that thread 8 is stuck at LOCK(&iobref->lock), and thread 9 is stuck waiting for a signal on priv->notify.cond, which only thread 8 would deliver.
As a result, glusterd cannot respond to any request: mount commands, CLI requests, or glusterfsd startup.

Thread 9 (Thread 0x7f0855a25700 (LWP 1991)):
#0  0x00007f085c0485bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f085702edab in socket_event_poll_err (this=0x7f084c045bf0, gen=7, idx=7) at socket.c:1201
#2  0x00007f085703399c in socket_event_handler (fd=13, idx=7, gen=7, data=0x7f084c045bf0, poll_in=1, poll_out=0, poll_err=0)
    at socket.c:2480
#3  0x00007f085d2f65e9 in event_dispatch_epoll_handler (event_pool=0xf1db00, event=0x7f0855a24e84) at event-epoll.c:587
#4  0x00007f085d2f68c0 in event_dispatch_epoll_worker (data=0xf2c0c0) at event-epoll.c:663
#5  0x00007f085c0425da in start_thread () from /lib64/libpthread.so.0
#6  0x00007f085b918eaf in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f0856226700 (LWP 1990)):
#0  0x00007f085c04b85c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f085c04e657 in __lll_lock_elision () from /lib64/libpthread.so.0
#2  0x00007f085d2c0ae6 in iobref_unref (iobref=0x7f08480033e0) at iobuf.c:944
#3  0x00007f085d046f29 in rpc_transport_pollin_destroy (pollin=0x7f0848047f10) at rpc-transport.c:123
#4  0x00007f0857033319 in socket_event_poll_in (this=0x7f084c045bf0, notify_handled=_gf_true) at socket.c:2322
#5  0x00007f0857033932 in socket_event_handler (fd=13, idx=7, gen=7, data=0x7f084c045bf0, poll_in=1, poll_out=0, poll_err=0)

The way to reproduce:
Restart the gluster server and clients at the same time; the issue reproduces roughly once in several hundred restarts.

Analysis:
The function socket_event_poll_in can be called for the same socket fd concurrently. That would explain why thread 8 is stuck at LOCK(&iobref->lock): the iobref had just been freed by another thread when thread 8 tried to take the lock.

Comment 2 Atin Mukherjee 2019-04-09 08:26:22 UTC
Mohit or Raghavendra G will be looking into this. I believe this is the same issue which has been highlighted in the user ML yesterday.

Comment 3 Hunang Shujun 2019-04-09 09:00:15 UTC
I have tested my patch, and after thousands of restarts the error has not happened again. I would like to submit my fix.

Comment 4 Worker Ant 2019-04-09 09:41:46 UTC
REVIEW: https://review.gluster.org/22535 (fix glusterd stuck during restart) posted (#1) for review on master by None

Comment 5 Yaniv Kaul 2019-12-31 07:28:21 UTC
Is this still relevant?

Comment 6 Hunang Shujun 2020-01-07 02:02:05 UTC
So far the issue has not been observed in Gluster 7.

Comment 8 Mohit Agrawal 2020-02-24 04:39:00 UTC
As per comment 6, the issue is not reproducible in the latest release (glusterfs-7), so I am closing the bug for now. Please reopen it if you face the same issue again.

