Bug 1421938 - systemic testing: seeing a lot of ping timeouts which would lead to split-brains
Summary: systemic testing: seeing a lot of ping timeouts which would lead to split-brains
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard: rpc-ping-timeout
Duplicates: 1254138 (view as bug list)
Depends On:
Blocks: 1415178 1427387 1427390
 
Reported: 2017-02-14 06:35 UTC by Krutika Dhananjay
Modified: 2018-09-20 17:09 UTC (History)
CC List: 14 users

Fixed In Version: glusterfs-3.12.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1415178
Clones: 1427387 1427390 (view as bug list)
Environment:
Last Closed: 2017-05-30 18:43:01 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Krutika Dhananjay 2017-02-14 06:35:35 UTC
+++ This bug was initially created as a clone of Bug #1415178 +++

Description of problem:
=====================
I am seeing a lot of ping timeouts on my systemic setup clients, as below:
[2017-01-20 08:07:53.591156] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-systemic-client-6: server 10.70.35.156:49153 has not responded in the last 42 seconds, disconnecting.
[2017-01-20 08:07:53.595238] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 0-systemic-client-6: socket disconnected
[2017-01-20 08:32:58.781026] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-systemic-client-2: server 10.70.35.156:49152 has not responded in the last 42 seconds, disconnecting.
[2017-01-20 08:32:58.818598] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 0-systemic-client-2: socket disconnected
[2017-01-20 08:35:31.801622] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-systemic-client-7: server 10.70.37.154:49153 has not responded in the last 42 seconds, disconnecting.
[2017-01-20 08:35:31.802349] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 0-systemic-client-7: socket disconnected
[2017-01-20 08:39:22.835813] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-systemic-client-7: server 10.70.37.154:49153 has not responded in the last 42 seconds, disconnecting.
[2017-01-20 08:39:22.841445] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 0-systemic-client-7: socket disconnected
[2017-01-20 08:52:28.949400] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-systemic-client-3: server 10.70.37.154:49152 has not responded in the last 42 seconds, disconnecting.
[2017-01-20 08:52:28.950309] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 0-systemic-client-3: socket disconnected
[2017-01-20 09:11:15.090477] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-systemic-client-7: server 10.70.37.154:49153 has not responded in the last 42 seconds, disconnecting.
[2017-01-20 09:11:15.091079] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 0-systemic-client-7: socket disconnected
[2017-01-20 09:15:03.122268] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-systemic-client-7: server 10.70.37.154:49153 has not responded in the last 42 seconds, disconnecting.
[2017-01-20 09:15:03.125457] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 0-systemic-client-7: socket disconnected
[2017-01-20 09:17:28.141896] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-systemic-client-6: server 10.70.35.156:49153 has not responded in the last 42 seconds, disconnecting.
[2017-01-20 09:17:28.146266] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 0-systemic-client-6: socket disconnected
[2017-01-20 09:36:10.260003] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-systemic-client-2: server 10.70.35.156:49152 has not responded in the last 42 seconds, disconnecting.
[2017-01-20 09:36:10.329709] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 0-systemic-client-2: socket disconnected
[2017-01-20 09:39:02.283174] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-systemic-client-3: server 10.70.37.154:49152 has not responded in the last 42 seconds, disconnecting.
[2017-01-20 09:39:02.283875] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 0-systemic-client-3: socket disconnected
[2017-01-20 09:42:15.311267] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-systemic-client-7: server 10.70.37.154:49153 has not responded in the last 42 seconds, disconnecting.
[2017-01-20 09:42:15.311701] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 0-systemic-client-7: socket disconnected



This would eventually lead to split-brains, which is not a good place to be in.
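
For context, the CRITICAL messages above come from the client-side ping timer: the client periodically pings each brick and, if no reply is read within network.ping-timeout (42 seconds by default), it disconnects that transport; while a brick is cut off, writes land only on the remaining bricks, which is how the split-brains can follow. Below is a minimal, self-contained C sketch of that timer pattern (hypothetical names and structure, not the actual rpc-clnt-ping.c code):

#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical per-connection state; not the real rpc-clnt structures. */
struct conn {
    pthread_mutex_t lock;
    time_t          last_reply;    /* when the last ping reply was read */
    int             ping_timeout;  /* network.ping-timeout, 42s by default */
    int             connected;
};

/* Called whenever a ping reply is read off the socket. */
static void ping_reply_received(struct conn *c)
{
    pthread_mutex_lock(&c->lock);
    c->last_reply = time(NULL);
    pthread_mutex_unlock(&c->lock);
}

/* Periodic timer callback: if the server has been silent for longer than
 * ping_timeout, give up on the transport, as in the CRITICAL log above. */
static void ping_timer_expired(struct conn *c)
{
    pthread_mutex_lock(&c->lock);
    if (time(NULL) - c->last_reply >= c->ping_timeout) {
        fprintf(stderr, "server has not responded in the last %d seconds, "
                        "disconnecting.\n", c->ping_timeout);
        c->connected = 0;  /* the real code tears down the transport here */
    }
    pthread_mutex_unlock(&c->lock);
}

The point of this bug is not that the servers were down, but that the brick processes could not get around to reading and answering the ping in time, so this timer fired even though the bricks were alive.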


Version-Release number of selected component (if applicable):
==========
3.8.4-12

--- Additional comment from Raghavendra G on 2017-01-23 00:17:07 EST ---

I've some queries:

* What was the disk utilization during testing?
* Can you please get the profile info of the bricks? I need to see what operations are being performed and their frequency.
* Is it possible to get an strace of the brick process while the tests are being run? You can use the following command line:
strace -ff -T -p <pid-of-brick> -o <path-where-you-want-the-strace-output-saved>

The following is a possible hypothesis:

<hypothesis>

The server is not able to read the ping request in time.

Krutika, Pranith, and I were discussing an issue where they had hit ping-timer expiry in replicated setups when disk usage was high. The reason for this, as Pranith pointed out, was:
1. posix has some fops (like posix_xattrop, posix_fxattrop) which do syscalls while holding a lock on the inode (inode->lock).
2. During high disk usage scenarios, syscall latencies were high (sometimes >= the ping-timeout value).
3. Before being handed over to a new thread at the io-threads xlator, a fop gets executed in one of the threads that read incoming messages from the socket. This execution path includes translators like protocol/server, index, quota-enforcer and marker, and these translators might access the inode-ctx, which involves locking the inode (inode->lock). Due to this locking, the latency of the syscall gets transferred to the poller thread. Since the poller thread is waiting on inode->lock, it won't be able to read ping requests from the network in time, resulting in ping-timer expiry. (A sketch of this contention follows the hypothesis block below.)

</hypothesis>
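
To make the hypothesis concrete, here is a minimal C sketch of the contention pattern described in points 1-3 (hypothetical names; in the real code the fop side is posix_xattrop/posix_fxattrop and the other side is the inode-ctx accessors used by protocol/server, index, marker, etc.):

#include <pthread.h>
#include <stddef.h>
#include <sys/xattr.h>

/* Stand-in for the gluster inode with its per-inode lock. */
struct inode {
    pthread_mutex_t lock;  /* models inode->lock */
};

/* Fop path (point 1): syscalls executed while inode->lock is held. */
static void xattrop_like_fop(struct inode *inode, int fd, const char *key,
                             const void *val, size_t len)
{
    pthread_mutex_lock(&inode->lock);
    /* Point 2: on a loaded disk these syscalls may take seconds, and the
     * lock stays held for that whole time. */
    fgetxattr(fd, key, NULL, 0);
    fsetxattr(fd, key, val, len, 0);
    pthread_mutex_unlock(&inode->lock);
}

/* Poller-thread path (point 3): an inode-ctx access in protocol/server,
 * index, quota or marker also takes inode->lock, so the poller thread
 * stalls here instead of going back to the socket to read the next
 * request -- including the pending ping. */
static void poller_thread_ctx_access(struct inode *inode)
{
    pthread_mutex_lock(&inode->lock);  /* blocks behind the slow syscalls */
    /* ... look up / update per-xlator inode context ... */
    pthread_mutex_unlock(&inode->lock);
}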

--- Additional comment from Raghavendra G on 2017-01-25 06:59:41 EST ---

I went through the strace output and the volume profile. Though the volume profile shows large max latencies for fops like xattrop/fxattrop, the syscall latencies shown by strace are not high enough to cause a ping timeout. As for the large xattrop/fxattrop latencies, these fops are serialized, and when there are too many (f)xattrops on the same file, the last fop can have a latency which is the cumulative latency of all the (f)xattrops in the waiting queue before it. So, it is not really an issue.

Nag also informed me that the disk utilization was not high when the ping-timeout happened. So, I doubt whether it is high disk utilization that caused this ping-timeout.

At the moment, I don't have an RCA, but we can experiment with two things:
1. Increase the poller threads to 8 on bricks
2. Use fix [1]

Note that both of the above steps need to be done before starting the tests.

[1] https://review.gluster.org/15046

--- Additional comment from Mohit Agrawal on 2017-01-27 03:13:56 EST ---

Hi,

 Can you please confirm the reproducer steps? As per the doc, it seems the steps are as below:

 1) Run the script to create a deep directory structure (5 levels) from two different clients
 2) Run the linux.tar untar/tar operation 50 times in the directory

Regards
Mohit Agrawal

--- Additional comment from nchilaka on 2017-01-31 01:25:22 EST ---

Yes, and along with that, statedumps were being taken at regular intervals for both clients and servers.

--- Additional comment from Raghavendra G on 2017-01-31 23:39:39 EST ---

Nag,

Let me know once the setup becomes free. We would like to gather more diagnostic information. Machine and login details would help us to continue debugging the issue.

regards,
Raghavendra

--- Additional comment from Raghavendra G on 2017-02-01 07:07:40 EST ---

Another thing to verify is whether the VMs running the bricks and the client are themselves pausing. If they pause for more than the ping-timeout duration, that alone can be a simple cause of ping-timer expiry.

--- Additional comment from Mohit Agrawal on 2017-02-02 00:21:13 EST ---

Hi,


I tried to reproduce the issue with the steps mentioned in the previous comment, but did not succeed.
I have checked the setup (server 10.70.35.20); the VM is accessible, but it seems all resources (CPU, memory) are consumed and I am not even able to run any Linux command on the VM.

On the client side, the below messages are continuously appearing in the logs:

[2017-02-02 05:26:23.661141] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-systemic-client-4: changing port to 49153 (from 0)
[2017-02-02 05:26:23.667104] E [socket.c:2309:socket_connect_finish] 0-systemic-client-4: connection to 10.70.35.20:49153 failed (Connection refused)


A ping-timeout can happen if the brick process is not able to respond within the time period (42 seconds), and here the brick is not able to respond because sufficient resources are not available on the VM.


Regards
Mohit Agrawal

--- Additional comment from nchilaka on 2017-02-09 01:17:08 EST ---

I had shared the setup.
Hence removing the needinfo.

Comment 1 Worker Ant 2017-02-14 06:36:01 UTC
REVIEW: https://review.gluster.org/16462 (storage/posix: Execute syscalls in xattrop under different locks) posted (#4) for review on master by Krutika Dhananjay (kdhananj)

Comment 2 Worker Ant 2017-02-27 08:26:45 UTC
COMMIT: https://review.gluster.org/16462 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit b5c26a462caf97bfc5380c81092f5c331ccaf1ae
Author: Krutika Dhananjay <kdhananj>
Date:   Mon Jan 23 17:40:40 2017 +0530

    storage/posix: Execute syscalls in xattrop under different locks
    
    ... and not inode->lock. This is to prevent the epoll thread from
    *potentially* being blocked on this lock in the worst case for
    extended period elsewhere in the brick stack, while the syscalls
    in xattrop are being performed under the same lock by a different
    thread. This could potentially lead to ping-timeout, if the only
    available epoll thread is busy waiting on the inode->lock, thereby
    preventing it from picking up the ping request from the client(s).
    
    Also removed some unused functions.
    
    Change-Id: I2054a06701ecab11aed1c04e80ee57bbe2e52564
    BUG: 1421938
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: https://review.gluster.org/16462
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>
    CentOS-regression: Gluster Build System <jenkins.org>
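
The effect of the patch, in other words, is that the xattrop syscalls are no longer issued while inode->lock is held; they are serialized by a narrower, xattrop-specific lock instead, so a poller thread contending on inode->lock elsewhere in the brick stack is not stuck behind slow disk I/O. A rough sketch of the changed locking, assuming a dedicated per-inode mutex for xattrop (hypothetical names; see the patch itself for the actual implementation):

#include <pthread.h>
#include <stddef.h>
#include <sys/xattr.h>

/* Hypothetical posix inode context carrying a lock that only xattrop uses. */
struct posix_inode_ctx {
    pthread_mutex_t xattrop_lock;
};

static void xattrop_after_fix(struct posix_inode_ctx *ctx, int fd,
                              const char *key, const void *val, size_t len)
{
    /* inode->lock is no longer held across the syscalls; concurrent
     * xattrops on the same inode are still serialized, but threads that
     * only need inode->lock (e.g. the epoll/poller thread doing inode-ctx
     * lookups) no longer wait behind this disk I/O. */
    pthread_mutex_lock(&ctx->xattrop_lock);
    fgetxattr(fd, key, NULL, 0);
    fsetxattr(fd, key, val, len, 0);
    pthread_mutex_unlock(&ctx->xattrop_lock);
}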

Comment 3 Raghavendra G 2017-02-28 04:30:44 UTC
Moving back to POST, as we have to RCA whether there are other reasons why the ping-timeout might have happened.

Comment 4 Worker Ant 2017-02-28 09:14:52 UTC
REVIEW: https://review.gluster.org/16785 (storage/posix: Use more granular mutex locks for atomic writes) posted (#2) for review on master by Krutika Dhananjay (kdhananj)

Comment 5 Worker Ant 2017-02-28 12:13:36 UTC
REVIEW: https://review.gluster.org/16787 (Revert "storage/posix: Execute syscalls in xattrop under different locks") posted (#1) for review on master by Atin Mukherjee (amukherj)

Comment 6 Worker Ant 2017-03-02 10:16:48 UTC
REVIEW: https://review.gluster.org/16785 (storage/posix: Use more granular mutex locks for atomic writes) posted (#3) for review on master by Krutika Dhananjay (kdhananj)

Comment 7 Worker Ant 2017-03-03 09:47:59 UTC
COMMIT: https://review.gluster.org/16785 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 843945aed2a4b99a4fd1492b68b18ee80c5c994c
Author: Krutika Dhananjay <kdhananj>
Date:   Tue Feb 28 14:27:51 2017 +0530

    storage/posix: Use more granular mutex locks for atomic writes
    
    Change-Id: I7a5167de77fabf19c5151775b553913a1af5a765
    BUG: 1421938
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: https://review.gluster.org/16785
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Raghavendra Bhat <raghavendra>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>

Comment 8 Worker Ant 2017-03-08 06:42:51 UTC
REVIEW: https://review.gluster.org/16869 (storage/posix: Use granular mutex locks for pgfid update syscalls) posted (#1) for review on master by Krutika Dhananjay (kdhananj)

Comment 9 Worker Ant 2017-03-08 08:17:33 UTC
REVIEW: https://review.gluster.org/16869 (storage/posix: Use granular mutex locks for pgfid update syscalls) posted (#2) for review on master by Krutika Dhananjay (kdhananj)

Comment 10 Worker Ant 2017-03-09 05:00:19 UTC
REVIEW: https://review.gluster.org/16869 (storage/posix: Use granular mutex locks for pgfid update syscalls) posted (#3) for review on master by Krutika Dhananjay (kdhananj)

Comment 11 Worker Ant 2017-03-13 03:03:07 UTC
COMMIT: https://review.gluster.org/16869 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit fc97b1dc38ad46302d51a988cda18335f88514a2
Author: Krutika Dhananjay <kdhananj>
Date:   Tue Feb 28 15:52:49 2017 +0530

    storage/posix: Use granular mutex locks for pgfid update syscalls
    
    Change-Id: Ie5d635951c483d858dc4be2a90fb24b8b5f4f02d
    BUG: 1421938
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: https://review.gluster.org/16869
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

Comment 12 Worker Ant 2017-04-25 05:28:48 UTC
REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#1) for review on master by Raghavendra G (rgowdapp)

Comment 13 Worker Ant 2017-04-25 10:00:21 UTC
REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#2) for review on master by Raghavendra G (rgowdapp)

Comment 14 Worker Ant 2017-04-28 12:08:27 UTC
REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#3) for review on master by Raghavendra G (rgowdapp)

Comment 15 Worker Ant 2017-04-30 06:58:00 UTC
REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#4) for review on master by Raghavendra G (rgowdapp)

Comment 16 Worker Ant 2017-04-30 09:09:15 UTC
REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#5) for review on master by Raghavendra G (rgowdapp)

Comment 17 Worker Ant 2017-05-01 07:57:28 UTC
REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#6) for review on master by Raghavendra G (rgowdapp)

Comment 18 Worker Ant 2017-05-01 08:04:16 UTC
REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#7) for review on master by Raghavendra G (rgowdapp)

Comment 19 Worker Ant 2017-05-01 08:08:10 UTC
REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#8) for review on master by Raghavendra G (rgowdapp)

Comment 20 Worker Ant 2017-05-01 08:29:20 UTC
REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#9) for review on master by Raghavendra G (rgowdapp)

Comment 21 Worker Ant 2017-05-01 13:53:43 UTC
REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#10) for review on master by Raghavendra G (rgowdapp)

Comment 22 Worker Ant 2017-05-16 04:42:59 UTC
REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#11) for review on master by Raghavendra G (rgowdapp)

Comment 23 Worker Ant 2017-05-16 04:43:09 UTC
REVIEW: https://review.gluster.org/17304 (tests/afr: mark tests/basic/afr/add-brick-self-heal.t as bad) posted (#1) for review on master by Raghavendra G (rgowdapp)

Comment 24 Worker Ant 2017-05-16 08:18:35 UTC
REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#12) for review on master by Raghavendra G (rgowdapp)

Comment 25 Shyamsundar 2017-05-30 18:43:01 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 26 Worker Ant 2017-07-18 10:45:33 UTC
COMMIT: https://review.gluster.org/17105 committed in master by Jeff Darcy (jeff.us) 
------
commit 2e72b24707f1886833db0b09e48b3f48b8d68d37
Author: Raghavendra G <rgowdapp>
Date:   Tue Apr 25 10:43:07 2017 +0530

    program/GF-DUMP: Shield ping processing from traffic to Glusterfs
    Program
    
    Since poller thread bears the brunt of execution till the request is
    handed over to io-threads, poller thread experiences lock
    contention(s) in the control flow till io-threads, which slows it
    down. This delay invariably affects reading ping requests from network
    and responding to them, resulting in increased ping latencies, which
    sometimes results in a ping-timer-expiry on client leading to
    disconnect of transport. So, this patch aims to free up poller thread
    from executing code of Glusterfs Program. We do this by making
    
    * Glusterfs Program registering itself asking rpcsvc to execute its
      actors in its own threads.
    * GF-DUMP Program registering itself asking rpcsvc to _NOT_ execute
      its actors in its own threads. Otherwise the program's own threads become
      a bottleneck in processing ping traffic. This means that poller thread
      reads a ping packet, invokes its actor and hands the response msg to
      transport queue.
    
    Change-Id: I526268c10bdd5ef93f322a4f95385137550a6a49
    Signed-off-by: Raghavendra G <rgowdapp>
    BUG: 1421938
    Reviewed-on: https://review.gluster.org/17105
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Amar Tumballi <amarts>
    Reviewed-by: Jeff Darcy <jeff.us>
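
A self-contained C sketch of the dispatch decision the commit message describes: the poller thread runs cheap GF-DUMP/ping actors inline and hands everything belonging to the Glusterfs Program off to that program's own threads (all names here are illustrative, not the actual rpcsvc API):

#include <stdio.h>

struct request {
    int xid;  /* stand-in for a decoded RPC request */
};

struct rpc_program {
    const char *name;
    int         own_thread;  /* 1 = actors run in the program's own threads */
};

/* Stub standing in for queueing the request to the program's worker threads. */
static void enqueue_to_program_workers(struct rpc_program *p, struct request *r)
{
    printf("req %d: queued for %s worker threads\n", r->xid, p->name);
}

/* Stub standing in for running the actor directly in the poller thread. */
static void run_actor_inline(struct rpc_program *p, struct request *r)
{
    printf("req %d: %s actor executed inline by the poller\n", r->xid, p->name);
}

/* The poller thread, having read a request off the socket, only does the
 * cheap work itself; heavy fops never block it from reading the next ping. */
static void dispatch_request(struct rpc_program *prog, struct request *req)
{
    if (prog->own_thread)
        enqueue_to_program_workers(prog, req);
    else
        run_actor_inline(prog, req);
}

int main(void)
{
    struct rpc_program glusterfs_prog = { "Glusterfs Program", 1 };
    struct rpc_program gf_dump_prog   = { "GF-DUMP (ping)",    0 };
    struct request write_req = { 1 }, ping_req = { 2 };

    dispatch_request(&glusterfs_prog, &write_req);  /* handed to workers */
    dispatch_request(&gf_dump_prog, &ping_req);     /* answered immediately */
    return 0;
}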

Comment 27 Shyamsundar 2017-09-05 17:25:47 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 28 Raghavendra G 2017-09-06 11:02:05 UTC
*** Bug 1254138 has been marked as a duplicate of this bug. ***

