Bug 1421938
Summary: | systemic testing: seeing lot of ping time outs which would lead to splitbrains | |||
---|---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Krutika Dhananjay <kdhananj> | |
Component: | rpc | Assignee: | Raghavendra G <rgowdapp> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | ||
Severity: | urgent | Docs Contact: | ||
Priority: | unspecified | |||
Version: | mainline | CC: | bugs, jbyers, jcall, kdhananj, mchangir, moagrawa, nchilaka, olim, pasik, rcyriac, rgowdapp, rhinduja, rhs-bugs, rkavunga | |
Target Milestone: | --- | Keywords: | Triaged | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | rpc-ping-timeout | |||
Fixed In Version: | glusterfs-3.12.0 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | 1415178 | |||
: | 1427387 1427390 | Environment: | ||
Last Closed: | 2017-05-30 18:43:01 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1415178, 1427387, 1427390 |
Description
Krutika Dhananjay
2017-02-14 06:35:35 UTC
REVIEW: https://review.gluster.org/16462 (storage/posix: Execute syscalls in xattrop under different locks) posted (#4) for review on master by Krutika Dhananjay (kdhananj)

COMMIT: https://review.gluster.org/16462 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit b5c26a462caf97bfc5380c81092f5c331ccaf1ae
Author: Krutika Dhananjay <kdhananj>
Date: Mon Jan 23 17:40:40 2017 +0530

    storage/posix: Execute syscalls in xattrop under different locks

    ... and not inode->lock. This is to prevent the epoll thread from *potentially* being blocked on this lock in the worst case for extended period elsewhere in the brick stack, while the syscalls in xattrop are being performed under the same lock by a different thread. This could potentially lead to ping-timeout, if the only available epoll thread is busy waiting on the inode->lock, thereby preventing it from picking up the ping request from the client(s).

    Also removed some unused functions.

    Change-Id: I2054a06701ecab11aed1c04e80ee57bbe2e52564
    BUG: 1421938
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: https://review.gluster.org/16462
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>
    CentOS-regression: Gluster Build System <jenkins.org>

Moving back to POST as we've to RCA whether there are other reasons why ping-timeout might've happened.
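The locking change described in the commit message above can be illustrated with a small sketch. This is not the GlusterFS code: `struct demo_inode`, `xattrop_lock`, and every `demo_*` function are hypothetical names, and a plain integer stands in for the on-disk xattr. It only shows the general idea that a slow read-modify-write serialized under its own mutex does not hold the lock that a latency-sensitive thread needs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode; not the GlusterFS inode_t. */
struct demo_inode {
    pthread_mutex_t lock;         /* shared lock, also taken on the fast path */
    pthread_mutex_t xattrop_lock; /* dedicated lock for the slow xattr update */
    long            counter;      /* stands in for the on-disk xattr value */
    int             refs;         /* in-memory state guarded by `lock` */
};

void demo_inode_init(struct demo_inode *in)
{
    assert(in != NULL);
    pthread_mutex_init(&in->lock, NULL);
    pthread_mutex_init(&in->xattrop_lock, NULL);
    in->counter = 0;
    in->refs = 0;
}

/* Slow path: the read-modify-write is serialized by the dedicated lock,
 * so `lock` stays free while the (simulated) syscalls run. */
long demo_xattrop_add(struct demo_inode *in, long delta)
{
    long value;

    pthread_mutex_lock(&in->xattrop_lock);
    value = in->counter;          /* stands in for a getxattr call */
    value += delta;
    in->counter = value;          /* stands in for a setxattr call */
    pthread_mutex_unlock(&in->xattrop_lock);
    return value;
}

/* Fast path: short critical section under the shared lock. */
int demo_inode_ref(struct demo_inode *in)
{
    int refs;

    pthread_mutex_lock(&in->lock);
    refs = ++in->refs;
    pthread_mutex_unlock(&in->lock);
    return refs;
}

/* Returns 1 if the shared lock can be taken while the dedicated lock is
 * held, i.e. an in-flight xattrop would not stall the fast path. */
int demo_fast_path_unblocked(struct demo_inode *in)
{
    int rc;

    pthread_mutex_lock(&in->xattrop_lock);   /* simulate an in-flight xattrop */
    rc = pthread_mutex_trylock(&in->lock);
    if (rc == 0)
        pthread_mutex_unlock(&in->lock);
    pthread_mutex_unlock(&in->xattrop_lock);
    return rc == 0;
}
```

Had `demo_xattrop_add` taken `lock` instead, any thread calling `demo_inode_ref` would wait for the full duration of the simulated syscalls, which is the situation the commit message describes for the epoll thread.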
REVIEW: https://review.gluster.org/16785 (storage/posix: Use more granular mutex locks for atomic writes) posted (#2) for review on master by Krutika Dhananjay (kdhananj)

REVIEW: https://review.gluster.org/16787 (Revert "storage/posix: Execute syscalls in xattrop under different locks") posted (#1) for review on master by Atin Mukherjee (amukherj)

REVIEW: https://review.gluster.org/16785 (storage/posix: Use more granular mutex locks for atomic writes) posted (#3) for review on master by Krutika Dhananjay (kdhananj)

COMMIT: https://review.gluster.org/16785 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit 843945aed2a4b99a4fd1492b68b18ee80c5c994c
Author: Krutika Dhananjay <kdhananj>
Date: Tue Feb 28 14:27:51 2017 +0530

    storage/posix: Use more granular mutex locks for atomic writes

    Change-Id: I7a5167de77fabf19c5151775b553913a1af5a765
    BUG: 1421938
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: https://review.gluster.org/16785
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Raghavendra Bhat <raghavendra>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>

REVIEW: https://review.gluster.org/16869 (storage/posix: Use granular mutex locks for pgfid update syscalls) posted (#1) for review on master by Krutika Dhananjay (kdhananj)

REVIEW: https://review.gluster.org/16869 (storage/posix: Use granular mutex locks for pgfid update syscalls) posted (#2) for review on master by Krutika Dhananjay (kdhananj)

REVIEW: https://review.gluster.org/16869 (storage/posix: Use granular mutex locks for pgfid update syscalls) posted (#3) for review on master by Krutika Dhananjay (kdhananj)

COMMIT: https://review.gluster.org/16869 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit fc97b1dc38ad46302d51a988cda18335f88514a2
Author: Krutika Dhananjay <kdhananj>
Date: Tue Feb 28 15:52:49 2017 +0530

    storage/posix: Use granular mutex locks for pgfid update syscalls

    Change-Id: Ie5d635951c483d858dc4be2a90fb24b8b5f4f02d
    BUG: 1421938
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: https://review.gluster.org/16869
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#1) for review on master by Raghavendra G (rgowdapp)

REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#2) for review on master by Raghavendra G (rgowdapp)

REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#3) for review on master by Raghavendra G (rgowdapp)

REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#4) for review on master by Raghavendra G (rgowdapp)

REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#5) for review on master by Raghavendra G (rgowdapp)

REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#6) for review on master by Raghavendra G (rgowdapp)

REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#7) for review on master by Raghavendra G (rgowdapp)

REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#8) for review on master by Raghavendra G (rgowdapp)

REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#9) for review on master by Raghavendra G (rgowdapp)

REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#10) for review on master by Raghavendra G (rgowdapp)

REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#11) for review on master by Raghavendra G (rgowdapp)

REVIEW: https://review.gluster.org/17304 (tests/afr: mark tests/basic/afr/add-brick-self-heal.t as bad) posted (#1) for review on master by Raghavendra G (rgowdapp)

REVIEW: https://review.gluster.org/17105 (program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program) posted (#12) for review on master by Raghavendra G (rgowdapp)

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/

COMMIT: https://review.gluster.org/17105 committed in master by Jeff Darcy (jeff.us)
------
commit 2e72b24707f1886833db0b09e48b3f48b8d68d37
Author: Raghavendra G <rgowdapp>
Date: Tue Apr 25 10:43:07 2017 +0530

    program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program

    Since the poller thread bears the brunt of execution until the request is handed over to io-threads, the poller thread experiences lock contention in the control flow up to io-threads, which slows it down. This delay invariably affects reading ping requests from the network and responding to them, resulting in increased ping latencies, which sometimes result in a ping-timer expiry on the client, leading to a disconnect of the transport. So, this patch aims to free up the poller thread from executing code of the Glusterfs Program. We do this by making

    * Glusterfs Program register itself asking rpcsvc to execute its actors in its own threads.
    * GF-DUMP Program register itself asking rpcsvc to _NOT_ execute its actors in its own threads. Otherwise the program's own threads become a bottleneck in processing ping traffic. This means that the poller thread reads a ping packet, invokes its actor and hands the response msg to the transport queue.

    Change-Id: I526268c10bdd5ef93f322a4f95385137550a6a49
    Signed-off-by: Raghavendra G <rgowdapp>
    BUG: 1421938
    Reviewed-on: https://review.gluster.org/17105
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Amar Tumballi <amarts>
    Reviewed-by: Jeff Darcy <jeff.us>

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/

*** Bug 1254138 has been marked as a duplicate of this bug. ***
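The dispatch rule described in the commit message for review 17105 (programs that ask for their own threads are queued, while GF-DUMP ping requests are answered directly by the poller) can be sketched roughly as follows. This is a single-threaded model for illustration only, not the rpcsvc implementation: `struct demo_program`, the `own_thread` flag as modelled here, `demo_dispatch`, and the fixed-size queue are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of dispatching one request from the (simulated) poller thread. */
enum demo_route { DEMO_RUN_INLINE, DEMO_QUEUED, DEMO_QUEUE_FULL };

typedef int (*demo_actor_fn)(int payload);

/* Hypothetical program descriptor; `own_thread` mirrors the idea of a
 * program asking for its actors to run on its own threads. */
struct demo_program {
    const char   *name;
    int           own_thread;
    demo_actor_fn actor;
};

struct demo_request {
    const struct demo_program *prog;
    int                        payload;
};

#define DEMO_QUEUE_MAX 16

/* Stand-in for the hand-off queue drained by a program's worker threads. */
struct demo_queue {
    struct demo_request items[DEMO_QUEUE_MAX];
    size_t              len;
};

/* Trivial actor: echoes the payload, like a ping reply. */
int demo_echo_actor(int payload)
{
    return payload;
}

/* Called on the poller thread for each request read from the network.
 * Requests for own-thread programs are only enqueued, so the poller never
 * runs their actor; other requests are answered on the spot. */
enum demo_route demo_dispatch(struct demo_queue *q,
                              const struct demo_request *req, int *reply)
{
    assert(q != NULL && req != NULL && req->prog != NULL && reply != NULL);

    if (req->prog->own_thread) {
        if (q->len == DEMO_QUEUE_MAX)
            return DEMO_QUEUE_FULL;
        q->items[q->len++] = *req;   /* worker threads would pick this up */
        return DEMO_QUEUED;
    }

    *reply = req->prog->actor(req->payload);
    return DEMO_RUN_INLINE;
}
```

In this model a request for a program with `own_thread` set costs the poller only an enqueue, while a ping-like request for a program without it is answered immediately, regardless of how much work is waiting in the queue.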