Bug 1354439
Summary: nfs client I/O stuck post IP failover
Product: [Community] GlusterFS
Component: common-ha
Version: mainline
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: unspecified
Hardware: All
OS: All
Reporter: Soumya Koduri <skoduri>
Assignee: Soumya Koduri <skoduri>
CC: akhakhar, bugs, jthottan, kkeithle, mzywusko, ndevos, nlevinki, rnalakka, sankarshan, skoduri, storage-qa-internal
Keywords: Triaged, ZStream
Target Milestone: ---
Target Release: ---
Fixed In Version: glusterfs-3.9.0
Doc Type: If docs needed, set a value
Story Points: ---
Clone Of: 1278336
Clones: 1363722
Last Closed: 2017-03-27 18:24:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Depends On: 1302545, 1303037
Bug Blocks: 1278336, 1330218, 1363722
Description
Soumya Koduri
2016-07-11 10:19:39 UTC
REVIEW: http://review.gluster.org/14878 (commn-HA: Add portblock resource agents to tickle packets post failover(/back)) posted (#3) for review on master by soumya k (skoduri)

REVIEW: http://review.gluster.org/14878 (commn-HA: Add portblock resource agents to tickle packets post failover(/back)) posted (#4) for review on master by soumya k (skoduri)

REVIEW: http://review.gluster.org/14878 (commn-HA: Add portblock RA to tickle packets post failover(/back)) posted (#5) for review on master by soumya k (skoduri)

REVIEW: http://review.gluster.org/14878 (commn-HA: Add portblock RA to tickle packets post failover(/back)) posted (#6) for review on master by soumya k (skoduri)

Created attachment 1186659 [details]
continuous-io.sh
Script to continuously generate I/O on a v3 mount point.
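The attachment itself is not inlined in this report. As a minimal sketch, assuming a timestamped dd write loop and a mount path of /mnt/nfs (both assumptions, not the actual contents of continuous-io.sh), such a generator could look like:

    #!/bin/bash
    # Sketch of a continuous I/O generator for an NFSv3 mount.
    # Assumes the server VIP is already mounted with vers=3 at $MOUNTPOINT.
    MOUNTPOINT=${1:-/mnt/nfs}

    while true; do
        # Timestamp every write so a stall shows up as a gap in the output.
        date
        dd if=/dev/zero of="$MOUNTPOINT/io_test_file" bs=1M count=10 \
           conv=fsync 2>/dev/null
        sleep 1
    done

The timestamping matters because the test detects stuck I/O as a gap between consecutive log entries.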
Created attachment 1186660 [details]
portblock_test.sh
Script to perform failover and failback in a loop (about 100 iterations) between two servers.
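Again as a hedged sketch only: assuming Pacemaker's pcs CLI drives the VIP moves, and that the node names and VIP resource name are as below (all assumptions, not the attached script), the loop could be shaped like this:

    #!/bin/bash
    # Sketch of a failover/failback loop between two HA nodes.
    # NODE_A, NODE_B and VIP_RESOURCE are placeholders for the real
    # cluster node names and the VIP resource (e.g. cluster_ip-1).
    NODE_A=server1
    NODE_B=server2
    VIP_RESOURCE=cluster_ip-1

    for i in $(seq 1 100); do
        echo "$(date) - Loop$i Starting Failover from $NODE_A"
        pcs resource move "$VIP_RESOURCE" "$NODE_B"
        sleep 10    # let client I/O run for a while after the VIP moves

        echo "$(date) - Loop$i Starting Failback to $NODE_A"
        pcs resource move "$VIP_RESOURCE" "$NODE_A"
        sleep 10
    done

The 10-second sleep after each move is what makes the client-side log usable: if no I/O lands within that window, the iteration flags the I/O as stuck.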
Created attachment 1186661 [details]
test_results_withfix
Test results with the fix applied.
Created attachment 1186668 [details]
test_results_withoutfix
Test results without fix applied.
To verify the newly introduced portblock RA, the following tests were performed on a 2-node nfs-ganesha HA setup.

On the client machine, the attached 'continuous-io.sh' script is run; it continuously generates I/O on a v3 mount (chosen because the grace period does not affect v3 clients) of VIPA, configured on one of the servers.

portblock_test.sh triggers failover and failback between the two nodes for about 100 iterations. After the VIP has successfully failed over or failed back, there is a 10-second sleep so that I/O can continue for some time. If no I/O is generated between two iterations, that indicates the I/O is stuck.

As can be seen from the attached test results (test_results_withoutfix), without the fix the I/O got stuck between a few iterations:

    Tue Aug 2 11:28:11 IST 2016 43.7
    Tue Aug 2 11:27:27 IST 2016 - Loop4 Starting Failover from 10.70.43.7
    Tue Aug 2 11:27:38 IST 2016 - Loop5 Completed Failover from 10.70.43.7
    Tue Aug 2 11:28:00 IST 2016 - Loop5 Starting Failback to 10.70.43.7
    Tue Aug 2 11:28:10 IST 2016 - Loop6
    Tue Aug 2 11:28:11 IST 2016

That was not the case with the fix applied (test_results_withfix).

REVIEW: http://review.gluster.org/14878 (commn-HA: Add portblock RA to tickle packets post failover(/back)) posted (#7) for review on master by soumya k (skoduri)

COMMIT: http://review.gluster.org/14878 committed in master by Niels de Vos (ndevos)

------

commit ea6a1ebe931e49464eb17205b94f5c87765cf696
Author: Soumya Koduri <skoduri>
Date: Fri Jul 8 12:30:25 2016 +0530

    commn-HA: Add portblock RA to tickle packets post failover(/back)

    Portblock resource-agents are used to send tickle ACKs so as to reset
    the outstanding tcp connections. This can be used to reduce the time
    taken by the NFS clients to reconnect post IP failover/failback.

    Two new resource agents (nfs_block and nfs_unblock) of type
    ocf:portblock with action block & unblock are created for each
    Virtual-IP (cluster_ip-1). These resource agents along with the
    cluster_ip-1 RA are grouped in the order block->IP->unblock, and the
    entire group maintains the same colocation rules so that they reside
    on the same node at any given point in time.

    The contents of tickle_dir are of the following format -
    * A file is created for each of the VIPs used in the ganesha cluster.
    * Each of those files contains entries about the clients connected,
      as below:
          SourceIP:port_num DestinationIP:port_num

    Hence when one server fails over, the connections of the clients
    connected to other VIPs are not affected.

    Note: During testing I observed that tickle ACKs are sent during
    failback but not during failover, though I/O successfully resumed
    post failover.

    Also added a dependency on the portblock RA for the glusterfs-ganesha
    package, as it may not be available (as part of the resource-agents
    package) in all distributions.

    Change-Id: Icad6169449535f210d9abe302c2a6971a0a96d6f
    BUG: 1354439
    Signed-off-by: Soumya Koduri <skoduri>
    Reviewed-on: http://review.gluster.org/14878
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Niels de Vos <ndevos>
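To make the block->IP->unblock arrangement concrete, here is a hedged sketch of how such a group could be assembled with pcs. The resource and group names, the tickle_dir path, and the exact portblock options are assumptions drawn from the commit message above, not the literal commands run by ganesha-ha.sh:

    # Sketch only: names, paths and addresses below are illustrative
    # assumptions.
    VIP=10.70.43.7    # one Virtual-IP of the ganesha cluster

    # 1. Block new TCP connections to the NFS port while the VIP moves.
    pcs resource create nfs_block ocf:heartbeat:portblock \
        protocol=tcp portno=2049 action=block ip=$VIP \
        --group cluster_ip-1-group

    # 2. The virtual IP itself sits between block and unblock.
    pcs resource create cluster_ip-1 ocf:heartbeat:IPaddr \
        ip=$VIP cidr_netmask=32 \
        --group cluster_ip-1-group --after nfs_block

    # 3. Unblock the port and send tickle ACKs for the connections
    #    recorded under tickle_dir, resetting stale client TCP state so
    #    NFS clients reconnect quickly. Each VIP has one file there,
    #    holding one line per client connection, e.g. (hypothetical
    #    addresses):
    #        10.70.43.7:2049 10.70.44.92:761
    pcs resource create nfs_unblock ocf:heartbeat:portblock \
        protocol=tcp portno=2049 action=unblock ip=$VIP \
        tickle_dir=/run/gluster/shared_storage/nfs-ganesha/tickle_dir \
        --group cluster_ip-1-group --after cluster_ip-1

Because all three resources live in one group, Pacemaker keeps them colocated and starts them in order, which gives the block->IP->unblock behaviour the commit describes.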
This bug is being closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.9.0, please open a new bug report.

glusterfs-3.9.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2016-November/029281.html
[2] https://www.gluster.org/pipermail/gluster-users/