Bug 1384748 - iSCSI failover time is too long when a gateway is shutdown
Summary: iSCSI failover time is too long when a gateway is shutdown
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RBD
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 2.1
Assignee: Mike Christie
QA Contact: Hemanth Kumar
URL:
Whiteboard:
Depends On:
Blocks: 1379890
Reported: 2016-10-14 04:33 UTC by Paul Cuzner
Modified: 2017-07-30 15:31 UTC
CC List: 3 users

Fixed In Version: ceph-iscsi-config-1.3-1.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-22 19:32:47 UTC
Target Upstream Version:


Attachments
failover and failback effect seen from the client (54.98 KB, image/png)
2016-10-14 04:33 UTC, Paul Cuzner
Failover and failback with a RHEL client (34.81 KB, image/png)
2016-10-14 06:29 UTC, Paul Cuzner
windows event records (68.00 KB, application/octet-stream)
2016-10-14 06:57 UTC, Paul Cuzner
Failover on N/W Failure (362.06 KB, image/png)
2016-11-10 21:32 UTC, Hemanth Kumar
Failover on Reboot (180.91 KB, image/png)
2016-11-10 21:36 UTC, Hemanth Kumar


Links
Red Hat Product Errata RHSA-2016:2815 (priority: normal, status: SHIPPED_LIVE): Moderate: Red Hat Ceph Storage security, bug fix, and enhancement update (last updated 2017-03-22 02:06:33 UTC)

Description Paul Cuzner 2016-10-14 04:33:44 UTC
Created attachment 1210338 [details]
failover and failback effect seen from the client

Description of problem:
A Windows 2012r2 client is connected to a two-gateway environment that has a single LUN. When the gateway owning the LUN is powered off, IOPS on the client pause for over a minute, which is too long for most applications.

Version-Release number of selected component (if applicable):


How reproducible:
Each test shows the same outcome

Steps to Reproduce:
1. Connect a Windows 2012r2 client (possibly RHEL as well) to a single LUN
2. Use fio to generate load on the LUN from the client
3. Power off the gateway node that 'owns' the LUN


Actual results:
Failover takes more than 1 minute

Expected results:
Path failover should complete within 20-30 seconds
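One way to quantify the failover time from the client side is to find the longest run of idle samples in per-interval IOPS data (from fio or Windows Performance Monitor, for example). The sketch below assumes the samples are already available as an evenly spaced list; the function names, the idle threshold, and the 30-second target (the upper end of the expected range) are illustrative choices, not part of any tool.

```python
from typing import Sequence


def longest_io_pause(iops: Sequence[float], interval_s: float = 1.0,
                     idle_threshold: float = 0.0) -> float:
    """Return the longest contiguous stretch, in seconds, where IOPS <= idle_threshold.

    `iops` is assumed to be evenly spaced samples taken every `interval_s`
    seconds. The longest idle stretch approximates the failover pause that
    the application sees.
    """
    longest = current = 0
    for sample in iops:
        if sample <= idle_threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest * interval_s


def failover_within_target(iops: Sequence[float], target_s: float = 30.0,
                           interval_s: float = 1.0) -> bool:
    """True if the longest I/O pause fits inside the expected failover window."""
    return longest_io_pause(iops, interval_s) <= target_s


if __name__ == "__main__":
    # Hypothetical samples: steady load, a 70-second stall, then recovery.
    samples = [1500.0] * 10 + [0.0] * 70 + [1400.0] * 10
    print(longest_io_pause(samples))        # 70.0
    print(failover_within_target(samples))  # False
```

Counting consecutive idle samples, rather than the total idle time, keeps a brief unrelated dip elsewhere in the run from being added to the failover pause.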

Additional info:
There may be some Windows best-practice changes that we need to adopt. See:
https://social.technet.microsoft.com/Forums/office/en-US/dfa3c7ea-3ee4-4a5b-95f1-74be87c7e75a/mpio-failover-time?forum=winservergen

Comment 2 Mike Christie 2016-10-14 05:16:16 UTC
If Linux is also slow to fail over, could you attach the kernel logs from the initiator and gateway machines?

If only Windows is slow, could you just attach the Windows system event viewer logs? Open "Event Viewer" -> "Windows Logs" -> "System", save those events as an .evtx file, and attach it here.

Comment 3 Mike Christie 2016-10-14 05:17:50 UTC
Also, if only Windows is slow, could you add the output of the following PowerShell command?

Get-MPIOSetting

Comment 4 Paul Cuzner 2016-10-14 06:29:19 UTC
Created attachment 1210362 [details]
Failover and failback with a RHEL client

Comment 5 Paul Cuzner 2016-10-14 06:56:50 UTC
PS C:\Users\Administrator> get-MPIOSetting


PathVerificationState     : Disabled
PathVerificationPeriod    : 30
PDORemovePeriod           : 20
RetryCount                : 3
RetryInterval             : 1
UseCustomPathRecoveryTime : Disabled
CustomPathRecoveryTime    : 40
DiskTimeoutValue          : 60

I have also attached the event records.

It looks like the disk timeout is the main cause. Does DiskTimeoutValue default to 60 seconds?
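The check on DiskTimeoutValue can be done mechanically. The sketch below parses the `Name : Value` lines that Get-MPIOSetting prints (the sample is copied from the output in this comment) and flags a DiskTimeoutValue above 25 seconds, the value suggested later in comment #9. The helper names and the limit are assumptions for illustration; on a real system the PowerShell output would first need to be captured as text.

```python
from typing import Dict

# Output from comment #5 (Get-MPIOSetting on the Windows 2012r2 client).
SAMPLE_OUTPUT = """\
PathVerificationState     : Disabled
PathVerificationPeriod    : 30
PDORemovePeriod           : 20
RetryCount                : 3
RetryInterval             : 1
UseCustomPathRecoveryTime : Disabled
CustomPathRecoveryTime    : 40
DiskTimeoutValue          : 60
"""


def parse_mpio_settings(text: str) -> Dict[str, str]:
    """Parse 'Name : Value' lines into a dict; other lines are ignored."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        # Setting names are single tokens; this skips prompt lines such as "PS C:\...".
        if sep and name and " " not in name:
            settings[name] = value.strip()
    return settings


def disk_timeout_too_high(settings: Dict[str, str], limit_s: int = 25) -> bool:
    """True if DiskTimeoutValue (in seconds) exceeds the chosen limit."""
    return int(settings["DiskTimeoutValue"]) > limit_s


settings = parse_mpio_settings(SAMPLE_OUTPUT)
print(settings["DiskTimeoutValue"])     # 60
print(disk_timeout_too_high(settings))  # True
```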

Comment 6 Paul Cuzner 2016-10-14 06:57:24 UTC
Created attachment 1210382 [details]
windows event records

Comment 7 Paul Cuzner 2016-10-14 06:58:24 UTC
FYI - Linux (RHEL) failover was fine

Comment 8 Paul Cuzner 2016-10-15 00:02:38 UTC
The issue could not be reproduced in Mike's lab, so it is likely environmental. Mike is investigating further.

Comment 9 Mike Christie 2016-10-18 01:23:33 UTC
It looks like Paul is hitting the worst case, where the command times out. The initiator does not detect the iSCSI-level connection failure, so we have to go through SCSI-level recovery. The timeouts we hit are:

DiskTimeout (60 secs) + SRB timeout (15) + Task Management (20) + Link Down (15) = 110 secs

I think we can safely lower DiskTimeout and the SRB timeout. We cannot control the task management timeout, and I think Link Down is already fairly low. With lower values:

DiskTimeout (25 secs) + SRB timeout (5) + Task Management (20) + Link Down (15)

This still puts us at 65 seconds.

We can get it under 60 seconds by also setting EnableNOPOut = 1. This lets the initiator detect that the iSCSI target is down, so we do not have to go through the SCSI task management process.
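The timeout budget in this comment can be checked with a small sketch. It assumes, as the comment implies, that the four timeouts expire one after another, so the worst case is their plain sum. The dictionary keys and the `worst_case_failover` helper are illustrative names, not Windows setting names. The `skip_task_management` flag models the hoped-for EnableNOPOut effect under the assumption that it removes only the task management step; comment #10 notes that this does not always hold with I/O in flight.

```python
from typing import Dict

# Worst-case recovery steps from comment #9, in seconds.
ORIGINAL_TIMEOUTS = {"DiskTimeout": 60, "SRBTimeout": 15, "TaskManagement": 20, "LinkDown": 15}
TUNED_TIMEOUTS = {"DiskTimeout": 25, "SRBTimeout": 5, "TaskManagement": 20, "LinkDown": 15}


def worst_case_failover(timeouts: Dict[str, int], skip_task_management: bool = False) -> int:
    """Sum the timeouts, assuming each one expires only after the previous one."""
    total = sum(timeouts.values())
    if skip_task_management:
        # Assumption: NOP-Out detection avoids only the task management step.
        total -= timeouts["TaskManagement"]
    return total


print(worst_case_failover(ORIGINAL_TIMEOUTS))                           # 110
print(worst_case_failover(TUNED_TIMEOUTS))                              # 65
print(worst_case_failover(TUNED_TIMEOUTS, skip_task_management=True))   # 45
```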


To set the disk timeout, set the following under:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk\

TimeOutValue = 25 (the disk timeout, in seconds)

To set the iSCSI settings, set the following under:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E97B-E325-11CE-BFC1-08002BE10318}\<Instance Number>\Parameters

EnableNOPOut = 1
SRBTimeoutDelta = 5
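The registry changes in this comment could also be packaged as a .reg file for import with regedit. This is a sketch under two assumptions: the disk timeout is written to the `TimeOutValue` value under the Disk key (the name Windows uses for the disk timeout; the comment calls it DiskTimeout), and `instance` is the initiator's `<Instance Number>` subkey, which differs per machine and has to be looked up first. The function name is hypothetical, and the generated file should be reviewed before it is imported.

```python
def iscsi_timeout_reg(instance: str, disk_timeout: int = 25,
                      enable_nop_out: int = 1, srb_timeout_delta: int = 5) -> str:
    """Build .reg file text applying the comment #9 values.

    `instance` is the <Instance Number> subkey of the iSCSI initiator under
    the SCSI adapter class key; it must be looked up on the target machine.
    """
    def dword(value: int) -> str:
        # .reg files encode REG_DWORD data as 8 hexadecimal digits.
        return f"dword:{value:08x}"

    iscsi_key = (
        "HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Control\\Class\\"
        "{4D36E97B-E325-11CE-BFC1-08002BE10318}\\" + instance + "\\Parameters"
    )
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\Disk]",
        f'"TimeOutValue"={dword(disk_timeout)}',
        "",
        f"[{iscsi_key}]",
        f'"EnableNOPOut"={dword(enable_nop_out)}',
        f'"SRBTimeoutDelta"={dword(srb_timeout_delta)}',
        "",
    ]
    # .reg files conventionally use Windows (CRLF) line endings.
    return "\r\n".join(lines)
```

Calling `iscsi_timeout_reg(instance)` with the instance subkey found on the client returns text that can be saved as a `.reg` file; the defaults mirror the 25/1/5 values above and can be overridden per parameter.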

Comment 10 Mike Christie 2016-10-18 01:35:06 UTC
(In reply to Mike Christie from comment #9)
> We can get it sub 60 seconds by also setting EnableNOPOut = 1. This will
> detect the iSCSI target is down, so we do not have to go through the scsi
> task management process.

It looks like this is not always true. With I/O in flight, the worst-case timeout is going to be 65 seconds with the values suggested in the previous comment.

Comment 12 Mike Christie 2016-10-18 17:19:45 UTC
Just an update.

Hemanth, there are actually two bugs you will want to test for.

1. For a clean shutdown using reboot, the target does not cleanly shut down its connections, so we can end up hitting the DiskTimeout. If we cleanly shut down the connections, we can get the failover time to around 20-30 seconds, which is closer to Linux.

I am working on a patch for this.

2. For an unclean shutdown, the worst case seems to be the DiskTimeout expiring, as documented in comment #9.

2.A For general DiskTimeout errors, we should lower the values to those in comment #9.

2.B Those settings will help the unclean shutdown case, but I am also looking into whether a TCP timer could help.

I will update the initiator setup doc for 2.A and 2.B.

Comment 14 Mike Christie 2016-10-21 20:45:41 UTC
Patch was merged in version ceph-iscsi-config-1.3-1.el7cp.

Comment 16 Hemanth Kumar 2016-11-10 21:32:04 UTC
Created attachment 1219556 [details]
Failover on N/W Failure

Hi Paul,

I failed the primary gateway node's network, and failover happened within 15 seconds.
Failover no longer takes 60 seconds.

Refer to the attachment for the Performance Monitor stats on Windows.

I will post the same results after a reboot.

Comment 17 Hemanth Kumar 2016-11-10 21:36:47 UTC
Created attachment 1219557 [details]
Failover on Reboot

I rebooted the primary gateway node, and failover happened within 30 seconds.

Refer to the attached screenshot.

Comment 18 Hemanth Kumar 2016-11-10 21:37:36 UTC
Moving to VERIFIED.

Comment 20 errata-xmlrpc 2016-11-22 19:32:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2815.html

