1384748 – iSCSI failover time is too long when a gateway is shutdown

Summary: iSCSI failover time is too long when a gateway is shutdown

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	RBD
Sub Component:
Version:	2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	2.1
Assignee:	Mike Christie
QA Contact:	Hemanth Kumar
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1379890

Reported:	2016-10-14 04:33 UTC by Paul Cuzner
Modified:	2022-02-21 18:17 UTC
CC List:	3 users
Fixed In Version:	ceph-iscsi-config-1.3-1.el7cp
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-11-22 19:32:47 UTC
Embargoed:
Dependent Products:

Attachments
failover and failback effect seen from the client (54.98 KB, image/png) 2016-10-14 04:33 UTC, Paul Cuzner	no flags
Failover and failback with a RHEL client (34.81 KB, image/png) 2016-10-14 06:29 UTC, Paul Cuzner	no flags
windows event records (68.00 KB, application/octet-stream) 2016-10-14 06:57 UTC, Paul Cuzner	no flags
Failover on N/W Failure (362.06 KB, image/png) 2016-11-10 21:32 UTC, Hemanth Kumar	no flags
Failover on Reboot (180.91 KB, image/png) 2016-11-10 21:36 UTC, Hemanth Kumar	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2016:2815	0	normal	SHIPPED_LIVE	Moderate: Red Hat Ceph Storage security, bug fix, and enhancement update	2017-03-22 02:06:33 UTC

Description Paul Cuzner 2016-10-14 04:33:44 UTC

Created attachment 1210338
failover and failback effect seen from the client

Description of problem:
A Windows 2012r2 client is connected to a two-gateway environment that has a single LUN. When the gateway owning the LUN is powered off, IOPS on the client pause for over a minute. This is too long for most applications.

Version-Release number of selected component (if applicable):


How reproducible:
Each test shows the same outcome

Steps to Reproduce:
1. Connect a Windows 2012r2 client (maybe RHEL too?) to one LUN
2. Use fio to generate load on the LUN from the client
3. Power off the gateway node that 'owns' the LUN


Actual results:
failover time is > 1 minute

Expected results:
Path failover should complete within 20-30 seconds

Additional info:
There may be some best-practice changes for Windows that we need to adopt.
See: https://social.technet.microsoft.com/Forums/office/en-US/dfa3c7ea-3ee4-4a5b-95f1-74be87c7e75a/mpio-failover-time?forum=winservergen

Comment 2 Mike Christie 2016-10-14 05:16:16 UTC

If Linux is slow to fail over too, could you attach the kernel logs from the initiator and the gateway machines?

If only Windows is slow, could you just attach the Windows System event log? Open "Event Viewer" -> "Windows Logs" -> "System", save those events as an .evtx file, and attach it here.

Comment 3 Mike Christie 2016-10-14 05:17:50 UTC

Also, if only Windows is slow, could you add the output of

Get-MPIOSetting

from PowerShell?

Comment 4 Paul Cuzner 2016-10-14 06:29:19 UTC

Created attachment 1210362
Failover and failback with a RHEL client

Comment 5 Paul Cuzner 2016-10-14 06:56:50 UTC

PS C:\Users\Administrator> get-MPIOSetting


PathVerificationState     : Disabled
PathVerificationPeriod    : 30
PDORemovePeriod           : 20
RetryCount                : 3
RetryInterval             : 1
UseCustomPathRecoveryTime : Disabled
CustomPathRecoveryTime    : 40
DiskTimeoutValue          : 60

I have also attached the event records.

It looks like the disk timeout is the main reason. Is it defaulting to 60 seconds?
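
As a rough illustration of how the settings above could be checked, the following Python sketch parses the text output of Get-MPIOSetting pasted in this comment and picks out the disk timeout. The names parse_mpio_settings and SAMPLE are hypothetical, introduced only for this example; they are not part of any Windows or Ceph tooling.

```python
# Text copied from the Get-MPIOSetting output in this comment.
SAMPLE = r"""PS C:\Users\Administrator> get-MPIOSetting

PathVerificationState     : Disabled
PathVerificationPeriod    : 30
PDORemovePeriod           : 20
RetryCount                : 3
RetryInterval             : 1
UseCustomPathRecoveryTime : Disabled
CustomPathRecoveryTime    : 40
DiskTimeoutValue          : 60
"""


def parse_mpio_settings(text):
    """Parse 'Name : Value' lines into a dict (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        # Skip blank lines and the prompt line, whose "name" part contains a space.
        if not sep or not name or " " in name:
            continue
        # Numeric values become ints; states such as "Disabled" stay strings.
        settings[name] = int(value) if value.isdigit() else value
    return settings


if __name__ == "__main__":
    settings = parse_mpio_settings(SAMPLE)
    print("DiskTimeoutValue:", settings["DiskTimeoutValue"], "seconds")
```

Run against the output above, this reports the 60-second DiskTimeoutValue that exceeds the 20-30 second failover target from the description.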

Comment 6 Paul Cuzner 2016-10-14 06:57:24 UTC

Created attachment 1210382
windows event records

Comment 7 Paul Cuzner 2016-10-14 06:58:24 UTC

FYI - Linux (RHEL) failover was fine

Comment 8 Paul Cuzner 2016-10-15 00:02:38 UTC

The issue could not be reproduced in Mike's lab, so it is likely environmental in nature. Mike is investigating further.

Comment 9 Mike Christie 2016-10-18 01:23:33 UTC

It looks like Paul is hitting the worst case, where the command times out. The initiator does not detect the iSCSI-level connection failure, so we have to go through SCSI-level recovery. The timeouts we hit are:

DiskTimeout (60 secs) + SRB timeout (15) + Task Management (20) + Link Down (15)

I think we can safely lower the Disk and SRB timeouts. We cannot control the Task Management timeout. I think Link Down is pretty low already.

DiskTimeout (25 secs) + SRB timeout (5) + Task Management (20) + Link Down (15)

This still puts us at 65 seconds.
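
As a worked check of the arithmetic above, here is a minimal Python sketch that sums the four timers for the default and the proposed values. It assumes, as this comment does, that the worst case is every timer expiring in sequence. The dictionary and function names are hypothetical and exist only for this illustration.

```python
# Timeouts in seconds, taken from the two lists in this comment.
DEFAULT_TIMEOUTS = {
    "DiskTimeout": 60,
    "SRB timeout": 15,
    "Task Management": 20,
    "Link Down": 15,
}

# Proposed values: only DiskTimeout and the SRB timeout are lowered.
TUNED_TIMEOUTS = {**DEFAULT_TIMEOUTS, "DiskTimeout": 25, "SRB timeout": 5}


def worst_case_failover(timeouts):
    """Return the worst case in seconds, assuming every timer expires in turn."""
    return sum(timeouts.values())


if __name__ == "__main__":
    print("default:", worst_case_failover(DEFAULT_TIMEOUTS), "seconds")
    print("tuned:", worst_case_failover(TUNED_TIMEOUTS), "seconds")
```

With the default values the sum is 110 seconds; with the lowered Disk and SRB timeouts it is the 65 seconds quoted above.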

We can get it below 60 seconds by also setting EnableNOPOut = 1. This will detect that the iSCSI target is down, so we do not have to go through the SCSI task management process.


To set the disk timeout, set:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk\

DiskTimeout = 25

To set the iSCSI settings, set:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E97B-E325-11CE-BFC1-08002BE10318}\<Instance Number>\Parameters

EnableNOPOut = 1
SRBTimeoutDelta = 5

Comment 10 Mike Christie 2016-10-18 01:35:06 UTC

(In reply to Mike Christie from comment #9)
> We can get it sub 60 seconds by also setting EnableNOPOut = 1. This will
> detect the iSCSI target is down, so we do not have to go through the scsi
> task management process.

It looks like this is not always true. With IO in flight the worst case timeout is going to be 65 secs with the values suggested in the previous comment.

Comment 12 Mike Christie 2016-10-18 17:19:45 UTC

Just an update.

Hemanth, there are actually two bugs you will want to test for.

1. For a clean shutdown using reboot, the target does not cleanly shut down connections, so we can end up hitting the DiskTimeout. If we cleanly shut down connections, we can get the failover time to around 20 - 30 seconds, which is closer to Linux.

I am working on a patch for this.

2. For the unclean shutdown, the worst case seems to be the DiskTimeout expiring, as documented in comment #9.

2.A For general DiskTimeout errors, we should lower the values to the ones in comment #9.

2.B Those settings will help the unclean shutdown case, but I am also looking for a TCP timer that might help.

I will update the initiator setup doc for 2.A and 2.B.

Comment 14 Mike Christie 2016-10-21 20:45:41 UTC

Patch was merged in version ceph-iscsi-config-1.3-1.el7cp.

Comment 16 Hemanth Kumar 2016-11-10 21:32:04 UTC

Created attachment 1219556
Failover on N/W Failure

Hi Paul,

I failed the primary GW node's network and the failover happened within 15 seconds.
Failover is no longer taking 60 seconds.

Refer to the attachment for the Performance Monitor stats on Windows.

I will update the bug again after the reboot test.

Comment 17 Hemanth Kumar 2016-11-10 21:36:47 UTC

Created attachment 1219557
Failover on Reboot

I rebooted the primary GW node and the failover happened within 30 seconds.

Refer to the attached screenshot.

Comment 18 Hemanth Kumar 2016-11-10 21:37:36 UTC

Moving to Verified.

Comment 20 errata-xmlrpc 2016-11-22 19:32:47 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2815.html
