645343 – ISCSI/multipath hang - must propagate SCSI device deletion to DM mpath

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.5
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Mike Snitzer
QA Contact:	Gris Ge
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	674932 708914
Depends On:
Blocks:	456503 621241 669411

Reported:	2010-10-21 11:04 UTC by Menny Hamburger
Modified:	2018-11-14 15:36 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	669411
Environment:	multipath/iscsi over Dell MD32xxi storage (RDAC)
Last Closed:	2011-07-21 09:48:14 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
Patch to offline devices before stopping iscsi (495 bytes, patch) 2010-11-14 11:39 UTC, Menny Hamburger
Call pg_init_done directly from mpath code when device is removed (3.17 KB, patch) 2010-12-12 08:37 UTC, Menny Hamburger
call pg_init_done directly from mpath when H/W handler does not (2.81 KB, patch) 2010-12-13 07:19 UTC, Menny Hamburger
missing pg_init_done - a more general implementation (2.47 KB, patch) 2010-12-14 09:52 UTC, Menny Hamburger
Propagate SCSI device deletion to multipath layer - SCSI H/W handler part (2.43 KB, patch) 2010-12-15 08:38 UTC, Menny Hamburger
Handle device deletion by multipath (687 bytes, patch) 2010-12-15 08:43 UTC, Menny Hamburger
Updated patch - same as the one applied over Vanilla (704 bytes, patch) 2011-01-13 07:34 UTC, Menny Hamburger

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2011:1065	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 5.7 kernel security and bug fix update	2011-07-21 09:21:37 UTC

Description Menny Hamburger 2010-10-21 11:04:15 UTC

multipath -ll output when all is OK:
lun1 (360026b900034975f000037f64c870ba7) dm-0 DELL,MD32xxi
[size=9.1T][features=2 pg_init_retries 1][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=400][active]
 \_ 12:0:0:0  sdc 8:32   [active][ready]
 \_ 6:0:0:0   sde 8:64   [active][ready]
 \_ 11:0:0:0  sdg 8:96   [active][ready]
 \_ 9:0:0:0   sdi 8:128  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 10:0:0:0  sdb 8:16   [active][ghost]
 \_ 5:0:0:0   sdd 8:48   [active][ghost]
 \_ 7:0:0:0   sdf 8:80   [active][ghost]
 \_ 8:0:0:0   sdh 8:112  [active][ghost]
lun0 (360026b90002fc58a00002a544c870c82) dm-1 DELL,MD32xxi
[size=9.1T][features=2 pg_init_retries 1][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=400][active]
 \_ 10:0:0:1  sdr 65:16  [active][ready]
 \_ 5:0:0:1   sds 65:32  [active][ready]
 \_ 8:0:0:1   sdt 65:48  [active][ready]
 \_ 7:0:0:1   sdu 65:64  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:0:1   sdj 8:144  [active][ghost]
 \_ 9:0:0:1   sdk 8:160  [active][ghost]
 \_ 11:0:0:1  sdl 8:176  [active][ghost]
 \_ 12:0:0:1  sdm 8:192  [active][ghost]
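
For scripted checks, the per-path lines of a listing like the one above can be reduced to device name and checker state. This is only a minimal sketch: it assumes the RHEL 5 `multipath -ll` layout shown above, and the function name `list_paths` is hypothetical, not part of multipath-tools.

```shell
# list_paths: read `multipath -ll` output on stdin and print
# "<sd device> <checker state>" for every path line.
# Assumes path lines look like: " \_ H:C:T:L sdX maj:min [dm state][checker state]".
list_paths() {
    awk '$1 == "\\_" && $2 ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/ {
        state = $NF
        sub(/^.*\]\[/, "", state)   # keep the text after the last "]["
        sub(/\]$/, "", state)       # drop the trailing "]"
        print $3, state
    }'
}
```

Usage would be along the lines of `multipath -ll lun1 | list_paths`. Path-group lines (the `round-robin` rows) are skipped because their second field is not an H:C:T:L address.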


Scenario:
1) Do I/O on the multipath device (lunN)
2) /etc/init.d/iscsi stop

Expected Result:
After retries the device fails and the application writing receives an error.

Actual Result:
The application writing to device gets stuck forever - need to reboot the machine

To test this we turned off queue_if_no_path by setting no_path_retry to 0.
The problem still happens.
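
One way to confirm that queueing really is off is to look at the feature words in the live device-mapper table. The following is a sketch under assumptions: it expects a multipath target line of the form `start length multipath <n_features> <feature words...> ...`, as printed by `dmsetup table <name>`, and `queues_if_no_path` is a hypothetical helper name.

```shell
# queues_if_no_path: exit 0 if a dm multipath table line read from stdin
# lists queue_if_no_path among its features, exit 1 otherwise.
# Assumes field 4 is the feature count, followed by that many feature words.
queues_if_no_path() {
    awk '$3 == "multipath" {
        for (i = 5; i < 5 + $4; i++)
            if ($i == "queue_if_no_path") found = 1
    }
    END { exit(found ? 0 : 1) }'
}
```

For example, `dmsetup table lun1 | queues_if_no_path && echo "still queueing"`. Only the counted feature words are inspected, so a match elsewhere on the line cannot produce a false positive.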

This is our device configuration in multipath.conf:

        device {
                vendor                  "DELL"
                product                 "(MD32xx|MD32xxi|MD3000|MD3000i)"
                path_grouping_policy    group_by_prio
                prio                    rdac
                prio_callout            "/sbin/mpath_prio_rdac /dev/%n"
                polling_interval        5
                path_checker            rdac
                path_selector           "round-robin 0"
                hardware_handler        "1 rdac"
                failback                immediate
                no_path_retry           0
                features                "2 pg_init_retries 1"
                rr_min_io               10
        }

iscsid configuration timeouts/retries:
node.session.timeo.replacement_timeout = 20
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 20
node.session.initial_login_retry_max = 8

Comment 1 Menny Hamburger 2010-11-08 15:20:39 UTC

2.6.28+ kernels have an option to abort all requests on a request queue (blk_abort_queue). This function is called in dm-mpath.c from inside deactivate_path. This, together with generic device timeout handling, is likely what needs to be backported in order to fix this issue.

Comment 2 Menny Hamburger 2010-11-14 11:39:59 UTC

Created attachment 460359 [details]
Patch to offline devices before stopping iscsi

I have traced the problem to the device handler detach code.
When doing iscsi stop, the device is removed together with the device handler.
The multipath layer knows nothing about the device handler removal and continues to allow I/O to be stacked up on the device request queue until
multipathd fails the path. The result is an I/O hang: the queued requests will never be serviced, and the machine has to be rebooted to make things work again.
Although the queue abort functionality would solve this issue, it looks quite complex to apply to the current RHEL5 code base.
I have attached a patch to the iscsi script, which offlines all the devices currently associated with iscsi prior to logging out and removing the devices.
udevsettle and udevtrigger are necessary to ensure that the device removal is propagated to user space.
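
The attachment itself is not reproduced in this report. The idea can be sketched as follows, assuming the SCSI `state` attribute in sysfs accepts the value `offline`. The function name, the sysfs-root argument (present only so the loop can be dry-run against a scratch directory), and the way the device list is obtained are all hypothetical and not taken from the patch.

```shell
# offline_devices SYSFS_ROOT DEV...
# Write "offline" to the SCSI state attribute of each named sd device so that
# new I/O is rejected before the iSCSI session is torn down.
offline_devices() {
    sysfs_root=$1
    shift
    for dev in "$@"; do
        state_file="$sysfs_root/block/$dev/device/state"
        if [ -w "$state_file" ]; then
            echo offline > "$state_file"
        else
            echo "skipping $dev: $state_file not writable" >&2
        fi
    done
}
```

In an init script this would be called as something like `offline_devices /sys sdb sdc` before logging out of the sessions, with the udevtrigger and udevsettle steps mentioned above run around the removal.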

Comment 3 Menny Hamburger 2010-12-12 08:37:32 UTC

Created attachment 468193 [details]
Call pg_init_done directly from mpath code when device is removed

When scsi_dh_activate returns SCSI_DH_NOSYS, the H/W handler callback is not called, pg_init_done is not called in the multipath layer, and pending I/O is requeued forever; this causes all userland processes currently performing I/O on the device to hang. A similar situation occurs when the device has transitioned to SDEV_CANCEL/SDEV_DEL and the device handler data has not yet been deleted.

The easiest way to reproduce this is in an ISCSI environment:
> dd if=/dev/dm-0 of=/dev/zero bs=8k count=10000 &
> /etc/init.d/iscsi stop
In this example, dd will hang forever on I/O, and the only way to release it is to reboot the machine.

This patch calls pg_init_done directly from the mpath code when the scsi_dh_activate call returns either SCSI_DH_NOSYS or SCSI_DH_DEV_OFFLINED. Additional code was added in scsi_dh_rdac to make sure pg_init_done is not invoked twice (by both scsi_dh_rdac and mpath), which would result in an undesirable situation where pg_init_in_progress becomes negative.

Note:
When running an upstream kernel, the above scenario may not occur because the request queue is aborted in dm-mpath.c:fail_path.
This patch makes sure the problem does not occur at all, rather than handling it when it does. In addition, it seems too risky to apply request queue abort functionality on RHEL5 at this stage.

Comment 4 Menny Hamburger 2010-12-13 07:19:21 UTC

Created attachment 468313 [details]
call pg_init_done directly from mpath when H/W handler does not

Comment 5 Menny Hamburger 2010-12-14 09:52:21 UTC

Created attachment 468570 [details]
missing pg_init_done - a more general implementation

This can replace 468313; however, I'm leaving the old one here since this one is more general and may require testing other H/W handlers and perhaps other dm devices.

The test for error on activate is moved into the scsi_dh layer.

Comment 6 Menny Hamburger 2010-12-15 08:38:45 UTC

Created attachment 468793 [details]
Propagate SCSI device deletion to multipath layer - SCSI H/W handler part

dm-devel requested that the patch be divided into two separate fixes.
This is the SCSI H/W handler part of the fix.

Comment 7 Menny Hamburger 2010-12-15 08:43:03 UTC

Created attachment 468797 [details]
Handle device deletion by multipath

This is the multipath part of the fix.

Comment 8 Mike Snitzer 2010-12-15 16:16:19 UTC

(In reply to comment #1)
> 2.6.28+ kernels have an option to abort all requests on a request queue
> (blk_abort_queue). This function is called in dm-mpath.c from inside
> deactivate_path. This, together with generic device timeout handling like what
> needs to be backported in order to fix this issue.

FYI, the mpath call to blk_abort_queue has proven problematic and will be reverted upstream (and in RHEL6).  So it is good that you have a solution that doesn't rely on it.

Comment 9 Mike Snitzer 2010-12-15 16:22:29 UTC

(In reply to comment #7)
> Created attachment 468797 [details]
> Handle device deletion by multipath
> 
> This is the multipath part of the fix.

As I shared on dm-devel:

"Babu already pointed out that you don't need this mpath change as the default case already performs fail_path()."

Or are you saying that the block you're hooking SCSI_DH_DEV_OFFLINED off of (SCSI_DH_NOSYS) provides valuable functionality (via !m->hw_handler_name conditional and printk)?

Comment 10 Mike Snitzer 2010-12-15 16:27:28 UTC

Moving back to NEW and assigning bug to myself.  "MODIFIED" is reserved for internal Red Hat procedures.

A bug flows like this:
1) change gets accepted upstream
2) patch is posted to an internal Red Hat list for inclusion in appropriate kernel(s), bug transitions to POST
3) once the patch is reviewed/ack'd internally and accepted, the bug will transition to MODIFIED
4) the bug then transitions to ON_QA

Comment 11 Menny Hamburger 2010-12-15 20:14:34 UTC

Thank you for the thorough explanation.

Your comment is correct - the default case in pg_init_done does call fail_path.
Adding this together with SCSI_DH_NOSYS case is therefore not required.

M.

Comment 12 Menny Hamburger 2011-01-13 07:34:21 UTC

Created attachment 473259 [details]
Updated patch - same as the one applied over Vanilla

Comment 13 Mike Snitzer 2011-01-13 15:41:24 UTC

And here is a reference to the patch that was submitted to linux-scsi:
http://marc.info/?l=dm-devel&m=129252985123755&w=2

Comment 14 RHEL Program Management 2011-02-01 16:51:24 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 16 Jarod Wilson 2011-04-08 16:26:00 UTC

Patch(es) available in kernel-2.6.18-256.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.

Comment 22 Gris Ge 2011-06-07 01:55:44 UTC

Dell,
We don't have hardware to test this bug.
Can you guys verify it with the newly built kernel?

Any test detail will be appreciated.

Comment 23 Chao Ye 2011-06-15 05:17:09 UTC

Confirmed: the patch is in the kernel git tree.

Comment 24 Gris Ge 2011-06-22 08:08:38 UTC

Can Dell confirm whether the kernel build fixes the problem?

Comment 25 Menny Hamburger 2011-06-22 10:11:04 UTC

We are currently using a RHEL5.6 based kernel in our product and not an upstream kernel so testing this may require additional work.

During my testing I found that the following method gives the same results: instead of /etc/init.d/iscsi stop, one can write a script that goes through the sd devices and, for each one, calls:
echo 1 > /sys/block/sdN/device/delete

Providing you have FC RDAC based storage (which is easier to find), I think this will give the same results.
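
A script of the kind described in this comment might look like the sketch below. It relies only on the sysfs `delete` attribute named above; the function name and the sysfs-root argument (there so the loop can be exercised against a scratch directory instead of real devices) are hypothetical.

```shell
# delete_devices SYSFS_ROOT DEV...
# Ask the SCSI layer to remove each named sd device by writing 1 to its
# sysfs "delete" attribute.
delete_devices() {
    sysfs_root=$1
    shift
    for dev in "$@"; do
        delete_file="$sysfs_root/block/$dev/device/delete"
        if [ -e "$delete_file" ]; then
            echo 1 > "$delete_file"
        else
            echo "skipping $dev: no $delete_file" >&2
        fi
    done
}
```

With I/O running against the multipath device, the paths of lun1 from the description would be removed with `delete_devices /sys sdc sde sdg sdi sdb sdd sdf sdh`.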

Comment 26 Gris Ge 2011-06-24 08:50:00 UTC

(In reply to comment #25)

> 
> Providing you have FC RDAC based storage (which is easier to find), I think
> this will give the same results.

The problem is, I cannot find any RDAC based iscsi target in Red Hat.

Comment 27 Gris Ge 2011-06-24 10:04:19 UTC

(In reply to comment #26)
> (In reply to comment #25)
> 
> > 
> > Providing you have FC RDAC based storage (which is easier to find), I think
> > this will give the same results.
> 
> The problem is, I cannot find any RDAC based iscsi target in Red Hat.

For FC, some teams have it; I will try my best.

Comment 28 Gris Ge 2011-06-27 04:32:36 UTC

No hardware.

Sanity only: the patch is applied in kernel -269.

Comment 29 errata-xmlrpc 2011-07-21 09:48:14 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html

Comment 30 Mike Christie 2011-08-11 21:14:15 UTC

*** Bug 674932 has been marked as a duplicate of this bug. ***

Comment 31 Mike Snitzer 2011-09-23 21:17:07 UTC

*** Bug 708914 has been marked as a duplicate of this bug. ***
