Bug 1402099 - ZWS: multipath takes too long to add restored paths back to map
Summary: ZWS: multipath takes too long to add restored paths back to map
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: device-mapper-multipath
Version: 6.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Ben Marzinski
QA Contact: Lin Li
URL:
Whiteboard:
Depends On:
Blocks: 1422991 1425546
 
Reported: 2016-12-06 19:11 UTC by George Beshers
Modified: 2017-04-19 21:15 UTC
CC List: 17 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-30 18:28:21 UTC
Target Upstream Version:


Attachments
Requested data for comment #4. (233.23 KB, application/x-gzip)
2016-12-07 22:26 UTC, Michael Reed
udev rules file to disable blkid on multipath devices with no valid paths (1.06 KB, text/plain)
2016-12-09 17:54 UTC, Ben Marzinski
patch for /lib/udev/rules.d/40-multipath.rules to fix kpartx hang (427 bytes, patch)
2016-12-09 18:10 UTC, Ben Marzinski
Attachment associated with comment #9 (208.13 KB, application/x-gzip)
2016-12-09 22:05 UTC, Michael Reed

Description George Beshers 2016-12-06 19:11:10 UTC
Description of problem:

Test environment
----------------
RH6.8 - 2.6.32-642.el6.x86_64
device-mapper-multipath-0.4.9-96.el6.x86_64 (bugzilla 1365710)
also tested with: device-mapper-multipath-0.4.9-93.el6.x86_64
defaults {
        polling_interval        15 
        no_path_retry           12
        path_checker            directio
        user_friendly_names     no
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        prio                    const
}
SMC946 90 bay enclosure fully populated with a variety of SAS (no SATA) devices
02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
03:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
[root@perf3094 mdr]# cat /sys/class/scsi_host/host[01]/version_fw
10.00.00.00
10.00.00.00

Problem description
-------------------
Fault insertion reveals that multipath takes too long to begin adding paths
back to maps after both cables have been disconnected and a single cable
has been restored.  This causes multipath to fail queued I/O even though
a path to the device is available.


Test
----

For each /dev/mapper partition device in the enclosure (90 devices):
  dd if=/dev/mapper/part_mapname of=/dev/null bs=512k iflag=direct &
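
A minimal shell sketch of this loop (the "*p1" glob is an assumption about
the partition map naming):

  for part in /dev/mapper/*p1; do
      dd if="$part" of=/dev/null bs=512k iflag=direct &
  done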

Pull cable connecting host0 to the enclosure
  - verify i/o continues using just the remaining path

Pull cable connecting host1 to the enclosure
  - observe syslog messages indicating that "recovery mode" was entered

Reconnect either host0 or host1 SAS cable
  - observe SCSI infrastructure bringing paths back online

Wait, wait, wait for multipath to add paths to map and resume i/o
  - observe multipath failing I/O after recovery mode timeout
  - also observe multipath adding paths back to maps at the same time
  - observe all "dd" processes failing with EIO
  - roughly 180 seconds from recovery mode entered to i/o failures

     180 second timeout (12*15)
     10:40:00 disable hba1
     10:40:18 multipathd removed paths from maps
     10:43:00 disable hba2
     10:43:14 multipathd removed paths from maps, retrying
     10:44:00 reenabled hba2
       - multipathd does not restore paths to maps, i/o remains stopped
     10:46:09 i/o terminated
     10:46:20 paths added back to maps, too late

If I set the timeout to 10 minutes, it takes multipath approximately
5-6 minutes to begin adding paths back and resuming i/o.
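
For reference, the queueing window is no_path_retry * polling_interval, so a
10-minute window corresponds to, e.g.:

defaults {
        polling_interval        15
        no_path_retry           40      # 40 * 15 s = 600 s of queueing
}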


Analysis
--------

As near as I can tell, the problem stems from udev rule 13-dm-disk.rules
inappropriately running "blkid" and/or 40-multipath.rules running "kpartx"
when the last path to the mapper device has been failed.  These programs
hang as the map has no valid path.  Commenting out the execution of these
programs results in multipath discovering the newly restored paths in a timely
manner.
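
For illustration, the helper invocations in question are rules of roughly
this form (paraphrased, not the exact RHEL 6 rule text):

# 13-dm-disk.rules: probe the device for filesystem metadata
IMPORT{program}="blkid -o udev -p $tempnode"

# 40-multipath.rules: scan the device for partitions
RUN+="kpartx -a -p p $tempnode"

Both helpers open and read the mapper device; with no valid paths and
queueing enabled, that read blocks until a path returns or queueing times out.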

40-multipath.rules has a line intended to suppress execution of kpartx:

ENV{DM_ACTION}=="PATH_FAILED", GOTO="end_mpath"

but it does not appear to work.  udevadm monitor shows the DM_ACTION value
properly set to PATH_FAILED:

KERNEL[1480449937.733499] change   /devices/virtual/block/dm-78 (block)
UDEV_LOG=3
ACTION=change
DEVPATH=/devices/virtual/block/dm-78
SUBSYSTEM=block
DM_TARGET=multipath
DM_ACTION=PATH_FAILED
DM_SEQNUM=14
DM_PATH=67:192
DM_NR_VALID_PATHS=0
DM_NAME=35000c5002615610b
DM_UUID=mpath-35000c5002615610b
DEVNAME=dm-78
DEVTYPE=disk
SEQNUM=165461
MAJOR=253
MINOR=78


Workaround
----------
Increase timeout to between 6 and 10 minutes.


Comment 3 George Beshers 2016-12-07 14:11:36 UTC
Note: ZWS - Zero Watt Storage.  Previously known as JBFS.

Comment 4 Ben Marzinski 2016-12-07 19:41:50 UTC
Are you sure that kpartx is hanging on the change event you posted in the
problem description?

It would be easier for me to debug this if I could see the output of

# udevadm monitor --udev --property

and the contents of

/var/log/messages

from while you are running your test. Also, if you set

fast_io_fail_tmo 5
dev_loss_tmo 600

in your multipath.conf defaults section, does that work around the issue?
This will make your SCSI devices fail IOs to down devices after 5 seconds,
but not remove failed devices from the system for 10 minutes.  This will let
multipath respond quickly to errors on the devices (the fast_io_fail_tmo
parameter is set to 5 seconds by default in RHEL7), but it won't remove the
devices right away.  This means that multipathd doesn't need to rely on udev
to restore access to the device.

Comment 5 Michael Reed 2016-12-07 22:20:44 UTC
The change event posted was an "example" from udevadm monitor, showing that
DM_ACTION=PATH_FAILED was being generated.  I doubt that the specific event
I posted was directly tied to a specific kpartx invocation.  In general,
it's blkid that is hanging; occasionally a kpartx is also observed hanging.

I will capture the requested syslog and udevadm monitor output
for you and post to this bug.

First, I am trying the suggested change to multipath.conf.  Keep in mind that
the storage targets in question are connected to an LSI SAS adapter, not Fibre
Channel.  The multipath.conf.8 documentation suggests that the parameters
you've asked us to set are limited in scope to FC.

After running the test, I can report that the requested changes to 
multipath.conf do not work around the issue.  They appear to have no effect.
Below are the settings used for the test.

defaults {
        path_checker            directio
#       checker_timeout         10
#       force_sync              yes
        polling_interval        15
#       no_path_retry           40
        no_path_retry           12
#       no_path_retry           queue
        fast_io_fail_tmo        5
        dev_loss_tmo            600
        user_friendly_names     no
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        prio                    const
#               verbosity               3
}

Comment 6 Michael Reed 2016-12-07 22:26:12 UTC
Created attachment 1229224 [details]
Requested data for comment #4.

Contains following files

udevadm_monitor.1402099.log
messages.1402099.log
blkid.1402099.log   -- ps -ef|grep blkid  # after loss of 2nd path

Comment 7 Ben Marzinski 2016-12-09 17:54:50 UTC
Created attachment 1230109 [details]
udev rules file to disable blkid on multipath devices with no valid paths

If you copy this file to /lib/udev/rules.d/ it should take care of the blkid problem.  This won't solve the kpartx problem; I'll send you a fix for that next.
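
As a sketch of the mechanism (not the exact contents of the attachment), a
rule file along these lines marks devices that have no valid paths so that
later rules can skip scanning them:

ACTION!="change", GOTO="mpath_end"
ENV{DM_UUID}!="mpath-?*", GOTO="mpath_end"
# no usable path left: tell later rules not to run blkid against the map
ENV{DM_NR_VALID_PATHS}=="0", ENV{DM_NOSCAN}="1", ENV{DM_UDEV_DISABLE_DISK_RULES_FLAG}="1"
LABEL="mpath_end"

(Note the closing LABEL; the attached version turned out to be missing it,
as comment #9 reports.)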

Comment 8 Ben Marzinski 2016-12-09 18:10:06 UTC
Created attachment 1230110 [details]
patch for /lib/udev/rules.d/40-multipath.rules to fix kpartx hang

If you apply this patch to /lib/udev/rules.d/40-multipath.rules, it should fix the kpartx hang as well. All it's doing is skipping kpartx when DM_NOSCAN is set.
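
In context, the patched section presumably reads like this (the surrounding
lines are an approximation of the RHEL 6 rules file):

# existing guard from the description, plus the new DM_NOSCAN guard:
ENV{DM_ACTION}=="PATH_FAILED", GOTO="end_mpath"
ENV{DM_NOSCAN}=="1", GOTO="end_mpath"
# ... the kpartx invocation and LABEL="end_mpath" follow further down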

The 11-dm-mpath.rules file from the previous attachment makes sure that this
is set whenever there are no valid paths. Let me know if these two changes fix the issue for you.

Comment 9 Michael Reed 2016-12-09 22:02:42 UTC
Comment on attachment 1230109 [details]
udev rules file to disable blkid on multipath devices with no valid paths

Adding 11-dm-mpath.rules to /lib/udev/rules.d results in these syslog messages.

[root@perf3094 rules.d]# cp /root/mdr/11-dm-mpath.rules .
Dec  9 15:43:02 perf3094 udevd[2719]: GOTO 'mpath_end' has no matching label in: '/lib/udev/rules.d/11-dm-mpath.rules'
Dec  9 15:43:02 perf3094 udevd[2719]: GOTO 'mpath_end' has no matching label in: '/lib/udev/rules.d/11-dm-mpath.rules'
Dec  9 15:43:02 perf3094 udevd[2719]: GOTO 'mpath_end' has no matching label in: '/lib/udev/rules.d/11-dm-mpath.rules'
Dec  9 15:43:02 perf3094 udevd[2719]: GOTO 'mpath_end' has no matching label in: '/lib/udev/rules.d/11-dm-mpath.rules'
Dec  9 15:43:02 perf3094 udevd[2719]: GOTO 'mpath_end' has no matching label in: '/lib/udev/rules.d/11-dm-mpath.rules'
Dec  9 15:43:02 perf3094 udevd[2719]: GOTO 'mpath_end' has no matching label in: '/lib/udev/rules.d/11-dm-mpath.rules'

The patch applied cleanly to 40-multipath.rules.

[root@perf3094 mdr]# diff save_40-multipath.rules 40-multipath.rules 
23a24
> ENV{DM_NOSCAN}=="1", GOTO="end_mpath"

I made an edit to 11-dm-mpath.rules, adding the appropriate LABEL at the end of the file.  This eliminated the syslog messages.  If this is not the right place to have placed it please advise.
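
For the record, the added line is presumably just the matching label:

LABEL="mpath_end"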

I've captured the udevadm monitor output per your previous request, as well as the associated syslog.  The syslog contains a periodic count of the number of blkid and kpartx processes executing.  I will attach the tarball shortly.

Comment 10 Michael Reed 2016-12-09 22:05:43 UTC
Created attachment 1230166 [details]
Attachment associated with comment #9

[root@perf3094 logs]# tar tvfz 1402099.2.tar.gz
-rw-r--r-- root/root   2635375 2016-12-09 16:01 udevadm_monitor.1402099.2.log
-rw-r--r-- root/root    349682 2016-12-09 16:01 messages.2.1402099.log
-rw-r--r-- root/root      3425 2016-12-09 15:57 blkid.2.1402099.log

Comment 11 George Beshers 2016-12-22 16:15:45 UTC
Getting ready for OnBoarding first week of January.

Comment 12 Joseph Kachuck 2017-02-16 18:58:30 UTC
Hello,
Due to where we are in the RHEL 6.9 release cycle, this will not make RHEL 6.9. It is now requested for RHEL 6.10.

Thank You
Joe Kachuck

Comment 14 Joseph Kachuck 2017-03-30 18:28:21 UTC
Hello,
RHEL 6 has entered Phase 3. In phase 3 only Critical impact Security Advisories and selected Urgent Priority Bug Fix Advisories will be accepted.
https://access.redhat.com/support/policy/updates/errata

At present, this BZ does not meet these requirements. I am closing this BZ as WONTFIX.

Please reopen if this fix is required for RHEL 6. If so please also provide a justification for this fix.

Thank You
Joe Kachuck

Comment 15 Frank Ramsay (HPE) 2017-04-07 16:03:40 UTC
Hi,
   We were hoping Red Hat would reconsider this issue for 6.10.  We feel that
multipath failing queued I/O when a connection is available is a serious
issue, and that the original report makes a strong case.

Thank you.

Comment 16 Ben Marzinski 2017-04-07 18:07:51 UTC
This is certainly fixable.  There was a similar issue in RHEL7, Bug 1239173.  The problem with my previous solution is that it didn't handle all the cases where a multipath device got an event that made it lose its last path: if the last valid path was removed when the device was reloaded, instead of failing first and then being removed later, the test patches that I sent above wouldn't catch it.  Unfortunately, fixing this correctly is a significantly bigger change, but it's still just a backport.

Comment 17 Michael Reed 2017-04-07 20:26:42 UTC
Ben, I hope your comment ultimately leads to this BZ being reopened and
to a resolution.  Thank you for considering this request!

I cannot view 1239173.


Isn't the core issue that the udev rule isn't correct?  And the attempt
to correct it didn't work?  (Comment #8.)


From comment #1:


"As near as I can tell, the problem stems from udev rule 13-dm-disk.rules
inappropriately running "blkid" and/or 40-multipath.rules running "kpartx"
when the last path to the mapper device has been failed."

Comment 18 Ben Marzinski 2017-04-07 21:29:58 UTC
(In reply to Michael Reed from comment #17)
> Isn't the core issue that the udev rule isn't correct?  And the attempt
> to correct it didn't work?  (Comment #8.)

Yes, but the issue is that the uevents don't have enough information to determine what to do in all cases.  To deal with that, multipath needs to add some flags to some of its device-mapper callouts so that it can check those flags in the udev rules and do the right thing.  Otherwise there just isn't enough information for the udev rules to know the right thing to do.
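
A sketch of what that could look like, assuming the generic
DM_SUBSYSTEM_UDEV_FLAG0..7 flags that libdevmapper reserves for subsystem use
(which flag the actual backport uses is not specified here):

# multipathd sets a subsystem flag in the uevent cookie when it reloads a map
# that has no usable paths; 10-dm.rules exports it into the event environment,
# and the multipath rules can then act on it:
ENV{DM_SUBSYSTEM_UDEV_FLAG0}=="1", ENV{DM_NOSCAN}="1"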

Comment 19 Michael Reed 2017-04-07 21:45:11 UTC
Thank you for the clarification!

Comment 20 Joseph Kachuck 2017-04-19 19:23:39 UTC
Hello Frank,
To reopen for RHEL 6.10.
This is currently a High issue. What would make this issue an Urgent issue?
Does HPE/SGI have clients seeing this issue in the field?
From the summary, there is a delay in the drives reappearing if both cables are pulled. How long is this delay?

Thank You
Joe Kachuck

Comment 21 Frank Ramsay (HPE) 2017-04-19 19:57:25 UTC
Hi Joe,
  This is a BZ that I took over from George, so I'm not 100% sure on this.  But I believe it's 5 or 6 minutes to recover, and that this is customer reported.

-frank

Comment 22 Michael Reed 2017-04-19 20:44:24 UTC
Fortunately, this is not customer reported!  It was discovered during
fault insertion testing prior to FCS. It does take 5-6 minutes to recover
after the cable is plugged back in.  The sd devices are present and
ready to go.

Comment 23 Frank Ramsay (HPE) 2017-04-19 21:15:55 UTC
Michael, thanks for the clarification.

