Bug 706309

Summary: multipathd path checker keeps checking devices even after dev_loss_tmo.
Product: Red Hat Enterprise Linux 5 Reporter: Gris Ge <fge>
Component: device-mapper-multipath Assignee: Ben Marzinski <bmarzins>
Status: CLOSED NOTABUG QA Contact: Storage QE <storage-qe>
Severity: medium Docs Contact:
Priority: medium    
Version: 5.7 CC: agk, bmarzins, bmr, dwysocha, heinzm, jbrassow, mbroz, prajnoha, prockai, qcai, zkabelac
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-08-18 18:38:05 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Gris Ge 2011-05-20 02:56:05 UTC
Description of problem:

In RHEL5, the device file is not removed when dev_loss_tmo expires.
Hence the multipathd path checker keeps checking that device even after its rport has gone offline (/sys/class/fc_remote_ports/rport-2:0-2/port_state is "Not Present").
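
For anyone triaging this, the rport state mentioned above can be read straight from sysfs. The following is a minimal Python sketch, not part of multipathd; the helper names rport_state and rport_is_gone are made up for illustration, and the sysfs_root parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path


def rport_state(rport, sysfs_root="/sys/class/fc_remote_ports"):
    """Return the port_state string of an FC remote port, or None if unreadable.

    `rport` is a directory name such as "rport-2:0-2".
    """
    state_file = Path(sysfs_root) / rport / "port_state"
    try:
        return state_file.read_text().strip()
    except OSError:
        # The rport directory or its port_state attribute does not exist.
        return None


def rport_is_gone(rport, sysfs_root="/sys/class/fc_remote_ports"):
    """True when the port reports "Not Present", the state quoted in this report."""
    return rport_state(rport, sysfs_root) == "Not Present"
```

On the affected host, rport_is_gone("rport-2:0-2") would be expected to return True once the port reports "Not Present".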

This floods the log with error messages like these:
======
May 19 22:46:46 storageqe-04 kernel: sd 2:0:2:9: SCSI error: return code = 0x00010000
May 19 22:46:46 storageqe-04 kernel: end_request: I/O error, dev sddj, sector 0
======
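
To get a rough idea of how bad the flooding is per device, the end_request lines can be counted. This is a small Python sketch written for this report only; the regular expression is derived from the two sample lines above, and the function name count_io_errors is made up for illustration.

```python
import re
from collections import Counter

# Matches lines like:
#   ... kernel: end_request: I/O error, dev sddj, sector 0
IO_ERROR = re.compile(r"kernel: end_request: I/O error, dev (\w+), sector (\d+)")


def count_io_errors(lines):
    """Count end_request I/O error lines per device name."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "May 19 22:46:46 storageqe-04 kernel: sd 2:0:2:9: SCSI error: return code = 0x00010000",
    "May 19 22:46:46 storageqe-04 kernel: end_request: I/O error, dev sddj, sector 0",
]
print(count_io_errors(sample))
```

Passing an open log file instead of the sample list works the same way, since the function only iterates over lines.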

If the host has many LUNs, the messages are multiplied accordingly.

Version-Release number of selected component (if applicable):
RHEL 5.7 Beta
kernel-2.6.18-262.el5
device-mapper-multipath-0.4.7-46.el5

How reproducible:
100%

Steps to Reproduce:
1. Set up a multipath environment.
2. Bring one FC port down.
3. Wait for dev_loss_tmo to expire. After that, you will get I/O errors for sector 0, because multipath uses a directio read of sector 0 as the path checker.
  
Actual results:
Multipath keeps checking the device with the path checker, which causes many I/O error messages.

Expected results:
Multipath should not check the device after dev_loss_tmo expires or once the rport state is "Not Present".

Additional info:

RHEL 6 does not have this issue because it removes the device file after dev_loss_tmo.
RHEL5 by default keeps the device file in place.

Comment 1 RHEL Program Management 2011-06-21 05:29:06 UTC
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7, and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.

Comment 4 Ben Marzinski 2011-08-18 18:38:05 UTC
This is unfortunately not doable. In RHEL6, the device is deleted at dev_loss_tmo. If the LUN comes back online after this, a new device is created, which multipathd adds to the map and begins checking. In RHEL5 the device never goes away. Because of this, if multipathd stops checking when dev_loss_tmo is reached, then it will not be able to restore the device if the LUN comes back up after this.

The messages may be a big mess in the logs, but there is no way to avoid them in RHEL5.
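
The reasoning above can be restated as a toy truth table. This Python sketch is only a model of the argument in this comment, not multipathd code; the function and parameter names are made up for illustration.

```python
def path_restored(device_removed_at_dev_loss, checker_stops_at_dev_loss):
    """Toy model: does the path come back if the LUN returns after dev_loss_tmo?"""
    if device_removed_at_dev_loss:
        # RHEL6-like behaviour: the old device is deleted, and the returning
        # LUN shows up as a new device that multipathd adds and starts checking.
        return True
    # RHEL5-like behaviour: the same device stays around, so only a checker
    # that is still running can notice that the path works again.
    return not checker_stops_at_dev_loss


for removed in (True, False):
    for stops in (True, False):
        print(removed, stops, path_restored(removed, stops))
```

The only combination that loses the path is a device that is kept together with a checker that stops, which is the RHEL5 case this bug asked for.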