Bug 455678 - DM-multipath marks the surviving path as failed on failbacks
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: i386 Linux
Priority: high   Severity: high
Target Milestone: rc
Target Release: 5.4
Assigned To: Mike Christie
QA Contact: Red Hat Kernel QE team
Keywords: OtherQA
Blocks: 480792
Reported: 2008-07-16 19:20 EDT by Pradipmaya Maharana
Modified: 2010-10-22 22:53 EDT
CC List: 36 users

Doc Type: Bug Fix
Last Closed: 2009-09-02 04:30:36 EDT


Attachments
- SOSreport from system (1.65 MB, application/x-bzip2) - 2008-07-29 17:20 EDT, Gary Case
- zip file containing multipath and debug logs and kernel patch from first failure on RHEL5.3 beta (3.26 KB, application/x-zip-compressed) - 2008-11-11 15:15 EST, joseph.r.gruher
- zip file containing multipath and debug logs and kernel patch from second failure on RHEL5.3 beta (265.79 KB, application/x-zip-compressed) - 2008-11-11 15:29 EST, joseph.r.gruher
- zip of latest failure logs with RHEL5.3 GA and importing scsi_dh (643.59 KB, application/x-zip-compressed) - 2009-03-11 15:27 EDT, joseph.r.gruher
- add dbg output (3.43 KB, patch) - 2009-06-16 10:43 EDT, Mike Christie
- mpt log file for FO/FB test (150.16 KB, text/plain) - 2009-06-18 18:02 EDT, ilgu hong
- Last portion of log file (2.63 MB, application/x-zip-compressed) - 2009-06-18 18:05 EDT, ilgu hong
- FOFB-with-serial-console (489.73 KB, application/x-zip-compressed) - 2009-06-22 14:40 EDT, ilgu hong
- /proc/mpt/ioc0/info (621 bytes, text/plain) - 2009-06-30 14:11 EDT, ilgu hong
- Patch will remove limitation of max device support in driver (381 bytes, application/octet-stream) - 2009-06-30 14:34 EDT, kashyap
- system log with kashyap's patch (862.81 KB, application/octet-stream) - 2009-07-17 17:13 EDT, ilgu hong
- corrected previous patch. This will solve dm failure after 111 target id (509 bytes, patch) - 2009-07-20 03:48 EDT, kashyap


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2009:1243 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.4 kernel security and bug fix update 2009-09-01 04:53:34 EDT

Description Pradipmaya Maharana 2008-07-16 19:20:18 EDT
Description of problem:
While running the failover/failback script in a loop, the only surviving paths 
are sometimes marked as failed while the failback is in progress. This causes 
the OS to freeze.

Version-Release number of selected component (if applicable):
- RHEL 5.1 32 bit and 64 bit installed on 2 diskless blades. 
- 2 LUNs (LUN 0 = 25 GB and LUN 1 = 5 GB) are mapped to each blade; the OS is 
installed on LUN 0 
- LUN 1 (5 GB) is only for running the "dd" I/O test 
- OS is installed with the option "linux mpath" 
- Blades are connected to 2 controllers (so basically 2 paths per LUN) 
- We have written a hardware-handler, dm-oem, for ALUA support 
- The hardware-handler, dm-oem, is based on the code originally written by 
Hannes Reinecke 
(http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/2.6.24/dm-mpath-add-alua.patch) 
- We have also written mpath_prio_oem, based on mpath_prio_alua 
- Following is the multipath.conf we are using: 

    defaults { 
            user_friendly_names yes 
    } 

    blacklist { 
            devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*" 
            devnode "^(hd|xvd)[a-z][[0-9]*]" 
            devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]" 
    } 

    devices { 
            device { 
                    vendor                  "OEM" 
                    product                 "OEM-Series" 
                    path_grouping_policy    "group_by_prio" 
                    getuid_callout          "/sbin/scsi_id -g -u -s /block/%n" 
                    prio_callout            "/sbin/mpath_prio_oem /dev/%n" 
                    path_checker            tur 
                    path_selector           "round-robin 0" 
                    hardware_handler        "1 oem" 
                    failback                immediate 
                    rr_weight               uniform 
                    rr_min_io               100 
                    no_path_retry           120 
                    features                "1 queue_if_no_path" 
            } 
    } 

How reproducible:
- This does not happen on the first failover/failback; it takes a few 
iterations to hit this failure, but it won't take longer than 60-75 minutes 
(basically 6-7 failover/failbacks) 


Steps to Reproduce:
- We are running a script that kills one controller and restarts it, then does 
the same for the second one after some duration. So basically a 
failover/failback stress test. 
- On the host we are running "dd if=/dev/zero of=/dev/dm-5" in a repeated loop, 
where dm-5 is the second LUN of 5 GB. 
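The stress loop described above can be sketched roughly as follows. This is a minimal sketch, not the reporter's actual script: `reset_controller` is a placeholder for whatever vendor-specific out-of-band command actually power-cycles a storage controller, which is not shown in this report.

```shell
#!/bin/sh
# Sketch of the failover/failback stress loop described above.
# reset_controller is a PLACEHOLDER: the real test power-cycles one storage
# controller via a vendor-specific management command (not shown here).
reset_controller() {
    echo "resetting controller $1"
}

CYCLES=${CYCLES:-3}   # the real test cycles indefinitely
SETTLE=${SETTLE:-0}   # real runs wait several minutes so failback completes

i=0
while [ "$i" -lt "$CYCLES" ]; do
    reset_controller A
    sleep "$SETTLE"   # multipathd fails over, then fails back when A returns
    reset_controller B
    sleep "$SETTLE"
    i=$((i + 1))
done
echo "completed $i failover/failback cycles"
```

The "dd" loop from the report runs on the host in parallel with this controller-cycling loop.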

  
Actual results:
- After a few rounds of failover and failback, machine either hangs (with only 
the mouse moving) or the OS is mounted as read-only  
- What we have seen is when the failback is happening (preferred controller 
coming back), some how the existing/surviving paths are marked as failed, which 
causes the OS to mounted as read-only or the host would hang. 
- Following is the message we see in the logs in case of path failures: 

    device-mapper: multipath: Failing path 8:32. 
    device-mapper: multipath: Failing path 8:48. 
    scsi 0:0:10:1: rejecting I/O to dead device 
    scsi 0:0:10:0: rejecting I/O to dead device 
    Buffer I/O error on device dm-3, logical block 1300482 
    lost page write due to I/O error on dm-3 
    Aborting journal on device dm-3. 
    EXT3-fs error (device dm-3): ext3_find_entry: reading directory #2973492 offset 0 
    __journal_remove_journal_head: freeing b_frozen_data 
    __journal_remove_journal_head: freeing b_frozen_data 
    Buffer I/O error on device dm-5, logical block 0 
    lost page write due to I/O error on dm-5 
    Buffer I/O error on device dm-5, logical block 2 
    lost page write due to I/O error on dm-5 
    Buffer I/O error on device dm-5, logical block 3 
    lost page write due to I/O error on dm-5 
    Buffer I/O error on device dm-5, logical block 4 
    lost page write due to I/O error on dm-5 
    Buffer I/O error on device dm-5, logical block 5 
    lost page write due to I/O error on dm-5 
    Buffer I/O error on device dm-5, logical block 6 
    lost page write due to I/O error on dm-5 
    Buffer I/O error on device dm-5, logical block 7 
    lost page write due to I/O error on dm-5 
    Buffer I/O error on device dm-3, logical block 0 
    lost page write due to I/O error on dm-3 
    ext3_abort called. 
    EXT3-fs error (device dm-3): ext3_journal_start_sb: Detected aborted journal 
    Remounting filesystem read-only 
    EXT3-fs error (device dm-3): ext3_find_entry: reading directory #2717557 offset 0 
    EXT3-fs error (device dm-3): ext3_find_entry: reading directory #2717557 offset 0 
    EXT3-fs error (device dm-3): ext3_find_entry: reading directory #2717557 offset 0 
    EXT3-fs error (device dm-3): ext3_find_entry: reading directory #2717557 offset 0 
    EXT3-fs error (device dm-3): ext3_find_entry: reading directory #2717557 offset 0 
    EXT3-fs error (device dm-3): ext3_find_entry: reading directory #2350105 offset 0 
    EXT3-fs error (device dm-3): ext3_find_entry: reading directory #2614008 offset 0 
    EXT3-fs error (device dm-3): ext3_find_entry: reading directory #2679319 offset 0 
    EXT3-fs error (device dm-3): ext3_find_entry: reading directory #2679290 offset 0 
    EXT3-fs error (device dm-3): ext3_find_entry: reading directory #2350114 offset 0 
    EXT3-fs error (device dm-3): ext3_find_entry: reading directory #2350107 offset 0 
    (continued) 


Expected results:
- Failover/failback should run fine for days without any such failure.

Additional info:
- On almost every failback, the surviving paths are marked as failed, but 
somehow the OS device mapper recovers from it; BUT in the failure case it does 
not, and that causes the read-only OS or a hang. 
- One more interesting thing is, whenever a path is marked as failed or any 
error is received, our hardware-handler dm-oem is called for vendor-specific 
error handling (function .error_handler), BUT in the failure case mentioned 
above, our hardware-handler is not called at all.
Comment 1 Steven J. Levine 2008-07-22 12:02:25 EDT
Reassigning: This doesn't appear to be a documentation issue.
Comment 2 Pradipmaya Maharana 2008-07-22 13:16:17 EDT
The core dump collected using kdump is uploaded at 
ftp://dropbox.redhat.com/incoming  
 
It's under the file name : 1837756.vmcore.md5sum.sha1sum.tar. 
Comment 4 Gary Case 2008-07-29 17:20:50 EDT
Created attachment 312936 [details]
SOSreport from system
Comment 37 Mike Christie 2008-08-22 13:48:01 EDT
Promise guys,

Is your hardware handler just going ALUA?
Comment 38 Pradipmaya Maharana 2008-08-22 13:57:06 EDT
Yes, the hardware handler is mainly there to take care of ALUA; what it mainly does is fire RTPG and, if needed, STPG.

The hardware handler code is based on the dm-alua hardware handler written by Hannes Reinecke (http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/2.6.23/dm-mpath-add-alua.patch)
Comment 39 Mike Christie 2008-08-22 14:21:00 EDT
(In reply to comment #38)
> Yes, the hardware handler is mainly to take care of ALUA and what it mainly
> does is fires RTPG and if needed STPG.
> 

Ughhh, maybe you are in luck or maybe you are in trouble :)


> The hardware handler code is based on the dm-alua hardware handler written by
> Hannes Reinecke
> (http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/2.6.23/dm-mpath-add-alua.patch)


We are working on merging the scsi_dh alua code. Here is the current patchset
http://people.redhat.com/mchristi/scsi_dh/

Do you need a completely different alua module or can you use the common scsi_dh_alua one that went upstream?  Can you modify scsi_dh_alua to support what you need?

If you need your own module, then you should probably be working against the scsi_dh framework. If you want to do this in RHEL 5 then you need to get it upstream. We can then stick it in RHEL. If you use the scsi_dh framework from the beginning it will be an easy port from upstream to RHEL5 :)

We do not have plans to port scsi_dh modules to RHEL4 though.
Comment 40 Issue Tracker 2008-08-22 14:52:42 EDT
Mike, to answer the version questions you had on the phone, they are
using:

For RHEL5.1

• device-mapper-1.02.20-1.el5

• device-mapper-multipath-0.4.7-12.el5

They have tried RHEL5.2 with the same results (that was one of my first
questions).


This event sent from IssueTracker by gcase 
 issue 192427
Comment 41 Mike Christie 2008-08-26 16:04:31 EDT
Promise and intel guys,

Just to make sure we are in sync.

The io errors in  b6_putty_redhat10.log   are the ones from devices like sdc right?

mpath0 (2220400015593d28b) dm-0 Intel,Multi-Flex
[size=20G][features=1 queue_if_no_path][hwhandler=1 intel]
\_ round-robin 0 [prio=50][active]
 \_ 0:0:0:0 sda 8:0   [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 0:0:1:0 sdc 8:32  [active][ready]


Here it is the lower-priority backup path, and failover to it works ok, but when failing back we get errors like this:

Aug 19 15:39:36 localhost end_request: I/O error, dev sdc, sector 0
Comment 42 Mike Christie 2008-08-26 16:18:29 EDT
Could the intel and promise guys check that the backup path does or does not get removed from the system during fail back? You could add some printks to __scsi_remove_target where you print all the devices that are getting removed.

Also could some of the errors in the logs be from IO that is queued? If we have 10 commands in sdc's queue, then send the failover/failback command at the head of the queue, would the 10 commands in the queue after the failover/failback command be failed because that port is now not active?
Comment 54 joseph.r.gruher 2008-10-01 19:15:33 EDT
Per recommendation with RH we are attempting to test with the scsi_dh framework that will be used in RHEL5.3.  We have been unable to successfully patch the scsi_dh framework into RHEL5.2 for testing so we will retest when the RHEL5.3 public beta is available.
Comment 55 joseph.r.gruher 2008-11-11 15:14:24 EST
We have retested under RHEL5.3 using the included ALUA device handler.  We are able to run failover/failback testing for some time but still see a significant failure rate.  To test, we did the following:
1) Install RHEL5.3 beta
2) Patch kernel with Intel Multi-Flex device (patch attached)
3) Configure ALUA device handler
4) Execute automated failover/failback test (fail one controller, bring it back online, fail the other, bring it back online, cycling indefinitely)
5) After 7-9 hours we see a failure

We have reproduced the failure twice.  I am attaching logs from the first failure now and will attach the logs from the second failure shortly.  The kernel patch is also included.
Comment 56 joseph.r.gruher 2008-11-11 15:15:50 EST
Created attachment 323230 [details]
zip file containing multipath and debug logs and kernel patch from first failure on RHEL5.3 beta
Comment 57 joseph.r.gruher 2008-11-11 15:29:24 EST
Created attachment 323241 [details]
zip file containing multipath and debug logs and kernel patch from second failure on RHEL5.3 beta

Second set of logs attached.  Note the second set is from three blades running on the storage, so there are three sets of logs included.
Comment 58 joseph.r.gruher 2008-11-12 15:13:14 EST
We set up four blades on the storage and executed an automated test where one controller is failed, then the other, back and forth.  All blades survived for about 9 hours 45 minutes and then failed.

All the tests fail after naming a device as 111. (0:111:0:0).

We saw similar results in another Linux distribution, the tests would fail after attaching the devices and naming them 111.

Is this any kind of known limitation or failure condition?
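As a rough sanity check on the timeline reported above, the two numbers in this comment are consistent with each other. This is hypothetical back-of-the-envelope arithmetic, assuming one new target ID is consumed per controller reset and taking the ~9h45m and ID-111 figures at face value:

```shell
# If each reset allocates one new SCSI target ID and the run fails at
# ID 111 after roughly 9h45m (585 minutes), each reset cycle averages
# about 585 / 111, i.e. ~5 minutes, which is plausible for a scripted test.
minutes=585
failing_id=111
avg=$((minutes / failing_id))
echo "average minutes per new target id: $avg"
```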
Comment 59 Issue Tracker 2008-11-12 19:19:12 EST
I spoke with engineering, and they had this to say:

You've found the same issue in another distro which makes it likely this
is a problem in the upstream code. It's our opinion that the best course
of action for you is to begin by replicating this on upstream code. The
community (including Red Hat) could then work on the problem using the
regular mailing lists. The upstream maintainer of the scsi_dh_alua code
works at SuSE/Novell and we have a good history of working together on
issues like this. Working upstream with all the work out in the open will
allow us to address this more quickly and prevent duplication of effort,
as the SuSE folks may already be working on the problem. 


Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client

This event sent from IssueTracker by gcase 
 issue 192427
Comment 60 joseph.r.gruher 2008-11-17 15:24:21 EST
In our initial testing with SLES10.1 and 10.2 and a device handler we created (dm-intel) we saw occasional failover/failback failures and noted these seemed correlated to a device being named 0:111:0:0.  Since then we have moved away from using our own device handler and are trying to get the standard ALUA handler (dm-alua) up and running.  We moved to the standard handler to have a less “proprietary” solution in the hopes of improved support – in general it made sense to use the included handler instead of a custom one, and it was Novell’s recommendation.  So far we cannot reach the point where we can actively test failover/failback, we are stuck on setup problems and working with Novell to get testing up and running.  We are not currently seeing or working a problem similar to this Bugzilla entry on SLES, although that could possibly change when we get testing with the alua handler up and running.

In our initial testing with RHEL5.1 and 5.2 we would see frequent failures in failover/failback testing using dm-intel.  At Red Hat's request we waited for the RHEL5.3 beta.  RH indicated the new scsi_dh framework in RHEL5.3 might give us improved results, and they indicated they would be interested in supporting debug of any failures if we were using their latest code.  We also moved to dm-alua when we began testing RHEL5.3, for the same reasons as with SLES.  We do indeed see improved results with RHEL5.3, as we can now run hours of failover/failback testing, but we reliably see a failure after about nine hours of automated testing.  We are trying to engage with RH to debug this failure through this Bugzilla entry.  I would think RH would be concerned that the ALUA support in their beta, using the new handler they recommended we try, may be buggy.

One thing we noted with the RHEL5.3 failures is we again see the failure correlated with a device being named 0:111:0:0.  This seems interesting to us but we do not understand the meaning.  I certainly do not think it is adequate cause to declare the problems the same across SLES and RHEL when the distribution, device handler and framework are all different.  It could be a symptom common to multiple root causes, or maybe it is just a coincidence.  We entered this into the BZ as information which might be pertinent to debug, not to make a case that we see the same failure in other distributions or that this is an upstream problem; I do not think anyone can say that at this point.
Comment 61 joseph.r.gruher 2008-11-18 20:13:50 EST
Red Hat team, further comments?
Comment 62 joseph.r.gruher 2008-11-20 13:07:55 EST
Red Hat team, any further updates?
Comment 63 joseph.r.gruher 2008-12-02 15:36:23 EST
We have moved our testing from the RHEL5.3 beta (kernel -120) to the RHEL5.3 snapshot4 (kernel -124).  With the beta we were able to setup MPIO and run failover/failback testing for some hours before encountering the error described in previous comments.  With snapshot4 the same setup steps result in kernel panic.  This suggests something changed between beta and snapshot4 that resulted in a serious change in behavior, possibly a serious bug.  Notes from our developer follow:


Today we tried to install the RHEL5.3 GA version with multipath support (install command: linux mpath).
But we have a problem with this distribution, both 32-bit and 64-bit.
We always get a kernel panic at the reboot step, so we tried changing the partition type with LVM and without LVM, but the result is the same.
We also checked after removing an SCM controller, but the result is the same.

Kernel boot messages are below

        "Unable to access resume device (/dev/mapper/mpath0p3)" or "Unable to access resume device (/dev/VolGroup00/LogVol01)"
        --> this message depends on the partition type; we tried installing with and without LVM
        "mount: could not find filesystem '/dev/root'"
        "setuproot: moving /dev failed: No such file or directory"
        "setuproot: error mounting /proc: No such file or directory"
        "setuproot: error mounting /sys: No such file or directory"
        "switchroot: mount failed: No such file or directory"
        Kernel panic - not syncing: Attempted to kill init!"
Comment 64 Ben Marzinski 2008-12-02 18:48:04 EST
There is a bug in some of the initrd code starting with snapshot2 (see 471879 and 471689). A fix has already been found, and should be in the next snapshot. Sorry about that.
Comment 65 joseph.r.gruher 2008-12-17 12:10:58 EST
With the problem from comment #63 resolved by the new snapshot, as described in comment #64, we still have the original problem (see comment #60) and need input from RH.
Comment 66 joseph.r.gruher 2009-01-05 18:52:20 EST
This problem (see comment #60) remains open and still blocks support of RHEL on our product line.  We need input from RHEL to debug and resolve and meet active customer demand for RHEL on our systems.
Comment 67 joseph.r.gruher 2009-01-09 16:31:42 EST
This problem (see comment #60) remains open and still blocks support of RHEL on
our product line.  We need input from RHEL to debug and resolve and meet active
customer demand for RHEL on our systems.
Comment 68 joseph.r.gruher 2009-01-15 15:52:59 EST
Additional testing supports the idea that this is a cumulative failure.  When device names increment through failover/failback operations we see a failure when we reach device name 0:111:0:0.  If we reboot the server (which resets the device names) prior to reaching 0:111:0:0 we do not experience the failure.  Testing continues.  Still need RH input.
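The growing device names can be tracked between cycles. The following is a sketch, not part of the original report: it assumes the usual /sys/class/scsi_device layout (entries named H:C:T:L), and the directory is a parameter so the helper can be exercised against a scratch directory instead of a live system.

```shell
# highest_target_id [DIR]: report the highest SCSI target ID allocated so
# far by scanning DIR (default /sys/class/scsi_device) for H:C:T:L entries.
# A steadily growing maximum between failover/failback cycles confirms that
# target IDs are never reused until a reboot resets them.
highest_target_id() {
    dir=${1:-/sys/class/scsi_device}
    max=0
    for dev in "$dir"/*; do
        [ -e "$dev" ] || continue
        id=$(basename "$dev" | cut -d: -f3)   # third field is the target (T)
        [ "$id" -gt "$max" ] && max=$id
    done
    echo "$max"
}
```

Run between cycles; once the reported maximum approaches 111 the failure described above is imminent, and a reboot resets the counter.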
Comment 70 joseph.r.gruher 2009-02-17 16:58:09 EST
Note: this BZ corresponds to IT 192427.
Comment 71 joseph.r.gruher 2009-03-10 21:59:53 EDT
In testing using the latest scsi_dh code imported into the RHEL5.3 GA build we still see the failure when SCSI ID increments to 0:0:111:x as described in previous comments in this BZ.
Comment 72 Mike Christie 2009-03-11 15:16:58 EDT
The problem is the "rejecting I/O to dead device", right?

I thought there was an oops too. That is gone with your patch, right?
Comment 73 joseph.r.gruher 2009-03-11 15:27:29 EDT
Created attachment 334852 [details]
zip of latest failure logs with RHEL5.3 GA and importing scsi_dh

Attached are logs of this failure.  The failure documented in bug 481633 is not currently seen (but testing continues).
Comment 74 Keve Gabbert 2009-05-19 15:42:39 EDT
why is this BZ still in NEW state?
Is it being addressed for RHEL 5.4?
Comment 75 Mike Christie 2009-06-02 16:30:23 EDT
Is intel/promise still investigating? I thought we left off trying to get a setup. I thought we were going to modify the remote setup that intel/promise did so we could replicate it or were you guys still digging into it?
Comment 76 Mike Christie 2009-06-02 16:31:31 EDT
Oh yeah, was this fixed upstream? Was intel/promise using just mptsas drivers or was there a mix?
Comment 77 joseph.r.gruher 2009-06-08 15:24:17 EDT
Yes, Intel would still like to see this fixed.  If necessary we could possibly set up a system for remote access, however, we find this very easy to reproduce and do not think it requires our specific hardware to reproduce.  Has Red Hat tried to reproduce this issue?  That might be a more efficient path to a solution than remote debug.
Comment 78 Mike Christie 2009-06-08 15:58:14 EDT
Does your setup have 111+ LUNs, or do you have fewer than that, but they are getting added and removed, and for some reason the ID is getting incremented so you hit number 111 that way?
Comment 79 joseph.r.gruher 2009-06-08 16:04:36 EDT
We have just one ALUA LUN (two controllers, two paths active/standby, one LUN).  We fail the LUN repeatedly by using a script to alternately reset one controller, then the other, back and forth.  On each cycle the ID increments; I would guess that when the controller comes back online after a reset the OS gives the LUN a new ID, so as the controllers keep resetting and coming back, the IDs keep incrementing.  When we get to 111 we see the failure.
Comment 80 joseph.r.gruher 2009-06-08 16:06:12 EDT
In the previous comment I should say we FAILOVER the LUN, to the surviving controller, not FAIL the LUN in general.
Comment 81 Mike Christie 2009-06-08 16:22:53 EDT
Are you using mptsas or mpt2sas?
Comment 82 Mike Christie 2009-06-10 13:46:56 EDT
Joseph,

(In reply to comment #76)
> Oh yeah, was this fixed upstream? Was intel/promise using just mptsas drivers
> or was there a mix?  

Did you mean you hit the problem when you get 0:0:111:0 (not 0:111:0:0) by any chance?
Comment 83 ilgu hong 2009-06-10 14:58:56 EDT
Hi, Mike Christie.

for the question
>> Are you using mptsas or mpt2sas? 

We are using the mptsas driver. 
What is mpt2sas? Is it the latest patch for mptsas?

for the question
>>Did you mean you hit the problem when you get 0:0:111:0 (not 0:111:0:0) by any
chance?  

Yes, we did. 
We always see this problem when we run the FO/FB test. The test does FO/FB continuously, and the system is forcibly restarted when the scsi id reaches about 0:0:111:0.
Comment 84 Mike Christie 2009-06-11 12:26:15 EDT
(In reply to comment #83)
> Hi, Mike Christie.
> 
> for the question
> >> Are you using mptsas or mpt2sas? 
> 
> We are using the mptsas driver. 
> What is mpt2sas? Is it the latest patch for mptsas?
> 

It is the mpt driver for a new sas card that they have, I think.


> for the question
> >>Did you mean you hit the problem when you get 0:0:111:0 (not 0:111:0:0) by any
> chance?  
> 
> Yes, we did. 
> We always see this problem when we run the FO/FB test. The test does FO/FB
> continuously, and the system is forcibly restarted when the scsi id reaches
> about 0:0:111:0.  

There is a known issue in the sas code where the target/id is going to increase. See here http://marc.info/?t=124130267400001&r=1&w=2. What we are still trying to figure out is why this would cause DID_NO_CONNECTs when we get to 111.

I do not have a setup where I can replicate the failures because the boxes are remote and have the disks in the system. Do you guys know a way I can simulate a failure with that type of setup? I do not normally work on mpt drivers, so if there is an ioctl or tool I can use to force this, let me know.

If not could you guys rerun your test but turn on scsi logging. The mptsas guys ask that you turn it on like this:

echo 0x808 > /proc/sys/dev/scsi/logging_level
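A small wrapper around that procfs knob can make the debug run easier to clean up afterwards. This is a sketch, not something requested in the report: it prints the previous level on stdout so the caller can restore it after the test, and the file path is a parameter so the function can be dry-run against a scratch file (writing the real /proc path requires root).

```shell
# set_scsi_logging LEVEL [FILE]: write LEVEL to the SCSI logging knob,
# echoing the previous value so the caller can restore it after the run.
# FILE defaults to the procfs path quoted above; pass a scratch file to
# dry-run without root.
set_scsi_logging() {
    level=$1
    file=${2:-/proc/sys/dev/scsi/logging_level}
    cat "$file"             # previous value, captured for later restore
    echo "$level" > "$file"
}
```

Typical use on the test host: old=$(set_scsi_logging 0x808), run the FO/FB test, then echo "$old" > /proc/sys/dev/scsi/logging_level.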
Comment 85 Tom Coughlan 2009-06-15 20:17:09 EDT
(In reply to comment #84)

> If not could you guys rerun your test but turn on scsi logging. The mptsas guys
> ask that you turn it on like this:
> 
> echo 0x808 > /proc/sys/dev/scsi/logging_level  

Tomas or ilgu, can you help with this?
Comment 87 ilgu hong 2009-06-15 20:32:31 EDT
Hi, Tom.

What do you need to fix this problem? Do you want access to a remote system, or just a log file for the FO/FB run with mptsas debugging enabled?

Setting up remote access will take a little time. I need to check resources in our EUT lab and request remote access from the IT manager. 
Another thing: I don't have a way to run the FO/FB script remotely, so if you want to test it, I will help you by executing the FO/FB script manually.

Do you want remote access?
Comment 88 Mike Christie 2009-06-16 10:42:33 EDT
(In reply to comment #87)
> What do you need to fix this problem? Do you want access to a remote system,
> or just a log file for the FO/FB run with mptsas debugging enabled?

The mpt guys have been trying to replicate it, but have not been able to. They requested that someone who can replicate it turn on that debugging and send them the output.

> Setting up remote access will take a little time. I need to check resources
> in our EUT lab and request remote access from the IT manager.
> Another thing: I don't have a way to run the FO/FB script remotely, so if you
> want to test it, I will help you by executing the FO/FB script manually.
> 
> Do you want remote access?

If you can provide remote access it would help. If not, rerun your tests with the debugging on.
Comment 89 Mike Christie 2009-06-16 10:43:55 EDT
Created attachment 348121 [details]
add dbg output

I also did this patch to print where we are getting the DID_NO_CONNECT errors. If you could run with this, it hopefully will help.

You should be able to run with this patch instead of turning on the scsi logging, which will reduce the log output.
Comment 90 Mike Christie 2009-06-16 11:06:09 EDT
(In reply to comment #89)
> Created an attachment (id=348121) [details]
> add dbg output
> 
> I also did this patch to print where we are getting the DID_NO_CONNECT
> errors. If you could run with this, it hopefully will help.
> 
> You should be able to run with this patch instead of turning on the scsi
> logging, which will reduce the log output.  

ilgu,

I forgot. Did we already track down where this is failing? Was it the sdev del check in scsi_dispatch_cmd?
Comment 91 ilgu hong 2009-06-16 16:49:30 EDT
Hi, Mike.

We didn't track down this problem, because it didn't give an exact debug message, so we don't have information on where the bug is located.

So, I will first apply the patch (id=348121) and rerun the FO/FB test with the mpt debug option to get debug information. And if resources are available, I will prepare a setup for remote access.

Thanks.
Comment 92 ilgu hong 2009-06-16 18:17:59 EDT
Hi, Mike.

There is a small mistake in the patch file.
>>>>
@@ -1743,6 +1747,8 @@ mptscsih_abort(struct scsi_cmnd * SCpnt)
 		dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT
 		    "task abort: device has been deleted (sc=%p)\n",
 		    ioc->name, SCpnt));
+		printk(KERN_ERR "no connect");
+		dump_stack()l


last line must be "dump_stack();"

Thanks.
Comment 93 ilgu hong 2009-06-16 19:07:46 EDT
Hi, Mike.

There is another mistake.

>>
diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index ce0788e..06f81be 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -1014,6 +1014,8 @@ mptsas_qcmd(struct scsi_cmnd *SCpnt, void (*done)(struct scsi_cmnd *))
 
 	if (!vdevice || !vdevice->vtarget || vdevice->vtarget->deleted) {
 		SCpnt->result = DID_NO_CONNECT << 16;
+		prinkt(KERN_ERR "mptsas_qcmd no connect\n");

last line prinkt --> printk

Thanks.
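Putting the two corrections from comments 92 and 93 together, the intended debug additions (reconstructed from the hunks quoted above; the surrounding context lines are abbreviated) would read:

```diff
@@ mptscsih_abort()
 		dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT
 		    "task abort: device has been deleted (sc=%p)\n",
 		    ioc->name, SCpnt));
+		printk(KERN_ERR "no connect");
+		dump_stack();

@@ mptsas_qcmd()
 	if (!vdevice || !vdevice->vtarget || vdevice->vtarget->deleted) {
 		SCpnt->result = DID_NO_CONNECT << 16;
+		printk(KERN_ERR "mptsas_qcmd no connect\n");
```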
Comment 95 ilgu hong 2009-06-18 18:02:37 EDT
Created attachment 348573 [details]
mpt log file for FO/FB test

This is the log information for the FO/FB test, gathered with the mptsas debug option and with the patch applied to get the stack.
Comment 96 ilgu hong 2009-06-18 18:05:38 EDT
Created attachment 348574 [details]
Last portion of log file

This is the last portion of the original log file
Comment 97 ilgu hong 2009-06-18 18:08:59 EDT
Hi, Mike.

I finished the test with your patch. The system goes into a hang state when the scsi id reaches 0:0:111:x.

The log size is too big (almost 2G), so I gathered the debug information related to mptsas and attached the file (id=348573).

And I also attached the last portion of the original log file (id=348574).


Thanks.
Comment 98 kashyap 2009-06-19 05:16:26 EDT
"I finished the test with your patch. The system goes into a hang state when the scsi id reaches 0:0:111:x." This needs more clarification. Is it a kernel crash?

I need some input from you.
#1. Output of "cat /proc/mpt/ioc0/info"
#2. The logs do not have the data required to make progress. I need a serial console redirect log. It will show the Oops message if there is a crash. 

Thanks,
Kashyap
Comment 99 ilgu hong 2009-06-19 13:34:32 EDT
Hi, Kashyap.

>>Is it kernel crash ?

There is no kernel crash message in the log, but the system did not respond to any input (keyboard, mouse) and looks like it is in a hang state.
In any case, I didn't see any Oops message in the previous tests.



>>#1. Output of "cat /proc/mpt/ioc0/info"
>>#2. The logs do not have the data required to make progress. I need a serial
>>console redirect log. It will show the Oops message if there is a crash. 

I will give it to you ASAP.

thanks.
Comment 100 Mike Christie 2009-06-21 13:11:11 EDT
Hey,

Not sure if it helps. It looks like the DID_NO_CONNECTs are a result of the device removals, which I think are caused by the cable pulls.

I am not sure why it happens at 111 or why we cannot get to any target higher than 111.
Comment 101 kashyap 2009-06-22 01:16:10 EDT
Mike,

The reason I asked for the serial console redirect log is just to see the end result of the test. There are a couple of places where I have doubts. As I understand it, if the value 111 is greater than the MAX device support in the IOC firmware, then the driver simply changes the SCSI midlayer queue depth from 64 to 1. There is no clue why the system would hang because of that.

If the value 111 is not related to the IOC max device support value, then my thought above does not apply and it is something different entirely. I need those inputs to tell.

- Kashyap
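To make the suspected behavior concrete, here is a minimal model of the clamping described above. This is not the actual mptscsih code; the function name, the exact boundary condition, and the limit of 112 (targets 0..111, matching the value later reported in this bug) are assumptions for illustration only.

```c
#include <assert.h>

#define DEFAULT_QUEUE_DEPTH 64

/* Hypothetical sketch: when a target ID reaches the firmware's device
 * limit, the driver drops the midlayer queue depth from 64 to 1
 * instead of rejecting the device outright. The real limit comes from
 * the IOC port facts at runtime. */
static int choose_queue_depth(int target_id, int ioc_max_devices)
{
    if (target_id >= ioc_max_devices - 1)
        return 1;                 /* degraded: serialized, slow I/O */
    return DEFAULT_QUEUE_DEPTH;   /* normal operation */
}
```

Under this model a target past the limit would still be usable, just slow, which is why a hang at target 111 is surprising.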
Comment 102 ilgu hong 2009-06-22 14:40:25 EDT
Created attachment 348959 [details]
FOFB-with-serial-console

This is the log file captured from the serial console.
For this log, I did not enable the mpt debug feature.
Comment 103 ilgu hong 2009-06-22 14:42:37 EDT
Hi, Mike and Kashyap.


I ran the FO/FB test over last weekend. You can get the log from the attached file (id=348959); it is a zip file.

thanks.
Comment 104 kashyap 2009-06-23 08:19:07 EDT
Can you tell me the output of
"cat /proc/mpt/ioc0/info"?

- Kashyap
Comment 105 Tom Coughlan 2009-06-30 10:46:17 EDT
Kashyap, ilgu,

We are running low on time to get a fix for this in RHEL 5.4. There are just a few weeks left to take a patch for this. Please continue to investigate. Thanks.

Tom
Comment 106 ilgu hong 2009-06-30 14:11:15 EDT
Created attachment 349989 [details]
/proc/mpt/ioc0/info

This is the test machine's information.
Comment 107 ilgu hong 2009-06-30 14:13:55 EDT
Hi Mike and Kashyap.

I attached the file /proc/mpt/ioc0/info (id=349989).

Sorry for the late response.

There was some problem with Bugzilla; I did not receive the latest updates.
Comment 108 kashyap 2009-06-30 14:34:43 EDT
Created attachment 350001 [details]
Patch will remove limitation of max device support in driver
Comment 109 kashyap 2009-06-30 14:37:39 EDT
ilgu,

Looking at your attachment: the maximum device support in this case is 112, which means this issue is surely related to MaxDevice support. But, as I already said, even though we reach target ID 111 in this case, the current code simply changes the SCSI midlayer queue depth from 64 to 1, which just makes I/O slow. In any case, I have followed the same logic our MPT2SAS driver uses: changing max_id in the driver to -1 instead of setting it from the IOC port facts.

Please find patch for this in attachment.

Thanks,
Kashyap
Comment 111 Tom Coughlan 2009-07-07 09:27:25 EDT
ilgu,

Please try the patch in comment 108. Let us know the result.

Tom
Comment 112 Mike Christie 2009-07-07 11:58:48 EDT
(In reply to comment #109)
> ilgu,
> 
> Just observing your attachment. Maxdevice support in this case is 112. It means
> this issue is surely related to MaxDevice support. But as I already told
> eventhough we reach Target Id 111 in this case, current code simply change
> queue size of scsi midlayer to 1 from 64.

I think the SCSI layer will prevent scans from occurring on targets greater than 111 (shost->max_id), so we never get a chance to reach the queue-size code you mention. The SCSI layer will simply not scan or add anything once we hit max_id. So after enough add/remove sequences we have no paths, or only bad paths due to the cable pulls; I/O cannot be executed and gets failed, the filesystem gets remounted read-only, and if root was on this device the box will probably eventually lock up.


> It means IO will be slow.But anyway I
> have follow same logic as our MPT2SAS does. changing max_id in driver to -1
> instead of setting it from ioc portfact.
> 
> Please find patch for this in attachment.
> 
> Thanks,
> Kashyap
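The scan gate described above can be sketched as a toy model. The struct and function names are illustrative, not the actual scsi_scan.c code; the key assumption is that target IDs grow monotonically across add/remove cycles and a removed ID is never reused.

```c
#include <assert.h>

/* Toy model of the midlayer gating: the SCSI layer refuses to scan
 * any target whose ID exceeds shost->max_id. With max_id fixed from
 * the IOC port facts (111 here), every cable-pull cycle permanently
 * consumes an ID until no new paths can ever be added; max_id == -1
 * means "no limit". */
struct shost_model {
    int max_id;          /* scan limit, or -1 for unlimited */
    int next_target_id;  /* next ID to hand out */
};

/* Returns the target ID added by this scan, or -1 if refused. */
static int scan_next_target(struct shost_model *sh)
{
    if (sh->max_id >= 0 && sh->next_target_id > sh->max_id)
        return -1;                /* beyond max_id: never scanned */
    return sh->next_target_id++;  /* ID is consumed permanently */
}
```

In this model, setting max_id to -1 (as the proposed patch does) removes the ceiling, so hotplug cycles can continue indefinitely.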
Comment 113 Mike Christie 2009-07-07 12:01:10 EDT
Kashyap,

One question for you guys on the patch. What is ioc->pfacts[0].PortSCSIID? Is it a hardware limit? Can a user change it with your tools?

If we try to access a target with an ID greater than that, do we just hit the code that changes the SCSI midlayer queue depth from 64 to 1?
Comment 114 ilgu hong 2009-07-07 13:12:46 EDT
Hi, all.

I currently have a resource problem, so I can only schedule this test for next week.

The good news is that I did not hit this problem with the RHEL 5.4 alpha.

I checked the RHEL 5.4 alpha mptsas code, but it does not change the "ioc->pfacts[0].PortSCSIID" line.

Kashyap,

Are you sure this patch is really related to this problem?
Comment 115 kashyap 2009-07-10 08:40:19 EDT
My patch will solve this problem.
You can see that in the 5.4 mptscsih.c file, mptscsih_slave_configure() does not adjust the queue depth to 1 via scsi_adjust_queue_depth(). That is why RHEL 5.4 does not hit this issue.

So you can consider the two things related.
- Kashyap
Comment 116 ilgu hong 2009-07-17 17:09:38 EDT
Hi, Kashyap and Mike.

This week I tested this problem with Kashyap's patch.

I found that the SCSI ID problem no longer occurred, but my guess is that multipathd does not receive the REMOVE event from the kernel after the SCSI ID goes past 0:0:111:x.
So multipathd does not remove the path that has failed over, and the system continuously tries to use this dead path.

Finally the system hangs.


I will attach log.

please, check this.

Thanks.
Comment 117 ilgu hong 2009-07-17 17:13:06 EDT
Created attachment 354213 [details]
system log with kashyap's patch.

This is a 32-bit system log.

After the SCSI ID reaches 0:0:111:x, the system no longer configures multipath with newly removed/arrived paths.
Comment 118 Mike Christie 2009-07-17 23:13:01 EDT
Hey kashyap and Ilgu,

Did you notice that there is no target 112 in the log? There is 111 and then 113, but I do not see a 112. Is that the only hole?
Comment 119 kashyap 2009-07-20 03:48:37 EDT
Created attachment 354308 [details]
corrected previous patch. This will solve dm failure after 111 target id

shost has a field called this_id. This value is currently set from the IOC port facts, but it does not represent any meaningful restriction in the MPT driver. It must be -1 (unrestricted).

Now both the max_id and this_id fields of the Scsi_Host struct are set to -1.

In scsi_scan.c, the midlayer checks whether this_id matches the target ID and, if so, skips that allocation. Because of this check we do not see target ID 112 added in the last log file. Eventually both paths go down and the multipath driver sees the device as unreachable.


Thanks,
Kashyap
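The this_id check referred to above can be reduced to a one-line predicate. This is a sketch, not the real scsi_scan.c code, and the function name is hypothetical.

```c
#include <assert.h>

/* Sketch of the midlayer check: the target whose ID equals
 * shost->this_id is skipped, because that ID is reserved for the
 * initiator (the host adapter) itself. If the driver sets this_id
 * from the IOC port facts (112 here) instead of -1, that one target
 * ID silently never gets allocated, which matches the missing
 * target 112 in the log. */
static int target_is_scannable(int target_id, int this_id)
{
    return target_id != this_id; /* this_id == -1 skips nothing */
}
```

With this_id set to -1, no valid target ID can collide with the initiator's reserved slot, so the hole at 112 disappears.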
Comment 120 Mike Christie 2009-07-20 13:23:58 EDT
Thanks kashyap,

Ilgu please test and let us know.

I think the patch is safe enough to add this late in development, so if we can get it tested quickly enough we can get it into one of the next kernels.
Comment 121 Tom Coughlan 2009-07-23 09:54:03 EDT
(In reply to comment #120)
> Thanks kashyap,
> 
> Ilgu please test and let us know.
> 
> I think the patch is safe enough to add it this late in development, so if we
> can get it tested quick it enough we can get it into one of the next kernels.  

Ilgu, patches for the final 5.4 build are due by the end of today. If you have testing status by then we may be able to get it in. If not, it will be deferred to 5.5.
Comment 122 joseph.r.gruher 2009-07-23 14:41:29 EDT
We are setting up to test with the patch today. We will provide results as soon as possible, but due to the configuration and especially the test time required, results will probably not be available until tomorrow.
Comment 123 Ronald Pacheco 2009-07-24 14:19:08 EDT
Any test results yet?
Comment 124 Mike Christie 2009-07-24 14:41:35 EDT
I am posting the patch now because we know it fixes some bug. It might not be this one specifically, but if it turns out not to fix it, I can always open a new bugzilla to send that fix under.

Note:

this_id was already set to -1 in our current kernel, so I did not merge that part.
Comment 126 kashyap 2009-07-25 03:26:48 EDT
Mike,
It is fine to include this patch.
See my MPT driver patch for RHEL 5.4 
https://bugzilla.redhat.com/show_bug.cgi?id=475455

The only part missing in that patch is "max_id = -1".
The this_id change has already been removed there, so it is fine to merge only the max_id change.

Thanks,
Kashyap
Comment 127 ilgu hong 2009-07-28 13:48:06 EDT
Hi, all.

I tested with the latest LSI patch (dm_111.patch) and it fixed the SCSI ID 111 problem.
I ran it for more than 20 hours and the SCSI ID went past 238.

The latest patch works fine.


Thanks.
Comment 128 Don Zickus 2009-07-28 16:13:23 EDT
in kernel-2.6.18-160.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.
Comment 132 joseph.r.gruher 2009-08-11 13:19:28 EDT
Please see comment #127. We have tested this and it works fine in our testing; we consider this verified and think the bug can be closed.
Comment 133 errata-xmlrpc 2009-09-02 04:30:36 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html
