Bug 240734 - Reboot/Shutdown fails if device on mpt times out
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.5
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Tom Coughlan
QA Contact: Martin Jenner
Depends On:
Blocks: 246627 253671 422551 430698
Reported: 2007-05-21 06:58 EDT by Martin Poole
Modified: 2010-10-22 11:11 EDT
CC List: 3 users

Fixed In Version: 4.6
Doc Type: Bug Fix
Last Closed: 2007-11-21 16:50:35 EST


Attachments
Amended customer patch now applicable to 4.5 kernel. (738 bytes, patch)
2007-05-21 06:58 EDT, Martin Poole
Compressed tar file of LSI driver source code. (455.75 KB, application/octet-stream)
2007-07-26 09:35 EDT, Larry Stephens

Description Martin Poole 2007-05-21 06:58:46 EDT
Description of problem:

Reboot or shutdown fails if a device attached to an mpt controller times out.

Version-Release number of selected component (if applicable):

Originally reported against RHEL4 U2

How reproducible:

Always

Steps to Reproduce:
1. Ensure a broken (non-responding) device is attached to the mpt controller.
2. Execute reboot or shutdown.

Actual results:

Shutdown hangs. The console output stops at the SCSI cache synchronization for sda:

 remote(pam_unix)[6995]: session closed for user guest
 Sending all processes the KILL signal...
 Saving random seed:
 Syncing hardware clock to system time
 Turning off swap:
 Turning off quotas:
 Unmounting pipe file systems:
 Unmounting file systems:
 Please stand by while rebooting the system...
 md: stopping all md devices.
 md: md0 switched to read-only mode.
 Synchronizing SCSI cache for disk sda:


Expected results:

Successful reboot/shutdown.

Additional info:

 The Fusion MPT driver issued a SYNCHRONIZE CACHE command to the SES chip,
 but the chip did not return a response. In this case the driver sets a
 10-second timer and detects the timeout, but mptscsih_timer_expired() does
 nothing, so the driver waits forever for a response from the device.
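The hang can be modelled in userspace as a waiter that only returns once something marks the command complete. This is a simplified simulation, not the mptscsih code: struct internal_cmd, wait_for_cmd() and the two handler functions are hypothetical names, each loop iteration stands for one second, and max_secs exists only so that the "waits forever" case terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cmd_status { CMD_PENDING, CMD_OK, CMD_TIMED_OUT };

/* Hypothetical model of an internal command such as SYNCHRONIZE CACHE. */
struct internal_cmd {
    enum cmd_status status;
    bool completed;          /* set by a reply or by the timeout handler */
};

typedef void (*timeout_fn)(struct internal_cmd *cmd);

/* Pre-fix behaviour: the timer fires but nothing completes the command. */
static void timer_expired_noop(struct internal_cmd *cmd)
{
    (void)cmd;
}

/* Fixed behaviour: record the timeout and release the waiter. */
static void timer_expired_complete(struct internal_cmd *cmd)
{
    cmd->status = CMD_TIMED_OUT;
    cmd->completed = true;
}

/* The device never replies. Each iteration is one simulated second and the
 * timer fires at timeout_secs. max_secs only bounds the simulation: a return
 * of -1 stands in for "waits forever". */
static int wait_for_cmd(struct internal_cmd *cmd, int timeout_secs,
                        int max_secs, timeout_fn on_timeout)
{
    assert(cmd != NULL && on_timeout != NULL);
    for (int sec = 1; sec <= max_secs; sec++) {
        if (sec == timeout_secs)
            on_timeout(cmd);
        if (cmd->completed)
            return sec;
    }
    return -1;
}
```

With a 10-second timer, the no-op handler leaves the waiter running until the simulation cap (returning -1 with the status still CMD_PENDING), while the completing handler releases it after 10 simulated seconds with status CMD_TIMED_OUT.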

The original patch was against Update 2, but the mpt code was reorganized in a
later update, so the change now applies to a different file. Since the original
patch duplicated the same action in the else clause, the attached patch simply
makes the reset code unconditional.
Comment 1 Martin Poole 2007-05-21 06:58:47 EDT
Created attachment 155083
Amended customer patch now applicable to 4.5 kernel.
Comment 2 Larry Stephens 2007-06-25 09:37:04 EDT
I agree that the "if" statement is unnecessary, since the same code is executed
on both paths.

The HardReset approach is rather dramatic, since it resets the chip. If this works
for everyone, fine with me. But I wonder about the other devices, if any, with
an ID greater than that of the SES chip. With the HardReset approach, will those
devices still have a SYNC CACHE command executed?

What driver version is being used? The code I've looked at doesn't issue a
SYNC CACHE command unless the device is DISK_TYPE. A SES device should not
have a SYNC CACHE command sent to it. Shouldn't the SES device be a Processor
type, and be reported as such in its Inquiry data?
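The device-type check mentioned here keys off the peripheral device type, the low five bits of byte 0 of the standard INQUIRY data (SPC): 0x00 is a direct-access block device (disk), 0x03 a processor device, and 0x0D an enclosure services (SES) device. A minimal sketch, assuming the driver's DISK_TYPE test amounts to "peripheral device type == 0x00"; should_sync_cache() and the PDT_* macros are hypothetical names, not part of the Fusion MPT driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Peripheral device type: byte 0, bits 4..0 of standard INQUIRY data (SPC). */
#define PDT_MASK          0x1f
#define PDT_DIRECT_ACCESS 0x00 /* disk */
#define PDT_PROCESSOR     0x03
#define PDT_ENCLOSURE     0x0d /* SES enclosure services device */

/* Hypothetical helper: only direct-access (disk) devices get SYNCHRONIZE
 * CACHE at shutdown; processor and enclosure devices are skipped. */
static bool should_sync_cache(const unsigned char *inquiry, size_t len)
{
    assert(inquiry != NULL);
    if (len < 1)
        return false;            /* no INQUIRY data: do not send the command */
    return (inquiry[0] & PDT_MASK) == PDT_DIRECT_ACCESS;
}
```

Under this check, INQUIRY data reporting type 0x03 or 0x0D yields false, so the command is sent only to disks.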

If the timer expires, it might be better to issue a Bus Reset to the device and 
allow the driver to proceed to the next device, rather than the "Big Hammer" of 
a chip Diagnostic Reset.
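The alternative suggested here can be pictured as a per-target loop in which a timeout triggers a targeted reset and the walk continues, so targets with higher IDs than the stuck device still receive SYNCHRONIZE CACHE. This is a simulation under stated assumptions: the per-target outcomes are supplied up front, the reset is only counted rather than performed, and flush_all() and struct shutdown_stats are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum sync_result { SYNC_OK, SYNC_TIMEOUT };

struct shutdown_stats {
    int synced;      /* targets that answered SYNCHRONIZE CACHE */
    int resets;      /* targeted (bus/device) resets after a timeout */
};

/* Walk the targets in ID order. results[id] is the simulated outcome of the
 * SYNCHRONIZE CACHE sent to that target. A timeout costs one targeted reset
 * and the walk continues; no chip-wide diagnostic reset is issued. */
static struct shutdown_stats flush_all(const enum sync_result *results,
                                       size_t ntargets)
{
    struct shutdown_stats st = { 0, 0 };

    assert(results != NULL || ntargets == 0);
    for (size_t id = 0; id < ntargets; id++) {
        if (results[id] == SYNC_OK)
            st.synced++;
        else
            st.resets++;    /* reset only this target/bus, then move on */
    }
    return st;
}
```

With a stuck device at ID 1 among four targets, the other three targets are still flushed and exactly one targeted reset is counted.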

Please advise what works for everyone.
Comment 3 Masao fukuchi 2007-07-02 20:46:47 EDT
Hello Larry,

We detected this problem in RHEL4 U2 (Fusion MPT driver 3.02.18).
As you say, the same problem does not occur in RHEL4 U3 (3.02.62.01rh)
or later, because the SYNC CACHE command is not issued to the SES chip.
However, a timeout can still occur on a disk if the disk is broken,
so we need to fix this problem.

I think a hard reset is the easiest approach,
but a device reset or bus reset would be better if possible.

Masao Fukuchi
Comment 4 Larry Stephens 2007-07-26 09:35:16 EDT
Created attachment 160016
Compressed tar file of LSI driver source code.
Comment 5 Issue Tracker 2007-07-31 03:20:47 EDT
Hello Larry,

I looked at the source code you provided.
I think there is no problem, because the driver will issue a hard reset for
all internal commands.

However, I couldn't find every part you revised, because I don't have the
source code of version 3.02.11.

Could you make a patch against Fusion MPT driver 3.02.73rh (the driver for
RHEL 4.5) or Fusion MPT driver 3.02.99.00 (the candidate for RHEL 4.6)?

Thank you.
M.Fukuchi


This event sent from IssueTracker by mmatsuya 
 issue 121161
Comment 6 Issue Tracker 2007-08-30 08:46:46 EDT
Fujitsu confirmed that this issue is resolved in RHEL 4.6, so we can change the
status to MODIFIED.

-----------------------------------------
I confirmed that Fusion MPT driver 3.02.99.00rh in the RHEL 4.6
Beta fixes this problem.

In earlier versions of the Fusion MPT driver, no action was taken when an
internal command timed out, which caused the system to hang.
Fusion MPT driver 3.02.99.00rh, however, calls wake_up(), which forcibly
wakes up mptscsih_do_cmd(), so the response is returned to the upper
layer promptly.
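The fix described here (the timeout path calls wake_up() so that mptscsih_do_cmd() stops waiting) follows the usual sleep/wake pattern. A userspace analogue with POSIX threads, assuming a condition variable in place of the kernel wait queue and a 100 ms sleep in place of the driver's 10-second timer; do_cmd(), timer_thread() and struct internal_cmd are hypothetical names, not driver code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

enum cmd_status { CMD_PENDING, CMD_OK, CMD_TIMED_OUT };

struct internal_cmd {
    pthread_mutex_t lock;
    pthread_cond_t done_cv;
    bool done;               /* set by a reply or by the timeout path */
    enum cmd_status status;
};

/* Timeout path: analogue of the expiry handler calling wake_up(). */
static void *timer_thread(void *arg)
{
    struct internal_cmd *cmd = arg;
    struct timespec delay = { 0, 100 * 1000 * 1000 }; /* 100 ms */

    nanosleep(&delay, NULL);
    pthread_mutex_lock(&cmd->lock);
    if (!cmd->done) {                 /* no reply arrived in time */
        cmd->status = CMD_TIMED_OUT;
        cmd->done = true;
        pthread_cond_signal(&cmd->done_cv);
    }
    pthread_mutex_unlock(&cmd->lock);
    return NULL;
}

/* Waiter: analogue of mptscsih_do_cmd() sleeping until woken. The device
 * in this model never replies, so only the timer can complete the command. */
static enum cmd_status do_cmd(struct internal_cmd *cmd)
{
    pthread_t timer;
    enum cmd_status result;

    assert(cmd != NULL);
    if (pthread_create(&timer, NULL, timer_thread, cmd) != 0)
        return CMD_PENDING;           /* could not arm the timer */

    pthread_mutex_lock(&cmd->lock);
    while (!cmd->done)                /* loop guards against spurious wakeups */
        pthread_cond_wait(&cmd->done_cv, &cmd->lock);
    result = cmd->status;
    pthread_mutex_unlock(&cmd->lock);

    pthread_join(timer, NULL);
    return result;
}
```

A command initialised with PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false and CMD_PENDING comes back from do_cmd() as CMD_TIMED_OUT after about 100 ms. If timer_thread() did not set done and signal, the waiter would never be released, which mirrors the behaviour of the earlier driver versions.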

Thank you.
M.Fukuchi


Comment 7 Issue Tracker 2007-09-13 01:42:11 EDT
Hi there,

As I wrote before, this issue was fixed in the 4.6 beta.
Please set the proper flags and status.

Regards,

