Bug 240734

Summary: Reboot/Shutdown fails if device on mpt times out
Product: Red Hat Enterprise Linux 4 Reporter: Martin Poole <mpoole>
Component: kernel Assignee: Tom Coughlan <coughlan>
Status: CLOSED CURRENTRELEASE QA Contact: Martin Jenner <mjenner>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.5 CC: jbaron, larry.stephens, tao
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 4.6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-11-21 21:50:35 UTC Type: ---
Bug Depends On:    
Bug Blocks: 246627, 253671, 422551, 430698    
Attachments:
Description                                             Flags
Amended customer patch now applicable to 4.5 kernel.    none
Compressed tar file of LSI driver source code.          none

Description Martin Poole 2007-05-21 10:58:46 UTC
Description of problem:

Reboot or Shutdown fails if a device attached to an mpt controller suffers a timeout

Version-Release number of selected component (if applicable):

Originally reported against RHEL4 U2

How reproducible:

Always

Steps to Reproduce:
1. Ensure a broken device is attached to the mpt controller.
2. Execute reboot or shutdown.
  
Actual results:

shutdown hangs

 remote(pam_unix)[6995]: session closed for user guest
 Sending all processes the KILL signal...
 Saving random seed:
 Syncing hardware clock to system time
 Turning off swap:
 Turning off quotas:
 Unmounting pipe file systems:
 Unmounting file systems:
 Please stand by while rebooting the system...
 md: stopping all md devices.
 md: md0 switched to read-only mode.
 Synchronizing SCSI cache for disk sda:


Expected results:

successful reboot/shutdown.

Additional info:

 The Fusion MPT driver issued a SYNCHRONIZE CACHE command to the SES chip,
 but the chip didn't return a response.
 In this case the Fusion MPT driver sets a timer (10 sec) and then detects the
 timeout, but mptscsih_timer_expired() did nothing.
 This causes the Fusion MPT driver to wait forever for a response from the device.

The original patch was against Update 2, but the mpt code was shuffled in a later
update, so the change now applies to a different file. Since the original patch
duplicated the action in the else clause, the attached patch just makes the reset
code unconditional.

Comment 1 Martin Poole 2007-05-21 10:58:47 UTC
Created attachment 155083 [details]
Amended customer patch now applicable to 4.5 kernel.

Comment 2 Larry Stephens 2007-06-25 13:37:04 UTC
I agree that the "if" statement is unnecessary, since the same code is executed 
with both paths.

The HardReset approach is rather dramatic, resetting the chip.  If this works 
for everyone, fine with me. But I wonder about the other devices, if any, with 
an ID greater than the SES chip.  With the HardReset approach, will those 
devices have a Sync Cache command executed?  

What driver version is being used?  The code I've looked at doesn't issue a 
SYNCH CACHE command unless the device is DISK_TYPE.  A SES device should not 
have a SYNC CACHE command sent to it.  Shouldn't the SES device be a Processor 
type and so indicated in its Inquiry data?

If the timer expires, it might be better to issue a Bus Reset to the device and 
allow the driver to proceed to the next device, rather than the "Big Hammer" of 
a chip Diagnostic Reset.

Please advise what works for everyone.
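
The alternative raised in this comment can be sketched as a small userspace model. It is not driver code: the function names and the callback interface are hypothetical, and the only facts taken from the SCSI standard are the peripheral device type codes in the low 5 bits of Inquiry byte 0. The sketch sends SYNC CACHE only to disk-type devices and, on a timeout, resets just that device and moves on to the next one instead of resetting the whole chip.

```python
# SCSI peripheral device type codes (low 5 bits of standard Inquiry byte 0).
TYPE_DISK = 0x00        # direct-access block device
TYPE_PROCESSOR = 0x03   # processor device
TYPE_ENCLOSURE = 0x0D   # enclosure services (SES) device


def peripheral_type(inquiry_byte0):
    """Return the peripheral device type encoded in Inquiry byte 0."""
    return inquiry_byte0 & 0x1F


def flush_caches(devices, send_sync_cache, bus_reset):
    """Hypothetical shutdown loop over (device_id, inquiry_byte0) pairs.

    send_sync_cache(dev_id) returns True on success, False on timeout.
    bus_reset(dev_id) is called for a device whose command timed out.
    Both callbacks are assumptions made for this illustration.
    """
    outcome = {}
    for dev_id, inquiry_byte0 in devices:
        if peripheral_type(inquiry_byte0) != TYPE_DISK:
            outcome[dev_id] = "skipped"   # e.g. SES or processor devices
            continue
        if send_sync_cache(dev_id):
            outcome[dev_id] = "synced"
        else:
            bus_reset(dev_id)             # recover and carry on with the next device
            outcome[dev_id] = "reset"
    return outcome
```

In this model a broken disk at ID 0 does not prevent a healthy disk at ID 2 from having its cache flushed, which is the concern about devices with a higher ID than the one that timed out.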

Comment 3 Masao Fukuchi 2007-07-03 00:46:47 UTC
Hello Larry,

We detected this problem in RHEL4U2 (Fusion MPT driver 3.02.18).
As you say, the same problem doesn't occur in RHEL4U3 (3.02.62.01rh)
or later, because the SYNCH CACHE command isn't issued to the SES chip.
But a timeout may still occur on a disk if the disk is broken,
so we still need to fix this problem.

I think a hard reset is the easiest approach,
but a device reset or bus reset would be better if possible.

Masao Fukuchi


Comment 4 Larry Stephens 2007-07-26 13:35:16 UTC
Created attachment 160016 [details]
Compressed tar file of LSI driver source code.

Comment 5 Issue Tracker 2007-07-31 07:20:47 UTC
Hello Larry,

I looked at the source code you provided.
I think there is no problem, because the driver will issue a hard reset for
all internal commands.

But I couldn't find all the parts you revised, because I don't have the
source code of 3.02.11.

Could you make a patch against Fusion MPT driver 3.02.73rh (the driver for
RHEL4.5) or Fusion MPT driver 3.02.99.00 (the candidate for RHEL4.6)?

Thank you.
M.Fukuchi


This event sent from IssueTracker by mmatsuya 
 issue 121161

Comment 6 Issue Tracker 2007-08-30 12:46:46 UTC
Fujitsu confirmed that this issue is gone in RHEL4.6, so we can change the
status to MODIFIED.

-----------------------------------------
I confirmed that Fusion MPT driver 3.02.99.00rh in the RHEL4.6
Beta fixes this problem.

In former versions of the Fusion MPT driver, no action was taken when a
timeout occurred for an internal command, and this caused the system
to hang.
Fusion MPT driver 3.02.99.00rh, however, calls wake_up(), which forcibly
wakes up mptscsih_do_cmd(), so the response is returned to the
upper layer promptly.

Thank you.
M.Fukuchi
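
The difference between the two behaviors described above can be illustrated with a minimal userspace model. It is only a sketch under stated assumptions: a threading.Event stands in for the kernel wait queue, a threading.Timer for the driver's timeout timer, and the class, its parameters, and the give_up_after_s bound (which exists only so that the "hung" case terminates in the demo) are all hypothetical.

```python
import threading


class InternalCommand:
    """Userspace model of an internal command waiting for a device reply."""

    def __init__(self, timeout_s, wake_on_timeout):
        self.timeout_s = timeout_s
        self.wake_on_timeout = wake_on_timeout
        self.done = threading.Event()   # stands in for the wait queue
        self.status = None

    def _timer_expired(self):
        # Older behavior as described: the handler takes no action,
        # so the waiter is never woken.
        if self.wake_on_timeout:
            if self.status is None:
                self.status = "timeout"
            self.done.set()             # models wake_up() on the waiter

    def run(self, device_replies, give_up_after_s):
        timer = threading.Timer(self.timeout_s, self._timer_expired)
        timer.start()
        if device_replies:
            self.status = "ok"
            self.done.set()
        # A real hang would block forever; the bound keeps the demo finite.
        woke = self.done.wait(give_up_after_s)
        timer.cancel()
        return self.status if woke else "hung"
```

With wake_on_timeout=False and a device that never replies, run() reports "hung"; with wake_on_timeout=True the same call returns "timeout", so the caller can report an error to the upper layer and shutdown can continue.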



Comment 7 Issue Tracker 2007-09-13 05:42:11 UTC
Hi there,

As I wrote before, this issue is fixed in the 4.6 beta.
Please set the proper flags and status.

Regards,

