Bug 602714

Summary: megaraid_sas: fix physical disk handling
Product: Red Hat Enterprise Linux 5
Reporter: Bryn M. Reeves <bmr>
Component: kernel
Assignee: Tomas Henzl <thenzl>
Status: CLOSED DUPLICATE
QA Contact: Storage QE <storage-qe>
Severity: high
Priority: urgent
Version: 5.5
CC: andriusb, bdonahue, bo.yang, bubrown, coughlan, dhoward, jpirko, jwest, ltroan, martin.wilck, mchristi, moshiro, revers, tao, vgoyal
Target Milestone: rc
Target Release: 5.6
Hardware: All
OS: Linux
Doc Type: Bug Fix
Clone Of: 577178
Last Closed: 2010-07-29 10:53:59 UTC
Attachments:
- add the online controller reset to the driver (flags: none)
- submit the patch based on rhel5.3z (flags: none)

Description Bryn M. Reeves 2010-06-10 14:46:18 UTC
+++ This bug was initially created as a clone of Bug #577178 +++

Description of problem:
The megaraid_sas driver in RHEL4 has a problem with handling physical disks and management ioctls; all physical disks are exported to the disk layer allowing an oops in megasas_complete_cmd_dpc when completing the ioctl command if a timeout occurs.

The megasas_mgmt_fw_ioctl() function constructs a megasas_cmd struct with a NULL cmd->scmd field and hands it to the adapter via megasas_issue_blocked_cmd(), setting cmd->sync_cmd to 1 to prevent the ISR from completing the command to the mid-layer. For example:

crash-4.0-6.3> struct megasas_cmd 0000010037dd4980 | less
 struct megasas_cmd {
   frame = 0x10000051800,
   frame_phys_addr = 333824,
   sense = 0x1000004cb00 "",
   sense_phys_addr = 314112,
   index = 566,
   sync_cmd = 0 '\0', <-- cleared by megasas_mgmt_fw_ioctl() after timeout
   cmd_status = 61 '=',
   abort_aen = 0,
   list = {
     next = 0x10037dd4228,
     prev = 0x100052916a8
   },
   scmd = 0x0, <-- scmd is NULL 
   instance = 0x10237439248,
   frame_count = 2
 }

Once the command is submitted, the driver uses wait_event_timeout() to wait for it to complete. If the timeout fires, sync_cmd is cleared and megasas_complete_cmd() is called to complete the command.

megasas_complete_cmd(struct megasas_instance *instance, struct megasas_cmd *cmd,
                      u8 alt_status)
 {
[...]
                 /*
                  * MFI_CMD_PD_SCSI_IO and MFI_CMD_LD_SCSI_IO could have been
                  * issued either through an IO path or an IOCTL path. If it
                  * was via IOCTL, we will send it to internal completion.
                  */
                 if (cmd->sync_cmd) {
                         cmd->sync_cmd = 0;
                         megasas_complete_int_cmd(instance, cmd);
                         break;
                 }

                 /*
                  * Don't export physical disk devices to mid-layer.
                  */
                 if (!MEGASAS_IS_LOGICAL(cmd->scmd) && ***** crash *****
                     (hdr->cmd_status == MFI_STAT_OK) &&
                     (cmd->scmd->cmnd[0] == INQUIRY)) {

                         if (((*(u8 *) cmd->scmd->request_buffer) & 0x1F) ==
                             TYPE_DISK) {
                                 cmd->scmd->result = DID_BAD_TARGET << 16;
                                 exception = 1;
                         }
                 }
[...]

Since sync_cmd is already 0 the code proceeds to MEGASAS_IS_LOGICAL(cmd->scmd) and oopses on the NULL cmd->scmd member.
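
The sequence described above can be modelled in a few lines of userspace C. This is only a sketch for illustration: struct fake_cmd, struct fake_scmd, ioctl_timeout() and complete_cmd_guarded() are hypothetical stand-ins for the driver's structures and functions, and the NULL check shown here is not the upstream fix (upstream removed the inquiry-sniffing code instead).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the driver structures. */
struct fake_scmd {
    int result;                 /* completion status for the mid-layer */
};

struct fake_cmd {
    unsigned char sync_cmd;     /* 1 while an ioctl waiter owns the command */
    struct fake_scmd *scmd;     /* NULL for commands issued via ioctl */
};

enum completion_path {
    COMPLETED_INTERNAL,         /* the ioctl waiter was woken up */
    COMPLETED_MIDLAYER,         /* handed back to the SCSI mid-layer */
    DROPPED_LATE_IOCTL          /* late ioctl completion, nothing to touch */
};

/* Models the ioctl path giving up after the timeout: it clears sync_cmd. */
void ioctl_timeout(struct fake_cmd *cmd)
{
    assert(cmd != NULL);
    cmd->sync_cmd = 0;
}

/*
 * Models the completion routine with one extra check: a command that has
 * no scmd never reaches the code that dereferences cmd->scmd.
 */
enum completion_path complete_cmd_guarded(struct fake_cmd *cmd)
{
    assert(cmd != NULL);

    if (cmd->sync_cmd) {
        cmd->sync_cmd = 0;
        return COMPLETED_INTERNAL;
    }

    /* Without this guard the next access would dereference NULL. */
    if (cmd->scmd == NULL)
        return DROPPED_LATE_IOCTL;

    cmd->scmd->result = 0;
    return COMPLETED_MIDLAYER;
}
```

In this model an ioctl command that timed out takes the DROPPED_LATE_IOCTL path rather than falling through to the scmd access, which is the step that oopses in the real driver.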

Upstream deleted much of the above code in the following commit:

Chandra_Nelogal noticed that megaraid_sas currently exports all physical
disks normally to the disk layer, which is obviously quite bad.

The problems is that megaraid_sas is doing inquiry sniffing, and since
2.6.15 inquiry commands are sent down as one-element scatterlists on
which the code in the driver doesn't work anymore.  The right place to
keep the scsi midlayer from attaching to a device is the slave_alloc
method in the host template.  To completely prevent attaching the method
needs to return -ENXIO, but the patch below sets the no_uld_attach flag
instead which prevents upper level drivers from attaching while still
allowing scsi generic access to it, as in other raid HBA drivers.

commit 147aab6aa22ce7775be944f8fb9932aa000dda61
Author: Christoph Hellwig <hch>
Date:   Fri Feb 17 12:13:48 2006 +0100

    [SCSI] megaraid_sas: fix physical disk handling
    
    This patch hides the devices completely from the midlayer instead.
    It requires the patch to handle the slave_configure failure I posted
    earlier.
    
    Signed-off-by: Christoph Hellwig <hch>
    Signed-off-by: James Bottomley <James.Bottomley>
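
The two options mentioned in the commit text (returning -ENXIO to hide the device, or setting no_uld_attach to keep scsi generic access) can be sketched in userspace as follows. This is an illustration under stated assumptions, not the actual patch: struct fake_sdev, the fake_* functions and the channel-based test for "physical disk" are all hypothetical; only the -ENXIO return value and the no_uld_attach flag name come from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define FAKE_TYPE_DISK        0x00  /* SCSI peripheral device type: disk */
#define FAKE_MAX_PD_CHANNELS  2     /* assumed: channels below this carry physical disks */

/* Hypothetical stand-in for the mid-layer's per-device structure. */
struct fake_sdev {
    unsigned int channel;
    unsigned char type;
    int no_uld_attach;              /* 1: upper-level drivers must not bind */
};

/* Assumption: a physical disk is a disk-type device on a PD channel. */
int fake_is_physical_disk(const struct fake_sdev *sdev)
{
    assert(sdev != NULL);
    return sdev->channel < FAKE_MAX_PD_CHANNELS &&
           sdev->type == FAKE_TYPE_DISK;
}

/* Option 1: refuse the device so it is hidden completely. */
int fake_configure_hide(struct fake_sdev *sdev)
{
    if (fake_is_physical_disk(sdev))
        return -ENXIO;
    return 0;
}

/* Option 2: accept the device but block upper-level drivers such as sd. */
int fake_configure_no_uld(struct fake_sdev *sdev)
{
    if (fake_is_physical_disk(sdev))
        sdev->no_uld_attach = 1;
    return 0;
}
```

Either way the decision is taken once, when the device is set up, instead of by inspecting INQUIRY data in the command completion path.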



Version-Release number of selected component (if applicable):
2.6.9-89.EL

How reproducible:
Difficult - requires the megasas_issue_blocked_cmd timeout to fire. The system where this was observed was under severe memory pressure at the time of the crash.

Steps to Reproduce:
1. Issue management ioctls to disk devices on megaraid_sas controller
2. Generate high system and I/O load to try to provoke the timeout (this may be triggerable with the SCSI fault injection framework; not tested).
  
Actual results:
Oops in megasas_complete_cmd_dpc

Expected results:
No oops even under severe load

Additional info:
Fixed in commit 147aab6aa22ce7775be944f8fb9932aa000dda61

Comment 4 bo yang 2010-06-10 16:57:41 UTC
This issue is already fixed in our latest driver (4.27 or 4.30).  We have already submitted the patch to rhel5.6 and rhel6.0 (the one that comes with the online controller reset, OCR, added).

Comment 5 Bryn M. Reeves 2010-06-15 10:11:54 UTC
This is open as a separate bug because it has been reported on earlier releases; since we cannot include a wholesale driver update in EUS packages (targeted fixes only) it needs to be tracked separately.

Comment 6 Issue Tracker 2010-06-16 03:07:09 UTC
Event posted on 06-16-2010 12:07pm JST by moshiro

Hi Bo, 

Following is from Tokunaga-san, Fujitsu. Could you please kindly reply to
him?
---
Looking at the patchset provided by LSI for 5.6 (bug 564249 Comment 17),
it turns out that it is not a wholesale version upgrade but a minimized
patchset for 5.6, thanks to Bo's efforts on this.  So, we suppose it won't
be very difficult to identify the pieces for a 5.3 hotfix.

Bo,

Could you please identify pieces for 5.3 hotfix as soon as possible?  One
of Fujitsu's customers has requested a 5.3 hotfix and it is quite
urgent.

Kei Tokunaga 
---


This event sent from IssueTracker by moshiro 
 issue 1000913

Comment 8 Tom Coughlan 2010-06-20 19:25:37 UTC
Bo,

Please post the specific fix for the bug described in this BZ, backported from your latest driver (4.27 or 4.30), ready to go in 5.5.z, 5.4.z, and 5.3.z.

Tom

Comment 10 bo yang 2010-07-02 17:03:46 UTC
Created attachment 429116 [details]
add the online controller reset to the driver

1.      Add the fix for the kernel panic that occurs if application cmds take too long (> 3 mins).

2.      Add the online controller reset (OCR) to the driver (this is needed for fixing item 1).

        A. For online controller reset, the driver adds the chip reset functions.

        B. In the driver's ISR function, a FW state change interrupt received while
        the FW is in the failed state triggers the driver to do the Online Controller Reset.

        C. During the FW reset, the driver saves the pending cmds to an internal
        queue and re-fires those cmds after the OCR has finished.

        D. In the driver's ioctl routine, cmds from applications wait for the OCR
        to finish before they are issued.

        E. If the driver's timeout routine gets called during the OCR, the driver will
        report a reset to the OS.
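
The "save pending cmds and re-fire them after the OCR" step in the list above can be sketched as a small userspace model. It is not taken from the attached patch: struct fake_adapter, the fake_* functions and the fixed-size arrays are hypothetical and chosen only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_MAX_PENDING 8
#define FAKE_MAX_FIRED   16

enum fake_adp_state { FAKE_ADP_OPERATIONAL, FAKE_ADP_IN_RESET };

/* Hypothetical adapter model: cmds are plain integer ids. */
struct fake_adapter {
    enum fake_adp_state state;
    int pending[FAKE_MAX_PENDING];  /* cmds parked while the reset runs */
    size_t npending;
    int fired[FAKE_MAX_FIRED];      /* cmds handed to the "firmware" */
    size_t nfired;
};

/* Returns 0 if the cmd was fired or parked, -1 if there is no room. */
int fake_issue(struct fake_adapter *a, int cmd_id)
{
    assert(a != NULL);

    if (a->state == FAKE_ADP_IN_RESET) {
        if (a->npending == FAKE_MAX_PENDING)
            return -1;
        a->pending[a->npending++] = cmd_id;   /* park until reset is done */
        return 0;
    }

    if (a->nfired == FAKE_MAX_FIRED)
        return -1;
    a->fired[a->nfired++] = cmd_id;
    return 0;
}

void fake_reset_begin(struct fake_adapter *a)
{
    assert(a != NULL);
    a->state = FAKE_ADP_IN_RESET;
}

/* Leaves the reset state and re-fires parked cmds in arrival order. */
void fake_reset_done(struct fake_adapter *a)
{
    size_t i, n;

    assert(a != NULL);
    a->state = FAKE_ADP_OPERATIONAL;
    n = a->npending;
    a->npending = 0;
    for (i = 0; i < n; i++)
        (void)fake_issue(a, a->pending[i]);
}
```

Callers that issue a cmd while the reset is in progress see success immediately; the cmd only reaches the firmware once fake_reset_done() has run.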

Comment 12 Dwight (Bud) Brown 2010-07-02 18:06:31 UTC
(In reply to comment #10)

Bo,

These changes appear to be on top of other changes currently missing from the 5.3(z) version of the driver.

For example none of the megasas*skinny routines exist in the 5.3z driver.  There are some other changes that indicate other missing patches are needed prior to applying these changes.  Is there a more comprehensive patch available?

bud brown

Comment 13 bo yang 2010-07-02 18:14:58 UTC
I am using rhel5.5 as the base code.  If you want to apply the patch to rhel5.3 or rhel5.4, can you send me the base source by e-mail?  I will then create the patches quickly.  Otherwise, I need to download the source for rhel5.3 and rhel5.4 to make the patches.

Just send me megaraid_sas.c and megaraid_sas.h.

Comment 14 bo yang 2010-07-02 20:27:21 UTC
Created attachment 429157 [details]
submit the patch based on rhel5.3z

recreate the patch based on 5.3z

Comment 18 Issue Tracker 2010-07-05 05:14:49 UTC
Event posted on 07-05-2010 02:14pm JST by moshiro

Hi, 

Could you please answer the question from FJ?
---
From bug 602714:
>  bo yang      2010-07-02 16:27:21 EDT
>
> Created an attachment (id=429157) [details]
> submit the patch based on rhel5.3z
>
> recreate the patch based on 5.3z   

We built a kernel with the patch applied and started a test on it.  It's
been about two days and no kernel panics have been seen.  What we did was
create 32 processes, each of which issued an ioctl continuously.  We
saw no adapter resets during the test.

However, there is still one thing we'd like to clarify.  If a hardware
failure happens with a megaraid_sas adapter, the adapter is likely unable
to send a response back to the OS.  In such a case, the OS has to detect
it and notify the apps so that the apps can either retry the ioctl or
initiate a fail-over process.  But, with the patch, it looks like there is
no way for the OS to detect the failure.  (Before, with wait_event_timeout(),
the OS was able to detect a hardware failure.)  Could you explain how to
handle such a scenario after applying the patch?

We didn't do any normal IOs during the test.  If we had, because some of
the ioctls occasionally waited for a response for 20 minutes or more, the
normal IOs would have slept as well due to the stacked ioctls.  Would the
common SCSI layer then detect this and initiate an adapter reset?  If
that's the case, we will do the test again with normal IOs.

> We have created a test package. Could you please verify it and give us
> your feedback?
>
> http://people.redhat.com/moshiro/1000913/

Thank you.  We'll use it from the next time for testing.
---

Best Regards,
M Oshiro

Internal Status set to 'Waiting on Engineering'


Comment 19 Moritoshi Oshiro 2010-07-05 08:26:03 UTC
setting needinfo by a request from FJ.

Comment 20 Moritoshi Oshiro 2010-07-06 00:45:24 UTC
setting needinfo - bo.yang.

Comment 21 Moritoshi Oshiro 2010-07-07 00:21:27 UTC
From FJ:
=================================================================
Hi Bo,

We downloaded the test kernel Red Hat made, including your fix, and we installed it into our machine.

We ran the test again on the kernel, this time with a relatively large number of normal IOs running continuously.  The results are the same: we saw no adapter resets.  (We modified the kernel to add an interface that triggers an adapter reset and tried it.  An adapter reset took about 30 secs to complete, and no IO-related work could be performed during the reset.  A reset also outputs some printk messages, so it's easy to notice when one runs.)

We had a chance to talk with our hardware team.  They gave us some information about the OCR feature.  They explained that OCR was originally introduced for relatively old adapters that have a bug in their chipset (BZ563083).  They also informed us that MegaRAID SAS 8880EM2, which is installed in the customer's systems, doesn't have the bug, and therefore it disables the adapter reset feature in its firmware by default.  That means even the megaraid_sas driver with the OCR feature won't initiate OCR anyway on the customer systems.

That leads us to the question again: what happens when something goes wrong with the firmware or hardware on the adapter and a response to an ioctl never comes back?  We thought OCR would rescue such a situation, but that might not be the case?

Kei Tokunaga
=================================================================

Comment 22 Moritoshi Oshiro 2010-07-08 06:13:17 UTC
Bo-san, could you please kindly reply?

Another comment from FJ:
We've been doing some testing and investigations for a fix and
got some questions.  We'd like to obtain information/answers from LSI.  (This list includes some questions Fujitsu asked before.)

1) We had a chance to talk with our hardware team.  They gave us
  some information about the OCR feature.  They explained that OCR was
  originally introduced for relatively old adapters that have a
  bug in their chipset (BZ563083).  They also informed us that
  MegaRAID SAS 8880EM2, which is installed in the customer's
  systems, doesn't have the bug, and therefore it disables the
  adapter reset feature in its firmware by default.  That means
  even the megaraid_sas driver with the OCR feature won't initiate
  OCR anyway on the customer systems.  Is that true?  This
  question breaks down into two technical questions.

 1-1) megasas_wait_for_outstanding() sees
      instance->disableOnlineCtrlReset, which we believe is
      firmware setting, to determine whether or not it should
      call megasas_do_ocr().  Is it set on MegaRAID SAS 8880EM2?

 1-2) If instance->disableOnlineCtrlReset is set, what happens
      when we call megasas_do_ocr() directly?  The firmware
      won't do a reset?

2) OCR will be initiated when one of the following conditions is
  met.
  a) SCSI common layer detects timeout.
  b) The firmware reports a failure (the firmware is in failure
     state) to the driver.

 In what kinds of situations will the firmware report a failure to
 the driver?

3) What is the scenario when a response for ioctl never comes
  back due to a firmware/hardware failure?

4) Per our investigation, MegaRAID SAS 8880EM2 uses ppc functions
 (not xscale, and gen2).  Per the source code,
 megasas_adp_reset_ppc() does nothing.  Adapter reset (OCR) is
 not supported on ppc machines such as MegaRAID SAS 8880EM2? 
---

Best Regards,
Moritoshi Oshiro

Comment 23 bo yang 2010-07-12 23:43:40 UTC
I was on vacation and have only just had the chance to see the questions:

>1) We had a chance to talk with our hardware team.  They gave us
>  some information about the OCR feature.  They explained that OCR was
>  originally introduced for relatively old adapters that have a
>  bug in their chipset (BZ563083).  They also informed us that
>  MegaRAID SAS 8880EM2, which is installed in the customer's
>  systems, doesn't have the bug, and therefore it disables the
>  adapter reset feature in its firmware by default.  That means
>  even the megaraid_sas driver with the OCR feature won't initiate
>  OCR anyway on the customer systems.  Is that true?  This
>  question breaks down into two technical questions.

In the driver and FW, OCR is implemented for XScale and 2108 (Gen2) chips.  For skinny and PPC chips, FW does not support OCR at present.

>1-1) megasas_wait_for_outstanding() sees
>      instance->disableOnlineCtrlReset, which we believe is
>      firmware setting, to determine whether or not it should
>      call megasas_do_ocr().  Is it set on MegaRAID SAS 8880EM2?

FW needs to set the flag to support OCR.
The MegaRAID SAS 8880EM2 uses a PPC chip, which does not have OCR support in FW.

> 1-2) If instance->disableOnlineCtrlReset is set, what happens
>      when we call megasas_do_ocr() directly?  The firmware
>      won't do a reset?

If the flag is set and megasas_do_ocr() is called, FW will do the reset.

>2) OCR will be initiated when one of the following conditions is
>  met.
>  a) SCSI common layer detects timeout.
>  b) The firmware reports a failure (the firmware is in failure
>     state) to the driver.

OCR will be initiated if:

a) FW has set the OCR flag, generated the FW state change interrupt, and is in the failed state (MFI state).
b) The SCSI layer detected a timeout, FW has set the OCR flag, and the driver detected that FW is in the failed state (MFI state).
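
The two conditions above can be written as a single predicate. This is just a restatement of the answer in code form, not driver source: struct fake_ocr_inputs, its field names and fake_should_start_ocr() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of what the driver knows when it decides. */
struct fake_ocr_inputs {
    bool fw_ocr_flag;             /* FW has set the OCR flag */
    bool fw_in_failed_state;      /* MFI state reads as failed */
    bool state_change_interrupt;  /* FW state change interrupt was raised */
    bool scsi_timeout;            /* SCSI layer detected a timeout */
};

/*
 * Condition a): OCR flag + state change interrupt + failed state.
 * Condition b): SCSI timeout + OCR flag + failed state.
 * Both need the flag and the failed state; they differ only in the trigger.
 */
bool fake_should_start_ocr(const struct fake_ocr_inputs *in)
{
    assert(in != NULL);

    if (!in->fw_ocr_flag || !in->fw_in_failed_state)
        return false;

    return in->state_change_interrupt || in->scsi_timeout;
}
```

Written this way it is easy to see that a SCSI timeout alone does not start OCR: without the OCR flag from FW, or with FW not in the failed state, the predicate is false.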

>3) What is the scenario when a response for ioctl never comes
>  back due to a firmware/hardware failure?

For XScale and Gen2 chips, the driver will do OCR to try to bring the HW/FW back; ioctl cmds may take a long time to return.  For PPC chips, if the FW/HW has failed, the controller will be killed.

>4) Per our investigation, MegaRAID SAS 8880EM2 uses ppc functions
> (not xscale, and gen2).  Per the source code,
> megasas_adp_reset_ppc() does nothing.  Adapter reset (OCR) is
> not supported on ppc machines such as MegaRAID SAS 8880EM2?

For MegaRAID SAS 8880EM2, OCR will not be supported by FW and driver.

Comment 24 bo yang 2010-07-19 08:13:39 UTC
For rhel 5.4z and rhel5.5z, does the customer only want to apply this patch to MegaRAID SAS 8880EM2 controller (PPC) or other type of controllers (XScale and Gen2) also will be applied?

If it is only applied to MegaRAID SAS 8880EM2, we can create the minimal patch.

Thanks,

Bo Yang

Comment 25 Issue Tracker 2010-07-21 05:55:07 UTC
Event posted on 07-21-2010 02:55pm JST by moshiro

Hi Bo-san,

FJ has replied to your last comment as below:
---
> For rhel 5.4z and rhel5.5z, does the customer only want to
> apply this patch to MegaRAID SAS 8880EM2 controller (PPC)
> or other type of controllers (XScale and Gen2) also will
> be applied?
>
> If it is only applied to MegaRAID SAS 8880EM2, we can create
> the minimal patch.

Some customers are using XScale and Gen2 controllers, so please include
the OCR feature in the patch.

Kei Tokunaga
---

Thanks.

Best Regards,
M Oshiro

Internal Status set to 'Waiting on Engineering'


Comment 26 Issue Tracker 2010-07-22 03:24:02 UTC
Event posted on 07-22-2010 12:24pm JST by moshiro

Hi Bo-san,

Following is from FJ:
---
>> For rhel 5.4z and rhel5.5z, does the customer only want to
>> apply this patch to MegaRAID SAS 8880EM2 controller (PPC)
>> or other type of controllers (XScale and Gen2) also will
>> be applied?
>>
>> If it is only applied to MegaRAID SAS 8880EM2, we can create
>> the minimal patch.
>
> Some customers are using XScale and Gen2 controllers, so please include
> the OCR feature in the patch.

Bo,

Reading your comments again, I noticed you only talked about 5.4.z and
5.5.z there, but not 4.9 or 4.8.z explicitly.  Could you please add the
OCR feature to a fix for 4.9 and 4.8.z as well?

Kei Tokunaga 
---

Thanks. 

Moritoshi



Comment 30 Tomas Henzl 2010-07-22 16:39:10 UTC
Hi Bo,
sorry to be a pain, but this is hot - could you please comment on comment #25?

Comment 31 bo yang 2010-07-23 17:07:04 UTC
Kei,

I am in China now and am traveling back.  Just to let you know: to port OCR to 5.4z, 5.5z and 4.xz, we need to spend at least one and a half days on each (assuming no other interruptions).  I may start the porting after I am back in the office next Monday or Tuesday.  rhel 4.xz will be delayed until 5.x is finished.

Bo Yang

Comment 32 Issue Tracker 2010-07-26 08:12:32 UTC
Event posted on 2010-07-26 17:12 JST by myamazak

Hello Bo-san,

I've got a response from FJ.
----------------------------------------------------------------------
Fujitsu already confirmed the hotfix works fine.

Fujitsu is waiting for errata that are supposed to come out on 10th Aug. 
If there are any concerns with the date, please let us know.
----------------------------------------------------------------------

Best regards,
M Yamazaki



This event sent from IssueTracker by myamazak 
 issue 1000913

Comment 33 Tomas Henzl 2010-07-26 14:57:20 UTC
(In reply to comment #32)
> ----------------------------------------------------------------------
> Fujitsu already confirmed the hotfix works fine.
> 
> Fujitsu is waiting for errata that are supposed to come out on 10th Aug. 
> If there are any concerns with the date, please let us know.
> ----------------------------------------------------------------------
> 
> Best regards,
> M Yamazaki

Based on comment#25, I think Fujitsu still wants the OCR support for all controllers. Is it so?

Comment 34 Issue Tracker 2010-07-27 03:38:58 UTC
Event posted on 2010-07-27 12:38 JST by myamazak

Hi all,

FJ gave us a summary of the status of this issue. If anyone has any
comments, please let me know.
----------------------------------------------------------------------
Here is a summary of the status of this issue.

- This ticket has been used for 5.3hotfix and 5.3.z, and now used for
5.5.z and 5.6 as well.

- 5.3hotfix was provided and Fujitsu confirmed it works fine.

- 5.3.z errata release is planned for 10th Aug.

- Fujitsu requested LSI to provide a fix patch with OCR for 5.5.z, 5.6,
4.8.z, and 4.9 and LSI acknowledged it.  (4.8.z and 4.9 are handled on
IT604473)

Please give it a correction if there is anything inaccurate.

Kei Tokunaga
----------------------------------------------------------------------

Regards,
M Yamazaki




Comment 40 bo yang 2010-07-28 14:15:09 UTC
Can Fujitsu confirm rhel5.3z works fine?  

Also for rhel5.4z, rhel5.5z, rhel4.8z and rhel4.9z, we are waiting for the feedback from Fujitsu.

Thanks,

Bo Yang

Comment 41 Tomas Henzl 2010-07-28 15:03:40 UTC
The test kernel is posted at http://people.redhat.com/thenzl/602714/
If you want builds for other archs, please let me know.

Comment 42 Larry Troan 2010-07-28 15:12:21 UTC
Setting NEEDINFO=tmuneda per comment #40 and comment #41 above.

Comment 43 Issue Tracker 2010-07-29 00:47:16 UTC
Event posted on 2010-07-29 09:47 JST by myamazak

Here is a response from FJ.
----------------------------------------------------------------------
Bo wrote:
> Can Fujitsu confirm rhel5.3z works fine?

We sure will.

> Also for rhel5.4z, rhel5.5z, rhel4.8z and rhel4.9z, we are
> waiting for the feedback from Fujitsu.

Do you mean you need to wait for feedback from Fujitsu on 5.3.z to start
development of 5.5.z, 5.4.z, 4.8.z and 4.9?  Or, do you mean you are
looking for some other information?

Tomas wrote:
> The test kernel is posted on http://people.redhat.com/thenzl/602714/

Thank you for the packages.  We will start testing.  Just one thing to
make sure: they are test packages of 5.3.z, right?

Kei Tokunaga
----------------------------------------------------------------------




Comment 44 Tomas Henzl 2010-07-29 10:53:59 UTC
This issue is fixed by driver update - bz#564249.

*** This bug has been marked as a duplicate of bug 564249 ***

Comment 45 Tomas Henzl 2010-07-29 12:23:56 UTC
(In reply to comment #43)
> 
> Tomas wrote:
> > The test kernel is posted on http://people.redhat.com/thenzl/602714/
> 
> Thank you for the packages.  We will start testing.  Just one thing to
> make sure: they are test packages of 5.3.z, right?

Yes it's 5.3.z

Comment 46 Tomas Henzl 2010-07-29 12:29:44 UTC
(In reply to comment #40)
> Can Fujitsu confirm rhel5.3z works fine?  
> 
> Also for rhel5.4z, rhel5.5z, rhel4.8z and rhel4.9z, we are waiting for the
> feedback from Fujitsu.
> 

Bo,
when the patch is accepted for 5.3.z, it is a must to have it in 5.4.z and 5.5.z as well; not having it there would create a regression.
On the other hand, chances are good that we can use the 5.3.z patch for that.
For 5.4.z there is bz#619363, and for 5.5.z there is bz#619365.