Bug 844513 - [hpsa]: hpsa driver hangs at function hpsa_scsi_do_inquiry() during the OS install in MSIx mode.
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.3
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Tomas Henzl
QA Contact: Red Hat Kernel QE team
Reported: 2012-07-30 20:31 EDT by Krishna Chaitanya Gudipati
Modified: 2013-05-29 08:02 EDT (History)
Doc Type: Bug Fix
Last Closed: 2013-05-29 08:02:56 EDT
Type: Bug
Attachments
hpsa driver hung task (201.55 KB, image/jpeg)
2012-07-30 20:38 EDT, Krishna Chaitanya Gudipati
hpsa driver load before dud install (313.11 KB, image/jpeg)
2012-07-30 20:45 EDT, Krishna Chaitanya Gudipati
hpsa driver load after dud install (333.02 KB, image/jpeg)
2012-07-30 20:48 EDT, Krishna Chaitanya Gudipati

Description Krishna Chaitanya Gudipati 2012-07-30 20:31:28 EDT
Description of problem:

The HP Smart Array Controller driver (hpsa) hangs forever in the function hpsa_scsi_do_inquiry() during OS installation, but only in MSIx mode.

Version-Release number of selected component (if applicable):

HP HPSA Driver v 2.0.2-4

How reproducible:

Easy to reproduce

Steps to Reproduce:

1. Start the RHEL 6.3 installation (in default MSIx mode) and pass "linux dd" to invoke the driver update.
2. Provide a driver update disk (DUD) to update the inbox drivers.
3. As part of the driver update, the initrd modules get unloaded and reloaded.
4. All the other modules reload fine, but the HPSA driver module is stuck
   during module reload while trying to do an inquiry using hpsa_scsi_do_inquiry().
5. We see the hung-task status every 120 seconds.
6. The OS installation hangs because of this failure.
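
For anyone triaging a captured console log, the hung-task reports from step 5 can be picked out with a small script. This is only a sketch: the message format is assumed from the upstream kernel's hung-task detector and is not quoted from this report, and the sample line (including the task name and PID) is hypothetical.

```python
import re

# Assumed message format of the kernel hung-task detector; not taken from
# the screenshots attached to this bug.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a (comm, pid, seconds) tuple for every hung-task warning."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]


# Hypothetical console line, written for illustration only.
sample = "INFO: task modprobe:1234 blocked for more than 120 seconds."

if __name__ == "__main__":
    print(find_hung_tasks(sample))
```

A repeated entry for the same task at the same interval would be consistent with the probe never returning.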

Actual results:

The module reload of HPSA is stuck when trying to do an inquiry using hpsa_scsi_do_inquiry().

Expected results:

Module unload/reload of HPSA module during the driver update disk should not
hang and block OS install.

Additional info:

We have investigated a little further, and the issue appears to be in the
module unload/reload path of the HP (SAS) "hpsa" driver when running in MSIx
mode. For some reason, interrupts do not seem to arrive during the reload of
the hpsa module, which causes the hpsa probe routine to hang.

Steps:

1) With RHEL 6.3, before the DUD load, we see the inbox BFA driver 3.0.2.2
   and the HPSA driver v2.0.2-4 both getting loaded.
2) After the DUD load, most of the initrd modules get unloaded and reloaded.
3) After the BFA DUD load, the BFA driver loads fine, but the HPSA
   module load (which was fine before) hangs because the probe routine hangs
   during device discovery (hpsa_scsi_do_inquiry).

Workaround:
- Booting with the kernel parameter "pci=nomsi" (that is, running in INTx mode):
  the HPSA module does not hit this issue.
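
To confirm that the workaround actually took effect on a booted system, the kernel command line can be checked. A minimal sketch, assuming the command line is readable from /proc/cmdline and that `pci=` takes a comma-separated option list; the helper names are made up for illustration.

```python
def has_pci_option(cmdline, option="nomsi"):
    """Return True if the command line has a pci= token listing `option`."""
    for token in cmdline.split():
        if token.startswith("pci="):
            # pci= accepts several comma-separated options, e.g. pci=noaer,nomsi
            if option in token[len("pci="):].split(","):
                return True
    return False


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    print(has_pci_option(read_cmdline()))
```

If this prints False on a system that was supposed to boot with the workaround, the option was probably not passed at the installer boot prompt.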

I am attaching screenshots to help debug the defect.

Thanks,
Krishna.
Comment 2 Krishna Chaitanya Gudipati 2012-07-30 20:38:59 EDT
Created attachment 601372 [details]
hpsa driver hung task

HPSA driver hung task.
Comment 3 Krishna Chaitanya Gudipati 2012-07-30 20:45:06 EDT
Created attachment 601373 [details]
hpsa driver load before dud install

hpsa: driver load which is fine before dud load.
Comment 4 Krishna Chaitanya Gudipati 2012-07-30 20:48:07 EDT
Created attachment 601374 [details]
hpsa driver load after dud install

hpsa: driver load fails after dud install

Fails to complete the probe and we see the task hung after 120 seconds.

We also see an error message saying:

do_IRQ: 0:116 No irq handler for vector (irq -1).
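
The fields of that message can be extracted for comparison across boots. This sketch assumes the two numbers after "do_IRQ:" are the CPU and the interrupt vector (as in the upstream x86 message, which prints them separated by a dot); the colon seen above is accepted as well since the line was transcribed from a screenshot.

```python
import re

# Assumed field meaning: <cpu><sep><vector>, then the irq the vector maps to.
NO_HANDLER_RE = re.compile(
    r"do_IRQ: (?P<cpu>\d+)[.:](?P<vector>\d+) "
    r"No irq handler for vector \(irq (?P<irq>-?\d+)\)"
)


def parse_no_handler(line):
    """Return cpu, vector and irq from a 'No irq handler' line, or None."""
    m = NO_HANDLER_RE.search(line)
    if m is None:
        return None
    return {
        "cpu": int(m.group("cpu")),
        "vector": int(m.group("vector")),
        "irq": int(m.group("irq")),
    }


if __name__ == "__main__":
    print(parse_no_handler("do_IRQ: 0:116 No irq handler for vector (irq -1)."))
```

An irq of -1 means the vector that fired is not mapped to any registered interrupt on that CPU, which fits the observation that the reloaded driver never sees its completion interrupt.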
Comment 5 Tomas Henzl 2012-11-23 07:53:28 EST
Krishna,
the process of updating the hpsa driver is already under way and a lot of patches are being added. Do you have access to a newer kernel via your partner link, say kernel-344 or newer? If so, please retest the issue with that kernel.
Thanks, Tomas
Comment 6 apfeiffe 2012-11-26 11:30:34 EST
Hello Tomas,
This issue happens at install time so I can't upgrade the kernel that is part of the installer.  Have all of the changes you mentioned been incorporated into Red Hat 6.4?  If so, I could test with the alpha version of 6.4.

Thanks
Comment 7 Tomas Henzl 2012-11-27 07:50:23 EST
(In reply to comment #6)
> Hello Tomas,
> This issue happens at install time so I can't upgrade the kernel that is
> part of the installer.
Could you install 6.3 with msix disabled and then update the kernel?
>  Have all of the changes you mentioned been
> incorporated into Red Hat 6.4?  If so, I could test with the alpha version
> of 6.4.
The alpha is a -338 kernel; this test needs at least -340, and -344 is optimal (the last patch most likely can't play a role in this case).
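
A quick way to check whether a given kernel meets that threshold is to compare its build number. This is a sketch only: it assumes RHEL 6 release strings of the form 2.6.32-NNN.el6..., and the function names and default threshold argument are illustrative.

```python
import re


def rhel6_build(release):
    """Extract the build number from a string like '2.6.32-344.el6.x86_64'."""
    m = re.match(r"2\.6\.32-(\d+)", release)
    if m is None:
        raise ValueError("not a RHEL 6 style kernel release: %r" % release)
    return int(m.group(1))


def has_hpsa_update(release, minimum=340):
    """True if the kernel build is at least the -340 build named above."""
    return rhel6_build(release) >= minimum


if __name__ == "__main__":
    import platform
    try:
        print(has_hpsa_update(platform.release()))
    except ValueError as err:
        print(err)
```

With these assumptions, the 6.4 alpha kernel (-338) fails the check, while -340 and -344 pass.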


I think I have missed something in your reproducer scenario
(In reply to comment #0)
> 1.Start RHEL6.3 Installation (in default MSIx mode) and pass "linux dd" for
> invoking driver update.
> 2.Have a driver update disk to update the inbox drivers.
Which driver do you update this way: hpsa or some other driver?
> 3.As part of the driver update disk the initrd modules get unload/reloaded.
> 4.All the other modules get re-loaded fine, but HPSA driver module is stuck
>   during module re-load when trying to do inquiry using
> hpsa_scsi_do_inquiry().
......
Comment 8 RHEL Product and Program Management 2012-12-14 03:02:18 EST
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 9 Mike Miller (OS Dev) 2013-04-22 14:23:49 EDT
What version of the dud was used here? I'm not sure if it really matters but that info may be helpful.
Comment 10 Stephen Cameron 2013-04-22 14:32:36 EDT
How many MSI-X vectors does the install media driver use and how many does the dud driver use?  If they are different (e.g. 4 vs. 8) that is likely a clue.

-- steve
Comment 11 Mike Miller (OS Dev) 2013-04-22 14:57:26 EDT
The media driver uses 4 vectors. How many the dud driver uses depends on which dud they used. The current dud uses MAX_REPLY_QUEUES (16) msix vectors.

I'm setting this back to more info needed.
Comment 12 Tomas Henzl 2013-04-23 07:48:24 EDT
Steve, Mike,
isn't this a similar issue to bz#949499? I think the conclusion there was that the problem is in the module unload/load path (which I hope is solved for RHEL6.4 and higher) and that the workaround is to pass the 'pci=nomsi' option when the 6.3 kernel starts.
Comment 13 Mike Miller (OS Dev) 2013-04-24 11:15:39 EDT
(In reply to comment #12)
> Steve, Mike,
> isn't this a similar issue to bz#949499? I think the conclusion has been,
> that the problem is in the module unload/load (this is I hope solved for
> RHEL6.4 and higher) and that a workaround should be used in form of a
> 'pci=nomsi' option used when the 6.3 kernel starts.

Since this is an install-time issue, I suppose using pci=nomsi is probably OK. I looked at the media driver more closely and determined that we're not calling pci_disable_device in hpsa_remove_one. Our current assumption is that this is the root cause. How will this be documented?
Comment 14 Tomas Henzl 2013-04-24 11:31:31 EDT
(In reply to comment #13)
> Since this is an install-time issue, I suppose using pci=nomsi is probably OK. I
> looked at the media driver more closely and determined that we're not
> calling pci_disable_device in hpsa_remove_one. Our current assumption is
> that is the root cause. How will this be documented?

I'll find out whether it is already documented somewhere; if not, we will use this bz to document it.
Comment 16 Tomas Henzl 2013-04-26 07:33:25 EDT
(In reply to comment #13)
> Since this is an install-time issue, I suppose using pci=nomsi is probably OK. I
> looked at the media driver more closely and determined that we're not
> calling pci_disable_device in hpsa_remove_one. Our current assumption is
> that is the root cause. How will this be documented?

I just learnt that this is going to be documented in RHEL6.3 and 6.4 Technical Notes.
Comment 17 Tomas Henzl 2013-05-29 08:02:56 EDT
I'm closing this bz. The issue is fixed for RHEL6.4, and a workaround (the kernel option pci=nomsi) is documented in the Technical Notes.
