Bug 844513 - [hpsa]: hpsa driver hangs at function hpsa_scsi_do_inquiry() during the OS install in MSIx mode.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.3
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Tomas Henzl
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-07-31 00:31 UTC by Krishna Chaitanya Gudipati
Modified: 2013-05-29 12:02 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-05-29 12:02:56 UTC
Target Upstream Version:
Embargoed:


Attachments
hpsa driver hung task (201.55 KB, image/jpeg)
2012-07-31 00:38 UTC, Krishna Chaitanya Gudipati
hpsa driver load before dud install (313.11 KB, image/jpeg)
2012-07-31 00:45 UTC, Krishna Chaitanya Gudipati
hpsa driver load after dud install (333.02 KB, image/jpeg)
2012-07-31 00:48 UTC, Krishna Chaitanya Gudipati

Description Krishna Chaitanya Gudipati 2012-07-31 00:31:28 UTC
Description of problem:

The HP Smart Array Controller driver (hpsa) hangs indefinitely in the function hpsa_scsi_do_inquiry() during OS installation, but only in MSI-X mode.

Version-Release number of selected component (if applicable):

HP HPSA Driver v 2.0.2-4

How reproducible:

Easy to reproduce

Steps to Reproduce:

1. Start the RHEL 6.3 installation (in the default MSI-X mode) and pass "linux dd" to invoke the driver update.
2. Provide a driver update disk (DUD) to update the inbox drivers.
3. As part of the driver update, the initrd modules are unloaded and reloaded.
4. All the other modules reload fine, but the HPSA driver module gets stuck
   during reload while performing the inquiry in hpsa_scsi_do_inquiry().
5. A hung-task message appears every 120 seconds.
6. The OS installation hangs because of this failure.

Actual results:

The reload of the HPSA module gets stuck while performing the inquiry in hpsa_scsi_do_inquiry().

Expected results:

Unloading and reloading the HPSA module during the driver update should not
hang or block the OS installation.

Additional info:

We investigated a little further, and the issue appears to be in the HP (SAS)
driver when running in MSI-X mode, specifically in the module unload/reload of
the “hpsa” driver. For some reason, interrupts do not seem to arrive during the
reload of the hpsa module, which causes the hpsa probe routine to hang.

Steps:

1) With RHEL 6.3, before the DUD load, the inbox BFA driver 3.0.2.2 and
   the HPSA driver v2.0.2-4 are both loaded.
2) After the DUD load, most of the initrd modules are unloaded and reloaded.
3) After the BFA DUD load, BFA loads fine, but the HPSA module load
   (which was fine before) hangs because the probe routine hangs
   during device discovery (hpsa_scsi_do_inquiry).

Workaround:
- Booting with the kernel parameter "pci=nomsi", so that the driver runs in
  INTx mode, avoids the issue: the HPSA module does not hang.
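
To confirm that a booted system actually has the workaround applied, the kernel command line can be inspected. The following is a minimal sketch, not part of the original report; the helper names `has_kernel_param` and `read_cmdline` are hypothetical, and it assumes a Linux system that exposes `/proc/cmdline`.

```python
def has_kernel_param(cmdline: str, param: str = "pci=nomsi") -> bool:
    """Return True if `param` appears as a whole token on a kernel command line."""
    # Kernel parameters are whitespace-separated, so compare whole tokens
    # rather than substrings (avoids matching e.g. "pci=nomsix").
    return param in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (Linux only)."""
    with open(path, encoding="ascii", errors="replace") as f:
        return f.read().strip()
```

On a running system, `has_kernel_param(read_cmdline())` reports whether the kernel was booted with "pci=nomsi".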

I am enclosing the screen shots for the defect debug.

Thanks,
Krishna.

Comment 2 Krishna Chaitanya Gudipati 2012-07-31 00:38:59 UTC
Created attachment 601372 [details]
hpsa driver hung task

HPSA driver hung task.

Comment 3 Krishna Chaitanya Gudipati 2012-07-31 00:45:06 UTC
Created attachment 601373 [details]
hpsa driver load before dud install

hpsa: driver load which is fine before dud load.

Comment 4 Krishna Chaitanya Gudipati 2012-07-31 00:48:07 UTC
Created attachment 601374 [details]
hpsa driver load after dud install

hpsa: driver load fails after dud install

Fails to complete the probe and we see the task hung after 120 seconds.

We also see an error message saying:

do_IRQ: 0:116 No irq handler for vector (irq -1).

Comment 5 Tomas Henzl 2012-11-23 12:53:28 UTC
Krishna,
the process of updating the hpsa driver is already under way and a lot of patches are being added. Do you have access to a newer kernel via your partner link, let's say kernel-344 or newer? If so, please retest the issue with that kernel.
Thanks, Tomas

Comment 6 apfeiffe 2012-11-26 16:30:34 UTC
Hello Tomas,
This issue happens at install time so I can't upgrade the kernel that is part of the installer.  Have all of the changes you mentioned been incorporated into Red Hat 6.4?  If so, I could test with the alpha version of 6.4.

Thanks

Comment 7 Tomas Henzl 2012-11-27 12:50:23 UTC
(In reply to comment #6)
> Hello Tomas,
> This issue happens at install time so I can't upgrade the kernel that is
> part of the installer.
You could install the 6.3 with msix disabled and then update the kernel?
>  Have all of the changes you mentioned been
> incorporated into Red Hat 6.4?  If so, I could test with the alpha version
> of 6.4.
The alpha is a -338 kernel; for this test at least a -340 is needed, and the -344 is optimal (the last patch most likely can't play a role in this case).
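
A quick way to check whether a given kernel meets the build threshold mentioned above is to parse the build number out of the release string. This is only a sketch; the function names are hypothetical, and it assumes the usual RHEL 6 release format as printed by `uname -r` (for example "2.6.32-344.el6.x86_64").

```python
import re
from typing import Optional


def rhel_build_number(release: str) -> Optional[int]:
    """Extract the build number (e.g. 344) from a release string such as
    '2.6.32-344.el6.x86_64'. Returns None if the format is not recognized."""
    m = re.match(r"^\d+\.\d+\.\d+-(\d+)", release)
    return int(m.group(1)) if m else None


def meets_minimum(release: str, minimum: int = 340) -> bool:
    """True if the kernel build number is at least `minimum`
    (340 is the lowest build named in this comment)."""
    build = rhel_build_number(release)
    return build is not None and build >= minimum
```

For example, `meets_minimum("2.6.32-338.el6.x86_64")` is False, so the 6.4 alpha kernel would not qualify for the retest.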


I think I have missed something in your reproducer scenario
(In reply to comment #0)
> 1.Start RHEL6.3 Installation (in default MSIx mode) and pass "linux dd" for
> invoking driver update.
> 2.Have a driver update disk to update the inbox drivers.
Which driver do you update this way, the hpsa or some other driver?
> 3.As part of the driver update disk the initrd modules get unload/reloaded.
> 4.All the other modules get re-loaded fine, but HPSA driver module is struck
>   during module re-load when trying to do inquiry using
> hpsa_scsi_do_inquiry().
......

Comment 8 RHEL Program Management 2012-12-14 08:02:18 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 9 Mike Miller (OS Dev) 2013-04-22 18:23:49 UTC
What version of the dud was used here? I'm not sure if it really matters but that info may be helpful.

Comment 10 Stephen Cameron 2013-04-22 18:32:36 UTC
How many MSI-X vectors does the install media driver use and how many does the dud driver use?  If they are different (e.g. 4 vs. 8) that is likely a clue.

-- steve

Comment 11 Mike Miller (OS Dev) 2013-04-22 18:57:26 UTC
The media driver uses 4 vectors. How many the dud driver uses depends on which dud they used. The current dud uses MAX_REPLY_QUEUES (16) MSI-X vectors.

I'm setting this back to more info needed.
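
To compare the vector counts discussed above on a live system, the rows of /proc/interrupts can be counted per driver. The sketch below is an illustration only: `count_msi_vectors` is a hypothetical helper, and it assumes that the driver's handler name is the last column of each row and starts with "hpsa", which may vary between driver versions.

```python
def count_msi_vectors(interrupts_text: str, name_prefix: str = "hpsa") -> int:
    """Count MSI/MSI-X interrupt rows in /proc/interrupts-style text whose
    handler name (last column) starts with `name_prefix`."""
    count = 0
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Per-IRQ rows start with "<number>:"; skip the CPU header row and
        # summary rows such as "NMI:".
        if not fields or not fields[0].rstrip(":").isdigit():
            continue
        # The interrupt chip column contains "MSI" for MSI and MSI-X vectors.
        is_msi = any("MSI" in f for f in fields)
        if is_msi and fields[-1].startswith(name_prefix):
            count += 1
    return count
```

Running it on the contents of /proc/interrupts once with the install media driver loaded and once with the dud driver loaded would show whether the two use a different number of vectors.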

Comment 12 Tomas Henzl 2013-04-23 11:48:24 UTC
Steve, Mike,
isn't this a similar issue to bz#949499? I think the conclusion has been, that the problem is in the module unload/load (this is I hope solved for RHEL6.4 and higher) and that a workaround should be used in form of a 'pci=nomsi' option used when the 6.3 kernel starts.

Comment 13 Mike Miller (OS Dev) 2013-04-24 15:15:39 UTC
(In reply to comment #12)
> Steve, Mike,
> isn't this a similar issue to bz#949499? I think the conclusion has been,
> that the problem is in the module unload/load (this is I hope solved for
> RHEL6.4 and higher) and that a workaround should be used in form of a
> 'pci=nomsi' option used when the 6.3 kernel starts.

Since this is an install-time issue, I suppose using pci=nomsi is probably OK. I looked at the media driver more closely and determined that we're not calling pci_disable_device in hpsa_remove_one. Our current assumption is that this is the root cause. How will this be documented?

Comment 14 Tomas Henzl 2013-04-24 15:31:31 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > Steve, Mike,
> > isn't this a similar issue to bz#949499? I think the conclusion has been,
> > that the problem is in the module unload/load (this is I hope solved for
> > RHEL6.4 and higher) and that a workaround should be used in form of a
> > 'pci=nomsi' option used when the 6.3 kernel starts.
> 
> Since this is an install-time issue, I suppose using pci=nomsi is probably
> OK. I looked at the media driver more closely and determined that we're not
> calling pci_disable_device in hpsa_remove_one. Our current assumption is
> that this is the root cause. How will this be documented?

I'll check whether it is already documented somewhere; if not, we will use this bz to document it.

Comment 16 Tomas Henzl 2013-04-26 11:33:25 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > Steve, Mike,
> > isn't this a similar issue to bz#949499? I think the conclusion has been,
> > that the problem is in the module unload/load (this is I hope solved for
> > RHEL6.4 and higher) and that a workaround should be used in form of a
> > 'pci=nomsi' option used when the 6.3 kernel starts.
> 
> Since this is an install-time issue, I suppose using pci=nomsi is probably
> OK. I looked at the media driver more closely and determined that we're not
> calling pci_disable_device in hpsa_remove_one. Our current assumption is
> that this is the root cause. How will this be documented?

I just learnt that this is going to be documented in RHEL6.3 and 6.4 Technical Notes.

Comment 17 Tomas Henzl 2013-05-29 12:02:56 UTC
I'm closing this bz, the issue is fixed for RHEL6.4 and a workaround, the use of a kernel option pci=nomsi, is documented in Technical Notes.

