Bug 451586 - RHEL5.3: SB600/700 SATA controller PMP support
Summary: RHEL5.3: SB600/700 SATA controller PMP support
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: David Milburn
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks: 364381
Reported: 2008-06-16 02:03 UTC by Shane Huang
Modified: 2018-10-19 23:54 UTC (History)
CC List: 12 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-20 20:10:15 UTC
Target Upstream Version:
Embargoed:


Attachments
SB600/700 SATA PMP support for RHEL5.2 (5.45 KB, patch)
2008-06-16 02:26 UTC, Shane Huang
boot fails when HDD and ODD are being used, without any PMP device (19.27 KB, text/plain)
2008-08-13 07:14 UTC, Shane Huang
boot ok when only one HDD is used, without any PMP device (19.27 KB, text/plain)
2008-08-13 07:17 UTC, Shane Huang
boot ok when only one HDD is used, without any PMP device (17.76 KB, text/plain)
2008-08-13 07:21 UTC, Shane Huang
kernel panic still exists with kernel -102.el5.bz451586.2 (20.92 KB, text/plain)
2008-08-14 07:18 UTC, Shane Huang
More debug info with kernel -102.el5.bz451586.3 x86_64 (19.70 KB, text/plain)
2008-08-15 01:40 UTC, Shane Huang
More debug info with kernel -102.el5.bz451586.2 (157.52 KB, text/plain)
2008-08-15 03:15 UTC, Shane Huang


Links
System: Red Hat Product Errata
ID: RHSA-2009:0225
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.3 kernel security and bug fix update
Last Updated: 2009-01-20 16:06:24 UTC

Description Shane Huang 2008-06-16 02:03:23 UTC
Description of problem:

There is one bug in ATI SATA PMP of SB600 and SB700 old revision,
which leads to soft reset failure. This patch can fix the bug.

The patch has been accepted into the upstream kernel:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=bd17243a84632465f5403bc9eb8b4831bd67e582

But it needs to be backported for RHEL5.3.

Comment 1 Shane Huang 2008-06-16 02:22:58 UTC
I see that Jeff backported the SATA driver from 2.6.24-rcX to RHEL5.2
with the patch linux-2.6-sata-rhel5-2-driver-update, which includes
SATA PMP feature support.

I'm backporting the SB600/700 SATA PMP patch on top of RHEL5.2 and
finding that the 2.6.26-rcX and RHEL5.2 AHCI drivers differ considerably.

Will Red Hat update the SATA driver to 2.6.26-rcX for RHEL5.3,
so that the upstream patch needs only a small backport?

Otherwise I have to add more code to my backport, including some extra
flags. Please review my attached backported patch for RHEL5.2 (kernel
2.6.18-92) and give me your suggestions.


Comment 2 Shane Huang 2008-06-16 02:26:45 UTC
Created attachment 309434 [details]
SB600/700 SATA PMP support for RHEL5.2

It will need further updates if Red Hat updates the AHCI driver to 2.6.26 for RHEL5.3.

Comment 3 Bhavna Sarathy 2008-06-16 14:51:37 UTC
Russ, please add to RHEL5.3 tracker.

Comment 4 Bhavna Sarathy 2008-06-16 15:24:55 UTC
I mean the AMD RHEL5.3 tracker

Comment 5 Russell Doty 2008-06-16 15:55:29 UTC
Requesting this fix be included in RHEL 5.3.

AMD: we try to avoid rebasing drivers within a RHEL release. We have a strong
preference for backporting selected changes.

Comment 6 Shane Huang 2008-06-17 01:28:24 UTC
> AMD: we try to avoid rebasing drivers within a RHEL release. 
> We have a strong preference for backporting selected changes.

I know, but since you regularly update the SATA ahci driver, as you did for
RHEL5.2, and the PMP patch depends heavily on the ahci driver update, we want
to know your schedule for the ahci update in RHEL5.3.

There is also another solution:
When Jeff or someone else at Red Hat updates the ahci driver for RHEL5.3,
please also backport the SBX00 PMP patch together with it (upstream commit
already provided), instead of using my backported one in comment #2.

Thanks


Comment 7 Russell Doty 2008-07-30 15:11:24 UTC
How and where are SATA port multipliers used? Do we have the hardware at Red Hat
to test this?

Comment 8 Shane Huang 2008-08-01 07:19:26 UTC
SATA PMP can be used when the SATA ports on the motherboard are limited
and the user needs more ports, a little like a USB hub.
A SATA PMP hardware device is necessary for testing the SATA PMP feature.

Comment 9 David Milburn 2008-08-12 00:35:31 UTC
This updates the ahci driver to 2.6.26-rc5 plus

commit bd17243a84632465f5403bc9eb8b4831bd67e582
Author: Shane Huang <shane.huang>
Date:   Tue Jun 10 15:52:04 2008 +0800

    ahci: Workaround HW bug for SB600/700 SATA controller PMP support

Would you please test kernel-2.6.18-102.el5.bz451586.1? Thanks.

http://people.redhat.com/dmilburn

Comment 10 Shane Huang 2008-08-12 01:16:40 UTC
David, I can ask our QA to test the PMP with your kernel.
Can you also share the .src.rpm package with us?
Thanks

Comment 11 Shane Huang 2008-08-12 08:05:10 UTC
David, your test kernel does NOT work with one PMP device (kernel panic).
Can you share the source rpm package with us? We can check the code first.

Comment 12 David Milburn 2008-08-12 12:46:05 UTC
Shane, the kernel-2.6.18-102.el5.bz451586.1.src.rpm is 

http://people.redhat.com/dmilburn/

Would you please post the stack trace when the system panics? Thanks.

Comment 13 Shane Huang 2008-08-13 07:14:59 UTC
Created attachment 314171 [details]
boot fails when HDD and ODD are being used, without any PMP device

Comment 14 Shane Huang 2008-08-13 07:17:09 UTC
Created attachment 314172 [details]
boot ok when only one HDD is used, without any PMP device

Comment 15 Shane Huang 2008-08-13 07:20:12 UTC
Comment on attachment 314172 [details]
boot ok when only one HDD is used, without any PMP device

posted the wrong log file by mistake

Comment 16 Shane Huang 2008-08-13 07:21:53 UTC
Created attachment 314174 [details]
boot ok when only one HDD is used, without any PMP device

Comment 17 Shane Huang 2008-08-13 09:53:03 UTC
The error is probably caused by your ahci driver port, because my backported
PMP patch above works on top of kernel -103, which does not contain
your ahci driver port. Please check it, thanks.

Comment 18 David Milburn 2008-08-13 19:52:57 UTC
Shane,

The crash is due to a backport error in libata-core.c; it is actually crashing
in ata_qc_issue. Would you please test kernel-2.6.18-102.el5.bz451586.2?
I do not have a system that is crashing, so I am unable to verify this myself.
Would you please let me know as soon as possible? Thanks.

http://people.redhat.com/dmilburn

Comment 19 Shane Huang 2008-08-14 07:18:01 UTC
Created attachment 314285 [details]
kernel panic still exists with kernel -102.el5.bz451586.2

Comment 20 Shane Huang 2008-08-14 07:21:34 UTC
David, the kernel panic still exists if a SATA ODD is used, regardless of
whether it is connected to a SATA PMP device or directly to the board.
But a SATA HDD works well, whether or not a SATA PMP device is used.
Please check the boot log in comment #19 above.

Comment 21 David Milburn 2008-08-14 10:51:29 UTC
Shane, thanks for the feedback. Would it be possible to install the
src.rpm, edit include/linux/libata.h to define ATA_DEBUG and
ATA_VERBOSE_DEBUG, and capture more output on boot up? I will look through
the code paths and update the kernel if I see an obvious problem.

Comment 22 David Milburn 2008-08-14 21:25:16 UTC
Shane,

Would you try the .3 kernel for debug purposes and supply dmesg or
console output?

Thanks,
David

Comment 23 Shane Huang 2008-08-15 01:40:53 UTC
Created attachment 314369 [details]
More debug info with kernel -102.el5.bz451586.3 x86_64

Comment 24 Shane Huang 2008-08-15 03:15:58 UTC
Created attachment 314371 [details]
More debug info with kernel -102.el5.bz451586.2

Comment 25 David Milburn 2008-08-16 19:41:15 UTC
Shane,

This problem is specific to the ahci driver: the driver was not initializing
all the DMA buffers properly in ahci_port_start. Would you please verify
kernel-2.6.18-104.el5.RHEL5.3.sata and let me know as soon as possible? Thanks.

http://people.redhat.com/dmilburn

Comment 26 Shane Huang 2008-08-18 08:51:43 UTC
Hi David,

Here is the test result with kernel-2.6.18-104.el5.RHEL5.3.sata x86_64
on one SB700 Shiner board:
1. Without PMP device:
  1.1 SATA HDD + SATA ODD: PASS
  1.2 SATA HDD alone: PASS
2. With PMP device:
  2.1 SATA HDD + PMP device connected to MB, SATA ODD alone to PMP: PASS
  2.2 SATA HDD + PMP device connected to MB, another HDD alone to PMP: NG
      boot hang with many "request_module: runaway loop modprobe binfmt-464c"
  2.3 SATA HDD + PMP device connected to MB, SATA ODD + another HDD to PMP: NG
      boot hang with many "request_module: runaway loop modprobe binfmt-464c"

But on another SB700 Shiner board, all the PMP test cases pass
without the above error messages, with kernels for both x86_64 and i386.
Do you know anything about the error message?

Thanks

Comment 27 David Milburn 2008-08-18 10:17:35 UTC
Hi Shane,

Thanks for testing again, our official build system was down this weekend
and I had to build the rpms on a local system, the error message looks
related to loading executables, it is very possible that this is a build issue.
As soon as the build system is up, I will build another set of rpms. It
does sound like the SATA code is working.

David

Comment 29 Shane Huang 2008-08-19 01:20:20 UTC
David:

> it is very possible that this is a build issue.

But the same testing x86_64 kernel can work on another SB700 board without the
error message. I still do not know the difference. Thanks.

Comment 30 Shane Huang 2008-08-19 03:36:16 UTC
David:

After further checking, the difference lies in the two different SATA HDDs,
not the boards; both use the same kernel-2.6.18-104.el5.RHEL5.3.sata x86_64.
But I do not understand the difference, do you?
Is it related to LVM? One HDD uses LVM while the other one does not.

Comment 31 David Milburn 2008-08-19 18:37:18 UTC
Shane,

The error message indicates a 64/32 mismatch between the kernel and modprobe.
Do you have a 32-bit installation on the HDD that you are adding? LVM could be
getting confused if the existing drive and the new drive have the same
label. Can you check the new drive on another system with "parted -l" or
"blkid" and compare that to the existing drive and to /etc/fstab?

If that doesn't help, can you look at the console output before these
messages and compare that to the dmesg output for the "non-lvm" case
that didn't have these error messages. Thanks again for testing.

David

Comment 33 Shane Huang 2008-08-20 02:24:18 UTC
David,

I restored my partition to a fresh RHEL5.1, then installed
kernel-2.6.18-104.el5.RHEL5.3.sata. The error message "request_module..."
seems to have disappeared, so I will not spend more effort on this issue,
and it passed our QA's test.

I think it's time for you to merge your SATA backport into the RHEL5.3 kernel.
We can do further verification with the coming RHEL5.3 Beta/Snapshot release.

Thanks

Comment 34 Shane Huang 2008-09-05 01:24:03 UTC
David, in which RHEL5 kernel version will your patch be merged?
Will it be kernel 2.6.18-108? Thanks.

Comment 35 David Milburn 2008-09-05 20:42:01 UTC
Hi Shane, the patch is not in -108, but, should be merged soon. You should 
get notified on this BZ.

Comment 36 Don Zickus 2008-09-09 21:16:07 UTC
in kernel-2.6.18-109.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 37 Shane Huang 2008-09-10 01:24:19 UTC
I can NOT find -109 at the above link, only -107. Are you sure you have uploaded it?
Please check it again. Thanks.

Comment 38 Don Zickus 2008-09-10 15:53:20 UTC
Gah!  Sorry about that.  Must remember to uncomment script lines when done debugging.  Thanks for the heads up.

Comment 39 Shane Huang 2008-09-17 09:32:44 UTC
Our SW QA has verified that kernel-2.6.18-110.el5 fixes the bug.
The status will be set to VERIFIED after QE sends instructions to do so.
Thanks.

Comment 43 errata-xmlrpc 2009-01-20 20:10:15 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html

