Description of problem: There is a bug in the ATI SATA PMP support of SB600 and SB700 (old revision) that leads to soft reset failure. This patch fixes the bug. The patch has been accepted upstream: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=bd17243a84632465f5403bc9eb8b4831bd67e582 But it needs to be backported for RHEL5.3.
I see that Jeff backported the SATA driver from 2.6.24-rcX to RHEL5.2 with the patch linux-2.6-sata-rhel5-2-driver-update, which includes SATA PMP feature support. I'm backporting the SB600/700 SATA PMP patch on top of RHEL5.2 and finding many differences between the 2.6.26-rcX and RHEL5.2 AHCI drivers. Will Red Hat update the SATA driver to 2.6.26-rcX for RHEL5.3? If so, the upstream patch would need only a small backport. Otherwise I have to add more code to my backport, including some extra flags. Please review my attached backported patch for RHEL5.2 (kernel 2.6.18-92) and give me your suggestions.
Created attachment 309434 [details] SB600/700 SATA PMP support for RHEL5.2. It will need further updates if Red Hat updates the AHCI driver to 2.6.26 for RHEL5.3.
Russ, please add to RHEL5.3 tracker.
I mean the AMD RHEL5.3 tracker
Requesting that this fix be included in RHEL 5.3. AMD: we try to avoid rebasing drivers within a RHEL release. We have a strong preference for backporting selected changes.
> AMD: we try to avoid rebasing drivers within a RHEL release.
> We have a strong preference for backporting selected changes.
I know, but since you regularly update the SATA ahci driver, as you did for RHEL5.2, and the PMP patch depends heavily on the ahci driver update, we want to know your schedule for the ahci update in RHEL5.3. There is also another solution: when Jeff or others at Red Hat update the ahci driver for RHEL5.3, please also backport the SBX00 PMP patch together with it (upstream commit already provided), instead of using my backported one in comment #2. Thanks
How and where are SATA port multipliers used? Do we have the hardware at Red Hat to test this?
A SATA PMP can be used when the number of SATA ports on the motherboard is limited and the user needs more ports, a little like a USB hub. A SATA PMP hardware device is necessary for testing the SATA PMP feature.
This updates the ahci driver to 2.6.26-rc5 plus commit bd17243a84632465f5403bc9eb8b4831bd67e582 Author: Shane Huang <shane.huang> Date: Tue Jun 10 15:52:04 2008 +0800 ahci: Workaround HW bug for SB600/700 SATA controller PMP support Would you please test kernel-2.6.18-102.el5.bz451586.1? Thanks. http://people.redhat.com/dmilburn
David, I can ask our QA to test the PMP with your kernel. Can you also share the .src.rpm package with us? Thanks
David, your test kernel does NOT work with one PMP device (kernel panic). Can you share the source rpm package with us? We can check the code first.
Shane, the kernel-2.6.18-102.el5.bz451586.1.src.rpm is at http://people.redhat.com/dmilburn/ Would you please post the stack trace from when the system panics? Thanks.
Created attachment 314171 [details] boot fails when HDD and ODD are being used, without any PMP device
Created attachment 314172 [details] boot ok when only one HDD is used, without any PMP device
Comment on attachment 314172 [details] boot ok when only one HDD is used, without any PMP device posted the wrong log file by mistake
Created attachment 314174 [details] boot ok when only one HDD is used, without any PMP device
The error is most likely caused by your ahci driver port, because my backported PMP patch above works on top of kernel -103, which does not contain your ahci driver port. Please check it, thanks.
Shane, the crash is due to a backport error in libata-core.c; it is actually crashing in ata_qc_issue. Would you please test kernel-2.6.18-102.el5.bz451586.2? I do not have a system that is crashing, so I am unable to verify it myself. Would you please let me know as soon as possible? Thanks. http://people.redhat.com/dmilburn
Created attachment 314285 [details] kernel panic still exists with kernel -102.el5.bz451586.2
David, the kernel panic still occurs if a SATA ODD is used, whether it is connected to the SATA PMP device or directly to the board. But a SATA HDD works well, whether or not a SATA PMP device is used. Please check the boot log in comment #19 above.
Shane, thanks for the feedback. Would it be possible to install the src.rpm, edit include/linux/libata.h to define ATA_DEBUG and ATA_VERBOSE_DEBUG, and capture more output on boot up? I will look through the code paths and update the kernel if I see an obvious problem.
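For anyone following along, the header edit requested here amounts to flipping two `#undef` lines to `#define` before rebuilding. The following is only a minimal Python sketch of that text change. It assumes the libata.h in this tree carries `#undef ATA_DEBUG` and `#undef ATA_VERBOSE_DEBUG` lines; the sample text is an illustration rather than a copy of the RHEL source, and `enable_macros` is a hypothetical helper name.

```python
import re


def enable_macros(header_text, names):
    """Turn '#undef NAME' into '#define NAME' for each requested macro."""
    for name in names:
        # Match only the exact macro name at the start of a line.
        pattern = re.compile(
            r"^([ \t]*)#[ \t]*undef([ \t]+)%s\b" % re.escape(name),
            re.MULTILINE,
        )
        header_text = pattern.sub(
            lambda m, n=name: "%s#define%s%s" % (m.group(1), m.group(2), n),
            header_text,
        )
    return header_text


# Illustrative stand-in for the relevant lines of include/linux/libata.h
# (assumed layout, not copied from the RHEL5 source).
SAMPLE = (
    "#undef ATA_DEBUG\t\t/* debugging output */\n"
    "#undef ATA_VERBOSE_DEBUG\t/* yet more debugging output */\n"
    "#undef ATA_IRQ_TRAP\t\t/* define to ack screaming irqs */\n"
)

if __name__ == "__main__":
    print(enable_macros(SAMPLE, ["ATA_DEBUG", "ATA_VERBOSE_DEBUG"]))
```

Only the two named macros are changed; other `#undef` lines are left alone. The kernel still has to be rebuilt from the src.rpm for the extra output to appear.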
Shane, Would you try the .3 kernel for debug purposes and supply dmesg or console output? Thanks, David
Created attachment 314369 [details] More debug info with kernel -102.el5.bz451586.3 x86_64
Created attachment 314371 [details] More debug info with kernel -102.el5.bz451586.2
Shane, this problem is specific to the ahci driver: the driver was not initializing all the DMA buffers properly in ahci_port_start. Would you please verify kernel-2.6.18-104.el5.RHEL5.3.sata and let me know as soon as possible? Thanks. http://people.redhat.com/dmilburn
Hi David, here is the test result with kernel-2.6.18-104.el5.RHEL5.3.sata x86_64 on one SB700 Shiner board:
1. Without PMP device:
1.1 SATA HDD + SATA ODD: PASS
1.2 SATA HDD alone: PASS
2. With PMP device:
2.1 SATA HDD + PMP device connected to MB, SATA ODD alone on PMP: PASS
2.2 SATA HDD + PMP device connected to MB, another HDD alone on PMP: NG, boot hangs with many "request_module: runaway loop modprobe binfmt-464c" messages
2.3 SATA HDD + PMP device connected to MB, SATA ODD + another HDD on PMP: NG, boot hangs with many "request_module: runaway loop modprobe binfmt-464c" messages
But on another SB700 Shiner board, all the PMP test cases pass without the above error messages, with both the x86_64 and i386 kernels. Do you know anything about the error message? Thanks
Hi Shane, thanks for testing again. Our official build system was down this weekend and I had to build the rpms on a local system. The error message looks related to loading executables, so it is very possible that this is a build issue. As soon as the build system is up, I will build another set of rpms. It does sound like the SATA code is working. David
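The `464c` suffix in the message fits the "loading executables" reading. As far as I recall, when no binary format handler accepts an executable, the kernel requests a module named `binfmt-%04x`, built from bytes 2 and 3 of the file header, and for an ELF file those bytes are 'L' and 'F'. A small sketch of that derivation, assuming a little-endian host (`binfmt_module_name` is a hypothetical name, not a kernel function):

```python
import struct

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def binfmt_module_name(header):
    """Build the module alias from bytes 2-3 of an executable's header."""
    # Read an unsigned 16-bit little-endian value starting at offset 2.
    (tag,) = struct.unpack_from("<H", header, 2)
    return "binfmt-%04x" % tag


if __name__ == "__main__":
    print(binfmt_module_name(ELF_MAGIC))  # binfmt-464c
```

So `binfmt-464c` suggests the kernel was handed an ELF binary it could not execute, and the "runaway loop" would follow if modprobe itself is such a binary.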
David:
> it is very possible that this is a build issue.
But the same x86_64 test kernel works on another SB700 board without the error message, and I still do not know what the difference is. Thanks.
David: After further checking, the difference is between the two SATA HDDs, not the boards, with the same kernel-2.6.18-104.el5.RHEL5.3.sata x86_64. But I do not understand the difference, do you? Is it related to LVM? One HDD is using LVM while the other one is not.
Shane, the error message indicates a 64/32-bit mismatch between the kernel and modprobe. Do you have a 32-bit installation on the HDD that you are adding? LVM could be getting confused if the existing drive and the new drive have the same label. Can you check the new drive on another system with "parted -l" or "blkid" and compare that to the existing drive and to /etc/fstab? If that doesn't help, can you look at the console output before these messages and compare it to the dmesg output for the "non-lvm" case that didn't have these error messages? Thanks again for testing. David
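One way to check for such a mismatch is to read the ELF class byte of a binary (for example modprobe) from the root filesystem on the suspect drive. The sketch below relies only on the ELF identification layout: magic in bytes 0-3 and the class at offset 4, where 1 means 32-bit and 2 means 64-bit. The helper names `elf_bits` and `elf_bits_of_file` are hypothetical, and the paths to inspect are up to whoever runs it.

```python
import sys

ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4                   # offset of the class byte in e_ident
ELF_CLASSES = {1: 32, 2: 64}   # ELFCLASS32, ELFCLASS64


def elf_bits(header):
    """Return 32 or 64 for an ELF header, or None if it is not recognisable."""
    if len(header) <= EI_CLASS or not header.startswith(ELF_MAGIC):
        return None
    return ELF_CLASSES.get(header[EI_CLASS])


def elf_bits_of_file(path):
    """Read just enough of a file to classify it."""
    with open(path, "rb") as f:
        return elf_bits(f.read(EI_CLASS + 1))


if __name__ == "__main__":
    # Usage: pass one or more paths, e.g. a modprobe binary on a mounted root.
    for path in sys.argv[1:]:
        print(path, elf_bits_of_file(path))
```

A result of 32 for binaries that a kernel without 32-bit support is trying to run (or 64 under an i386 kernel) would be consistent with the mismatch described above.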
David, I restored my partition to a fresh RHEL5.1 and then installed kernel-2.6.18-104.el5.RHEL5.3.sata. The "request_module..." error message seems to have disappeared, so I will not spend more effort on this issue, and the kernel passed our QA's test. I think it's time for you to merge your SATA backport into the RHEL5.3 kernel; we can do further verification with the upcoming RHEL5.3 Beta/Snapshot release. Thanks
David, in which RHEL5 kernel version will your patch be merged? Will it be kernel 2.6.18-108? Thanks.
Hi Shane, the patch is not in -108, but it should be merged soon. You should get notified on this BZ.
in kernel-2.6.18-109.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5
I can NOT find -109 at the above link, only -107. Are you sure you have uploaded it? Please check again. Thanks.
Gah! Sorry about that. Must remember to uncomment script lines when done debugging. Thanks for the heads up.
Our SW QA has verified that kernel-2.6.18-110.el5 fixes the bug. The status will be set to VERIFIED after QE sends instructions to do so. Thanks.
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-0225.html