Bug 438266
| Summary: | [libata config disk] 2.6.24.* SATA and ACPI errors not in 2.6.23.* | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Bas Mevissen <redhat.bugzilla> |
| Component: | kernel | Assignee: | Alan Cox <alan> |
| Status: | CLOSED UPSTREAM | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Priority: | low |
| Version: | 9 | CC: | kernel-maint |
| Hardware: | x86_64 | OS: | Linux |
| Doc Type: | Bug Fix | Last Closed: | 2008-07-28 13:39:43 UTC |
Description
Bas Mevissen
2008-03-19 22:03:38 UTC

Created attachment 298601: 2.6.23 dmesg
Created attachment 298602: 2.6.24 dmesg

2.6.24:

```
ata4.00: ATA-6: Config Disk, RGL10364, max UDMA/133
ata4.00: 640 sectors, multi 1: LBA
ata4.00: Drive reports diagnostics failure. This may indicate a drive
ata4.00: fault or invalid emulation. Contact drive vendor for information.
ata4.00: device is on DMA blacklist, disabling DMA
ata4.00: configured for PIO4
sd 3:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sdb:<3>ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata4.00: cmd c4/00:08:00:00:00/00:00:00:00:00/e0 tag 0 pio 4096 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4: soft resetting link
ata4.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
ata4.00: revalidation failed (errno=-5)
ata4: failed to recover some devices, retrying in 5 secs
ata4: soft resetting link
ata4: port is slow to respond, please be patient (Status 0xd0)
ata4: SRST failed (errno=-16)
ata4: soft resetting link
ata4: port is slow to respond, please be patient (Status 0xd0)
ata4: SRST failed (errno=-16)
ata4: soft resetting link
ata4: port is slow to respond, please be patient (Status 0xd0)
ata4: SRST failed (errno=-16)
ata4: soft resetting link
ata4: SRST failed (errno=-16)
ata4: reset failed, giving up
ata4.00: disabled
ata4: EH complete
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb, logical block 0
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 0
```
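
The register dump in the 2.6.24 log can be decoded by hand. The Python sketch below assumes the layout libata's error handler uses for its `cmd`/`res` lines. That layout is a first byte, then feature:nsect:lbal:lbam:lbah, then the same five "HOB" registers, then the device register. In a `res` line the first byte is the status register. The helper names and the short opcode table are illustrative only and are not an existing tool.

Under those assumptions, the timed-out command is READ MULTIPLE (0xC4) of 8 sectors at LBA 0. That is a 4096-byte PIO read, PIO only because 2.6.24 put the device on the DMA blacklist. It is presumably the partition-table read that later surfaces as `end_request: I/O error, dev sdb, sector 0`. The other values decode as follows:
* The `res` status 0x40 is DRDY alone, matching `status: { DRDY }`.
* The `Status 0xd0` printed while the port is "slow to respond" has BSY set.
* `errno=-16` and `errno=-5` are -EBUSY and -EIO.

```python
import errno

# Hypothetical helpers (not part of any kernel tool) for decoding the register
# dumps printed by libata's error handler, e.g.
#   cmd c4/00:08:00:00:00/00:00:00:00:00/e0 tag 0 pio 4096 in
#   res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
# Assumed layout: first/feature:nsect:lbal:lbam:lbah/<five HOB regs>/device,
# where "first" is the opcode in a cmd line and the status in a res line.

# Partial opcode table, only the commands relevant to this log.
ATA_COMMANDS = {
    0x20: "READ SECTORS",
    0x91: "INITIALIZE DEVICE PARAMETERS",
    0xC4: "READ MULTIPLE",
    0xC8: "READ DMA",
    0xEC: "IDENTIFY DEVICE",
}

# ATA status register bits, most significant first.
STATUS_BITS = [
    (0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC"),
    (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR"),
]


def parse_taskfile(dump: str) -> dict:
    """Split a libata register dump into named integer fields."""
    first, low, _hob, device = dump.split("/")
    feature, nsect, lbal, lbam, lbah = (int(v, 16) for v in low.split(":"))
    dev = int(device, 16)
    return {
        "first": int(first, 16),
        "feature": feature,
        "nsect": nsect,
        "device": dev,
        # 28-bit LBA: the low nibble of the device register holds bits 24-27.
        "lba28": ((dev & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal,
    }


def decode_status(status: int) -> list:
    """Return the names of the status bits that are set."""
    return [name for bit, name in STATUS_BITS if status & bit]


def describe_errno(value: int) -> str:
    """Kernel messages print negative errno values, e.g. errno=-16."""
    return errno.errorcode.get(-value, "unknown")


if __name__ == "__main__":
    cmd = parse_taskfile("c4/00:08:00:00:00/00:00:00:00:00/e0")
    # For 28-bit commands a sector count of 0 means 256 sectors.
    sectors = cmd["nsect"] or 256
    name = ATA_COMMANDS.get(cmd["first"], hex(cmd["first"]))
    print(name, f"LBA {cmd['lba28']}, {sectors} sectors = {sectors * 512} bytes")
    res = parse_taskfile("40/00:00:00:00:00/00:00:00:00:00/00")
    print("res status:", decode_status(res["first"]))
    print("SRST status 0xd0:", decode_status(0xD0))
    print("errno=-16:", describe_errno(-16), "errno=-5:", describe_errno(-5))
```

The 512-byte sector size is taken from the log itself: 8 sectors are reported as `pio 4096`.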

2.6.23: almost the same, but it works here:

```
ata4.00: ATA-6: Config Disk, RGL10364, max UDMA/133
ata4.00: 640 sectors, multi 1: LBA
ata4.00: Drive reports diagnostics failure. This may indicate a drive
ata4.00: fault or invalid emulation. Contact drive vendor for information.
ata4.00: configured for UDMA/133
sd 3:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sdb: unknown partition table
sd 3:0:0:0: [sdb] Attached SCSI disk
```

RGL10364 is a magic SIL4723 RAID port splitter, I believe. It's a buggy magic RAID port-splitting thing, and the 2.6.24 changes happen by chance to have made it show up more on your box.

We try and talk to it, fail and give it the boot.

See: http://lkml.org/lkml/2007/9/27/4

(In reply to comment #5)
> RGL10364 is a magic SIL4723 RAID port splitter, I believe. It's a buggy magic RAID
> port-splitting thing, and the 2.6.24 changes happen by chance to have made it
> show up more on your box.
>
> We try and talk to it, fail and give it the boot.
>
> See: http://lkml.org/lkml/2007/9/27/4

But the SCSI layer tries endlessly to read the partition table after that happens.

```
ata4: SRST failed (errno=-16)
ata4: reset failed, giving up
ata4.00: disabled
ata4: EH complete
```

The SCSI layer gives up after a few attempts to talk to it and then disables it.

Is this somehow related to the ACPI errors I get with that new kernel? I'm wondering why the SCSI layer or the ATA device driver doesn't detect that there is no disk on a channel and silently continue. Now it adds a long delay at boot, and it gave me a heart attack because I thought my disk was dead...

(@Alan: nice talk you gave at the High Tech Campus, Eindhoven. A real eye-opener, with some interesting points about Xen.)

There is a device on the channel pretending (very badly) to be a disk:

```
ATA-6: Config Disk, RGL10364, max UDMA/133
```

I doubt this has any connection with ACPI.

(In reply to comment #7)
> ata4: SRST failed (errno=-16)
> ata4: reset failed, giving up
> ata4.00: disabled
> ata4: EH complete
>
> The SCSI layer gives up after a few attempts to talk to it and then disables it.
>

But something keeps trying later:

```
Bridge firewalling registered
virbr0: Dropping NETIF_F_UFO since no NETIF_F_HW_CSUM feature.
virbr0: starting userspace STP failed, starting kernel STP
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 0
printk: 30 messages suppressed.
Buffer I/O error on device sdb, logical block 0
Buffer I/O error on device sdb, logical block 1
Buffer I/O error on device sdb, logical block 2
Buffer I/O error on device sdb, logical block 3
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 0
```

Bas, does it ever stop spewing these messages?

Well, with the latest F8 kernel (2.6.24.5-85.fc8), the issue still exists. The box seems to run stable and fine with it, but booting takes a needless extra minute or so for the timeout. Is someone working on this issue? Can I do something to help? Maybe I can work around it with some command-line option? I would appreciate some kind of solution, at least for now. I guess F9 will suffer from the same issue...

OK, good news at last: on my Asus P5W DH Deluxe, the updates to libata (ata_piix.c) in vanilla 2.6.26 seem to fix the timeout problems during boot.

Yay, although it may well have been the core changes Tejun did that were the magic.

Why close this report already? There is no Fedora solution yet. 2.6.26 breaks third-party drivers from Nvidia and VirtualBox. So for Fedora 8 and 9, I would recommend backporting Tejun's changes to 2.6.25.

I closed it as fixed upstream.
When Fedora ships a 2.6.26 kernel, the problem will go away. A backport would be complex and would risk breaking other working systems. Problems with Nvidia's proprietary driver should be addressed to Nvidia.

We will be releasing a 2.6.26 kernel for F8 and F9 soon.

Any idea when this will be?

Ah, this morning I found out that the newkey repos were created. I tested the kernels 2.6.26.3-29.fc9.x86_64 and 2.6.26.3-29.fc9.i686. They seem to be fine on my system.