Bug 438266 - [libata config disk] 2.6.24.* SATA and ACPI errors not in 2.6.23.*
Status: CLOSED UPSTREAM
Product: Fedora
Classification: Fedora
Component: kernel
Version: 9
Hardware: x86_64 Linux
Priority: low, Severity: medium
Assigned To: Alan Cox
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-03-19 18:03 EDT by Bas Mevissen
Modified: 2008-09-10 16:05 EDT (History)

Doc Type: Bug Fix
Last Closed: 2008-07-28 09:39:43 EDT


Attachments
2.6.23 dmesg (26.64 KB, text/plain)
2008-03-19 18:04 EDT, Bas Mevissen
2.6.24 dmesg (38.72 KB, text/plain)
2008-03-19 18:05 EDT, Bas Mevissen

Description Bas Mevissen 2008-03-19 18:03:38 EDT
Hi,

After an update from kernel 2.6.23.* to 2.6.24.*, I found a problem with the
SATA device recognition. The kernel is complaining about a device that doesn't
exist. Previously, everything was fine. Also, some new ACPI errors appeared.

The SATA issue might already have been identified by Jeff Garzik.
See <http://www.webservertalk.com/archive242-2007-10-2053483.html>

Some additional info: motherboard is Asus P5W DH Deluxe. 2GB RAM, Core2Duo
2.4GHz. Disk is WDC 320GB. Runs Fedora 8 x86_64.

Attached dmesg outputs tell it all. Better than I can. :-)

Regards,

Bas.
Comment 1 Bas Mevissen 2008-03-19 18:04:25 EDT
Created attachment 298601
2.6.23 dmesg
Comment 2 Bas Mevissen 2008-03-19 18:05:00 EDT
Created attachment 298602
2.6.24 dmesg
Comment 3 Chuck Ebbert 2008-03-20 14:35:57 EDT
2.6.24:

ata4.00: ATA-6: Config  Disk, RGL10364, max UDMA/133
ata4.00: 640 sectors, multi 1: LBA 
ata4.00: Drive reports diagnostics failure. This may indicate a drive
ata4.00: fault or invalid emulation. Contact drive vendor for information.
ata4.00: device is on DMA blacklist, disabling DMA
ata4.00: configured for PIO4

sd 3:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support
DPO or FUA
 sdb:<3>ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata4.00: cmd c4/00:08:00:00:00/00:00:00:00:00/e0 tag 0 pio 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4: soft resetting link
ata4.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
ata4.00: revalidation failed (errno=-5)
ata4: failed to recover some devices, retrying in 5 secs
ata4: soft resetting link
ata4: port is slow to respond, please be patient (Status 0xd0)
ata4: SRST failed (errno=-16)
ata4: soft resetting link
ata4: port is slow to respond, please be patient (Status 0xd0)
ata4: SRST failed (errno=-16)
ata4: soft resetting link
ata4: port is slow to respond, please be patient (Status 0xd0)
ata4: SRST failed (errno=-16)
ata4: soft resetting link
ata4: SRST failed (errno=-16)
ata4: reset failed, giving up
ata4.00: disabled
ata4: EH complete
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb, logical block 0
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 0
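
For readers unfamiliar with the negative numbers in the excerpt above: they are standard Linux errno values, so -5 is EIO and -16 is EBUSY. A minimal sketch for decoding them from a saved dmesg follows; the helper name `decode_errnos` and the `sample` string are illustrative assumptions, not part of any libata tooling.

```python
import errno
import os
import re

# Matches the "(errno=-N)" suffix that appears in the log excerpt above.
ERRNO_RE = re.compile(r"\(errno=-(\d+)\)")


def decode_errnos(log_text):
    """Return sorted (number, symbolic name, description) tuples for each errno seen."""
    seen = {int(m.group(1)) for m in ERRNO_RE.finditer(log_text)}
    return [(n, errno.errorcode.get(n, "?"), os.strerror(n)) for n in sorted(seen)]


# Two lines copied from the 2.6.24 excerpt (hypothetical variable name).
sample = """\
ata4.00: revalidation failed (errno=-5)
ata4: SRST failed (errno=-16)
"""

for number, name, text in decode_errnos(sample):
    print(f"errno=-{number}: {name} ({text})")
```

The description text comes from the local C library, so its exact wording may differ between systems; the symbolic names are the stable part.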
Comment 4 Chuck Ebbert 2008-03-20 14:37:33 EDT
2.6.23: almost the same but it works here:

ata4.00: ATA-6: Config  Disk, RGL10364, max UDMA/133
ata4.00: 640 sectors, multi 1: LBA 
ata4.00: Drive reports diagnostics failure. This may indicate a drive
ata4.00: fault or invalid emulation. Contact drive vendor for information.
ata4.00: configured for UDMA/133

sd 3:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support
DPO or FUA
 sdb: unknown partition table
sd 3:0:0:0: [sdb] Attached SCSI disk
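
The visible difference between the two excerpts is the transfer mode: 2.6.24 puts the device on the DMA blacklist and configures PIO4, while 2.6.23 configures UDMA/133. A rough sketch for pulling that out of two saved dmesg files is shown below; the function name `transfer_modes` and the inline sample strings are assumptions made for illustration.

```python
import re

# "ata4.00: configured for PIO4" -> ("ata4.00", "PIO4")
MODE_RE = re.compile(r"^(ata\d+\.\d+): configured for (\S+)", re.MULTILINE)
# "ata4.00: device is on DMA blacklist, disabling DMA" -> "ata4.00"
BLACKLIST_RE = re.compile(r"^(ata\d+\.\d+): device is on DMA blacklist", re.MULTILINE)


def transfer_modes(log_text):
    """Map each ATA device to its configured mode and whether it hit the DMA blacklist."""
    blacklisted = set(BLACKLIST_RE.findall(log_text))
    return {dev: (mode, dev in blacklisted) for dev, mode in MODE_RE.findall(log_text)}


# Lines copied from the two excerpts in this report (hypothetical variable names).
log_2_6_24 = """\
ata4.00: device is on DMA blacklist, disabling DMA
ata4.00: configured for PIO4
"""
log_2_6_23 = """\
ata4.00: configured for UDMA/133
"""

old, new = transfer_modes(log_2_6_23), transfer_modes(log_2_6_24)
for dev in sorted(old.keys() & new.keys()):
    if old[dev] != new[dev]:
        print(f"{dev}: {old[dev]} -> {new[dev]}")
```

This only reports what the logs say; it does not establish that the mode change causes the later timeout.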
Comment 5 Alan Cox 2008-03-20 15:13:09 EDT
RGL10364 is a magic SIL4723 RAID port splitter, I believe. It's a buggy magic RAID
port-splitting thing, and the 2.6.24 changes happen by chance to have made it
show up more on your box.

We try to talk to it, fail, and give it the boot.

See: http://lkml.org/lkml/2007/9/27/4
Comment 6 Chuck Ebbert 2008-03-20 20:03:28 EDT
(In reply to comment #5)
> RGL10364 is a magic SIL4723 RAID port splitter, I believe. It's a buggy magic RAID
> port-splitting thing, and the 2.6.24 changes happen by chance to have made it
> show up more on your box.
> 
> We try to talk to it, fail, and give it the boot.
> 
> See: http://lkml.org/lkml/2007/9/27/4

But the SCSI layer tries endlessly to read the partition table after that happens.
Comment 7 Alan Cox 2008-03-21 06:54:10 EDT
ata4: SRST failed (errno=-16)
ata4: reset failed, giving up
ata4.00: disabled
ata4: EH complete

The SCSI layer gives up after a few attempts to talk to it and then disables it.
Comment 8 Bas Mevissen 2008-03-21 08:18:37 EDT
Is this somehow related to the ACPI errors I get with that new kernel?

I'm wondering why the SCSI layer or the ATA device driver doesn't detect that
there is no disk on a channel and silently continue. As it is, it causes a long
delay at boot and gave me a heart attack, thinking my disk was dead...



(@Alan: nice talk you gave at the High Tech Campus, Eindhoven. Nice eye-opener
and some interesting points about Xen.)
Comment 9 Alan Cox 2008-03-21 12:01:02 EDT
There is a device on the channel pretending (very badly) to be a disk:

ATA-6: Config  Disk, RGL10364, max UDMA/133

I doubt this has any connection with ACPI
Comment 10 Chuck Ebbert 2008-04-14 18:18:28 EDT
(In reply to comment #7)
> ata4: SRST failed (errno=-16)
> ata4: reset failed, giving up
> ata4.00: disabled
> ata4: EH complete
> 
> The SCSI layer gives up after a few attempts to talk to it and then disables it
> 

But something keeps trying later:

Bridge firewalling registered
virbr0: Dropping NETIF_F_UFO since no NETIF_F_HW_CSUM feature.
virbr0: starting userspace STP failed, starting kernel STP
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 0
printk: 30 messages suppressed.
Buffer I/O error on device sdb, logical block 0
Buffer I/O error on device sdb, logical block 1
Buffer I/O error on device sdb, logical block 2
Buffer I/O error on device sdb, logical block 3
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 0

Bas, does it ever stop spewing these messages?
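
One way to answer that question from a saved log is to count the I/O errors per device and add up the "messages suppressed" notices, which hide further repeats. The sketch below assumes the message formats shown in the excerpt above; `summarize` and `sample` are hypothetical names used only for illustration.

```python
import re
from collections import Counter

IO_ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector \d+")
SUPPRESSED_RE = re.compile(r"printk: (\d+) messages suppressed")


def summarize(log_text):
    """Return (Counter of I/O errors per device, total suppressed printk messages)."""
    errors = Counter()
    suppressed = 0
    for line in log_text.splitlines():
        match = IO_ERROR_RE.search(line)
        if match:
            errors[match.group(1)] += 1
            continue
        match = SUPPRESSED_RE.search(line)
        if match:
            suppressed += int(match.group(1))
    return errors, suppressed


# Lines copied from the excerpt above.
sample = """\
end_request: I/O error, dev sdb, sector 0
printk: 30 messages suppressed.
Buffer I/O error on device sdb, logical block 0
end_request: I/O error, dev sdb, sector 0
"""

errors, suppressed = summarize(sample)
for dev, count in errors.items():
    print(f"{dev}: {count} I/O errors logged, {suppressed} further messages suppressed")
```

Running it on logs captured a few minutes apart would show whether the counts keep growing after boot.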

Comment 11 Bas Mevissen 2008-05-11 18:43:30 EDT
Well, with the latest F8 kernel (2.6.24.5-85.fc8), the issue still exists. Box
seems to run stable and fine with it. But booting takes a needless extra minute
or so for the time-out.
Is someone working on this issue? Can I do something to help?
Maybe I can work around it with some command line option?
I would appreciate some kind of solution, at least for now. Guess that F9 will
suffer the same issue... 
Comment 12 Bas Mevissen 2008-07-28 09:25:50 EDT
OK, good news at last: on my Asus P5W DH Deluxe, the updates to libata
(ata_piix.c) in vanilla 2.6.26 seem to fix the time-out problems during boot.
Comment 13 Alan Cox 2008-07-28 09:39:43 EDT
Yay, although it may well have been the core changes Tejun did that were the magic.

Comment 14 Bas Mevissen 2008-07-28 09:47:33 EDT
Why close this report already? There is no Fedora solution yet.

2.6.26 breaks third-party drivers from Nvidia and VirtualBox. So for Fedora
8 and 9, I would recommend backporting Tejun's changes to 2.6.25.
Comment 15 Alan Cox 2008-07-28 10:09:59 EDT
I closed it as fixed upstream. When Fedora ships a 2.6.26 kernel the problem
will go away. A backport would be complex and risk breaking other working systems.

Nvidia proprietary problems should be addressed to Nvidia.
Comment 16 Chuck Ebbert 2008-07-29 00:23:34 EDT
We will be releasing a 2.6.26 kernel for F8 and F9 soon.
Comment 17 Bas Mevissen 2008-09-09 18:38:33 EDT
Any idea when this will be?
Comment 18 Bas Mevissen 2008-09-10 16:05:58 EDT
Ah, this morning I found out that the newkey repos were created. I tested the kernels 2.6.26.3-29.fc9.x86_64 and 2.6.26.3-29.fc9.i686. They seem to be fine on my system.
