Bug 438266

Summary:

[libata config disk] 2.6.24.* SATA and ACPI errors not in 2.6.23.*

Product:

[Fedora] Fedora

Reporter:

Bas Mevissen <redhat.bugzilla>

Component:

kernel

Assignee:

Alan Cox <alan>

Status:

CLOSED UPSTREAM

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

medium

Priority:

low

CC:

kernel-maint

Hardware:

x86_64

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2008-07-28 13:39:43 UTC

Attachments:

Description	Flags
2.6.23 dmesg	none
2.6.24 dmesg	none

Description Bas Mevissen 2008-03-19 22:03:38 UTC

Hi,

After an update from kernel 2.6.23.* to 2.6.24.*, I found a problem with the
SATA device recognition. The kernel is complaining about a device that doesn't
exist. Previously, everything was fine. Also, some new ACPI errors appeared.

The SATA issue might already have been identified by Jeff Garzik.
See <http://www.webservertalk.com/archive242-2007-10-2053483.html>

Some additional info: motherboard is Asus P5W DH Deluxe. 2GB RAM, Core2Duo
2.4GHz. Disk is WDC 320GB. Runs Fedora 8 x86_64.

Attached dmesg outputs tell it all. Better than I can. :-)

Regards,

Bas.

Comment 1 Bas Mevissen 2008-03-19 22:04:25 UTC

Created attachment 298601 [details]
2.6.23 dmesg

Comment 2 Bas Mevissen 2008-03-19 22:05:00 UTC

Created attachment 298602 [details]
2.6.24 dmesg

Comment 3 Chuck Ebbert 2008-03-20 18:35:57 UTC

2.6.24:

ata4.00: ATA-6: Config  Disk, RGL10364, max UDMA/133
ata4.00: 640 sectors, multi 1: LBA 
ata4.00: Drive reports diagnostics failure. This may indicate a drive
ata4.00: fault or invalid emulation. Contact drive vendor for information.
ata4.00: device is on DMA blacklist, disabling DMA
ata4.00: configured for PIO4

sd 3:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support
DPO or FUA
 sdb:<3>ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata4.00: cmd c4/00:08:00:00:00/00:00:00:00:00/e0 tag 0 pio 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4: soft resetting link
ata4.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
ata4.00: revalidation failed (errno=-5)
ata4: failed to recover some devices, retrying in 5 secs
ata4: soft resetting link
ata4: port is slow to respond, please be patient (Status 0xd0)
ata4: SRST failed (errno=-16)
ata4: soft resetting link
ata4: port is slow to respond, please be patient (Status 0xd0)
ata4: SRST failed (errno=-16)
ata4: soft resetting link
ata4: port is slow to respond, please be patient (Status 0xd0)
ata4: SRST failed (errno=-16)
ata4: soft resetting link
ata4: SRST failed (errno=-16)
ata4: reset failed, giving up
ata4.00: disabled
ata4: EH complete
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb, logical block 0
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 0
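
(Editorial aid, not part of the original comment.) The status bytes and errno values in the excerpt above can be expanded with a small sketch like the one below. The bit names follow the standard ATA status register layout, and errno names come from Python's `errno` module. The helper names `decode_status` and `errno_name` are invented for this illustration.

```python
import errno

# Standard ATA status register bits, most significant first.
# Bit 0x10 is DSC ("seek complete") on older drives.
ATA_STATUS_BITS = [
    (0x80, "BSY"),
    (0x40, "DRDY"),
    (0x20, "DF"),
    (0x10, "DSC"),
    (0x08, "DRQ"),
    (0x04, "CORR"),
    (0x02, "IDX"),
    (0x01, "ERR"),
]


def decode_status(value):
    """Return the names of the status bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if value & mask]


def errno_name(code):
    """Map a kernel-style negative errno (e.g. -16) to its symbolic name."""
    return errno.errorcode.get(abs(code), "unknown")


if __name__ == "__main__":
    # Values taken from the log excerpt above.
    for status in (0x40, 0xD0):
        print(hex(status), decode_status(status))
    for code in (-5, -16):
        print(code, errno_name(code))
```

Status 0xd0 has BSY set, which is consistent with the SRST attempts failing with errno=-16 (EBUSY): the device apparently stays busy after each soft reset.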

Comment 4 Chuck Ebbert 2008-03-20 18:37:33 UTC

2.6.23: almost the same but it works here:

ata4.00: ATA-6: Config  Disk, RGL10364, max UDMA/133
ata4.00: 640 sectors, multi 1: LBA 
ata4.00: Drive reports diagnostics failure. This may indicate a drive
ata4.00: fault or invalid emulation. Contact drive vendor for information.
ata4.00: configured for UDMA/133

sd 3:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support
DPO or FUA
 sdb: unknown partition table
sd 3:0:0:0: [sdb] Attached SCSI disk
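
(Editorial aid, not part of the original comment.) To make "almost the same" concrete, here is a minimal sketch that compares the `ata4.00` probe lines quoted in the two excerpts. The log lines are copied from the comments above. The helper `only_in` is a made-up name, and the comparison is a plain line-by-line membership test, not a real diff algorithm.

```python
# ata4.00 probe lines as quoted for each kernel in the comments above.
LOG_2_6_24 = [
    "ata4.00: ATA-6: Config  Disk, RGL10364, max UDMA/133",
    "ata4.00: 640 sectors, multi 1: LBA ",
    "ata4.00: Drive reports diagnostics failure. This may indicate a drive",
    "ata4.00: fault or invalid emulation. Contact drive vendor for information.",
    "ata4.00: device is on DMA blacklist, disabling DMA",
    "ata4.00: configured for PIO4",
]

LOG_2_6_23 = [
    "ata4.00: ATA-6: Config  Disk, RGL10364, max UDMA/133",
    "ata4.00: 640 sectors, multi 1: LBA ",
    "ata4.00: Drive reports diagnostics failure. This may indicate a drive",
    "ata4.00: fault or invalid emulation. Contact drive vendor for information.",
    "ata4.00: configured for UDMA/133",
]


def only_in(first, second):
    """Return the lines of `first` that do not appear in `second`, in order."""
    return [line for line in first if line not in second]


if __name__ == "__main__":
    print("only in 2.6.24:")
    for line in only_in(LOG_2_6_24, LOG_2_6_23):
        print("  " + line)
    print("only in 2.6.23:")
    for line in only_in(LOG_2_6_23, LOG_2_6_24):
        print("  " + line)
```

The probe output differs only in the transfer mode: 2.6.24 reports the device as being on the DMA blacklist and configures PIO4, while 2.6.23 configures UDMA/133.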

Comment 5 Alan Cox 2008-03-20 19:13:09 UTC

RGL10364 is a magic SIL4723 RAID port splitter, I believe. It's a buggy magic
RAID port-splitting thing, and the 2.6.24 changes happen by chance to have made
it show up more on your box.

We try to talk to it, fail, and give it the boot.

See: http://lkml.org/lkml/2007/9/27/4

Comment 6 Chuck Ebbert 2008-03-21 00:03:28 UTC

(In reply to comment #5)
> RGL10364 is a magic SIL4723 RAID port splitter, I believe. It's a buggy magic
> RAID port-splitting thing, and the 2.6.24 changes happen by chance to have made
> it show up more on your box.
> 
> We try to talk to it, fail, and give it the boot.
> 
> See: http://lkml.org/lkml/2007/9/27/4

But the scsi layer tries endlessly to read the partition table after that happens.

Comment 7 Alan Cox 2008-03-21 10:54:10 UTC

ata4: SRST failed (errno=-16)
ata4: reset failed, giving up
ata4.00: disabled
ata4: EH complete

The SCSI layer gives up after a few attempts to talk to it and then disables it.
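
(Editorial aid, not part of the original comment.) As a rough check of "a few attempts", the sketch below counts the reset attempts in the error-handling part of the 2.6.24 excerpt from comment 3. The log text is copied from that comment. The function name `summarize_resets` and its return shape are assumptions made for this illustration.

```python
import re

# Error-handling portion of the 2.6.24 dmesg excerpt quoted in comment 3.
EXCERPT = """\
ata4: soft resetting link
ata4.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
ata4.00: revalidation failed (errno=-5)
ata4: failed to recover some devices, retrying in 5 secs
ata4: soft resetting link
ata4: port is slow to respond, please be patient (Status 0xd0)
ata4: SRST failed (errno=-16)
ata4: soft resetting link
ata4: port is slow to respond, please be patient (Status 0xd0)
ata4: SRST failed (errno=-16)
ata4: soft resetting link
ata4: port is slow to respond, please be patient (Status 0xd0)
ata4: SRST failed (errno=-16)
ata4: soft resetting link
ata4: SRST failed (errno=-16)
ata4: reset failed, giving up
ata4.00: disabled
ata4: EH complete
"""


def summarize_resets(log, port="ata4"):
    """Count soft resets and SRST failures for one port, and note if it gave up."""
    lines = log.splitlines()
    prefix = re.escape(port)
    srst_pattern = re.compile(prefix + r": SRST failed \(errno=-\d+\)")
    resets = sum(1 for line in lines if line == port + ": soft resetting link")
    srst_failures = sum(1 for line in lines if srst_pattern.fullmatch(line))
    gave_up = (port + ": reset failed, giving up") in lines
    return resets, srst_failures, gave_up


if __name__ == "__main__":
    resets, failures, gave_up = summarize_resets(EXCERPT)
    print("soft resets:", resets)
    print("SRST failures:", failures)
    print("gave up:", gave_up)
```

On this excerpt the port is soft reset five times, four of those resets fail, and the device is then disabled.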

Comment 8 Bas Mevissen 2008-03-21 12:18:37 UTC

Is this somehow related to the ACPI errors I get with that new kernel?

I'm wondering why the SCSI layer or the ATA device driver doesn't detect that
there is no disk on a channel and silently continue. As it is, it adds a long
delay to boot time and gave me a heart attack, thinking my disk was dead...



(@Alan: nice talk you gave at the High Tech Campus, Eindhoven. Nice eye-opener
and some interesting points about Xen.)

Comment 9 Alan Cox 2008-03-21 16:01:02 UTC

There is a device on the channel pretending (very badly) to be a disk:

ATA-6: Config  Disk, RGL10364, max UDMA/133

I doubt this has any connection with ACPI.

Comment 10 Chuck Ebbert 2008-04-14 22:18:28 UTC

(In reply to comment #7)
> ata4: SRST failed (errno=-16)
> ata4: reset failed, giving up
> ata4.00: disabled
> ata4: EH complete
> 
> The SCSI layer gives up after a few attempts to talk to it and then disables it
> 

But something keeps trying later:

Bridge firewalling registered
virbr0: Dropping NETIF_F_UFO since no NETIF_F_HW_CSUM feature.
virbr0: starting userspace STP failed, starting kernel STP
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 0
printk: 30 messages suppressed.
Buffer I/O error on device sdb, logical block 0
Buffer I/O error on device sdb, logical block 1
Buffer I/O error on device sdb, logical block 2
Buffer I/O error on device sdb, logical block 3
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 0

Bas, does it ever stop spewing these messages?

Comment 11 Bas Mevissen 2008-05-11 22:43:30 UTC

Well, with the latest F8 kernel (2.6.24.5-85.fc8), the issue still exists. The
box seems to run stable and fine with it, but booting takes a needless extra
minute or so for the time-out.
Is someone working on this issue? Can I do something to help?
Maybe I can work around it with some command line option?
I would appreciate some kind of solution, at least for now. Guess that F9 will
suffer the same issue...

Comment 12 Bas Mevissen 2008-07-28 13:25:50 UTC

OK, good news at last: on my Asus P5W DH Deluxe, the updates to libata
(ata_piix.c) in vanilla 2.6.26 seem to fix the time-out problems during boot.

Comment 13 Alan Cox 2008-07-28 13:39:43 UTC

Yay, although it may well have been the core changes Tejun did that were the magic.

Comment 14 Bas Mevissen 2008-07-28 13:47:33 UTC

Why close this report already? There is no Fedora solution yet.

2.6.26 breaks third-party drivers from Nvidia and VirtualBox. So for Fedora
8 and 9, I would recommend backporting Tejun's changes to 2.6.25.

Comment 15 Alan Cox 2008-07-28 14:09:59 UTC

I closed it as fixed upstream. When Fedora ships a 2.6.26 kernel the problem
will go away. A backport would be complex and risk breaking other working systems.

Nvidia proprietary problems should be addressed to Nvidia.

Comment 16 Chuck Ebbert 2008-07-29 04:23:34 UTC

We will be releasing a 2.6.26 kernel for F8 and F9 soon.

Comment 17 Bas Mevissen 2008-09-09 22:38:33 UTC

Any idea when this will be?

Comment 18 Bas Mevissen 2008-09-10 20:05:58 UTC

Ah, this morning I found out that the newkey repos were created. I tested the kernels 2.6.26.3-29.fc9.x86_64 and 2.6.26.3-29.fc9.i686. They seem to be fine on my system.