Bug 436591 - kernel 2.6.24.3-12.fc8 fails to recognize disks so it does not boot
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: kernel
Version: 8
Hardware: x86_64 Linux
Priority: low
Severity: high
Assigned To: Jeff Garzik
QA Contact: Fedora Extras Quality Assurance
Blocks: FCMETA_SATA
Reported: 2008-03-08 01:28 EST by Michal Jaegermann
Modified: 2013-07-02 22:35 EDT
Last Closed: 2008-03-10 18:11:38 EDT

Attachments:
dmesg from a booting 2.6.23.15-137.fc8 kernel (21.08 KB, text/plain)
2008-03-08 01:28 EST, Michal Jaegermann
dmesg from 2.6.24.3-12.fc8 (on the same machine after a BIOS update) (25.22 KB, text/plain)
2008-03-10 17:45 EDT, Michal Jaegermann

Description Michal Jaegermann 2008-03-08 01:28:53 EST
Description of problem:

Booting after a kernel update on an ASUSTeK M2R32-MVP board fails
because the disks are no longer recognized.  Something like the
following shows up on the screen (this is from hasty scribbles taken
from a scrolling screen, so some pieces are missing or may be garbled):

ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: qc timeout (error 0xec)
ata1.00: failed to IDENTIFY (I/O error, err .. =0x)
ata1.00: failed to recognize some devices, retrying in 5 sec

That is repeated a few times, I think three, with long timeouts,
and is followed by a similar sequence for the second disk.  The boot
then fails, as expected. 2.6.23.15-137.fc8 and earlier F8 kernels booted
without problems.  A dmesg from a boot with 2.6.23.15-137.fc8
is attached.

Problems with disk recognition on the same hardware were reported
in the past, but those were later fixed.  The reports are
bugzilla 232490 and 235787. The layout of the PCI bus is in
https://bugzilla.redhat.com/attachment.cgi?id=152106

The machine is "remote", so trying other kernels/options is
a somewhat protracted affair.

Version-Release number of selected component (if applicable):
2.6.24.3-12.fc8

How reproducible:
always

Additional info:
The same kernel booted on two other x86_64 machines with different
hardware configurations (although on one machine something in
the update completely deconfigured both network interfaces; luckily
that machine was local).
Comment 1 Michal Jaegermann 2008-03-08 01:28:53 EST
Created attachment 297268
dmesg from a booting 2.6.23.15-137.fc8 kernel
Comment 2 Michal Jaegermann 2008-03-10 03:41:15 EDT
Hm, this looks like the problem discussed here:
http://kerneltrap.org/mailarchive/linux-kernel/2008/2/9/795354
http://kerneltrap.org/mailarchive/linux-kernel/2008/2/17/885364
Unfortunately it does not say which BIOS version was used.
I will try to update the BIOS when I have a chance and report back.
Comment 3 Michal Jaegermann 2008-03-10 17:43:18 EDT
I had a chance to try a BIOS update this morning.  In the end
I used version 1009, which is the latest "non-beta" available from
ASUS.  After a protracted fight with the BIOS I eventually managed
to boot the machine, and that worked with 2.6.24.3-12.fc8.
The dmesg from that boot shows what look like essential differences,
so it is attached below.

Should this be closed then?  The issue is that an older BIOS was
good enough for 2.6.23.15-137.fc8 but not for 2.6.24.3-12.fc8.
Luckily a suitable update did exist (but I would not dare to
tell my wife to apply it :-).
Comment 4 Michal Jaegermann 2008-03-10 17:45:32 EDT
Created attachment 297519
dmesg from 2.6.24.3-12.fc8 (on the same machine after a BIOS update)
Comment 5 Chuck Ebbert 2008-03-10 18:11:38 EDT
Looks like the BIOS now disables PMP and 64-bit DMA.
Comment 6 Michal Jaegermann 2008-03-10 18:58:33 EDT
> Looks like the BIOS now disables PMP and 64-bit DMA.

That could be because of wrong settings after the BIOS update.
I had a really hard time getting that BIOS into a state where
it would boot anything at all ("safe defaults" were not suitable
for this), and after I managed to accomplish that feat I really
had to hurry somewhere else, with no time to rescan and fish out
what else could or should be changed without making
the machine unbootable again.

I cannot even tell you if there are options to enable the above.
Comment 7 Michal Jaegermann 2008-03-11 01:23:58 EDT
> Looks like the BIOS now disables PMP and 64-bit DMA.
> (Chuck Ebbert in comment #5)

I looked again at the dmesg output for 2.6.23.15-137.fc8 with the old
BIOS (0804) and for 2.6.24.3-12.fc8 booted with 1009.  If the comment is
about these two lines:

ahci 0000:00:12.0: controller can't do 64bit DMA, forcing 32bit
ahci 0000:00:12.0: controller can't do PMP, turning off CAP_PMP

then the first one is the same in both cases and only the second
one showed up after the updates.  I also do not see any other info
about DMA that changed in any essential way.

(Not that I have a clue what PMP is and Google is not of much help :-).
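
For anyone repeating the comparison of the two attached dmesg logs, a minimal Python sketch follows. It lists the lines that appear only in the second log. The function names are hypothetical, and the timestamp stripping assumes the optional "[   12.345678]" printk prefix, which may or may not be present in a given log.

```python
import re

# Optional "[   12.345678] " printk timestamps differ between boots,
# so strip them before comparing (assumption: that is the only prefix).
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(line):
    """Trim whitespace and drop a leading printk timestamp, if any."""
    return _TIMESTAMP.sub("", line.strip())


def only_in_second(first_log, second_log):
    """Return normalized lines of second_log that never occur in first_log."""
    seen = {normalize(line) for line in first_log.splitlines()}
    result = []
    for line in second_log.splitlines():
        norm = normalize(line)
        if norm and norm not in seen:
            result.append(norm)
    return result


# The two ahci lines quoted in this comment, used as tiny sample logs.
old_boot = "ahci 0000:00:12.0: controller can't do 64bit DMA, forcing 32bit\n"
new_boot = (
    "ahci 0000:00:12.0: controller can't do 64bit DMA, forcing 32bit\n"
    "ahci 0000:00:12.0: controller can't do PMP, turning off CAP_PMP\n"
)

for line in only_in_second(old_boot, new_boot):
    print(line)
```

With the full attachments saved to files, the same function can be fed the file contents instead of the sample strings.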
Comment 8 Michal Jaegermann 2008-03-11 12:26:30 EDT
I am not convinced that closing this as NOTABUG was not a bit
too hasty.  True, I found a way to boot that particular hardware, but
see also http://lkml.org/lkml/2008/3/9/136 . It looks strangely familiar.
Those may be buggy BIOSes, but what is new here?

http://lkml.org/lkml/2008/3/9/171 and follow-ups also seem to have
some relevance.
Comment 9 Chuck Ebbert 2008-03-12 14:48:09 EDT
(In reply to comment #7)

> (Not that I have a clue what PMP is and Google is not of much help :-).

PMP = port multiplier port

http://www.sata-io.org/portmultiplier.asp

Comment 10 George 2008-03-14 13:10:25 EDT
I have the issues described in the original post too. However, I have found a
kernel parameter that fixes it:
pci=nomsi
Without that parameter I get:

ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: qc timeout (error 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: failed to recognize some devices, retrying in 5 sec
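
The err_mask value in the excerpt above can be decoded into libata error flag names. The following is a minimal Python sketch; the bit table is written from memory of `enum ata_completion_errors` in include/linux/libata.h and should be treated as an assumption to verify against the kernel source in use. The function name is hypothetical.

```python
import re

# Assumed bit positions of the AC_ERR_* flags (include/linux/libata.h).
# Verify against your kernel tree before relying on this table.
AC_ERR_BITS = {
    0: "AC_ERR_DEV",
    1: "AC_ERR_HSM",
    2: "AC_ERR_TIMEOUT",
    3: "AC_ERR_MEDIA",
    4: "AC_ERR_ATA_BUS",
    5: "AC_ERR_HOST_BUS",
    6: "AC_ERR_SYSTEM",
    7: "AC_ERR_INVALID",
    8: "AC_ERR_OTHER",
    9: "AC_ERR_NODEV_HINT",
    10: "AC_ERR_NCQ",
}

_ERR_MASK = re.compile(r"err_mask=0x([0-9a-fA-F]+)")


def decode_err_mask(line):
    """Return the names of the flags set in a log line's err_mask field."""
    match = _ERR_MASK.search(line)
    if match is None:
        return []  # no err_mask in this line (or it was garbled)
    mask = int(match.group(1), 16)
    return [name for bit, name in sorted(AC_ERR_BITS.items())
            if mask & (1 << bit)]


print(decode_err_mask("ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)"))
```

Under that table, 0x4 maps to the timeout flag, which agrees with the "qc timeout" message on the preceding line.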

With pci=nomsi it works fine and recognizes my Western Digital HDD (JMicron SATA
controller, kernel 2.6.24-2.fc9 on FC9 Alpha; I had this in the regular FC8 too,
after the latest 2.6.24 kernel update).
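
To check whether a given boot actually used the parameter, or to build a kernel line that includes it, something like the Python sketch below may help. The helper names and the sample kernel arguments are hypothetical; /proc/cmdline is the standard Linux file holding the command line of the running kernel. The resulting string still has to be put into the boot loader configuration by hand.

```python
def has_kernel_param(cmdline, param):
    """True if param appears as a whole word on the kernel command line."""
    return param in cmdline.split()


def add_kernel_param(cmdline, param):
    """Append param to the command line unless it is already present."""
    cleaned = cmdline.strip()
    if has_kernel_param(cleaned, param):
        return cleaned
    return (cleaned + " " + param).strip()


def running_cmdline(path="/proc/cmdline"):
    """Read the command line of the currently running kernel (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()


# Hypothetical kernel arguments, for illustration only.
example = "ro root=/dev/VolGroup00/LogVol00 rhgb quiet"
print(add_kernel_param(example, "pci=nomsi"))
```

On a booted system, `has_kernel_param(running_cmdline(), "pci=nomsi")` reports whether the workaround is in effect.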
