Bug 231294

Summary: ATA: abnormal status message loops booting 2.6.20-1.2966.fc7
Product: [Fedora] Fedora Reporter: cje
Component: kernel Assignee: Alan Cox <alan>
Status: CLOSED CURRENTRELEASE QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: rawhide   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: 2.6.20-1.2982 Doc Type: Bug Fix
Last Closed: 2007-04-19 10:31:10 UTC Type: ---
Attachments:
Description            Flags
dmesg output           none
portege p2000 lspci    none

Description cje 2007-03-07 15:41:36 UTC
Description of problem:

i can't boot 2.6.20-1.2966.fc7 on my toshiba portege 2000.  2.6.20-1.2962.fc7 is
fine.

i get the following:

ATA: abnormal status 0x58 on port 0x000101F7

(followed by a few lines beginning "ata1.00") repeated over and over again.


Version-Release number of selected component (if applicable):
2.6.20-1.2966.fc7

How reproducible:
just boot.

Steps to Reproduce:
1. upgrade to that kernel
2. boot
  
Actual results:
looping error messages

Expected results:
boot

Additional info:
this is an old laptop (at least five years) with a PATA disk but it's been
working fine up until this kernel.  and it's running fine right now with the
previous kernel.

Comment 1 Alan Cox 2007-03-07 19:38:59 UTC
Can you attach the dmesg of a successful boot and the output of lspci -vxx?

Thanks


Comment 2 cje 2007-03-08 11:36:47 UTC
Created attachment 149559
dmesg output

here's one

Comment 3 cje 2007-03-08 11:37:39 UTC
Created attachment 149560
portege p2000 lspci

and the other

Comment 4 cje 2007-03-08 12:22:45 UTC
just tried 2.6.20-1.2967 and got the same errors.  here are some more details:

the following messages are in there somewhere - 

simplex DMA is claimed by other device, disabling DMA
configured for PIO0
EH complete
(HSM violation)

whilst noting all that i actually left it running longer than before and it got
past that bit.  the last bits are:

ata1.00: configured for PIO0
sd 0:0:0:0: SCSI error: return code = 0x08000002
sda: Current [descriptor[: sense key: Aborted Command
    Additional sense: Scsi parity error
Descriptor

urk.  it's gone now.  the system seems to be running but it's extremely slow. 
if it ever makes it to a login prompt at a useable speed i'll try to get another
dmesg output before i go back to 2.6.20-1.2962.fc7.

Comment 5 Alan Cox 2007-03-08 13:07:14 UTC
"Simplex DMA is claimed by other device, disabling DMA" is the root cause of the
very slow performance. I fixed that partly, and Petr posted a fix to my fix
last night.

ata1.00: configured for PIO0
sd 0:0:0:0: SCSI error: return code = 0x08000002
sda: Current [descriptor[: sense key: Aborted Command
    Additional sense: Scsi parity error

That bit is most peculiar, however.
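
For reference, the return code in the SCSI error line can be unpacked into its component bytes. The following is a minimal sketch, assuming the 32-bit result layout used by the Linux SCSI midlayer of that era (status, message, host and driver bytes, from low to high). The helper name decode_scsi_result and the small name tables are hypothetical and list only a few well-known values.

```python
# Sketch: split a Linux SCSI midlayer result word into its four bytes.
# Assumption: 2.6-era layout, low byte first: status, msg, host, driver.
# The name tables are deliberately incomplete.

STATUS_NAMES = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY"}
HOST_NAMES = {0x00: "DID_OK"}
DRIVER_NAMES = {0x00: "DRIVER_OK", 0x08: "DRIVER_SENSE"}


def decode_scsi_result(result):
    """Return the four byte fields of a SCSI result word as a dict."""
    status = result & 0xFF
    msg = (result >> 8) & 0xFF
    host = (result >> 16) & 0xFF
    driver = (result >> 24) & 0xFF
    return {
        "status": STATUS_NAMES.get(status, hex(status)),
        "msg": msg,
        "host": HOST_NAMES.get(host, hex(host)),
        "driver": DRIVER_NAMES.get(driver, hex(driver)),
    }


if __name__ == "__main__":
    # Value taken from the error message quoted in this report.
    print(decode_scsi_result(0x08000002))
```

Under that reading, 0x08000002 is a CHECK CONDITION status with sense data attached, which is consistent with the "sense key: Aborted Command" line that follows it.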



Comment 6 cje 2007-03-08 13:26:55 UTC
okay.  so it's possible there's some problem with PIO mode?

how about we wait for your fix to turn up in devel and then see if we can
reproduce this weird error by manually disabling DMA mode?

(by the way, the messages are hand-typed .. it should be "[descriptor]", not
"[descriptor[" .. in case you were wondering!)

for completeness ... i've tried booting to just a shell but i can't write
anything to the disk.  (i can remount rw with lots of those errors but actually
writing anything to the disk fails with another error) but i can 'dmesg | more'
to make a more careful copy of the messages.

looks like there's a pattern.  you get six copies of a 7 line block which starts
with

ATA: abnormal status 0x58 on port 0x000101f7

includes the DMA messages and ends with

ata1: EH complete

and those six blocks are followed by the SCSI error.
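
The pattern described above can be checked mechanically against a saved dmesg capture. This is a rough sketch only: the function name summarize is hypothetical, the regular expressions are built from the message fragments quoted in this report, and the sample input is an abbreviated reconstruction (three lines per block rather than seven), not a real log.

```python
import re

# Patterns assembled from the messages quoted in this report.
BLOCK_START = re.compile(r"ATA: abnormal status 0x[0-9a-fA-F]+ on port 0x[0-9a-fA-F]+")
BLOCK_END = re.compile(r"ata\d+: EH complete")
SCSI_ERROR = re.compile(r"SCSI error: return code = (0x[0-9a-fA-F]+)")


def summarize(lines):
    """Count complete error-handling blocks and collect SCSI return codes."""
    blocks = 0
    in_block = False
    scsi_errors = []
    for line in lines:
        if BLOCK_START.search(line):
            in_block = True
        elif in_block and BLOCK_END.search(line):
            blocks += 1
            in_block = False
        else:
            match = SCSI_ERROR.search(line)
            if match:
                scsi_errors.append(match.group(1))
    return blocks, scsi_errors


if __name__ == "__main__":
    # Abbreviated, hand-assembled sample; a real capture would come from
    # something like `dmesg > boot.log` on the affected machine.
    one_block = [
        "ATA: abnormal status 0x58 on port 0x000101f7",
        "ata1.00: configured for PIO0",
        "ata1: EH complete",
    ]
    sample = one_block * 6 + ["sd 0:0:0:0: SCSI error: return code = 0x08000002"]
    print(summarize(sample))
```

Comparing the block count across boots would show whether the six-retries-then-SCSI-error sequence is stable.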

Comment 7 cje 2007-03-12 14:12:30 UTC
just updated to 2.6.20-1.2982 and it boots fine.  :-)

tried booting with "nodma" option.  i'm not sure if that's still supposed to
have an effect but it didn't make any difference.  'hdparm -d /dev/sda' just
returns '/dev/sda:' either way.
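
Since 'hdparm -d' reports nothing useful here, one alternative is to read the transfer mode that libata itself logged. The sketch below assumes the "ataN.NN: configured for MODE" message format seen earlier in this report and treats any mode not beginning with "PIO" as a DMA mode. The helper names are hypothetical, and the UDMA line in the sample is an illustrative value, not taken from this machine.

```python
import re

# Matches lines such as "ata1.00: configured for PIO0".
CONFIGURED = re.compile(r"(ata\d+\.\d+): configured for (\S+)")


def last_transfer_modes(lines):
    """Map each ATA device to the last transfer mode it was configured for."""
    modes = {}
    for line in lines:
        match = CONFIGURED.search(line)
        if match:
            modes[match.group(1)] = match.group(2)
    return modes


def uses_dma(mode):
    """Assumption: anything that is not a PIO mode is some form of DMA."""
    return not mode.upper().startswith("PIO")


if __name__ == "__main__":
    sample = [
        "ata1.00: configured for PIO0",      # quoted in this report
        "ata1.00: configured for UDMA/100",  # hypothetical later line
    ]
    for device, mode in last_transfer_modes(sample).items():
        print(device, mode, "DMA" if uses_dma(mode) else "PIO only")
```

Running this over the dmesg of a boot with and without "nodma" would show whether the option changed the mode libata selected.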

anyway, i guess the weird messages were just weird and nothing more.  i'm happy
to try some things out if you do want to investigate that further but otherwise
i'm equally happy for you to close this call.  many thanks for the responses and
fix.