Bug 254007

Summary: [sata_sil] Can't boot because newer kernels can't access SATA disk
Product: [Fedora] Fedora
Component: kernel
Version: 7
Hardware: i686
OS: Linux
Status: CLOSED NOTABUG
Severity: urgent
Priority: urgent
Reporter: David A. De Graaf <dad>
Assignee: Jeff Garzik <jgarzik>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: cebbert, chris.brown, davej, juha.anon, kevin, peterm
Doc Type: Bug Fix
Last Closed: 2008-01-13 18:25:51 UTC
Bug Blocks: 172490
Attachments: photos of F8 booting

Description David A. De Graaf 2007-08-23 15:58:32 UTC
Description of problem:
Fedora 7 won't boot on my main gateway machine.  
More precisely, the newer kernels can't access the third disk which is SATA,
and therefore can't find the filesystems on it.

Version-Release number of selected component (if applicable):

vmlinuz-2.6.22.1-41.fc7

How reproducible:
Every time


Steps to Reproduce:
1.  Try to boot any of the new F7 kernels
  
Actual results:
  Won't boot

Expected results:


Additional info:

Fedora 7 won't boot on my main gateway machine.  

More precisely, the kernel can't access the third disk which is SATA, and
therefore can't find the filesystems on it.  The original F7 kernel was
able to access this disk, but none since.  I have these kernels installed:

    vmlinuz-2.6.21-1.3194.fc7
    vmlinuz-2.6.21-1.3228.fc7
    vmlinuz-2.6.22.1-27.fc7
    vmlinuz-2.6.22.1-41.fc7

Only the first is usable.

Here are typical messages transcribed from the screen while booting
the vmlinuz-2.6.22.1-27.fc7 kernel:

Uncompressing Linux... OK, booting the kernel.
RedHat nash version 6.0.9 starting
ata3.00:  revalidation failed (errno=-5)
ata3.00:  failed to set xfermode (err_mask=0x40)
ata3.00:  failed to set xfermode (err_mask=0x40)
ata3.00:  exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x2 frozen
ata3.00:  failed to set xfermode (err_mask=0x40)
ata3.00:  failed to set xfermode (err_mask=0x40)
ata3.00:  failed to set xfermode (err_mask=0x40)
    (last 4 lines repeat 3 more times)
ata3:  EH pending after 5 tries, giving up

        Welcome to Fedora
    (booting proceeds)
fsck.ext3:  Unable to resolve "LABEL=/h3"
fsck.ext3:  Unable to resolve "LABEL=/h2"
    (drops to shell for manual repair)

Of course, no repair is possible because the SATA disk containing /h2
and /h3 cannot be accessed at all.
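
For readers trying to interpret the masks in the messages above, here is a minimal sketch that decodes the err_mask and Emask values into libata error-class names. The bit assignments are an assumption modelled on the AC_ERR_* enum in the kernel's include/linux/libata.h and may differ between kernel versions; the helper name decode_err_mask is purely illustrative and is not part of any kernel tool.

```python
# Assumed bit layout, modelled on the AC_ERR_* enum in include/linux/libata.h.
# Verify against the header of the kernel actually in use before relying on it.
AC_ERR_BITS = {
    0x001: "AC_ERR_DEV",         # device reported an error
    0x002: "AC_ERR_HSM",         # host state machine violation
    0x004: "AC_ERR_TIMEOUT",     # command timed out
    0x008: "AC_ERR_MEDIA",       # media error
    0x010: "AC_ERR_ATA_BUS",     # ATA bus error
    0x020: "AC_ERR_HOST_BUS",    # host bus error
    0x040: "AC_ERR_SYSTEM",      # system error
    0x080: "AC_ERR_INVALID",     # invalid argument
    0x100: "AC_ERR_OTHER",       # unknown or other error
}


def decode_err_mask(mask):
    """Return the names of all known error bits set in mask (hypothetical helper)."""
    names = [name for bit, name in sorted(AC_ERR_BITS.items()) if mask & bit]
    unknown = mask & ~sum(AC_ERR_BITS)  # bits not covered by the table above
    if unknown:
        names.append("unknown(0x%x)" % unknown)
    return names


print(decode_err_mask(0x40))  # err_mask from "failed to set xfermode"
print(decode_err_mask(0x10))  # Emask from the "exception" line
```

Under that assumed layout, err_mask=0x40 maps to the "system error" class and Emask 0x10 to the "ATA bus error" class. Neither says by itself whether the driver or the hardware is at fault.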

When the system runs (with vmlinuz-2.6.21-1.3194.fc7) I can glean from
dmesg and dmidecode that

mobo:  ABIT NF7
cpu:   Athlon socket A 3000 MHz
disks:
  scsi1 : pata_amd
  SCSI device sda: 240121728 512-byte hdwr sectors (122942 MB)

  SCSI device sdb: 20005650 512-byte hdwr sectors (10243 MB)
  
  scsi2 : sata_sil
  SCSI device sdc: 234441648 512-byte hdwr sectors (120034 MB)
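
As a cross-check of the sizes in the dmesg lines above, the short sketch below converts the sector counts to decimal megabytes. It assumes 512-byte sectors (as dmesg states) and that "MB" means 10**6 bytes; the function name capacity_mb is illustrative only.

```python
SECTOR_BYTES = 512  # sector size reported by dmesg


def capacity_mb(sectors):
    """Capacity in decimal megabytes (10**6 bytes), rounded to the nearest MB."""
    return round(sectors * SECTOR_BYTES / 10**6)


# Sector counts copied from the dmesg output quoted in this report.
for dev, sectors in [("sda", 240121728), ("sdb", 20005650), ("sdc", 234441648)]:
    print(dev, capacity_mb(sectors), "MB")
```

The results agree with the figures dmesg printed: roughly 123 GB, 10.2 GB and 120 GB for sda, sdb and sdc.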


Evidently, the sata_sil driver in the later kernels is the culprit.
Googling reveals that others have had similar problems, but with the
Intel sata driver, not sata_sil.

This sata disk has been running perfectly with Fedora Core 6 and with
the original F7 kernel.

Comment 1 Christopher Brown 2007-09-30 12:42:34 UTC
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try and assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel?

If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.

Comment 2 David A. De Graaf 2007-10-01 17:28:19 UTC
I am sad to report that this bug remains alive and well.
I have just added the latest kernel - vmlinuz-2.6.22.9-91.fc7 
It, too, cannot access my SATA disk, which contains important but non-system
data files.  I have not tried unplugging the SATA cable.  I'm fairly sure that
would let the system boot, but it would not give access to those important
filesystems.

Please DO NOT close this bugzilla.  This is a show-stopper for me.
I'm stuck with vmlinuz-2.6.21-1.3194.fc7, the only Fedora 7 kernel that works.

What additional data can I provide?

Comment 3 Christopher Brown 2007-10-02 11:17:00 UTC
Okay, thanks for the update. I'm re-assigning this to the SATA maintainer who
may wish to review this further. I'm also adding a F8 blocker bug as it might
prevent a successful install of the next version of Fedora. To confirm this, it
would be helpful if you could download the latest live cd version from:

http://torrent.fedoraproject.org/torrents//rawhide-i386-Live-20070925.torrent

and see if this detects your disk.

Comment 4 David A. De Graaf 2007-10-03 16:08:24 UTC
I'd love to try the latest live cd.  I spent all day yesterday trying to get a 
bittorrent download to finish.  I restarted it at 1:30, and again it stopped 
short of completion.  Now, 11 hours later, I have 
  Size: 694.6 MB (728,258,633 bytes)
  Transferred: 694.6 MB (728,242,249 bytes)
The actual files received are:
  -rw-rw-r-- 1 dad dad         0 2007-10-02 13:32 SHA1SUM
  -rw-rw-r-- 1 dad dad 728258560 2007-10-02 17:09 rawhide-i386-Live-20070925.iso

The last ~16K bytes never arrive.
I can't find any non-bittorrent way (ftp, rsync, http) to obtain this file.
Is there one?
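
The size of the shortfall can be checked from the two byte counts above. This is only an arithmetic sketch; the variable names are illustrative.

```python
# Byte counts copied from the bittorrent client output quoted above.
expected = 728_258_633
transferred = 728_242_249

missing = expected - transferred
print(missing, "bytes missing =", missing / 1024, "KiB")
```

The gap is exactly 16 KiB. That matches the 16 KiB block size commonly used for BitTorrent requests, so it may be a single outstanding block, though that is an inference and not something the client reported.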

Thanks for adding the "F8 blocker bug" label.  It seems appropriate.

Comment 5 David A. De Graaf 2007-10-03 21:19:37 UTC
Created attachment 215111 [details]
photos of F8 booting

Comment 6 Chuck Ebbert 2007-10-03 21:50:52 UTC
Please don't attach .gz files, nobody can view them. And jpegs are already
compressed...

Comment 7 Christopher Brown 2007-10-04 09:53:58 UTC
F8 Test 3 is out any day now which will be available over ftp, http...

Comment 8 David A. De Graaf 2007-10-06 20:18:24 UTC
I have booted F8 Test 3 Live CD (Fedora-7.92-Live-i686.iso).

During the detection phase it failed to detect my third disk, which is
a SATA drive, but apparently did detect the two ATA drives.
I will refrain from posting photos of the extensive error messages.
They were similar to what I've seen with the F7 sata driver.

The GUI interface came up correctly and I did  
  fdisk -l
which correctly displayed /dev/sda and /dev/sdb, but not /dev/sdc.

fdisk -l also listed two other disks that are a mystery to me:
  /dev/dm-0:  4294 MB
  /dev/dm-1:  4294 MB

Sadly, the sata driver in F8T3 is still broken and unable to see my SATA disk.


Comment 9 Kevin Fenzi 2007-10-17 20:10:19 UTC
Odd. My main test machine here uses sata_sil and works great... 
from the bootup messages: 

libata version 2.21 loaded.
sata_sil 0000:00:12.0: version 2.3
ACPI: PCI Interrupt 0000:00:12.0[A] -> GSI 22 (level, low) -> IRQ 22
scsi0 : sata_sil
scsi1 : sata_sil
ata1: SATA max UDMA/100 cmd 0xffffc200001f8080 ctl 0xffffc200001f808a bmdma
0xffffc200001f8000 irq 22
ata2: SATA max UDMA/100 cmd 0xffffc200001f80c0 ctl 0xffffc200001f80ca bmdma
0xffffc200001f8008 irq 22
input: PS/2 Logitech Mouse as /class/input/input1
usb 2-4: new full speed USB device using ohci_hcd and address 2
usb 2-4: configuration #1 chosen from 1 choice
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: SAMSUNG SP2504C, VT100-33, max UDMA7
ata1.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata1.00: configured for UDMA/100
ata2: SATA link down (SStatus 0 SControl 300)
scsi 0:0:0:0: Direct-Access     ATA      SAMSUNG SP2504C  VT10 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO
or FUA
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO
or FUA
 sda: sda1 sda2
sd 0:0:0:0: [sda] Attached SCSI disk

lspci: 

00:12.0 IDE interface: ATI Technologies Inc 4379 Serial ATA Controller

uname:

2.6.23-6.fc8

Happy to provide further info if it would help track this down. 

Comment 10 Juha Anon 2007-10-18 10:35:33 UTC
I think I have this problem now in Fedora 8 Test 3. But I got it only after
today's updates, which included a kernel change. I think the update was the cause
of the problem, because there was nothing like that before. And now it's
repeatable. My system now fails to reboot every second time with something like
this:

Red Hat nash version 6.0.19 Starting handlers:
[<f88d558c>] (ata_interrupt 0x0/0x1c0 [libata])
Disabling IRQ #22
ata3.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata3.01: cmd c8/00:08:00:00:00/00:00:00:00:00/f0
tag 0 cdb 0x0 data 4096 in
ata3.00: revalidation failed (errno=-5)
... 

I can repeat that with exactly the same result. But every second time I can boot
the system.

Before the above booting log, there's a message about a BIOS bug, about a memory
area, I think. I haven't quite caught it yet, but will do so if it would help.

I have both ATA and SATA disks in my system: lspci reports this:

00:00.0 Host bridge: Intel Corporation 82975X Memory Controller Hub (rev c0)
00:01.0 PCI bridge: Intel Corporation 82975X PCI Express Root Port (rev c0)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition
Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1
(rev 01)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4
(rev 01)
00:1c.4 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express
Port 5 (rev 01)
00:1c.5 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express
Port 6 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI
Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI
Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI
Controller #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI
Controller #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI
Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface
Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller
(rev 01)
00:1f.2 IDE interface: Intel Corporation 82801GB/GR/GH (ICH7 Family) SATA IDE
Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000
Controller (PHY/Link)
02:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI
Controller (rev 02)
02:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI
Controller (rev 02)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
Ethernet Controller (rev 20)
04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
Ethernet Controller (rev 20)
06:00.0 VGA compatible controller: nVidia Corporation G70 [GeForce 7600 GS] (rev a1)

  -- Juha


Comment 11 Will Woods 2007-10-24 16:18:42 UTC
We've been unable to reproduce this bug. It might be because of the combination
of sata_sil and pata_amd?

Have you tried anything more recent than Test3?

Comment 12 David A. De Graaf 2007-10-25 15:36:02 UTC
My latest test was of F8Test3, which failed to detect and initialize the SATA 
third disk.
I'd be happy to test whatever you point me to, but I've not been following the 
rapid evolution toward F8.

My machine that won't boot any Fedora kernel except the original
distribution - vmlinuz-2.6.21-1.3194.fc7 - has 5 (!) ATA devices:
  ata1, master - Maxtor 6Y120L0 122.9 GB
  ata1, slave  - Pioneer DVD-RW, DVR-105
  ata2, master - Seagate ST310240A 10.2 GB
  ata2, slave  - NEC CD-ROM, drive: 28G Rev: 3.24

  ata3, master - WDC WD1200JB-00E, 120.0 GB

The motherboard, ABIT NF7-S, has the usual 2 Ultra DMA 33/66/100/133
IDE connectors, plus two SATA 150 MB/s data channels.  It came with a
"Serillel" adapter which claims to allow "Serial ATA RAID Now!".
This adapter plugs directly onto an ATA disk and has a serial socket.
A normal SATA cable connects from this socket to the mobo SATA socket.

Purportedly, this converts an ATA drive to SATA.  It is this device
that gives rise to this bugzilla report, since it works fine with
kernel 2.6.21-1.3194.fc7, but not with any newer kernel.

I have just now tried to swap the NEC CD-ROM and the WD hard drive
so that all three hard drives are on the IDE ports and the CD-ROM
uses the "Serillel" adapter.  This did not work - the 2.6.21-1.3194.fc7
kernel drops error messages right after it starts, eg,

  ata3:  COMRESET failed (device not ready)
        [3 of these]
  ata3:  reset failed, giving up

Thereafter, all three hard drives are properly detected and mounted, but the
CD-ROM is not available.
The newest kernel, 2.6.22.9-91, produces similar but more extensive
errors and cannot access the Serillel-adapted device.

I have no further info about this "Serillel" adapter, and I see that
ABIT has undergone a "corporate restructuring".  I will probably solve
this problem by the purchase of a PCI card with additional IDE ports.

If I am the only one with this problem, it seems unproductive to spend
any more effort on this bugzilla report.  Of course, the intellectual
question remains - what change in the kernel makes it unable to detect
and initialize a pseudo-SATA disk?


Comment 13 Juha Anon 2007-10-27 21:41:35 UTC
As I mentioned in a comment above (on 2007-10-18 06:35 EST) I have something of
this kind. Right now I see it with the latest Fedora 8 Test 3 update from today
(with kernel vmlinuz-2.6.23.1-35.fc8). But for me, it always boots successfully
when I try it a second time. That may help getting diagnostics from my system.

After the Red Hat nash version 6.0.19 starting
handlers:
I get the text:
   Reading all physical volumes. 
   ...
when it succeeds. When it fails I get:
   [<If88d55825>] (ata_interrupt+0x0/0x1be [libata])
   ...

and it doesn't seem to be able to read my SATA disks.
   
Changing kernels can make the problem seem to disappear. But not permanently.
With the original ISO image, I had no problems of this kind at all. I got it at
an update. It disappeared after a later update. But now it's back. (I currently
have the problem with two different versions of the kernel, since I'm sometimes
running the KDE version, and there I currently have kernel: vmlinuz-2.6.23.1-31.fc8)

  -- Juha


Comment 14 Chuck Ebbert 2007-10-29 19:01:50 UTC
Does adding "pci=nomsi,nommconf" to the kernel boot options make a difference?

Comment 15 Juha Anon 2007-10-30 05:56:01 UTC
I made some tries with and without those kernel options and think there's a
difference in the frequency of success. But it can fail also with those options.

With the kernel options: booted, booted, booted, booted, failed, failed.
Without them: booted, failed, booted, failed.

It would need more tries to be sure about any change. 
I don't think there have been two failures in direct sequence without any extra
options: it has always booted on the next try after a failure.
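
To illustrate why more tries are needed, here is a minimal sketch that applies Fisher's exact test to the tallies above (4 booted and 2 failed with the options, 2 booted and 2 failed without). The choice of test is an editorial assumption, not something used in this report, and the function name fisher_exact is illustrative.

```python
from fractions import Fraction
from math import comb


def fisher_exact(a, b, c, d):
    """Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Returns (one_sided, two_sided) p-values as exact fractions, where
    one_sided is the chance of seeing at least `a` successes in row 1
    if both rows share the same underlying success rate.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k successes landing in row 1.
        return Fraction(comb(col1, k) * comb(n - col1, row1 - k), comb(n, row1))

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = prob(a)
    one_sided = sum(prob(k) for k in range(a, hi + 1))
    two_sided = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs)
    return one_sided, two_sided


# Rows: with options, without options. Columns: booted, failed.
one_sided, two_sided = fisher_exact(4, 2, 2, 2)
print(float(one_sided), float(two_sided))
```

With only ten boots in total, the observed split is the single most likely outcome even if the options make no difference, so these counts cannot distinguish an effect from chance.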

  -- Juha



Comment 17 David A. De Graaf 2007-10-30 16:46:10 UTC
I'm sorry to report that adding the kernel boot option:
"pci=nomsi,nommconf" had no effect.  The error messages were unchanged
with kernel 2.6.22.9-91, eg, when nash runs it reports:

  ata3.00:  revalidation failed (errno=-5)
  ata3.00:  failed to set xfermode (err_mask=0x40)
  ata3.00:  failed to set xfermode (err_mask=0x40)
  ata3.00:  exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x2 frozen

and the disk connected with the Serillel converter cannot be accessed.
I'm afraid the Serillel ATA-to-SATA converter is not usable. 
It is consigned to my junk bin.

I have, however, purchased a Creative I/O Ultra ATA IDE Controller
pci card for $15.99 and it works perfectly.  I have five ATA devices
connected: 3 disks, a CD-ROM and a DVD-RW.  All are working well.


Comment 18 Christopher Brown 2008-01-13 18:25:51 UTC
Closing NOTABUG as the original reporter indicates it may have been faulty
hardware, and others have tried to reproduce it and failed. Please re-open if I
have somehow misunderstood the above comment, and thank you for filing the bug
originally.