Bug 179564 - dma does not get enabled on boot for ide harddisk [OLDIDE NVIDIA ODDSECTOR]
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: kernel
Version: 5
Hardware: All
OS: Linux
Priority: medium
Severity: high
Assigned To: Alan Cox
QA Contact: Brian Brock
URL: http://www.redhat.com/archives/fedora...
Reported: 2006-02-01 09:02 EST by drago01
Modified: 2007-11-30 17:11 EST
CC List: 5 users

Doc Type: Bug Fix
Last Closed: 2007-05-21 13:14:14 EDT


Attachments
smartctl output (8.57 KB, text/plain)
2006-03-27 00:45 EST, drago01

Description drago01 2006-02-01 09:02:55 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); de-DE; rv:1.8) Gecko/20051202 Fedora/1.5-1 Firefox/1.5

Description of problem:
After updating to kernel-2.6.15-1.1829_FC4 and udev-071-0.FC4.2, DMA is not enabled for my IDE hard disk at boot (see the URL for the error messages),
but it works if I enable it using hdparm.
With kernel 1824 and udev-058 I didn't have this problem.
 

Version-Release number of selected component (if applicable):
kernel-2.6.15-1.1829_FC4

How reproducible:
Always

Steps to Reproduce:
1. boot
2. look at dmesg or hdparm /dev/hda


Actual Results:  no dma at boot + error messages

Expected Results:  dma should be enabled at boot + no error messages

Additional info:

00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev a2)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev a3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
00:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:07.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 0a)
01:07.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 0a)
01:09.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 80)
01:0a.0 Ethernet controller: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller (rev 13)
05:00.0 VGA compatible controller: nVidia Corporation GeForce 7800 GTX (rev a1)
Comment 1 Dave Jones 2006-02-03 00:38:16 EST
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.
Comment 2 drago01 2006-02-03 14:57:49 EST
this bug still exists in 2.6.15-1.1830_FC4 (which is almost the same as 1829).
Comment 3 drago01 2006-02-09 05:44:56 EST
Could this bug be more udev-related than kernel-related?
Alan Cox wrote on the fedora-test-list that some application is trying to read
beyond the end of the device.
If I look at the dmesg output I see this:
---
Probing IDE interface ide0...
hda: ST340823A, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: LITE-ON LTR-52246S, ATAPI CD/DVD-ROM drive
hdd: LITE-ON DVDRW LDW-811S, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: Host Protected Area detected.
        current capacity is 78156288 sectors (40016 MB)
        native  capacity is 78156289 sectors (40016 MB)
hda: Host Protected Area disabled.
hda: 78156289 sectors (40016 MB) w/1024KiB Cache, CHS=65535/16/63, UDMA(100)
hda: cache flushes not supported
 hda: hda1
hdc: ATAPI 52X CD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
hdd: ATAPI 40X DVD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
ide-floppy driver 0.99.newide
---
and at the end after selinux init:
-----------------
SELinux: initialized (dev bdev, type bdev), uses genfs_contexts
SELinux: initialized (dev rootfs, type rootfs), uses genfs_contexts
SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts
SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=78156288, sector=78156288
ide: failed opcode was: unknown
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=78156288, sector=78156288
ide: failed opcode was: unknown
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[...]
-----
Is this correct, or did I miss something?
Should this bug be reassigned to udev?
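The two log excerpts above can be cross-checked mechanically. Below is a minimal Python sketch (the `parse` helper is hypothetical, written only for this illustration) that pulls the HPA capacities and the failing `LBAsect` values out of the pasted lines. Assuming 512-byte sectors numbered from 0, it shows that the failing LBA is exactly the one extra sector the kernel gained when it disabled the Host Protected Area; it does not explain why the drive rejects the read.

```python
import re

# Lines copied from the dmesg output in this comment (whitespace preserved).
DMESG = """\
hda: Host Protected Area detected.
        current capacity is 78156288 sectors (40016 MB)
        native  capacity is 78156289 sectors (40016 MB)
hda: Host Protected Area disabled.
hda: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=78156288, sector=78156288
hda: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=78156288, sector=78156288
"""


def parse(log):
    """Return (current capacity, native capacity, sorted unique failing LBAs)."""
    current = int(re.search(r"current capacity is (\d+) sectors", log).group(1))
    native = int(re.search(r"native\s+capacity is (\d+) sectors", log).group(1))
    failing = sorted({int(n) for n in re.findall(r"LBAsect=(\d+)", log)})
    return current, native, failing


current, native, failing = parse(DMESG)
for lba in failing:
    # LBAs start at 0, so a disk with N sectors has valid LBAs 0..N-1.
    print(f"LBA {lba}: inside HPA-limited size ({current})? {lba < current}; "
          f"inside native size ({native})? {lba < native}")
```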
Comment 4 drago01 2006-03-06 11:09:17 EST
The same problem exists when I use 2.6.16-rc5 (vanilla).
Comment 5 drago01 2006-03-06 11:25:51 EST
http://bugzilla.kernel.org/show_bug.cgi?id=6162
It seems like IDE is somehow broken on my box?
Has anything changed regarding IDE between udev 058 and 071?
Comment 6 drago01 2006-03-10 03:05:17 EST
I googled for it and found this:
http://forums.gentoo.org/viewtopic.php?t=372550
(no solution)
could lvm be buggy? (I don't use it)
Comment 7 Jon Burgess 2006-03-25 15:37:49 EST
According to the Seagate manual for your drive it should have 78,165,360
sectors. http://www.seagate.com/support/disc/manuals/ata/u5pmb01.pdf

The figure for your ST340823A drive doesn't match. It is missing 9072 sectors
(78165360 vs 78156288). I don't know where the difference comes from. Two
possibilities:-

1) Enable LBA mode in your BIOS.
What does "hdparm -i /dev/hda" report? It should tell you about the current LBA
status and number of LBA sectors.

2) The drive might be losing some capacity due to some failing sectors. 
Could you try inspecting the SMART stats of the drive and get it to run a
thorough disk test?
# smartctl -s on /dev/hda
# smartctl -a /dev/hda
# smartctl -t long /dev/hda
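As a quick check on the figures in this comment, here is a small Python sketch, assuming 512-byte sectors. It also shows, as an inference from the logs rather than anything stated in them, that the "MB" values printed at boot are decimal megabytes (10^6 bytes): 78156288 sectors gives the 40016 MB in the reporter's log, and 78165360 gives the 40020 MB in the log quoted below.

```python
SECTOR_BYTES = 512  # assumption: standard 512-byte sectors


def sectors_to_mb(sectors):
    """Capacity in decimal megabytes (10**6 bytes), rounded down."""
    return sectors * SECTOR_BYTES // 10**6


manual = 78_165_360    # capacity given in the Seagate manual
reported = 78_156_288  # capacity this drive reports at boot

missing = manual - reported
print(f"missing sectors: {missing}")                   # 9072
print(f"missing bytes:   {missing * SECTOR_BYTES}")    # 4644864, about 4.6 MB
print(sectors_to_mb(manual), sectors_to_mb(reported))  # 40020 40016
```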

I did a Google search for kernel boot logs and found a couple of references
where people have that same drive working on a 2.6.15 kernel with the correct
number of sectors, e.g. http://kerneltrap.org/node/6306

...
Linux version 2.6.15.5 (root@dadslinux) (gcc version 3.3.6) #1 PREEMPT Tue Mar 7
09:24:36 CST 2006
...
hda: ST340823A, ATA DISK drive
hdb: ST340823A, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: CD-W58E, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: Host Protected Area detected.
current capacity is 78165360 sectors (40020 MB)
native capacity is 78165361 sectors (40020 MB)
hda: Host Protected Area disabled.
hda: 78165361 sectors (40020 MB) w/512KiB Cache, CHS=65535/16/63, UDMA(66)
hda: cache flushes not supported
hda: hda1 hda2 < hda5 hda6 hda7 hda8 hda9 hda10 hda11 >
hdb: max request size: 128KiB
hdb: 78165360 sectors (40020 MB) w/1024KiB Cache, CHS=65535/16/63, UDMA(66)
Comment 8 drago01 2006-03-26 03:30:29 EST
(In reply to comment #7)
> According to the Seagate manual for your drive it should have 78,165,360
> sectors. http://www.seagate.com/support/disc/manuals/ata/u5pmb01.pdf
> 
> The figure for your ST340823A drive doesn't match. It is missing 9072 sectors
> (78165360 vs 78156288). I don't know where the difference comes from. Two
> possibilities:-
> 
> 1) Enable LBA mode in your BIOS.
> What does "hdparm -i /dev/hda" report? It should tell you about the current LBA
> status and number of LBA sectors.
> 
/dev/hda:

 Model=ST340823A, FwRev=3.05, SerialNo=6EF0CAAP
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=1024kB, MaxMultSect=16, MultSect=1
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=78156288
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: Unspecified:  ATA/ATAPI-1 ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4

 * signifies the current active mode
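The `hdparm -i` output above packs the geometry into KEY=VALUE pairs. The Python sketch below (the `parse_fields` helper is hypothetical) extracts them and confirms that `CurSects` is simply cylinders * heads * sectors per track from `CurCHS`. The legacy CHS geometry therefore covers only 16514064 sectors (about 8.46 GB), and the usable size comes from `LBAsects`, which matches the 78156288 sectors seen at boot.

```python
import re

# Two lines from the "hdparm -i /dev/hda" output quoted above.
HDPARM_I = """\
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=78156288
"""


def parse_fields(text):
    """Collect KEY=VALUE pairs; values end at a comma or whitespace."""
    return dict(re.findall(r"(\w+)=([^,\s]+)", text))


fields = parse_fields(HDPARM_I)
cylinders, heads, sectors_per_track = map(int, fields["CurCHS"].split("/"))

# CurSects is the CHS-addressable size: cylinders * heads * sectors per track.
chs_sectors = cylinders * heads * sectors_per_track
assert chs_sectors == int(fields["CurSects"])

print(fields["LBA"], chs_sectors, int(fields["LBAsects"]))  # yes 16514064 78156288
```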


> 2) The drive might be losing some capacity due to some failing sectors. 
> Could you try inspecting the SMART stats of the drive and get it to run a
> thorough disk test?
> # smartctl -s on /dev/hda
> # smartctl -a /dev/hda
> # smartctl -t long /dev/hda
> 
No, the drive seems to be OK:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     15837         -
# 2  Short offline       Completed without error       00%     13627         -
# 3  Short offline       Completed without error       00%     13627         -
# 4  Short offline       Completed without error       00%     12063         -
# 5  Short offline       Completed without error       00%     12063         -
# 6  Short captive       Completed without error       00%         1         -
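The self-test log can also be checked programmatically. A minimal Python sketch, assuming the column layout shown above (columns separated by two or more spaces): it flags any row whose status is not "Completed without error" or that records an LBA of first error. Note Jon's caveat in the next comment that clean self-tests do not by themselves prove that no sectors were remapped; that information lives in the attribute table of `smartctl -a`, which is not reproduced here.

```python
import re

# Rows from the SMART self-test log quoted above.
SELFTEST_LOG = """\
# 1  Extended offline    Completed without error       00%     15837         -
# 2  Short offline       Completed without error       00%     13627         -
# 3  Short offline       Completed without error       00%     13627         -
# 4  Short offline       Completed without error       00%     12063         -
# 5  Short offline       Completed without error       00%     12063         -
# 6  Short captive       Completed without error       00%         1         -
"""


def problem_rows(log):
    """Return (test number, status, first-error LBA) for rows that are not clean."""
    problems = []
    for line in log.splitlines():
        if not line.startswith("#"):
            continue
        # Columns: num, description, status, remaining, lifetime, LBA of first error.
        cols = re.split(r"\s{2,}", line.strip())
        num, _desc, status, _remaining, _hours, first_error = cols
        if status != "Completed without error" or first_error != "-":
            problems.append((num, status, first_error))
    return problems


print(problem_rows(SELFTEST_LOG))  # []
```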



> I did a Google search for kernel boot logs and found a couple of references
> where people have that same drive working on a 2.6.15 kernel with the correct
> number of sectors, e.g. http://kerneltrap.org/node/6306
> 
> ...
> Linux version 2.6.15.5 (root@dadslinux) (gcc version 3.3.6) #1 PREEMPT Tue Mar 7
> 09:24:36 CST 2006
> ...
> hda: ST340823A, ATA DISK drive
> hdb: ST340823A, ATA DISK drive
> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> Probing IDE interface ide1...
> hdc: CD-W58E, ATAPI CD/DVD-ROM drive
> ide1 at 0x170-0x177,0x376 on irq 15
> hda: max request size: 128KiB
> hda: Host Protected Area detected.
> current capacity is 78165360 sectors (40020 MB)
> native capacity is 78165361 sectors (40020 MB)
> hda: Host Protected Area disabled.
> hda: 78165361 sectors (40020 MB) w/512KiB Cache, CHS=65535/16/63, UDMA(66)
> hda: cache flushes not supported
> hda: hda1 hda2 < hda5 hda6 hda7 hda8 hda9 hda10 hda11 >
> hdb: max request size: 128KiB
> hdb: 78165360 sectors (40020 MB) w/1024KiB Cache, CHS=65535/16/63, UDMA(66)
> 

Comment 9 drago01 2006-03-26 03:33:12 EST
The problem still exists in FC5 (2.6.16-1.2074_FC5).
Comment 10 Jon Burgess 2006-03-26 11:43:01 EST
What do the top level SMART stats about the drive look like? There should be a
line like:
  SMART overall-health self-assessment test result: PASSED
followed by a table showing the power on hours, number of remapped sectors etc.
Could you write all the output from "smartctl -a /dev/hda" to a file and attach
that to this bug?
It may be possible for a drive to pass the tests even if it has marked 9000
sectors as bad. I had a new drive once which initially reported a bad sector,
but it disappeared once I'd wiped the drive, because the firmware had remapped
it. I believe drives only have a small number of reserved sectors available for
remapping before you start losing real capacity from the drive.

How about trying the disk diagnostic tools from Seagate? They ought to reliably
detect the info about the number of sectors on the drive. They come as a
bootable CD image so can run on anything which can boot from a CD-Rom.
http://www.seagate.com/support/seatools/B7a.html

I'm wondering whether the problem is with the kernel trying to access one more
sector than the drive has available, or whether the drive claims to have a
sector which it isn't able to read. It is interesting that the kernel HPA code
appears to increment the drive sector count by 1; perhaps this is where the
problem lies. Unfortunately I can't see an easy user-visible way to
switch this off without patching and recompiling the kernel.
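The off-by-one question above comes down to counts versus addresses. Per the ATA specification, IDENTIFY DEVICE words 60-61 report a sector count, while READ NATIVE MAX ADDRESS returns the highest LBA, so converting the latter to a count needs "+ 1". The Python sketch below models only that arithmetic. The function names are hypothetical, and the native max value of 78156288 is inferred from the log's 78156289 minus the +1 described above. Read as an address, the value implies a one-sector HPA; if the firmware were instead returning a count, the same "+ 1" would claim exactly one sector that does not exist.

```python
def native_sector_count(native_max_value, value_is_count=False):
    """Convert the READ NATIVE MAX ADDRESS result to a sector count.

    Per ATA the value is the highest LBA, so the count is value + 1.
    value_is_count=True models the hypothetical case of firmware that
    returns a count instead, in which case no adjustment is needed.
    """
    return native_max_value if value_is_count else native_max_value + 1


def hidden_sectors(identify_count, native_max_value, value_is_count=False):
    """Sectors beyond the IDENTIFY-reported size, i.e. the apparent HPA."""
    return native_sector_count(native_max_value, value_is_count) - identify_count


identify_count = 78156288    # "current capacity" in the boot log
native_max_value = 78156288  # inferred: 78156289 - 1

print(hidden_sectors(identify_count, native_max_value))        # 1: address reading
print(hidden_sectors(identify_count, native_max_value, True))  # 0: if firmware reports a count
```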

Note that the whole idea of clipping the disk capacity using these "host
protected area" features was introduced around the time of the first 40GB disks
(due to BIOS and hardware problems with disks > 33.8GB). I believe your disk is
one of the very first 40GB disks to hit the market, so I wouldn't be too
surprised if it was a firmware issue in the drive (or disk controller).

The "large disk howto" 
http://www.tldp.org/HOWTO/html_single/Large-Disk-HOWTO/#s11
notes that your Seagate drive might have some drive specific quirk:
...For models ST-340016A, ST-340823A, ST-340824A, ST-360021A, ST-380021A: The
ATA Set Features F1 sub-command will cause Identify Data words 60-61 to report
the true full capacity.

One fix might be to repartition the drive so that you don't use the last few MB
of the drive. 
Comment 11 Jon Burgess 2006-03-26 11:57:43 EST
It might be worth taking a look here
http://www.ussg.iu.edu/hypermail/linux/kernel/0104.3/1190.html 
This is another user with the same drive as you where the drive doesn't seem to
respond to the normal HPA command sequence, but does work with the "seagate.c"
program which he attached to that message. Maybe it is worth trying that program
and seeing whether hdparm reports the full capacity afterwards?
Comment 12 drago01 2006-03-27 00:45:27 EST
Created attachment 126787 [details]
smartctl output

I will provide more info when I get home, but for now I can tell you that I
already don't use the last 8 MB of the disk.
smartctl output is attached
Comment 13 Rodrigo Hernandez 2006-05-13 13:35:21 EDT
I believe the bug is actually in the official NVidia driver.

After reinstalling FC5 a couple of times and both times getting system libraries
corrupted because of this bug, I decided to get a new drive. While waiting for
the new drive (and assuming this was a Linux kernel issue), I installed FreeBSD 6.1
on the ST340823A.

The installation went fine and the drive showed no signs of errors, that is, until
I installed the NVIDIA-FreeBSD-x86-1.0-8756 driver. After that, the same attempts
to read past the end of the disk showed up.

At first I thought the drive was faulty, but after realizing that it had been
working fine before I installed the NVidia driver, I removed the driver and
everything went back to normal. The bug just has to be there.
Comment 14 Dave Jones 2006-09-16 22:19:23 EDT
[This comment added as part of a mass-update to all open FC4 kernel bugs]

FC4 has now transitioned to the Fedora legacy project, which will continue to
release security related updates for the kernel.  As this bug is not security
related, it is unlikely to be fixed in an update for FC4, and has been migrated
to FC5.

Please retest with Fedora Core 5.

Thank you.
Comment 15 drago01 2006-09-17 08:01:42 EDT
still happens with 2.6.17-1.2187_FC5
Comment 16 Alan Cox 2006-09-17 11:42:54 EDT
If it's only reproducible with the Nvidia-provided 3D drivers installed on the
machine, then you need to take it up with Nvidia instead: they have the Linux
code, but we don't have theirs, so only they can debug it.

Comment 17 drago01 2006-09-17 11:49:26 EDT
for me it happens with and without the nvidia drivers.
Comment 18 Alan Cox 2006-09-17 17:07:00 EDT
Dave, did you turn on the GPT partitioning stuff?

Working through all of this:
- The size issue is a red herring. Actual size depends on disk, firmware and
geometry.
- The error trace is an attempt to read the last valid sector. I suspect,
however, that we issued a 1K read for it.
- It's almost the only drive type that has both odd sector counts *and* returns
an error for a 1K read of the last sector.

To me that suggests the "last sector" peek done by one of the partition table
handlers is the logical trigger, especially as the box then runs fine.
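Alan's hypothesis can be illustrated with plain sector arithmetic. A minimal Python sketch, assuming 512-byte sectors and the 78156289-sector capacity the kernel used after disabling the HPA (the helper names are hypothetical): a 512-byte read of the last LBA stays on the disk, but a 1K read starting there also covers the next LBA, which does not exist, and an odd sector count cannot be split into whole 1K blocks.

```python
SECTOR = 512  # assumption: 512-byte sectors


def sectors_touched(start_lba, length_bytes):
    """LBAs covered by a read of length_bytes starting at start_lba."""
    if length_bytes <= 0:
        raise ValueError("length must be positive")
    count = -(-length_bytes // SECTOR)  # round up to whole sectors
    return range(start_lba, start_lba + count)


def overruns(start_lba, length_bytes, capacity):
    """True if the read touches an LBA at or past capacity (valid LBAs: 0..capacity-1)."""
    return sectors_touched(start_lba, length_bytes)[-1] >= capacity


capacity = 78156289      # odd sector count used after the HPA was disabled
last_lba = capacity - 1  # 78156288, the LBA in the dma_intr errors

print(overruns(last_lba, 512, capacity))   # False: one sector, still on the disk
print(overruns(last_lba, 1024, capacity))  # True: also touches LBA 78156289
print(divmod(capacity, 2))                 # (39078144, 1): one sector left over
```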


Comment 19 Steve 2006-09-20 23:15:17 EDT
You may want to check out bug 163418:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=163418
I filed this a while ago when I couldn't get DMA on my DVD drive.

What exact chipset are you using, e.g. nForce 4 or 5?
FC5 has support for both of those, as Fedora has support for my Intel 915 chipset.
But the Fedora kernel doesn't load the correct drivers for me:
it loads the IDE driver when it should load the SATA (libata) driver.
I compile my own kernel with a custom config, so I don't have problems, and
with new Fedora kernels I can add:
hdc=noprobe combined_mode=libata

Consider looking into this.
Comment 20 drago01 2006-09-21 13:06:27 EDT
No, this isn't the problem.
I have an nForce4 SLI chipset, DMA works fine for the two CD drives, and I can
enable DMA on the disk using hdparm (and get no errors after doing this).
Comment 21 Dave Jones 2006-10-16 14:29:07 EDT
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.
Comment 22 drago01 2006-10-18 14:38:48 EDT
This still happens :(
Comment 23 drago01 2006-11-05 03:59:45 EST
How can I stop LVM from being started at boot?
It seems that it's causing this.
Comment 24 drago01 2007-01-10 16:23:59 EST
Just wanted to note that 2.6.19-1.2888.fc6 still doesn't fix it.
Maybe the old IDE -> libata change in F7 will fix it?
If someone has any idea which kind of info I can/should provide, feel free to ask.
Comment 25 drago01 2007-03-14 16:43:28 EDT
kernel-2.6.20-1.2925.fc6 
=> still the same :(
Comment 26 Rodrigo Hernandez 2007-03-14 17:36:48 EDT
This may actually have to do with an underpowered power supply. I have an nVidia
6800GT card, 2 IDE DVD-ROMs and 3 hard drives hooked to a 400W power supply. I
replaced the Seagate drive this bug refers to with a Western Digital SATA drive.
It ran fine for a while but started giving similar problems; it turns out it was
hooked to the same power line as the video card, and apparently that causes
fluctuations on the power line. I rearranged things so the DVD-ROMs are now the
ones sharing the power line with the card, and now the hard drives show no
problems, though my DVD burner creates a lot of coasters.
Comment 27 drago01 2007-03-14 17:54:22 EDT
I have a Tagan easycon 480W PSU and a 7800GTX, but the video card is attached to
its own power cord (it does not share with anything). The same drive also works
fine with Windows and with all the live CDs I have tested so far. (I haven't
mounted it with an F7 live CD yet, but I can try tomorrow if that helps.)
Comment 28 drago01 2007-04-15 08:53:01 EDT
Why does the kernel even allow reads beyond the end of a device?
And why does it disable DMA in this case? Wouldn't returning EOF be the better
action here?
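For illustration, the behaviour asked about here (treating the end of the device like end-of-file) can be sketched as a clamp on the request. This is a hypothetical Python helper showing ordinary file-style semantics, not how the old IDE driver or the SCSI core actually handle it: a read that starts at or past the end returns 0 bytes, and one that straddles the end is shortened rather than failed.

```python
SECTOR = 512  # assumption: 512-byte sectors


def clamp_read(offset, length, device_size):
    """Bytes a read may return before hitting the end of the device.

    0 means end-of-file; a value smaller than length is a short read.
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if offset >= device_size:
        return 0
    return min(length, device_size - offset)


sectors = 78156289
size = sectors * SECTOR
last_sector = (sectors - 1) * SECTOR

print(clamp_read(last_sector, 1024, size))  # 512: short read instead of an I/O error
print(clamp_read(size, 1024, size))         # 0: end of device
```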
Comment 29 Alan Cox 2007-04-17 11:14:02 EDT
That would require an essay-sized answer for the disk question.
Comment 30 Alan Cox 2007-05-21 13:14:14 EDT
I'm going to close this WONTFIX, simply because the required work to fix it in
the old IDE layer is huge and would be risky. Realistically it won't happen. The
SCSI core (used by libata) appears to already handle the odd sized media cases
correctly.

Comment 31 drago01 2007-06-01 12:43:10 EDT
this is indeed fixed in kernel-2.6.21-1.3194.fc7 (which uses libata)
