Bug 162314

Summary: Complete system lockup when using IDE with DMA-enabled on ServerWorks chipset
Product: [Fedora] Fedora
Reporter: Markus Hakansson <mh>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 4
CC: davej, pfrields, teicher-fedora, wtogami
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-05-05 01:16:23 UTC
Attachments:
Scimark2, C version compiled with gcc2.95.3 (flags: none)

Description Markus Hakansson 2005-07-02 11:13:24 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; sv-SE; rv:1.7.6) Gecko/20050524 Firefox/1.0 (Ubuntu package 1.0.2 MFSA2005-44)

Description of problem:
When DMA is enabled for an IDE drive and I try to copy a large file, the machine hangs completely within 4 seconds, with no kernel panic. If I disable DMA it works, but the machine is almost completely unresponsive during large operations.
The machine is a Compaq ProLiant ML350G2 with the ServerWorks LE 3.0 chipset.
The IDE disk is master on the secondary channel; there is a CD-ROM as master on the first channel.
This worked with the latest kernel on FC2, although we had to boot the system with ide1=66 (or similar). I also tried removing this parameter.

Version-Release number of selected component (if applicable):
kernel-2.6.11-1.1369_FC4.i686

How reproducible:
Always

Steps to Reproduce:
1. Install FC3 on a Compaq ML350G2
2. Install an IDE disk with DMA enabled (default)
3. Copy large files to it


Actual Results: The machine freezes completely, with no kernel panic.

Additional info:

VMware binary kernel modules are loaded into the kernel, as well as HP's kernel modules.
I will try to reproduce without these modules early next week.

Comment 1 Markus Hakansson 2005-07-04 09:04:37 UTC
I updated the kernel to 2.6.12-1.1387_FC4 and removed the tainting kernel modules;
the issue is exactly the same.
I also removed ide1=ata66 from the boot parameters, which made no difference.

This is the dmesg output when probing IDE:

ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SvrWks CSB5: IDE controller at PCI slot 0000:00:0f.1
SvrWks CSB5: chipset revision 146
SvrWks CSB5: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0x2000-0x2007, BIOS settings: hda:pio, hdb:pio
    ide1: BM-DMA at 0x2008-0x200f, BIOS settings: hdc:pio, hdd:pio
Probing IDE interface ide0...
hda: Compaq CRD-8402B, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: HDS722516VLAT80, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
hdc: max request size: 1024KiB
hdc: 321672960 sectors (164696 MB) w/7938KiB Cache, CHS=20023/255/63
hdc: cache flushes supported
 hdc: hdc1
hda: ATAPI 40X CD-ROM drive, 128kB Cache
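
As a rough cross-check of the capacity line in the dmesg output above, the sector count can be converted to decimal megabytes. This is only a sketch: the 512-byte sector size is an assumption (it is not stated in the log), and the helper name sectors_to_mb is made up for illustration.

```python
# Assumption: 512-byte logical sectors, which the log itself does not state.
SECTOR_BYTES = 512


def sectors_to_mb(sectors: int) -> int:
    """Convert a sector count to decimal megabytes (10**6 bytes), truncating."""
    return sectors * SECTOR_BYTES // 1_000_000


# Sector count taken from the "hdc: 321672960 sectors" line above.
print(sectors_to_mb(321672960))
```

Under those assumptions the result agrees with the 164696 MB that the kernel reports for hdc.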


Comment 2 Markus Hakansson 2005-07-04 09:23:51 UTC
Noticed that there was an old configuration in rc.local that I had forgotten about:
 hdparm -d1 -Xudma4 /dev/hdc

When I removed it, the device stayed in the working (but extremely slow) PIO mode.
I also tested with:
 hdparm -d 1 /dev/hdc

Instead of freezing, the machine gave these errors:
attempt to access beyond end of device
hdc1: rw=0, want=34197063424, limit=321669432
attempt to access beyond end of device
hdc1: rw=0, want=34197210440, limit=321669432
attempt to access beyond end of device
<... snip ...>
hdc1: rw=0, want=30201610288, limit=321669432
attempt to access beyond end of device
hdc1: rw=0, want=17181966336, limit=321669432
EXT3-fs error (device hdc1): ext3_readdir: bad entry in directory #131089:
rec_len is smaller than minimal - offset=0, inode=1179647, rec_len=2, name_len=12
Aborting journal on device hdc1.
ext3_abort called.
EXT3-fs error (device hdc1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
hdc: DMA disabled
__journal_remove_journal_head: freeing b_committed_data
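
The "want" and "limit" values in the errors above can be compared directly to see how far outside the partition the requests fall. The sketch below assumes both numbers are counted in the same units (presumably 512-byte sectors) and uses only the lines quoted in this comment; parse_requests is a hypothetical helper, not part of any kernel tooling.

```python
import re

# The four complete "want/limit" lines quoted in this comment.
LOG = """\
hdc1: rw=0, want=34197063424, limit=321669432
hdc1: rw=0, want=34197210440, limit=321669432
hdc1: rw=0, want=30201610288, limit=321669432
hdc1: rw=0, want=17181966336, limit=321669432
"""

PATTERN = re.compile(r"^(\w+): rw=(\d+), want=(\d+), limit=(\d+)$")


def parse_requests(text):
    """Return (device, rw, want, limit) tuples for every matching log line."""
    found = []
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            dev, rw, want, limit = match.groups()
            found.append((dev, int(rw), int(want), int(limit)))
    return found


requests = parse_requests(LOG)
for dev, rw, want, limit in requests:
    # Ratio of the requested position to the end of the partition.
    print(f"{dev}: want is {want / limit:.1f}x the limit")
```

Every quoted request lands more than 50 times past the end of hdc1, which looks more like garbage block numbers than a slightly mis-sized partition, though that reading is only a guess.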


Comment 3 Markus Hakansson 2005-07-04 14:55:38 UTC
When booting without ide1=ata66 and issuing 'hdparm -d1 /dev/hdc' the system
cannot find a filesystem on the disk.
If I instead use 'hdparm -d1 -X66 /dev/hdc' I can mount the device but the
system freezes when copying a file.
I also tried booting with noacpi, same result.

Comment 4 Alan Cox 2005-07-07 23:10:28 UTC
The BIOS appears to have selected PIO. We honour the BIOS by default in this case.
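
The mode the BIOS selected can be read off the "BIOS settings" lines in the dmesg output from comment 1. A minimal sketch, assuming the line format stays exactly as quoted there; bios_modes is a hypothetical helper name.

```python
# The two "BIOS settings" lines as quoted in comment 1.
DMESG = """\
    ide0: BM-DMA at 0x2000-0x2007, BIOS settings: hda:pio, hdb:pio
    ide1: BM-DMA at 0x2008-0x200f, BIOS settings: hdc:pio, hdd:pio
"""


def bios_modes(text):
    """Map each drive name to the transfer mode listed after 'BIOS settings:'."""
    modes = {}
    for line in text.splitlines():
        _, sep, settings = line.partition("BIOS settings:")
        if not sep:
            continue  # not a BIOS settings line
        for item in settings.split(","):
            drive, _, mode = item.strip().partition(":")
            modes[drive] = mode
    return modes


modes = bios_modes(DMESG)
print(modes)
```

For the quoted lines, all four drive slots, including hdc, come back as pio.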


Comment 5 Markus Hakansson 2005-07-08 07:06:10 UTC
Yes, the BIOS does not have any way to change this setting. With the earlier
kernels I could override the BIOS-settings by passing 'ide1=ata66' and then
setting 'hdparm -d1 -X66 /dev/hdc'.
When using FC2 this worked.

Comment 6 Dave Jones 2005-07-15 21:12:13 UTC
[This comment has been added as a mass update for all FC4 kernel bugs.
 If you have migrated this bug from an FC3 bug today, ignore this comment.]

Please retest your problem with today's 2.6.12-1.1398_FC4 update.

If your problem involved being unable to boot, or some hardware not being
detected correctly, please make sure your /etc/modprobe.conf is correct *BEFORE*
installing any kernel updates.
If in doubt, you can recreate this file using:

mv /etc/sysconfig/hwconf /etc/sysconfig/hwconf.bak
mv /etc/modprobe.conf /etc/modprobe.conf.bak
kudzu


Thank you.


Comment 7 Charles C. Van Tilburg 2005-07-31 22:45:43 UTC
FWIW:

I can get a lockup on 2.6.12-1.1398_FC4 pretty fast using
the C version of scimark, compiled with gcc 2.95.3, -O2...

Nothing is logged... I don't know if this is related or not.

Further, an attempt to recompile gcc 2.95.3, just in case that
was required, failed during make bootstrap, looking
for a now non-existent header file:

/home/software/gcc-2.95.3/gcc/xgcc -B/home/software/gcc-2.95.3/gcc/
-B/usr/local/i686-pc-linux-gnu/bin/ -c -g -O2 -I. -I. -D_IO_MTSAFE_IO iogetline.c
In file included from libio.h:167,
                 from iolibio.h:1,
                 from libioP.h:47,
                 from iogetline.c:26:
/usr/include/bits/stdio-lock.h:24: lowlevellock.h: No such file or directory
make[2]: *** [iogetline.o] Error 1
make[2]: Leaving directory `/home/software/gcc-2.95.3/i686-pc-linux-gnu/libio'
make[1]: *** [all-target-libio] Error 2
make[1]: Leaving directory `/home/software/gcc-2.95.3'
make: *** [bootstrap] Error 2


Comment 8 Charles C. Van Tilburg 2005-07-31 22:51:45 UTC
Created attachment 117329 [details]
Scimark2, C version compiled with gcc2.95.3

Here is the scimark2 binary compiled with gcc2.95.3,
built under FC3.

The compiler was built under FC2 or FC3, I don't recall
which.

Comment 9 Charles C. Van Tilburg 2005-08-01 16:52:16 UTC
Interestingly, the scimark2 binary provided does NOT cause a
lockup on an otherwise identical FC4 software installation on
a similar machine.

The major differences are: 

system that locks up: ECS [Apollo KT266/A/333]/AMD XP3000/
  Nvidia GeForce 4/ti4200 AGP running X with Nvidia 7667 
  drivers and using the system AGPGART, AGP 4x mode.

system that does not lock up: Foxconn [KT400/KT600 AGP]/
  AMD XP2600/PCI S3 86c764/765 [Trio32/64/64V+] not running X

I have reported this to Nvidia in their forum and via email
on the outside chance that it involves their driver.

I think the lowlevellock.h missing file should be addressed.

Comment 10 Charles C. Van Tilburg 2005-08-02 12:45:25 UTC
Interestingly, disabling ACPI helps when gcc 2.95.3 scimark2
is run in the console (X not running), but does not solve the
problem... it just takes more runs to lock up.  Having X 
running means it locks up at the first run.

Comment 11 Charles C. Van Tilburg 2005-08-02 17:01:22 UTC
After a new boot, at console mode, the nvidia kernel module 
is not loaded, so this cannot involve their driver or other
libraries.

I noticed via the nvidia log file that the 2.6.12-1.1398_FC4
kernel was compiled with gcc 4.0.0... I am going to try to
rebuild with the current system 4.0.1

Comment 12 Dave Jones 2005-08-02 19:44:39 UTC
In private mail, it transpired that Charles's issues had nothing to do with
ServerWorks IDE, so ignore the last few comments on this bug.


Comment 13 Markus Hakansson 2005-09-08 08:55:22 UTC
Today I retested this with 2.6.12-1.1447_FC4, with the same result.
I passed ide1=dma66 to the kernel and ran hdparm -d1 -Xudma4 /dev/hdc.
I then tried to copy some files and it hung within 10 seconds.

Comment 14 Dave Jones 2005-09-30 06:19:18 UTC
Mass update to all FC4 bugs:

An update has been released (2.6.13-1.1526_FC4) which rebases to a new upstream
kernel (2.6.13.2). As there were ~3500 changes upstream between this and the
previous kernel, it's possible your bug has been fixed already.

Please retest with this update, and update this bug if necessary.

Thanks.


Comment 15 Markus Hakansson 2005-10-03 08:47:41 UTC
Tested with kernel 2.6.13-1.1526_FC4 and the problem still exists.

Comment 16 Dave Jones 2005-11-10 19:17:54 UTC
2.6.14-1.1637_FC4 has been released as an update for FC4.
Please retest with this update, as a large amount of code has been changed in
this release, which may have fixed your problem.

Thank you.


Comment 17 Dave Jones 2006-02-03 06:40:08 UTC
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.


Comment 18 John Thacker 2006-05-05 01:16:23 UTC
Closing per previous comment.