Bug 40387 - IO-APIC + IDE (PDC20262) => lost interrupts and filesystem corruption
Status: CLOSED CURRENTRELEASE
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.1
Hardware: i386 Linux
Priority: medium
Severity: high
Assigned To: Arjan van de Ven
QA Contact: Brock Organ
Reported: 2001-05-12 21:31 EDT by Need Real Name
Modified: 2008-08-01 12:22 EDT (History)
Last Closed: 2004-09-30 11:38:59 EDT
Description Need Real Name 2001-05-12 21:31:47 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.77 [en] (X11; U; Linux 2.4.3-2.14.10 i586)

Description of problem:
I have discovered that if I plug IDE drives into both channels of the
Promise Ultra66 controller in my DEC Celebris 5133/DP (dual Pentium 133
machine) and access drives on both channels at the same time, *bad
things* can happen: lost interrupts leading to filesystem corruption.

I discovered the problem initially when I plugged an ATAPI CDRW
drive into the second IDE channel and attempted to burn a disk using
cdrecord.  That caused lost interrupts, and kernel messages looking
roughly as follows:

kernel: ide_dmaproc: chipset supported ide_dma_lostirq func only: 13
kernel: hdf: lost interrupt
kernel: ide_dmaproc: chipset supported ide_dma_lostirq func only: 13
kernel: hdg: lost interrupt
kernel: hdf: timeout waiting for DMA
kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
hdf: irq timeout: status=0x50 { DriveReady SeekComplete }

I can't give you the exact errors, because shortly thereafter the
/var filesystem got trashed and the log files were too smashed for
fsck to be able to recover them.
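
For anyone who wants to quantify this from whatever log fragments survive, here is a minimal Python sketch that tallies "lost interrupt" lines per drive. The function name `count_lost_interrupts` and the regular expression are illustrative assumptions based only on the message format quoted above; they are not part of any kernel tool.

```python
import re
from collections import Counter

# Matches lines such as "kernel: hdf: lost interrupt". The pattern is an
# assumption derived from the messages quoted in this report.
LOST_IRQ = re.compile(r"\b(hd[a-z]): lost interrupt\b")


def count_lost_interrupts(lines):
    """Return a Counter mapping drive name (e.g. 'hdf') to lost-interrupt count."""
    counts = Counter()
    for line in lines:
        match = LOST_IRQ.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the messages quoted in this report.
    sample = [
        "kernel: ide_dmaproc: chipset supported ide_dma_lostirq func only: 13",
        "kernel: hdf: lost interrupt",
        "kernel: hdg: lost interrupt",
        "kernel: hdf: timeout waiting for DMA",
    ]
    for drive, n in sorted(count_lost_interrupts(sample).items()):
        print(drive, n)
```

Feeding it the lines of a saved log file (for example `open(path)`) works the same way, since it only needs an iterable of strings.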

After replacing the CDRW with a hard drive, I got similar errors with
a large file copy (CD-ROM image), leading me to conclude that the problem
was not drive related.  Updating to the most recent kernel from Red
Hat's Rawhide distribution did not fix the problem.

I tried rebooting in uniprocessor mode, but continued to see problems
with lost interrupts.  Then, on a hunch, I tried booting in uniprocessor
mode with the ``noapic'' option, and the problem of ``lost interrupts''
seems to have vanished, allowing me to use my CD-RW drive to create
disks. That said, one attempt to burn a CD at 8x speed produced the
following error:

kernel: ide-scsi: CoD != 0 in idescsi_pc_intr
kernel: hdg: ATAPI reset complete
kernel: hdg: status error: status=0x58 { DriveReady SeekComplete DataRequest }
kernel: hdg: drive not ready for command
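
When comparing boots, it helps to confirm that a given boot really ran with the ``noapic'' option. The sketch below checks the kernel command line for a flag as a whole token. The helper names `has_boot_flag` and `read_cmdline` are hypothetical, and reading `/proc/cmdline` assumes a Linux system with procfs mounted.

```python
def has_boot_flag(cmdline, flag):
    """Return True if `flag` appears as a whole token on the kernel command line."""
    return flag in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (assumes procfs is mounted)."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    try:
        cmdline = read_cmdline()
    except OSError:
        # Not on Linux, or procfs unavailable: fall back to the command line
        # recorded in the dmesg output attached to this report.
        cmdline = ("auto BOOT_IMAGE=linux-noapic ro root=2107 "
                   "BOOT_FILE=/boot/vmlinuz-2.4.3-2.14.10 noapic hdg=ide-scsi")
    print("noapic set:", has_boot_flag(cmdline, "noapic"))
```

Matching whole tokens matters here: a boot label such as `BOOT_IMAGE=linux-noapic` contains the string "noapic" but does not by itself disable the IO-APIC.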

I've enclosed some information about the machine.  If you need more
details, please let me know.

    M.E.O.


How reproducible:
Always

Steps to Reproduce:
1. Connect drives to both channels
2. Boot without disabling the IO-APIC
3. Copy data between drives on the two channels
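
Step 3 can be sketched roughly as follows. This is only an illustration of "copy a large file and force it to disk", not the exact procedure used for this report. The function name `copy_and_sync` and the 1 MiB chunk size are assumptions, and the source and destination paths must be supplied by whoever runs it (one on each IDE channel).

```python
import os
import sys


def copy_and_sync(src, dst, chunk_size=1 << 20):
    """Copy src to dst in chunks and fsync, so both drives are really exercised.

    Returns the number of bytes copied.
    """
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
        fout.flush()
        os.fsync(fout.fileno())  # push the data past the page cache to the drive
    return copied


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: copytest.py SOURCE_FILE DEST_FILE")
    total = copy_and_sync(sys.argv[1], sys.argv[2])
    print("copied", total, "bytes")
```

Running it with a large file (such as a CD-ROM image) on one channel and a destination on the other, then checking the kernel log for "lost interrupt" messages, approximates the failing case.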
	

Actual Results:  lost interrupts, corrupted filesystems

Expected Results:  normal data transfer between drives

Additional info:

Information about my machine

linux% uname -a
Linux tenby 2.4.3-2.14.10 #1 Mon Apr 30 19:26:24 EDT 2001 i586 unknown

linux% lspci
00:00.0 Host bridge: Intel Corporation 82434LX [Mercury/Neptune] (rev 11)
00:01.0 SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c810 (rev 02)
00:02.0 Non-VGA unclassified device: Intel Corporation 82378IB [SIO ISA Bridge] (rev 88)
00:06.0 Unknown mass storage controller: Promise Technology, Inc. 20262 (rev 01)
00:07.0 VGA compatible controller: S3 Inc. 86c968 [Vision 968 VRAM] rev 0
00:08.0 Ethernet controller: Digital Equipment Corporation DECchip 21040 [Tulip] (rev 24)
                                                                               
linux% lsdev
Device            DMA   IRQ  I/O Ports
------------------------------------------------
3c509                        0230-023f
cascade             4     2 
Digital                      e880-e8ff
dma                          0080-008f
dma1                         0000-001f
dma2                         00c0-00df
eth0                     15  e880-e8ff
eth1                      5 
fpu                          00f0-00ff
ide2                         e828-e82f e83a-e83a e840-e847
ide3                     10  e830-e837 e83e-e83e e848-e84f
isapnp                       0213-0213 0a79-0a79
keyboard                  1  0060-006f
Mouse                    12 
MPU-401                      0330-0333
ncr53c8xx                11  ec00-ec7f
parport0                     0378-037a 037b-037f
PCI                          0cf8-0cfb
PDC20262                     e850-e87f
pic1                         0020-003f
pic2                         00a0-00bf
Promise                      e828-e82f e830-e837 e838-e83b e83c-e83f e840-e87f
rtc                       8  0070-007f
serial                       03f8-03ff
soundblaster              7  0220-022f
SoundBlaster16      5       
SoundBlaster8       1       
Symbios                      ec00-ecff
timer                     0  0040-005f
vga+                         03c0-03df

linux% lsmod
Module                  Size  Used by
ipt_MASQUERADE          1200   1  (autoclean)
ipt_LOG                 3296   1  (autoclean)
ipt_limit                896   4  (autoclean)
ipt_state                576   2  (autoclean)
ip_conntrack_ftp        3344   0  (unused)
iptable_nat            13888   0  [ipt_MASQUERADE]
ip_conntrack           12720   3  [ipt_MASQUERADE ipt_state ip_conntrack_ftp iptable_nat]
autofs                  9088   1  (autoclean)
nfsd                   66112   8  (autoclean)
lockd                  48752   1  (autoclean) [nfsd]
sunrpc                 58960   1  (autoclean) [nfsd lockd]
parport_pc             17520   1  (autoclean)
lp                      5488   0  (autoclean)
parport                24416   1  (autoclean) [parport_pc lp]
3c509                   7152   1  (autoclean)
tulip                  33840   1  (autoclean)
iptable_mangle          1696   0  (autoclean) (unused)
iptable_filter          1728   0  (autoclean) (unused)
ip_tables              11104   9  [ipt_MASQUERADE ipt_LOG ipt_limit ipt_state iptable_nat iptable_mangle iptable_filter]
ide-scsi                7776   0 
ide-cd                 27008   0 
cdrom                  28096   0  [ide-cd]
sb                      7280   0 
sb_lib                 32880   0  [sb]
uart401                 6096   0  [sb_lib]
sound                  54784   0  [sb_lib uart401]
soundcore               3632   5  [sb_lib sound]
ncr53c8xx              52400   0  (unused)
sd_mod                 11200   0  (unused)
scsi_mod               86448   3  [ide-scsi ncr53c8xx sd_mod]

linux% cat /proc/ide/pdc202xx 

                                PDC20262 Chipset.
------------------------------- General Status ---------------------------------
Burst Mode                           : enabled
Host Mode                            : Normal
Bus Clocking                         : 33 PCI Internal
IO pad select                        : 6 mA
Status Polling Period                : 0
Interrupt Check Status Polling Delay : 0
--------------- Primary Channel ---------------- Secondary Channel -------------
                enabled                          enabled
66 Clocking     enabled                          disabled
           Mode PCI                         Mode PCI
                FIFO Empty                       FIFO Empty
--------------- drive0 --------- drive1 -------- drive0 ---------- drive1 ------
DMA enabled:    yes              yes             no                no
DMA Mode:       UDMA 4           UDMA 4          NOTSET            NOTSET
PIO Mode:       PIO 4            PIO 4           NOTSET            NOTSET

linux% dmesg
Linux version 2.4.3-2.14.10 (root@porky.devel.redhat.com) (gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-82)) #1 Mon Apr 30 19:26:24 EDT 2001
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f5990 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 0000000006000000 (usable)
 BIOS-e820: 00000000fec00000 - 00000000fed00000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
 BIOS-e820: 00000000ffff5990 - 0000000100000000 (reserved)
On node 0 totalpages: 24576
zone(0): 4096 pages.
zone(1): 20480 pages.
zone(2): 0 pages.
hm, page 01000000 reserved twice.
Kernel command line: auto BOOT_IMAGE=linux-noapic ro root=2107 BOOT_FILE=/boot/vmlinuz-2.4.3-2.14.10 noapic hdg=ide-scsi
ide_setup: hdg=ide-scsi
Initializing CPU#0
Detected 133.338 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 266.24 BogoMIPS
Memory: 88380k/98304k available (1223k kernel code, 8764k reserved, 93k data, 232k init, 0k highmem)
Dentry-cache hash table entries: 16384 (order: 5, 131072 bytes)
Buffer-cache hash table entries: 4096 (order: 2, 16384 bytes)
Page-cache hash table entries: 32768 (order: 6, 262144 bytes)
Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
VFS: Diskquotas version dquot_6.5.0 initialized
CPU: Before vendor init, caps: 000003bf 00000000 00000000, vendor = 0
Intel Pentium with F0 0F bug - workaround enabled.
Intel old style machine check architecture supported.
Intel old style machine check reporting enabled on CPU#0.
CPU: After vendor init, caps: 000003bf 00000000 00000000 00000000
CPU:     After generic, caps: 000003bf 00000000 00000000 00000000
CPU:             Common caps: 000003bf 00000000 00000000 00000000
CPU: Intel Pentium 75 - 200 stepping 0c
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: none
PCI: PCI BIOS revision 2.10 entry at 0xfbedf, last bus=0
PCI: Using configuration type 2
PCI: Probing PCI hardware
  got res[10000000:13ffffff] for resource 0 of S3 Inc. 86c968 [Vision 968 VRAM] rev 0
isapnp: Scanning for PnP cards...
isapnp: Card '3Com 3C509B EtherLink III'
isapnp: 1 Plug & Play card detected total
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
apm: BIOS version 1.1 Flags 0x03 (Driver version 1.14)
Starting kswapd v1.8
pty: 512 Unix98 ptys configured
block: queued sectors max/low 58573kB/19524kB, 192 slots per queue
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PDC20262: IDE controller on PCI bus 00 dev 30
PDC20262: chipset revision 1
PDC20262: not 100% native mode: will probe irqs later
PDC20262: (U)DMA Burst Bit ENABLED Primary PCI Mode Secondary PCI Mode.
    ide2: BM-DMA at 0xe840-0xe847, BIOS settings: hde:DMA, hdf:DMA
    ide3: BM-DMA at 0xe848-0xe84f, BIOS settings: hdg:pio, hdh:pio
hde: QUANTUM FIREBALL CX13.0A, ATA DISK drive
hdf: IBM-DTLA-307045, ATA DISK drive
hdg: LG CD-RW CED-8120B, ATAPI CD/DVD-ROM drive
ide2 at 0xe828-0xe82f,0xe83a on irq 10
ide3 at 0xe830-0xe837,0xe83e on irq 10
hde: 25429824 sectors (13020 MB) w/418KiB Cache, CHS=25228/16/63, UDMA(33)
hdf: 90069840 sectors (46116 MB) w/1916KiB Cache, CHS=89355/16/63, UDMA(66)
Partition check:
 hde: [PTBL] [1582/255/63] hde1 hde2 hde3 hde4 < hde5 hde6 hde7 hde8 hde9 hde10 >
 hdf: [PTBL] [5606/255/63] hdf1 hdf2 hdf3 hdf4 < hdf5 hdf6 >
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 332k freed
Serial driver version 5.05a (2001-03-20) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
Real Time Clock Driver v1.10d
md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md.c: sizeof(mdp_super_t) = 4096
autodetecting RAID arrays
autorun ...
... autorun DONE.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP: Hash tables configured (established 8192 bind 8192)
Linux IP multicast router 0.06 plus PIM-SM
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
VFS: Mounted root (ext2 filesystem).
SCSI subsystem driver Revision: 1.00
ncr53c8xx: at PCI bus 0, device 1, function 0
ncr53c8xx: 53c810 detected 
ncr53c810-0: rev 0x2 on pci bus 0 device 1 function 0 irq 11
ncr53c810-0: ID 7, Fast-10, Parity Checking
scsi0 : ncr53c8xx-3.4.3-20010212
VFS: Mounted root (ext2 filesystem) readonly.
change_root: old root has d_count=3
Trying to unmount old root ... okay
Freeing unused kernel memory: 232k freed
Adding Swap: 257000k swap-space (priority -1)
Soundblaster audio driver Copyright (C) by Hannu Savolainen 1993-1996
sb: No ISAPnP cards found, trying standard ones...
SB 4.13 detected OK (220)
scsi1 : SCSI host adapter emulation for IDE ATAPI devices
  Vendor: LG        Model: CD-RW CED-8120B   Rev: 1.02
  Type:   CD-ROM                             ANSI SCSI revision: 02
Winbond Super-IO detection, now testing ports 3F0,370,250,4E,2E ...
SMSC Super-IO detection, now testing Ports 2F0, 370 ...
parport0: PC-style at 0x378 [PCSPP,TRISTATE,EPP]
parport0: cpp_daisy: aa5500ff(80)
parport0: assign_addrs: aa5500ff(80)
parport0: cpp_daisy: aa5500ff(88)
parport0: assign_addrs: aa5500ff(80)
ip_tables: (c)2000 Netfilter core team
Linux Tulip driver version 0.9.14 (February 20, 2001)
eth0: Digital DC21040 Tulip rev 36 at 0xe880, 00:00:92:B6:00:56, IRQ 15.
eth1: 3c5x9 at 0x230, 10baseT port, address  00 60 97 1f dd 4a, IRQ 5.
3c509.c:1.18 12Mar2001 becker@scyld.com
http://www.scyld.com/network/3c509.html
eth0: No link beat found.
eth1: Setting Rx mode to 1 addresses.
Winbond Super-IO detection, now testing ports 3F0,370,250,4E,2E ...
SMSC Super-IO detection, now testing Ports 2F0, 370 ...
parport0: PC-style at 0x378 [PCSPP,TRISTATE,EPP]
parport0: cpp_daisy: aa5500ff(80)
parport0: assign_addrs: aa5500ff(80)
parport0: cpp_daisy: aa5500ff(88)
parport0: assign_addrs: aa5500ff(80)
lp0: using parport0 (polling).
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
ip_conntrack (768 buckets, 6144 max)
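
The dmesg output above reports both Promise channels (ide2 and ide3) on irq 10. A minimal sketch for extracting that mapping from saved dmesg text follows. The names `channels_by_irq` and `shared_irqs` are hypothetical, the regular expression is an assumption based on the "ideN at ... on irq N" lines shown above, and nothing here claims that the shared IRQ is the cause of the problem.

```python
import re
from collections import defaultdict

# Matches probe lines such as "ide2 at 0xe828-0xe82f,0xe83a on irq 10".
IDE_IRQ = re.compile(r"^\s*(ide\d+) at .* on irq (\d+)\s*$")


def channels_by_irq(dmesg_lines):
    """Map each IRQ number to the list of IDE channels reported on it."""
    by_irq = defaultdict(list)
    for line in dmesg_lines:
        match = IDE_IRQ.match(line)
        if match:
            by_irq[int(match.group(2))].append(match.group(1))
    return dict(by_irq)


def shared_irqs(dmesg_lines):
    """Return only the IRQs that more than one IDE channel reports."""
    return {irq: chans
            for irq, chans in channels_by_irq(dmesg_lines).items()
            if len(chans) > 1}


if __name__ == "__main__":
    # Lines copied from the dmesg output in this report.
    sample = [
        "    ide2: BM-DMA at 0xe840-0xe847, BIOS settings: hde:DMA, hdf:DMA",
        "ide2 at 0xe828-0xe82f,0xe83a on irq 10",
        "ide3 at 0xe830-0xe837,0xe83e on irq 10",
    ]
    print(shared_irqs(sample))
```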
Comment 1 Arjan van de Ven 2001-05-13 04:19:27 EDT
Quick question: are you using the special 80-wire ribbon cables for the Promise
controller?
Comment 2 Need Real Name 2001-05-13 15:17:57 EDT
The two drives on the primary channel use the 80-wire cable, but the
drive on the secondary channel is using a 40-wire cable.  My
understanding was that the ATAPI CDRW drive (and the old 1GB drive I also
tested on that channel) does not support transfer rates faster
than Ultra/33, so it was fine to use a 40-wire cable in that case.

I'd be happy to buy and try an 80-wire cable for that channel, but I'm
not sure that it would change things.
Comment 3 Arjan van de Ven 2001-05-13 16:14:25 EDT
To get things clear: you never saw problems while copying stuff
from disk 0 to disk 1 or the other way around (I mean, both disks on the first
channel), just from the second channel to the first?
Comment 4 Arjan van de Ven 2001-05-13 16:30:20 EDT
Other question: does it happen if you disable DMA on the device in question?
(e.g. "hdparm -d0 /dev/hdf")
Comment 5 Need Real Name 2001-05-13 16:48:45 EDT
I've been using the two drives on the primary channel pretty
heavily (moving huge files between them) for several months,
without seeing any problems.  Prior to installing RH 7.1,
I ran my own custom kernel which had the backported IDE
patches so that I could use the Promise controller.

To answer your other question, I tried turning off DMA (with
``hdparm -d0'') on the drives, but it didn't help.  The lost
interrupt problem persisted.
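
For reference, the DMA test can be repeated across several drives with something like the sketch below. It only wraps the `hdparm -d0` invocation quoted in this thread. The helper names `dma_off_command` and `disable_dma` are hypothetical, the device list is taken from the dmesg output in this report, and actually running the commands requires root and an installed hdparm.

```python
import subprocess


def dma_off_command(device):
    """Build the hdparm invocation quoted in this thread for one device."""
    return ["hdparm", "-d0", device]


def disable_dma(devices, run=subprocess.run):
    """Run hdparm -d0 on each device and return the list of commands issued.

    `run` is injectable so the function can be exercised without root.
    """
    issued = []
    for device in devices:
        cmd = dma_off_command(device)
        run(cmd, check=True)
        issued.append(cmd)
    return issued


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for dev in ["/dev/hde", "/dev/hdf", "/dev/hdg"]:
        print(" ".join(dma_off_command(dev)))
```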

FWIW, I installed another OS just to double-check my hardware
configuration, and everything appears to work fine there.  That,
combined with the fact that turning off the IO-APIC and SMP seems to
fix the problem in Linux, leads me to believe that it is a software
rather than a hardware problem.
Comment 6 Arjan van de Ven 2001-05-13 17:20:27 EDT
What version of the Promise bios is this ?
Comment 7 Need Real Name 2001-05-13 17:49:26 EDT
It runs the 2.0 BIOS, which is the latest one available from Promise.
(I flash upgraded it shortly after getting the card.)
Comment 8 Bugzilla owner 2004-09-30 11:38:59 EDT
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
