Bug 111998 - Random lockups under heavy IDE disk write
Status: CLOSED CURRENTRELEASE
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 9
Hardware: athlon Linux
Priority: medium
Severity: medium
Assigned To: Arjan van de Ven
QA Contact: Brian Brock
Reported: 2003-12-12 12:33 EST by Doug Oakes
Modified: 2007-04-18 13:00 EDT

Doc Type: Bug Fix
Last Closed: 2003-12-26 22:55:21 EST


Description Doug Oakes 2003-12-12 12:33:41 EST
Description of problem:

I recently replaced my motherboard and hard drive, and since then I have
been experiencing regular, random system lockups when the system is under
heavy disk writes.  Sometimes I get a kernel Oops, with the process
usually listed as kjournald or kswapd.  Sometimes there is no Oops and the
system just freezes.

Version-Release number of selected component (if applicable):

This has happened with every kernel version that I have tried since
Red Hat Linux 7.3 (currently running 2.4.20-24.9).

How reproducible:
Heavy writes to disk.  Tarring or downloading a large file will usually
cause the problem.  It often happens overnight, with no evidence of what
was running and no log entry.
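
For anyone wanting a write load that does not depend on the contents of /usr, the following is a minimal sketch. It assumes that sustained sequential writes followed by an fsync exercise the same path as the tar run, which this report does not confirm. The function name `write_load` and the default sizes are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def write_load(directory, total_mb=64, chunk_kb=1024):
    """Write total_mb of zero bytes to a temp file in directory.

    The data is flushed and fsync'ed so it reaches the disk rather than
    staying in the page cache. Returns the number of bytes written.
    """
    chunk = b"\0" * (chunk_kb * 1024)
    target = total_mb * 1024 * 1024
    written = 0
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            while written < target:
                n = min(len(chunk), target - written)
                f.write(chunk[:n])
                written += n
            f.flush()
            os.fsync(f.fileno())
    finally:
        # Remove the scratch file whether or not the write succeeded
        os.remove(path)
    return written


if __name__ == "__main__":
    print(write_load(tempfile.gettempdir()))
```

Raising `total_mb` well above the machine's RAM size would keep the disk busy for longer; the value to use is a guess, not something taken from this report.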

Steps to Reproduce:
1.  tar cvf /tmp/usr.tar /usr
Actual results:
The system freezes, or stalls and accepts no new connections.

Expected results:
No problem

Additional info:
Comment 1 Doug Oakes 2003-12-12 15:19:40 EST
Example of Oops message:

Dec 12 11:49:28 vader kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Dec 12 11:49:28 vader kernel:  printing eip:
Dec 12 11:49:28 vader kernel: 00000000
Dec 12 11:49:28 vader kernel: *pde = 00000000
Dec 12 11:49:28 vader kernel: Oops: 0000
Dec 12 11:49:28 vader kernel: ide-cd cdrom parport_pc lp parport via82cxxx_audio uart401 ac97_codec sound soundcore smbfs tg3 via-rhine mii ipchains st keybdev mousedev hid input ehci-hcd
Dec 12 11:49:28 vader kernel: CPU:    0
Dec 12 11:49:28 vader kernel: EIP:    0060:[<00000000>]    Not tainted
Dec 12 11:49:28 vader kernel: EFLAGS: 00010282
Dec 12 11:49:28 vader kernel:
Dec 12 11:49:28 vader kernel: EIP is at [unresolved] (2.4.20-24.9)
Dec 12 11:49:28 vader kernel: eax: c015ba10   ebx: c2ef1840   ecx: 00000000   edx: c2ef1850
Dec 12 11:49:28 vader kernel: esi: c030b120   edi: c25a0800   ebp: d9117dc0   esp: d788bf4c
Dec 12 11:49:28 vader kernel: ds: 0068   es: 0068   ss: 0068
Dec 12 11:49:28 vader kernel: Process sendmail (pid: 1221, stackpage=d788b000)
Dec 12 11:49:28 vader kernel: Stack: c015b9e3 c2ef1840 cbe54ac0 00000019 d9117dc0 c2ef1840 c2ef1840 c0158e60
Dec 12 11:49:28 vader kernel:        c2ef1840 cbe54ac0 cbe54ac0 c25a2340 c0145cd5 d9117dc0 c259d1c0 cbe54ac0
Dec 12 11:49:28 vader kernel:        d848a740 00000000 bfffbe88 c014436d cbe54ac0 d848a740 cbe54ac0 00000001
Dec 12 11:49:28 vader kernel: Call Trace:   [<c015b9e3>] iput [kernel] 0x273 (0xd788bf4c))
Dec 12 11:49:28 vader kernel: [<c0158e60>] dput [kernel] 0xa0 (0xd788bf68))
Dec 12 11:49:28 vader kernel: [<c0145cd5>] fput [kernel] 0xd5 (0xd788bf7c))
Dec 12 11:49:28 vader kernel: [<c014436d>] filp_close [kernel] 0x4d (0xd788bf98))
Dec 12 11:49:28 vader kernel: [<c01443ee>] sys_close [kernel] 0x4e (0xd788bfb0))
Dec 12 11:49:28 vader kernel: [<c010939f>] system_call [kernel] 0x33 (0xd788bfc0))
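
Call traces pasted into a bug report are often line-wrapped, which makes them awkward to compare by eye. The sketch below shows one way to pull the frames out of text like the Oops above. It assumes the frame layout seen in this report (`[<address>] symbol [module] offset`); the names `FRAME_RE` and `parse_call_trace` are hypothetical and not part of any kernel tool.

```python
import re

# One frame looks like: [<c015b9e3>] iput [kernel] 0x273
FRAME_RE = re.compile(
    r"\[<([0-9a-f]{8})>\]\s+(\S+)\s+\[([^\]]+)\]\s+(0x[0-9a-f]+)"
)


def parse_call_trace(log_text):
    """Return the call-trace frames found in a (possibly wrapped) Oops log."""
    # Collapse all whitespace so a frame split across two lines still matches
    joined = " ".join(log_text.split())
    frames = []
    for m in FRAME_RE.finditer(joined):
        frames.append({
            "address": m.group(1),
            "symbol": m.group(2),
            "module": m.group(3),
            "offset": int(m.group(4), 16),
        })
    return frames


# Lines copied from the Oops above, including the original wrapping
SAMPLE = """\
Dec 12 11:49:28 vader kernel: EIP:    0060:[<00000000>]    Not tainted
Dec 12 11:49:28 vader kernel: Call Trace:   [<c015b9e3>] iput [kernel]
0x273 (0xd788bf4c))
Dec 12 11:49:28 vader kernel: [<c0158e60>] dput [kernel] 0xa0
(0xd788bf68))
"""

if __name__ == "__main__":
    for frame in parse_call_trace(SAMPLE):
        print(frame["address"], frame["symbol"], hex(frame["offset"]))
```

The EIP line is not picked up as a frame because it has no `[module]` field after the address.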


Dmesg:

Linux version 2.4.20-24.9 (bhcompile@daffy.perf.redhat.com) (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)) #1 Mon Dec 1 11:43:36 EST 2003
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001fff0000 (usable)
 BIOS-e820: 000000001fff0000 - 000000001fff3000 (ACPI NVS)
 BIOS-e820: 000000001fff3000 - 0000000020000000 (ACPI data)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
511MB LOWMEM available.
On node 0 totalpages: 131056
zone(0): 4096 pages.
zone(1): 126960 pages.
zone(2): 0 pages.
Kernel command line: ro root=LABEL=/
Initializing CPU#0
Detected 1000.051 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 1992.29 BogoMIPS
Memory: 511248k/524224k available (1333k kernel code, 10412k reserved, 1001k data, 132k init, 0k highmem)
Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
Inode cache hash table entries: 32768 (order: 6, 262144 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer-cache hash table entries: 32768 (order: 5, 131072 bytes)
Page-cache hash table entries: 131072 (order: 7, 524288 bytes)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 256K (64 bytes/line)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 0183fbff c1c7fbff 00000000 00000000
CPU:             Common caps: 0183fbff c1c7fbff 00000000 00000000
CPU: AMD Athlon(tm) processor stepping 02
Enabling fast FPU save and restore... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
PCI: PCI BIOS revision 2.10 entry at 0xfb3e0, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Using IRQ router VIA [1106/3177] at 00:11.0
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16)
Starting kswapd
VFS: Disk quotas vdquot_6.5.1
pty: 2048 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS0 at 0x03f8 (irq = 4) is a 16550A
ttyS1 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10e
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
NET4: Frame Diverter 0.46
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00beta3-.2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot 00:11.1
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: VIA vt8235 (rev 00) IDE UDMA133 controller on pci00:11.1
    ide0: BM-DMA at 0xe000-0xe007, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xe008-0xe00f, BIOS settings: hdc:pio, hdd:pio
hda: Maxtor 6Y160P0, ATA DISK drive
hdb: ATAPI CD-ROM MAX 58X, ATAPI CD/DVD-ROM drive
blk: queue c03c5920, I/O limit 4095Mb (mask 0xffffffff)
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: attached ide-disk driver.
hda: host protected area => 1
hda: 320173056 sectors (163929 MB) w/7936KiB Cache, CHS=19929/255/63, UDMA(133)
ide-floppy driver 0.99.newide
Partition check:
 hda: hda1 hda2 hda3
ide-floppy driver 0.99.newide
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 4096 buckets, 32Kbytes
TCP: Hash tables configured (established 32768 bind 65536)
Linux IP multicast router 0.06 plus PIM-SM
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 270k freed
VFS: Mounted root (ext2 filesystem).
SCSI subsystem driver Revision: 1.00
PCI: Found IRQ 11 for device 00:0a.0
PCI: Sharing IRQ 11 with 00:10.1
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
        <Adaptec 29160B Ultra160 SCSI adapter>
        aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

blk: queue c2552e14, I/O limit 4095Mb (mask 0xffffffff)
  Vendor: SEAGATE   Model: DAT    06240-XXX  Rev: 8210
  Type:   Sequential-Access                  ANSI SCSI revision: 03
blk: queue dfd68014, I/O limit 4095Mb (mask 0xffffffff)
Journalled Block Device driver loaded
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: ide0(3,3): orphan cleanup on readonly fs
ext3_orphan_cleanup: deleting unreferenced inode 2932911
EXT3-fs: ide0(3,3): 1 orphan inode deleted
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
Freeing unused kernel memory: 132k freed
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
PCI: Found IRQ 11 for device 00:10.3
ehci-hcd 00:10.3: VIA Technologies, Inc. USB 2.0
ehci-hcd 00:10.3: irq 11, pci mem e088e000
usb.c: new USB bus registered, assigned bus number 1
PCI: 00:10.3 PCI cache line size set incorrectly (32 bytes) by BIOS/FW.
PCI: 00:10.3 PCI cache line size corrected to 64.
ehci-hcd 00:10.3: USB 2.0 enabled, EHCI 1.00, driver 2003-Jan-22
hub.c: USB hub found
hub.c: 4 ports detected
usb.c: registered new driver hiddev
usb.c: registered new driver hid
hid-core.c: v1.8.1 Andreas Gal, Vojtech Pavlik <vojtech@suse.cz>
hid-core.c: USB HID support drivers
mice: PS/2 mouse device common for all mice
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,3), internal journal
Adding Swap: 1004052k swap-space (priority -1)
st: Version 20030406, bufsize 32768, max init. bufs 4, s/g segs 16
Attached scsi tape st0 at scsi0, channel 0, id 6, lun 0
parport0: PC-style at 0x378 [PCSPP,TRISTATE]
ip_conntrack version 2.1 (4095 buckets, 32760 max) - 292 bytes per conntrack
via-rhine.c:v1.10-LK1.1.17  March-1-2003  Written by Donald Becker
  http://www.scyld.com/network/via-rhine.html
PCI: Found IRQ 10 for device 00:12.0
PCI: Sharing IRQ 10 with 00:10.0
divert: allocating divert_blk for eth0
eth0: VIA VT6102 Rhine-II at 0xe800, 00:30:1b:21:d3:ee, IRQ 10.
eth0: MII PHY found at address 1, status 0x786d advertising 05e1 Link 41e1.
eth0: Setting full-duplex based on MII #1 link partner capability of 41e1.
tg3.c:v1.5 (March 21, 2003)
PCI: Found IRQ 5 for device 00:0b.0
PCI: Sharing IRQ 5 with 00:11.5
divert: allocating divert_blk for eth1
eth1: Tigon3 [partno(AC91002A1) rev 0105 PHY(5701)] (PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet 00:40:f4:47:0a:fb
tg3: eth1: Link is up at 1000 Mbps, full duplex.
tg3: eth1: Flow control is on for TX and on for RX.
eth0: Promiscuous mode enabled.
device eth0 entered promiscuous mode
smbfs: Unrecognized mount option noauto
smbfs: Unrecognized mount option noauto
smbfs: Unrecognized mount option noauto
Via 686a/8233/8235 audio driver 1.9.1-ac3rh2
PCI: Found IRQ 5 for device 00:11.5
PCI: Sharing IRQ 5 with 00:0b.0
via82cxxx: Six channel audio available
PCI: Setting latency timer of device 00:11.5 to 64
ac97_codec: AC97 Audio codec, id: ALG32 (ALC650)
via82cxxx: board #1 at 0xE400, IRQ 5
parport0: PC-style at 0x378 [PCSPP,TRISTATE]
lp0: using parport0 (polling).
lp0: console ready

Comment 2 Doug Oakes 2003-12-12 15:30:19 EST
A better Oops example:
Dec  9 19:28:18 vader kernel: Code:  Bad EIP value.
Dec  9 19:28:18 vader kernel:  <1>Unable to handle kernel NULL pointer dereference at virtual address 00000000
Dec  9 19:28:18 vader kernel:  printing eip:
Dec  9 19:28:18 vader kernel: 00000000
Dec  9 19:28:18 vader kernel: *pde = 00000000
Dec  9 19:28:18 vader kernel: Oops: 0000
Dec  9 19:28:18 vader kernel: parport_pc lp parport via82cxxx_audio uart401 ac97_codec sound soundcore smbfs tg3 via-rhine mii ipchains st ehci-hcd usbcore aic7xxx sd_mod scsi_mod
Dec  9 19:28:18 vader kernel: CPU:    0
Dec  9 19:28:18 vader kernel: EIP:    0060:[<00000000>]    Not tainted
Dec  9 19:28:18 vader kernel: EFLAGS: 00010282
Dec  9 19:28:18 vader kernel:
Dec  9 19:28:18 vader kernel: EIP is at [unresolved] (2.4.20-24.9)
Dec  9 19:28:18 vader kernel: eax: c015ba10   ebx: c5fdb900   ecx: 00000000   edx: c5fdb910
Dec  9 19:28:18 vader kernel: esi: c030b120   edi: c25a0800   ebp: d7c5c7c0   esp: c0f57d14
Dec  9 19:28:18 vader kernel: ds: 0068   es: 0068   ss: 0068
Dec  9 19:28:18 vader kernel: Process top (pid: 2244, stackpage=c0f57000)
Dec  9 19:28:18 vader kernel: Stack: c015b9e3 c5fdb900 c03099c0 c1038030 d7c5c7c0 c5fdb900 c5fdb900 c0158e60
Dec  9 19:28:18 vader kernel:        c5fdb900 00000001 d7402bc0 c25a2340 c0145cd5 d7c5c7c0 bfffc000 d7402bc0
Dec  9 19:28:18 vader kernel:        d196f740 00000000 00000001 c014436d d7402bc0 d196f740 0000003f 00000003
Dec  9 19:28:18 vader kernel: Call Trace:   [<c015b9e3>] iput [kernel] 0x273 (0xc0f57d14))
Dec  9 19:28:18 vader kernel: [<c0158e60>] dput [kernel] 0xa0 (0xc0f57d30))
Dec  9 19:28:18 vader kernel: [<c0145cd5>] fput [kernel] 0xd5 (0xc0f57d44))
Dec  9 19:28:18 vader kernel: [<c014436d>] filp_close [kernel] 0x4d (0xc0f57d60))
Dec  9 19:28:18 vader kernel: [<c011ea7d>] put_files_struct [kernel] 0x5d (0xc0f57d78))
Dec  9 19:28:18 vader kernel: [<c011f180>] do_exit [kernel] 0x110 (0xc0f57d94))
Dec  9 19:28:18 vader kernel: [<c0109a80>] do_divide_error [kernel] 0x0 (0xc0f57db0))
Dec  9 19:28:18 vader kernel: [<c0117014>] do_page_fault [kernel] 0x2b4 (0xc0f57dc4))
Dec  9 19:28:18 vader kernel: [<c021c492>] tcp_send_ack [kernel] 0x82 (0xc0f57de4))
Dec  9 19:28:18 vader kernel: [<c021d423>] tcp_delack_timer [kernel] 0x133 (0xc0f57dfc))
Dec  9 19:28:18 vader kernel: [<c021d2f0>] tcp_delack_timer [kernel] 0x0 (0xc0f57e0c))
Dec  9 19:28:18 vader kernel: [<c01259be>] run_timer_list [kernel] 0xee (0xc0f57e14))
Dec  9 19:28:18 vader kernel: [<c0121342>] bh_action [kernel] 0x22 (0xc0f57e34))
Dec  9 19:28:18 vader kernel: [<c0121256>] tasklet_hi_action [kernel] 0x46 (0xc0f57e38))
Dec  9 19:28:18 vader kernel: [<c010aa8f>] do_IRQ [kernel] 0xaf (0xc0f57e5c))
Dec  9 19:28:18 vader kernel: [<c0116d60>] do_page_fault [kernel] 0x0 (0xc0f57e6c))
Dec  9 19:28:18 vader kernel: [<c0109490>] error_code [kernel] 0x34 (0xc0f57e74))
Dec  9 19:28:18 vader kernel: [<c015ba10>] force_delete [kernel] 0x0 (0xc0f57e98))
Dec  9 19:28:18 vader kernel: [<c015b9e3>] iput [kernel] 0x273 (0xc0f57eb4))
Dec  9 19:28:18 vader kernel: [<c0158e93>] dput [kernel] 0xd3 (0xc0f57ec0))
Dec  9 19:28:18 vader kernel: [<c0158e60>] dput [kernel] 0xa0 (0xc0f57ed0))
Dec  9 19:28:18 vader kernel: [<c014fd35>] path_release [kernel] 0x15 (0xc0f57ee4))
Dec  9 19:28:18 vader kernel: [<c01502bf>] link_path_walk [kernel] 0x2cf (0xc0f57ef0))
Dec  9 19:28:18 vader kernel: [<c0150889>] path_lookup [kernel] 0x39 (0xc0f57f30))
Dec  9 19:28:18 vader kernel: [<c0150cde>] open_namei [kernel] 0x7e (0xc0f57f40))
Dec  9 19:28:18 vader kernel: [<c0143ee9>] filp_open [kernel] 0x49 (0xc0f57f70))
Dec  9 19:28:18 vader kernel: [<c01442a3>] sys_open [kernel] 0x53 (0xc0f57fa8))
Dec  9 19:28:18 vader kernel: [<c010939f>] system_call [kernel] 0x33 (0xc0f57fc0))
Dec  9 19:28:18 vader kernel:
Dec  9 19:28:18 vader kernel:
Dec  9 19:28:18 vader kernel: Code:  Bad EIP value.
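
With two Oops reports in hand, it can help to line up their headline fields (faulting EIP, process, kernel build) before reading the traces in detail. The following is a rough sketch of that step, assuming the 2.4-style Oops layout shown in these comments. `summarize_oops` and the sample constants are hypothetical names introduced here, and the regular expressions cover only the fields that appear in this report.

```python
import re


def summarize_oops(log_text):
    """Extract the EIP, process name, pid and kernel version from an Oops."""
    # Collapse whitespace so wrapped log lines do not break the patterns
    joined = " ".join(log_text.split())
    eip = re.search(r"EIP:\s+[0-9a-f]{4}:\[<([0-9a-f]{8})>\]", joined)
    proc = re.search(r"Process (\S+) \(pid: (\d+),", joined)
    kernel = re.search(r"EIP is at \S+ \(([^)]+)\)", joined)
    return {
        "eip": eip.group(1) if eip else None,
        "process": proc.group(1) if proc else None,
        "pid": int(proc.group(2)) if proc else None,
        "kernel": kernel.group(1) if kernel else None,
    }


# Header lines copied from the two Oops messages in this report
OOPS_DEC_12 = """\
Dec 12 11:49:28 vader kernel: EIP:    0060:[<00000000>]    Not tainted
Dec 12 11:49:28 vader kernel: EIP is at [unresolved] (2.4.20-24.9)
Dec 12 11:49:28 vader kernel: Process sendmail (pid: 1221,
stackpage=d788b000)
"""

OOPS_DEC_9 = """\
Dec  9 19:28:18 vader kernel: EIP:    0060:[<00000000>]    Not tainted
Dec  9 19:28:18 vader kernel: EIP is at [unresolved] (2.4.20-24.9)
Dec  9 19:28:18 vader kernel: Process top (pid: 2244, stackpage=c0f57000)
"""

if __name__ == "__main__":
    for text in (OOPS_DEC_12, OOPS_DEC_9):
        print(summarize_oops(text))
```

Any field the patterns cannot find is returned as `None` rather than guessed.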
Comment 3 Doug Oakes 2003-12-26 22:56:30 EST
The new kernel (2.4.20-27.9) seems to have solved the problem.
