Bug 53257 - System hardlocks when using SMP kernel when reading from CD-ROM
Summary: System hardlocks when using SMP kernel when reading from CD-ROM
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.1
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brock Organ
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2001-09-05 17:03 UTC by David Juran
Modified: 2007-04-18 16:36 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2003-12-01 20:44:15 UTC
Embargoed:


Attachments
My configuration for 2.4.9 (14.45 KB, patch)
2001-09-17 18:58 UTC, David Juran

Description David Juran 2001-09-05 17:03:26 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.77 [en] (X11; U; Linux 2.4.3-12smp i686)

Description of problem:
My system is a dual PIII 1GHz with the Asus CUV4X-D mainboard. When I try
to read a CD using cdparanoia from my SAMSUNG CD-ROM SC-148F drive, the
system hardlocks after a while and must be rebooted the hard way using the
power switch )-:


Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. modprobe ide-scsi
2. cdparanoia -d /dev/sg2 1-
3. Do some other stuff on the computer and brace yourself for the crash )-:
	

Actual Results:  Suddenly the computer locks up )-:

Additional info:

Do note that although this is an IDE CD-ROM, I use it through the
SCSI interface using the ide-scsi module.
Also note that if I run a uniprocessor kernel, everything works fine
(though slowly).
I _have_ tried giving the boot parameters "noapic" and "ide=nodma", but the
system still crashes.
Also note that this crash might be hard to reproduce, since you might have
to read a couple of CDs before it happens. I've noticed that if you do
other stuff on the computer (like using netscape) it comes quicker. I'm not
sure how to capture relevant debugging info since my system is completely
locked after it has crashed, but if you have any ideas I'll be happy to be
of service...

Some more things that might be of interest...
[root@as18-1-3 hdb]# lspci -v
00:00.0 Host bridge: VIA Technologies, Inc. VT82C691 [Apollo PRO] (rev c4)
        Subsystem: Asustek Computer, Inc.: Unknown device 8038
        Flags: bus master, medium devsel, latency 0
        Memory at fc000000 (32-bit, prefetchable) [size=32M]
        Capabilities: [a0] AGP version 2.0
        Capabilities: [c0] Power Management version 2

00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo
MVP3/Pro133x AGP] (prog-if 00 [Normal decode])
        Flags: bus master, 66Mhz, medium devsel, latency 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        Memory behind bridge: f7000000-f86fffff
        Prefetchable memory behind bridge: f9f00000-fbffffff
        Capabilities: [80] Power Management version 2

00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South]
(rev 40)
        Subsystem: Asustek Computer, Inc.: Unknown device 8038
        Flags: bus master, stepping, medium devsel, latency 0
        Capabilities: [c0] Power Management version 2

00:04.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
(prog-if 8a [Master SecP PriP])
        Flags: bus master, stepping, medium devsel, latency 32
        I/O ports at d800 [size=16]
        Capabilities: [c0] Power Management version 2

00:04.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI]
(rev 40)
        Flags: medium devsel
        Capabilities: [68] Power Management version 2

00:09.0 Multimedia video controller: Brooktree Corporation Bt878 (rev 11)
        Subsystem: Hauppauge computer works Inc. WinTV/GO
        Flags: medium devsel, IRQ 3
        Memory at f9000000 (32-bit, prefetchable) [size=4K]
        Capabilities: [44] Vital Product Data
        Capabilities: [4c] Power Management version 2

00:09.1 Multimedia controller: Brooktree Corporation Bt878 (rev 11)
        Subsystem: Hauppauge computer works Inc. WinTV/GO
        Flags: bus master, medium devsel, latency 32, IRQ 3
        Memory at f8800000 (32-bit, prefetchable) [size=4K]
        Capabilities: [44] Vital Product Data
        Capabilities: [4c] Power Management version 2

00:0a.0 SCSI storage controller: Adaptec AIC-7881U
        Flags: bus master, medium devsel, latency 32, IRQ 4
        I/O ports at b800 [size=256]
        Memory at f6800000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at <unassigned> [disabled] [size=64K]

00:0b.0 Ethernet controller: 3Com Corporation 3c905 100BaseTX [Boomerang]
        Flags: bus master, medium devsel, latency 32, IRQ 10
        I/O ports at b400 [size=64]
        Expansion ROM at <unassigned> [disabled] [size=64K]

00:0d.0 Multimedia audio controller: Creative Labs SB Live! EMU10000 (rev
07)
        Subsystem: Creative Labs: Unknown device 8061
        Flags: bus master, medium devsel, latency 32, IRQ 3
        I/O ports at b000 [size=32]
        Capabilities: [dc] Power Management version 1

00:0d.1 Input device controller: Creative Labs SB Live! (rev 07)
        Subsystem: Creative Labs Gameport Joystick
        Flags: bus master, medium devsel, latency 32
        I/O ports at a800 [size=8]
        Capabilities: [dc] Power Management version 1

01:00.0 VGA compatible controller: nVidia Corporation Riva TnT2 [NV5] (rev
15) (prog-if 00 [VGA])
        Subsystem: Diamond Multimedia Systems Viper V770 Ultra
        Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 11
        Memory at f7000000 (32-bit, non-prefetchable) [size=16M]
        Memory at fa000000 (32-bit, prefetchable) [size=32M]
        Expansion ROM at f9ff0000 [disabled] [size=64K]
        Capabilities: [60] Power Management version 1
        Capabilities: [44] AGP version 2.0

Comment 1 Arjan van de Ven 2001-09-05 17:13:59 UTC
"CUV4X-D" is a known problematic case. Several people have reported that the
latest bios solves a lot of problems.
Could you also check if DMA is _really_ off when using ide-scsi ?
(eg by checking the last line in the /proc/ide/hdX/settings file, X is the ide
letter of the disk)


Comment 2 David Juran 2001-09-05 18:48:39 UTC
Yes, it seems to be off...

/home/david>lsmod
Module                  Size  Used by
ide-scsi                7840   0
autofs                  9504   1 (autoclean)
3c59x                  25088   1 (autoclean)
ipchains               32000   0 (unused)
raid0                   3088   2
aic7xxx               113840   5
sd_mod                 11040   5
scsi_mod               88864   3 [ide-scsi aic7xxx sd_mod]

[root@as18-1-3 hdb]# cat /proc/ide/hdb/settings 
name                    value           min             max             mode
----                    -----           ---             ---             ----
bios_cyl                0               0               1023            rw
bios_head               0               0               255             rw
bios_sect               0               0               63              rw
current_speed           34              0               69              rw
ide_scsi                0               0               1               rw
init_speed              12              0               69              rw
io_32bit                1               0               3               rw
keepsettings            0               0               1               rw
log                     0               0               1               rw
nice1                   1               0               1               rw
number                  1               0               3               rw
pio_mode                write-only      0               255             w
slow                    0               0               1               rw
transform               1               0               3               rw
unmaskirq               1               0               1               rw
using_dma               0               0               1               rw
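
For anyone who wants to check this on several drives without reading the table by eye, here is a minimal sketch in Python. It assumes only the five-column layout shown in the output above; the function name parse_ide_settings and the shortened SAMPLE text are illustrative, not part of any existing tool.

```python
def parse_ide_settings(text):
    """Parse the table printed by /proc/ide/hdX/settings into a dict."""
    settings = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row, the dashed separator and malformed lines.
        if len(fields) != 5 or fields[0] in ("name", "----"):
            continue
        name, value = fields[0], fields[1]
        # pio_mode reports "write-only" instead of a number, so keep it as text.
        settings[name] = int(value) if value.isdigit() else value
    return settings


# A shortened copy of the output pasted above (assumed layout).
SAMPLE = """\
name                    value           min             max             mode
----                    -----           ---             ---             ----
current_speed           34              0               69              rw
pio_mode                write-only      0               255             w
using_dma               0               0               1               rw
"""

settings = parse_ide_settings(SAMPLE)
print("DMA enabled" if settings["using_dma"] else "DMA disabled")
```

On a live system the text would come from reading /proc/ide/hdb/settings (or whichever hdX applies) instead of SAMPLE.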

Also, I guess I should have stated from the start...
/home/david>cat /proc/version
Linux version 2.4.3-12smp (root.redhat.com) (gcc version 2.96
20000731 (Red Hat Linux 7.1 2.96-85)) #1 SMP Fri Jun 8 14:38:50 EDT 2001

You say that the latest BIOS solves problems. Which is the latest BIOS? I have
the 1010 BIOS, but on
ftp://ftp.asuscom.de/pub/ASUSCOM/BIOS/Socket_370/VIA_Chipset/Apollo_Pro_133A/CUV4X-D/
there is a 1014 which seems to have beta status. Have you heard of anybody who
has had success with the beta BIOS? (I'm feeling a bit reluctant to just try a
beta version of a BIOS without even having a changelog to go by...)

OT:
Does anybody know of any other dual-processor mainboard besides the Asus CUV4X-D
(in about the same price range) which is better supported under Linux?



Comment 3 Arjan van de Ven 2001-09-05 18:51:59 UTC
I'm quite sure that the guy wasn't using a beta bios.

To rule out a kernel bug; could you try the 2.4.7 kernel at
http://people.redhat.com/arjanv/testkernels/ ?
(you need to use --nodeps as it's the kernel used for the final beta stage of
the next Red Hat Linux kernel, and it requires mkinitrd for grub)

Comment 4 David Juran 2001-09-06 20:19:47 UTC
OK, I've tried the 2.4.7-2.19smp kernel (with the ide=nodma boot parameter) and
it freezes just as before )-:
Also, while booting up both this and the old SMP kernels I get this message:

ENABLING IO-APIC IRQs
...changing IO-APIC physical APIC ID to 2 ... ok.
init IO_APIC IRQs
 IO-APIC (apicid-pin) 2-0, 2-13, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23
not connected.
..TIMER: vector=0x31 pin1=2 pin2=0
number of MP IRQ sources: 15.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................

IO APIC #2......
.... register #00: 02000000
.......    : physical APIC id: 02
.... register #01: 00178011
.......     : max redirection entries: 0017
.......     : IO APIC version: 0011
 WARNING: unexpected IO-APIC, please mail
          to linux-smp.org
.... register #02: 00000000
.......     : arbitration: 00

Could that have anything to do with it? Any clues of how to get some more
debugging data?
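
As a reading aid for the register dump above, here is a small Python sketch that decodes the two raw values the same way the log annotates them. The bit positions are taken from the usual I/O APIC register layout (ID in bits 24-27 of register #00; version in bits 0-7 and maximum redirection entry in bits 16-23 of register #01); the function names are made up for illustration. The sketch also reports bit 15 of register #01, which is set in this dump. Whether that bit is what triggers the "unexpected IO-APIC" warning is only a guess and is not confirmed anywhere in this report.

```python
def decode_ioapic_reg00(value):
    """Return the physical APIC ID stored in bits 24-27 of register #00."""
    return (value >> 24) & 0x0F


def decode_ioapic_reg01(value):
    """Split register #01 into the fields shown in the boot log."""
    return {
        "version": value & 0xFF,                        # bits 0-7
        "bit15": (value >> 15) & 0x1,                   # not annotated in the log
        "max_redirection_entry": (value >> 16) & 0xFF,  # bits 16-23
    }


# Raw values copied from the boot messages above.
reg00 = 0x02000000
reg01 = 0x00178011

apic_id = decode_ioapic_reg00(reg00)
fields = decode_ioapic_reg01(reg01)

print("physical APIC id: %02X" % apic_id)
print("IO APIC version: %04X" % fields["version"])
print("max redirection entries: %04X" % fields["max_redirection_entry"])
# The highest entry index plus one gives the number of redirection entries.
print("redirection entries available: %d" % (fields["max_redirection_entry"] + 1))
print("bit 15 set: %d" % fields["bit15"])
```

The decoded values agree with the log: APIC ID 02, version 0011, and a maximum redirection entry of 0017, i.e. the 24 registers the kernel reports for IO-APIC #2.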

Comment 5 Need Real Name 2001-09-13 09:25:19 UTC
This sounds like a problem I'm having

Acer Altos 9100B, dual Pentium II 400MHz with 256MB RAM.
All hard disks and the CD-ROM are attached to a DPT SmartRaid IV PM333UW RAID
controller.

Using noapic to stop 2x clock speed problem

After booting up, if I use GnoRPM to install from a CD, the system either stops
responding or I get read errors, but only in SMP; single processor is fine.

By "stop" I mean you can switch screens but can't log in or start new processes
(top, if started before, will continue to run), and can't shut the server down.

Errors reported in /var/log/messages (only if system doesn't stop)

Sep 13 10:14:35 linux2 gnome-name-server[827]: starting
Sep 13 10:14:35 linux2 gnome-name-server[827]: name server starting
Sep 13 10:14:35 linux2 kernel: Attached scsi CD-ROM sr0 at scsi0, channel 0, id 
5, lun 0
Sep 13 10:14:35 linux2 kernel: sr0: scsi3-mmc drive: 14x/32x cd/rw xa/form2 
cdda tray
Sep 13 10:14:35 linux2 kernel: Uniform CD-ROM driver Revision: 3.12
Sep 13 10:14:35 linux2 insmod: Note: /etc/modules.conf is more recent 
than /lib/modules/2.4.3-12smp/modules.dep
Sep 13 10:15:17 linux2 insmod: Note: /etc/modules.conf is more recent 
than /lib/modules/2.4.3-12smp/modules.dep
Sep 13 10:16:43 linux2 kernel: EATA0: ihdlr, mbox 13, err 0x6:0, target 0.5:0, 
pid 0, reg 0x51, count 8973.
Sep 13 10:16:43 linux2 kernel: EATA0: abort, target 0.0:0, pid 0 inactive.
Sep 13 10:16:43 linux2 last message repeated 2 times
Sep 13 10:16:43 linux2 kernel: EATA0: abort, target 0.5:0, pid 0 inactive.
Sep 13 10:16:43 linux2 kernel: EATA0: abort, target 0.5:0, pid 0 inactive.
Sep 13 10:16:43 linux2 kernel: Device 0b:00 not ready.
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 235968
Sep 13 10:16:43 linux2 kernel: Device 0b:00 not ready.
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 236220
Sep 13 10:16:43 linux2 kernel: Device 0b:00 not ready.
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 236472
Sep 13 10:16:43 linux2 kernel: Device 0b:00 not ready.
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 236724
Sep 13 10:16:43 linux2 kernel: Device 0b:00 not ready.
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 236976
Sep 13 10:16:43 linux2 kernel: Device 0b:00 not ready.
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 216
Sep 13 10:16:43 linux2 kernel: Device 0b:00 not ready.
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 237228
Sep 13 10:16:43 linux2 kernel: Device 0b:00 not ready.
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 249212
Sep 13 10:16:43 linux2 kernel: Device 0b:00 not ready.
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 249480
Sep 13 10:16:43 linux2 kernel: Device 0b:00 not ready.
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 251260
Sep 13 10:16:43 linux2 kernel: Device 0b:00 not ready.
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 251736
Sep 13 10:16:43 linux2 kernel: Device 0b:00 not ready.
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 255856
Sep 13 10:16:43 linux2 kernel: Device 0b:00 not ready.
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 264084
Sep 13 10:16:43 linux2 kernel: Device 0b:00 not ready.
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 265376
Sep 13 10:16:43 linux2 kernel: Device 0b:00 not ready.
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 272820
Sep 13 10:16:43 linux2 kernel: Device 0b:00 not ready.
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 279784
Sep 13 10:16:43 linux2 kernel: Device 0b:00 not ready.
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 281860
Sep 13 10:16:43 linux2 kernel: Device 0b:00 not ready.
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 282600
Sep 13 10:16:43 linux2 kernel: Device 0b:00 not ready.
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 216
Sep 13 10:16:53 linux2 kernel: VFS: busy inodes on changed media.
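
To see at a glance which sectors fail and whether any are retried, the I/O error lines in a log like the one above can be tallied with a short Python sketch. The regular expression is written against the exact message format shown here ("I/O error: dev MM:mm, sector N"); the function name summarize_io_errors and the three-line SAMPLE_LOG excerpt are illustrative assumptions.

```python
import re
from collections import Counter

# Matches kernel messages such as "I/O error: dev 0b:00, sector 235968".
IO_ERROR = re.compile(r"I/O error: dev ([0-9a-f]{2}:[0-9a-f]{2}), sector (\d+)")


def summarize_io_errors(log_text):
    """Return {device: Counter({sector: occurrences})} for all I/O errors."""
    per_device = {}
    for match in IO_ERROR.finditer(log_text):
        device, sector = match.group(1), int(match.group(2))
        per_device.setdefault(device, Counter())[sector] += 1
    return per_device


# Three lines excerpted from the log above.
SAMPLE_LOG = """\
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 235968
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 216
Sep 13 10:16:43 linux2 kernel:  I/O error: dev 0b:00, sector 216
"""

summary = summarize_io_errors(SAMPLE_LOG)
for device, sectors in summary.items():
    repeated = sorted(s for s, n in sectors.items() if n > 1)
    print(device, "errors:", sum(sectors.values()), "repeated sectors:", repeated)
```

Device 0b:00 is block major 11, minor 0, which corresponds to the first SCSI CD-ROM (sr0) named earlier in the same log.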


Comment 6 David Juran 2001-09-14 07:29:17 UTC
No, it doesn't (-:
When I say my system locks, I mean it _really_ locks... It sounds to me like your
system is just busy accessing the drive. Anyway, for me accessing the CD usually
works. That is, until something (a race condition?) occurs and everything hangs.
What is the "2x clock speed problem" you mention?

Comment 7 Need Real Name 2001-09-17 12:52:49 UTC
True, but same end result: I have to switch the server off to reboot.
I suspect the read errors are due to the inability to start new processes.
There are definitely no problems in single-processor mode.

When we first installed, we noticed the clock was running at twice the speed it
should; tech support suggested using noapic, which cured that.

Comment 8 David Juran 2001-09-17 18:58:10 UTC
Created attachment 31898 [details]
My configuration for 2.4.9

Comment 9 David Juran 2001-09-17 19:00:15 UTC
Still not the same problem, and most likely different causes... Stay off my bug,
get your own. There are plenty of them out there (-:
Well, serious again... the 2.4.9 kernel seems to have at least partially
remedied the problem. However, the performance is horrible. I tried to write a CD
using cdrdao (I have the image on my IDE drive) in my IDE CD-writer, while at the
same time reading the TOC of another CD in my IDE CD-ROM using cdrdao read-toc, but
the write buffer kept on shrinking until I got nervous and had to turn the
reading off...
I'm attaching my kernel configuration.

Comment 10 David Juran 2003-12-01 20:44:15 UTC
Since the CD-rom referred to in this bug is dead I can no longer
reproduce it, so I'm closing this bug

