Bug 48763 - Kernel crash with no apparent reason
Status: CLOSED CURRENTRELEASE
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.1
Hardware: i686 Linux
Priority: medium, Severity: high
Assigned To: Arjan van de Ven
QA Contact: Brock Organ
Reported: 2001-07-11 10:15 EDT by Need Real Name
Modified: 2007-04-18 12:34 EDT (History)
Doc Type: Bug Fix
Last Closed: 2003-06-06 10:34:00 EDT
Description Need Real Name 2001-07-11 10:15:37 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.77 [en] (X11; U; Linux 2.4.2-2 i686)

Description of problem:
Random freezes (rather than clean crashes) of the machine, ONLY with kernel
2.4.3-12 (all RH updates installed up to today).
NO crashes with 2.4.2-2!

Crashes happen mostly when X is running together with other programs such as
Netscape, Mozilla, StarOffice, Emacs... Network activity, and DNS resolution in
particular, also seems to help trigger a crash. But I also got some crashes in
runlevel 3 during password authentication, so no login at all.

How reproducible:
Sometimes

Steps to Reproduce:
1. wait a little bit and do things....

Actual Results:  first some processes get stuck in uninterruptible sleep
(state D), then some X windows freeze, most often the CPU/MEM usage applet in
GNOME crashes, and then ... lockup. The only way out is RESET. Nothing in the
logs except for lines like

Jul 11 13:29:35 berenice sendmail[905]: rejecting connections on daemon MTA: load average: 14

but they do not always appear (the load average goes way up, to 20 or 30).
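
For anyone trying to capture which processes are stuck before the lockup, the
following is a minimal sketch (not something used in this report) that lists
processes in state D by reading /proc/<pid>/stat. It assumes a Linux procfs and
a current Python 3; the function names parse_state and processes_in_d_state are
made up for illustration.

```python
import os


def parse_state(stat_line):
    """Return (comm, state) from the contents of one /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    left, right = stat_line.rsplit(")", 1)
    comm = left.split("(", 1)[1]
    state = right.split()[0]
    return comm, state


def processes_in_d_state(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_state(f.read())
        except (OSError, IndexError, ValueError):
            continue  # process exited (or line was malformed) mid-scan
        if state == "D":
            found.append((int(entry), comm))
    return found


if __name__ == "__main__":
    for pid, comm in processes_in_d_state():
        print(pid, comm)
```

Run from a cron job or a loop that appends to a file on disk, output like this
could show which processes were blocked shortly before a freeze, assuming the
machine stays alive long enough to write it out.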

Expected Results:  work normally

Additional info:

The machine has 2 NICs:
eth0: 3Com PCI 3c905C Tornado at 0xdc00,  00:01:...., IRQ 10
eth1: D-Link DFE-538TX (RealTek RTL8139) at 0xc88a5000, 00:50:...., IRQ 11

and 2 disks in software RAID1 
 hda: FUJITSU MPF3204AT, ATA DISK drive
 hdc: _NEC NR-7500A, ATAPI CD/DVD-ROM drive
 hdd: FUJITSU MPF3204AT, ATA DISK drive
 
other info:
CPU: Intel Pentium III (Coppermine) 
 Detected 799.549 MHz processor
 Memory: 126216k/131072k available 

I am back to running kernel 2.4.2-2 and everything seems to be OK, as it
has always been.

Thanks for any help, Andrea.
Comment 1 Need Real Name 2001-08-26 14:50:50 EDT
One and a half months later I have not received a single comment on this one.
I have no time right now to dig into the kernel source or to try kernel 2.4.9
or similar. Kernel 2.4.2-2 is working fine for me at the moment, but I will
need to upgrade soon...

My comment is that, from a few tests, the problem seems to have to do with
swapping. Once the box starts swapping a little, something goes wrong and it
cannot recover. It can actually keep running for a short while, but then all of
a sudden it freezes. It could be an already known bug of this kernel; no time
to check right now. Sorry folks.

Andrea (RHCE, Cisco CCDA etc.)
Comment 2 Arjan van de Ven 2001-08-26 14:57:18 EDT
This is not known behavior. We do know that this kernel needs more swap space
than RAM; the release notes and the installer mention this, and the installer
even makes a swap file to fix it.

Kernel 2.4.7-2 (in rawhide) has the swap>ram problem fixed, and is otherwise
very stable. (It has survived 2 public beta tests without serious crasher bugs
in core code.)
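
As a rough illustration of the swap>ram condition mentioned above, here is a
small sketch that compares SwapTotal with MemTotal from /proc/meminfo. It
assumes the "Key: value kB" line format of /proc/meminfo; parse_meminfo and
swap_exceeds_ram are hypothetical helper names, not part of any Red Hat tool.

```python
def parse_meminfo(text):
    """Map 'Key: <number> kB' lines of /proc/meminfo to integers (in kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip header lines or anything that is not "Key: <number> ...".
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_exceeds_ram(meminfo):
    """True when total swap is larger than total RAM."""
    return meminfo["SwapTotal"] > meminfo["MemTotal"]


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print("swap > RAM:", swap_exceeds_ram(info))
```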
Comment 3 Need Real Name 2001-08-27 06:02:35 EDT
No problem with swap: 128MB RAM and 512MB swap (waiting for an upgrade to
256MB RAM).

I'll try the new kernel as soon as I have time (is it safe to install the
2.4.7-2 rpms from rawhide directly on a 7.1 installation? The machine runs some
important company jobs and cannot be down for more than a couple of hours).

Anyway, in my opinion my tests show that it is something related to swapping.
What I did was run 50 short processes at the same time. If the machine started
with empty swap and 50% of RAM used, RAM usage went to 90% and the load average
to 5 or 6 (as reported by xosview, for example), with no big slowdown in usual
activities (just a few seconds more to open an ssh connection while the 50
processes were running). Same behaviour with kernel 2.4.2-2 and 2.4.3-12.
The 50 processes finished in about 1 minute.

Starting instead with full RAM (runlevel 5 with Netscape and something else
open is enough) and approx 30MB of swap used, but nothing else running: with
kernel 2.4.2-2 the machine slows down quite a bit (at moments the mouse and
keyboard are dead), the disks run madly, swap usage increases by 50MB (total
used swap approx 80MB of 512MB) and the load average goes up to approx 50, but
after 6 or 7 minutes the processes are finished and the machine (swap included)
goes back to its previous state. With the same test on kernel 2.4.3-12 I get a
freeze as soon as the processes kick in and the swap usage and load average
start to grow.

Decreasing the number of processes to 15, kernel 2.4.2-2 runs with no problem.
Kernel 2.4.3-12 slows down a lot; sometimes it freezes, other times it manages
to get to the end of the jobs, but it is quite unstable and can freeze at any
moment after that. In my tests it has survived at most 10 minutes, no more.

My tests have been a little bit brutal but not too far from the actual use of
the machine, in the sense that I have 50 processes all starting at the same
time, and soon I will have 100.

Last comment: I have 2 disks in software RAID1, swap included. Anyway, the
machine ran the same programs etc. with Red Hat 6.2 and 7.0 and never had a
problem, crash or anything.
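
The kind of burst test described above could be sketched roughly as follows.
This is not the script actually used for these tests: the function name
run_burst, the sampling interval and the placeholder workload (each child
allocates about 20 MB and exits) are all assumptions made for illustration, and
it is written for a current Python 3 on a Unix-like system.

```python
import os
import subprocess
import sys
import time


def run_burst(n_procs, cmd, sample_every=1.0):
    """Start n_procs copies of cmd at once and sample the 1-minute load
    average until all of them have exited.

    Returns (elapsed_seconds, load_samples, exit_codes).
    """
    start = time.time()
    procs = [subprocess.Popen(cmd) for _ in range(n_procs)]
    samples = []
    # poll() returns None while a child is still running.
    while any(p.poll() is None for p in procs):
        samples.append(os.getloadavg()[0])
        time.sleep(sample_every)
    return time.time() - start, samples, [p.returncode for p in procs]


if __name__ == "__main__":
    # Placeholder workload (an assumption): allocate ~20 MB, then exit.
    job = [sys.executable, "-c", "x = bytearray(20 * 1024 * 1024)"]
    elapsed, loads, codes = run_burst(50, job)
    print("finished in %.1f s, peak load %.2f"
          % (elapsed, max(loads, default=0.0)))
```

Repeating the run with 15 and 50 processes, once from an idle machine and once
with RAM already full, would mirror the comparison made in this comment.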

Andrea
Comment 4 Need Real Name 2001-10-22 06:44:58 EDT
Just installed 2.4.9-6-i686, no problems at all. 
The problem is ONLY with kernel 2.4.3-12. 
Thus, if you agree, we can close this totally useless bug report.

Andrea
Comment 5 Arjan van de Ven 2001-10-22 06:53:46 EDT
I'm glad 2.4.9-6 has it fixed; we did a lot of work getting the VM in better shape.
