Bug 51696 - System locks up after 38 days, 16 hours
Summary: System locks up after 38 days, 16 hours
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.1
Hardware: i686
OS: Linux
medium
high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brock Organ
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2001-08-13 23:15 UTC by Clarence Donath
Modified: 2007-04-18 16:35 UTC
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2001-08-15 14:39:59 UTC
Embargoed:



Description Clarence Donath 2001-08-13 23:15:30 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)

Description of problem:
Every 38 days and 16 hours the system locks up tight. No services are
running and the console screen saver does not turn off. The only fix is
to reboot.

uname -a
Linux m5.do-box.net 2.4.2-2smp #1 SMP Sun Apr 8 20:21:34 EDT 2001 i686 unknown

Running on dual-processor Micronics Pentium Pro 200

A friend is experiencing the same crash...

uname -a
Linux Linux233 2.4.2-2 #1 Sun Apr 8 19:37:14 EDT 2001 i586 unknown

Running on AMD233MMX K6


How reproducible:
Always

Steps to Reproduce:
1. Simply keep the server running for 38 days, 16 hours

Actual Results:  System locked

Expected Results:  Indefinite uptime ;-)

Additional info:

There is nothing reported in /var/log/messages.  It only shows a gap in 
time between the crash and the restart.

Comment 1 Alan Cox 2001-08-14 20:14:35 UTC
That's utterly, utterly bizarre. 38 days 24 isn't any clock or timer we use that I
know of.

No idea right now.
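Alan's point can be checked with a little arithmetic. The sketch below converts the reported 38 days 16 hours to seconds and compares it with the wrap-around periods of a few common 32-bit counters. The candidate counters and tick rates (including HZ=100 for jiffies) are illustrative assumptions, not a list of the timers the 2.4.2 kernel actually uses.

```python
# Rough check: does 38 days 16 hours match the wrap-around period of any
# common 32-bit counter? The candidates below are illustrative assumptions,
# not a statement of which timers the 2.4.2 kernel actually uses.

SECONDS_PER_DAY = 86_400


def observed_interval_seconds(days: int, hours: int) -> int:
    """Convert the reported uptime (days + hours) to seconds."""
    return days * SECONDS_PER_DAY + hours * 3_600


def wrap_period_seconds(bits: int, ticks_per_second: float) -> float:
    """Seconds until a counter with `bits` usable bits overflows."""
    return 2 ** bits / ticks_per_second


# name -> (usable bits, ticks per second); a signed 32-bit counter
# overflows after 2**31 ticks, an unsigned one after 2**32.
CANDIDATES = {
    "unsigned 32-bit at 100 Hz (jiffies, HZ=100)": (32, 100),
    "signed 32-bit at 100 Hz": (31, 100),
    "unsigned 32-bit millisecond counter": (32, 1_000),
    "signed 32-bit millisecond counter": (31, 1_000),
    "unsigned 32-bit microsecond counter": (32, 1_000_000),
}


def compare() -> dict[str, float]:
    """Return each candidate's wrap period in days."""
    return {
        name: wrap_period_seconds(bits, rate) / SECONDS_PER_DAY
        for name, (bits, rate) in CANDIDATES.items()
    }


if __name__ == "__main__":
    observed = observed_interval_seconds(38, 16)
    print(f"observed: {observed} s = {observed / SECONDS_PER_DAY:.2f} days")
    for name, period_days in compare().items():
        print(f"{name:45s} wraps after {period_days:10.2f} days")
```

Run as a script, none of the candidate periods (about 0.05, 24.86, 49.71, 248.55 and 497.10 days) lands near the observed 38.67 days, which is consistent with Alan's remark.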


Comment 2 Clarence Donath 2001-08-14 20:25:07 UTC
Some additional information from my friend, who experienced the same crash at
the same time as I did...


I am in Pompey, England. My above box locked up Sunday night (just froze, à la
M$: lost all processes, gateway, etc.) without any intervention. Although I do
not keep a running total of uptime, Netcraft reports the last record as 38.17
days.

My mate is in R.I., USA. He is also running Red Hat 7.1. Yesterday, his box locked
up exactly the same way as mine. Uptime 38 days 16 hours; this is the second time
his has crashed at 38 days (this was the first time my box had hit 38 days).

Both machines have two NICs (mine is an ADSL gateway, his is on cable). Neither
machine has any information in the logs, just the last proper entries prior
to the lock-up. We both use the Red Hat 'up2date' programme to keep our
systems current. Neither of us has had Linux (any flavour) lock up like this
without one of us messing with it and breaking something.

Neither of us runs any untoward processes, just the usual services.

Nick Warne
nw

Comment 3 Clarence Donath 2001-08-14 20:47:01 UTC
Here is a list of services running on my server...
Linux m5.do-box.net 2.4.2-2smp #1 SMP Sun Apr 8 20:21:34 EDT 2001 i686 unknown

amd is stopped
anacron dead but subsys locked
arpwatch is stopped
atalkd (pid 1079) is running...
atd is stopped
Configured Mount Points:
------------------------
/usr/sbin/automount --timeout 60 /misc file /etc/auto.misc  

Active Mount Points:
--------------------
/usr/sbin/automount --timeout 60 /misc file /etc/auto.misc
crond (pid 6770 1002) is running...
Not starting gated:  [  OK  ]
gpm (pid 974) is running...
httpd (pid 7265 7261 7250 7243 6613 5648 990) is running...
identd (pid 857 855 854 853 852) is running...
Chain input (policy ACCEPT):
<SNIP>
Chain forward (policy DENY):
<SNIP>
Chain output (policy ACCEPT):
<SNIP>
ircd (pid 1133) is running...
No status available for this package
lpd is stopped
nwserv is stopped
nwbind is stopped
ncpserv is stopped
rndc: connect: connection refused
Configured devices:
lo eth0 eth1 ppp0
Devices that are down:

Devices with modified configuration:

rpc.mountd (pid 916) is running...
nfsd (pid 928 927 926 925 924 923 922 921) is running...
rpc.rquotad (pid 911) is running...
rpc.statd (pid 718) is running...
nscd is stopped
ntpd is stopped
Port Manger (portmgr) is running
Power Alert Server (paserver) is running
portmap (pid 703) is running...
The random data source exists
rhnsd (pid 1148) is running...
rpc.rstatd (pid 942) is running...
rpc.rusersd is stopped
rpc.rwalld is stopped
rwhod is stopped
sendmail (pid 961) is running...
smbd (pid 1247 1103) is running...
nmbd (pid 1108) is running...
snmpd is stopped
squid (pid 1029 1028) is running...
sshd (pid 2269 868) is running...
syslogd (pid 684) is running...
klogd (pid 689) is running...
tux is stopped
xfs (pid 1063) is running...
xinetd (pid 888) is running...
ypbind is stopped
rpc.yppasswdd is stopped
ypserv is stopped


Comment 4 Alan Cox 2001-08-14 20:50:30 UTC
Are both boxes running appletalk ?


Comment 5 Clarence Donath 2001-08-14 21:00:33 UTC
No, just mine is running atalk.  I experienced the 38 day lockup before I had
netatalk, ircd, mgetty and Poweralert installed and running, and before I
installed the second eth interface.

Nick's list of services is forthcoming.


Comment 6 Arjan van de Ven 2001-08-14 21:04:22 UTC
Could you also try to find out which modules (see lsmod) you guys have in
common? That might narrow down the suspects a lot.

Comment 7 Clarence Donath 2001-08-15 14:35:25 UTC
Here is the lsmod output on my machine...
Linux m5.do-box.net 2.4.2-2smp #1 SMP Sun Apr 8 20:21:34 EDT 2001 i686 unknown

Module                  Size  Used by
appletalk              23792  12 
nfsd                   70976   8  (autoclean)
lockd                  53232   1  (autoclean) [nfsd]
sunrpc                 66352   1  (autoclean) [nfsd lockd]
autofs                 11808   1  (autoclean)
tulip                  39152   2  (autoclean)
ipchains               41632   0  (unused)
aic7xxx               136336   3 
sd_mod                 11744   3 
scsi_mod               98624   2  [aic7xxx sd_mod]


Comment 8 Nick Warne 2001-08-15 14:39:54 UTC
OK, here are my stats/conf.

Linux Linux233 2.4.2-2 #1 Sun Apr 8 19:37:14 EDT 2001 i586 unknown

lsmod
=====

Module                  Size  Used by
nfs                    76800   2  (autoclean)
lockd                  52336   1  (autoclean) [nfs]
sunrpc                 62448   1  (autoclean) [nfs lockd]
autofs                 11136   1  (autoclean)
8139too                16480   2  (autoclean)
ipchains               38944   0  (unused)
mousedev                4160   1 
hid                    11808   0  (unused)
input                   3456   0  [mousedev hid]
usb-uhci               20848   0  (unused)
usbcore                49632   1  [hid usb-uhci]


mounts
======

/dev/hda2 on / type ext2 (rw)
none on /proc type proc (rw)
usbdevfs on /proc/bus/usb type usbdevfs (rw)
/dev/hda3 on /home type ext2 (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
automount(pid802) on /misc type autofs (rw,fd=5,pgrp=802,minproto=2,maxproto=3)
486Linux:/home/httpd/ on /var/www type nfs
486Linux_2:/home/nick/hdb3/mp3 on /var/www/html/noxster type nfs


Service status
==============

anacron dead but subsys locked
apmd (pid 755) is running...
atd (pid 817) is running...
Configured Mount Points:
------------------------
/usr/sbin/automount --timeout 60 /misc file /etc/auto.misc  

Active Mount Points:
--------------------
/usr/sbin/automount --timeout 60 /misc file /etc/auto.misc
crond (pid 2125 940) is running...
gpm (pid 912) is running...
httpd (pid 1839 1545 1444 1369 1039 1038 1034 1033 1032 928) is running...
identd is stopped
Chain input (policy ACCEPT):
**
Chain forward (policy DENY):
**
Chain output (policy ACCEPT):
**
ircd (pid 1018) is running...
No status available for this package
lpd (pid 867) is running...
mysqld is stopped
Active NFS mountpoints: 
/var/www
/var/www/html/noxster
/var/www
/var/www/html/noxster
Configured devices:
lo eth0 eth1
Devices that are down:

Devices with modified configuration:

rpc.statd (pid 671) is running...
nscd is stopped
ntpd is stopped
portmap (pid 656) is running...
The random data source exists
rhnsd (pid 1037) is running...
rwhod is stopped
sendmail (pid 899) is running...
smbd (pid 1954 988) is running...
nmbd (pid 993) is running...
sshd (pid 1957 829) is running...
syslogd (pid 637) is running...
klogd (pid 642) is running...
tux is stopped
xfs (pid 976) is running...
xinetd (pid 849) is running...
ypbind is stopped

==================================================

That's it really. Everything just appears to run normally...

Nick
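Arjan's suggestion in Comment 6 amounts to a set intersection over the two lsmod listings. A minimal Python sketch, assuming the usual lsmod layout (a "Module" header line, then one module per line with its name in the first column); the listings are copied from Comments 7 and 8:

```python
# Sketch: find the kernel modules two machines have in common by
# intersecting the first column of their `lsmod` output.
# Assumes the usual lsmod layout: a "Module" header, then one module per line.

CLARENCE_LSMOD = """\
Module                  Size  Used by
appletalk              23792  12
nfsd                   70976   8  (autoclean)
lockd                  53232   1  (autoclean) [nfsd]
sunrpc                 66352   1  (autoclean) [nfsd lockd]
autofs                 11808   1  (autoclean)
tulip                  39152   2  (autoclean)
ipchains               41632   0  (unused)
aic7xxx               136336   3
sd_mod                 11744   3
scsi_mod               98624   2  [aic7xxx sd_mod]
"""

NICK_LSMOD = """\
Module                  Size  Used by
nfs                    76800   2  (autoclean)
lockd                  52336   1  (autoclean) [nfs]
sunrpc                 62448   1  (autoclean) [nfs lockd]
autofs                 11136   1  (autoclean)
8139too                16480   2  (autoclean)
ipchains               38944   0  (unused)
mousedev                4160   1
hid                    11808   0  (unused)
input                   3456   0  [mousedev hid]
usb-uhci               20848   0  (unused)
usbcore                49632   1  [hid usb-uhci]
"""


def module_names(lsmod_output: str) -> set[str]:
    """Return the module names (first column) from lsmod output."""
    names = set()
    for line in lsmod_output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Module":
            continue  # skip blank lines and the header row
        names.add(fields[0])
    return names


if __name__ == "__main__":
    common = sorted(module_names(CLARENCE_LSMOD) & module_names(NICK_LSMOD))
    print(common)  # ['autofs', 'ipchains', 'lockd', 'sunrpc']
```

On these two listings the overlap is autofs, ipchains, lockd and sunrpc. The network drivers differ (tulip versus 8139too), and so does the NFS side: Clarence loads the server module nfsd, Nick the client module nfs.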


Comment 9 Clarence Donath 2001-10-19 13:07:17 UTC
The 38-day crash did not recur on my machine or Nick's.

We have been updating our systems when patches come available on Red Hat
Network.

I'm closing this bug out.

Regards,
Clarence

