17815 – system crash after long uptime

Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Version:	5.2
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Michael K. Johnson

Reported:	2000-09-25 04:43 UTC by Chris Greer
Modified:	2008-05-01 15:37 UTC (History)
CC List:	0 users
Last Closed:	2002-12-15 01:55:53 UTC

Description Chris Greer 2000-09-25 04:43:27 UTC

I have been running a stock Red Hat 5.2 system for quite some time
now.  It has been reliable and solid.  I was using this machine
as my workstation and running some custom perl scripts that
gather information about the other machines on our network
and make it available via the web.

About 2 months ago, the system started acting strangely.  Any new
files created had the correct timestamp and all date/time functions
seemed normal, except that ps showed every process starting on January
18 (including the ps that was run to view the information).

Yesterday, at exactly 500 days of uptime, the system crashed.
Since it is the weekend, I have not had time to determine the
full extent of this, but the system is not responding to the
network in any way, and the X Window System is locked (including
CTRL-ALT-BACKSPACE).

The Numlock and Capslock keys are not responding; the system is completely
locked.  So far that is the extent of the damage.  I don't know
yet whether file system corruption has occurred.

Comment 1 Chris Siebenmann 2000-10-03 00:41:34 UTC

 This is a known problem with the 2.0.* kernels, which are used in Red Hat
5.2. The basic problem is that the internal kernel variable (jiffies) that
stores time since boot is only an unsigned 32-bit variable, and rolls over
back to zero around 497 days after a boot on i386 machines (where a jiffy
is 1/100th of a second, so 2^32 jiffies is about 497 days; the rollover
happens faster on Alphas). This upsets various pieces of code that do
interval arithmetic on jiffies ('wake me in 50 jiffies', for example, or
'if it has been more than 400 jiffies since the last event, do X').
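
 As a rough illustration of the arithmetic, here is a small Python model
of a 32-bit tick counter. It is only a sketch, not kernel code: Python
integers do not overflow, so the wrap is simulated by masking, and the
names set_timer and naive_expired are made up for this example.

```python
# Illustrative model only. HZ = 100 matches the i386 figure quoted above;
# the helper names are hypothetical and do not come from the kernel.
HZ = 100                    # ticks (jiffies) per second
WRAP = 2 ** 32              # distinct values of an unsigned 32-bit counter
MASK = WRAP - 1
SECONDS_PER_DAY = 86400


def days_until_rollover(hz=HZ):
    """Days of uptime before a 32-bit tick counter wraps back to zero."""
    return WRAP / hz / SECONDS_PER_DAY


def set_timer(now, delay):
    """'Wake me in <delay> jiffies': the deadline wraps like the counter."""
    return (now + delay) & MASK


def naive_expired(now, deadline):
    """Naive check that ignores wraparound."""
    return now >= deadline


# Ten ticks before the rollover, ask to be woken in 50 ticks.
now = MASK - 10
deadline = set_timer(now, 50)          # wraps around to a small number

print(round(days_until_rollover(), 1))
print(deadline, naive_expired(now, deadline))
```

 In this model the deadline wraps to a small value while the counter is
still large, so the naive check reports the timer as already expired.
Which code paths in a real 2.0 kernel misbehave is a separate question.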

 Depending on the exact drivers and events happening around the rollover,
a given system may or may not be fine.

 I don't believe there's any real solution for the 2.0.* series kernels. The
problem was fixed for the 2.2 series of kernels, but I believe it took a
fair amount of effort to find and fix all the code that needed to cope with
jiffies rollover.
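
 For illustration, one common way to make such comparisons survive a
rollover is to look at the sign of the 32-bit difference instead of
comparing the raw values. The Python sketch below models that idea. It
is not a claim about exactly what was changed for 2.2: the name
time_after is borrowed from a macro found in later Linux kernels, and
to_signed32 is a helper invented for this example.

```python
# Illustrative model only: 32-bit arithmetic is simulated by masking.
MASK = 0xFFFFFFFF


def to_signed32(x):
    """Reinterpret the low 32 bits of x as a signed 32-bit integer."""
    x &= MASK
    return x - 0x100000000 if x & 0x80000000 else x


def time_after(a, b):
    """True if tick value a is later than b, provided the two values
    are less than 2**31 ticks apart (one wraparound is tolerated)."""
    return to_signed32(b - a) < 0


before_wrap = MASK - 10     # ten ticks before the counter rolls over
after_wrap = 39             # 50 ticks later, after the rollover

print(time_after(after_wrap, before_wrap))
print(time_after(before_wrap, after_wrap))
```

 The comparison stays correct across one wrap only while the two values
are less than 2^31 ticks apart, which covers short timeouts like the
ones described in this comment.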
