Bug 171099 - System crash, NO logs, NO messages
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 4
Hardware: i386
OS: Linux
medium
high
Target Milestone: ---
Assignee: Jeff Moyer
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-10-18 08:29 UTC by Husam Afghani
Modified: 2007-11-30 22:11 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-05-05 14:58:02 UTC
Type: ---
Embargoed:



Description Husam Afghani 2005-10-18 08:29:12 UTC
Description of problem:

Dear Fedora Support,
I am running a clean FC3 installation (server) on Dell PE2600 server hardware
with a PERC/DI RAID controller, 2G memory, a DDS5 tape drive, and a Xeon
2.2G/512K processor.
The system was installed with the 2.6.9-1.667 kernel and was stable for about
three months under heavy load, until I did an update.

The update installed 2.6.10-1.770_FC3, and that is when the problems started:
the system suddenly dies with no messages, no logs, and no keyboard input (the
only option is to turn it off and on again). After a restart it works fine for
an unpredictable period, maybe one or two days; the longest was a week.

I started to investigate when the system crashes. At first we suspected tar,
but the system also crashed during the day, when no backup was running, so we
ruled tar out. Then we checked the hard drives by running large, long-running
data transfers that read from and write to the disks.

The result was that heavy I/O traffic (for example a full backup) caused the
system to crash.

During the debugging a new kernel was released (2.6.12-1.1378_FC3). We
upgraded to it, but the problem remained.

Recently we returned to 2.6.9-1.667 and repeated the traffic test. It is more
stable, but unfortunately we now seem to have new problems because of the
newer packages installed on the system.

I searched the bug database for a similar problem. There were similar reports,
but for different environments and systems, so I preferred to file this
one.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Generate heavy I/O traffic on any device (tape, disks)
  
Actual results:
The system dies:
no ping
no console messages
no logs
no screen

The only recovery is to turn it off and on again.
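
A minimal sketch of a repeatable heavy I/O load of the kind described in the
reproduction step, assuming a scratch directory on the disk under test. The
function name io_stress and its arguments are hypothetical and not part of the
original report; only dd and sync are used. To exercise the disks rather than
the page cache, the file size would need to exceed RAM (2G on the machine
described above).

```shell
#!/bin/sh
# io_stress DIR SIZE_MB PASSES  (hypothetical helper, not from the report)
# Writes SIZE_MB megabytes of zeros to a scratch file in DIR, flushes it,
# reads it back, and repeats PASSES times, printing one line per pass.
io_stress() {
    dir=$1
    size_mb=$2
    passes=$3
    file="$dir/io_stress.$$"
    i=1
    while [ "$i" -le "$passes" ]; do
        # Write phase: sequential write of size_mb blocks of 1 MiB.
        if ! dd if=/dev/zero of="$file" bs=1048576 count="$size_mb" 2>/dev/null; then
            rm -f "$file"
            return 1
        fi
        # Flush dirty pages so the write actually reaches the device.
        sync
        # Read phase: sequential read of the same file.
        if ! dd if="$file" of=/dev/null bs=1048576 2>/dev/null; then
            rm -f "$file"
            return 1
        fi
        echo "pass $i ok"
        i=$((i + 1))
    done
    rm -f "$file"
}

# Example (not run here): ten passes of 4 GiB each in /var/tmp
#   io_stress /var/tmp 4096 10
```

The last "pass N ok" line seen before the machine died gives a rough idea of
how much I/O it took to trigger the crash.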

Expected results:
The system should stay up and stable.

Additional info:
We are planning to upgrade to Fedora Core 4 as soon as we can. I hope the new
kernel will solve the problem. I will also check for the latest BIOS and
firmware for the system.

Comment 1 Jeff Moyer 2005-10-28 02:08:14 UTC
If the kernel simply hangs, you may try enabling the NMI watchdog.  You can do
this with the kernel command line parameter "nmi_watchdog=1".  In order to see
the output, you will either need a serial console, or the system must be at a
console prompt (not X) when the hang occurs.  A serial console or netconsole is
preferred.

Is this something you can test?
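
As a sketch, one way to confirm the parameter took effect before waiting for
the next hang: check that nmi_watchdog appears on the running kernel's command
line, and that the NMI counter in /proc/interrupts increases over time, which
is what a working watchdog should show. The function names below are
hypothetical helpers, not existing tools.

```shell
#!/bin/sh
# Prints "requested" if the given kernel command line enables the NMI
# watchdog (nmi_watchdog=<anything but 0>), otherwise "not requested".
nmi_watchdog_state() {
    for arg in $1; do
        case "$arg" in
            nmi_watchdog=0) echo "not requested"; return 0 ;;
            nmi_watchdog=*) echo "requested"; return 0 ;;
        esac
    done
    echo "not requested"
}

# Prints the sum of the per-CPU NMI counters found in a
# /proc/interrupts-style file (0 if there is no NMI line).
nmi_count() {
    awk '
        $1 == "NMI:" {
            s = 0
            for (i = 2; i <= NF; i++)
                if ($i ~ /^[0-9]+$/) s += $i
            print s
            found = 1
        }
        END { if (!found) print 0 }
    ' "$1"
}

# Example (not run here):
#   nmi_watchdog_state "$(cat /proc/cmdline)"
#   nmi_count /proc/interrupts; sleep 10; nmi_count /proc/interrupts
```

If the second count is not higher than the first, the watchdog is probably
not active on that hardware and would not report anything when the hang occurs.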

Comment 2 Husam Afghani 2005-10-28 07:50:37 UTC
It's not hanging, it's dying (no console, nothing you can do, just a reset
from power).

(In reply to comment #1)
> If the kernel simply hangs, you may try enabling the NMI watchdog.  You can do
> this with the kernel command line parameter "nmi_watchdog=1".  In order to see
> the output, you will either need a serial console, or the system must be at a
> console prompt (not X) when the hang occurs.  A serial console or netconsole is
> preferred.
> Is this something you can test?



Comment 3 Jeff Moyer 2005-10-28 13:20:22 UTC
What I mean by a hang is that the system is completely unresponsive.  This can
happen for a number of reasons.  If you enable the nmi_watchdog, we may be able
to get some debug output.

Please try it.

Comment 4 Ionut Nistor 2005-11-14 09:18:08 UTC
Hello,

I am experiencing the same problem. FC3 worked properly for many months. Last 
week I updated the system (yum update).

The yum log after the update:
Nov 08 12:26:26 Updated: glibc-common.i386 2.3.6-0.fc3.1
Nov 08 12:26:42 Updated: glibc.i686 2.3.6-0.fc3.1
Nov 08 12:26:50 Updated: ethereal.i386 0.10.13-1.FC3.1
Nov 08 12:26:53 Updated: pam.i386 0.77-66.2.13
Nov 08 12:26:55 Updated: lm_sensors.i386 2.8.7-2.FC3.1
Nov 08 12:26:55 Updated: nscd.i386 2.3.6-0.fc3.1
Nov 08 12:27:47 Installed: kernel.i686 2.6.12-1.1381_FC3

Previously, I was running:
kernel - 2.6.12-1.1378_FC3
glibc/glibc-common - 2.3.5-0.fc3.1
nscd - 2.3.5-0.fc3.1
pam - 0.77-66.2

I uninstalled 'lm_sensors' but the problem still persists.

The system configuration is:

[root@prod2 ~]# cat /proc/pci
PCI devices found:
  Bus  0, device   0, function  0:
    Class 0600: PCI device 8086:2530 (rev 2).
      Prefetchable 32 bit memory at 0xe0000000 [0xe1ffffff].
  Bus  0, device   1, function  0:
    Class 0604: PCI device 8086:2532 (rev 2).
      Master Capable.  Latency=64.  Min Gnt=6.
  Bus  0, device  30, function  0:
    Class 0604: PCI device 8086:244e (rev 4).
      Master Capable.  No bursts.  Min Gnt=14.
  Bus  0, device  31, function  0:
    Class 0601: PCI device 8086:2440 (rev 4).
  Bus  0, device  31, function  1:
    Class 0101: PCI device 8086:244b (rev 4).
      I/O at 0xf000 [0xf00f].
  Bus  0, device  31, function  2:
    Class 0c03: PCI device 8086:2442 (rev 4).
      IRQ 11.
      I/O at 0xd000 [0xd01f].
  Bus  0, device  31, function  3:
    Class 0c05: PCI device 8086:2443 (rev 4).
      IRQ 12.
      I/O at 0x5000 [0x500f].
  Bus  0, device  31, function  4:
    Class 0c03: PCI device 8086:2444 (rev 4).
      IRQ 9.
      I/O at 0xd800 [0xd81f].
  Bus  0, device  31, function  5:
    Class 0401: PCI device 8086:2445 (rev 4).
      IRQ 12.
      I/O at 0xdc00 [0xdcff].
      I/O at 0xe000 [0xe03f].
  Bus  2, device   0, function  0:
    Class 0200: PCI device 1113:1211 (rev 16).
      IRQ 11.
      Master Capable.  Latency=32.  Min Gnt=32.Max Lat=64.
      I/O at 0xc000 [0xc0ff].
      Non-prefetchable 32 bit memory at 0xe3800000 [0xe38000ff].
  Bus  2, device   1, function  0:
    Class 0401: PCI device 1274:1371 (rev 8).
      IRQ 12.
      Master Capable.  Latency=32.  Min Gnt=12.Max Lat=128.
      I/O at 0xc400 [0xc43f].
  Bus  2, device   4, function  0:
    Class 0300: PCI device 5333:8811 (rev 0).
      IRQ 11.
      Non-prefetchable 32 bit memory at 0xe3000000 [0xe37fffff].


[root@prod2 ~]# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 0
model name      : Intel(R) Pentium(R) 4 CPU 1500MHz
stepping        : 7
cpu MHz         : 1495.316
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 2957.31

[root@prod2 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 hdc1[1] hda1[0]
      38547840 blocks [2/2] [UU]

unused devices: <none>

Some help would be appreciated.

Thank you,
Ionut Nistor

Comment 5 Ionut Nistor 2005-11-14 09:31:38 UTC
Update: 
I enabled nmi_watchdog:
-----
title Fedora Core (2.6.12-1.1381_FC3)
        root (hd0,0)
        kernel /boot/vmlinuz-2.6.12-1.1381_FC3 ro root=/dev/md0 nmi_watchdog=1
        initrd /boot/initrd-2.6.12-1.1381_FC3.img
-----

I got the system to hang but the console (video, not serial) was frozen - 
nothing was displayed.
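
Since the video console showed nothing, the netconsole option mentioned in
comment 1 could send kernel messages to a second machine instead. A minimal
sketch, assuming the netconsole= parameter format from the kernel's netconsole
documentation (source port@source IP/device, target port@target IP/target
MAC). The helper name is hypothetical, and every address, port, and interface
below is a placeholder, not taken from this report.

```shell
#!/bin/sh
# netconsole_param SRC_PORT SRC_IP DEV TGT_PORT TGT_IP TGT_MAC
# Hypothetical helper: prints a netconsole= parameter string in the form
#   netconsole=<src-port>@<src-ip>/<dev>,<tgt-port>@<tgt-ip>/<tgt-mac>
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Example (not run here; all values are placeholders):
#   On the crashing machine, as root:
#     modprobe netconsole \
#       "$(netconsole_param 6665 192.168.0.10 eth0 6666 192.168.0.20 00:11:22:33:44:55)"
#   On the receiving machine, listen for UDP on the target port, for
#   example with netcat (the exact flags differ between netcat variants):
#     nc -u -l -p 6666
```

With nmi_watchdog=1 still set, any watchdog output produced at the time of the
hang would then have a chance of reaching the receiving machine.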

Comment 6 Dave Jones 2006-01-16 22:07:49 UTC
This is a mass-update to all currently open Fedora Core 3 kernel bugs.

Fedora Core 3 support has transitioned to the Fedora Legacy project.
Due to the limited resources of this project, typically only
updates for new security issues are released.

As this bug isn't security related, it has been migrated to a
Fedora Core 4 bug.  Please upgrade to this newer release, and
test if this bug is still present there.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

Thank you.


Comment 7 Dave Jones 2006-02-03 06:23:24 UTC
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.


Comment 8 John Thacker 2006-05-05 14:58:02 UTC
Closing per last comment.

