Bug 249652

Summary: System randomly freezes after kernel 2.6.22.1-27.fc7 update
Product: [Fedora] Fedora Reporter: Rafał Polak <rafpolak>
Component: kernel Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: low    
Version: 7 CC: chris.brown, mail, redhat
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: 2.6.22.4-65.fc7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-09-21 08:12:18 UTC Type: ---

Description Rafał Polak 2007-07-26 07:11:19 UTC
Description of problem:
After the recent kernel update (2.6.22.1-27.fc7), the system freezes randomly for
no apparent reason. It is a hard freeze and I can only reboot.

Version-Release number of selected component (if applicable):
Kernel 2.6.22.1-27.fc7

How reproducible:
Can't reproduce it on demand; it happens randomly.

Steps to Reproduce:
1.
2.
3.
  
Actual results:
System freezes.

Expected results:
System doesn't freeze.

Additional info:
There is nothing useful in the logs before the freeze, so I don't know what kind
of information I could provide.

Comment 1 Richard Körber 2007-07-30 21:07:03 UTC
I can confirm this bug. With kernels 2.6.22.1-27 and 2.6.22.1-33, my system
freezes within five minutes every time. I can only do a hardware reset. There is
no usable output in the log files.

I have tried the nohz=off kernel option once, and the system was running a
little longer, maybe half an hour, but then froze again.

The last stable kernel was 2.6.21-1.3228.

Smolt profile:
http://smolt.fedoraproject.org/show?UUID=f57fd3b1-bc30-4ab8-b0d0-70ca1a4cff07

Comment 2 Tino Didriksen 2007-07-31 17:10:16 UTC
I have also experienced random freezes under kernel 2.6.22.1-27.fc7. I'd like to
think it is related to high I/O since it usually went down during heavier cron
jobs, but am not certain about it. No messages in any logs and since it's a
remote machine I have no way to get serial or screen output.

Reverted to kernel 2.6.20-1.2962.fc6 (yes, fc6) a few days ago which so far has
not crashed.

Smolt: http://smolt.fedoraproject.org/show?UUID=6e55d75b-36da-47a5-8de8-c9aa233a743c

Comment 3 Richard Körber 2007-07-31 20:31:35 UTC
It's not load-dependent on my system. It even freezes when it is completely idle.

Comment 4 Rafał Polak 2007-08-14 11:15:43 UTC
It may be an NVidia proprietary driver problem. When I install it, my system has
the problems mentioned above, and even after uninstalling the driver, the system
still freezes (so it might be that the NVidia driver from the Livna repo changes
some crucial libs too, I'm just guessing). I did a fresh installation, I am no
longer using the NVidia proprietary driver, and my system doesn't freeze now.
OTOH it is really hard for me to say who is "guilty" here, Fedora, the NVidia
driver, or the Livna package, so I'm not sure whether I should mark this bug as
NOTABUG or WONTFIX/CANTFIX. I will leave it as it is.

Comment 5 Tino Didriksen 2007-08-14 12:34:06 UTC
This bug is not nVidia related. My example is a dedicated headless server with
an ATI Rage XL card and no custom drivers.

The only thing I had to change to make it stop freezing was the kernel.
Everything else remains exactly the same. I have not tried the newer
2.6.22.1-41.fc7 kernel yet, and won't till I have to reboot for some other reason.

Comment 6 Christopher Brown 2007-09-20 14:50:45 UTC
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try and assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel? You may wish to try some
of the following in helping diagnose the problem:

    * If it's repeatable, hooking up a serial cable to a second box can be
useful for capturing kernel messages that may get printed just before the
lockup. Configure the machine being debugged to boot with console=ttyS0,115200
console=tty0 and run a terminal program such as minicom on the other end.
Configure the remote end to talk at the same baud rate (115200). (In minicom:
ctrl-a, p, i, enter.) More info on setting up a serial terminal can be found at
http://searchenterpriselinux.techtarget.com/tip/0,289483,sid39_gci1118136,00.html
    * Sometimes just getting lsmod output from users can yield enough clues if
there are multiple reports and common modules between them. (It also allows
filtering out reports from users of nvidia, vmware etc.)
    * Hooking up serial console / netconsole can sometimes get debug info out of
the machine.
    * If the hang happened whilst in X, the machine may still respond to ssh
logins from other machines. Try this to get a dmesg.
    * The magic sysrq key might work. Enable it with sysctl kernel.sysrq=1 (or
put kernel.sysrq = 1 in your /etc/sysctl.conf). This will allow you to hit
ctrl-alt-sysrq and various keys to get debugging info. 

m will dump information about the current state of memory 
t will dump the state of every task the kernel knows about
s will sync all data pending writeback to disk. (This is useful so that this
debug info actually stands a chance of hitting the log files.)

    * You can also trigger magic sysrq functions by echo'ing the relevant one
letter command to /proc/sysrq-trigger
    * Booting with nmi_watchdog=2 may cause a backtrace to occur when the lockup
happens.
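
As a rough illustration of the /proc/sysrq-trigger suggestion above, the sketch
below writes the m, t and s command letters to the trigger file in that order,
so the sync happens last and the dumps stand a chance of reaching the logs. The
function names and the trigger_path parameter are made up for this example
(the parameter only exists so the code can be tried against a scratch file),
and writing to the real trigger file needs root.

```python
# Hypothetical helper names; only the m/t/s letters and the
# /proc/sysrq-trigger path come from the comment above.
SYSRQ_KEYS = {
    "m": "dump memory state",
    "t": "dump the state of every task",
    "s": "sync data pending writeback to disk",
}


def trigger_sysrq(key, trigger_path="/proc/sysrq-trigger"):
    """Write one sysrq command letter to the trigger file.

    Needs root when trigger_path is the real /proc/sysrq-trigger.
    """
    if key not in SYSRQ_KEYS:
        raise ValueError(f"unsupported sysrq key: {key!r}")
    with open(trigger_path, "w") as trigger:
        trigger.write(key)
    return SYSRQ_KEYS[key]


def dump_debug_info(trigger_path="/proc/sysrq-trigger"):
    """Dump memory and task state, then sync so the output can hit the logs."""
    return [trigger_sysrq(key, trigger_path) for key in ("m", "t", "s")]
```

Run as root, dump_debug_info() should leave the memory and task dumps in the
kernel log, where dmesg or the serial console can pick them up.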

If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.
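
For the lsmod suggestion in the list above, a minimal sketch of how outputs
from several reporters could be compared to find the modules they all have
loaded. The function names are invented for this example, and it assumes the
usual lsmod layout of a "Module  Size  Used by" header followed by one module
per line.

```python
def parse_lsmod(text):
    """Return the set of module names found in lsmod output."""
    modules = set()
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header row.
        if not fields or fields[0] == "Module":
            continue
        modules.add(fields[0])
    return modules


def common_modules(reports):
    """Return the modules loaded on every reporting machine, sorted by name."""
    module_sets = [parse_lsmod(text) for text in reports]
    if not module_sets:
        return []
    return sorted(set.intersection(*module_sets))
```

Passing a list with each reporter's lsmod output as one string to
common_modules() narrows the list of suspects to the modules shared by all
affected machines.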

Cheers
Chris

Comment 7 Tino Didriksen 2007-09-20 23:50:55 UTC
I am currently using kernel 2.6.22.4-65.fc7 which has been nicely stable since I
updated. I consider the problem solved, whatever it was...

For reference:
So far the only kernel I personally know to be bad was 2.6.22.1-27.fc7, and even
then only on some machines. It'd hang daily on the servers, but has been stable
for months now on a desktop machine.

Machines 2.6.22.1-27.fc7 would hang daily on:
http://smolt.fedoraproject.org/show?UUID=22389d7f-e24a-474c-ae79-c3904112486a
http://smolt.fedoraproject.org/show?UUID=6e55d75b-36da-47a5-8de8-c9aa233a743c

Machine 2.6.22.1-27.fc7 was stable on:
http://smolt.fedoraproject.org/show?UUID=694597f3-b14b-41c0-bf56-535d9f69280f

Comment 8 Christopher Brown 2007-09-21 08:12:18 UTC
Okay, thanks for the update Tino, I'm closing this bug as suggested then.

Cheers
Chris