Bug 159493 - Thinkpad hangs randomly using various FC3 and FC5 kernels
Status: CLOSED CANTFIX
Product: Fedora
Classification: Fedora
Component: kernel
Version: 5
Hardware: i386 Linux
Priority: medium
Severity: high
Assigned To: Dave Jones
QA Contact: Brian Brock
Keywords: Reopened
Reported: 2005-06-02 18:22 EDT by Derek Atkins
Modified: 2015-01-04 17:19 EST (History)
Doc Type: Bug Fix
Last Closed: 2008-03-10 19:48:03 EDT


Attachments:
X Config (3.71 KB, text/plain), 2006-10-17 17:15 EDT, Derek Atkins
REPORTING_BUGS information (16.02 KB, text/plain), 2006-11-07 16:09 EST, adastra

Description Derek Atkins 2005-06-02 18:22:34 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.0.4-1.3.1 Firefox/1.0.4

Description of problem:
I've got a Thinkpad T42p 2379-DYU running a fully updated FC3.  I was happily typing away into a gaim window when the machine hung.  It hangs like this often, and quite randomly.  Using mplayer a couple weeks ago I got it to hang about three times within an hour.

When it hangs everything stops working.  No mouse, no network, no SysRq, nothing.  It's dead.  All I can do is hard-boot.

As this is a laptop, and it's not reliably reproducible (but it IS reproducible, just not on demand), I've not yet attempted to run a serial console to see what's failing (if anything).

Nothing is printed in any of the logfiles when the hang occurs.

No, I do not use USB storage.
The system provides no warning prior to the hang.

This sounds (to me) like bug #156627, except it's happened with all FC3 kernels, not just 2.6.11-1.14.  Also, that bug asked others to create a new bug report, so I did.

Version-Release number of selected component (if applicable):
All FC3 kernels from 2.6.10-1.770 through 11-1.27

How reproducible:
Sometimes

Steps to Reproduce:
1. Work normally for anywhere from an hour to two weeks
2. System hangs

Additional info:
Comment 1 Dave Jones 2005-07-15 14:21:10 EDT
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.
Comment 2 Derek Atkins 2005-07-23 10:13:47 EDT
I have updated to this kernel and unfortunately it has not corrected the
problem.  My thinkpad crashed overnight last night.  It crashed sometime after
3:01AM (the last cron.hourly entry in /var/log/cron was at 3:01:01 this morning).
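
The last timestamp written before a hang gives a lower bound on when the machine stopped. A minimal sketch of pulling that timestamp out of a syslog-style file such as /var/log/cron follows. The function name `last_log_time` is hypothetical, and the sketch assumes the classic "Mon DD HH:MM:SS" prefix with English month names, which carries no year, so the year has to be supplied by the caller.

```python
from datetime import datetime
from typing import Iterable, Optional


def last_log_time(lines: Iterable[str], year: int) -> Optional[datetime]:
    """Return the timestamp of the last parseable syslog-style line, or None."""
    latest = None
    for line in lines:
        stamp = line[:15]  # e.g. "Jul 23 03:01:01"
        try:
            latest = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # line does not start with a timestamp; skip it
    return latest
```

A possible use is `last_log_time(open("/var/log/cron"), 2005)`. The result only says the machine was still alive at that moment, not exactly when it hung.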
Comment 3 Derek Atkins 2005-08-09 09:01:49 EDT
I've turned off ACPI (running 2.6.12-1.1372_FC3 with acpi=off) and we'll see if
this fixes the problem.  This was suggested in bug #158455.  Unfortunately it
can sometimes be multiple weeks between failures, so it's a bit hard to debug
this.  I'll see how long the system lasts this time (it lasted a good couple
weeks the last time before it hung yesterday).

FWIW, I haven't really ruled out a hardware problem, but ISTR that I didn't have
a problem when I first installed FC3 and it only started happening on later kernels.

I've left this in NEEDINFO_REPORTER on the theory that I'll still need to
respond in a week or two once my machine crashes (or shows no signs of
crashing).  If you have other ideas for me I'd be glad to test them!
Comment 4 Derek Atkins 2005-08-11 04:47:59 EDT
Nope, that didn't solve it.  The machine hung last night.  :(

Any more suggestions?
Comment 5 Dave Jones 2006-01-16 17:35:34 EST
This is a mass-update to all currently open Fedora Core 3 kernel bugs.

Fedora Core 3 support has transitioned to the Fedora Legacy project.
Due to the limited resources of this project, typically only
updates for new security issues are released.

As this bug isn't security related, it has been migrated to a
Fedora Core 4 bug.  Please upgrade to this newer release, and
test if this bug is still present there.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

Thank you.
Comment 6 Dave Jones 2006-02-03 00:37:29 EST
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.
Comment 7 John Thacker 2006-05-04 09:49:04 EDT
Closing per previous comment.
Comment 8 Derek Atkins 2006-06-15 09:32:08 EDT
Hi,

I just updated to FC5 and this problem is still happening.  However I think I
might have narrowed it down to the speedstep/cpufreq subsystem.  There appears
to either be some incompatibility with the speedstep implementation in the CPU,
or a bad set of CPUs that the cpuspeed code tickles.

I'm currently running with cpuspeed turned off for testing, and I'll report back
if it really seems to help (it has helped so far, but it's only been 15 hours).

I'm reopening this bug because it IS still in FC5.  I would have marked it
"NEEDINFO" because I do need to reply again whether disabling cpuspeed mitigates
the problem, but I couldn't do that.  I'll just have to remember to do that later.

Sorry for the late, late reply.  I've also updated the summary and version.
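
When testing whether frequency scaling is involved, it can help to record what the cpufreq subsystem is doing alongside each test run. The sketch below reads a few cpufreq attributes from sysfs. The file names are the ones documented for current kernels; that the kernels discussed in this bug expose the same files is an assumption, and `read_cpufreq` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict

# Assumed location of the first CPU's cpufreq attributes.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")

FIELDS = (
    "scaling_driver",
    "scaling_governor",
    "scaling_cur_freq",
    "scaling_min_freq",
    "scaling_max_freq",
)


def read_cpufreq(base: Path = CPUFREQ_DIR) -> Dict[str, str]:
    """Return the cpufreq attributes that can be read, marking the rest."""
    state = {}
    for name in FIELDS:
        try:
            state[name] = (base / name).read_text().strip()
        except OSError:
            # File is missing (no cpufreq driver loaded) or not readable.
            state[name] = "unavailable"
    return state
```

Logging the result of `read_cpufreq()` periodically to a file on disk would show the last known governor and frequency before a hang, provided the write reaches the disk in time.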
Comment 9 Dave Jones 2006-10-16 14:27:09 EDT
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.
Comment 10 Derek Atkins 2006-10-17 16:54:20 EDT
Nope, this doesn't seem to have fixed the problem.  I was running
2.6.18-1.2220.fc5 and the machine just crashed with a pretty red stripe through
the gnome taskbar at the top of the screen.  So, unfortunately this bug is still
there.  I'm not sure whether it's a software bug or a hardware bug.  And
unfortunately when it crashes/hangs while I'm sitting in X there's no log generated.
Comment 11 Dave Jones 2006-10-17 17:11:33 EDT
What video driver is this using?
Comment 12 Derek Atkins 2006-10-17 17:16:00 EDT
Created attachment 138730
X Config

Using radeon driver (see attached xorg.conf).

Here's the output from lsmod.  I'll point out that the crash still happens
without vmware, so the fact that vmware is loaded has been ruled out.

Module			Size  Used by
fuse		       44885  6
wlan_wep		7296  1
autofs4 	       21573  0
rfcomm		       37849  0
l2cap		       23873  5 rfcomm
bluetooth	       50085  4 rfcomm,l2cap
vmnet		       32044  13
vmmon		      175852  0
sunrpc		      153725  1
ipt_REJECT		5697  1
xt_state		2625  15
ip_conntrack	       52085  1 xt_state
nfnetlink		7513  1 ip_conntrack
xt_tcpudp		3521  17
iptable_filter		3392  1
ip_tables	       12937  1 iptable_filter
x_tables	       14405  4 ipt_REJECT,xt_state,xt_tcpudp,ip_tables
dm_mirror	       29073  0
dm_mod		       57433  1 dm_mirror
video		       17221  0
sbs		       16257  0
ibm_acpi	       27969  0
i2c_ec			5569  1 sbs
dock			8665  0
container		4801  0
button			7249  0
battery 	       10565  0
asus_acpi	       16857  0
ac			5701  0
ipv6		      246113  20
lp		       13065  0
parport_pc	       27493  1
parport 	       37001  2 lp,parport_pc
snd_intel8x0	       32605  1
snd_intel8x0m	       17357  0
snd_ac97_codec	       91360  2 snd_intel8x0,snd_intel8x0m
snd_ac97_bus		2753  1 snd_ac97_codec
snd_seq_dummy		4293  0
snd_seq_oss	       32705  0
snd_seq_midi_event	8001  1 snd_seq_oss
snd_seq 	       51633  5 snd_seq_dummy,snd_seq_oss,snd_seq_midi_event
snd_seq_device		8781  3 snd_seq_dummy,snd_seq_oss,snd_seq
snd_pcm_oss	       42849  0
wlan_scan_sta	       13952  1
snd_mixer_oss	       16833  1 snd_pcm_oss
ath_pci 	       92836  0
snd_pcm 	       76485  4 snd_intel8x0,snd_intel8x0m,snd_ac97_codec,snd_pcm_oss
ath_rate_sample        14848  1 ath_pci
floppy		       57317  1
wlan		      186588  5 wlan_wep,wlan_scan_sta,ath_pci,ath_rate_sample
e1000		      119505  0
ehci_hcd	       31693  0
ath_hal 	      192208  3 ath_pci,ath_rate_sample
uhci_hcd	       23885  0
snd_timer	       23237  2 snd_seq,snd_pcm
snd		       52933  12 snd_intel8x0,snd_intel8x0m,snd_ac97_codec,snd_seq_oss,snd_seq,snd_seq_device,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_timer
i2c_i801		8013  0
serio_raw		7493  0
soundcore	       10145  1 snd
i2c_core	       21697  2 i2c_ec,i2c_i801
ide_cd		       38625  2
snd_page_alloc	       10569  3 snd_intel8x0,snd_intel8x0m,snd_pcm
pcspkr			3521  0
cdrom		       34913  1 ide_cd
ext3		      129737  3
jbd		       58473  1 ext3
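
For comparing module lists between crashes, the lsmod output can be turned into a small table. The following is a sketch only: `Module` and `parse_lsmod` are hypothetical names, and the parser assumes the three-column "Module / Size / Used by" layout shown here. It also tolerates a user list that has wrapped onto a line of its own, as can happen when output is pasted into a form.

```python
from typing import Dict, List, NamedTuple


class Module(NamedTuple):
    size: int
    refcount: int
    used_by: List[str]


def parse_lsmod(text: str) -> Dict[str, Module]:
    """Parse lsmod output into a mapping of module name to Module."""
    modules: Dict[str, Module] = {}
    last = None
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0] == "Module":
            continue  # blank line or header row
        if len(parts) >= 3 and parts[1].isdigit() and parts[2].isdigit():
            users = parts[3].split(",") if len(parts) > 3 else []
            modules[parts[0]] = Module(int(parts[1]), int(parts[2]), users)
            last = parts[0]
        elif last is not None and len(parts) == 1:
            # Wrapped continuation: users belonging to the previous module.
            modules[last].used_by.extend(parts[0].split(","))
    return modules
```

With the parsed result, `sorted(parse_lsmod(text))` gives a module list that can be diffed between a run that hung and one that did not.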
Comment 13 Dave Jones 2006-10-17 19:41:33 EDT
And atheros too ? That thing has been almost as notoriously bad for random
kernel memory corruption as the nvidia driver.
Comment 14 Derek Atkins 2006-10-17 21:35:57 EDT
Last time I tried (which granted hasn't been with 2.6.18) I was using the e1000
driver and it crashed.  If you really want I can reboot into a wired
configuration and wait...  But I have tried it (in 2.6.17) without atheros and
it still hung.
Comment 15 Dave Jones 2006-10-20 22:02:09 EDT
For the sake of this bug, it'll make things a lot easier to diagnose if you
don't load any of the part-binary drivers at all.  (Note that merely loading
them, even if they aren't in use, still taints the kernel.)
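
Whether the running kernel is tainted can be checked from /proc/sys/kernel/tainted, which holds a bitmask. The sketch below decodes a subset of the bits. The bit meanings are taken from current kernel documentation; older kernels such as the ones in this bug define fewer flags, so treat the table as an assumption for them. `decode_taint` and `read_taint` are hypothetical helper names.

```python
from typing import List

# Subset of taint bits (bit number -> (letter, meaning)).
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    1: ("F", "module was force loaded"),
    3: ("R", "module was force unloaded"),
    4: ("M", "machine check exception occurred"),
    7: ("D", "kernel oopsed"),
    9: ("W", "kernel issued a warning"),
    12: ("O", "out-of-tree module was loaded"),
}


def decode_taint(value: int) -> List[str]:
    """Return descriptions for each known taint bit set in value."""
    return [
        f"{letter}: {meaning}"
        for bit, (letter, meaning) in sorted(TAINT_FLAGS.items())
        if value & (1 << bit)
    ]


def read_taint(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the current taint bitmask from procfs."""
    with open(path) as handle:
        return int(handle.read().strip())
```

An empty list from `decode_taint(read_taint())` means none of the listed flags are set. Because the taint persists until reboot, the check is only meaningful on a boot where the modules in question were never loaded.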

T42p's aren't exactly uncommon, so it's somewhat unusual that only you seem to be
hitting this.  The lack of serial port on those is going to make capturing debug
info a bit tricky though.  Netconsole might be worth a try, just to see if we
get a backtrace when the hang occurs.

netconsole is fairly trivial to set up if you have a second machine you can log
to.  If you can't find working instructions on the internet, let me know and
I'll write up a quick recipe.
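
As a rough illustration of what such a setup involves (this is not the recipe offered above), the sketch below builds a netconsole module parameter and runs a simple UDP receiver on the logging machine. The parameter layout and the default ports 6665 and 6666 follow the kernel's netconsole documentation; the IP addresses, the interface name and the helper names are placeholders chosen for the example.

```python
import socket


def netconsole_param(src_ip: str, dev: str, tgt_ip: str, tgt_mac: str = "",
                     src_port: int = 6665, tgt_port: int = 6666) -> str:
    """Build a netconsole= parameter for the machine being debugged.

    An empty tgt_mac leaves the target MAC address at the module's default.
    """
    return f"netconsole={src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"


def listen(port: int = 6666, bind_addr: str = "0.0.0.0") -> None:
    """Print every netconsole datagram received on the logging machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_addr, port))
        while True:
            data, (sender, _) = sock.recvfrom(4096)
            text = data.decode("utf-8", "replace")
            print(f"[{sender}] {text}", end="", flush=True)
```

For example, `netconsole_param("10.0.0.1", "eth0", "10.0.0.2")` produces a string that could be passed when loading the module on the laptop (`modprobe netconsole <that string>`), while `listen()` runs on the second machine. A hard hang that stops the kernel before it can print anything will still produce no output.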
Comment 16 Derek Atkins 2006-10-20 22:08:08 EDT
I wish it were reliably reproducible.  Sometimes it'll hang multiple times in a
day.  Sometimes it'll go two weeks between hangs.  It does seem to happen more
often when I have the cpuspeed daemon installed and running.  But I'll try to
look at netconsole while I can (in about 3 weeks I'm going to be on the road for
two months).  (I'll try to keep this as NEEDINFO)
Comment 17 Derek Atkins 2006-10-22 21:25:05 EDT
Okay, it just happened again.  I had set up netconsole but there was nothing in
the remote logs.  :(
Comment 18 adastra 2006-11-07 16:09:24 EST
Created attachment 140602
REPORTING_BUGS information
Comment 19 petrosyan 2008-03-10 19:37:29 EDT
Fedora Core 5 is no longer maintained. Is this bug still present in Fedora 7 or
Fedora 8?
Comment 20 Derek Atkins 2008-03-10 19:43:29 EDT
I'm afraid I no longer have that piece of hardware so I don't know.
