Bug 440180 - random crashes all 2.6.24 kernels, 2.6.23 stable
Summary: random crashes all 2.6.24 kernels, 2.6.23 stable
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 8
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: John W. Linville
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-04-02 02:57 UTC by Alex Eskin
Modified: 2008-08-19 18:34 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-08-19 18:34:18 UTC
Type: ---
Embargoed:


Attachments
output of dmesg (120.88 KB, text/plain)
2008-04-02 02:57 UTC, Alex Eskin
dmesg output from kernel-debug-2.6.24.4-64.fc8 (42.16 KB, text/plain)
2008-04-02 23:35 UTC, Alex Eskin
/var/log/messages excerpt from crash with current linus tree (825.18 KB, text/plain)
2008-04-05 21:19 UTC, Alex Eskin
.config from stable kernel (from comment #13) (79.66 KB, application/octet-stream)
2008-04-19 02:05 UTC, Alex Eskin

Description Alex Eskin 2008-04-02 02:57:50 UTC
Description of problem:

I see random hangs when running any 2.6.24 kernel. But if
I run 2.6.23.9-85, the system is stable. I managed to 
capture the output of dmesg after one crash, and it
seems to have an oops. 


Version-Release number of selected component (if applicable):
kernel-2.6.24.3-50.fc8

How reproducible:
Intermittent



Steps to Reproduce:
1. Boot with kernel 2.6.24.3-50.fc8
2. Use the system for a while (streaming video may make the problem
   occur more quickly, though that may be an illusion)
Actual results:
System freezes, or X crashes. The dmesg output captured after
one such crash is attached.

Expected results:
It should be at least as stable as 2.6.23

Additional info:

Comment 1 Alex Eskin 2008-04-02 02:57:50 UTC
Created attachment 300007
output of dmesg

Comment 2 Alex Eskin 2008-04-02 23:35:39 UTC
Created attachment 300152
dmesg output from kernel-debug-2.6.24.4-64.fc8

Another crash log

Comment 3 Alex Eskin 2008-04-02 23:39:33 UTC
Problem is now fairly reproducible (watching 
streaming video causes crash within 20 minutes).
Hardware is Thinkpad T41, with radeonfb module in the initrd. 
memtest86+ has been run recently, with no problems reported.

Any other suggestions on how to debug this?


Comment 4 Chuck Ebbert 2008-04-03 02:24:59 UTC
(In reply to comment #3)
> Problem is now fairly reproducible (watching 
> streaming video causes crash within 20 minutes).

Watching streaming video using what network adapter?


Comment 5 Alex Eskin 2008-04-03 02:58:06 UTC
I was using the wifi (ath5k driver) on kernel-debug-2.6.24.4
when I got the crash.

I guess I could try watching streaming video using 
a wired connection. Will report on this tomorrow.

Right now I am using the ath5k driver on kernel-2.6.23.9-85
and it is completely stable. 


Comment 6 Alex Eskin 2008-04-03 12:47:44 UTC
Four hours of streaming video using the wired network on kernel-debug-2.6.24.4
did not produce a crash. So perhaps the problem is some change in ath5k
or mac80211 after 2.6.23.9-85. Any suggestions on how to track it down further?


Comment 7 Chuck Ebbert 2008-04-04 02:11:38 UTC
Hmm, another report that ath5k is probably scribbling on memory.

Comment 8 Alex Eskin 2008-04-04 20:02:25 UTC
Is there a way to do the analogue of git bisect on Fedora kernels? 

I could try bisecting the mainline kernels if I could figure out which 
mainline kernel corresponds to the versions of ath5k and mac80211 
in Fedora kernel 2.6.23.9-85. I can trigger the crash within half an
hour or so, so bisection is not out of the question.


Comment 9 John W. Linville 2008-04-04 21:07:19 UTC
2.6.23.9-85.fc8 was built in early December, well before 2.6.24 was available.  
Yet, it probably contained wireless bits that will only be in 2.6.25 and 
beyond.

If you wouldn't mind, it might be helpful if you could determine whether or 
not the issue is observable in the current linux-2.6 kernel from Linus.  If 
so, you could bisect between 2.6.24 and the current HEAD.

If the problem is not observable in Linus' kernel, then you could bisect the 
wireless-testing tree.  In that case, the problem should have been introduced 
between the merge-2008-04-01 tag and the master-2008-04-01 tag.  
(FWIW, the merge-2008-04-01 tag is equivalent to 2.6.25-rc8, and 
master-2008-04-01 has all the wireless patches queued for 2.6.26 as of 
2008-04-01.)

Thanks for offering the bisection service! :-)
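
For readers unfamiliar with the procedure, the bisection proposed above is a binary search over the commit history. The following is only a minimal Python sketch of that idea, not the actual git workflow: the commit list, the regression index, and the `is_bad` test are hypothetical stand-ins, and the sketch assumes that every commit after the first bad one also shows the problem.

```python
import math


def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and
    every commit after the first bad one is also bad.
    Returns (index of first bad commit, number of tests performed).
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return hi, tests


# Hypothetical history: 1000 commits, regression introduced at index 637.
commits = list(range(1000))
index, tests = first_bad(commits, lambda c: c >= 637)

# Upper bound on the number of kernels to build and test.
max_tests = math.ceil(math.log2(len(commits) - 1))
```

The number of tests grows only logarithmically with the number of commits, which is why a crash that reproduces in about half an hour makes bisection practical. With an intermittent crash, a "good" verdict needs a much longer run than a "bad" one (the wired-network test in comment 6 ran for four hours), so the real cost per step is likely higher.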

Comment 10 Alex Eskin 2008-04-05 21:17:34 UTC
I tried the current linus kernel, but I am not sure how to interpret
the results. I was able to stream video for a few hours, but eventually
I got a crash. This time, I had a whole bunch of "assertion failed"
messages in /var/log/messages (an excerpt is attached). 

So I am not sure if this crash is the same issue as what I had before,
and I do not know what the next step should be. Any suggestions?




Comment 11 Alex Eskin 2008-04-05 21:19:40 UTC
Created attachment 301391
/var/log/messages excerpt from crash with current linus tree

Comment 12 John W. Linville 2008-04-07 12:26:42 UTC
That looks like a different issue to me.  Perhaps you could try bisecting the 
wireless-testing tree as described in comment 9?

Comment 13 Alex Eskin 2008-04-07 15:09:38 UTC
I have been running the head of wireless-testing for two days now with no problems.
So at the moment it looks like the issue either got fixed or got masked by something
(or perhaps was not as reproducible as I thought). 

So I guess there is not much sense in bisecting, but I will give another report
in a week.



Comment 14 John W. Linville 2008-04-16 13:03:41 UTC
The kernels here should have the same wireless code as in the wireless-testing 
tree:

   http://koji.fedoraproject.org/koji/buildinfo?buildID=46311

Do they resolve the problem as well?

Comment 15 Alex Eskin 2008-04-19 02:04:09 UTC
It is very puzzling. I got two crashes within 20 minutes with the kernel from
comment #14 (but did not capture an oops yet). On the other hand, the
wireless-testing head as of 2008-04-07 ran completely stable for more than a week.

So I am not sure what to think. Perhaps some difference in the kernel
configuration is responsible? I am attaching the .config from a kernel that works
well (comment #13).



Comment 16 Alex Eskin 2008-04-19 02:06:00 UTC
Created attachment 302964
.config from stable kernel (from comment #13)

Comment 17 John W. Linville 2008-07-08 20:04:54 UTC
Do you find the 2.6.25-based kernels satisfactory?

Comment 18 John W. Linville 2008-08-19 18:34:18 UTC
Closed due to lack of response...

