Bug 440180

Summary: random crashes all 2.6.24 kernels, 2.6.23 stable
Product: [Fedora] Fedora
Reporter: Alex Eskin <alexeskin>
Component: kernel
Assignee: John W. Linville <linville>
Status: CLOSED CANTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Priority: low
Version: 8
CC: kernel-maint, mcgrof
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-08-19 18:34:18 UTC
Attachments:
Description                                                      Flags
output of dmesg                                                  none
dmesg output from kernel-debug-2.6.24.4-64.fc8                   none
/var/log/messages excerpt from crash with current linus tree     none
.config from stable kernel (from comment #13)                    none

Description Alex Eskin 2008-04-02 02:57:50 UTC
Description of problem:

I see random hangs when running any 2.6.24 kernel. But if
I run 2.6.23.9-85, the system is stable. I managed to 
capture the output of dmesg after one crash, and it
seems to have an oops. 


Version-Release number of selected component (if applicable):
kernel-2.6.24.3-50.fc8

How reproducible:
Intermittent



Steps to Reproduce:
1. Boot with kernel 2.6.24.3-50.fc8
2. Use the system for a while (streaming video may make the problem
   occur more quickly, but that may be an illusion)
  
Actual results:
System freezes, or X crashes. The output of dmesg after
such a crash is attached.

Expected results:
It should be at least as stable as 2.6.23

Additional info:

Comment 1 Alex Eskin 2008-04-02 02:57:50 UTC
Created attachment 300007 [details]
output of dmesg

Comment 2 Alex Eskin 2008-04-02 23:35:39 UTC
Created attachment 300152 [details]
dmesg output from kernel-debug-2.6.24.4-64.fc8

Another crash log

Comment 3 Alex Eskin 2008-04-02 23:39:33 UTC
Problem is now fairly reproducible (watching 
streaming video causes crash within 20 minutes).
Hardware is Thinkpad T41, with radeonfb module in the initrd. 
memtest86+ has been run recently, with no problems reported.

Any other suggestions on how to debug this?


Comment 4 Chuck Ebbert 2008-04-03 02:24:59 UTC
(In reply to comment #3)
> Problem is now fairly reproducible (watching 
> streaming video causes crash within 20 minutes).

Watching streaming video using what network adapter?


Comment 5 Alex Eskin 2008-04-03 02:58:06 UTC
I was using the wifi (ath5k driver) on kernel-debug-2.6.24.4
to get the crash.

I guess I could try watching streaming video using 
a wired connection. Will report on this tomorrow.

Right now I am using the ath5k driver on kernel-2.6.23.9-85
and it is completely stable. 


Comment 6 Alex Eskin 2008-04-03 12:47:44 UTC
Four hours of streaming video using the wired network on kernel-debug-2.6.24.4
did not produce a crash. So perhaps the problem is some change in ath5k
or mac80211 after 2.6.23.9-85. Any suggestions on how to track it down further?


Comment 7 Chuck Ebbert 2008-04-04 02:11:38 UTC
Hmm, another report that ath5k is probably scribbling on memory.

Comment 8 Alex Eskin 2008-04-04 20:02:25 UTC
Is there a way to do the analogue of git bisect on Fedora kernels?

I could try bisecting the mainline kernels if I could figure out which 
mainline kernel corresponds to the versions of ath5k and mac80211 
in fedora kernel 2.6.23.9-85. I can trigger the crash within half an
hour or so, so bisection is not out of the question.


Comment 9 John W. Linville 2008-04-04 21:07:19 UTC
2.6.23.9-85.fc8 was built in early December, well before 2.6.24 was available.  
Yet, it probably contained wireless bits that will only be in 2.6.25 and 
beyond.

If you wouldn't mind, it might be helpful if you could determine whether or 
not the issue is observable in the current linux-2.6 kernel from Linus.  If 
so, you could bisect between 2.6.24 and the current HEAD.

If the problem is not observable in Linus' kernel, then you could bisect the 
wireless-testing tree instead.  In that case, the problem should have been 
introduced between the merge-2008-04-01 tag and the master-2008-04-01 tag.  
(FWIW, the merge-2008-04-01 tag is equivalent to 2.6.25-rc8, and master-2008-04-01 
has all the wireless patches queued for 2.6.26 as of 2008-04-01.)

Thanks for offering the bisection service! :-)
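
The bisection proposed above is a binary search over commits between a known-good and a known-bad point. A minimal Python sketch of that search follows, assuming a linear history and a hypothetical `is_bad` test (in practice: boot the build and stream video over ath5k until it crashes or has run long enough to be called good). The commit numbers and the culprit are made up for illustration, and real `git bisect` also copes with merges, which this sketch ignores.

```python
def first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    Assumes commits are in chronological order, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one
    is also bad. Returns (first_bad_commit, number_of_tests_run).
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi], tests


# Hypothetical history: 1000 commits, with commit 637 introducing the crash.
commits = list(range(1000))
culprit, tests = first_bad(commits, lambda c: c >= 637)
print(culprit, tests)
```

The number of builds to test grows with the logarithm of the range, so even a range of about a thousand commits needs only around ten test boots. With an intermittent crash, a false "good" verdict sends the search the wrong way, so each good verdict needs a generous run time.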

Comment 10 Alex Eskin 2008-04-05 21:17:34 UTC
I tried the current linus kernel, but I am not sure how to interpret
the results. I was able to stream video for a few hours, but eventually
I got a crash. This time, I had a whole bunch of "assertion failed"
messages in /var/log/messages (excerpt attached).

So I am not sure if this crash is the same issue as what I had before,
and I do not know what the next step should be. Any suggestions?




Comment 11 Alex Eskin 2008-04-05 21:19:40 UTC
Created attachment 301391 [details]
/var/log/messages excerpt from crash with current linus tree

Comment 12 John W. Linville 2008-04-07 12:26:42 UTC
That looks like a different issue to me.  Perhaps you could try bisecting the 
wireless-testing tree as described in comment 9?

Comment 13 Alex Eskin 2008-04-07 15:09:38 UTC
I have been running the head of wireless-testing for two days now with no problems.
So at the moment it looks like the issue either got fixed or got masked by something
(or perhaps was not as reproducible as I thought). 

So I guess there is not much sense in bisecting, but I will give another report
in a week.



Comment 14 John W. Linville 2008-04-16 13:03:41 UTC
The kernels here should have the same wireless code as in the wireless-testing 
tree:

   http://koji.fedoraproject.org/koji/buildinfo?buildID=46311

Do they resolve the problem as well?

Comment 15 Alex Eskin 2008-04-19 02:04:09 UTC
It is very puzzling. I got two crashes within 20 minutes with the kernel from
comment #14 (but have not captured an oops yet). On the other hand, the
wireless-testing head as of 2008-04-07 ran completely stable for more than a week.

So I am not sure what to think. Perhaps some difference in the kernel
configuration is responsible? I am attaching the .config from a kernel which
works well (comment #13).
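
One way to look for such a configuration difference is to compare the two .config files option by option. The Python sketch below is only an illustration: the two config snippets are made-up samples, not taken from the attachments, and the helper names `parse_config` and `diff_configs` are hypothetical. It relies on the usual .config conventions, where a set option appears as `CONFIG_NAME=value` and an unset one as `# CONFIG_NAME is not set`.

```python
import re


def parse_config(text):
    """Map each kernel option name to its value; unset options map to 'n'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            opts[unset.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
    return opts


def diff_configs(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ.

    An option missing from one file is treated the same as 'not set'.
    """
    names = sorted(set(a) | set(b))
    return {
        n: (a.get(n, "n"), b.get(n, "n"))
        for n in names
        if a.get(n, "n") != b.get(n, "n")
    }


# Made-up sample snippets, for illustration only.
stable = "CONFIG_ATH5K=m\n# CONFIG_DEBUG_SLAB is not set\n"
crashing = "CONFIG_ATH5K=m\nCONFIG_DEBUG_SLAB=y\n"

for name, (old, new) in diff_configs(parse_config(stable), parse_config(crashing)).items():
    print(f"{name}: {old} -> {new}")
```

Options that differ only in debugging or memory-allocator settings are worth a close look, since they can change whether memory corruption shows up as a crash.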



Comment 16 Alex Eskin 2008-04-19 02:06:00 UTC
Created attachment 302964 [details]
.config from stable kernel (from comment #13)

Comment 17 John W. Linville 2008-07-08 20:04:54 UTC
Do you find the 2.6.25-based kernels satisfactory?

Comment 18 John W. Linville 2008-08-19 18:34:18 UTC
Closed due to lack of response...