Created attachment 498725 [details]
screen shot

Description of problem:
I have an ASUS E35M1-I PRO board with a Zacate processor. I have AMANDA set up to back up from 4 machines, 4 disksets each. /home on 4 is quite large. Linux consistently crashes (usually a reboot with no messages). I got one message. It is attached as a picture, as that was the only way I could capture it.

Version-Release number of selected component (if applicable):
kernel-2.6.38.5-24.fc15.x86_64

How reproducible:
Every time I do amdump.
This may or may not be caused by a bad board. However, I am filing another bug report, which may or may not be related, on a new, good board: Bug #707686.
This is NOT due to faulty hardware and also occurs in kernel-2.6.38.6-27.fc15.x86_64. I tried to take another picture, but it was too blurry and I had to reboot the machine. It looked almost identical to the screen shot already taken. This is on a board that doesn't exhibit the hardware problems mentioned above.
This happens with C6 on or off.
OK, I think I am seeing three things that may or may not be causing this.

The one I know for certain is causing at least the reboots and a few crashes is the RTL8168b/8111b (built in). When I switch to an RTL8169sb/8110sb (add-on card), they go away. I see the following types of messages on the 8168b, but not on the 8169sb:

May 28 01:07:51 FC kernel: [ 2015.591351] NOHZ: local_softirq_pending 08
May 28 01:07:52 FC kernel: [ 2016.029662] NOHZ: local_softirq_pending 08
May 28 01:07:52 FC kernel: [ 2016.029817] NOHZ: local_softirq_pending 08
May 28 01:07:52 FC kernel: [ 2016.096437] NOHZ: local_softirq_pending 08
May 28 01:07:56 FC kernel: [ 2019.945279] net_ratelimit: 50 callbacks suppressed

The closest I have to this with the 8169 is that I occasionally get messages like this:

BUG: soft lockup - CPU#1 stuck for 64s! [kworker/1:0:9]

But I think that is during a RAID 5 rebuild due to the previous crashes.

The other possible causes of some of the crashes with messages (a few of which I was able to post here) are C6 and EPU Power Saving being enabled. I will be testing those next week to see if they are red herrings.

I know the R8168b bug is a long-standing one (finding it accidentally is what led to me testing all this). It seems it may be time to fix it, if possible, since it is being found on new motherboards.
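For anyone comparing the two cards, tallying the kernel messages per run may make the difference easier to see than reading the log by eye. The following is only a minimal sketch, assuming syslog lines in the same layout as the excerpt above; the function name `count_kernel_messages` and the regular expression are illustrative assumptions, not part of any existing tool.

```python
import re
from collections import Counter

# Matches syslog-style kernel lines such as:
#   May 28 01:07:51 FC kernel: [ 2015.591351] NOHZ: local_softirq_pending 08
# The layout is assumed from the excerpt in this report.
KERNEL_LINE = re.compile(r"kernel: \[\s*(?P<uptime>\d+\.\d+)\]\s+(?P<msg>.*)$")


def count_kernel_messages(lines):
    """Return a Counter mapping each kernel message text to its number of occurrences."""
    counts = Counter()
    for line in lines:
        match = KERNEL_LINE.search(line.rstrip("\n"))
        if match:
            counts[match.group("msg")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "May 28 01:07:51 FC kernel: [ 2015.591351] NOHZ: local_softirq_pending 08",
        "May 28 01:07:52 FC kernel: [ 2016.029662] NOHZ: local_softirq_pending 08",
        "May 28 01:07:56 FC kernel: [ 2019.945279] net_ratelimit: 50 callbacks suppressed",
    ]
    for message, count in count_kernel_messages(sample).most_common():
        print(count, message)
```

The same function can be fed an open log file (for example the system log, wherever it is kept on the machine in question), since it only iterates over lines.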
Created attachment 505611 [details]
A freeze backtrace

I do not know whether this shows the bug from the screenshot, since it locked the screen.
I should mention that any backtraces after June 16 at 6:16 AM MDT are from kernel-2.6.38.8-32.fc15.x86_64.
Created attachment 505618 [details]
This one looks much different
Created attachment 505621 [details]
a few more backtraces

I do not think I will do any more. While there are some unique parts, there appears to be a core that is repeated over and over. I imagine the trouble is there.
I switched from the Realtek 8169 to an Intel e1000e PCIe card. I have not been able to duplicate any of these problems since, even under very heavy load. The processor is also much more idle: it was nearly fully used with the 8169, and it is about 30-70% idle most of the time (quite often more than 50%) with the latter card. I do not know if the 8169 chipset is just broken or if the driver is, but the problem lies with one of the two.
Have you happened to test this issue with the 2.6.40.6 kernels? I realize you have switched to an Intel card at this point, but thought it might be worth asking. If the issues are resolved for you then we might close the bug out unless you're willing to recreate it with the latest F15 kernel.
I cannot test this. I am sorry. For me, the issue is resolved. I understand that a kernel fix may have fixed this (something to do with DMA if I remember right).
OK, thank you for letting us know.