I'm running an Athlon 64 (in 32-bit mode, though). It has locked up several times lately, and this time it logged something before freezing (I'll attach the log). I'm also updating my system to 2.6.11-1.27_FC3 in case the update fixes this (although I don't see any similar bugzilla reports right off).
Created attachment 115068: crash log
2.6.11-1.27_FC3 froze up too while rsync was running (mirroring today's release of Fedora Extras 4). I think rsync was running when it froze up earlier today. Oops attached.
Created attachment 115070: 2.6.11-1.27_FC3 oops
On the face of it, these are of the "can't happen!" type, which is confusing: we're calling bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize); and journal->j_blocksize is a constant that we never, ever modify. This may be memory corruption somewhere that just happened to hit the filesystem. It's not something I've seen reported anywhere else. Does your system pass a memtest86 test? Do you have any prior oopses that you could attach? Thanks.
These are the only two oopses logged. I'll fire up memtest tonight, but the system is only a few months old and passed memtest when I built it.
It passed a full memtest86+ pass with no errors reported.
One pass gives very little confidence; I normally recommend a full overnight run at the very least! The fact that it just started locking up, and that the symptoms include corruption in memory that ext3 never touches, means that we really need to eliminate hardware concerns first here.
I knew I'd get that, but I couldn't leave it running overnight last night. I have started memtest again on that system; I'll check it tomorrow after work.
memtest made 52 passes with no errors (around 20 hours).
OK, can you please run "fsck -f" on the filesystems and record the output (e.g. run it under "script"), and attach that if it shows up any problems? The line "Jun 1 15:36:18 kosh kernel: EIP is at __mod_timer+0x1f9/0x6c5" is also a concern: it really looks like the call add_timer(journal->j_commit_timer); is oopsing, and that timer is again something that's allocated just once when the filesystem is mounted, with the memory never deallocated again until umount. I have *NEVER* seen this sort of thing before in ext3; the only instances I've ever seen of the journal struct itself getting corrupted like this have been down to bad hardware or random memory corruption by some other driver. It sounds like we may need a debug kernel to get to the bottom of who is doing this.
Okay, I've got 3 filesystems mounted: / on /dev/kosh32/root, /boot on LABEL=fc32boot, and /data on LABEL=data. The first two checked okay, but I got some errors on the third (which I let fsck fix). I'll attach the output. The /data fs is where my local mirrors of FC, FE, and livna live, which may be related (I seemed to have crashes when rsync was running, but I can't correlate them for sure). The file that had problems is not a mirrored file, however; I haven't accessed it in a while.
Created attachment 115318: fsck -f output
I got another crash at the same place (during fairly heavy I/O on the fs with my mirrors). I'll attach the oops in case there is more info you can get from it. This is my main home PC. I have no problem running extra debugging if needed; just let me know what I can do to help.
Created attachment 115325: Another crash oops log
Never mind; it must have been my computer acting flakey. I blew out some dust, reflashed the BIOS down a rev, and it has been up without a problem since. Unless I see something else, it must have been a fluke (weird that it passed memtest though). Sorry to have wasted your time on this one.
Well, my system actually crashed not long after I closed the bug. I've been trying this and that, checking hardware, making sure nothing was overheating, etc. Tonight, I've been attempting to transcode a video. Under 2.6.11-1.27_FC3 and 2.6.11-1.35_FC3, I get crashes (I also got crashes playing bzflag, usually when I tried to quit). All the crashes under the 35 kernel are: "kernel BUG at mm/rmap.c:482! invalid operand: 0000 [#1]" (I'll attach a full boot-to-crash log from a serial console). The 27 kernel crashes are invalid operand but don't have the "kernel BUG" message. The only other kernel still available is the distribution kernel, 2.6.9-1.667. I loaded and booted it, and I have not had another crash. As a reminder, this is all running FC3 i386 on an Athlon64. I tried installing FC4 x86_64 and poked at it a little, and saw some oddness there. If I turn on "Cool & Quiet" in the BIOS, I get random segfaults during boot (sometimes I can't even get to a text login). If I turn it off, powernow-k8 complains, but the system seems to run (I didn't try anything heavy yet though). If the 2.6.9-1.667 kernel didn't run just fine, I'd just write it off to bad hardware, but I haven't been able to find anything wrong (and in some testing, Win2000 seems to run okay on this box; I even did a little bit of video conversion there with no problem). memtest86+ still finds no problems (I ran it some more last night). Suggestions?
Created attachment 116556: boot through crash log
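For anyone comparing several captured logs like the ones attached here, a small script can tally the crash markers so that different kernels are easier to compare. The following is only a rough sketch: it assumes the logs contain lines shaped like the two markers quoted in this thread ("EIP is at ..." and "kernel BUG at ..."), and the function names crash_signatures and summarize are made up for illustration.

```python
import re
from collections import Counter

# Patterns for the two crash markers quoted in this thread.
# Assumption: other oops formats (e.g. x86_64 "RIP" lines) are not handled.
BUG_RE = re.compile(r"kernel BUG at (\S+?):(\d+)!")
EIP_RE = re.compile(r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def crash_signatures(lines):
    """Yield one short signature string per crash marker found in lines."""
    for line in lines:
        m = BUG_RE.search(line)
        if m:
            # File and line number of the BUG() assertion that fired.
            yield f"BUG {m.group(1)}:{m.group(2)}"
            continue
        m = EIP_RE.search(line)
        if m:
            # Function the CPU was executing when the oops occurred.
            yield f"EIP {m.group(1)}"


def summarize(lines):
    """Count how often each signature appears (hypothetical helper)."""
    return Counter(crash_signatures(lines))


if __name__ == "__main__":
    # Sample lines copied from the comments above, not from the attachments.
    sample = [
        "Jun 1 15:36:18 kosh kernel: EIP is at __mod_timer+0x1f9/0x6c5",
        "kernel BUG at mm/rmap.c:482!",
        "invalid operand: 0000 [#1]",
    ]
    for signature, count in summarize(sample).items():
        print(count, signature)
```

Running summarize over each kernel's log separately would show whether the crashes cluster in one code path or are scattered, which is the sort of pattern that hints at random memory corruption rather than a single bug.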
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you.
It appears that this has been fixed. I've been running 2.6.12-1.1372_FC3 with no problems for several days now.