I'm running an Athlon 64 (in 32-bit mode, though). It has locked up several
times lately, and this time it logged something before freezing (I'll attach the
log). I'm also updating my system to 2.6.11-1.27_FC3 in case the update fixes
this (although I don't see any similar Bugzilla reports right off).
Created attachment 115068
2.6.11-1.27_FC3 froze up too while rsync was running (mirroring today's release
of Fedora Extras 4). I think rsync was running when it froze up earlier today.
Created attachment 115070
On the face of it, these oopses are of the "can't happen!" type, which is
confusing: we're calling
bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize);
and journal->j_blocksize is a constant that we never, ever modify. This may be
memory corruption somewhere that just happened to hit the filesystem. It's not
something I've seen reported anywhere else.
Does your system pass a memtest86 test? Do you have any prior oopses that you
could attach? Thanks.
These are the only two oopses logged. I'll fire up memtest tonight, but the
system is only a few months old and passed memtest when I built it.
It passed a full memtest86+ pass with no errors reported.
One pass gives very little confidence; I normally recommend a full overnight run
at the very least!
The fact that it just started locking up, and that the symptoms include
corruption in memory that ext3 never touches, means that we really need to
eliminate hardware concerns first here.
I knew I'd get that, but I couldn't leave it running overnight last night.
I have started memtest again on that system; I'll check it tomorrow.
memtest completed 52 passes with no errors (around 20 hours).
OK, can you please run "fsck -f" on the filesystems, record the output (e.g.
run it under "script"), and attach it if it shows any problems?
Jun 1 15:36:18 kosh kernel: EIP is at __mod_timer+0x1f9/0x6c5
is also a concern; it really looks like the call
is oopsing, and that timer is again something that's allocated just once when
the filesystem is mounted and never deallocated until umount. I have *NEVER*
seen this sort of thing before in ext3; the only
instances I've ever seen of the journal struct itself getting corrupted like
this have been down to bad hardware or random memory corruption by some other
It sounds like we may need a debug kernel to get to the bottom of who is doing this.
Okay, I've got 3 filesystems mounted:
The first two checked okay, but I got some errors on the third (which I let fsck
fix). I'll attach the output.
The /data fs is where my local mirrors of FC, FE, and livna live, which may be
related (I seemed to have crashes when rsync was running, but I can't correlate
them for sure). The file that had problems is not a mirrored file, however; I
haven't accessed it in a while.
Created attachment 115318
fsck -f output
I got another crash at the same place (during fairly heavy I/O on the fs with my
mirrors). I'll attach the oops in case there is more info you can get from it.
This is my main home PC. I have no problem running extra debugging if needed;
just let me know what I can do to help.
Created attachment 115325
Another crash oops log
Never mind; it must have been my computer acting flaky.
I blew out some dust, reflashed the BIOS down a rev, and it has been up without
a problem since. Unless I see something else, it must have been a fluke (weird
that it passed memtest though).
Sorry to have wasted your time on this one.
Well, my system actually crashed not long after I closed the bug. I've been
trying this and that, checking hardware, making sure nothing was overheating, etc.
Tonight, I've been attempting to transcode a video. Under 2.6.11-1.27_FC3 and
2.6.11-1.35_FC3, I get crashes (I also got crashes playing bzflag, usually when
I tried to quit). All the crashes under the 1.35 kernel show:
kernel BUG at mm/rmap.c:482!
invalid operand: 0000 [#1]
(I'll attach a full boot to crash log from a serial console)
The 1.27 kernel crashes are also "invalid operand" oopses, but they lack the "kernel BUG" line.
The only other kernel still available is the distribution kernel, 2.6.9-1.667.
I loaded and booted it, and I have not had another crash.
As a reminder, this is all running FC3 i386 on an Athlon64. I tried installing
FC4 x86_64 and poked at it a little, and saw some oddness there. If I turn on
"Cool & Quiet" in the BIOS, I get random segfaults during boot (sometimes it can't
even get to a text login). If I turn it off, powernow-k8 complains, but the
system seems to run (I didn't try anything heavy yet though).
Suggestions? If the 2.6.9-1.667 kernel didn't run just fine, I'd just write it
off to bad hardware, but I haven't been able to find anything wrong (and in some
testing, Win2000 seems to run okay on this box; I even did a little video
conversion there with no problem). memtest86+ still finds no problems (I
ran it some more last night).
Created attachment 116556
boot through crash log
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem. Please update to this new kernel, and
report whether or not it fixes your problem.
If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.
It appears that this has been fixed. I've been running 2.6.12-1.1372_FC3 for
several days now with no problems.