159364 – Crash in 2.6.11-1.14_FC3: getblk(): invalid block size 1226845299 requested

Summary: Crash in 2.6.11-1.14_FC3: getblk(): invalid block size 1226845299 requested

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	3
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2005-06-01 22:36 UTC by Chris Adams
Modified:	2007-11-30 22:11 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2005-07-21 01:25:10 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
crash log (9.21 KB, text/plain) 2005-06-01 22:36 UTC, Chris Adams
2.6.11-1.27_FC3 oops (2.39 KB, text/plain) 2005-06-01 23:36 UTC, Chris Adams
fsck -f output (2.71 KB, text/plain) 2005-06-10 23:38 UTC, Chris Adams
Another crash oops log (3.58 KB, text/plain) 2005-06-11 16:32 UTC, Chris Adams
boot through crash log (13.75 KB, text/plain) 2005-07-10 03:26 UTC, Chris Adams

Description Chris Adams 2005-06-01 22:36:13 UTC

I'm running an Athlon 64 (with a 32-bit OS, though).  It has locked up several
times lately, and this time it logged something before freezing (I'll attach the
log).  I'm also updating my system to 2.6.11-1.27_FC3 in case the update fixes
this (although I don't see any similar Bugzilla reports offhand).

Comment 1 Chris Adams 2005-06-01 22:36:13 UTC

Created attachment 115068 [details]
crash log

Comment 2 Chris Adams 2005-06-01 23:35:59 UTC

2.6.11-1.27_FC3 froze up too while rsync was running (mirroring today's release
of Fedora Extras 4).  I think rsync was running when it froze up earlier today.

Oops attached.

Comment 3 Chris Adams 2005-06-01 23:36:33 UTC

Created attachment 115070 [details]
2.6.11-1.27_FC3 oops

Comment 4 Stephen Tweedie 2005-06-06 16:04:49 UTC

On the face of it these ones are of the "can't happen!" type, which is
confusing:  we're calling

	bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize);

and journal->j_blocksize is a constant that we never, ever modify.  This may be
memory corruption somewhere that just happened to hit the filesystem.  It's not
something I've seen reported anywhere else.

Does your system pass a memtest86 test?  Do you have any prior oopses that you
could attach?  Thanks.

Comment 5 Chris Adams 2005-06-07 02:03:30 UTC

These are the only two oopses logged.  I'll fire up memtest tonight, but the
system is only a few months old and passed memtest when I built it.

Comment 6 Chris Adams 2005-06-07 02:39:13 UTC

It passed a full memtest86+ pass with no errors reported.

Comment 7 Stephen Tweedie 2005-06-07 10:07:53 UTC

One pass gives very little confidence; I normally recommend a full overnight run
at the very least!

The fact that it just started locking up, and that the symptoms include
corruption in memory that ext3 never touches, mean that we really need to
eliminate hardware concerns first here.

Comment 8 Chris Adams 2005-06-08 02:30:34 UTC

I knew I'd get that, but I couldn't leave it running overnight last
night.

I have started memtest again on that system; I'll check it tomorrow
after work.

Comment 9 Chris Adams 2005-06-08 22:43:33 UTC

memtest made 52 passes with no errors (around 20 hours).

Comment 10 Stephen Tweedie 2005-06-09 08:57:26 UTC

OK, can you please run "fsck -f" on the filesystems and record the output (e.g.
run it under "script"), and attach that if it shows any problems?

Jun  1 15:36:18 kosh kernel: EIP is at __mod_timer+0x1f9/0x6c5

is also a concern; it really looks like the call

	add_timer(journal->j_commit_timer);

is oopsing, and that timer is again something that's allocated just once when
the filesystem is mounted, and whose memory is never deallocated until
umount.  I have *NEVER* seen this sort of thing before in ext3; the only
instances I've ever seen of the journal struct itself getting corrupted like
this have been down to bad hardware or random memory corruption by some other
driver.

It sounds like we may need a debug kernel to get to the bottom of who is doing this.

Comment 11 Chris Adams 2005-06-10 23:37:19 UTC

Okay, I've got 3 filesystems mounted:

/: /dev/kosh32/root
/boot: LABEL=fc32boot
/data: LABEL=data

The first two checked okay, but I got some errors on the third (which I let fsck
fix).  I'll attach the output.

The /data fs is where my local mirror of FC, FE, and livna live, which may be
related (I seemed to have crashes when rsync was running, but I can't correlate
them for sure).  The file that had problems is not a mirrored file however; I
haven't accessed it in a while.

Comment 12 Chris Adams 2005-06-10 23:38:02 UTC

Created attachment 115318 [details]
fsck -f output

Comment 13 Chris Adams 2005-06-11 16:31:20 UTC

I got another crash at the same place (during fairly heavy I/O on the fs with my
mirrors).  I'll attach the oops in case there is more info you can get from it.

This is my main home PC.  I have no problem running extra debugging if needed;
just let me know what I can do to help.

Comment 14 Chris Adams 2005-06-11 16:32:19 UTC

Created attachment 115325 [details]
Another crash oops log

Comment 15 Chris Adams 2005-06-21 02:13:27 UTC

Never mind; it must have been my computer acting flaky.

I blew out some dust, reflashed the BIOS down a rev, and it has been up without
a problem since.  Unless I see something else, it must have been a fluke (weird
that it passed memtest though).

Sorry to have wasted your time on this one.

Comment 16 Chris Adams 2005-07-10 03:26:08 UTC

Well, my system actually crashed not long after I closed the bug.  I've been
trying this and that: checking hardware, making sure nothing was overheating, etc.

Tonight, I've been attempting to transcode a video.  Under 2.6.11-1.27_FC3 and
2.6.11-1.35_FC3, I get crashes (I also got crashes playing bzflag, usually when
I tried to quit).  All the crashes under the 35 kernel are:

kernel BUG at mm/rmap.c:482! 
invalid operand: 0000 [#1]
(I'll attach a full boot to crash log from a serial console)

The 27 kernel's crashes are also "invalid operand" but don't have the "kernel BUG" message.

The only other kernel still available is the distribution kernel, 2.6.9-1.667. 
I loaded and booted it, and I have not had another crash.

As a reminder, this is all running FC3 i386 on an Athlon64.  I tried installing
FC4 x86_64 and poked at it a little, and saw some oddness there.  If I turn on
"Cool & Quiet" in the BIOS, I get random segfaults during boot (sometimes it can't
even get to a text login).  If I turn it off, powernow-k8 complains, but the
system seems to run (I didn't try anything heavy yet though).

If the 2.6.9-1.667 kernel didn't run fine, I'd just write it
off to bad hardware, but I haven't been able to find anything wrong (and in some
testing, Win2000 seems to run okay on this box; I did a little bit of video
conversion there even with no problem).  memtest86+ still finds no problems (I
ran it some more last night).

Suggestions?

Comment 17 Chris Adams 2005-07-10 03:26:56 UTC

Created attachment 116556 [details]
boot through crash log

Comment 19 Dave Jones 2005-07-15 19:02:27 UTC

An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.

Comment 20 Chris Adams 2005-07-21 01:25:10 UTC

It appears that this has been fixed.  I've been running 2.6.12-1.1372_FC3 for
several days now with no problems.

Note You need to log in before you can comment on or make changes to this bug.