448866 – gfs2: BUG: unable to handle kernel paging request at ffff81002690e000

Summary: gfs2: BUG: unable to handle kernel paging request at ffff81002690e000

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	high
Severity:	low
Target Milestone:	---
Assignee:	Robert Peterson
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-05-29 04:54 UTC by Chuck Ebbert
Modified:	2008-06-24 22:11 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-06-24 22:11:44 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
gfs2 oops (5.78 KB, text/plain) 2008-05-29 04:54 UTC, Chuck Ebbert
metadata (376.75 KB, application/octet-stream) 2008-06-10 19:55 UTC, Dave Jones
Patch to fix the problem (598 bytes, patch) 2008-06-18 16:27 UTC, Robert Peterson

Description Chuck Ebbert 2008-05-29 04:54:48 UTC


Comment 1 Chuck Ebbert 2008-05-29 04:54:48 UTC

Created attachment 307018
gfs2 oops

Comment 2 Chuck Ebbert 2008-05-29 05:04:29 UTC

fs/gfs2/rgrp.c:200
                if (((*plong) & LBITMASK) != lskipval)
                        break;

plong is in %rdx and == ffff81002690dffb

we fall off the end of the page and onto the next one which is unmapped
(*plong spans two pages)

Comment 4 Robert Peterson 2008-06-03 17:14:16 UTC

Hey Chuck, can I get you to save off the metadata for this file system?
I'm pretty sure I see what the problem is, but I don't want to make any
code changes until I can recreate the problem so I can prove I've fixed it.
I've been trying to recreate this for a while now and haven't had any luck.
To save off the metadata do something like this:

gfs2_edit savemeta /dev/your/device /tmp/savemeta.448866
bzip2 /tmp/savemeta.448866

Then attach the resulting .bz2 file to the bugzilla.  Thanks.

Comment 5 Robert Peterson 2008-06-06 02:39:43 UTC

I suspect Chunk has been on vacation or something because he hasn't been
seen on irc this whole week.

I've been trying a long time to recreate this failure and have not been
successful.  I've developed scenarios that fill up the RGs up to a
certain point, then I push it over the boundary, but no failure (yet).
Here is a sequence that fills up the bitmaps up to the end of rg 3,
except for the last 12 bytes:

vgchange -an exxon_vg
lvremove /dev/exxon_vg/exxon_lv
lvcreate --name exxon_lv -l 63488 /dev/exxon_vg
mkfs.gfs2 -X -b4096 -r62 -O -j1 -p lock_nolock /dev/exxon_vg/exxon_lv
mount -tgfs2 /dev/exxon_vg/exxon_lv /mnt/gfs2
dd if=/dev/zero of=/mnt/gfs2/filler bs=4096 count=31004
for i in `seq 1 3096` ; do touch /mnt/gfs2/c$i ; done
umount /mnt/gfs2
mount -tgfs2 /dev/exxon_vg/exxon_lv /mnt/gfs2
for i in `seq 3101 3200` ; do touch /mnt/gfs2/c$i ; done
rm /mnt/gfs2/c3110
umount /mnt/gfs2

This scenario fills all but 4 bytes of rg 3:

mkfs.gfs2 -X -b4096 -r62 -O -j1 -p lock_nolock /dev/exxon_vg/exxon_lv
mount -tgfs2 /dev/exxon_vg/exxon_lv /mnt/gfs2
dd if=/dev/zero of=/mnt/gfs2/filler bs=4096 count=31004
for i in `seq 1 3096` ; do touch /mnt/gfs2/c$i ; done
umount /mnt/gfs2
mount -tgfs2 /dev/exxon_vg/exxon_lv /mnt/gfs2
for i in `seq 3101 3232` ; do touch /mnt/gfs2/c$i ; done
rm /mnt/gfs2/c3110
umount /mnt/gfs2

Now you're probably wondering why I'm playing these seemingly 
unnecessary games in the commands above.  The reason is simple:
There's something "fishy" with our block allocator.  (Not to say
it's wrong; it just doesn't behave as I would have expected).  If
I just use dd to push out a bunch of data to the file system, it
won't fill up the bitmaps to the end.  It always seems to leave a
good chunk of 0x0c bytes or more free at the end.  If I unmount the
file system and do a bunch of single-file touches, it will, in fact,
fill out those last several blocks of the bitmap.

After way too much tedium in my analysis, I made some much-needed
enhancements to the gfs2_edit tool to help with this investigation.
For example, I can now go directly to RG 3 in interactive mode with
this command:

gfs2_edit -s "rg 3" /dev/exxon_vg/exxon_lv

I also added the ability to enter a keyword, such as "rg 4" in the
block number field (at the top) to jump there directly.  That saves me
a ton of keystrokes traipsing from superblock to master directory,
master directory to rindex, and rindex to rg 4.

I was originally convinced that this problem had to do with the lines
of code mentioned in comment #2.  However, that line of code should
only be executed if the pointer is aligned on a proper long int
boundary (should be 8-byte boundary on x86_64), so I can't see how
it could possibly get there.

I may have to wait until Chuck gets back and gives me either a
scenario to recreate the problem or a copy of his metadata.

Comment 6 Robert Peterson 2008-06-06 02:42:13 UTC

Sorry for the typo--fingers flying too fast; I meant Chuck.

Comment 7 Dave Jones 2008-06-10 19:55:06 UTC

Created attachment 308854
metadata

Comment 8 Robert Peterson 2008-06-10 22:27:56 UTC

I examined the metadata and didn't find anything unusual.  I managed
to create a GFS2 file system with the EXACT same resource group and
bitmap layout as Dave's metadata by doing this:

[root@exxon-01 ~]# fdisk /dev/sdb

The number of cylinders for this disk is set to 60799.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-60799, default 1): 
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-60799, default 60799): +1014075K

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
[root@exxon-01 ~]# mkfs.gfs2 -O -p lock_nolock /dev/sdb1
Device:                    /dev/sdb1
Blocksize:                 4096
Device Size                0.97 GB (255024 blocks)
Filesystem Size:           0.97 GB (255021 blocks)
Journals:                  1
Resource Groups:           4
Locking Protocol:          "lock_nolock"
Lock Table:                ""

Then I mounted it, and compiled and ran fsx from:
http://www.codemonkey.org.uk/projects/fsx/fsx-linux.c
No errors for me.

Next, I restored Dave's metadata over top of this device, remounted,
and reran fsx.  Again, no problem for me.
The one thing I haven't tried yet is running the file system over
md raid0, which is apparently what Dave was using.  I'll try that next.

Comment 9 Robert Peterson 2008-06-11 22:21:02 UTC

I set up a software RAID0 device of a similar size.  I had a hard time
getting mkfs.gfs2 to use the same RG boundaries as Dave's metadata, so I
coded up a patch for bug #450764 which allowed me to specify the block
size I required to mkfs.gfs2.  That enabled me to create an MD device
with the exact same configuration as Dave.  Then I ran fsx on it, but
it ran for nearly an hour without failing, on RHEL5.

Next, I restored Dave's metadata over the top of that same MD device.
After a reboot, I ran fsx again for another hour, but it still did not
fail.

I tried using the latest and greatest nmw git tree, but it no longer
appears to contain a lock_nolock module, and so it doesn't want to
mount.  I get this message from the mount helper:
./mount.gfs2: error mounting /dev/md0 on /mnt/gfs2: No such device
even though /dev/md0 is a valid device at that point.

I tried to run it on a kernel-2.6.26-0.54.rc4.git5.fc10.src.rpm kernel
(compiled from the source rpm), which should at least be close to what
Dave's running.  Unfortunately, it panics the kernel at bootup.
I verified that rgrp.c is the same as the one in the nmw tree.

Comment 10 Robert Peterson 2008-06-16 22:52:57 UTC

I recreated this problem by scratching roth-02 to F9, installing a
rawhide kernel and gfs2-utils and running fsx on an MD device.
I'm just using partitions on a local hard disk: sdb1 & sdb2.

Comment 11 Robert Peterson 2008-06-18 16:27:20 UTC

Created attachment 309757
Patch to fix the problem

Tested on roth-02 with the same scenario I could reliably recreate.

Comment 12 Robert Peterson 2008-06-24 22:11:44 UTC

This patch has been posted upstream so I'm closing the bug as UPSTREAM.
