Created attachment 307018 [details] gfs2 oops
fs/gfs2/rgrp.c:200

    if (((*plong) & LBITMASK) != lskipval)
            break;

plong is in %rdx and == ffff81002690dffb. We fall off the end of the page and onto the next one, which is unmapped (*plong spans two pages).
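To make the arithmetic behind that observation explicit, here is a small sketch. It assumes 4096-byte pages and an 8-byte unsigned long (the usual x86_64 values); the helper name crosses_page is made up for illustration and is not anything from the kernel.

```python
PAGE_SIZE = 4096   # assumed x86_64 base page size
WORD_SIZE = 8      # assumed sizeof(unsigned long) on x86_64


def crosses_page(addr: int, size: int = WORD_SIZE) -> bool:
    """Return True if a read of `size` bytes starting at `addr` spans two pages."""
    first_page = addr // PAGE_SIZE
    last_page = (addr + size - 1) // PAGE_SIZE
    return first_page != last_page


plong = 0xFFFF81002690DFFB       # value of %rdx reported in the oops

print(hex(plong % PAGE_SIZE))    # offset within the page: 0xffb
print(plong % WORD_SIZE)         # distance past the last 8-byte boundary: 3
print(crosses_page(plong))       # True: the last 3 bytes land on the next page
```

So the pointer sits 5 bytes before the end of its page and is not 8-byte aligned, which is why an 8-byte load through it touches the following page.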
Hey Chuck, can I get you to save off the metadata for this file system? I'm pretty sure I see what the problem is, but I don't want to make any code changes until I can recreate the problem so I can prove I've fixed it. I've been trying to recreate this for a while now and haven't had any luck.

To save off the metadata do something like this:

gfs2_edit savemeta /dev/your/device /tmp/savemeta.448866
bzip2 /tmp/savemeta.448866

Then attach the resulting .bz2 file to the bugzilla. Thanks.
I suspect Chunk has been on vacation or something because he hasn't been seen on irc this whole week. I've been trying a long time to recreate this failure and have not been successful. I've developed scenarios that fill up the RGs up to a certain point, then I push it over the boundary, but no failure (yet).

Here is a sequence that fills up the bitmaps up to the end of rg 3, except for the last 12 bytes:

vgchange -an exxon_vg
lvremove /dev/exxon_vg/exxon_lv
lvcreate --name exxon_lv -l 63488 /dev/exxon_vg
mkfs.gfs2 -X -b4096 -r62 -O -j1 -p lock_nolock /dev/exxon_vg/exxon_lv
mount -tgfs2 /dev/exxon_vg/exxon_lv /mnt/gfs2
dd if=/dev/zero of=/mnt/gfs2/filler bs=4096 count=31004
for i in `seq 1 3096` ; do touch /mnt/gfs2/c$i ; done
umount /mnt/gfs2
mount -tgfs2 /dev/exxon_vg/exxon_lv /mnt/gfs2
for i in `seq 3101 3200` ; do touch /mnt/gfs2/c$i ; done
rm /mnt/gfs2/c3110
umount /mnt/gfs2

This scenario fills all but 4 bytes of rg 3:

mkfs.gfs2 -X -b4096 -r62 -O -j1 -p lock_nolock /dev/exxon_vg/exxon_lv
mount -tgfs2 /dev/exxon_vg/exxon_lv /mnt/gfs2
dd if=/dev/zero of=/mnt/gfs2/filler bs=4096 count=31004
for i in `seq 1 3096` ; do touch /mnt/gfs2/c$i ; done
umount /mnt/gfs2
mount -tgfs2 /dev/exxon_vg/exxon_lv /mnt/gfs2
for i in `seq 3101 3232` ; do touch /mnt/gfs2/c$i ; done
rm /mnt/gfs2/c3110
umount /mnt/gfs2

Now you're probably wondering why I'm playing these seemingly unnecessary games in the commands above. The reason is simple: There's something "fishy" with our block allocator. (Not to say it's wrong; it just doesn't behave as I would have expected). If I just use dd to push out a bunch of data to the file system, it won't fill up the bitmaps to the end. It always seems to leave a good chunk of 0x0c bytes or more free at the end. If I unmount the file system and do a bunch of single-file touches, it will, in fact, fill out those last several blocks of the bitmap.
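As a sanity check on the "12 bytes" versus "4 bytes" figures in the two scenarios above, here is a short sketch of the bitmap arithmetic. GFS2 bitmaps use 2 bits of state per block (GFS2_BIT_SIZE), so one bitmap byte covers 4 blocks (GFS2_NBBY). The sketch assumes that each touch in the final loop consumes exactly one block for the new inode; that is an assumption, though it matches the numbers. The helper bitmap_bytes is hypothetical.

```python
GFS2_BIT_SIZE = 2                # bits of bitmap state per block
GFS2_NBBY = 8 // GFS2_BIT_SIZE   # blocks described by one bitmap byte (4)


def bitmap_bytes(blocks: int) -> int:
    """Bitmap bytes needed to describe `blocks` blocks, rounded up."""
    return -(-blocks // GFS2_NBBY)


# Files created by the last touch loop in each scenario (seq is inclusive).
first_run = len(range(3101, 3200 + 1))    # 100 files
second_run = len(range(3101, 3232 + 1))   # 132 files

# Assumption: one block (the dinode) per touched file.
extra_blocks = second_run - first_run

print(extra_blocks)                       # 32
print(bitmap_bytes(extra_blocks))         # 8 more bitmap bytes consumed
print(12 * GFS2_NBBY, 4 * GFS2_NBBY)      # 48 16 blocks left free
```

Under that assumption the 32 extra files account for the 8-byte difference between the two scenarios (12 bytes free versus 4 bytes free).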
After way too much tedium in my analysis, to help in this investigation, I made some much-needed enhancements to the gfs2_edit tool. For example, I can now go directly to RG 3 in interactive mode with this command:

gfs2_edit -s "rg 3" /dev/exxon_vg/exxon_lv

I also added the ability to enter a keyword, such as "rg 4", in the block number field (at the top) to jump there directly. That saves me a ton of keystrokes traipsing from superblock to master directory, master directory to rindex, and rindex to rg 4.

I was originally convinced that this problem had to do with the line of code mentioned in comment #2. However, that line should only be executed if the pointer is aligned on a proper long int boundary (an 8-byte boundary on x86_64), so I can't see how it could possibly get there with an unaligned pointer. I may have to wait until Chuck gets back and gives me either a scenario to recreate the problem or a copy of his metadata.
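For readers unfamiliar with the pattern being discussed, here is a deliberately simplified sketch of a scan that reads a word at a time only once the pointer is aligned. It is not the code in fs/gfs2/rgrp.c: it compares whole bytes rather than 2-bit block states, and the function name find_first_not and its parameters are invented for illustration. The point is the invariant: word-sized reads happen only at 8-byte-aligned addresses and only while a full word remains in the buffer.

```python
WORD = 8  # assumed sizeof(unsigned long) on x86_64


def find_first_not(buf: bytes, base_addr: int, skip_byte: int) -> int:
    """Return the index of the first byte != skip_byte, or len(buf) if none.

    `base_addr` stands in for the address of buf[0], so alignment can be
    reasoned about the way a C pointer would be.
    """
    i, n = 0, len(buf)

    # Head: go byte by byte until the address is word aligned.
    while i < n and (base_addr + i) % WORD != 0:
        if buf[i] != skip_byte:
            return i
        i += 1

    # Middle: aligned word-at-a-time skip, only while a full word fits.
    skip_word = bytes([skip_byte]) * WORD
    while i + WORD <= n:
        if buf[i:i + WORD] != skip_word:
            break
        i += WORD

    # Tail (or the word that differed): finish byte by byte.
    while i < n:
        if buf[i] != skip_byte:
            return i
        i += 1
    return n


buf = b"\x55" * 20 + b"\x00" + b"\x55" * 3
print(find_first_not(buf, base_addr=3, skip_byte=0x55))   # 20
```

If either guard is wrong (the head loop stops on a misaligned address, or the middle loop runs with fewer than 8 bytes left), a word read can run past the end of the buffer, which is the kind of overrun the oops shows.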
Sorry for the typo--fingers flying too fast; I meant Chuck.
Created attachment 308854 [details] metadata
I examined the metadata and didn't find anything unusual. I managed to create a GFS2 file system with the EXACT same resource group and bitmap layout as Dave's metadata by doing this:

[root@exxon-01 ~]# fdisk /dev/sdb

The number of cylinders for this disk is set to 60799.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-60799, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-60799, default 60799): +1014075K

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
[root@exxon-01 ~]# mkfs.gfs2 -O -p lock_nolock /dev/sdb1
Device:                    /dev/sdb1
Blocksize:                 4096
Device Size                0.97 GB (255024 blocks)
Filesystem Size:           0.97 GB (255021 blocks)
Journals:                  1
Resource Groups:           4
Locking Protocol:          "lock_nolock"
Lock Table:                ""

Then I mounted it, and compiled and ran fsx from:
http://www.codemonkey.org.uk/projects/fsx/fsx-linux.c

No errors for me. Next, I restored Dave's metadata over top of this device, remounted, and reran fsx. Again, no problem for me.

The one thing I haven't tried yet is running the file system over md raid0, which is apparently what Dave was using. I'll try that next.
I set up a software RAID0 device of a similar size. I had a hard time getting mkfs.gfs2 to use the same RG boundaries as Dave's metadata, so I coded up a patch for bug #450764 which allowed me to specify the block size I required to mkfs.gfs2. That enabled me to create an MD device with the exact same configuration as Dave's. Then I ran fsx on it, but it ran for nearly an hour without failing, on RHEL5.

Next, I restored Dave's metadata over the top of that same MD device. After a reboot, I ran fsx again for another hour, but it still did not fail.

I tried using the latest and greatest nmw git tree, but it no longer appears to contain a lock_nolock module, and so it doesn't want to mount. I get this message from the mount helper:

./mount.gfs2: error mounting /dev/md0 on /mnt/gfs2: No such device

even though /dev/md0 is a valid device at that point.

I tried to run it on a kernel built from kernel-2.6.26-0.54.rc4.git5.fc10.src.rpm (compiled from the source rpm), which should at least be close to what Dave's running. Unfortunately, it panics the kernel at bootup. I verified that rgrp.c is the same as the one in the nmw tree.
I recreated this problem by scratching roth-02 to F9, installing a rawhide kernel and gfs2-utils and running fsx on an MD device. I'm just using partitions on a local hard disk: sdb1 & sdb2.
Created attachment 309757 [details]
Patch to fix the problem

Tested on roth-02 with the same scenario I could reliably recreate.
This patch has been posted upstream so I'm closing the bug as UPSTREAM.