Red Hat Bugzilla – Bug 433309
GFS2: attempt to grow filesystem segment faults
Last modified: 2010-01-11 22:40:48 EST
Description of problem:
Extended an LVM volume from 200GB to 400GB. When I tried to grow the GFS2
volume, gfs2_grow segfaulted. The workaround was to run mkfs.gfs2 on the volume
again. That is OK for the Oracle testing I am doing, but not for production
machines. Rebooting the cluster did not resolve the issue.
Version-Release number of selected component (if applicable):
Happens every time I try to resize a volume.
Steps to Reproduce:
1. lvextend -L XX /dev/mapper/msa-archieve (XX = new size)
2. mount /archieve
3. gfs2_grow /archieve (segfaults)
Segfault messages from /var/log/messages:
Feb 18 09:37:47 et-virt08 kernel: gfs2_grow: segfault at 0000000c17de5868
rip 000000000040f026 rsp 00007fffc7b559d0 error 4
Feb 18 09:37:55 et-virt08 kernel: gfs2_grow: segfault at 0000000c16ed3868
rip 000000000040f026 rsp 00007fffad45b2e0 error 4
Feb 18 09:49:27 et-virt08 kernel: gfs2_grow: segfault at 0000000c02b65868
rip 000000000040f026 rsp 00007fff4793e7c0 error 4
The volume should be mountable at its new size (400GB, up from 200GB).
Additional info: Workaround was to mkfs.gfs2 the volume (no critical data on the
kernel version : 2.6.18-71.el5
gfs2 utilities : gfs2-utils-0.1.38-1.el5
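
The three segfault entries quoted in the report can be checked mechanically. The sketch below is not part of the original report. It assumes the two-line wrapped format shown above and the standard x86 page-fault error-code bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The names SEGFAULT_RE, decode_page_fault and parse_segfaults are made up for illustration.

```python
import re

# Matches the 2.6.18-era x86_64 segfault line as quoted in this report.
SEGFAULT_RE = re.compile(
    r"kernel: (?P<proc>[^:\s]+):\s+segfault at (?P<addr>[0-9a-f]+)\s+"
    r"rip (?P<rip>[0-9a-f]+)\s+rsp (?P<rsp>[0-9a-f]+)\s+error (?P<err>[0-9a-f]+)"
)


def decode_page_fault(err):
    """Decode the low three bits of an x86 page-fault error code."""
    cause = "protection violation" if err & 1 else "page not present"
    access = "write" if err & 2 else "read"
    mode = "user" if err & 4 else "kernel"
    return f"{mode}-mode {access}, {cause}"


def parse_segfaults(text):
    """Return one dict per segfault entry found in the log text."""
    # Each entry is wrapped across two lines in the report; collapse whitespace first.
    flat = " ".join(text.split())
    return [
        {
            "proc": m["proc"],
            "addr": int(m["addr"], 16),
            "rip": int(m["rip"], 16),
            "err": int(m["err"], 16),
        }
        for m in SEGFAULT_RE.finditer(flat)
    ]


# Log excerpt copied from the report.
LOG = """\
Feb 18 09:37:47 et-virt08 kernel: gfs2_grow: segfault at 0000000c17de5868
rip 000000000040f026 rsp 00007fffc7b559d0 error 4
Feb 18 09:37:55 et-virt08 kernel: gfs2_grow: segfault at 0000000c16ed3868
rip 000000000040f026 rsp 00007fffad45b2e0 error 4
Feb 18 09:49:27 et-virt08 kernel: gfs2_grow: segfault at 0000000c02b65868
rip 000000000040f026 rsp 00007fff4793e7c0 error 4
"""

faults = parse_segfaults(LOG)
for f in faults:
    print(f"{f['proc']}: addr={f['addr']:#x} rip={f['rip']:#x} "
          f"-> {decode_page_fault(f['err'])}")
print("distinct rip values:", sorted({hex(f["rip"]) for f in faults}))
```

All three entries decode to a user-mode read of an unmapped page at the same instruction pointer, which suggests (but does not prove) that gfs2_grow is failing at the same place each time.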
I was not able to recreate this failure using my latest code of
gfs2-kernel and gfs2 userland, which should be close to what's in 5.2.
Can I get the exact commands you're using to recreate the failure
and the resulting call stack from the failure? I want to know if
this only happens under certain conditions, like certain block sizes,
RG sizes, number of journals, etc. Also, please check if there are
any messages in dmesg. Here is what I did:
[root@roth-01 ../RHEL5/cluster/gfs2/mkfs]# lvcreate --name roth_lv -L 200G
Logical volume "roth_lv" created
[root@roth-01 ../RHEL5/cluster/gfs2/mkfs]# mkfs.gfs2 -O -t bobs_roth:test_gfs -X
-p lock_dlm -j 3 /dev/roth_vg/roth_lv
Expert mode: on
Device Size 200.00 GB (52428800 blocks)
Filesystem Size: 200.00 GB (52428798 blocks)
Resource Groups: 800
Locking Protocol: "lock_dlm"
Lock Table: "bobs_roth:test_gfs"
[root@roth-01 ../RHEL5/cluster/gfs2/mkfs]# /usr/sbin/lvresize -L +200G
Extending logical volume roth_lv to 400.00 GB
Logical volume roth_lv successfully resized
[root@roth-01 ../RHEL5/cluster/gfs2/mkfs]# mount -tgfs2 /dev/roth_vg/roth_lv
[root@roth-01 ../RHEL5/cluster/gfs2/mkfs]# gfs2_grow /mnt/gfs2
FS: Mount Point: /mnt/gfs2
FS: Device: /dev/mapper/roth_vg-roth_lv
FS: Size: 52428798 (0x31ffffe)
FS: RG size: 65535 (0xffff)
DEV: Size: 104857600 (0x6400000)
The file system grew by 204800MB.
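
As a sanity check on the numbers in the transcript above, the block counts can be converted back to sizes. This is only a sketch. The 4096-byte block size is an assumption (the transcript does not print it), chosen because it is consistent with 200 GB mapping to 52428800 blocks. The variable and function names are illustrative.

```python
# Assumed block size in bytes; not shown in the transcript.
BLOCK_SIZE = 4096

GIB = 1024 ** 3
MIB = 1024 ** 2


def blocks_to_gib(blocks, block_size=BLOCK_SIZE):
    """Convert a block count to GiB."""
    return blocks * block_size / GIB


old_dev_blocks = 52428800   # mkfs.gfs2: "Device Size 200.00 GB (52428800 blocks)"
fs_blocks = 52428798        # gfs2_grow: "FS: Size: 52428798 (0x31ffffe)"
new_dev_blocks = 104857600  # gfs2_grow: "DEV: Size: 104857600 (0x6400000)"

# Growth reported by gfs2_grow, recomputed from the device sizes.
grown_mib = (new_dev_blocks - old_dev_blocks) * BLOCK_SIZE // MIB

print("old device:", blocks_to_gib(old_dev_blocks), "GiB")
print("new device:", blocks_to_gib(new_dev_blocks), "GiB")
print("growth:", grown_mib, "MiB")
print("hex check:", hex(fs_blocks), hex(new_dev_blocks))
```

Under that assumption the device doubles from 200 to 400 GiB and the difference is 204800 MiB, matching the "grew by 204800MB" line printed by gfs2_grow.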
To reproduce this issue:
lvcreate --name archieve -L 200 /dev/mapper/msa-archieve
lvresize -L +200G /dev/mapper/msa-archieve
mount -t gfs2 /dev/mapper/msa-archieve /archive
/var/log/messages.2:Feb 8 14:02:15 et-virt08 kernel: multipathd:
segfault at 000000000000000a rip 000000356b06fa7d rsp 0000000008772220 error 4
Did you try this with DM Multipath? That is the difference I am seeing between
our two experiments.
The two differences between what you did and what I did are:
(1) You used /dev/mapper devices on your commands, (2) DM Multipath.
I tried to recreate this problem using the device mapper device
(/dev/mapper/whatever) on the commands and still didn't get it to fail.
In other words: This is looking more and more like a DM Multipath
problem, especially based on the segfault message posted in comment #3.
Perhaps the DM Multipath kernel module died? I'm adding Ben M. to the
cc list to get his input.
I'd still like to get what appears on the console dmesgs at the point
of failure so we can see the complete call stacks.
Are you still having this problem? Perhaps I can get access to your
test system and debug gfs2_grow manually from there.
Since GFS2 was pushed back, I am using the storage on another cluster. Let
me see if I have enough space so you can test it. If I have enough, send me
your personal email and I will let you know the details.
I'm waiting to hear back on this, but the test is impacted by the move
in Westford. Until then, I'm putting the bug record in NEEDINFO.
This has been in NEEDINFO for long enough that it can't reasonably be considered
high priority any more.
This still looks like a possible dm multipath problem to me. I'm
assuming this is a duplicate of bug #426030, but I have no way to
This bug has hit the six-month limit in NEEDINFO, so I'm closing it as
INSUFFICIENT_DATA. I'm assuming the problem will be pursued in bug
#426030. If the problem can be reproduced, we can always reopen this