Bug 433309

Summary: GFS2: attempt to grow filesystem segment faults
Product: Red Hat Enterprise Linux 5 Reporter: Tom Tracy <ttracy>
Component: gfs2-utils Assignee: Robert Peterson <rpeterso>
Status: CLOSED INSUFFICIENT_DATA QA Contact: GFS Bugs <gfs-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 5.2 CC: bmarzins, edamato
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-10-30 16:10:48 UTC Type: ---

Description Tom Tracy 2008-02-18 15:33:30 UTC
Description of problem:

Extended an LVM volume from 200GB to 400GB. When I tried to grow the GFS2 volume,
gfs2_grow segfaulted. The workaround was to run mkfs.gfs2 on the volume. That is
OK for the Oracle testing I am doing, but not for production machines. Rebooting
the cluster did not resolve the issue.




Version-Release number of selected component (if applicable):


How reproducible:

Happens every time I try to resize a volume.


Steps to Reproduce:

1. lvextend -L XX /dev/mapper/msa-archieve   (XX = new size)
2. mount /archieve
3. gfs2_grow /archieve   (segmentation fault)
  
Actual results:

Segmentation fault. From /var/log/messages:
Feb 18 09:37:47 et-virt08 kernel: gfs2_grow[9759]: segfault at 0000000c17de5868
rip 000000000040f026 rsp 00007fffc7b559d0 error 4
Feb 18 09:37:55 et-virt08 kernel: gfs2_grow[9963]: segfault at 0000000c16ed3868
rip 000000000040f026 rsp 00007fffad45b2e0 error 4
Feb 18 09:49:27 et-virt08 kernel: gfs2_grow[7776]: segfault at 0000000c02b65868
rip 000000000040f026 rsp 00007fff4793e7c0 error 4
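
The three log entries above share the same rip, which suggests the same faulting instruction each time. The sketch below shows one way to check that and to read the trailing "error N" value. It assumes the standard x86 page-fault error-code bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode) and ignores higher bits. The function names are hypothetical helpers, not part of gfs2-utils, and `grep -o` is assumed to be available.

```shell
#!/bin/sh
# Decode the low three bits of the "error N" field in an x86 kernel
# segfault message. Higher bits are ignored in this sketch.
decode_fault_error() {
    code=$1
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 4)) -ne 0 ]; then mode="user"; else mode="kernel"; fi
    echo "$mode-mode $access, $cause"
}

# Read log lines on stdin and print each distinct "rip <address>" once.
# A single line of output means every crash hit the same instruction.
unique_rips() {
    grep -o 'rip [0-9a-f][0-9a-f]*' | sort -u
}
```

For example, `decode_fault_error 4` reports a user-mode read of a page that is not present, and `grep gfs2_grow /var/log/messages | unique_rips` would list the distinct crash addresses.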

Expected results:
 Volume should be able to mount with new size (from 200GB to 400GB)


Additional info: Workaround was to mkfs.gfs2 the volume (no critical data on the
volume)

Comment 1 Tom Tracy 2008-02-18 18:30:49 UTC
kernel version : 2.6.18-71.el5

gfs2 utilities : gfs2-utils-0.1.38-1.el5


Comment 2 Robert Peterson 2008-02-18 22:48:32 UTC
I was not able to recreate this failure using my latest code of
gfs2-kernel and gfs2 userland, which should be close to what's in 5.2.
Can I get the exact commands you're using to recreate the failure
and the resulting call stack from the failure?  I want to know if
this only happens under certain conditions, like certain block sizes,
RG sizes, number of journals, etc.  Also, please check if there are
any messages in dmesg.  Here is what I did:

[root@roth-01 ../RHEL5/cluster/gfs2/mkfs]# lvcreate --name roth_lv -L 200G
/dev/roth_vg
  Logical volume "roth_lv" created
[root@roth-01 ../RHEL5/cluster/gfs2/mkfs]# mkfs.gfs2 -O -t bobs_roth:test_gfs -X
-p lock_dlm -j 3 /dev/roth_vg/roth_lv
Expert mode:               on
Device:                    /dev/roth_vg/roth_lv
Blocksize:                 4096
Device Size                200.00 GB (52428800 blocks)
Filesystem Size:           200.00 GB (52428798 blocks)
Journals:                  3
Resource Groups:           800
Locking Protocol:          "lock_dlm"
Lock Table:                "bobs_roth:test_gfs"

[root@roth-01 ../RHEL5/cluster/gfs2/mkfs]# /usr/sbin/lvresize -L +200G
/dev/roth_vg/roth_lv
  Extending logical volume roth_lv to 400.00 GB
  Logical volume roth_lv successfully resized
[root@roth-01 ../RHEL5/cluster/gfs2/mkfs]# mount -tgfs2 /dev/roth_vg/roth_lv
/mnt/gfs2
[root@roth-01 ../RHEL5/cluster/gfs2/mkfs]# gfs2_grow /mnt/gfs2
FS: Mount Point: /mnt/gfs2
FS: Device:      /dev/mapper/roth_vg-roth_lv
FS: Size:        52428798 (0x31ffffe)
FS: RG size:     65535 (0xffff)
DEV: Size:       104857600 (0x6400000)
The file system grew by 204800MB.
gfs2_grow complete.
[root@roth-01 ../RHEL5/cluster/gfs2/mkfs]# 
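
As a sanity check on the numbers in the transcript above, the block counts can be converted to sizes with shell arithmetic. This is only a sketch: `blocks_to_mib` is a hypothetical helper, and it assumes the 4096-byte block size that mkfs.gfs2 reported.

```shell
#!/bin/sh
# Convert a block count ($1) at a given block size in bytes ($2) to MiB.
blocks_to_mib() {
    echo $(( $1 * $2 / 1048576 ))
}

block_size=4096               # "Blocksize" reported by mkfs.gfs2
dev_blocks_before=52428800    # "Device Size" reported by mkfs.gfs2
dev_blocks_after=104857600    # "DEV: Size" reported by gfs2_grow (0x6400000)

echo "before: $(blocks_to_mib "$dev_blocks_before" "$block_size") MiB"   # before: 204800 MiB
echo "after:  $(blocks_to_mib "$dev_blocks_after" "$block_size") MiB"    # after:  409600 MiB
echo "growth: $(blocks_to_mib $((dev_blocks_after - dev_blocks_before)) "$block_size") MiB"   # growth: 204800 MiB
```

The computed growth of 204800 MiB matches the "grew by 204800MB" line printed by gfs2_grow.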


Comment 3 Tom Tracy 2008-02-19 14:04:05 UTC
To reproduce this issue

lvcreate --name archieve -L 200 /dev/mapper/msa-archieve
lvresize -L +200G /dev/mapper/msa-archieve
mount -t gfs2 /dev/mapper/msa-archieve /archieve
gfs2_grow /archieve
/var/log/messages.2:Feb  8 14:02:15 et-virt08 kernel: multipathd[18936]:
segfault at 000000000000000a rip 000000356b06fa7d rsp 0000000008772220 error 4

Did you try this with DM Multipath? That is the difference I see between
our two experiments.



Comment 4 Robert Peterson 2008-02-19 15:43:06 UTC
The two differences between what you did and what I did are:
(1) you used /dev/mapper devices in your commands, and (2) you are using
DM Multipath.

I tried to recreate this problem using the device mapper device
(/dev/mapper/whatever) on the commands and still didn't get it to fail.
In other words:  This is looking more and more like a DM Multipath
problem, especially based on the segfault message posted in comment #3.
Perhaps the DM Multipath kernel module died?  I'm adding Ben M. to the
cc list to get his input.

I'd still like to get what appears on the console and in dmesg at the
point of failure so we can see the complete call stacks.
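
One possible way to capture the point of failure from a script is sketched below. It assumes the usual sh/bash convention that an exit status above 128 means the command was killed by signal (status - 128), so a segfault (SIGSEGV, signal 11 on Linux) shows up as status 139. `run_and_report` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Run the given command and report whether it exited normally or was
# killed by a signal (for example SIGSEGV = 11).
run_and_report() {
    status=0
    "$@" || status=$?
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exit status $status"
    fi
}
```

A run such as `run_and_report gfs2_grow /archieve; dmesg | tail -n 50` would then record both the outcome and the most recent kernel messages at the time of the crash.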


Comment 5 Robert Peterson 2008-04-11 15:25:18 UTC
Tom,

Are you still having this problem?  Perhaps I can get access to your
test system and debug gfs2_grow manually from there.

Bob


Comment 6 Tom Tracy 2008-04-11 16:39:11 UTC
Bob
      Since GFS2 was pushed back, I am using the storage on another cluster. Let
me see if I have enough space so you can test it. If I do, send me your
personal email and I will let you know the details.

Tom

Comment 7 Robert Peterson 2008-04-30 14:03:52 UTC
I'm waiting to hear back on this, but the test is impacted by the move
in Westford.  Until then, I'm putting the bug record in NEEDINFO.


Comment 8 Steve Whitehouse 2008-06-25 13:16:40 UTC
This has been in NEEDINFO for long enough that it can't reasonably be considered
high priority any more.

Comment 10 Robert Peterson 2008-10-30 16:10:48 UTC
This still looks like a possible dm multipath problem to me.  I'm
assuming this is a duplicate of bug #426030, but I have no way to
prove it.

This bug has hit the six-month limit in NEEDINFO, so I'm closing it as
INSUFFICIENT_DATA.  I'm assuming the problem will be pursued in bug
#426030.  If the problem can be reproduced, we can always reopen this
bug record.