Description of problem:
Extended an LVM volume from 200GB to 400GB. When trying to grow the GFS2 file system, gfs2_grow segfaulted. The workaround was to mkfs.gfs2 the volume. That is acceptable for the Oracle testing I am doing, but not for production machines. Rebooting the cluster did not resolve the issue.

Version-Release number of selected component (if applicable):

How reproducible:
Happens every time a volume is resized.

Steps to Reproduce:
1. lvextend -L XX /dev/mapper/msa-archieve   (XX = new size)
2. mount /archieve
3. gfs2_grow /archieve - segmentation fault

Actual results:
Segmentation fault; from /var/log/messages:
Feb 18 09:37:47 et-virt08 kernel: gfs2_grow[9759]: segfault at 0000000c17de5868 rip 000000000040f026 rsp 00007fffc7b559d0 error 4
Feb 18 09:37:55 et-virt08 kernel: gfs2_grow[9963]: segfault at 0000000c16ed3868 rip 000000000040f026 rsp 00007fffad45b2e0 error 4
Feb 18 09:49:27 et-virt08 kernel: gfs2_grow[7776]: segfault at 0000000c02b65868 rip 000000000040f026 rsp 00007fff4793e7c0 error 4

Expected results:
The file system should mount with the new size (400GB instead of 200GB).

Additional info:
Workaround was to mkfs.gfs2 the volume (no critical data on the volume).
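For reference, the workaround mentioned under Additional info would look roughly like the following. This is only a sketch: the lock protocol, the mycluster:archieve lock table name, and the journal count are assumptions, not the exact options used here.

# unmount, re-create the file system, and remount (destroys existing data)
umount /archieve
mkfs.gfs2 -p lock_dlm -t mycluster:archieve -j 2 /dev/mapper/msa-archieve   # lock table and -j are placeholders
mount -t gfs2 /dev/mapper/msa-archieve /archieve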
kernel version: 2.6.18-71.el5
gfs2 utilities: gfs2-utils-0.1.38-1.el5
I was not able to recreate this failure using my latest code of gfs2-kernel and gfs2 userland, which should be close to what's in 5.2. Can I get the exact commands you're using to recreate the failure and the resulting call stack from the failure? I want to know if this only happens under certain conditions, like certain block sizes, RG sizes, number of journals, etc. Also, please check if there are any messages in dmesg.

Here is what I did:

[root@roth-01 ../RHEL5/cluster/gfs2/mkfs]# lvcreate --name roth_lv -L 200G /dev/roth_vg
  Logical volume "roth_lv" created
[root@roth-01 ../RHEL5/cluster/gfs2/mkfs]# mkfs.gfs2 -O -t bobs_roth:test_gfs -X -p lock_dlm -j 3 /dev/roth_vg/roth_lv
Expert mode: on
Device:            /dev/roth_vg/roth_lv
Blocksize:         4096
Device Size        200.00 GB (52428800 blocks)
Filesystem Size:   200.00 GB (52428798 blocks)
Journals:          3
Resource Groups:   800
Locking Protocol:  "lock_dlm"
Lock Table:        "bobs_roth:test_gfs"
[root@roth-01 ../RHEL5/cluster/gfs2/mkfs]# /usr/sbin/lvresize -L +200G /dev/roth_vg/roth_lv
  Extending logical volume roth_lv to 400.00 GB
  Logical volume roth_lv successfully resized
[root@roth-01 ../RHEL5/cluster/gfs2/mkfs]# mount -t gfs2 /dev/roth_vg/roth_lv /mnt/gfs2
[root@roth-01 ../RHEL5/cluster/gfs2/mkfs]# gfs2_grow /mnt/gfs2
FS: Mount Point: /mnt/gfs2
FS: Device: /dev/mapper/roth_vg-roth_lv
FS: Size: 52428798 (0x31ffffe)
FS: RG size: 65535 (0xffff)
DEV: Size: 104857600 (0x6400000)
The file system grew by 204800MB.
gfs2_grow complete.
[root@roth-01 ../RHEL5/cluster/gfs2/mkfs]#
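For what it's worth, one way to capture the call stack requested above is to let gfs2_grow dump core and then pull a backtrace out of the core file with gdb. A rough sketch, assuming core files land in the current directory (the actual core file name depends on the kernel's core_pattern setting):

# allow core dumps in this shell, then reproduce the crash
ulimit -c unlimited
gfs2_grow /archieve
# load the core into gdb and print the backtrace
gdb $(which gfs2_grow) core      # core file may be named core.<pid>
(gdb) bt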
To reproduce this issue:

lvcreate --name archieve -L 200 /dev/mapper/msa-archieve
lvresize -L +200G /dev/mapper/msa-archieve
mount -t gfs2 /dev/mapper/msa-archieve /archieve
gfs2_grow /archieve

/var/log/messages.2:Feb 8 14:02:15 et-virt08 kernel: multipathd[18936]: segfault at 000000000000000a rip 000000356b06fa7d rsp 0000000008772220 error 4

Did you try this with DM Multipath? That is the difference I am seeing between our two experiments.
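To check whether DM Multipath is actually in the picture for this device, something like the following should show whether the LV sits on multipath maps. This is a sketch; the VG name msa and LV name archieve are taken from the commands above.

# list active multipath maps and their paths
multipath -ll
# show which physical devices back the LVs in the msa volume group
lvs -o +devices msa
# show the device-mapper table for the LV itself
dmsetup table msa-archieve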
The two differences between what you did and what I did are: (1) you used /dev/mapper devices in your commands, and (2) DM Multipath. I tried to recreate this problem using the device mapper device (/dev/mapper/whatever) in the commands and still didn't get it to fail. In other words, this is looking more and more like a DM Multipath problem, especially based on the segfault message posted in comment #3. Perhaps the DM Multipath kernel module died? I'm adding Ben M. to the cc list to get his input. I'd still like to see what appears in the console/dmesg output at the point of failure so we can get the complete call stacks.
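A simple way to capture the requested output right after reproducing the failure (the file names below are only illustrative):

# grab kernel messages immediately after the segfault
dmesg > /tmp/dmesg-after-gfs2_grow.txt
# pull the matching entries out of syslog as well
grep gfs2_grow /var/log/messages > /tmp/messages-gfs2_grow.txt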
Tom,

Are you still having this problem? Perhaps I can get access to your test system and debug gfs2_grow manually from there.

Bob
Bob,

Since GFS2 was pushed back, I am using the storage on another cluster. Let me see if I have enough space so you can test it... If I have enough, send me your personal email and I will let you know the details.

Tom
I'm waiting to hear back on this, but the test is impacted by the move in Westford. Until then, I'm putting the bug record in NEEDINFO.
This has been in NEEDINFO for long enough that it can't reasonably be considered high priority any more.
This still looks like a possible dm multipath problem to me. I'm assuming this is a duplicate of bug #426030, but I have no way to prove it. This bug has hit the six-month limit in NEEDINFO, so I'm closing it as INSUFFICIENT_DATA. I'm assuming the problem will be pursued in bug #426030. If the problem can be reproduced, we can always reopen this bug record.