Bug 723672 - LV appears to be corrupt after resizing a LV when moving package to different cluster node
Summary: LV appears to be corrupt after resizing a LV when moving package to different cluster node
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: lvm2
Version: 5.6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Assignee: LVM and device-mapper development team
QA Contact: Corey Marthaler
URL:
Whiteboard:
Depends On:
Blocks: 1049888
 
Reported: 2011-07-20 19:20 UTC by Mark McDonald
Modified: 2014-02-05 15:43 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-05 15:43:38 UTC
Target Upstream Version:


Attachments:

Description Mark McDonald 2011-07-20 19:20:43 UTC
Description of problem:
After resizing an MCSG package LV, the LV appears to be corrupt and requires an fsck when MCSG tries to start the package on another cluster node.

Version-Release number of selected component (if applicable):


How reproducible:
Reproducible

Steps to Reproduce:
1. Resize an MCSG package LV.
2. Halt the package.
3. Start the package on another cluster node.
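A minimal command-level sketch of the sequence above, assuming an ext3 LV that is grown on the node currently owning the package (the VG/LV names come from the log below; the size, node name, and the Serviceguard commands are assumptions, not taken from this report):

   # On the node that currently owns the package (size and names are examples):
   lvextend -L +500M /dev/vg_oracle_SID/lv_3    # grow the logical volume
   resize2fs /dev/vg_oracle_SID/lv_3            # grow the ext3 filesystem to match
   # Move the package to another cluster node (Serviceguard CLI, assumed):
   cmhaltpkg oracle_SID                         # halt the package where it is running
   cmrunpkg -n <other_node> oracle_SID          # start it on another node; the fsck then fails there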
  
Actual results:
from the package log file (on the cluster node starting the package):
Jul 16 12:30:32 root@NodeName master_control_script.sh[9313]: ###### Starting package oracle_SID ######
Jul 16 12:30:32 root@NodeName volume_group.sh[9395]: Attempting to addtag to vg vg_oracle_SID...
Jul 16 12:30:33 root@NodeName volume_group.sh[9395]: addtag was successful on vg vg_oracle_SID.
Jul 16 12:30:33 root@NodeName volume_group.sh[9395]: Activating volume group vg_oracle_SID .
Jul 16 12:30:33 root@NodeName filesystem.sh[9471]: Checking filesystems:
   /dev/vg_oracle_SID/lv_1
   /dev/vg_oracle_SID/lv_2
   /dev/vg_oracle_SID/lv_3
   /dev/vg_oracle_SID/lv_4
e2fsck 1.39 (29-May-2006)
/dev/vg_oracle_SID/lv_1: clean, 26/640000 files, 607201/1280000 blocks
e2fsck 1.39 (29-May-2006)
/dev/vg_oracle_SID/lv_2: clean, 19/1281696 files, 1057553/2560000 blocks
e2fsck 1.39 (29-May-2006)
The filesystem size (according to the superblock) is 258048 blocks
The physical size of the device is 129024 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort? yes

Jul 16 12:30:34 root@NodeName filesystem.sh[9471]: ERROR: Function sg_check_and_mount
Jul 16 12:30:34 root@NodeName filesystem.sh[9471]: ERROR: Failed to fsck /dev/vg_oracle_SID/lv_3.
e2fsck 1.39 (29-May-2006)
/dev/vg_oracle_SID/lv_4: clean, 159/1537088 files, 1017168/3072000 blocks
Jul 16 12:30:34 root@NodeName filesystem.sh[9471]: ERROR: Function sg_check_and_mount
Jul 16 12:30:34 root@NodeName filesystem.sh[9471]: ERROR: Preceeding fsck call failed
Jul 16 12:30:34 root@NodeName master_control_script.sh[9313]: ##### Failed to start package oracle_SID, rollback steps #####
Jul 16 12:30:35 root@NodeName volume_group.sh[9571]: Deactivating volume group vg_oracle_SID
Jul 16 12:30:36 root@NodeName volume_group.sh[9571]: Attempting to deltag to vg vg_oracle_SID...
Jul 16 12:30:36 root@NodeName volume_group.sh[9571]: deltag was successful on vg vg_oracle_SID.
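Worth noting in the log above: the superblock on lv_3 reports 258048 blocks while the device only provides 129024 blocks, exactly half, which is what you would expect if the LV was doubled in size (and resize2fs run) on the first node while the second node activated it using a stale, pre-resize device size. A hedged diagnostic sketch, reusing the device names from the log, that would show which size each layer sees on the failing node:

   lvs --units b vg_oracle_SID/lv_3                # LV size according to LVM metadata
   blockdev --getsize64 /dev/vg_oracle_SID/lv_3    # size of the active block device
   dmsetup table vg_oracle_SID-lv_3                # sector count in the live device-mapper table
   dumpe2fs -h /dev/vg_oracle_SID/lv_3 | grep -i 'block count\|block size'   # size recorded in the superblock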


Expected results:
The package should start without problems.

Additional info:
Two workarounds:
1. Halting multipathd on the node before starting the package (a command sketch follows below)
2. running vg
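A sketch of the first workaround, assuming the stock RHEL 5 multipathd init script and the Serviceguard package commands (none of these exact invocations appear in the report):

   # On the node that is about to start the package:
   service multipathd stop        # stop multipathd so a stale map is not used
   cmrunpkg oracle_SID            # start the package; the fsck failure is avoided
   service multipathd start       # restart multipathd once the package is up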

Comment 1 Jonathan Earl Brassow 2014-01-30 00:59:46 UTC
Perhaps a bit late to be asking, but can you show the LVM and file system information before the move is attempted? (e.g. 'pvs', 'vgs', 'lvs', 'df -Th') I'd like to see how the LVs and the file system are laid out. Then, after the failed move, could you repeat the commands on the other node? (You should be able to get the LVM information even if you can't get the FS info.)
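For reference, a sketch of the collection being requested here, to be run on the original node before the resize and repeated on the failover node after the failed start (the output path is just an example, and the extra lvs columns are our addition to show the LV-to-device layout):

   { pvs; vgs; lvs -a -o +devices; df -Th; } > /tmp/lvm_layout_$(hostname).txt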

Comment 2 Mark McDonald 2014-01-30 21:41:18 UTC
Yes, it's a 'little' too late to ask. That cluster was decommissioned and replaced with a RHCS on RHEL 6.3 cluster.  Two and a half years.

Comment 3 Peter Rajnoha 2014-02-05 15:43:38 UTC
(In reply to Mark McDonald from comment #2)
> Yes, it's a 'little' too late to ask. That cluster was decommissioned and
> replaced with a RHCS on RHEL 6.3 cluster.  Two and a half years.

We're really sorry about that; it was an oversight on this bug report.

