Bug 723672

Summary: LV appears to be corrupt after resizing a LV when moving package to different cluster node
Product: Red Hat Enterprise Linux 5
Reporter: Mark McDonald <air3mdm>
Component: lvm2
Assignee: LVM and device-mapper development team <lvm-team>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Corey Marthaler <cmarthal>
Severity: medium
Priority: unspecified
Version: 5.6
CC: agk, dwysocha, heinzm, jbrassow, lmiksik, prajnoha, prockai, thornber, zkabelac
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2014-02-05 15:43:38 UTC
Bug Blocks: 1049888

Description Mark McDonald 2011-07-20 19:20:43 UTC
Description of problem:
After resizing an MCSG (HP MC/ServiceGuard) package LV, the LV appears to be corrupted and requires an fsck when MCSG tries to start the package on another cluster node.

Version-Release number of selected component (if applicable):


How reproducible:
Reproducible

Steps to Reproduce:
1. Resize an MCSG package LV.
2. Halt the package.
3. Start the package on another cluster node (see the command sketch below).
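
For reference, a hedged sketch of the commands behind these steps (assuming an ext3 filesystem on the LV and HP Serviceguard's cmhaltpkg/cmrunpkg; the VG, LV and package names are taken from the log below, the resize amount is purely illustrative, and the exact commands used at the site are not recorded in this report):

   # On the node currently running the package: grow the LV and its filesystem
   lvextend -L +512M /dev/vg_oracle_SID/lv_3
   resize2fs /dev/vg_oracle_SID/lv_3

   # Halt the package on this node
   cmhaltpkg oracle_SID

   # Start the package on another cluster node (node name is a placeholder)
   cmrunpkg -n other_node oracle_SID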
  
Actual results:
from the package log file (on the cluster node starting the package):
Jul 16 12:30:32 root@NodeName master_control_script.sh[9313]: ###### Starting package oracle_SID ######
Jul 16 12:30:32 root@NodeName volume_group.sh[9395]: Attempting to addtag to vg vg_oracle_SID...
Jul 16 12:30:33 root@NodeName volume_group.sh[9395]: addtag was successful on vg vg_oracle_SID.
Jul 16 12:30:33 root@NodeName volume_group.sh[9395]: Activating volume group vg_oracle_SID .
Jul 16 12:30:33 root@NodeName filesystem.sh[9471]: Checking filesystems:
   /dev/vg_oracle_SID/lv_1
   /dev/vg_oracle_SID/lv_2
   /dev/vg_oracle_SID/lv_3
   /dev/vg_oracle_SID/lv_4
e2fsck 1.39 (29-May-2006)
/dev/vg_oracle_SID/lv_1: clean, 26/640000 files, 607201/1280000 blocks
e2fsck 1.39 (29-May-2006)
/dev/vg_oracle_SID/lv_2: clean, 19/1281696 files, 1057553/2560000 blocks
e2fsck 1.39 (29-May-2006)
The filesystem size (according to the superblock) is 258048 blocks
The physical size of the device is 129024 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort? yes

Jul 16 12:30:34 root@NodeName filesystem.sh[9471]: ERROR: Function sg_check_and_mount
Jul 16 12:30:34 root@NodeName filesystem.sh[9471]: ERROR: Failed to fsck /dev/vg_oracle_SID/lv_3.
e2fsck 1.39 (29-May-2006)
/dev/vg_oracle_SID/lv_4: clean, 159/1537088 files, 1017168/3072000 blocks
Jul 16 12:30:34 root@NodeName filesystem.sh[9471]: ERROR: Function sg_check_and_mount
Jul 16 12:30:34 root@NodeName filesystem.sh[9471]: ERROR: Preceeding fsck call failed
Jul 16 12:30:34 root@NodeName master_control_script.sh[9313]: ##### Failed to start package oracle_SID, rollback steps #####
Jul 16 12:30:35 root@NodeName volume_group.sh[9571]: Deactivating volume group vg_oracle_SID
Jul 16 12:30:36 root@NodeName volume_group.sh[9571]: Attempting to deltag to vg vg_oracle_SID...
Jul 16 12:30:36 root@NodeName volume_group.sh[9571]: deltag was successful on vg vg_oracle_SID.
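
Note that the superblock records exactly twice as many blocks as the device e2fsck is checking (258048 vs. 129024), which would be consistent with the receiving node still presenting a pre-resize device mapping rather than genuine on-disk corruption. A hedged way to compare the sizes on each node (device names taken from the log; vg_oracle_SID-lv_3 is the usual device-mapper name for that LV):

   # Filesystem size as recorded in the superblock
   dumpe2fs -h /dev/vg_oracle_SID/lv_3 | grep -iE 'block (count|size)'

   # Size of the block device actually presented on this node
   blockdev --getsize64 /dev/vg_oracle_SID/lv_3
   dmsetup table vg_oracle_SID-lv_3

   # LV size according to the LVM metadata
   lvs --units b vg_oracle_SID/lv_3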


Expected results:
package should start without problem

Additional info:
Two workarounds:
1. Halting multipathd on the node before starting the package (a sketch follows below)
2. running vg
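
A hedged sketch of the first workaround on the node about to start the package (assuming the stock RHEL 5 multipathd init script and Serviceguard's cmrunpkg; the package name is taken from the log):

   # Stop multipathd before starting the package, as described in the workaround
   service multipathd stop

   # Start the package on this node
   cmrunpkg oracle_SID

   # Restart multipathd afterwards
   service multipathd start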

Comment 1 Jonathan Earl Brassow 2014-01-30 00:59:46 UTC
Perhaps a bit late to be asking, but can you show the LVM and file system information before the move is attempted? (e.g. 'pvs', 'vgs', 'lvs', 'df -Th') I'd like to see how the LVs and file system are laid out. Then, after the failed move, could you repeat the commands on the other node? (You should be able to get the LVM information even if you can't get the FS info.)
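
For completeness, the requested information could be captured with commands along these lines, first on the original node before the move and again on the receiving node after the failed start (pvs/vgs/lvs/df are the commands named above; the verbosity options are optional extras):

   pvs -v
   vgs -v
   lvs -a -o +devices
   df -Th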

Comment 2 Mark McDonald 2014-01-30 21:41:18 UTC
Yes, it's a 'little' too late to ask. That cluster was decommissioned and replaced with a RHCS on RHEL 6.3 cluster.  Two and a half years.

Comment 3 Peter Rajnoha 2014-02-05 15:43:38 UTC
(In reply to Mark McDonald from comment #2)
> Yes, it's a 'little' too late to ask. That cluster was decommissioned and
> replaced with a RHCS on RHEL 6.3 cluster.  Two and a half years.

We're really sorry about that; it was an oversight on this bug report.