Bug 235948 - clvm 2-way mirrored volume with log crashes if one mirror leg and the log is lost
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: lvm2-cluster
Version: 4
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Jonathan Earl Brassow
QA Contact: Corey Marthaler
URL: http://intranet.corp.redhat.com/ic/in...
Reported: 2007-04-10 19:07 EDT by Mattias Haern
Modified: 2010-01-11 23:06 EST
CC: 7 users

Fixed In Version: beta1
Doc Type: Bug Fix
Last Closed: 2007-04-19 14:42:35 EDT


Attachments
Cluster configuration file (2.18 KB, application/octet-stream)
2007-04-10 19:07 EDT, Mattias Haern


External Trackers:
Red Hat Bugzilla 233708

Description Mattias Haern 2007-04-10 19:07:30 EDT
Description of problem:
After creating a 2-way clustered LVM2 mirror with a log, the volume crashes if
one mirror leg and the mirror log are removed at the same time.

Version-Release number of selected component (if applicable): 4.5 beta

How reproducible:
Every time.

Steps to Reproduce:
1. Install RHEL 4.5 beta
2. Install the RHEL 4.5 cluster beta
3. Configure a clustered, mirrored LVM2 volume with an on-disk log
4. Remove one mirror leg and the log disk at the same time (see the sketch below)
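
A minimal sketch of steps 3 and 4, assuming a clustered VG named testvg1 on
three shared PVs (/dev/sdb through /dev/sdd are placeholder device names; how
the disks were actually failed is not stated here, sysfs deletion being one
option on 2.6 kernels):

    # clustered VG on the shared SAN disks (placeholder device names)
    pvcreate /dev/sdb /dev/sdc /dev/sdd
    vgcreate -c y testvg1 /dev/sdb /dev/sdc /dev/sdd
    # 2-way mirror; the on-disk log is placed on the remaining PV
    lvcreate -m 1 -L 10G -n mirrorlv testvg1
    # fail one mirror leg and the log disk at the same time
    echo 1 > /sys/block/sdc/device/delete   # mirror leg
    echo 1 > /sys/block/sdd/device/delete   # log disk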

Actual results:
Volume crashed.

Expected results:
Volume continues to be available, with only one copy.

Additional info:

Test environment
----------------
Infrastructure:
*	2 x IBM xSeries 346 installed with Red Hat ES 4U5beta_64
*	EMC SAN with shared disks (2 x Emulex LP10000 HBAs on each server)

Cluster configuration:
*	2 nodes
*	Fencing based on RSA II
*	Cluster service based on the following resources:
	o	IP address
	o	Logical volume on shared disk
	o	Mount of an LVM-based filesystem

Tests with cluster (all tests are done on SAN disk)
---------------------------------------------------
* Convert linear volume to mirror volume with mirror log on disk
  OK.
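
  (Roughly what this test runs, as a sketch with placeholder names; the mirror
resynchronizes in the background and progress can be watched with lvs:)

    lvconvert -m 1 testvg1/linearlv    # add one mirror leg; log goes on a separate PV
    lvs -o name,copy_percent testvg1   # watch the resync progress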

* Initially create mirror volume with mirror log on disk
  OK.

* Initially create mirror volume with mirror log in memory (corelog)
  OK.
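
  (A sketch of the two creation variants above, with placeholder names; on lvm2
of this vintage the in-memory log was requested with --corelog, later spelled
--mirrorlog core:)

    lvcreate -m 1 -L 10G -n disklogv testvg1             # log on disk (the default)
    lvcreate -m 1 --corelog -L 10G -n corelogv testvg1   # log kept in memory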

* Force sudden removal of mirror disk with mirror log volume intact
  OK. The volume is automatically converted to a linear volume; cluster status
is unchanged.
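
  (One way to confirm the down-conversion, as a sketch; the exact output varies
by version:)

    lvs -a -o name,segtype,devices testvg1
    # after the failed leg is repaired away, the LV should report
    # segtype "linear" with a single device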

* Force sudden removal of mirror disk and mirror log disk
  Not OK; the volume did not survive as expected. It crashes when the log disk
is removed, writing to the file system stops, and corruption occurs.

* Force sudden power off of the active node in the cluster (log disk), with both
sides of the mirror intact
  OK. The mirrored volume is moved to the remaining node in the cluster.

* Force sudden removal of mirror disk and mirror log disk (corelog)
  Not OK. The volume is online and can be accessed, but its status is strange:

	[root@tnscl02cn001 ~]# vgdisplay -v testvg1
    	Loaded external locking library liblvm2clusterlock.so
    	Using volume group(s) on command line
    	Finding volume group "testvg1"
    	Wiping cache of LVM-capable devices
  	Couldn't find device with uuid 'jccjQF-Ql0I-CYAp-N5Ak-tRaV-Z2IR-uWFLh4'.
  	Couldn't find all physical volumes for volume group testvg1.
  	Couldn't find device with uuid 'jccjQF-Ql0I-CYAp-N5Ak-tRaV-Z2IR-uWFLh4'.
  	Couldn't find all physical volumes for volume group testvg1.
  	Couldn't find device with uuid 'jccjQF-Ql0I-CYAp-N5Ak-tRaV-Z2IR-uWFLh4'.
  	Couldn't find all physical volumes for volume group testvg1.
  	Couldn't find device with uuid 'jccjQF-Ql0I-CYAp-N5Ak-tRaV-Z2IR-uWFLh4'.
  	Couldn't find all physical volumes for volume group testvg1.
  	Volume group "testvg1" not found

We do not understand this. Only one part of the mirror was removed, yet the
vgdisplay output indicates problems. It is still possible to write to the file
system. But when the node fails (and the mirror log disappears with it, because
a corelog is kept in the memory of the failing node), the service fails to come
up on the surviving node, because the logical volume cannot be activated.
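
For reference, the usual recovery path once a PV has gone missing is roughly
the following (a sketch; whether it succeeds in this state is exactly what
fails here):

    vgreduce --removemissing testvg1   # drop the missing PV from the VG metadata
    vgchange -ay testvg1               # then try to reactivate
    vgchange -ay --partial testvg1     # or activate with the PV still missing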

* Force sudden removal of mirror disk and mirror log disk (corelog), and
simultaneously force sudden power off of the active node in the cluster
  Not OK. The cluster tries to fail the volume over, but the volume is in the
same strange state as in the previous test and cannot be reactivated.
Comment 1 Mattias Haern 2007-04-10 19:07:30 EDT
Created attachment 152187 [details]
Cluster configuration file
Comment 2 Jonathan Earl Brassow 2007-04-11 11:01:59 EDT
perform lvmdump to gather lvm/device-mapper information.
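
(For reference, a minimal invocation; the -m option to additionally gather a
metadata dump may not exist in every version:)

    lvmdump      # writes lvmdump-<hostname>-<date>.tgz in the current directory
    lvmdump -m   # also include an lvm metadata dump, if supported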

"* Force sudden removal of mirror disk and mirror log disk
  Not OK as expected. Volume crash when log disk is removed. Writing to the file
system stopped and corruption occurred."

"Volume crash" - what does this mean?  What was printed/logged?
"corruption occurred" - what kind of corruption?  Data corruption?  Metadata corruption?
Comment 3 Mattias Haern 2007-04-19 10:43:40 EDT
New tests with beta1 showed that this no longer occurs.
Comment 4 Jonathan Earl Brassow 2007-04-19 11:13:01 EDT
If your continued testing shows that this is truly fixed, please close the bug.

assigned -> modified
