Bug 235948 - clvm 2-way mirrored volume with log crashes if one mirror leg and the log is lost
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: lvm2-cluster
Version: 4
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Jonathan Earl Brassow
QA Contact: Corey Marthaler
URL: http://intranet.corp.redhat.com/ic/in...
Reported: 2007-04-10 19:07 EDT by Mattias Haern
Modified: 2010-01-11 23:06 EST
CC: 7 users

Fixed In Version: beta1
Doc Type: Bug Fix
Last Closed: 2007-04-19 14:42:35 EDT


Attachments
Cluster configuration file (2.18 KB, application/octet-stream)
2007-04-10 19:07 EDT, Mattias Haern


External Trackers:
Red Hat Bugzilla 233708

Description Mattias Haern 2007-04-10 19:07:30 EDT
Description of problem:
After creating a 2-way clustered LVM2 mirror with a log, the volume crashes if
one mirror leg and the mirror log are removed at the same time.

Version-Release number of selected component (if applicable): 4.5 beta

How reproducible:
Every time.

Steps to Reproduce:
1. Install RHEL 4.5 beta
2. Install the RHEL 4.5 cluster beta
3. Configure a clustered, mirrored LVM2 volume with an on-disk log
4. Remove one mirror leg and the log disk at the same time (see the sketch below)
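
A minimal sketch of steps 3 and 4, assuming a clustered VG named testvg1 on
three shared PVs (/dev/sdb through /dev/sdd are placeholder device names; how
the disks were actually failed is not stated here, sysfs deletion being one
option on 2.6 kernels):

    # clustered VG on the shared SAN disks (placeholder device names)
    pvcreate /dev/sdb /dev/sdc /dev/sdd
    vgcreate -c y testvg1 /dev/sdb /dev/sdc /dev/sdd
    # 2-way mirror; the on-disk log is placed on the remaining PV
    lvcreate -m 1 -L 10G -n mirrorlv testvg1
    # fail one mirror leg and the log disk at the same time
    echo 1 > /sys/block/sdc/device/delete   # mirror leg
    echo 1 > /sys/block/sdd/device/delete   # log disk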

Actual results:
Volume crashed.

Expected results:
Volume continues to be available, with only one copy.

Additional info:

Test environment
----------------
Infrastructure:
*	2 x IBM xSeries 346 installed with Red Hat ES 4U5beta_64
*	EMC SAN with shared disks (2 x Emulex LP10000 HBAs on each server)

Cluster configuration:
*	2 nodes
*	Fencing based on RSA II
*	Cluster service based on the following resources:
	o	IP address
	o	Logical volume on shared disk
	o	Mount of an LVM-based filesystem

Tests with cluster (all tests are done on SAN disk)
---------------------------------------------------
* Convert linear volume to mirror volume with mirror log on disk
  OK.
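
  (Roughly what this test runs, as a sketch with placeholder names; the mirror
resynchronizes in the background and progress can be watched with lvs:)

    lvconvert -m 1 testvg1/linearlv    # add one mirror leg; log goes on a separate PV
    lvs -o name,copy_percent testvg1   # watch the resync progress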

* Initially create mirror volume with mirror log on disk
  OK.

* Initially create mirror volume with mirror log in memory (corelog)
  OK.
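
  (A sketch of the two creation variants above, with placeholder names; on lvm2
of this vintage the in-memory log was requested with --corelog, later spelled
--mirrorlog core:)

    lvcreate -m 1 -L 10G -n disklogv testvg1             # log on disk (the default)
    lvcreate -m 1 --corelog -L 10G -n corelogv testvg1   # log kept in memory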

* Force sudden removal of mirror disk with mirror log volume intact
  OK. The volume is automatically converted to a linear volume; cluster status
is unchanged.
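
  (One way to confirm the down-conversion, as a sketch; the exact output varies
by version:)

    lvs -a -o name,segtype,devices testvg1
    # after the failed leg is repaired away, the LV should report
    # segtype "linear" with a single device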

* Force sudden removal of mirror disk and mirror log disk
  Not OK; the volume did not survive as expected. It crashes when the log disk
is removed, writing to the file system stops, and corruption occurs.

* Force sudden power off of the active node in the cluster (log disk), with both
sides of the mirror intact
  OK. The mirrored volume is moved to the remaining node in the cluster.

* Force sudden removal of mirror disk and mirror log disk (corelog)
  Not OK. The volume is online and can be accessed, but its status is strange:

	[root@tnscl02cn001 ~]# vgdisplay -v testvg1
    	Loaded external locking library liblvm2clusterlock.so
    	Using volume group(s) on command line
    	Finding volume group "testvg1"
    	Wiping cache of LVM-capable devices
  	Couldn't find device with uuid 'jccjQF-Ql0I-CYAp-N5Ak-tRaV-Z2IR-uWFLh4'.
  	Couldn't find all physical volumes for volume group testvg1.
  	Couldn't find device with uuid 'jccjQF-Ql0I-CYAp-N5Ak-tRaV-Z2IR-uWFLh4'.
  	Couldn't find all physical volumes for volume group testvg1.
  	Couldn't find device with uuid 'jccjQF-Ql0I-CYAp-N5Ak-tRaV-Z2IR-uWFLh4'.
  	Couldn't find all physical volumes for volume group testvg1.
  	Couldn't find device with uuid 'jccjQF-Ql0I-CYAp-N5Ak-tRaV-Z2IR-uWFLh4'.
  	Couldn't find all physical volumes for volume group testvg1.
  	Volume group "testvg1" not found

We do not understand this. Only one part of the mirror was removed, yet the
vgdisplay output indicates problems. It is still possible to write to the file
system. But when the node fails (and the mirror log disappears with it, because
a corelog is kept in the memory of the failing node), the service fails to come
up on the surviving node, because the logical volume cannot be activated.
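
For reference, the usual recovery path once a PV has gone missing is roughly
the following (a sketch; whether it succeeds in this state is exactly what
fails here):

    vgreduce --removemissing testvg1   # drop the missing PV from the VG metadata
    vgchange -ay testvg1               # then try to reactivate
    vgchange -ay --partial testvg1     # or activate with the PV still missing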

* Force sudden removal of mirror disk and mirror log disk (corelog), and
simultaneously force sudden power off of the active node in the cluster
  Not OK. The cluster tries to fail the volume over, but the volume is in the
same strange state as in the previous test and cannot be reactivated.
Comment 1 Mattias Haern 2007-04-10 19:07:30 EDT
Created attachment 152187 [details]
Cluster configuration file
Comment 2 Jonathan Earl Brassow 2007-04-11 11:01:59 EDT
perform lvmdump to gather lvm/device-mapper information.
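
(For reference, a minimal invocation; the -m option to additionally gather a
metadata dump may not exist in every version:)

    lvmdump      # writes lvmdump-<hostname>-<date>.tgz in the current directory
    lvmdump -m   # also include an lvm metadata dump, if supported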

"* Force sudden removal of mirror disk and mirror log disk
  Not OK as expected. Volume crash when log disk is removed. Writing to the file
system stopped and corruption occurred."

"Volume crash" - what does this mean?  What was printed/logged?
"corruption occurred" - what kind of corruption?  Data corruption?  Metadata corruption?
Comment 3 Mattias Haern 2007-04-19 10:43:40 EDT
New tests with beta1 showed that this no longer occurs.
Comment 4 Jonathan Earl Brassow 2007-04-19 11:13:01 EDT
If your continued testing shows that this is truly fixed, please close the bug.

assigned -> modified
