449110 – cmirror down conversion after failure is broken

Summary: cmirror down conversion after failure is broken

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Cluster Suite
Classification:	Retired
Component:	cmirror-kernel
Sub Component:
Version:	4
Hardware:	All
OS:	Linux
Priority:	high
Severity:	urgent
Target Milestone:	---
Assignee:	Jonathan Earl Brassow
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-05-30 14:00 UTC by Corey Marthaler
Modified:	2010-01-12 02:12 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-07-24 16:31:14 UTC
Embargoed:

Description Corey Marthaler 2008-05-30 14:00:54 UTC

Description of problem:
Our cmirror device failure tests failed during our 4.7 regression runs due to
known issues. That caused us to overlook the fact that cmirror device failure
handling appears to have regressed to the point of not working at all. The
simplest test case, failing the primary leg of a fully sync'ed cmirror, fails
to down convert to a linear. I have now reproduced this quite a few times.

[root@taft-01 ~]# lvs -a -o +devices
  /dev/sde1: read failed after 0 of 2048 at 0: Input/output error
  LV                               VG             Attr   LSize   Origin Snap%  Move Log                        Copy%  Convert Devices
  LogVol00                         VolGroup00     -wi-ao  58.34G                                                              /dev/sda2(0)
  LogVol01                         VolGroup00     -wi-ao   9.75G                                                              /dev/sda2(1867)
  syncd_primary_2legs_1            helter_skelter mwi-ao 800.00M                    syncd_primary_2legs_1_mlog 100.00         syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
  [syncd_primary_2legs_1_mimage_0] helter_skelter iwi-so 800.00M
  [syncd_primary_2legs_1_mimage_1] helter_skelter iwi-ao 800.00M                                                              /dev/sdh1(0)
  [syncd_primary_2legs_1_mlog]     helter_skelter lwi-ao   4.00M                                                              /dev/sdg1(0)
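
The symptom in the listing above (a volume still marked as mirrored while one of its images has no backing device) could be checked mechanically. The following is only a rough sketch, not part of the test suite used here. It assumes the listing is regenerated in a separator-delimited form, for example with `lvs -a --noheadings --separator '|' -o lv_name,vg_name,lv_attr,devices`; the function names and the `SAMPLE` text are hypothetical and merely reuse the values shown above.

```python
# Sketch: flag mirrored LVs that still reference an image with no backing device.
# Assumed input format (one LV per line): lv_name|vg_name|lv_attr|devices

def parse_lvs(text):
    """Parse '|'-separated lvs output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 4:
            continue  # skip blank lines and warnings such as "read failed"
        name, vg, attr, devices = fields
        # Hidden sub-LVs are shown in brackets; drop them for lookups.
        rows.append({"lv": name.strip("[]"), "vg": vg,
                     "attr": attr, "devices": devices})
    return rows


def stuck_mirrors(rows):
    """Return (lv, [images without devices]) for each mirrored LV."""
    by_name = {(r["vg"], r["lv"]): r for r in rows}
    stuck = []
    for r in rows:
        # First attr character 'm'/'M' marks a mirrored volume.
        if not r["attr"] or r["attr"][0] not in "mM":
            continue
        # Devices look like "x_mimage_0(0),x_mimage_1(0)".
        legs = [d.split("(")[0] for d in r["devices"].split(",") if d]
        missing = [leg for leg in legs
                   if not by_name.get((r["vg"], leg), {}).get("devices")]
        if missing:
            stuck.append((r["lv"], missing))
    return stuck


# Hypothetical sample built from the values in the listing above.
SAMPLE = """\
  /dev/sde1: read failed after 0 of 2048 at 0: Input/output error
  LogVol00|VolGroup00|-wi-ao|/dev/sda2(0)
  syncd_primary_2legs_1|helter_skelter|mwi-ao|syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
  [syncd_primary_2legs_1_mimage_0]|helter_skelter|iwi-so|
  [syncd_primary_2legs_1_mimage_1]|helter_skelter|iwi-ao|/dev/sdh1(0)
  [syncd_primary_2legs_1_mlog]|helter_skelter|lwi-ao|/dev/sdg1(0)
"""

if __name__ == "__main__":
    for lv, missing in stuck_mirrors(parse_lvs(SAMPLE)):
        print(lv, "still mirrored; images without devices:", ", ".join(missing))
```

A successful down conversion would leave no row with a leading `m` in the attribute field for this volume, so the check would return an empty list.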



Version-Release number of selected component (if applicable):
lvm2-2.02.36-1.el4
lvm2-cluster-2.02.36-1.el4
cmirror-1.0.1-1 Build Date: Tue 30 Jan 2007 05:28:02 PM CST
cmirror-kernel-2.6.9-41.3 Build Date: Mon 19 May 2008 02:00:31 PM CDT

Comment 1 Corey Marthaler 2008-05-30 14:51:41 UTC

Single machine mirror device failures work just fine.

Comment 2 Jonathan Earl Brassow 2008-06-10 16:26:56 UTC

What if you kill dmeventd and run 'vgreduce --removemissing <vg>' by hand?  That
would tell us if the problem is in dmeventd.
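
For repeated runs, the two manual steps suggested above might be scripted roughly as follows. This is a sketch under assumptions: `killall` is used here as one possible way to stop dmeventd (the comment does not say how it was killed), and the function names are hypothetical. By default nothing is executed; the commands are only printed.

```python
import subprocess


def manual_repair_commands(vg):
    """Return (command, must_succeed) pairs for the manual steps, in order."""
    if not vg or vg.startswith("-"):
        raise ValueError("a volume group name is required")
    return [
        # Assumption: killall is how dmeventd gets stopped; it exits
        # non-zero when no such process is running, so do not insist on it.
        (["killall", "dmeventd"], False),
        # The command quoted in the comment above.
        (["vgreduce", "--removemissing", vg], True),
    ]


def run_manual_repair(vg, dry_run=True):
    """Print the commands (dry run) or execute them; return the command lists."""
    executed = []
    for cmd, must_succeed in manual_repair_commands(vg):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=must_succeed)
        executed.append(cmd)
    return executed


if __name__ == "__main__":
    # Dry run against the volume group from the description.
    run_manual_repair("helter_skelter")
```

If the down conversion succeeds when run this way but not when dmeventd handles the failure, that would point at dmeventd rather than at vgreduce itself.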

Comment 3 Jonathan Earl Brassow 2008-06-10 18:51:28 UTC

first try ok.

Comment 4 Jonathan Earl Brassow 2008-06-10 19:01:35 UTC

second time ok...  I think you may be omitting some information on how to reproduce?

Comment 5 Jonathan Earl Brassow 2008-06-11 15:49:14 UTC

Try testing with an increased timeout for clvmd. That seems to work for me. I
set the command timeout to 600 (instead of 90).

Comment 6 Jonathan Earl Brassow 2008-06-11 21:26:28 UTC

trying to reduce logging in cmirror module to reduce response time... perhaps
bringing it under the clvmd timeout.

Comment 7 Corey Marthaler 2008-06-12 13:36:20 UTC

It appears that this bug has mysteriously been fixed with the latest rpms:
device-mapper-1.02.25-2.el4
lvm2-cluster-2.02.37-2.el4
lvm2-2.02.37-2.el4

The clvmd locking timeout made no difference when I downgraded. I reproduced
this every time regardless. Also, when I upgraded, I could no longer reproduce
this, even with the locking timeout set to the default.

Marking this verified.

Comment 9 Chris Feist 2008-07-24 16:31:14 UTC

Closing this bug as it has been released in 4.7.
