Bug 449110 - cmirror down conversion after failure is broken
Summary: cmirror down conversion after failure is broken
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: cmirror-kernel
Version: 4
Hardware: All
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: Jonathan Earl Brassow
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-05-30 14:00 UTC by Corey Marthaler
Modified: 2010-01-12 02:12 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-07-24 16:31:14 UTC
Embargoed:



Description Corey Marthaler 2008-05-30 14:00:54 UTC
Description of problem:
Our cmirror device failure tests failed during our 4.7 regression runs due to
known issues. That caused us to overlook the fact that cmirror device failure
handling appears to have regressed to the point of not working at all. The
simplest test case, failing the primary leg of a fully synced cmirror, fails
to down-convert the mirror to a linear volume. I have now reproduced this
quite a few times.

[root@taft-01 ~]# lvs -a -o +devices
  /dev/sde1: read failed after 0 of 2048 at 0: Input/output error
  LV                               VG             Attr   LSize   Origin Snap%  Move Log                        Copy%  Convert Devices
  LogVol00                         VolGroup00     -wi-ao  58.34G                                                              /dev/sda2(0)
  LogVol01                         VolGroup00     -wi-ao   9.75G                                                              /dev/sda2(1867)
  syncd_primary_2legs_1            helter_skelter mwi-ao 800.00M                    syncd_primary_2legs_1_mlog 100.00         syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
  [syncd_primary_2legs_1_mimage_0] helter_skelter iwi-so 800.00M
  [syncd_primary_2legs_1_mimage_1] helter_skelter iwi-ao 800.00M                                                              /dev/sdh1(0)
  [syncd_primary_2legs_1_mlog]     helter_skelter lwi-ao   4.00M                                                              /dev/sdg1(0)
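The failure is visible in the Attr column of the output above: the top-level LV syncd_primary_2legs_1 still starts with 'm' (mirrored) rather than '-' (plain/linear), and the image on the failed device (mimage_0) no longer lists a device. As a minimal sketch, assuming two-column input such as `lvs -a --noheadings -o lv_name,lv_attr` would produce, the hypothetical helper `still_mirrored` below (not part of lvm2) reports which LVs have not been down-converted. It only inspects the first lv_attr character, which lvs uses for the volume type.

```python
from typing import List


# Hypothetical helper (not part of lvm2): scan two-column output such as
# `lvs -a --noheadings -o lv_name,lv_attr` and list LVs that are still mirrors.
def still_mirrored(lvs_output: str) -> List[str]:
    """Return names of LVs whose lv_attr volume-type character is 'm' or 'M'.

    The first lv_attr character encodes the volume type, e.g. 'm' mirrored,
    'M' mirrored without initial sync, 'i' mirror image, 'l' mirror log,
    '-' plain (linear) volume.
    """
    mirrored = []
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, attr = fields[0], fields[1]
        if attr[0] in "mM":
            # Hidden sub-LVs are printed in brackets when -a is used.
            mirrored.append(name.strip("[]"))
    return mirrored


# Rows trimmed from the lvs output in this report (name and attr only).
sample = """\
  LogVol00                         -wi-ao
  syncd_primary_2legs_1            mwi-ao
  [syncd_primary_2legs_1_mimage_0] iwi-so
  [syncd_primary_2legs_1_mimage_1] iwi-ao
  [syncd_primary_2legs_1_mlog]     lwi-ao
"""

if __name__ == "__main__":
    # After a successful down-conversion of this mirror the list would be empty.
    print(still_mirrored(sample))  # ['syncd_primary_2legs_1']
```

Checking only the volume-type character keeps the sketch independent of the other attribute positions, whose meaning varies between lvm2 releases.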



Version-Release number of selected component (if applicable):
lvm2-2.02.36-1.el4
lvm2-cluster-2.02.36-1.el4
cmirror-1.0.1-1 Build Date: Tue 30 Jan 2007 05:28:02 PM CST
cmirror-kernel-2.6.9-41.3 Build Date: Mon 19 May 2008 02:00:31 PM CDT

Comment 1 Corey Marthaler 2008-05-30 14:51:41 UTC
Single-machine mirror device failures work just fine.

Comment 2 Jonathan Earl Brassow 2008-06-10 16:26:56 UTC
What if you kill dmeventd and run 'vgreduce --removemissing <vg>' by hand? That
would tell us whether the problem is in dmeventd.

Comment 3 Jonathan Earl Brassow 2008-06-10 18:51:28 UTC
First try OK.


Comment 4 Jonathan Earl Brassow 2008-06-10 19:01:35 UTC
Second time OK... I think you may be omitting some information on how to reproduce this?


Comment 5 Jonathan Earl Brassow 2008-06-11 15:49:14 UTC
Try testing with an increased clvmd timeout; it seems to work for me. I set
the command timeout to 600 (instead of 90).


Comment 6 Jonathan Earl Brassow 2008-06-11 21:26:28 UTC
Trying to reduce logging in the cmirror module to cut its response time,
perhaps bringing it under the clvmd timeout.


Comment 7 Corey Marthaler 2008-06-12 13:36:20 UTC
It appears that this bug has mysteriously been fixed with the latest RPMs:
device-mapper-1.02.25-2.el4
lvm2-cluster-2.02.37-2.el4
lvm2-2.02.37-2.el4

The clvmd locking timeout made no difference when I downgraded; I reproduced
this every time regardless. Conversely, after upgrading I could no longer
reproduce it, even with the locking timeout set to the default.

Marking this verified.

Comment 9 Chris Feist 2008-07-24 16:31:14 UTC
Closing this bug because the fix has been released in 4.7.

