Bug 449110 - cmirror down conversion after failure is broken
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: cmirror-kernel
Hardware: All
OS: Linux
Priority: high
Severity: urgent
Assigned To: Jonathan Earl Brassow
QA Contact: Cluster QE
Keywords: Regression
Reported: 2008-05-30 10:00 EDT by Corey Marthaler
Modified: 2010-01-11 21:12 EST (History)
Doc Type: Bug Fix
Last Closed: 2008-07-24 12:31:14 EDT
Description Corey Marthaler 2008-05-30 10:00:54 EDT
Description of problem:
Our cmirror device failure tests failed during our 4.7 regression runs due to
known issues. That caused us to overlook the fact that cmirror device failure
handling appears to have regressed to the point of not working at all. The
simplest test case, failing the primary leg of a fully synced cmirror, fails to
down convert to a linear volume. I've now reproduced this quite a few times.

[root@taft-01 ~]# lvs -a -o +devices
  /dev/sde1: read failed after 0 of 2048 at 0: Input/output error
  LV                               VG             Attr   LSize   Origin Snap%  Move Log                        Copy%  Convert Devices
  LogVol00                         VolGroup00     -wi-ao  58.34G
  LogVol01                         VolGroup00     -wi-ao   9.75G
  syncd_primary_2legs_1            helter_skelter mwi-ao 800.00M                    syncd_primary_2legs_1_mlog 100.00
  [syncd_primary_2legs_1_mimage_0] helter_skelter iwi-so 800.00M
  [syncd_primary_2legs_1_mimage_1] helter_skelter iwi-ao 800.00M
  [syncd_primary_2legs_1_mlog]     helter_skelter lwi-ao   4.00M
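
The check for a failed down conversion can be scripted. The following is only a
minimal sketch, not part of the actual test suite: it assumes that the first
character of the lvs Attr field is "m" (or "M") for a mirrored volume and "-"
for a linear one, as in the output above. The helper names parse_lvs_attrs and
is_still_mirrored are hypothetical, and the sample text is trimmed from the
listing above.

```python
import re

# Matches an LSize value such as "800.00M" or "58.34G"; used to tell data rows
# apart from the header row and from device error messages.
SIZE_RE = re.compile(r"^\d+(\.\d+)?[KMGT]?$", re.IGNORECASE)


def parse_lvs_attrs(lvs_output):
    """Return a dict mapping LV name to its Attr string.

    Assumes the default column order LV, VG, Attr, LSize at the start of each
    data row. Brackets around hidden LV names are stripped.
    """
    attrs = {}
    for line in lvs_output.splitlines():
        fields = line.split()
        # Skip the header, error messages, and wrapped continuation lines.
        if len(fields) < 4 or not SIZE_RE.match(fields[3]):
            continue
        attrs[fields[0].strip("[]")] = fields[2]
    return attrs


def is_still_mirrored(lvs_output, lv_name):
    """True if lv_name is still reported as a mirror (Attr starts with m/M)."""
    attr = parse_lvs_attrs(lvs_output).get(lv_name)
    if attr is None:
        raise KeyError(lv_name)
    return attr[0] in ("m", "M")


# Sample rows taken from the lvs listing in this report.
SAMPLE = """\
  /dev/sde1: read failed after 0 of 2048 at 0: Input/output error
  LV                               VG             Attr   LSize
  LogVol00                         VolGroup00     -wi-ao  58.34G
  syncd_primary_2legs_1            helter_skelter mwi-ao 800.00M
  [syncd_primary_2legs_1_mimage_0] helter_skelter iwi-so 800.00M
"""

if __name__ == "__main__":
    print(is_still_mirrored(SAMPLE, "syncd_primary_2legs_1"))
    print(is_still_mirrored(SAMPLE, "LogVol00"))
```

After a successful down conversion, is_still_mirrored would be expected to
return False for syncd_primary_2legs_1; in the failing case shown here it
still reports a mirror.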

Version-Release number of selected component (if applicable):
cmirror-1.0.1-1 Build Date: Tue 30 Jan 2007 05:28:02 PM CST
cmirror-kernel-2.6.9-41.3 Build Date: Mon 19 May 2008 02:00:31 PM CDT
Comment 1 Corey Marthaler 2008-05-30 10:51:41 EDT
Single-machine mirror device failures work just fine.
Comment 2 Jonathan Earl Brassow 2008-06-10 12:26:56 EDT
What if you kill dmeventd and run 'vgreduce --removemissing <vg>' by hand?  That
would tell us if the problem is in dmeventd.
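
The manual check suggested above could be driven from a small script. This is
a rough sketch under stated assumptions, not a tested procedure: stopping
dmeventd with killall is an assumption (any way of stopping the daemon would
do), the function names are hypothetical, and the volume group name is
whatever the test uses (helter_skelter in the description). By default it
only prints the commands.

```python
import subprocess


def manual_repair_commands(vg_name):
    """Return the command sequence: stop dmeventd, repair the VG, then list it."""
    return [
        ["killall", "dmeventd"],                   # assumption: daemon stopped via killall
        ["vgreduce", "--removemissing", vg_name],  # command quoted in the comment above
        ["lvs", "-a", "-o", "+devices", vg_name],  # same listing as in the description
    ]


def run_manual_repair(vg_name, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in manual_repair_commands(vg_name):
        print("+ " + " ".join(cmd))
        if not dry_run:
            # killall exits non-zero when no dmeventd process is running, so
            # only the LVM commands are required to succeed.
            subprocess.run(cmd, check=(cmd[0] != "killall"))


if __name__ == "__main__":
    run_manual_repair("helter_skelter")
```

If the mirror down converts correctly when driven this way, that would point
at dmeventd rather than at the cmirror module.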
Comment 3 Jonathan Earl Brassow 2008-06-10 14:51:28 EDT
first try ok.
Comment 4 Jonathan Earl Brassow 2008-06-10 15:01:35 EDT
second time ok...  I think you may be omitting some information on how to reproduce?
Comment 5 Jonathan Earl Brassow 2008-06-11 11:49:14 EDT
Try testing with an increased timeout for clvmd. It seems to work for me; I set
the command timeout to 600 (instead of 90).
Comment 6 Jonathan Earl Brassow 2008-06-11 17:26:28 EDT
Trying to reduce logging in the cmirror module to reduce response time, perhaps
bringing it under the clvmd timeout.
Comment 7 Corey Marthaler 2008-06-12 09:36:20 EDT
It appears that this bug has mysteriously been fixed with the latest rpms:

The clvmd locking timeout made no difference when I downgraded; I reproduced
this every time regardless. Also, when I upgraded, I could no longer reproduce
this, even with the locking timeout set to the default.

Marking this verified.
Comment 9 Chris Feist 2008-07-24 12:31:14 EDT
Closing this bug as it has been released in 4.7.
