Red Hat Bugzilla – Bug 449110
cmirror down conversion after failure is broken
Last modified: 2010-01-11 21:12:17 EST
Description of problem:
Our cmirror device failure tests failed during our 4.7 regression runs due to
known issues. That caused us to overlook the fact that cmirror device failure
handling appears to have regressed and no longer works at all. Even the
simplest test case, failing the primary leg of a fully sync'ed cmirror, fails
to down-convert to a linear volume. I've now reproduced this quite a few times.
[root@taft-01 ~]# lvs -a -o +devices
/dev/sde1: read failed after 0 of 2048 at 0: Input/output error
  LV                               VG             Attr   LSize   Origin Snap%  Move Log Copy%  Convert Devices
  LogVol00                         VolGroup00     -wi-ao  58.34G
  LogVol01                         VolGroup00     -wi-ao   9.75G
  syncd_primary_2legs_1            helter_skelter mwi-ao 800.00M
  [syncd_primary_2legs_1_mimage_0] helter_skelter iwi-so 800.00M
  [syncd_primary_2legs_1_mimage_1] helter_skelter iwi-ao 800.00M
  [syncd_primary_2legs_1_mlog]     helter_skelter lwi-ao   4.00M
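The listing above shows the failure signature: the top-level LV still carries an `m` in the first Attr position, so it is still a mirror rather than a linear volume. A minimal sketch of how a test script might check for that follows. The helper name `lv_volume_type` is hypothetical (it is not part of LVM); it only inspects the first character of an Attr string as printed by `lvs`.

```shell
# Hypothetical helper, not an LVM command: classify an LV by the first
# character of its lvs Attr string (for example "mwi-ao").
lv_volume_type() {
    case "$1" in
        m*|M*) echo mirror ;;        # mirrored LV (M = without initial sync)
        i*|I*) echo mirror-image ;;  # mirror leg (I = out of sync)
        l*)    echo mirror-log ;;    # mirror log device
        -*)    echo plain ;;         # no special type: linear or striped
        *)     echo other ;;         # snapshot, origin, pvmove, and so on
    esac
}

# Attr value copied from the listing above; after a successful down
# conversion the same LV would be expected to report "plain" instead.
lv_volume_type mwi-ao
```

On a live system the Attr string could be fetched with something like `lvs --noheadings -o lv_attr helter_skelter/syncd_primary_2legs_1`, stripped of leading whitespace, and passed to the helper.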
Version-Release number of selected component (if applicable):
cmirror-1.0.1-1 Build Date: Tue 30 Jan 2007 05:28:02 PM CST
cmirror-kernel-2.6.9-41.3 Build Date: Mon 19 May 2008 02:00:31 PM CDT
Single-machine mirror device failures work just fine.
What if you kill dmeventd and run 'vgreduce --removemissing <vg>' by hand? That
would tell us if the problem is in dmeventd.
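For anyone repeating that experiment, here is a rough sketch of the manual sequence. It assumes the volume group name from the listing in the description (`helter_skelter`) and that `killall` is available. The function name `manual_repair` and the `DRY_RUN` switch are made up for illustration; by default the commands are only printed, not executed.

```shell
# Sketch only: print the manual repair steps, or run them when DRY_RUN=0.
# manual_repair and DRY_RUN are illustrative names, not LVM features.
manual_repair() {
    vg="$1"
    run="echo"                      # default: just show the commands
    if [ "${DRY_RUN:-1}" = "0" ]; then
        run=""                      # really execute (needs root)
    fi
    $run killall dmeventd                 # take dmeventd out of the picture
    $run vgreduce --removemissing "$vg"   # drop the failed PV by hand
    $run lvs -a -o +devices "$vg"         # check whether the mirror went linear
}

# Dry run against the VG from this report.
manual_repair helter_skelter
```

If the by-hand `vgreduce` succeeds while the automatic path does not, that points at dmeventd rather than at the cluster mirror code.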
First try: OK.
Second try: OK. I think you may be omitting some information on how to reproduce?
Try testing with an increased timeout for clvmd. It seems to work for me: I set
the command timeout to 600 (instead of 90).
Trying to reduce logging in the cmirror module to reduce response time, perhaps
bringing it under the clvmd timeout.
It appears that this bug has mysteriously been fixed with the latest rpms.
The clvmd locking timeout made no difference when I downgraded: I reproduced
this every time regardless. And when I upgraded, I could no longer reproduce
it, even with the locking timeout set to the default.
Marking this verified.
Closing this bug as it has been released in 4.7.