Bug 504699 - failed mirror leg behavior breaks QA regression tests
failed mirror leg behavior breaks QA regression tests
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: lvm2 (Show other bugs)
5.4
All Linux
high Severity high
: beta
: ---
Assigned To: Milan Broz
Cluster QE
: Regression, Reopened
Depends On: 505141 505143
Blocks:
  Show dependency treegraph
 
Reported: 2009-06-08 16:55 EDT by Corey Marthaler
Modified: 2013-02-28 23:07 EST (History)
11 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-09-02 07:56:52 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
log file from lvconvert (84.65 KB, text/plain)
2009-06-09 15:37 EDT, Corey Marthaler
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2009:1393 normal SHIPPED_LIVE lvm2 bug-fix and enhancement update 2009-09-01 08:00:22 EDT

  None (edit)
Description Corey Marthaler 2009-06-08 16:55:16 EDT
Description of problem:
Last I had heard, this functionality wasn't going into 5.4. This is a pretty major change this late in the 5.4 cycle. 

Due to the changes made for bugs 249478 and 186437 (next time I'll read more carefully before qa'acking an lvm bug), mirror behavior after a failed leg has completely changed. This results in broken QA tests.

I need to know the proper/supported way to resolve mirror issues. The options given do not appear to work (vgreduce --removemissing or lvconvert --repair).

Here's what I see after a leg of a mirror is failed and a write is attempted:

[root@taft-03 ~]# lvs -a -o +devices
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 629080064: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 629137408: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 4096: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 145669554176: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 145669664768: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 4096: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  LV                               VG             Attr   LSize   Origin Snap%  Move Log                        Copy%  Convert Devices                                                            
  LogVol00                         VolGroup00     -wi-ao  58.38G                                                              /dev/sda2(0)                                                       
  LogVol01                         VolGroup00     -wi-ao   9.75G                                                              /dev/sda2(1868)                                                    
  syncd_primary_3legs_1            helter_skelter mwi-ao 600.00M                    syncd_primary_3legs_1_mlog 100.00         syncd_primary_3legs_1_mimage_1(0),syncd_primary_3legs_1_mimage_2(0)
  syncd_primary_3legs_1_mimage_0   helter_skelter -wi-a- 600.00M                                                              unknown device(0)                                                  
  [syncd_primary_3legs_1_mimage_1] helter_skelter iwi-ao 600.00M                                                              /dev/sdd1(0)                                                       
  [syncd_primary_3legs_1_mimage_2] helter_skelter iwi-ao 600.00M                                                              /dev/sdg1(0)                                                       
  [syncd_primary_3legs_1_mlog]     helter_skelter lwi-ao   4.00M                                                              /dev/sde1(0)


[root@taft-03 ~]# lvconvert --repair /dev/helter_skelter/syncd_primary_3legs_1 
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 629080064: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 629137408: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 4096: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 145669554176: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 145669664768: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 4096: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  The mirror is consistent, nothing to repair.


[root@taft-03 ~]# vgreduce --removemissing helter_skelter
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 629080064: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 629137408: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 4096: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 145669554176: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 145669664768: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 4096: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.

  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  WARNING: Partial LV syncd_primary_3legs_1_mimage_0 needs to be repaired or removed. 
  WARNING: There are still partial LVs in VG helter_skelter.
  To remove them unconditionally use: vgreduce --removemissing --force.
  Proceeding to remove empty missing PVs.

# The force is the only thing that seems to work. Is that what we now require?

[root@taft-03 ~]# vgreduce --removemissing --force helter_skelter
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 629080064: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 629137408: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 4096: Input/output error
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 145669554176: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 145669664768: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 4096: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  /dev/mapper/helter_skelter-syncd_primary_3legs_1_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'OqaZww-qzpc-m6mq-okT2-O690-sQeH-Wamfg2'.
  Wrote out consistent volume group helter_skelter
[root@taft-03 ~]# lvs -a -o +devices
  /dev/sdf1: open failed: No such device or address
  LV                               VG             Attr   LSize   Origin Snap%  Move Log                        Copy%  Convert Devices                                                            
  LogVol00                         VolGroup00     -wi-ao  58.38G                                                              /dev/sda2(0)                                                       
  LogVol01                         VolGroup00     -wi-ao   9.75G                                                              /dev/sda2(1868)                                                    
  syncd_primary_3legs_1            helter_skelter mwi-ao 600.00M                    syncd_primary_3legs_1_mlog 100.00         syncd_primary_3legs_1_mimage_1(0),syncd_primary_3legs_1_mimage_2(0)
  [syncd_primary_3legs_1_mimage_1] helter_skelter iwi-ao 600.00M                                                              /dev/sdd1(0)                                                       
  [syncd_primary_3legs_1_mimage_2] helter_skelter iwi-ao 600.00M                                                              /dev/sdg1(0)                                                       
  [syncd_primary_3legs_1_mlog]     helter_skelter lwi-ao   4.00M                                                              /dev/sde1(0)


Where is this going to be documented?


Version-Release number of selected component if applicable):
lvm2-cluster-2.02.46-5.el5
lvm2-2.02.46-5.el5
Comment 1 Milan Broz 2009-06-08 17:23:16 EDT
Note that dmeventd calls "lvconvert --repair --use-policies ... "
(see lvconvert man page for exact description, use policies is important here)

But there is something strange (why is failed mirror image visible in lvs output?)
Comment 2 Petr Rockai 2009-06-09 01:30:24 EDT
Interesting. Well, the bug here is that even though the mirror is repaired, syncd_primary_3legs_1_mimage_0 is left behind as a visible broken LV. Issuing lvremove helter_skelter/syncd_primary_3legs_1_mimage_0 and then vgreduce --removemissing helter_skelter (without --force) should resolve the problem.

Your lvconvert --repair says the mirror is already consistent, and that's true: the mirror is all right: vgreduce --removemissing (without --force) tells you that there's a broken LV in the volume group (so that it can't, without --force, remove the PV used by the broken LV).

It's hard to say anything further about the issue now, I would need more data. If the syncd_primary_3legs_mimage_0 comes from automatic dmeventd-driven mirror repair, that's a bug than should be fixed. Other than that, I don't see any pathological behaviour in the log.

Just for the record (I'll write a more complete version for release notes), the behaviour is now this:

1) PVs are never removed from broken VGs automatically, you need manual vgreduce --removemissing [--force] for that
2) when mirrors become broken, you can use lvconvert --repair to manually fix them
3) when mirrors become broken and they are monitored, dmeventd runs lvconvert --repair --use-policies on the bad mirror, which will remove or replace missing bits of the mirror according to the lvm.conf policies activation/{mirror_device_fault_policy,mirror_log_fault_policy}
4) It is always safe to run vgreduce --removemissing and it will never touch any LVs. After you have repaired or removed all broken LVs, issuing vgremove --removemissing will put the VG back into consistent state, removing all broken (and now unused) PVs.
5) The catch-all do-all "bring this VG into consistent state no matter the cost" command is now vgreduce --removemissing --force. This will remove any non-mirror LV that uses the PV unconditionally. Use with care.
Comment 3 Corey Marthaler 2009-06-09 13:10:28 EDT
Just a note that when running failure tests with single machine mirrors, I don't see the issues above after the failure, like I do with cluster mirrors. However after re-enabling the failed PV, I see the following errors, I assume because I didn't do anything manually after the failure occurred (beside attempt a write to the mirror). 

[root@taft-03 ~]# lvs -a -o +devices
  Volume group "helter_skelter" inconsistent
  WARNING: Inconsistent metadata found for VG helter_skelter - updating to use version 8
  Cannot update volume group helter_skelter while physical volumes are missing.
  Automatic metadata correction failed
  LV       VG         Attr   LSize  Origin Snap%  Move Log Copy%  Convert Devices        
  LogVol00 VolGroup00 -wi-ao 58.38G                                       /dev/sda2(0)   
  LogVol01 VolGroup00 -wi-ao  9.75G                                       /dev/sda2(1868)
  Internal error: Volume Group helter_skelter was not unlocked
  Device '/dev/sdd1' has been left open.
  Device '/dev/sde1' has been left open.
  Device '/dev/sdg1' has been left open.
  Device '/dev/sdf1' has been left open.
  Device '/dev/sdh1' has been left open.
  Device '/dev/sdg1' has been left open.
  Device '/dev/sdd1' has been left open.
  Device '/dev/sde1' has been left open.
  Device '/dev/sdf1' has been left open.
  Device '/dev/sdh1' has been left open.
  Device '/dev/sde1' has been left open.
  Device '/dev/sda2' has been left open.
  Device '/dev/sdd1' has been left open.
  Device '/dev/sda2' has been left open.
  Device '/dev/sdh1' has been left open.
  Device '/dev/sdf1' has been left open.
  Device '/dev/sdg1' has been left open.
Comment 4 Corey Marthaler 2009-06-09 15:37:27 EDT
Created attachment 347078 [details]
log file from lvconvert

Ran the following directly after the primary leg on the cluster mirror was failed: 'lvconvert -vvvv -y --repair /dev/helter_skelter/syncd_primary_2legs_1'
Comment 8 Corey Marthaler 2009-06-10 12:39:34 EDT
The fact that there was an actual bug wrt to cluster mirrors is a different issue than what this bug was originally opened for and qa'acked. A different bug should have been opened and flags marked accordingly. 

My original qa ack flag was for the better documentation and understanding of the impact of the changes that were made, not to just keep it and agree to test an upcoming cluster fix.
Comment 9 RHEL Product and Program Management 2009-06-10 12:48:02 EDT
Quality Engineering Management has reviewed and declined this request.  You may
appeal this decision by reopening this request.
Comment 10 Corey Marthaler 2009-06-10 12:58:40 EDT
There is still the issue of comment #3 which has to do with single machine
mirrors.

After a failed leg/PV is brought back, it can't be added back into the VG and the LV can no longer be seen.

From test output, after failed device is turned back on:

Enabling device sdg on taft-01

Recreating PVs /dev/sdg1
  WARNING: Volume group helter_skelter is not consistent
  Can't initialize physical volume "/dev/sdg1" of volume group "helter_skelter"
without -ff
recreation of /dev/sdg1 failed

[root@taft-01 ~]# lvs -a -o +devices
  Volume group "helter_skelter" inconsistent
  WARNING: Inconsistent metadata found for VG helter_skelter - updating to use
version 8
  Cannot update volume group helter_skelter while physical volumes are missing.
  Automatic metadata correction failed
  LV       VG         Attr   LSize  Origin Snap%  Move Log Copy%  Convert
Devices        
  LogVol00 VolGroup00 -wi-ao 58.38G                                      
/dev/sda2(0)   
  LogVol01 VolGroup00 -wi-ao  9.75G                                      
/dev/sda2(1868)
  Internal error: Volume Group helter_skelter was not unlocked
  Device '/dev/sdd1' has been left open.
  Device '/dev/sde1' has been left open.
  Device '/dev/sdg1' has been left open.
  Device '/dev/sdf1' has been left open.
  Device '/dev/sdh1' has been left open.
  Device '/dev/sdg1' has been left open.
  Device '/dev/sdd1' has been left open.
  Device '/dev/sde1' has been left open.
  Device '/dev/sdf1' has been left open.
  Device '/dev/sdh1' has been left open.
  Device '/dev/sde1' has been left open.
  Device '/dev/sda2' has been left open.
  Device '/dev/sdd1' has been left open.
  Device '/dev/sda2' has been left open.
  Device '/dev/sdh1' has been left open.
  Device '/dev/sdf1' has been left open.
  Device '/dev/sdg1' has been left open.
Comment 25 Milan Broz 2009-06-18 07:12:29 EDT
Fix/Revert in lvm2-2.02.46-8.el5.
Comment 27 Corey Marthaler 2009-06-18 14:21:00 EDT
Reverted changes verified in lvm2-2.02.46-8.el5. Mirror device failure tests passed with the only issue being bug 491340.
Comment 29 errata-xmlrpc 2009-09-02 07:56:52 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1393.html

Note You need to log in before you can comment on or make changes to this bug.