Bug 506183

Summary: unable to recreate failed log device after successful convert to core log
Product: Red Hat Enterprise Linux 5
Reporter: Corey Marthaler <cmarthal>
Component: lvm2
Assignee: Petr Rockai <prockai>
Status: CLOSED DEFERRED
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 5.4
CC: agk, dwysocha, edamato, heinzm, jbrassow, mbroz, prockai
Target Milestone: rc
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-11-10 17:06:35 UTC
Attachments:
  archive 1 (flags: none)
  archive 2 (flags: none)
  archive 3 (flags: none)

Description Corey Marthaler 2009-06-15 22:18:51 UTC
Description of problem:
================================================================================
Iteration 0.1 started at Mon Jun 15 17:11:20 CDT 2009                           
================================================================================
Scenario: Kill disk log of synced 2 leg mirror(s)                               

****** Mirror hash info for this scenario ******
* name:         syncd_log_2legs                 
* sync:         1                               
* num mirrors:  1                               
* disklog:      /dev/sdh1                       
* failpv(s):    /dev/sdh1                       
* leg devices:  /dev/sdf1 /dev/sde1             
************************************************

Creating mirror(s) on taft-04...
taft-04: lvcreate -m 1 -n syncd_log_2legs_1 -L 600M helter_skelter /dev/sdf1:0-1000 /dev/sde1:0-1000 /dev/sdh1:0-150

Waiting until all mirrors become fully syncd...
        0/1 mirror(s) are fully synced: ( 1=3.08% )
        0/1 mirror(s) are fully synced: ( 1=64.75% )
        1/1 mirror(s) are fully synced: ( 1=100.00% )

Creating ext on top of mirror(s) on taft-04...
mke2fs 1.39 (29-May-2006)                     
Mounting mirrored ext filesystems on taft-04...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-04 ----                              
checkit starting with:                                 
CREATE                                                 
Num files:          100                                
Random Seed:        17601                              
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1     
Working dir:        /mnt/syncd_log_2legs_1/checkit     

<start name="taft-04_1" pid="16737" time="Mon Jun 15 17:11:52 2009" type="cmd" />
Sleeping 10 seconds to get some outstanding EXT I/O locks before the failure
Verifying files (checkit) on mirror(s) on...                                     
        ---- taft-04 ----                                                        
checkit starting with:                                                           
VERIFY                                                                           
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1                               
Working dir:        /mnt/syncd_log_2legs_1/checkit                               


Disabling device sdh on taft-04

Attempting I/O to cause mirror down conversion(s) on taft-04
10+0 records in                                             
10+0 records out                                            
41943040 bytes (42 MB) copied, 0.150969 seconds, 278 MB/s   
Verifying the down conversion of the failed mirror(s)       
  /dev/sdh1: open failed: No such device or address         
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
Verifying FAILED device /dev/sdh1 is *NOT* in the volume(s)               
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
Verifying LOG device /dev/sdh1 is *NOT* in the linear(s)                  
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
Verifying LEG device /dev/sdf1 *IS* in the volume(s)
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
Verifying LEG device /dev/sde1 *IS* in the volume(s)
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
Verify the dm devices associated with /dev/sdh1 are no longer present
Verify that the mirror image order remains the same after the down conversion
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.
  Couldn't find device with uuid 'am7kbU-mP36-l1zd-r0u7-jGd2-83H7-RKhHmj'.

Verifying files (checkit) on mirror(s) on...
        ---- taft-04 ----
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1
Working dir:        /mnt/syncd_log_2legs_1/checkit

Enabling device sdh on taft-04

Recreating PVs /dev/sdh1
  WARNING: Volume group helter_skelter is not consistent
  Can't initialize physical volume "/dev/sdh1" of volume group "helter_skelter" without -ff
recreation of /dev/sdh1 failed
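
(For reference, the override that the "without -ff" message refers to would be a forced re-initialization of the PV. The invocation below is assumed for illustration; the harness output does not show the exact pvcreate command it ran:

[root@taft-04 ~]# pvcreate -ff /dev/sdh1
)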


# It appears that LVM thinks that the device is still a part of the VG.

[root@taft-04 archive]# pvscan
  WARNING: Volume Group helter_skelter is not consistent
  PV /dev/sdh1   VG helter_skelter   lvm2 [135.66 GB / 135.66 GB free]
  PV /dev/sdg1   VG helter_skelter   lvm2 [135.66 GB / 135.66 GB free]
  PV /dev/sdf1   VG helter_skelter   lvm2 [135.66 GB / 135.08 GB free]
  PV /dev/sde1   VG helter_skelter   lvm2 [135.66 GB / 135.08 GB free]
  PV /dev/sdd1   VG helter_skelter   lvm2 [135.66 GB / 135.66 GB free]
  PV /dev/sda2   VG VolGroup00       lvm2 [68.12 GB / 0    free]
  Total: 6 [746.45 GB] / in use: 6 [746.45 GB] / in no VG: 0 [0   ]
[root@taft-04 archive]# lvs -a -o +devices
  Volume group "helter_skelter" inconsistent
  WARNING: Inconsistent metadata found for VG helter_skelter - updating to use version 8
  LV                           VG             Attr   LSize   Origin Snap%  Move Log Copy%  Convert Devices                                                    
  LogVol00                     VolGroup00     -wi-ao  58.38G                                       /dev/sda2(0)                                               
  LogVol01                     VolGroup00     -wi-ao   9.75G                                       /dev/sda2(1868)                                            
  syncd_log_2legs_1            helter_skelter mwi-ao 600.00M                        100.00         syncd_log_2legs_1_mimage_0(0),syncd_log_2legs_1_mimage_1(0)
  [syncd_log_2legs_1_mimage_0] helter_skelter iwi-ao 600.00M                                       /dev/sdf1(0)                                               
  [syncd_log_2legs_1_mimage_1] helter_skelter iwi-ao 600.00M                                       /dev/sde1(0)  
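
For context, the operation the bug summary refers to is converting the mirror log back from core to disk once the failed device is usable as a PV in the VG again. A minimal sketch using the names from this run (assumed for illustration, not a command captured from the harness):

[root@taft-04 ~]# lvconvert --mirrorlog disk helter_skelter/syncd_log_2legs_1 /dev/sdh1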

Version-Release number of selected component (if applicable):
lvm2-2.02.46-7.el5
lvm2-cluster-2.02.46-6.el5


How reproducible:
Every time

Comment 1 Corey Marthaler 2009-06-15 22:19:53 UTC
Created attachment 348021 [details]
archive 1

Comment 2 Corey Marthaler 2009-06-15 22:20:20 UTC
Created attachment 348022 [details]
archive 2

Comment 3 Corey Marthaler 2009-06-15 22:20:49 UTC
Created attachment 348023 [details]
archive 3

Comment 5 Petr Rockai 2009-09-22 12:36:41 UTC
The pvcreate message says to use -ff to force overwriting a device that has existing LVM metadata on it. When a device fails, LVM of course does not wipe it (it is gone), nor does it wipe it when it returns (doing so would likely lead to data loss in at least some cases). When the device comes back, it still carries the old metadata, so LVM will by default refuse to pvcreate it.

In this case, however, the new version of the mirror repair patch should also run vgreduce --removemissing (without --force), which would lead to a clean volume group (since there is nothing in the VG besides the partial mirror). I guess we want to track this bug for the new mirror repair code -- we need to double-check that this is really fixed.
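
To illustrate, the manual sequence described above would look roughly like this (a sketch using the VG and device names from the test log, not output captured from an actual run):

[root@taft-04 ~]# vgreduce --removemissing helter_skelter
[root@taft-04 ~]# pvcreate -ff /dev/sdh1
[root@taft-04 ~]# vgextend helter_skelter /dev/sdh1
[root@taft-04 ~]# lvconvert --mirrorlog disk helter_skelter/syncd_log_2legs_1 /dev/sdh1

The first command drops the missing PV from the VG metadata (here, nothing else depends on it once the mirror has been repaired); the remaining commands re-initialize the returned device, add it back to the VG, and re-allocate the disk log on it.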

Comment 6 Corey Marthaler 2009-10-14 19:23:21 UTC
It seems that the metadata on the devices that stay up is changed to reflect that the failed device is gone, whether or not the metadata on the failed device itself is ever changed. This test had never failed like that before and hasn't since. I've now run it quite a few times with the latest rpms and haven't seen any issues. This appears to have been some kind of fluke failure/bug.

lvm2-2.02.46-10.el5    BUILT: Fri Sep 18 09:38:06 CDT 2009
lvm2-cluster-2.02.46-10.el5    BUILT: Fri Sep 18 09:39:48 CDT 2009

Comment 7 Petr Rockai 2009-11-10 17:06:35 UTC
OK. I'll close this as deferred, due to the complete lack of reproducibility. If you ever run into this problem again, please let me know.