Bug 446107

Summary: RHEL5 cmirror tracker: log device not removed properly after log failure
Product: Red Hat Enterprise Linux 5 Reporter: Corey Marthaler <cmarthal>
Component: cmirrorAssignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED WORKSFORME QA Contact: Cluster QE <mspqa-list>
Severity: low Docs Contact:
Priority: low    
Version: 5.3CC: agk, ccaulfie, dwysocha, edamato, heinzm, mbroz
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-10-15 14:11:05 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Corey Marthaler 2008-05-12 19:14:50 UTC
Description of problem:
I just added a check to helter_skelter (the mirror device failure suite) 
to verify that the dm-devices are actually being removed as well (instead
of just checking that the failed device isn't in the lvm output). So it's
possible that this bz is already filed, but caught earlier in the process.



================================================================================
Iteration 0.12 started at Mon May 12 11:57:26 CDT 2008
================================================================================
Senario: Kill disk log of non synced 2 leg mirror(s)

****** Mirror hash info for this scenario ******
* name:      nonsyncd_log_2legs
* sync:      0
* mirrors:   1
* disklog:   1
* failpv:    /dev/sdg1
* legs:      2
* pvs:       /dev/sdh1 /dev/sde1 /dev/sdg1
************************************************

Creating mirror(s) on taft-02...
taft-02: lvcreate -m 1 -n nonsyncd_log_2legs_1 -L 800M helter_skelter
/dev/sdh1:0-1000 /dev/sde1:0-1000 /dev/sdg1:0-150

Continuing on without fully syncd mirrors, currently at...
        ( 1=1.00% )

Creating gfs on top of mirror(s) on taft-02...
Mounting mirrored gfs filesystems on taft-01...
Mounting mirrored gfs filesystems on taft-02...
Mounting mirrored gfs filesystems on taft-03...
Mounting mirrored gfs filesystems on taft-04...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        12484
Verify XIOR Stream: /tmp/checkit_nonsyncd_log_2legs_1
Working dir:        /mnt/nonsyncd_log_2legs_1/checkit

        ---- taft-02 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        11784
Verify XIOR Stream: /tmp/checkit_nonsyncd_log_2legs_1
Working dir:        /mnt/nonsyncd_log_2legs_1/checkit

        ---- taft-03 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        12308
Verify XIOR Stream: /tmp/checkit_nonsyncd_log_2legs_1
Working dir:        /mnt/nonsyncd_log_2legs_1/checkit

        ---- taft-04 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        12676
Verify XIOR Stream: /tmp/checkit_nonsyncd_log_2legs_1
Working dir:        /mnt/nonsyncd_log_2legs_1/checkit


<start name="taft-01_1" pid="706" time="Mon May 12 11:58:02 2008" type="cmd" />
<start name="taft-02_1" pid="708" time="Mon May 12 11:58:02 2008" type="cmd" />
<start name="taft-03_1" pid="710" time="Mon May 12 11:58:02 2008" type="cmd" />
<start name="taft-04_1" pid="712" time="Mon May 12 11:58:02 2008" type="cmd" />
Sleeping 12 seconds to get some outsanding I/O locks before the failure

Disabling device sdg on taft-01
Disabling device sdg on taft-02
Disabling device sdg on taft-03
Disabling device sdg on taft-04

Attempting I/O to cause mirror down conversion(s) on taft-02
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.093717 seconds, 448 MB/s
Verifying the down conversion of the failed mirror(s)
  /dev/sdg1: open failed: No such device or address
Verifying LEG device /dev/sdh1 *IS* in the volume(s)
  /dev/sdg1: open failed: No such device or address
Verifying LEG device /dev/sde1 *IS* in the volume(s)
  /dev/sdg1: open failed: No such device or address
Verifying FAILED device /dev/sdg1 is *NOT* in the volume(s)
  /dev/sdg1: open failed: No such device or address
Verify the dm devices associated with /dev/sdg1 are no longer present
nonsyncd_log_2legs_1_mlog on taft-03 should no longer be there


The log device should be gone by this stage, but it's not on all nodes.

TAFT-01:
  nonsyncd_log_2legs_1            helter_skelter mwi-ao 800.00M                
       100.00        
nonsyncd_log_2legs_1_mimage_0(0),nonsyncd_log_2legs_1_mimage_1(0)
  [nonsyncd_log_2legs_1_mimage_0] helter_skelter iwi-ao 800.00M                
                      /dev/sdh1(0)                   
  [nonsyncd_log_2legs_1_mimage_1] helter_skelter iwi-ao 800.00M                
                      /dev/sde1(0)                   
  nonsyncd_log_2legs_1_mlog       helter_skelter vwi---   4.00M   

helter_skelter-nonsyncd_log_2legs_1     (253, 5)
helter_skelter-nonsyncd_log_2legs_1_mimage_1    (253, 4)
helter_skelter-nonsyncd_log_2legs_1_mimage_0    (253, 3)


TAFT-02:
  nonsyncd_log_2legs_1            helter_skelter mwi-ao 800.00M                
       100.00        
nonsyncd_log_2legs_1_mimage_0(0),nonsyncd_log_2legs_1_mimage_1(0)
  [nonsyncd_log_2legs_1_mimage_0] helter_skelter iwi-ao 800.00M                
                      /dev/sdh1(0)                   
  [nonsyncd_log_2legs_1_mimage_1] helter_skelter iwi-ao 800.00M                
                      /dev/sde1(0)                   
  nonsyncd_log_2legs_1_mlog       helter_skelter vwi---   4.00M                
                                                     

helter_skelter-nonsyncd_log_2legs_1     (253, 5)
helter_skelter-nonsyncd_log_2legs_1_mimage_1    (253, 4)
helter_skelter-nonsyncd_log_2legs_1_mimage_0    (253, 3)


TAFT-03:
  nonsyncd_log_2legs_1            helter_skelter mwi-ao 800.00M                
       100.00        
nonsyncd_log_2legs_1_mimage_0(0),nonsyncd_log_2legs_1_mimage_1(0)
  [nonsyncd_log_2legs_1_mimage_0] helter_skelter iwi-ao 800.00M                
                      /dev/sdh1(0)                   
  [nonsyncd_log_2legs_1_mimage_1] helter_skelter iwi-ao 800.00M                
                      /dev/sde1(0)                   
  nonsyncd_log_2legs_1_mlog       helter_skelter vwi-a-   4.00M     

helter_skelter-nonsyncd_log_2legs_1_mlog        (253, 2)
helter_skelter-nonsyncd_log_2legs_1     (253, 5)
helter_skelter-nonsyncd_log_2legs_1_mimage_1    (253, 4)
helter_skelter-nonsyncd_log_2legs_1_mimage_0    (253, 3)

TAFT-04:
  nonsyncd_log_2legs_1            helter_skelter mwi-ao 800.00M                
       100.00        
nonsyncd_log_2legs_1_mimage_0(0),nonsyncd_log_2legs_1_mimage_1(0)
  [nonsyncd_log_2legs_1_mimage_0] helter_skelter iwi-ao 800.00M                
                      /dev/sdh1(0)                   
  [nonsyncd_log_2legs_1_mimage_1] helter_skelter iwi-ao 800.00M                
                      /dev/sde1(0)                   
  nonsyncd_log_2legs_1_mlog       helter_skelter vwi-a-   4.00M   

helter_skelter-nonsyncd_log_2legs_1_mlog        (253, 2)
helter_skelter-nonsyncd_log_2legs_1     (253, 5)
helter_skelter-nonsyncd_log_2legs_1_mimage_1    (253, 4)
helter_skelter-nonsyncd_log_2legs_1_mimage_0    (253, 3)



Version-Release number of selected component (if applicable):
[root@taft-04 ~]# rpm -qi lvm2
Name        : lvm2                         Relocations: (not relocatable)
Version     : 2.02.32                           Vendor: Red Hat, Inc.
Release     : 4.el5                         Build Date: Fri 04 Apr 2008 06:15:19
AM CDT
Install Date: Fri 02 May 2008 03:43:27 PM CDT      Build Host:
hs20-bc2-3.build.redhat.com

[root@taft-04 ~]# rpm -qi lvm2-cluster
Name        : lvm2-cluster                 Relocations: (not relocatable)
Version     : 2.02.32                           Vendor: Red Hat, Inc.
Release     : 4.el5                         Build Date: Wed 02 Apr 2008 03:56:50
AM CDT
Install Date: Fri 02 May 2008 03:45:12 PM CDT      Build Host:
hs20-bc2-3.build.redhat.com
Extensions to LVM2 to support clusters.

[root@taft-04 ~]# rpm -qi cmirror
Name        : cmirror                      Relocations: (not relocatable)
Version     : 1.1.17                            Vendor: Red Hat, Inc.
Release     : 1.el5                         Build Date: Fri 09 May 2008 11:33:43
AM CDT
Install Date: Fri 09 May 2008 03:57:13 PM CDT      Build Host:
hs20-bc1-7.build.redhat.com

[root@taft-04 ~]# rpm -qi kmod-cmirror
Name        : kmod-cmirror                 Relocations: (not relocatable)
Version     : 0.1.9                             Vendor: Red Hat, Inc.
Release     : 1.el5                         Build Date: Thu 08 May 2008 02:28:27
PM CDT
Install Date: Thu 08 May 2008 05:10:39 PM CDT      Build Host:
hs20-bc2-4.build.redhat.com

[root@taft-04 ~]# rpm -qi device-mapper
Name        : device-mapper                Relocations: (not relocatable)
Version     : 1.02.24                           Vendor: Red Hat, Inc.
Release     : 1.el5                         Build Date: Thu 17 Jan 2008 04:46:05
PM CST
Install Date: Fri 02 May 2008 03:42:58 PM CDT      Build Host:
hs20-bc2-3.build.redhat.com


How reproducible:
I believe I've seen this before

Comment 1 Corey Marthaler 2008-05-12 19:27:12 UTC
[root@taft-01 ~]# dmsetup ls --tree
helter_skelter-nonsyncd_log_2legs_1 (253:5)
 ├─helter_skelter-nonsyncd_log_2legs_1_mimage_1 (253:4)
 │  └─ (8:65)
 └─helter_skelter-nonsyncd_log_2legs_1_mimage_0 (253:3)
    └─ (8:113)
[root@taft-01 ~]# dmsetup status
helter_skelter-nonsyncd_log_2legs_1: 0 1638400 mirror 2 253:3 253:4 1600/1600 1
AA 1 clustered_core
helter_skelter-nonsyncd_log_2legs_1_mimage_1: 0 1638400 linear
helter_skelter-nonsyncd_log_2legs_1_mimage_0: 0 1638400 linear


[root@taft-02 ~]# dmsetup ls --tree
helter_skelter-nonsyncd_log_2legs_1 (253:5)
 ├─helter_skelter-nonsyncd_log_2legs_1_mimage_1 (253:4)
 │  └─ (8:65)
 └─helter_skelter-nonsyncd_log_2legs_1_mimage_0 (253:3)
    └─ (8:113)
[root@taft-02 ~]# dmsetup status
helter_skelter-nonsyncd_log_2legs_1: 0 1638400 mirror 2 253:3 253:4 1600/1600 1
AA 1 clustered_core
helter_skelter-nonsyncd_log_2legs_1_mimage_1: 0 1638400 linear
helter_skelter-nonsyncd_log_2legs_1_mimage_0: 0 1638400 linear


[root@taft-03 ~]# dmsetup ls --tree
helter_skelter-nonsyncd_log_2legs_1_mlog (253:2)
 └─ (8:97)
helter_skelter-nonsyncd_log_2legs_1 (253:5)
 ├─helter_skelter-nonsyncd_log_2legs_1_mimage_1 (253:4)
 │  └─ (8:65)
 └─helter_skelter-nonsyncd_log_2legs_1_mimage_0 (253:3)
    └─ (8:113)
[root@taft-03 ~]# dmsetup status
helter_skelter-nonsyncd_log_2legs_1_mlog: 0 8192 linear
helter_skelter-nonsyncd_log_2legs_1: 0 1638400 mirror 2 253:3 253:4 1600/1600 1
AA 1 clustered_core
helter_skelter-nonsyncd_log_2legs_1_mimage_1: 0 1638400 linear
helter_skelter-nonsyncd_log_2legs_1_mimage_0: 0 1638400 linear


[root@taft-04 ~]# dmsetup ls --tree
helter_skelter-nonsyncd_log_2legs_1_mlog (253:2)
 └─ (8:97)
helter_skelter-nonsyncd_log_2legs_1 (253:5)
 ├─helter_skelter-nonsyncd_log_2legs_1_mimage_1 (253:4)
 │  └─ (8:65)
 └─helter_skelter-nonsyncd_log_2legs_1_mimage_0 (253:3)
    └─ (8:113)
[root@taft-04 ~]# dmsetup status
helter_skelter-nonsyncd_log_2legs_1_mlog: 0 8192 linear
helter_skelter-nonsyncd_log_2legs_1: 0 1638400 mirror 2 253:3 253:4 1600/1600 1
AA 1 clustered_core
helter_skelter-nonsyncd_log_2legs_1_mimage_1: 0 1638400 linear
helter_skelter-nonsyncd_log_2legs_1_mimage_0: 0 1638400 linear


Comment 2 Corey Marthaler 2008-05-12 19:30:09 UTC
Looks like lvm knows that it failed to remove the device:

May 12 11:58:20 taft-01 lvm[7333]: WARNING: dev_open(/dev/sdh1) called while
suspended
May 12 11:58:20 taft-01 clogd[6720]: [hbeCYldT] rw_log:  write failure:
Input/output error
May 12 11:58:20 taft-01 clogd[6720]: [hbeCYldT] Error writing to disk log
May 12 11:58:20 taft-01 clogd[6720]: [hbeCYldT] rw_log:  write failure:
Input/output error
May 12 11:58:20 taft-01 clogd[6720]: [hbeCYldT] Error writing to disk log
May 12 11:58:20 taft-01 kernel: device-mapper: dm-log-clustered: [hbeCYldT]
Server error while processing request [DM_CLOG_FLUSH]: -5
May 12 11:58:20 taft-01 lvm[7333]: Monitoring mirror device
helter_skelter-nonsyncd_log_2legs_1 for events
May 12 11:58:20 taft-01 lvm[7333]: Another thread is handling an event.  Waiting...
May 12 11:58:20 taft-01 lvm[7333]: WARNING: dev_open(/dev/sde1) called while
suspended
May 12 11:58:20 taft-01 lvm[7333]: WARNING: dev_open(/dev/sdf1) called while
suspended
May 12 11:58:20 taft-01 lvm[7333]: WARNING: dev_open(/dev/sdh1) called while
suspended
May 12 11:58:20 taft-01 lvm[7333]: WARNING: dev_open(/etc/lvm/lvm.conf) called
while suspended
May 12 11:58:20 taft-01 lvm[7333]: Failed to remove faulty devices in
helter_skelter-nonsyncd_log_2legs_1
May 12 11:58:22 taft-01 lvm[7333]: No longer monitoring mirror device
helter_skelter-nonsyncd_log_2legs_1 for events


Comment 3 RHEL Program Management 2008-07-14 15:01:11 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 4 Corey Marthaler 2008-07-23 14:07:19 UTC
I reproduced this last night.

Senario: Kill disk log of synced 2 leg mirror(s)

****** Mirror hash info for this scenario ******
* name:      syncd_log_2legs
* sync:      1
* mirrors:   1
* disklog:   1
* failpv:    /dev/sde1
* legs:      2
* pvs:       /dev/sdg1 /dev/sdf1 /dev/sde1
************************************************

Creating mirror(s) on taft-04...
taft-04: lvcreate -m 1 -n syncd_log_2legs_1 -L 800M helter_skelter
/dev/sdg1:0-1000 /dev/sdf1:0-1000 /dev/sde1:0-150

Waiting until all mirrors become fully syncd...
        0/1 mirror(s) are fully synced: ( 1=0.50% )
        0/1 mirror(s) are fully synced: ( 1=31.00% )
        0/1 mirror(s) are fully synced: ( 1=61.50% )
        0/1 mirror(s) are fully synced: ( 1=92.00% )
        1/1 mirror(s) are fully synced: ( 1=100.00% )

Creating gfs on top of mirror(s) on taft-04...
Mounting mirrored gfs filesystems on taft-01...
Mounting mirrored gfs filesystems on taft-02...
Mounting mirrored gfs filesystems on taft-03...
Mounting mirrored gfs filesystems on taft-04...

Writing verification files (checkit) to mirror(s) on...

Sleeping 12 seconds to get some outsanding I/O locks before the failure

Verifying files (checkit) on mirror(s) on...

Disabling device sde on taft-01
Disabling device sde on taft-02
Disabling device sde on taft-03
Disabling device sde on taft-04

Attempting I/O to cause mirror down conversion(s) on taft-04
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.108968 seconds, 385 MB/s
Verifying the down conversion of the failed mirror(s)
  /dev/sde1: open failed: No such device or address
Verifying LEG device /dev/sdg1 *IS* in the volume(s)
  /dev/sde1: open failed: No such device or address
Verifying LEG device /dev/sdf1 *IS* in the volume(s)
  /dev/sde1: open failed: No such device or address
Verifying FAILED device /dev/sde1 is *NOT* in the volume(s)
  /dev/sde1: open failed: No such device or address
Verify the dm devices associated with /dev/sde1 are no longer present
syncd_log_2legs_1_mlog on taft-03 should no longer be there


[root@taft-04 tmp]# lvs -a -o +devices
  /dev/sde1: open failed: No such device or address
  LV                           VG             Attr   LSize   Origin Snap%  Move
Log Copy%  Convert Devices                              
  LogVol00                     VolGroup00     -wi-ao  66.19G                   
                   /dev/sda2(0)                         
  LogVol01                     VolGroup00     -wi-ao   1.94G                   
                   /dev/sda2(2118)                      
  syncd_log_2legs_1            helter_skelter mwi-ao 800.00M                   
    100.00         syncd_log_2legs_1_mimage_0(0),syncd_log_2legs_1_mimage_1(0)
  [syncd_log_2legs_1_mimage_0] helter_skelter iwi-ao 800.00M                   
                   /dev/sdg1(0)                         
  [syncd_log_2legs_1_mimage_1] helter_skelter iwi-ao 800.00M                   
                   /dev/sdf1(0)                         
  syncd_log_2legs_1_mlog       helter_skelter vwi---   4.00M                   
                                                        
[root@taft-04 tmp]# dmsetup ls
helter_skelter-syncd_log_2legs_1_mimage_1       (253, 4)
helter_skelter-syncd_log_2legs_1_mimage_0       (253, 3)
VolGroup00-LogVol01     (253, 1)
helter_skelter-syncd_log_2legs_1        (253, 5)
VolGroup00-LogVol00     (253, 0)


[root@taft-03 tmp]# lvs -a -o +devices
  /dev/sde1: read failed after 0 of 2048 at 0: Input/output error
  LV                           VG             Attr   LSize   Origin Snap%  Move
Log Copy%  Convert Devices                              
  LogVol00                     VolGroup00     -wi-ao  66.19G                   
                   /dev/sda2(0)                         
  LogVol01                     VolGroup00     -wi-ao   1.94G                   
                   /dev/sda2(2118)                      
  syncd_log_2legs_1            helter_skelter mwi-ao 800.00M                   
    100.00         syncd_log_2legs_1_mimage_0(0),syncd_log_2legs_1_mimage_1(0)
  [syncd_log_2legs_1_mimage_0] helter_skelter iwi-ao 800.00M                   
                   /dev/sdg1(0)                         
  [syncd_log_2legs_1_mimage_1] helter_skelter iwi-ao 800.00M                   
                   /dev/sdf1(0)                         
  syncd_log_2legs_1_mlog       helter_skelter vwi-a-   4.00M                   
                                                        
[root@taft-03 tmp]# dmsetup ls
helter_skelter-syncd_log_2legs_1_mlog   (253, 2)
helter_skelter-syncd_log_2legs_1_mimage_1       (253, 4)
helter_skelter-syncd_log_2legs_1_mimage_0       (253, 3)
VolGroup00-LogVol01     (253, 1)
helter_skelter-syncd_log_2legs_1        (253, 5)
VolGroup00-LogVol00     (253, 0)



Comment 5 Corey Marthaler 2008-08-20 19:23:56 UTC
Having trouble reproducing this issue with the latest build...

Comment 6 Corey Marthaler 2008-09-02 18:54:47 UTC
Taking this off the list 5.3 list until it can be more reliably reproduced.

Comment 7 Corey Marthaler 2008-10-15 14:11:05 UTC
The last time this was seen was 3 months ago. Closing for now and will reopen if seen again.