Bug 596318 - Basic cmirror device failure (with I/O running) is broken
Summary: Basic cmirror device failure (with I/O running) is broken
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.0
Hardware: All
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Jonathan Earl Brassow
QA Contact: Corey Marthaler
URL:
Whiteboard:
Depends On:
Blocks: 599016
Reported: 2010-05-26 15:28 UTC by Corey Marthaler
Modified: 2010-11-10 21:07 UTC (History)
11 users (show)

Fixed In Version: lvm2-2.02.72-6.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-11-10 21:07:58 UTC
Target Upstream Version:
Embargoed:


Attachments
log from taft-01 (21.04 KB, text/plain)
2010-05-26 15:37 UTC, Corey Marthaler
log from taft-02 (59.13 KB, text/plain)
2010-05-26 15:37 UTC, Corey Marthaler
log from taft-03 (19.53 KB, text/plain)
2010-05-26 15:38 UTC, Corey Marthaler
log from taft-04 (49.07 KB, text/plain)
2010-05-26 15:39 UTC, Corey Marthaler

Description Corey Marthaler 2010-05-26 15:28:01 UTC
Description of problem:
In the following case, the secondary leg should have been removed. That removal failed and resulted in a corrupted mirror. I'll gather more info, such as whether this also occurs on local LVM.

Scenario: Kill secondary leg of synced core log 2 leg mirror(s)                                                                                         

********* Mirror hash info for this scenario *********
* names:              syncd_secondary_core_2legs_1    
* sync:               1                               
* disklog:            0                               
* failpv(s):          /dev/sdg1                       
* failnode(s):        taft-01 taft-02 taft-03 taft-04 
* leg devices:        /dev/sdc1 /dev/sdg1             
* leg fault policy:   remove                          
* log fault policy:   allocate                        
******************************************************

Creating mirror(s) on taft-04...
taft-04: lvcreate --corelog -m 1 -n syncd_secondary_core_2legs_1 -L 600M helter_skelter /dev/sdc1:0-1000 /dev/sdg1:0-1000

PV=/dev/sdg1
        syncd_secondary_core_2legs_1_mimage_1: 6:
PV=/dev/sdg1                                     
        syncd_secondary_core_2legs_1_mimage_1: 6:

Waiting until all mirrors become fully syncd...
   0/1 mirror(s) are fully synced: ( 43.33% )  
   0/1 mirror(s) are fully synced: ( 79.92% )  
   1/1 mirror(s) are fully synced: ( 100.00% ) 

Creating gfs2 on top of mirror(s) on taft-01...
Mounting mirrored gfs2 filesystems on taft-01...
Mounting mirrored gfs2 filesystems on taft-02...
Mounting mirrored gfs2 filesystems on taft-03...
Mounting mirrored gfs2 filesystems on taft-04...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----                              
        ---- taft-02 ----                              
        ---- taft-03 ----                              
        ---- taft-04 ----                              

Sleeping 10 seconds to get some outsanding GFS I/O locks before the failure                                 
Verifying files (checkit) on mirror(s) on...                                                                
        ---- taft-01 ----                                                                                   
        ---- taft-02 ----
        ---- taft-03 ----
        ---- taft-04 ----

Disabling device sdg on taft-01
Disabling device sdg on taft-02
Disabling device sdg on taft-03
Disabling device sdg on taft-04

Attempting I/O to cause mirror down conversion(s) on taft-01
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.361142 s, 116 MB/s
Verifying current sanity of lvm after the failure
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdg1: read failed after 0 of 2048 at 0: Input/output error
  /dev/dm-3: read failed after 0 of 4096 at 629080064: Input/output error
  [...]
  Couldn't find device with uuid OL4s9P-QZXm-Gezs-RbjI-9mUu-6mQy-EEYHT0.
Verifying FAILED device /dev/sdg1 is *NOT* in the volume(s)
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdg1: read failed after 0 of 2048 at 0: Input/output error
  /dev/dm-3: read failed after 0 of 4096 at 629080064: Input/output error
  [...]
  Couldn't find device with uuid OL4s9P-QZXm-Gezs-RbjI-9mUu-6mQy-EEYHT0.
Verifying LEG device /dev/sdc1 *IS* in the volume(s)
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdg1: read failed after 0 of 2048 at 0: Input/output error
  /dev/dm-3: read failed after 0 of 4096 at 629080064: Input/output error
  [...]
  Couldn't find device with uuid OL4s9P-QZXm-Gezs-RbjI-9mUu-6mQy-EEYHT0.
verify the dm devices associated with /dev/sdg1 have been removed as expected
Checking REMOVAL of syncd_secondary_core_2legs_1_mimage_1 on:  taft-01 taft-02 taft-03 taft-04
syncd_secondary_core_2legs_1_mimage_1 on taft-04 should no longer be there


[root@taft-01 ~]# lvs -a -o +devices
  Couldn't find device with uuid OL4s9P-QZXm-Gezs-RbjI-9mUu-6mQy-EEYHT0.
  LV                                    VG             Attr   LSize   Log Copy%  Devices
  syncd_secondary_core_2legs_1          helter_skelter -wi-ao 600.00m            /dev/sdc1(0)
  syncd_secondary_core_2legs_1_mimage_0 helter_skelter vwi-a- 600.00m
  syncd_secondary_core_2legs_1_mimage_1 helter_skelter -wi--- 600.00m            unknown device(0)
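
The check that fails above (the mimage LV "should no longer be there") can be approximated from the `lvs` output alone. The following is only a rough sketch, not the test harness's actual code: the function name `leftover_mirror_images` and the approach of splitting the text output on whitespace are assumptions made for illustration, and the sample is the output pasted above.

```python
# Sample copied from the `lvs -a -o +devices` output in this report.
SAMPLE_LVS = """\
  Couldn't find device with uuid OL4s9P-QZXm-Gezs-RbjI-9mUu-6mQy-EEYHT0.
  LV                                    VG             Attr   LSize   Log Copy%  Devices
  syncd_secondary_core_2legs_1          helter_skelter -wi-ao 600.00m            /dev/sdc1(0)
  syncd_secondary_core_2legs_1_mimage_0 helter_skelter vwi-a- 600.00m
  syncd_secondary_core_2legs_1_mimage_1 helter_skelter -wi--- 600.00m            unknown device(0)
"""


def leftover_mirror_images(lvs_output):
    """Return (lv_name, on_unknown_device) for every *_mimage_* LV still listed.

    Hypothetical helper: after a successful down-conversion to a linear LV,
    no *_mimage_* entries would be expected to remain.
    """
    leftovers = []
    for line in lvs_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Hidden LVs may be shown in square brackets; strip them if present.
        name = fields[0].strip("[]")
        if "_mimage_" in name:
            leftovers.append((name, "unknown device" in line))
    return leftovers


if __name__ == "__main__":
    for name, missing in leftover_mirror_images(SAMPLE_LVS):
        print(name, "(on unknown device)" if missing else "")
```

Run against the sample, this reports both mimage LVs as still present, with mimage_1 sitting on the missing PV, which matches the corrupted state described here.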


Version-Release number of selected component (if applicable):
2.6.32-28.el6bz590851_v1.x86_64

lvm2-2.02.65-1.el6    BUILT: Tue May 18 04:46:06 CDT 2010
lvm2-libs-2.02.65-1.el6    BUILT: Tue May 18 04:46:06 CDT 2010
lvm2-cluster-2.02.65-1.el6    BUILT: Tue May 18 04:46:06 CDT 2010
device-mapper-1.02.48-1.el6    BUILT: Tue May 18 04:46:06 CDT 2010
device-mapper-libs-1.02.48-1.el6    BUILT: Tue May 18 04:46:06 CDT 2010
device-mapper-event-1.02.48-1.el6    BUILT: Tue May 18 04:46:06 CDT 2010
device-mapper-event-libs-1.02.48-1.el6    BUILT: Tue May 18 04:46:06 CDT 2010
cmirror-2.02.65-1.el6    BUILT: Wed May 19 11:19:57 CDT 2010

Comment 1 Corey Marthaler 2010-05-26 15:31:34 UTC
I'll attach the full logs, but here's the bit about the repair:

taft-01:
May 26 15:15:38 taft-01 lvm[19586]: Couldn't find device with uuid OL4s9P-QZXm-Gezs-RbjI-9mUu-6mQy-EEYHT0.
May 26 15:15:39 taft-01 lvm[19586]: Repair of mirrored LV helter_skelter/syncd_secondary_core_2legs_1 finished successfully.

taft-02:
May 26 10:14:05 taft-02 lvm[17402]: Error locking on node taft-04: LV helter_skelter/syncd_secondary_core_2legs_1_mimage_1 in use: not deactivating
May 26 10:14:05 taft-02 lvm[17402]: Repair of mirrored LV helter_skelter/syncd_secondary_core_2legs_1 failed.
May 26 10:14:05 taft-02 lvm[17402]: Failed to remove faulty devices in helter_skelter-syncd_secondary_core_2legs_1.
May 26 10:14:07 taft-02 lvm[17402]: No longer monitoring mirror device helter_skelter-syncd_secondary_core_2legs_1 for events.

taft-03:

taft-04:
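
For comparing nodes, the repair outcome can be pulled out of the syslog lines mechanically. This is a minimal sketch that assumes the message format is exactly as in the two excerpts above; the regular expression and the name `repair_results` are illustrative, not part of lvm2 or the test suite.

```python
import re

# Matches the "Repair of mirrored LV ..." lines as they appear in the excerpts above.
REPAIR_RE = re.compile(
    r"(?P<host>\S+) lvm\[\d+\]: Repair of mirrored LV (?P<lv>\S+) "
    r"(?P<result>finished successfully|failed)\."
)

# Lines copied from the taft-01 and taft-02 excerpts in this comment.
SAMPLE_LOG = """\
May 26 15:15:38 taft-01 lvm[19586]: Couldn't find device with uuid OL4s9P-QZXm-Gezs-RbjI-9mUu-6mQy-EEYHT0.
May 26 15:15:39 taft-01 lvm[19586]: Repair of mirrored LV helter_skelter/syncd_secondary_core_2legs_1 finished successfully.
May 26 10:14:05 taft-02 lvm[17402]: Error locking on node taft-04: LV helter_skelter/syncd_secondary_core_2legs_1_mimage_1 in use: not deactivating
May 26 10:14:05 taft-02 lvm[17402]: Repair of mirrored LV helter_skelter/syncd_secondary_core_2legs_1 failed.
"""


def repair_results(log_text):
    """Return (host, lv, succeeded) for each repair result line found."""
    results = []
    for line in log_text.splitlines():
        match = REPAIR_RE.search(line)
        if match:
            succeeded = match["result"] == "finished successfully"
            results.append((match["host"], match["lv"], succeeded))
    return results


if __name__ == "__main__":
    for host, lv, ok in repair_results(SAMPLE_LOG):
        print(host, lv, "OK" if ok else "FAILED")
```

On the sample lines this shows the conflicting picture in the logs: taft-01 reports the repair as successful while taft-02 reports it as failed for the same LV.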

Comment 2 Corey Marthaler 2010-05-26 15:37:25 UTC
Created attachment 416882 [details]
log from taft-01

Comment 3 Corey Marthaler 2010-05-26 15:37:55 UTC
Created attachment 416883 [details]
log from taft-02

Comment 4 Corey Marthaler 2010-05-26 15:38:37 UTC
Created attachment 416884 [details]
log from taft-03

Comment 5 Corey Marthaler 2010-05-26 15:39:09 UTC
Created attachment 416885 [details]
log from taft-04

Comment 6 Corey Marthaler 2010-05-26 16:51:49 UTC
This appears to be a cluster mirror issue only. Local machine mirrors "work": there are other issues, like bug 596367, but the basic functionality is there.

Comment 8 Jonathan Earl Brassow 2010-06-11 12:59:42 UTC
Corey, please try again without udev running - we think udev is getting in the way.  Once we know whose fault this is, we can proceed to fix it.

Comment 9 Corey Marthaler 2010-06-17 19:03:55 UTC
cmirror creation doesn't appear to work without udev running, so I'm not sure how to tell if udev is the problem here.

Comment 10 Corey Marthaler 2010-06-18 18:56:05 UTC
I tried this same simple fault injection case with the latest patched build and saw the exact same results, both with and without killing udev before the failure. Not sure where to go from here...

2.6.32-25.el6.x86_64

lvm2-2.02.67-1.6.el6    BUILT: Thu Jun 17 10:54:32 CDT 2010
lvm2-libs-2.02.67-1.6.el6    BUILT: Thu Jun 17 10:54:32 CDT 2010
lvm2-cluster-2.02.67-1.6.el6    BUILT: Thu Jun 17 10:54:32 CDT 2010
device-mapper-1.02.49-1.6.el6    BUILT: Thu Jun 17 10:54:32 CDT 2010
device-mapper-libs-1.02.49-1.6.el6    BUILT: Thu Jun 17 10:54:32 CDT 2010
device-mapper-event-1.02.49-1.6.el6    BUILT: Thu Jun 17 10:54:32 CDT 2010
device-mapper-event-libs-1.02.49-1.6.el6    BUILT: Thu Jun 17 10:54:32 CDT 2010
cmirror-2.02.67-1.6.el6    BUILT: Thu Jun 17 10:54:32 CDT 2010

Comment 11 Corey Marthaler 2010-06-28 21:26:34 UTC
FYI - if I run this testcase w/o any I/O load (the only I/O being a dd in order to force the repair) then cmirror device failure works.

Comment 12 Petr Rockai 2010-08-09 15:14:41 UTC
I *think* this is the same problem as bug 596453 and friends (from looking at the logs, although to confirm this I would need to have more of the logs). Jon, if you disagree please flip this back to ASSIGNED.

Comment 14 Corey Marthaler 2010-08-13 18:20:57 UTC
There is now a basic level of device failure functionality wrt cmirrors in the latest build. Other, less basic device failure bugs still exist, however.

Marking this bug verified.

2.6.32-59.1.el6.x86_64

lvm2-2.02.72-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
lvm2-libs-2.02.72-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
lvm2-cluster-2.02.72-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
udev-147-2.22.el6    BUILT: Fri Jul 23 07:21:33 CDT 2010
device-mapper-1.02.53-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
device-mapper-libs-1.02.53-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
device-mapper-event-1.02.53-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
device-mapper-event-libs-1.02.53-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
cmirror-2.02.72-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010

Comment 15 releng-rhel@redhat.com 2010-11-10 21:07:58 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.

