Bug 733114 - mirror conversion and mirror image splitting results in device i/o errors
Summary: mirror conversion and mirror image splitting results in device i/o errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.2
Hardware: x86_64
OS: Linux
high
high
Target Milestone: rc
: ---
Assignee: Jonathan Earl Brassow
QA Contact: Corey Marthaler
URL:
Whiteboard:
Depends On:
Blocks: 747120
TreeView+ depends on / blocked
 
Reported: 2011-08-24 19:33 UTC by Corey Marthaler
Modified: 2011-12-06 17:02 UTC (History)
9 users (show)

Fixed In Version: lvm2-2.02.87-5.el6
Doc Type: Bug Fix
Doc Text:
Do not document.
Clone Of:
Environment:
Last Closed: 2011-12-06 17:02:37 UTC


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:1522 normal SHIPPED_LIVE lvm2 bug fix and enhancement update 2011-12-06 00:50:10 UTC

Description Corey Marthaler 2011-08-24 19:33:03 UTC
Description of problem:
[root@taft-01 ~]# lvs -a -o +devices
 LV                                   VG          Attr   LSize   Log                            Copy%  Devices
 split_images_sequentially            split_image mwi-a- 300.00m split_images_sequentially_mlog 100.00 split_images_sequentially_mimage_0(0),split_images_sequentially_mimage_1(0),split_images_sequentially_mimage_2(0),split_images_sequentially_mimage_3(0),split_images_sequentially_mimage_4(0)
 [split_images_sequentially_mimage_0] split_image iwi-ao 300.00m                                       /dev/sdh1(0)
 [split_images_sequentially_mimage_1] split_image iwi-ao 300.00m                                       /dev/sdg1(0)
 [split_images_sequentially_mimage_2] split_image iwi-ao 300.00m                                       /dev/sdf1(0)
 [split_images_sequentially_mimage_3] split_image iwi-ao 300.00m                                       /dev/sde1(0)
 [split_images_sequentially_mimage_4] split_image iwi-ao 300.00m                                       /dev/sdd1(0)
 [split_images_sequentially_mlog]     split_image lwi-ao   4.00m                                       /dev/sdc1(0)

[root@taft-01 ~]# lvconvert --splitmirrors 1 --name new0 split_image/split_images_sequentially
 Logical volume split_images_sequentially converted.
[root@taft-01 ~]# lvs -a -o +devices
 LV                                   Attr   LSize   Log                            Copy%  Devices
 new0                                 -wi-a- 300.00m                                       /dev/sdd1(0)
 split_images_sequentially            mwi-a- 300.00m split_images_sequentially_mlog 100.00 split_images_sequentially_mimage_0(0),split_images_sequentially_mimage_1(0),split_images_sequentially_mimage_2(0),split_images_sequentially_mimage_3(0)
 [split_images_sequentially_mimage_0] iwi-ao 300.00m                                       /dev/sdh1(0)
 [split_images_sequentially_mimage_1] iwi-ao 300.00m                                       /dev/sdg1(0)
 [split_images_sequentially_mimage_2] iwi-ao 300.00m                                       /dev/sdf1(0)
 [split_images_sequentially_mimage_3] iwi-ao 300.00m                                       /dev/sde1(0)
 [split_images_sequentially_mlog]     lwi-ao   4.00m                                       /dev/sdc1(0)

[root@taft-01 ~]# lvconvert --splitmirrors 1 --name new1 split_image/split_images_sequentially
 Logical volume split_images_sequentially converted.
[root@taft-01 ~]# lvs -a -o +devices
 LV                                   Attr   LSize   Log                            Copy%  Devices
 new0                                 -wi-a- 300.00m                                       /dev/sdd1(0)
 new1                                 -wi-a- 300.00m                                       /dev/sde1(0)
 split_images_sequentially            mwi-a- 300.00m split_images_sequentially_mlog 100.00 split_images_sequentially_mimage_0(0),split_images_sequentially_mimage_1(0),split_images_sequentially_mimage_2(0)
 [split_images_sequentially_mimage_0] iwi-ao 300.00m                                       /dev/sdh1(0)
 [split_images_sequentially_mimage_1] iwi-ao 300.00m                                       /dev/sdg1(0)
 [split_images_sequentially_mimage_2] iwi-ao 300.00m                                       /dev/sdf1(0)
 [split_images_sequentially_mlog]     lwi-ao   4.00m                                       /dev/sdc1(0)

[root@taft-01 ~]# lvconvert --splitmirrors 1 --name new2 split_image/split_images_sequentially
 Logical volume split_images_sequentially converted.
[root@taft-01 ~]# lvs -a -o +devices
 LV                                   Attr   LSize   Log                            Copy%  Devices
 new0                                 -wi-a- 300.00m                                       /dev/sdd1(0)
 new1                                 -wi-a- 300.00m                                       /dev/sde1(0)
 new2                                 -wi-a- 300.00m                                       /dev/sdf1(0)
 split_images_sequentially            mwi-a- 300.00m split_images_sequentially_mlog 100.00 split_images_sequentially_mimage_0(0),split_images_sequentially_mimage_1(0)
 [split_images_sequentially_mimage_0] iwi-ao 300.00m                                       /dev/sdh1(0)
 [split_images_sequentially_mimage_1] iwi-ao 300.00m                                       /dev/sdg1(0)
 [split_images_sequentially_mlog]     lwi-ao   4.00m                                       /dev/sdc1(0)

[root@taft-01 ~]# lvconvert --splitmirrors 1 --name new3 split_image/split_images_sequentially
 LV split_image/split_images_sequentially_mimage_0 in use: not deactivating
[root@taft-01 ~]# lvs -a -o +devices
 LV                                 Attr   LSize   Log Copy%  Devices
 new0                               -wi-a- 300.00m            /dev/sdd1(0)
 new1                               -wi-a- 300.00m            /dev/sde1(0)
 new2                               -wi-a- 300.00m            /dev/sdf1(0)
 new3                               -wi-a- 300.00m            /dev/sdg1(0)    
 split_images_sequentially          -wi-a- 300.00m            /dev/sdh1(0)
 split_images_sequentially_mimage_0 vwi-a- 300.00m
 split_images_sequentially_mlog     -wi-a-   4.00m            /dev/sdc1(0)


Aug 24 14:15:02 taft-01 lvm[1243]: No longer monitoring mirror device split_image-split_images_sequentially for events.
Aug 24 14:15:02 taft-01 lvm[1243]: Monitoring mirror device split_image-split_images_sequentially for events.
Aug 24 14:15:02 taft-01 lvm[1243]: split_image-split_images_sequentially is now in-sync.
Aug 24 14:15:02 taft-01 lvm[1243]: No longer monitoring mirror device split_image-split_images_sequentially for events.
Aug 24 14:15:02 taft-01 lvm[1243]: Monitoring mirror device split_image-split_images_sequentially for events.
Aug 24 14:15:02 taft-01 lvm[1243]: split_image-split_images_sequentially is now in-sync.
Aug 24 14:15:13 taft-01 lvm[1243]: No longer monitoring mirror device split_image-split_images_sequentially for events.
Aug 24 14:15:13 taft-01 lvm[1243]: Monitoring mirror device split_image-split_images_sequentially for events.
Aug 24 14:15:13 taft-01 lvm[1243]: split_image-split_images_sequentially is now in-sync.
Aug 24 14:15:13 taft-01 lvm[1243]: No longer monitoring mirror device split_image-split_images_sequentially for events.
Aug 24 14:15:14 taft-01 lvm[1243]: Monitoring mirror device split_image-split_images_sequentially for events.
Aug 24 14:15:14 taft-01 lvm[1243]: split_image-split_images_sequentially is now in-sync.
Aug 24 14:15:22 taft-01 lvm[1243]: No longer monitoring mirror device split_image-split_images_sequentially for events.
Aug 24 14:15:23 taft-01 lvm[1243]: Monitoring mirror device split_image-split_images_sequentially for events.
Aug 24 14:15:23 taft-01 lvm[1243]: split_image-split_images_sequentially is now in-sync.
Aug 24 14:15:23 taft-01 lvm[1243]: No longer monitoring mirror device split_image-split_images_sequentially for events.
Aug 24 14:15:23 taft-01 lvm[1243]: Monitoring mirror device split_image-split_images_sequentially for events.
Aug 24 14:15:23 taft-01 lvm[1243]: split_image-split_images_sequentially is now in-sync.
Aug 24 14:15:32 taft-01 lvm[1243]: No longer monitoring mirror device split_image-split_images_sequentially for events.
Aug 24 14:15:32 taft-01 kernel: __ratelimit: 127 callbacks suppressed
Aug 24 14:15:32 taft-01 kernel: Buffer I/O error on device dm-3, logical block 76784
Aug 24 14:15:32 taft-01 kernel: Buffer I/O error on device dm-3, logical block 76784
Aug 24 14:15:32 taft-01 kernel: Buffer I/O error on device dm-3, logical block 76798
Aug 24 14:15:32 taft-01 kernel: Buffer I/O error on device dm-3, logical block 76798
Aug 24 14:15:32 taft-01 kernel: Buffer I/O error on device dm-3, logical block 0
Aug 24 14:15:32 taft-01 kernel: Buffer I/O error on device dm-3, logical block 0
Aug 24 14:15:32 taft-01 kernel: Buffer I/O error on device dm-3, logical block 1
Aug 24 14:15:32 taft-01 kernel: Buffer I/O error on device dm-3, logical block 76799
Aug 24 14:15:32 taft-01 kernel: Buffer I/O error on device dm-3, logical block 76799
Aug 24 14:15:32 taft-01 kernel: Buffer I/O error on device dm-3, logical block 76799


Version-Release number of selected component (if applicable):
2.6.32-191.el6.x86_64

lvm2-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-libs-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-cluster-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
udev-147-2.37.el6    BUILT: Wed Aug 10 07:48:15 CDT 2011
device-mapper-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-libs-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-libs-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
cmirror-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011


How reproducible:
Everytime

Comment 1 Corey Marthaler 2011-08-24 19:45:00 UTC
Easier way to repo this is just start off with a 2-way mirror.

[root@taft-01 ~]# lvs -a -o +devices
  LV                        VG          Attr   LSize   Log                 Copy%  Devices                                              
  split_last_leg            split_image mwi-a- 300.00m split_last_leg_mlog 100.00 split_last_leg_mimage_0(0),split_last_leg_mimage_1(0)
  [split_last_leg_mimage_0] split_image iwi-ao 300.00m                            /dev/sdh1(0)                                         
  [split_last_leg_mimage_1] split_image iwi-ao 300.00m                            /dev/sdg1(0)                                         
  [split_last_leg_mlog]     split_image lwi-ao   4.00m                            /dev/sdc1(0)                                         

[root@taft-01 ~]# lvconvert --splitmirrors 1 --name new split_image/split_last_leg
  LV split_image/split_last_leg_mimage_0 in use: not deactivating

[root@taft-01 ~]# lvs -a -o +devices
  LV                      VG          Attr   LSize   Log Copy%  Devices         
  new                     split_image -wi-a- 300.00m            /dev/sdg1(0)    
  split_last_leg          split_image -wi-a- 300.00m            /dev/sdh1(0)    
  split_last_leg_mimage_0 split_image vwi-a- 300.00m                            
  split_last_leg_mlog     split_image -wi-a-   4.00m            /dev/sdc1(0)    

Aug 24 14:42:26 taft-01 lvm[1243]: No longer monitoring mirror device split_image-split_last_leg for events.
Aug 24 14:42:26 taft-01 kernel: __ratelimit: 127 callbacks suppressed
Aug 24 14:42:26 taft-01 kernel: Buffer I/O error on device dm-3, logical block 76784
Aug 24 14:42:26 taft-01 kernel: Buffer I/O error on device dm-3, logical block 76784
Aug 24 14:42:26 taft-01 kernel: Buffer I/O error on device dm-3, logical block 76798
Aug 24 14:42:26 taft-01 kernel: Buffer I/O error on device dm-3, logical block 76798
Aug 24 14:42:26 taft-01 kernel: Buffer I/O error on device dm-3, logical block 0
Aug 24 14:42:26 taft-01 kernel: Buffer I/O error on device dm-3, logical block 0
Aug 24 14:42:26 taft-01 kernel: Buffer I/O error on device dm-3, logical block 1
Aug 24 14:42:26 taft-01 kernel: Buffer I/O error on device dm-3, logical block 76799
Aug 24 14:42:26 taft-01 kernel: Buffer I/O error on device dm-3, logical block 76799
Aug 24 14:42:26 taft-01 kernel: Buffer I/O error on device dm-3, logical block 76799

Comment 2 Jonathan Earl Brassow 2011-08-31 21:24:52 UTC
The I/O errors don't bother me.  They come from udev trying to access split_last_leg_mimage_0, which is now an error target.  The real problem here is the fact that we get 'LV split_image/split_images_sequentially_mimage_0 in use: not deactivating' - probably an [de]activation sequence issue.

Comment 3 Jonathan Earl Brassow 2011-09-01 19:31:16 UTC
Please test this bug again with the changes made for bug 732142 - the ordering of the split operations has changed and will likely affect this bug.  It will need to be reconfirmed.

I will mark this one as POST along with bug 732142.

Comment 4 Corey Marthaler 2011-09-01 21:38:37 UTC
Need devel unit testing results from this fix.

Comment 6 Corey Marthaler 2011-09-09 22:31:50 UTC
Devel should really be trying these test cases before moving to POST.

SCENARIO - [mirror_split_fs_io_verification]
Create 3-way mirror with simple fs data (checkit) and then split the images and verify
taft-02: lvcreate -m 2 -n split_io_verify --nosync -L 4G mirror_sanity
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!

Placing a lock_nolock gfs2 filesystem on split_io_verify volume
Mounting split_io_verify volume

Writing files to /mnt/split_io_verify/taft-02

Splitting: lvconvert --splitmirrors 1 --name new1 mirror_sanity/split_io_verify
  LV mirror_sanity/split_io_verify_mimage_2 in use: not deactivating
  Failed to deactivate former mirror image, split_io_verify_mimage_2
couldn't split split_io_verify into new1 on taft-02

[root@taft-02 ~]# lvs -a -o +devices
 LV                         VG            Attr   Log                  Copy%  Devices                                                
 split_io_verify            mirror_sanity Mwi-a- split_io_verify_mlog 100.00 split_io_verify_mimage_0(0),split_io_verify_mimage_1(0)
 [split_io_verify_mimage_0] mirror_sanity iwi-ao                             /dev/sdh1(0)                                           
 [split_io_verify_mimage_1] mirror_sanity iwi-ao                             /dev/sdg1(0)                                           
 split_io_verify_mimage_2   mirror_sanity -wi-a-                             /dev/sdf1(0)                                           
 [split_io_verify_mlog]     mirror_sanity lwi-ao                             /dev/sdc1(0)

Comment 7 Corey Marthaler 2011-09-09 22:36:54 UTC
Even the basic test case still fails.

[root@taft-02 ~]# lvs -a -o +devices
 LV                Attr   LSize   Log         Copy%  Devices                              
 mirror            mwi-a- 100.00m mirror_mlog 100.00 mirror_mimage_0(0),mirror_mimage_1(0)
 [mirror_mimage_0] iwi-ao 100.00m                    /dev/sdb1(0)                         
 [mirror_mimage_1] iwi-ao 100.00m                    /dev/sdc1(0)                         
 [mirror_mlog]     lwi-ao   4.00m                    /dev/sdh1(0)                         

[root@taft-02 ~]# lvconvert --splitmirrors 1 --name new taft/mirror
 LV taft/mirror_mimage_0 in use: not deactivating

Sep  9 17:34:30 taft-02 kernel: Buffer I/O error on device dm-4, logical block 25598
Sep  9 17:34:30 taft-02 kernel: Buffer I/O error on device dm-4, logical block 25598
Sep  9 17:34:30 taft-02 kernel: Buffer I/O error on device dm-4, logical block 0
Sep  9 17:34:30 taft-02 kernel: Buffer I/O error on device dm-4, logical block 0
Sep  9 17:34:30 taft-02 kernel: Buffer I/O error on device dm-4, logical block 1
Sep  9 17:34:30 taft-02 kernel: Buffer I/O error on device dm-4, logical block 25599
Sep  9 17:34:30 taft-02 kernel: Buffer I/O error on device dm-4, logical block 25599
Sep  9 17:34:30 taft-02 kernel: Buffer I/O error on device dm-4, logical block 255992.6.32-193.el6.x86_64


lvm2-2.02.87-2.el6    BUILT: Fri Sep  9 09:35:36 CDT 2011
lvm2-libs-2.02.87-2.el6    BUILT: Fri Sep  9 09:35:36 CDT 2011
lvm2-cluster-2.02.87-2.el6    BUILT: Fri Sep  9 09:35:36 CDT 2011
udev-147-2.37.el6    BUILT: Wed Aug 10 07:48:15 CDT 2011
device-mapper-1.02.66-2.el6    BUILT: Fri Sep  9 09:35:36 CDT 2011
device-mapper-libs-1.02.66-2.el6    BUILT: Fri Sep  9 09:35:36 CDT 2011
device-mapper-event-1.02.66-2.el6    BUILT: Fri Sep  9 09:35:36 CDT 2011
device-mapper-event-libs-1.02.66-2.el6    BUILT: Fri Sep  9 09:35:36 CDT 2011
cmirror-2.02.87-2.el6    BUILT: Fri Sep  9 09:35:36 CDT 2011

Comment 8 Corey Marthaler 2011-09-12 16:54:02 UTC
FYI - This case doesn't always fail, I'd say it fails roughly 1/2 the time, and by fail, it doesn't do the actual mirror split.


============================================================
Iteration 1 of 10000 started at Mon Sep 12 11:50:30 CDT 2011
============================================================
SCENARIO - [split_off_last_leg_leaving_no_mirror]
Create a 2-way mirror and split off the last leg leaving no mirror
taft-03: lvcreate -m 1 -n split_last_leg -L 300M split_image
Waiting until all mirrors become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )

splitting off last leg...

Deactivating mirror new... and removing
Deactivating mirror split_last_leg... and removing


============================================================
Iteration 2 of 10000 started at Mon Sep 12 11:50:51 CDT 2011
============================================================
SCENARIO - [split_off_last_leg_leaving_no_mirror]
Create a 2-way mirror and split off the last leg leaving no mirror
taft-03: lvcreate -m 1 -n split_last_leg -L 300M split_image
Waiting until all mirrors become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )

splitting off last leg...
couldn't split off last leg

[root@taft-03 ~]# lvs -a -o +devices
  LV                      VG          Attr   LSize   Copy% Devices         
  split_last_leg          split_image -wi-a- 300.00m       /dev/sdh1(0)    
  split_last_leg_mimage_0 split_image vwi-a- 300.00m
  split_last_leg_mimage_1 split_image -wi-a- 300.00m       /dev/sdg1(0)    
  split_last_leg_mlog     split_image -wi-a-   4.00m       /dev/sdc1(0)

Comment 9 Jonathan Earl Brassow 2011-09-12 18:12:00 UTC
When an image is split from a 2-way mirror, the original mirror is converted to a linear device.  To do this, the top "layer" must be removed.  The segments are transferred from the sub-lv to the top-level LV and the link is severed.  The former sub-lv - having its segments transferred - now contains a temporary error target.

When the original LV is resumed, the old sub-lv that now contains an error segment is activated and scanned.  This is what causes the I/O error messages.  There are three ways to fix this problem:
1) Do not set the sub-lv which contains the error target as "visible" before suspending the original LV.  This way, when the original is resumed, the sub-lv device node is not created and it is not scanned - avoiding the error messages.  The problem with this approach is that if the machine crashes after the resume, it leaves the *hidden* LV in place and the user has a more difficult time noticing that it needs to be cleaned up.  Thus, this type of processing is frowned upon.
2) Do like _remove_mirror_images does and suspend the original, then suspend the sub-lv (the error target), then resume the sub-lv, and finally resume the original LV.  This seems like extra pointless operations to me, but it does not produce the error message (although, I'm not sure why) and it allows us to leave the visible flag in place.
3) Flag the sub-lv (error target) with a "do not scan" flag.  This seems like the cleanest approach, but I have been unable to find the method for doing this.

This same bug exists in RAID1 and the solution could be applied there as well.

Comment 10 Jonathan Earl Brassow 2011-09-13 14:04:53 UTC
Option #2 was chosen from the solutions in comment 9 because it is the least invasive.

Before the patch was applied, I would get:
Sep 13 09:02:28 bp-01 kernel: __ratelimit: 118 callbacks suppressed
Sep 13 09:02:28 bp-01 kernel: Buffer I/O error on device dm-4, logical block 1008
Sep 13 09:02:28 bp-01 kernel: Buffer I/O error on device dm-4, logical block 1008
Sep 13 09:02:28 bp-01 kernel: Buffer I/O error on device dm-4, logical block 1022
Sep 13 09:02:28 bp-01 kernel: Buffer I/O error on device dm-4, logical block 1022
Sep 13 09:02:28 bp-01 kernel: Buffer I/O error on device dm-4, logical block 0
Sep 13 09:02:28 bp-01 kernel: Buffer I/O error on device dm-4, logical block 0
Sep 13 09:02:28 bp-01 kernel: Buffer I/O error on device dm-4, logical block 1
Sep 13 09:02:28 bp-01 kernel: Buffer I/O error on device dm-4, logical block 1023
Sep 13 09:02:28 bp-01 kernel: Buffer I/O error on device dm-4, logical block 1023
Sep 13 09:02:28 bp-01 kernel: Buffer I/O error on device dm-4, logical block 1023

I do not get these messages after the patch.

Patch checked in upstream - Version 2.02.89

Comment 11 Corey Marthaler 2011-09-15 15:55:53 UTC
Jon, I filed bug 738714 for the fact these lvconvert split operations (device i/o errors aside) are failing most of the time when there are more then 2 legs of the mirror.

Comment 13 Corey Marthaler 2011-09-23 16:17:59 UTC
Still seeing errors from time to time on multiple clusters with the latest rpms. The errors appear on the nodes not doing the split operation.

SCENARIO - [split_off_last_leg_leaving_no_mirror]
Create a 2-way mirror and split off the last leg leaving no mirror
hayes-01: lvcreate -m 1 -n split_last_leg -L 300M split_image
Waiting until all mirrors become fully syncd...
   0/1 mirror(s) are fully synced: ( 31.67% )
   0/1 mirror(s) are fully synced: ( 38.83% )
   0/1 mirror(s) are fully synced: ( 54.33% )
   0/1 mirror(s) are fully synced: ( 83.00% )
   1/1 mirror(s) are fully synced: ( 100.00% )

splitting off last leg...
couldn't split off last leg
  Error locking on node hayes-03: LV split_image/split_last_leg_mimage_0 in use: not deactivating
  Error locking on node hayes-02: LV split_image/split_last_leg_mimage_0 in use: not deactivating

From the logs:
Sep 23 11:07:48 taft-01 qarshd[31974]: Running cmdline: lvconvert --splitmirrors 1 --name new3 split_image/split_images_sequentiay


Sep 23 11:07:54 taft-02 lvm[1229]: No longer monitoring mirror device split_image-split_images_sequentially for events.
Sep 23 11:07:54 taft-02 kernel: Buffer I/O error on device dm-3, logical block 76784
Sep 23 11:07:54 taft-02 kernel: Buffer I/O error on device dm-3, logical block 76798
Sep 23 11:07:54 taft-02 kernel: Buffer I/O error on device dm-3, logical block 76798
Sep 23 11:07:54 taft-02 kernel: Buffer I/O error on device dm-3, logical block 0
Sep 23 11:07:54 taft-02 kernel: Buffer I/O error on device dm-3, logical block 0
Sep 23 11:07:54 taft-02 kernel: Buffer I/O error on device dm-3, logical block 1
Sep 23 11:07:54 taft-02 kernel: Buffer I/O error on device dm-3, logical block 76799
Sep 23 11:07:54 taft-02 kernel: Buffer I/O error on device dm-3, logical block 76799
Sep 23 11:07:54 taft-02 kernel: Buffer I/O error on device dm-3, logical block 76799


2.6.32-198.el6.x86_64

lvm2-2.02.87-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
lvm2-libs-2.02.87-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
lvm2-cluster-2.02.87-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
udev-147-2.38.el6    BUILT: Fri Sep  9 16:25:50 CDT 2011
device-mapper-1.02.66-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
device-mapper-libs-1.02.66-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
device-mapper-event-1.02.66-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
device-mapper-event-libs-1.02.66-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
cmirror-2.02.87-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011

Comment 14 Corey Marthaler 2011-09-23 17:03:33 UTC
I'm seeing this while doing lv convert operations as well.

---------------------------------------------
TEST CASE=22
 base legs 1; convert to 0 legs;
 base log disk; convert to core log;
 active 1; sync=1; pvs=sufficient
---------------------------------------------
creating a base mirror on taft-03
Waiting until all mirrors become fully syncd...
   0/1 mirror(s) are fully synced: ( 28.36% )
   0/1 mirror(s) are fully synced: ( 49.21% )
   0/1 mirror(s) are fully synced: ( 72.86% )
   0/1 mirror(s) are fully synced: ( 93.71% )
   1/1 mirror(s) are fully synced: ( 100.00% )

LV                    VG        Attr   LSize   Log             Copy%  Devices                                      
centi_base            centipede mwi-a- 700.00m centi_base_mlog 100.00 centi_base_mimage_0(0),centi_base_mimage_1(0)
[centi_base_mimage_0] centipede iwi-ao 700.00m                        /dev/sdh2(0)                                 
[centi_base_mimage_1] centipede iwi-ao 700.00m                        /dev/sdh1(0)                                 
[centi_base_mlog]     centipede lwi-ao   4.00m                        /dev/sdd2(0)                                 

Converting from 1 leg(s) disk log; to 0 leg(s) core log on taft-03

legs_up=
logs_up=
legs_down=/dev/sdh2
logs_down=/dev/sdd2
total=/dev/sdh2 /dev/sdd2

lvconvert --mirrorlog core -m 0 centipede/centi_base /dev/sdh2 /dev/sdd2
sufficient pvs convert case failed on taft-03
  Error locking on node taft-02: LV centipede/centi_base_mimage_0 in use: not deactivating
  Error locking on node taft-01: LV centipede/centi_base_mimage_0 in use: not deactivating
  Error locking on node taft-04: LV centipede/centi_base_mimage_0 in use: not deactivating
  Error locking on node taft-03: LV centipede/centi_base_mimage_0 in use: not deactivating

Comment 17 Jonathan Earl Brassow 2011-09-28 15:15:56 UTC
Solution #2 from comment 9 does not work in a cluster.  The suspend/resume of the sub-lv is not performed on the remote nodes - thus, the error target is scanned on the remote nodes which causes the I/O errors.

Of the remaining two solutions in comment 9, #1 is the simplest but also the dirtiest for the reasons mentioned above.  Solution #3 seems somewhat more promising, but I'd rather figure out why the resume is resuming all of the sub-lvs without passing them through _get_udev_flags().  It is the lack of calling _get_udev_flags() that makes udev think it should scan the sub-lv error target (two reasons that the device should not be scanned).

I'll try to pursue both courses and implement the best.  If I fail to complete the modified solution #3, #1 can be a temporary fall-back.

Comment 18 Jonathan Earl Brassow 2011-09-28 16:07:16 UTC
Comment 9, solution 1 stops the I/O errors, but fails to rename the devices...

Comment 19 Jonathan Earl Brassow 2011-09-28 16:14:11 UTC
Comment 9, solution 1 also fails to resume the sub-lv (the error target LV that represents the layer removed) as an error target - presumably because it is not visible and not attached to the top-level LV anymore.  Although, it does still get resumed.

Comment 20 Jonathan Earl Brassow 2011-10-06 15:18:49 UTC
A proper place to reset the udev flags has been found and the splitmirrors code has been updated.  This required three check-ins:

1) Revert original work-around that failed to solve the problem in a cluster
   Upstream git commit id: f7235e7cb439d0c6cbae4ba94a912d50d46c5d25
2) Add code to properly set udev flags of mirror sub-LVs during changes
   Upstream git commit id: 857339758f8181575512fc51757291deb07078ec
3) Update splitmirror (_split_mirror_images) code
   Upstream git commit id: 1986e51928ae159781a273972d809a25990fd339


Tests confirm the error messages (access to intermediate error target) do not happen after the changes.  Splitmirrors also properly functions for single machine mirrors, cluster mirrors, and exclusively activated mirrors in a cluster.

Comment 22 Corey Marthaler 2011-10-13 18:05:05 UTC
Both the mirror conversion and mirror image splitting regression tests passed without any "Buffer I/O" errors. Marking verified.

2.6.32-207.el6.x86_64

lvm2-2.02.87-5.el6    BUILT: Wed Oct 12 10:47:46 CDT 2011
lvm2-libs-2.02.87-5.el6    BUILT: Wed Oct 12 10:47:46 CDT 2011
lvm2-cluster-2.02.87-5.el6    BUILT: Wed Oct 12 10:47:46 CDT 2011
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.66-5.el6    BUILT: Wed Oct 12 10:47:46 CDT 2011
device-mapper-libs-1.02.66-5.el6    BUILT: Wed Oct 12 10:47:46 CDT 2011
device-mapper-event-1.02.66-5.el6    BUILT: Wed Oct 12 10:47:46 CDT 2011
device-mapper-event-libs-1.02.66-5.el6    BUILT: Wed Oct 12 10:47:46 CDT 2011
cmirror-2.02.87-5.el6    BUILT: Wed Oct 12 10:47:46 CDT 2011

Comment 23 Peter Rajnoha 2011-10-27 07:56:17 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Do not document.

Comment 24 errata-xmlrpc 2011-12-06 17:02:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1522.html


Note You need to log in before you can comment on or make changes to this bug.