Bug 811669 - Suspend/resume of an out-of-sync RAID LV will cause the sync process to stall
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assigned To: Jonathan Earl Brassow
QA Contact: Cluster QE
Blocks: 739162
Reported: 2012-04-11 12:51 EDT by Corey Marthaler
Modified: 2012-06-20 04:46 EDT

Fixed In Version: kernel-2.6.32-269.el6
Doc Type: Bug Fix
Last Closed: 2012-06-20 04:46:28 EDT
Type: Bug

Attachments: None

Description Corey Marthaler 2012-04-11 12:51:52 EDT
Description of problem:
Scenario kill_primary_synced_raid1_2legs: Kill primary leg of synced 2 leg raid1 volume(s)

********* RAID hash info for this scenario *********
* names:              synced_primary_raid1_2legs_1
* sync:               1
* type:               raid1
* -m |-i value:       2
* leg devices:        /dev/sde1 /dev/sdd1 /dev/sdc1
* failpv(s):          /dev/sde1
* failnode(s):        taft-01
* additional snap:    /dev/sdd1
* raid fault policy:   allocate
******************************************************

Creating raids(s) on taft-01...
taft-01: lvcreate --type raid1 -m 2 -n synced_primary_raid1_2legs_1 -L 500M black_bird /dev/sde1:0-1000 /dev/sdd1:0-1000 /dev/sdc1:0-1000

Creating a snapshot volume of each of the raids

RAID Structure(s):
  LV                                      Attr     LSize   Copy%  Devices
  bb_snap1                                swi-a-s- 252.00m        /dev/sdd1(126)
  synced_primary_raid1_2legs_1            owi-a-m- 500.00m   8.80 synced_primary_raid1_2legs_1_rimage_0(0),synced_primary_raid1_2legs_1_rimage_1(0),synced_primary_raid1_2legs_1_rimage_2(0)
  [synced_primary_raid1_2legs_1_rimage_0] Iwi-aor- 500.00m        /dev/sde1(1)
  [synced_primary_raid1_2legs_1_rimage_1] Iwi-aor- 500.00m        /dev/sdd1(1)
  [synced_primary_raid1_2legs_1_rimage_2] Iwi-aor- 500.00m        /dev/sdc1(1)
  [synced_primary_raid1_2legs_1_rmeta_0]  ewi-aor-   4.00m        /dev/sde1(0)
  [synced_primary_raid1_2legs_1_rmeta_1]  ewi-aor-   4.00m        /dev/sdd1(0)
  [synced_primary_raid1_2legs_1_rmeta_2]  ewi-aor-   4.00m        /dev/sdc1(0)

Waiting until all mirror|raid volumes become fully syncd...
   0/1 mirror(s) are fully synced: ( 8.82% )
   0/1 mirror(s) are fully synced: ( 8.82% )
   0/1 mirror(s) are fully synced: ( 8.82% )
   0/1 mirror(s) are fully synced: ( 8.82% )

# SYNC IS STUCK
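
As comment 1 below notes, the suspend/resume issued while the snapshot is created is what freezes the resync, so the Copy% value never advances past where it was at suspend time. A quick way to confirm the stall outside the test harness is to poll the dm-raid sync ratio directly (the mapped device name here assumes the usual <vg>-<lv> naming for this scenario):

dmsetup status black_bird-synced_primary_raid1_2legs_1
sleep 30
dmsetup status black_bird-synced_primary_raid1_2legs_1
# If the trailing <synced>/<total> fraction has not changed between the
# two outputs, the resynchronization is stuck.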



Version-Release number of selected component (if applicable):
2.6.32-251.el6.x86_64
lvm2-2.02.95-4.el6    BUILT: Wed Apr 11 09:03:19 CDT 2012
lvm2-libs-2.02.95-4.el6    BUILT: Wed Apr 11 09:03:19 CDT 2012
lvm2-cluster-2.02.95-4.el6    BUILT: Wed Apr 11 09:03:19 CDT 2012
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.74-4.el6    BUILT: Wed Apr 11 09:03:19 CDT 2012
device-mapper-libs-1.02.74-4.el6    BUILT: Wed Apr 11 09:03:19 CDT 2012
device-mapper-event-1.02.74-4.el6    BUILT: Wed Apr 11 09:03:19 CDT 2012
device-mapper-event-libs-1.02.74-4.el6    BUILT: Wed Apr 11 09:03:19 CDT 2012
cmirror-2.02.95-4.el6    BUILT: Wed Apr 11 09:03:19 CDT 2012

How reproducible:
Every time
Comment 1 Jonathan Earl Brassow 2012-04-12 09:46:11 EDT
It isn't just limited to snapshots or to RAID1.  This bug affects any RAID type and is induced by the suspend/resume cycle (which happens to occur during a snapshot).
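
In other words, a bare suspend/resume on a RAID LV that is still resynchronizing is enough to trigger the stall; this is exactly the sequence comment 4 below runs. A minimal sketch (the VG name "vg" and the sizes are placeholders):

lvcreate --type raid5 -i 3 -L 500M -n lv vg   # any RAID type still doing its initial sync
sleep 1
dmsetup suspend vg-lv
dmsetup resume vg-lv
dmsetup status vg-lv; sleep 30; dmsetup status vg-lv
# On an affected kernel the sync ratio reported by the two status calls is identical.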
Comment 3 RHEL Product and Program Management 2012-04-18 15:30:00 EDT
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.
Comment 4 Jonathan Earl Brassow 2012-04-18 17:44:03 EDT
Before patch (Testing RAID1, then RAID5):
[root@bp-01 ~]# lvcreate --type raid1 -m2 -L 500M -n lv vg; sleep 1; dmsetup suspend vg-lv; dmsetup resume vg-lv ; dmsetup status vg-lv; sleep 30; dmsetup status vg-lv
  Logical volume "lv" created
0 1024000 raid raid1 3 aaa 4096/1024000
0 1024000 raid raid1 3 aaa 4096/1024000

[root@bp-01 ~]# lvcreate --type raid5 -i3 -L 500M -n lv vg; sleep 1; dmsetup suspend vg-lv; dmsetup resume vg-lv ; dmsetup status vg-lv; sleep 30; dmsetup status vg-lv
  Using default stripesize 64.00 KiB
  Rounding size (125 extents) up to stripe boundary size (126 extents)
  Logical volume "lv" created
0 1032192 raid raid5_ls 4 aaaa 23352/344064
0 1032192 raid raid5_ls 4 aaaa 23352/344064


After patch (Testing RAID1, then RAID5):
[root@bp-01 ~]# lvcreate --type raid1 -m2 -L 500M -n lv vg; sleep 1; dmsetup suspend vg-lv; dmsetup resume vg-lv ; dmsetup status vg-lv; sleep 30; dmsetup status vg-lv
  Logical volume "lv" created
0 1024000 raid raid1 3 aaa 0/1024000
0 1024000 raid raid1 3 AAA 1024000/1024000

[root@bp-01 ~]# lvcreate --type raid5 -i3 -L 500M -n lv vg; sleep 1; dmsetup suspend vg-lv; dmsetup resume vg-lv ; dmsetup status vg-lv; sleep 30; dmsetup status vg-lv
  Using default stripesize 64.00 KiB
  Rounding size (125 extents) up to stripe boundary size (126 extents)
  Logical volume "lv" created
0 1032192 raid raid5_ls 4 aaaa 22528/344064
0 1032192 raid raid5_ls 4 AAAA 344064/344064
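
For reference, the dm-raid status fields after the start sector and length are the target name, the RAID level, the number of devices, one health character per device, and the sync ratio in sectors; a lowercase 'a' marks a device that is alive but not yet in sync, an uppercase 'A' one that is alive and in sync. Annotated against the RAID1 outputs above (the annotation is added here and is not part of the original output):

# <start> <length> raid <raid_level> <#devices> <health_chars> <synced>/<total>
# 0 1024000 raid raid1 3 aaa    4096/1024000    <- before the patch: stuck out of sync
# 0 1024000 raid raid1 3 AAA 1024000/1024000    <- after the patch: resync completed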
Comment 6 Jarod Wilson 2012-05-02 12:19:57 EDT
Patch(es) available on kernel-2.6.32-269.el6
Comment 9 Corey Marthaler 2012-05-02 15:20:28 EDT
The raid + snapshot failure cases now work with the latest kernel.

2.6.32-269.el6.x86_64
lvm2-2.02.95-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
lvm2-libs-2.02.95-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
lvm2-cluster-2.02.95-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.74-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
device-mapper-libs-1.02.74-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
device-mapper-event-1.02.74-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
device-mapper-event-libs-1.02.74-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
cmirror-2.02.95-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
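
A quick manual re-check of the fix, outside the harness, is to repeat the trigger on a syncing RAID LV and watch Copy% climb to 100% (the VG/LV names and sizes here are placeholders):

lvcreate --type raid1 -m 2 -L 500M -n lv vg
lvcreate -s -L 252M -n snap vg/lv              # the snapshot suspends/resumes the origin
watch -n 5 'lvs -o lv_name,copy_percent vg'    # Copy% should keep advancing to 100.00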
Comment 11 errata-xmlrpc 2012-06-20 04:46:28 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0862.html
