Description Corey Marthaler
2013-11-15 21:47:37 UTC
Description of problem:
I feel like this bug may have already been filed but I couldn't find one that didn't involve partial allocation (bug 824159) or mirrored volumes (bug 1016296).
In this case two devices had failed and there were two free devices in the VG, so allocation should have worked. It did appear to work, but the repair was still reported as failed.
================================================================================
Iteration 0.1 started at Thu Nov 14 17:36:08 CST 2013
================================================================================
Scenario kill_multiple_synced_raid1_4legs: Kill multiple legs of synced 4 leg raid1 volume(s)
********* RAID hash info for this scenario *********
* names: synced_multiple_raid1_4legs_1
* sync: 1
* type: raid1
* -m |-i value: 4
* leg devices: /dev/sdf1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdg1
* failpv(s): /dev/sdf1 /dev/sdb1
* failnode(s): virt-004.cluster-qe.lab.eng.brq.redhat.com
* additional snap: /dev/sda1
* lvmetad: 0
* raid fault policy: allocate
******************************************************
Creating raids(s) on virt-004.cluster-qe.lab.eng.brq.redhat.com...
virt-004.cluster-qe.lab.eng.brq.redhat.com: lvcreate --type raid1 -m 4 -n synced_multiple_raid1_4legs_1 -L 500M black_bird /dev/sdf1:0-2000 /dev/sda1:0-2000 /dev/sdb1:0-2000 /dev/sdc1:0-2000 /dev/sdg1:0-2000
Current mirror/raid device structure(s):
LV Attr LSize Cpy%Sync Devices
synced_multiple_raid1_4legs_1 rwi-a-r--- 500.00m 0.00 synced_multiple_raid1_4legs_1_rimage_0(0),synced_multiple_raid1_4legs_1_rimage_1(0),synced_multiple_raid1_4legs_1_rimage_2(0),synced_multiple_raid1_4legs_1_rimage_3(0),synced_multiple_raid1_4legs_1_rimage_4(0)
[synced_multiple_raid1_4legs_1_rimage_0] Iwi-aor--- 500.00m /dev/sdf1(1)
[synced_multiple_raid1_4legs_1_rimage_1] Iwi-aor--- 500.00m /dev/sda1(1)
[synced_multiple_raid1_4legs_1_rimage_2] Iwi-aor--- 500.00m /dev/sdb1(1)
[synced_multiple_raid1_4legs_1_rimage_3] Iwi-aor--- 500.00m /dev/sdc1(1)
[synced_multiple_raid1_4legs_1_rimage_4] Iwi-aor--- 500.00m /dev/sdg1(1)
[synced_multiple_raid1_4legs_1_rmeta_0] ewi-aor--- 4.00m /dev/sdf1(0)
[synced_multiple_raid1_4legs_1_rmeta_1] ewi-aor--- 4.00m /dev/sda1(0)
[synced_multiple_raid1_4legs_1_rmeta_2] ewi-aor--- 4.00m /dev/sdb1(0)
[synced_multiple_raid1_4legs_1_rmeta_3] ewi-aor--- 4.00m /dev/sdc1(0)
[synced_multiple_raid1_4legs_1_rmeta_4] ewi-aor--- 4.00m /dev/sdg1(0)
/dev/sda1 IS in the mirror
/dev/sdb1 IS in the mirror
/dev/sdc1 IS in the mirror
/dev/sde1 is NOT in the mirror
/dev/sdf1 IS in the mirror
/dev/sdg1 IS in the mirror
/dev/sdh1 is NOT in the mirror
AVAIL:2 - NEEDED:2
will_alloc_work=yes
Waiting until all mirror|raid volumes become fully syncd...
1/1 mirror(s) are fully synced: ( 100.00% )
Creating ext on top of mirror(s) on virt-004.cluster-qe.lab.eng.brq.redhat.com...
mke2fs 1.41.12 (17-May-2010)
Mounting mirrored ext filesystems on virt-004.cluster-qe.lab.eng.brq.redhat.com...
PV=/dev/sdb1
synced_multiple_raid1_4legs_1_rimage_2: 1.0
synced_multiple_raid1_4legs_1_rmeta_2: 1.0
PV=/dev/sdf1
synced_multiple_raid1_4legs_1_rimage_0: 1.0
synced_multiple_raid1_4legs_1_rmeta_0: 1.0
Creating a snapshot volume of each of the raids
Writing verification files (checkit) to mirror(s) on...
---- virt-004.cluster-qe.lab.eng.brq.redhat.com ----
Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
---- virt-004.cluster-qe.lab.eng.brq.redhat.com ----
Disabling device sdf on virt-004.cluster-qe.lab.eng.brq.redhat.com
Disabling device sdb on virt-004.cluster-qe.lab.eng.brq.redhat.com
Getting recovery check start time from /var/log/messages: Nov 15 00:37
Attempting I/O to cause mirror down conversion(s) on virt-004.cluster-qe.lab.eng.brq.redhat.com
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.312702 s, 134 MB/s
Verifying current sanity of lvm after the failure
Current mirror/raid device structure(s):
Couldn't find device with uuid 6913Qo-v4h6-Wa2D-lh2O-pQq5-v5Ii-BiybDm.
Couldn't find device with uuid xTH2Ah-DWUp-QBKb-X0fa-nDpd-dk8N-6HqTrr.
LV Attr LSize Cpy%Sync Devices
bb_snap1 swi-a-s--- 252.00m /dev/sda1(127)
synced_multiple_raid1_4legs_1 owi-aor--- 500.00m 53.60 synced_multiple_raid1_4legs_1_rimage_0(0),synced_multiple_raid1_4legs_1_rimage_1(0),synced_multiple_raid1_4legs_1_rimage_2(0),synced_multiple_raid1_4legs_1_rimage_3(0),synced_multiple_raid1_4legs_1_rimage_4(0)
[synced_multiple_raid1_4legs_1_rimage_0] Iwi-aor--- 500.00m /dev/sde1(1)
[synced_multiple_raid1_4legs_1_rimage_1] iwi-aor--- 500.00m /dev/sda1(1)
[synced_multiple_raid1_4legs_1_rimage_2] Iwi-aor--- 500.00m /dev/sdh1(1)
[synced_multiple_raid1_4legs_1_rimage_3] iwi-aor--- 500.00m /dev/sdc1(1)
[synced_multiple_raid1_4legs_1_rimage_4] iwi-aor--- 500.00m /dev/sdg1(1)
[synced_multiple_raid1_4legs_1_rmeta_0] ewi-aor--- 4.00m /dev/sde1(0)
[synced_multiple_raid1_4legs_1_rmeta_1] ewi-aor--- 4.00m /dev/sda1(0)
[synced_multiple_raid1_4legs_1_rmeta_2] ewi-aor--- 4.00m /dev/sdh1(0)
[synced_multiple_raid1_4legs_1_rmeta_3] ewi-aor--- 4.00m /dev/sdc1(0)
[synced_multiple_raid1_4legs_1_rmeta_4] ewi-aor--- 4.00m /dev/sdg1(0)
Verifying FAILED device /dev/sdf1 is *NOT* in the volume(s)
Verifying FAILED device /dev/sdb1 is *NOT* in the volume(s)
Verifying IMAGE device /dev/sda1 *IS* in the volume(s)
Verifying IMAGE device /dev/sdc1 *IS* in the volume(s)
Verifying IMAGE device /dev/sdg1 *IS* in the volume(s)
verify the rimage/rmeta dm devices remain after the failures
Checking EXISTENCE and STATE of synced_multiple_raid1_4legs_1_rimage_2 on: virt-004.cluster-qe.lab.eng.brq.redhat.com
Checking EXISTENCE and STATE of synced_multiple_raid1_4legs_1_rmeta_2 on: virt-004.cluster-qe.lab.eng.brq.redhat.com
Checking EXISTENCE and STATE of synced_multiple_raid1_4legs_1_rimage_0 on: virt-004.cluster-qe.lab.eng.brq.redhat.com
Checking EXISTENCE and STATE of synced_multiple_raid1_4legs_1_rmeta_0 on: virt-004.cluster-qe.lab.eng.brq.redhat.com
Verify the raid image order is what's expected based on raid fault policy
EXPECTED LEG ORDER: unknown /dev/sda1 unknown /dev/sdc1 /dev/sdg1
ACTUAL LEG ORDER: /dev/sde1 /dev/sda1 /dev/sdh1 /dev/sdc1 /dev/sdg1
unknown ne /dev/sde1
/dev/sda1 ne /dev/sda1
unknown ne /dev/sdh1
/dev/sdc1 ne /dev/sdc1
/dev/sdg1 ne /dev/sdg1
Verifying files (checkit) on mirror(s) on...
---- virt-004.cluster-qe.lab.eng.brq.redhat.com ----
Enabling device sdf on virt-004.cluster-qe.lab.eng.brq.redhat.com
Enabling device sdb on virt-004.cluster-qe.lab.eng.brq.redhat.com
Verify that each of the raid repairs finished successfully
repair of raid LV black_bird-synced_multiple_raid1_4legs_1 failed on virt-004.cluster-qe.lab.eng.brq.redhat.com
Nov 15 00:37:35 virt-004 qarshd[7370]: Running cmdline: echo offline > /sys/block/sdf/device/state &
Nov 15 00:37:36 virt-004 qarshd[7373]: Running cmdline: echo offline > /sys/block/sdb/device/state &
[...]
Nov 15 00:37:37 virt-004 lvm[5930]: /dev/sdb1: read failed after 0 of 1024 at 4096: Input/output error
Nov 15 00:37:37 virt-004 lvm[5930]: Failed to write changes to synced_multiple_raid1_4legs_1 in black_bird
Nov 15 00:37:37 virt-004 lvm[5930]: Failed to replace faulty devices in black_bird/synced_multiple_raid1_4legs_1.
Nov 15 00:37:37 virt-004 lvm[5930]: Repair of RAID device black_bird-synced_multiple_raid1_4legs_1-real failed.
Nov 15 00:37:37 virt-004 lvm[5930]: Failed to process event for black_bird-synced_multiple_raid1_4legs_1-real
Nov 15 00:37:42 virt-004 kernel: sd 6:0:0:1: rejecting I/O to offline device
Nov 15 00:37:42 virt-004 kernel: md: super_written gets error=-5, uptodate=0
Nov 15 00:37:42 virt-004 kernel: md/raid1:mdX: Disk failure on dm-7, disabling device.
Nov 15 00:37:42 virt-004 kernel: md/raid1:mdX: Operation continuing on 3 devices.
Nov 15 00:37:42 virt-004 lvm[5930]: Device #0 of raid1 array, black_bird-synced_multiple_raid1_4legs_1-real, has failed.
[...]
Nov 15 00:37:42 virt-004 kernel: sd 3:0:0:1: rejecting I/O to offline device
Nov 15 00:37:42 virt-004 kernel: sd 3:0:0:1: rejecting I/O to offline device
Nov 15 00:37:42 virt-004 kernel: device-mapper: raid: Device 2 specified for rebuild: Clearing superblock
Nov 15 00:37:42 virt-004 kernel: device-mapper: raid: Device 0 specified for rebuild: Clearing superblock
Nov 15 00:37:42 virt-004 kernel: md/raid1:mdX: active with 3 out of 5 mirrors
Nov 15 00:37:42 virt-004 kernel: created bitmap (1 pages) for device mdX
Nov 15 00:37:42 virt-004 kernel: mdX: bitmap initialized from disk: read 1 pages, set 4 of 1000 bits
Nov 15 00:37:42 virt-004 lvm[5930]: Monitoring RAID device black_bird-synced_multiple_raid1_4legs_1-real for events.
Nov 15 00:37:43 virt-004 lvm[5930]: Monitoring RAID device black_bird-synced_multiple_raid1_4legs_1-real for events.
Nov 15 00:37:43 virt-004 lvm[5930]: Faulty devices in black_bird/synced_multiple_raid1_4legs_1 successfully replaced.
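The syslog excerpt above contains two different repair outcomes for the same LV: a "Failed to replace faulty devices" message at 00:37:37 and a "successfully replaced" message at 00:37:43. A minimal sketch for pulling those outcomes out of /var/log/messages in order is below. The function name `repair_outcomes` and the regular expressions are illustrative assumptions written against the message wording seen in this report only; they are not part of lvm2 or the test harness.

```python
import re

# Matches syslog lines emitted by lvm (dmeventd), e.g.
# "Nov 15 00:37:37 virt-004 lvm[5930]: <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+lvm\[\d+\]:\s+(.*)$")

# Message wording copied from the excerpt in this report (assumed stable).
FAILED_RE = re.compile(r"Failed to replace faulty devices in (\S+?)\.$")
REPLACED_RE = re.compile(r"Faulty devices in (\S+) successfully replaced\.$")


def repair_outcomes(lines):
    """Return (timestamp, lv, outcome) tuples in log order."""
    outcomes = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an lvm line (kernel, qarshd, elided "[...]")
        stamp, msg = m.groups()
        failed = FAILED_RE.match(msg)
        replaced = REPLACED_RE.match(msg)
        if failed:
            outcomes.append((stamp, failed.group(1), "failed"))
        elif replaced:
            outcomes.append((stamp, replaced.group(1), "replaced"))
    return outcomes


# Four lines taken verbatim from the excerpt above.
SAMPLE = [
    "Nov 15 00:37:37 virt-004 lvm[5930]: Failed to replace faulty devices in black_bird/synced_multiple_raid1_4legs_1.",
    "Nov 15 00:37:37 virt-004 lvm[5930]: Repair of RAID device black_bird-synced_multiple_raid1_4legs_1-real failed.",
    "Nov 15 00:37:42 virt-004 kernel: md/raid1:mdX: Operation continuing on 3 devices.",
    "Nov 15 00:37:43 virt-004 lvm[5930]: Faulty devices in black_bird/synced_multiple_raid1_4legs_1 successfully replaced.",
]

for stamp, lv, outcome in repair_outcomes(SAMPLE):
    print(stamp, lv, outcome)
```

Read this way, the first repair attempt failed while the second one, six seconds later, succeeded. A check that flags any "failed" message in the window would report a failure even though the LV ended up repaired.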
Version-Release number of selected component (if applicable):
2.6.32-425.el6.x86_64
lvm2-2.02.100-8.el6 BUILT: Wed Oct 30 09:10:56 CET 2013
lvm2-libs-2.02.100-8.el6 BUILT: Wed Oct 30 09:10:56 CET 2013
lvm2-cluster-2.02.100-8.el6 BUILT: Wed Oct 30 09:10:56 CET 2013
udev-147-2.51.el6 BUILT: Thu Oct 17 13:14:34 CEST 2013
device-mapper-1.02.79-8.el6 BUILT: Wed Oct 30 09:10:56 CET 2013
device-mapper-libs-1.02.79-8.el6 BUILT: Wed Oct 30 09:10:56 CET 2013
device-mapper-event-1.02.79-8.el6 BUILT: Wed Oct 30 09:10:56 CET 2013
device-mapper-event-libs-1.02.79-8.el6 BUILT: Wed Oct 30 09:10:56 CET 2013
device-mapper-persistent-data-0.2.8-2.el6 BUILT: Mon Oct 21 16:14:25 CEST 2013
cmirror-2.02.100-8.el6 BUILT: Wed Oct 30 09:10:56 CET 2013
How reproducible:
Often
Comment 2 RHEL Program Management
2013-11-18 22:45:05 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 3 Jonathan Earl Brassow
2014-04-04 22:29:22 UTC
Using the following command to test:
./black_bird -o bp-01 -l /usr/tests/sts-rhel6.5/ -r /usr/tests/sts-rhel6.5/ -e kill_multiple_synced_raid1_4legs
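Since the failure is only reproducible "often", one way to chase it is to repeat the scenario until an iteration fails. The loop below is a rough sketch under one loud assumption: that the command exits non-zero when a scenario fails, which this report does not confirm. `run_until_failure` and its `runner` parameter are hypothetical names introduced for illustration, not part of the test suite.

```python
import shlex
import subprocess


def run_until_failure(cmdline, max_iterations, runner=None):
    """Run cmdline up to max_iterations times.

    Returns the 1-based number of the first iteration whose exit status
    was non-zero, or None if every iteration exited with status 0.
    """
    if runner is None:
        # Default runner: execute the command and report its exit status.
        # ASSUMPTION: a non-zero status means the scenario failed.
        def runner(argv):
            return subprocess.run(argv).returncode

    argv = shlex.split(cmdline)
    for iteration in range(1, max_iterations + 1):
        if runner(argv) != 0:
            return iteration
    return None


# Usage (not executed here), with the command line from this comment:
# run_until_failure(
#     "./black_bird -o bp-01 -l /usr/tests/sts-rhel6.5/ "
#     "-r /usr/tests/sts-rhel6.5/ -e kill_multiple_synced_raid1_4legs",
#     max_iterations=100,
# )
```

The `runner` argument exists so the loop can be exercised without the real test harness, for example by passing a function that returns canned exit codes.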
Comment 4 Jonathan Earl Brassow
2014-04-07 19:20:54 UTC
Seems to run just fine for me (10+ iterations) if lvmetad isn't used.
I'm running into issues re-enabling devices when lvmetad is used. I'll work around that and try those tests again.
Comment 5 Jonathan Earl Brassow
2014-04-07 20:02:17 UTC
After making the workaround for re-enabling failed devices, lvmetad seems to work fine also.
I am testing the upstream code ATM, so the issue may have been addressed outside RAID code already.
I will attempt 6.5 rpm testing and see if I can reproduce.
Comment 6 Jonathan Earl Brassow
2014-04-07 20:48:34 UTC
10 iterations of black_bird with the RHEL6.5 RPMs - no reproduction.
Maybe I'll save this for a weekend or overnight run.
In the meantime, can QA reproduce it?
Comment 9 Jonathan Earl Brassow
2014-05-30 17:12:55 UTC
I'm closing this one. If it can be reproduced then we'll reopen, but I think we've given this enough consideration.