| Summary: | log device failure on multiple HA mirrors resulted in a corrupt VG | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Corey Marthaler <cmarthal> | ||||
| Component: | lvm2 | Assignee: | Jonathan Earl Brassow <jbrassow> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Corey Marthaler <cmarthal> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | urgent | ||||||
| Version: | 6.1 | CC: | agk, coughlan, dwysocha, heinzm, jbrassow, lhh, prajnoha, prockai, thornber, zkabelac | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2012-07-12 16:42:00 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Bug Depends On: | |||||||
| Bug Blocks: | 756082 | ||||||
| Attachments: |
|
||||||
Here's what the mirror volumes looked like before /dev/sdd was failed. This HA setup was using the tag method for HA.
# If volume_list is defined, each LV is only activated if there is a
# match against the list.
# "vgname" and "vgname/lvname" are matched exactly.
# "@tag" matches any tag set in the LV or VG.
# "@*" matches if any tag defined on the host is also set in the LV or VG
#
# volume_list = [ "vg1", "vg2/lvol1", "@tag1", "@*" ]
volume_list = [ "vg_taft03", "@taft-03" ]
[root@taft-03 ~]# vgs -o +vg_tags
VG #PV #LV #SN Attr VSize VFree VG Tags
TAFT 21 4 0 wz-pn- 949.57g 901.55g taft-03
[root@taft-03 ~]# lvs -a -o +devices
LV VG Attr LSize Log Copy% Devices
ha1 TAFT mwi-ao 8.00g ha1_mlog 100.00 ha1_mimage_0(0),ha1_mimage_1(0)
[ha1_mimage_0] TAFT iwi-ao 8.00g /dev/sdb1(0)
[ha1_mimage_1] TAFT iwi-ao 8.00g /dev/sdc1(0)
[ha1_mlog] TAFT lwi-ao 4.00m /dev/sdd1(0)
ha2 TAFT mwi-ao 8.00g ha2_mlog 100.00 ha2_mimage_0(0),ha2_mimage_1(0)
[ha2_mimage_0] TAFT iwi-ao 8.00g /dev/sdb2(0)
[ha2_mimage_1] TAFT iwi-ao 8.00g /dev/sdc2(0)
[ha2_mlog] TAFT lwi-ao 4.00m /dev/sdd2(0)
ha3 TAFT mwi-ao 8.00g ha3_mlog 100.00 ha3_mimage_0(0),ha3_mimage_1(0)
[ha3_mimage_0] TAFT iwi-ao 8.00g /dev/sdb3(0)
[ha3_mimage_1] TAFT iwi-ao 8.00g /dev/sdc3(0)
[ha3_mlog] TAFT lwi-ao 4.00m /dev/sdd3(0)
Created attachment 516743 [details]
log from taft-03
There's not quite enough information to go on. I'm hoping for a reproducer. Was this a one time event? I'm not sure how the lvm commands would produce metadata with checksum errors. Perhaps another machine was involved somehow. Since we are using the 'tag' method. There is a higher probability of other machine interaction. I can no longer reproduce this issue with the latest rpms. Closing... 2.6.32-279.el6.x86_64 rgmanager-3.0.12.1-12.el6.x86_64 lvm2-2.02.95-10.el6 BUILT: Fri May 18 03:26:00 CDT 2012 lvm2-libs-2.02.95-10.el6 BUILT: Fri May 18 03:26:00 CDT 2012 lvm2-cluster-2.02.95-10.el6 BUILT: Fri May 18 03:26:00 CDT 2012 udev-147-2.41.el6 BUILT: Thu Mar 1 13:01:08 CST 2012 device-mapper-1.02.74-10.el6 BUILT: Fri May 18 03:26:00 CDT 2012 device-mapper-libs-1.02.74-10.el6 BUILT: Fri May 18 03:26:00 CDT 2012 device-mapper-event-1.02.74-10.el6 BUILT: Fri May 18 03:26:00 CDT 2012 device-mapper-event-libs-1.02.74-10.el6 BUILT: Fri May 18 03:26:00 CDT 2012 cmirror-2.02.95-10.el6 BUILT: Fri May 18 03:26:00 CDT 2012 |
Description of problem: I have three mirrors included in one HA service, and then failed the log device. The VG that these mirrors resided on ended up inconsistent after the failure. Cluster Status for TAFT @ Thu Aug 4 10:02:15 2011 Member Status: Quorate Member Name ID Status ------ ---- ---- ------ taft-01 1 Online, rgmanager taft-02 2 Online, rgmanager taft-03 3 Online, Local, rgmanager taft-04 4 Online, rgmanager Service Name Owner (Last) State ------- ---- ----- ------ ----- service:halvm taft-03 started Scenario: Kill log ********* Mirror info for this scenario ********* * mirrors: ha1 ha2 ha3 * leg devices: /dev/sdb1 /dev/sdc1 * log devices: /dev/sdd1 * failpv(s): /dev/sdd1 * failnode(s): taft-03 * leg fault policy: allocate * log fault policy: remove * HA mirror: YES * HA service name: halvm * HA current owner: taft-03 ************************************************* PV=/dev/sdd1 ha1_mlog: 2 PV=/dev/sdd1 ha1_mlog: 2 Disabling device sdd on taft-03 Attempting I/O to cause mirror down conversion(s) on taft-03 1+0 records in 1+0 records out 512 bytes (512 B) copied, 5.4389e-05 s, 9.4 MB/s 1+0 records in 1+0 records out 512 bytes (512 B) copied, 4.9121e-05 s, 10.4 MB/s 1+0 records in 1+0 records out 512 bytes (512 B) copied, 5.4579e-05 s, 9.4 MB/s [DEADLOCK] [root@taft-03 ~]# lvs -a -o +devices /dev/sdd1: open failed: No such device or address /dev/sdd2: read failed after 0 of 2048 at 0: Input/output error /dev/sdd3: read failed after 0 of 4096 at 0: Input/output error /dev/sde1: Checksum error /dev/sde2: Checksum error /dev/sde3: Checksum error /dev/sdf1: Checksum error /dev/sdf2: Checksum error /dev/sdf3: Checksum error /dev/sdg1: Checksum error /dev/sdg2: Checksum error /dev/sdg3: Checksum error /dev/sdh1: Checksum error /dev/sdh2: Checksum error /dev/sdh3: Checksum error /dev/sdb1: Checksum error /dev/sdb2: Checksum error /dev/sdb3: Checksum error /dev/sdc1: Checksum error /dev/sdc2: Checksum error /dev/sdc3: Checksum error [DEADLOCK] [root@taft-03 ~]# mount /dev/mapper/TAFT-ha1 on /mnt/fs1 type ext3 (rw) /dev/mapper/TAFT-ha2 on /mnt/fs2 type ext3 (rw) /dev/mapper/TAFT-ha3 on /mnt/fs3 type ext3 (rw) [root@taft-03 ~]# touch /mnt/fs3/foobar [DEADLOCK] [root@taft-03 ~]# cat /var/log/messages | grep lvm\\[ Aug 3 16:38:16 taft-03 lvm[7397]: Monitoring mirror device TAFT-ha1 for events. Aug 3 16:38:16 taft-03 lvm[7397]: Monitoring mirror device TAFT-ha2 for events. Aug 3 16:38:16 taft-03 lvm[7397]: Monitoring mirror device TAFT-ha3 for events. Aug 3 16:38:58 taft-03 lvm[7397]: No longer monitoring mirror device TAFT-ha1 for events. Aug 3 16:38:59 taft-03 lvm[7397]: No longer monitoring mirror device TAFT-ha2 for events. Aug 3 16:38:59 taft-03 lvm[7397]: No longer monitoring mirror device TAFT-ha3 for events. Aug 3 16:39:05 taft-03 lvm[7397]: Monitoring mirror device TAFT-ha1 for events. Aug 3 16:39:05 taft-03 lvm[7397]: Monitoring mirror device TAFT-ha2 for events. Aug 3 16:39:05 taft-03 lvm[7397]: Monitoring mirror device TAFT-ha3 for events. Aug 3 16:42:18 taft-03 lvm[7397]: TAFT-ha1 is now in-sync. Aug 3 16:43:15 taft-03 lvm[7397]: TAFT-ha2 is now in-sync. Aug 3 16:43:17 taft-03 lvm[7397]: TAFT-ha3 is now in-sync. Aug 3 18:19:52 taft-03 lvm[7397]: Log device 253:3 has failed (D). Aug 3 18:19:52 taft-03 lvm[7397]: Device failure in TAFT-ha1. Aug 3 18:19:53 taft-03 lvm[7397]: /dev/sdd1: read failed after 0 of 4096 at 48553721856: Input/output error [...] Aug 3 18:19:53 taft-03 lvm[7397]: /dev/sdd2: read failed after 0 of 512 at 48553721856: Input/output error [...] Aug 3 18:19:53 taft-03 lvm[7397]: /dev/sdd3: read failed after 0 of 4096 at 48561979392: Input/output error [...] Aug 3 18:19:53 taft-03 lvm[7397]: Another thread is handling an event. Waiting... Aug 3 18:19:53 taft-03 lvm[7397]: Couldn't find device with uuid 1vFlLa-d3zr-Aeki-rWOV-Ny2c-6QBf-pXjO10. Aug 3 18:19:53 taft-03 lvm[7397]: Couldn't find device with uuid 2JLe6g-Ed6M-wuG1-o86o-8trB-GLR0-3uSrLe. Aug 3 18:19:53 taft-03 lvm[7397]: Couldn't find device with uuid YJRe5Z-SD7i-uwuF-wPR5-pXn8-Kor9-0YRyBs. Aug 3 18:19:54 taft-03 lvm[7397]: Another thread is handling an event. Waiting... Aug 3 18:20:02 taft-03 lvm[7397]: Mirror log status: 1 of 1 images failed - switching to core Aug 3 18:20:19 taft-03 lvm[7397]: Monitoring mirror device TAFT-ha1 for events. Aug 3 18:20:19 taft-03 lvm[7397]: Another thread is handling an event. Waiting... Aug 3 18:20:41 taft-03 lvm[7397]: /dev/sdd1: open failed: No such device or address Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdg3: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: Couldn't read volume group metadata. Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdh1: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: Couldn't read volume group metadata. Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdh2: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: Couldn't read volume group metadata. Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdh3: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: Couldn't read volume group metadata. Aug 3 18:20:49 taft-03 lvm[7397]: Inconsistent pre-commit metadata copies for volume group TAFT Aug 3 18:20:49 taft-03 lvm[7397]: Volume group TAFT metadata is inconsistent Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdb1: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdb2: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdb3: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdc1: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdc2: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdc3: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sde1: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sde2: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sde3: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdf1: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdf2: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdf3: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdg1: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdg2: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdg3: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdh1: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdh2: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdh3: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: Volume group for uuid not found: 3lHsWDW2DxtCFsmYLor0bkTOnSGLT32DZhywE4y8JW7ggRRPWplApfe0Ab4udn9N Aug 3 18:20:49 taft-03 lvm[7397]: Problem reactivating ha1 Aug 3 18:20:49 taft-03 lvm[7397]: Repair of mirrored LV TAFT/ha1 failed. Aug 3 18:20:49 taft-03 lvm[7397]: Failed to remove faulty devices in TAFT-ha1. Aug 3 18:20:49 taft-03 lvm[7397]: Log device 253:11 has failed (D). Aug 3 18:20:49 taft-03 lvm[7397]: Device failure in TAFT-ha3. Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdb1: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdb2: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdb3: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdc1: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdc2: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdc3: Checksum error Aug 3 18:20:49 taft-03 lvm[7397]: /dev/sdd2: read failed after 0 of 512 at 48553721856: Input/output error [...] Aug 3 18:20:50 taft-03 lvm[7397]: /dev/sdd3: read failed after 0 of 4096 at 48561979392: Input/output error [...] Aug 3 18:20:50 taft-03 lvm[7397]: /dev/sde1: Checksum error Aug 3 18:20:50 taft-03 lvm[7397]: /dev/sde2: Checksum error Aug 3 18:20:50 taft-03 lvm[7397]: /dev/sde3: Checksum error Aug 3 18:20:50 taft-03 lvm[7397]: /dev/sdf1: Checksum error Aug 3 18:20:50 taft-03 lvm[7397]: /dev/sdf2: Checksum error Aug 3 18:20:50 taft-03 lvm[7397]: /dev/sdf3: Checksum error Aug 3 18:20:50 taft-03 lvm[7397]: /dev/sdg1: Checksum error Aug 3 18:20:50 taft-03 lvm[7397]: /dev/sdg2: Checksum error Aug 3 18:20:50 taft-03 lvm[7397]: /dev/sdg3: Checksum error Aug 3 18:20:50 taft-03 lvm[7397]: /dev/sdh1: Checksum error Aug 3 18:20:50 taft-03 lvm[7397]: /dev/sdh2: Checksum error Aug 3 18:20:50 taft-03 lvm[7397]: /dev/sdh3: Checksum error Aug 3 18:20:50 taft-03 lvm[7397]: Volume group "TAFT" not found Aug 3 18:20:50 taft-03 lvm[7397]: Repair of mirrored LV TAFT/ha3 failed. Aug 3 18:20:50 taft-03 lvm[7397]: Failed to remove faulty devices in TAFT-ha3. Aug 3 18:20:50 taft-03 lvm[7397]: Log device 253:7 has failed (D). Aug 3 18:20:50 taft-03 lvm[7397]: Device failure in TAFT-ha2. Aug 3 18:20:51 taft-03 lvm[7397]: device-mapper: waitevent ioctl failed: Interrupted system call Aug 3 18:20:51 taft-03 lvm[7397]: No longer monitoring mirror device TAFT-ha1 for events. Version-Release number of selected component (if applicable): 2.6.32-174.el6.x86_64 rgmanager-3.0.12.1-2.el6.x86_64 lvm2-2.02.83-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011 lvm2-libs-2.02.83-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011 lvm2-cluster-2.02.83-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011 udev-147-2.35.el6 BUILT: Wed Mar 30 07:32:05 CDT 2011 device-mapper-1.02.62-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011 device-mapper-libs-1.02.62-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011 device-mapper-event-1.02.62-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011 device-mapper-event-libs-1.02.62-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011 cmirror-2.02.83-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011