Bug 855398
Summary: CLVM: Stacking volume groups on cluster mirrors does not work

Product: Red Hat Enterprise Linux 6
Component: lvm2
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: unspecified
Priority: high
Keywords: Regression
Reporter: Jonathan Earl Brassow <jbrassow>
Assignee: Jonathan Earl Brassow <jbrassow>
QA Contact: Cluster QE <mspqa-list>
CC: agk, cmarthal, coughlan, dwysocha, heinzm, jbrassow, msnitzer, nperic, prajnoha, prockai, thornber, zkabelac
Target Milestone: rc
Fixed In Version: lvm2-2.02.98-2.el6
Doc Type: Bug Fix
Doc Text:
A regression present since RHEL 6.0 made it impossible to create volume groups on top of clustered mirror logical volumes, that is, to recursively stack clustered volume groups. It was caused by an overly broad restriction that caused all mirror logical volumes to be ignored during activation. The restriction has been refined to pass over only mirrors that could cause LVM commands to block indefinitely, so it is now possible to layer clustered volume groups on cluster mirror logical volumes again.
Last Closed: 2013-02-21 08:13:29 UTC
Type: Bug
Description
Jonathan Earl Brassow
2012-09-07 15:31:46 UTC
Trying this on local volume groups fails when clvmd is used. 'local_top' is a local volume group built on top of two single-machine mirrors. The config file has 'locking_type = 3' set.

## First, try without going through clvmd
[root@hayes-02 ~]# lvcreate -i 2 -L 500M -n stripe local_top --config 'global{locking_type=1}'
  Using default stripesize 64.00 KiB
  Rounding size (125 extents) up to stripe boundary size (126 extents)
  Logical volume "stripe" created
[root@hayes-02 ~]# lvremove -ff local_top --config 'global{locking_type=1}'
  Logical volume "stripe" successfully removed

*SUCCESS*

## Next, try going through clvmd (remember, the VGs here are still non-clustered)
[root@hayes-02 ~]# lvcreate -i 2 -L 500M -n stripe local_top
  Using default stripesize 64.00 KiB
  Rounding size (125 extents) up to stripe boundary size (126 extents)
  Error locking on node hayes-02: Volume group for uuid not found: E5kLy2Qw9Rp6nww0xm1H7BMHCwweLz4EqPF1PKMMueJs84r6Xk5gOBzTDM0WayOI
  Failed to activate new LV.
  Error locking on node hayes-02: Volume group for uuid not found: E5kLy2Qw9Rp6nww0xm1H7BMHCwweLz4EqPF1PKMMueJs84r6Xk5gOBzTDM0WayOI
  Unable to deactivate failed new LV.  Manual intervention required.

*FAILURE*

'ignore_suspended_devices' is being set for clvmd, but not for local lvcreate/lvremove. This causes the following check in lvm2/lib/activate/dev_manager.c:device_is_usable() to trigger:

  if (target_type && !strcmp(target_type, "mirror") && ignore_suspended_devices())

The test in comment 3 could be ignored when the mirror device is not in the same volume group as the logical volume being created/activated, but that information is not available to device_is_usable().

lvm2/daemons/clvmd/lvm-functions.c:do_refresh_cache() calls init_ignore_suspended_devices(1). Later, lib/commands/toolcontext.c:refresh_filters() calls it again with the saved value from do_refresh_cache().

Created attachment 631760 [details]
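To see concretely why the condition above breaks stacking, here is a toy Python model of the decision. This is purely illustrative (the real logic is the C check in dev_manager.c); the device dictionaries and the helper name are invented for the example.

```python
# Toy model of lib/activate/dev_manager.c:device_is_usable() (illustrative,
# not the real lvm2 code). A device is described by its DM target type and
# whether it is suspended.

def device_is_usable_old(dev, ignore_suspended):
    """The pre-fix logic: any mirror is rejected when clvmd sets
    ignore_suspended_devices, even a perfectly healthy one."""
    if dev["suspended"]:
        return False
    if dev["target_type"] == "mirror" and ignore_suspended:
        return False   # <- rejects healthy mirrors, breaking VG stacking
    return True

# Two healthy single-machine mirrors that serve as PVs for 'local_top'.
pv1 = {"name": "local-m1", "target_type": "mirror", "suspended": False}
pv2 = {"name": "local-m2", "target_type": "mirror", "suspended": False}

# Local command path (locking_type=1): ignore_suspended_devices is unset.
print([device_is_usable_old(d, False) for d in (pv1, pv2)])  # [True, True]

# clvmd path (locking_type=3): ignore_suspended_devices is always set, so
# both PVs vanish from the label scan and the VG's uuid cannot be found.
print([device_is_usable_old(d, True) for d in (pv1, pv2)])   # [False, False]
```

This matches the failure mode in the transcript: the same lvcreate succeeds locally but fails through clvmd, because clvmd's scan skips the mirror PVs entirely.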
Fix for problem - awaiting review
From patch header:
cluster mirror: Allow VGs to be built on cluster mirrors
While it is possible to create VGs on top of cluster mirrors,
it is currently worthless to do so because no LVs can be created.
This is not a limitation of 'locking_type = 1' LVM. IOW, you can
happily stack a VG on top of a single machine LV of 'mirror'
segment type.
The disconnect comes because of the way 'ignore_suspended_devices'
is set. That is, it is not set during lvcreate/lvremove when
running 'locking_type = 1' (i.e. single machine). However, it is set
- every time - when 'locking_type = 3' and the activation is sent
through clvmd.
'ignore_suspended_devices' is meant to avoid reading any DM device
that is suspended. However, a mirror device can block I/O for a couple of
reasons. The first is that it is suspended. The second is that it has an
unaddressed device failure. The first case is already addressed by the
generic rejection of all DM devices that are suspended.
The second is not addressed at all by also rejecting mirror devices if
'ignore_suspended_devices' is set. Therefore, this chunk of code is
pointless. It also is the cause of not being able to use mirrors as
a source for VG stacking.
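The refined condition can be sketched as follows. This is an illustrative Python model, not the actual C change in dev_manager.c; the status field layout is inferred from the 'dmsetup status' mirror lines quoted elsewhere in this report (e.g. "2 253:6 253:7 400/400 1 AA 3 disk 253:5 A"), and the helper names are invented.

```python
# Illustrative sketch of the refined check: a mirror is skipped only when
# its dm status reports a dead device ('D'), not merely because
# 'ignore_suspended_devices' is set.

def parse_mirror_status(args):
    """Split the argument portion of a dm 'mirror' status line into
    (mirror leg devices, health string, log parameters)."""
    f = args.split()
    nr = int(f[0])                     # number of mirror legs
    devs = f[1:1 + nr]                 # major:minor of each leg
    health = f[3 + nr]                 # one char per leg: 'A' alive, 'D' dead
    log_argc = int(f[4 + nr])
    log = f[5 + nr:5 + nr + log_argc]  # e.g. ['disk', '253:5', 'A'] or ['core']
    return devs, health, log

def mirror_is_blocked(args):
    """True when any leg is marked 'D'; the kernel blocks all I/O to such
    a mirror until a new table is loaded, so label scans must avoid it."""
    _, health, _ = parse_mirror_status(args)
    return "D" in health

# Healthy mirror with a disk log (the report's 'vg-lv' status line):
print(mirror_is_blocked("2 253:6 253:7 400/400 1 AA 3 disk 253:5 A"))  # False
# Mirror with a dead leg (the 'vg-lv_mlog' status line):
print(mirror_is_blocked("2 253:3 253:4 7/8 1 AD 1 core"))              # True
```

With this shape of check, a healthy mirror is readable regardless of 'ignore_suspended_devices', which is what makes VG stacking on mirrors work again.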
Fix committed upstream:

commit 9fd7ac7d035f0b2f8dcc3cb19935eb181816bd76
Author: Jonathan Brassow <jbrassow>
Date:   Tue Oct 23 23:10:33 2012 -0500

    mirror: Avoid reading from mirrors that have failed devices

    Addresses: rhbz855398 (Allow VGs to be built on cluster mirrors), and other issues.

    The LVM code attempts to avoid reading labels from devices that are suspended, to try to avoid situations that may cause the commands to block indefinitely. When scanning devices, 'ignore_suspended_devices' can be set so the code (lib/activate/dev_manager.c:device_is_usable()) checks any DM devices it finds and avoids them if they are suspended.

    The mirror target has an additional mechanism that can cause I/O to be blocked. If a device in a mirror fails, all I/O will be blocked by the kernel until a new table (a linear target or a mirror with replacement devices) is loaded. The mirror indicates that this condition has happened by marking a 'D' for the faulty device in its status output. This condition must also be checked by device_is_usable() to avoid the possibility of blocking LVM commands indefinitely due to an attempt to read the blocked mirror for labels.

    Until now, mirrors were avoided if the 'ignore_suspended_devices' condition was set. This check seemed to suggest, "if we are concerned about suspended devices, then let's ignore mirrors altogether just in case". This is insufficient and doesn't solve any problems. All devices that are suspended are already avoided if 'ignore_suspended_devices' is set; and if a mirror is blocking because of an error condition, it will block the LVM command regardless of the setting of that variable.

    Rather than avoiding mirrors whenever 'ignore_suspended_devices' is set, this patch causes mirrors to be avoided whenever they are blocking due to an error. (As mentioned above, the case where a DM device is suspended is already covered.) This solves a number of issues that weren't handled before. For example, pvcreate (or any command that does a pv_read or vg_read, which eventually call device_is_usable()) will be protected from blocked mirrors regardless of how 'ignore_suspended_devices' is set. Additionally, a mirror that is neither suspended nor blocking is /allowed/ to be read regardless of how 'ignore_suspended_devices' is set. (The latter point is the source of the fix for rhbz855398.)

Comment on attachment 631760 [details]
Fix for problem - awaiting review
Patch made obsolete by a better patch committed upstream.
QA test requirements:
1) Create a cluster VG with two cluster mirror LVs.
2) pvcreate then vgcreate a new VG on top of the cmirror LVs.
3) Attempt to create a striped LV in the top-level VG.

Step 3 fails without the fix and succeeds with it.

[If lvm.conf has 'locking_type=3', this bug is triggered whether the VGs are clustered or not. IOW, it doesn't matter whether you are testing with cmirror or not. It does matter that the activation requests go through clvmd (IOW, locking_type=3). So testing with non-clustered VGs is acceptable too if the locking_type is set to '3'.]

Unit test:
[root@bp-01 lvm2]# lvcreate -m1 -L 5G -n m1 vg
  Logical volume "m1" created
[root@bp-01 lvm2]# lvcreate -m1 -L 5G -n m2 vg
  Logical volume "m2" created
[root@bp-01 lvm2]# pvcreate /dev/vg/m*
  Physical volume "/dev/vg/m1" successfully created
  Physical volume "/dev/vg/m2" successfully created
[root@bp-01 lvm2]# vgcreate top /dev/vg/m*
  Clustered volume group "top" successfully created
[root@bp-01 lvm2]# lvcreate -i 2 -L 1G -n stripe top
  Using default stripesize 64.00 KiB
  Logical volume "stripe" created
[root@bp-01 lvm2]# lvs
  LV      VG      Attr      LSize   Pool Origin Data%  Move Log     Cpy%Sync Convert
  stripe  top     -wi-a----   1.00g
  m1      vg      mwi-aom--   5.00g                         m1_mlog   100.00
  m2      vg      mwi-aom--   5.00g                         m2_mlog   100.00
  lv_home vg_bp01 -wi-ao--- 407.43g
  lv_root vg_bp01 -wi-ao---  50.00g
  lv_swap vg_bp01 -wi-ao---   7.84g
[root@bp-01 lvm2]# pvs
  PV         VG      Fmt  Attr PSize   PFree
  /dev/sda2  vg_bp01 lvm2 a--  465.27g     0
  /dev/sdb1  vg      lvm2 a--    1.09t  1.08t
  /dev/sdc1  vg      lvm2 a--    1.09t  1.08t
  /dev/sdd1  vg      lvm2 a--    1.09t  1.09t
  /dev/sde1  vg      lvm2 a--    1.09t  1.09t
  /dev/sdf1  vg      lvm2 a--    1.09t  1.09t
  /dev/sdg1  vg      lvm2 a--    1.09t  1.09t
  /dev/sdh1  vg      lvm2 a--    1.09t  1.09t
  /dev/sdi1  vg      lvm2 a--    1.09t  1.09t
  /dev/vg/m1 top     lvm2 a--    5.00g  4.50g
  /dev/vg/m2 top     lvm2 a--    5.00g  4.50g

POST -> ASSIGNED.
While running QA's sts test suite to look for another bug, I stumbled on a complication with the patches for this bug. Corey's tests often poll with 'pvs' while performing the tests and killing devices. It isn't /that/ hard to get his tests to hang on a 'pvs' when a mirrored-log device goes bad. This means that LVM hangs indefinitely, because the mirrors cannot get their chance to repair. It is a very tough situation to get out of.

This was a known issue going in and is documented in the comments of the upstream patch for the check-ins associated with this bug:

+ * _mirrored_transient_status().  FIXME: It is unable to handle mirrors
+ * with mirrored logs because it does not have a way to get the status of
+ * the mirror that forms the log, which could be blocked.

I now consider it essential to be able to recurse into a mirrored log and determine its status as well.

Here is an example that illustrates comment 12:

[root@bp-01 lvm2]# lvcreate -m 1 --mirrorlog mirrored -L 200M -n lv vg
  Logical volume "lv" created
[root@bp-01 lvm2]# devices vg
  LV                 Cpy%Sync Devices
  lv                   100.00 lv_mimage_0(0),lv_mimage_1(0)
  [lv_mimage_0]               /dev/sdb1(0)
  [lv_mimage_1]               /dev/sdc1(0)
  [lv_mlog]            100.00 lv_mlog_mimage_0(0),lv_mlog_mimage_1(0)
  [lv_mlog_mimage_0]          /dev/sdh1(0)
  [lv_mlog_mimage_1]          /dev/sdi1(0)
[root@bp-01 lvm2]# killall dmeventd
[root@bp-01 lvm2]# off.sh sdi
Turning off sdi
[root@bp-01 lvm2]# dd if=/dev/zero of=/dev/vg/lv bs=4M count=10 &
[1] 8725
[root@bp-01 lvm2]# dmsetup status vg-lv vg-lv_mlog
vg-lv: 0 409600 mirror 2 253:6 253:7 400/400 1 AA 3 disk 253:5 A
vg-lv_mlog: 0 8192 mirror 2 253:3 253:4 7/8 1 AD 1 core
[root@bp-01 lvm2]# pvs
hang hang hang

The 'pvs' cannot proceed because it is trying to read the mirror that contains a failed mirror log. You can see that this is tricky, because the log is blocking but doesn't register as failed in the 'vg-lv' status.
Additional fix checked in upstream:

commit b248ba0a396d7fc9a459eea02cfdc70b33ce3441
Author: Jonathan Brassow <jbrassow>
Date:   Thu Oct 25 00:42:45 2012 -0500

    mirror: Avoid reading mirrors with failed devices in mirrored log

    Commit 9fd7ac7d035f0b2f8dcc3cb19935eb181816bd76 did not handle mirrors that contain mirrored logs. This is because the status line of the mirror does not give an indication of the health of the mirrored log, as you can see here:

    [root@bp-01 lvm2]# dmsetup status vg-lv vg-lv_mlog
    vg-lv: 0 409600 mirror 2 253:6 253:7 400/400 1 AA 3 disk 253:5 A
    vg-lv_mlog: 0 8192 mirror 2 253:3 253:4 7/8 1 AD 1 core

    Thus, the possibility for LVM commands to hang still persists when mirrors have mirrored logs. I discovered this while performing some testing that polls with 'pvs' while doing I/O and killing devices. The 'pvs' managed to get between the mirrored-log device failure and the attempt by dmeventd to repair it. The result was a very nasty block in LVM commands that is very difficult to remove - even for someone who knows what is going on.

    Thus, it is absolutely essential that the log of a mirror be recursively checked for mirror devices that may have failed as well. Despite what the code comment says in the aforementioned commit...

    + * _mirrored_transient_status().  FIXME: It is unable to handle mirrors
    + * with mirrored logs because it does not have a way to get the status of
    + * the mirror that forms the log, which could be blocked.

    ... it is possible to get the status of the log, because the log device major/minor is given to us by the status output of the top-level mirror. We can use that to query the log device for any DM status and see if it is a mirror that needs to be bypassed. This patch does just that and is now able to avoid reading from mirrors that have failed devices in a mirrored log.
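The recursion the commit describes can be sketched as follows. This is an illustrative Python model, not the actual C code: the parser's field layout is inferred from the status lines quoted in the commit message, and the status table that stands in for querying DM by major:minor is invented for the example.

```python
# Illustrative sketch of the recursive check: after inspecting a mirror's
# own leg health, follow a 'disk' log's major:minor into the status table
# and check whether the log is itself a blocked mirror.

def parse_mirror_status(args):
    """Split the argument portion of a dm 'mirror' status line."""
    f = args.split()
    nr = int(f[0])                     # number of mirror legs
    health = f[3 + nr]                 # 'A' per healthy leg, 'D' per dead leg
    log_argc = int(f[4 + nr])
    log = f[5 + nr:5 + nr + log_argc]  # e.g. ['disk', '253:5', 'A']
    return health, log

def mirror_is_usable(status_by_dev, dev):
    """Return False if 'dev' is a mirror with a dead leg, or if its disk
    log device is (recursively) such a mirror."""
    args = status_by_dev.get(dev)
    if args is None:
        return True                    # not a mirror target; nothing to check
    health, log = parse_mirror_status(args)
    if "D" in health:
        return False
    if log and log[0] == "disk":
        return mirror_is_usable(status_by_dev, log[1])  # recurse into the log
    return True

# Status arguments for the two devices from the commit message; the
# top-level mirror's disk log is dm device 253:5 (vg-lv_mlog).
status = {
    "vg-lv": "2 253:6 253:7 400/400 1 AA 3 disk 253:5 A",
    "253:5": "2 253:3 253:4 7/8 1 AD 1 core",
}
print(mirror_is_usable(status, "vg-lv"))  # False: the log mirror is blocked
```

Note that the top-level 'vg-lv' status shows healthy legs ("AA"); only by following 253:5 does the check discover the blocked log mirror ("AD"), which is exactly the case the first patch missed.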
Unit test showing that the commit in comment 14 clears the objection raised in comment 12:

[root@bp-01 lvm2]# lvcreate -m 1 --mirrorlog mirrored -L 200M -n lv vg
  Logical volume "lv" created
[root@bp-01 lvm2]# devices vg
  LV                 Cpy%Sync Devices
  lv                   100.00 lv_mimage_0(0),lv_mimage_1(0)
  [lv_mimage_0]               /dev/sdb1(0)
  [lv_mimage_1]               /dev/sdc1(0)
  [lv_mlog]            100.00 lv_mlog_mimage_0(0),lv_mlog_mimage_1(0)
  [lv_mlog_mimage_0]          /dev/sdh1(0)
  [lv_mlog_mimage_1]          /dev/sdi1(0)
[root@bp-01 lvm2]# killall -9 dmeventd
[root@bp-01 lvm2]# off.sh sdi
Turning off sdi
[root@bp-01 lvm2]# dd if=/dev/zero of=/dev/vg/lv bs=4M count=10 &
[1] 4878
[root@bp-01 lvm2]# dmsetup status vg-lv vg-lv_mlog
vg-lv: 0 409600 mirror 2 253:6 253:7 400/400 1 AA 3 disk 253:5 A
vg-lv_mlog: 0 8192 mirror 2 253:3 253:4 7/8 1 AD 1 core
[root@bp-01 lvm2]# pvs -vvvv >& out
[root@bp-01 lvm2]# grep Mirror out
#activate/dev_manager.c:239 /dev/mapper/vg-lv_mlog: Mirror image 1 marked as failed
#activate/dev_manager.c:358 /dev/mapper/vg-lv_mlog: Mirror device vg-lv_mlog not usable.
#activate/dev_manager.c:239 253:5: Mirror image 1 marked as failed
#activate/dev_manager.c:358 253:5: Mirror device vg-lv_mlog not usable.
#activate/dev_manager.c:358 /dev/vg/lv: Mirror device vg-lv not usable.
[... the same five lines repeat for each scan pass; pvs completes instead of hanging ...]
[root@bp-01 lvm2]# lvconvert --repair vg/lv
  /dev/sdi1: read failed after 0 of 512 at 1197851148288: Input/output error
  /dev/sdi1: read failed after 0 of 512 at 1197851234304: Input/output error
  /dev/sdi1: read failed after 0 of 512 at 0: Input/output error
  /dev/sdi1: read failed after 0 of 512 at 4096: Input/output error
  /dev/sdi1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid edQd0w-MMjR-xA2h-c1NF-ozv3-03YG-UoW3Xn.
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 170.554 s, 246 kB/s
  Mirror log status: 1 of 2 images failed.
Attempt to replace failed mirror log? [y/n]: y
  Trying to up-convert to 2 images, 2 logs.
[1]+  Done                    dd if=/dev/zero of=/dev/vg/lv bs=4M count=10
[root@bp-01 lvm2]# devices vg
  Couldn't find device with uuid edQd0w-MMjR-xA2h-c1NF-ozv3-03YG-UoW3Xn.
  LV                 Cpy%Sync Devices
  lv                   100.00 lv_mimage_0(0),lv_mimage_1(0)
  [lv_mimage_0]               /dev/sdb1(0)
  [lv_mimage_1]               /dev/sdc1(0)
  [lv_mlog]            100.00 lv_mlog_mimage_0(0),lv_mlog_mimage_1(0)
  [lv_mlog_mimage_0]          /dev/sdh1(0)
  [lv_mlog_mimage_1]          /dev/sdg1(0)

Marking verified based on Comment 9 and Comment 11. Additionally, the test from Comment 15 was done on a single machine, since log type 'mirrored' is unavailable to cluster mirrors.

Verified with:
lvm2-2.02.98-6.el6.x86_64
lvm2-cluster-2.02.98-6.el6.x86_64
cmirror-2.02.98-6.el6.x86_64
device-mapper-1.02.77-6.el6.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0501.html