Bug 712829
Summary: | [LVM] when changing lv availability yes/no simultaneously, inactive lv's device may still be mapped under /dev/mapper/ | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | David Naori <dnaori> | ||||
Component: | lvm2 | Assignee: | Peter Rajnoha <prajnoha> | ||||
Status: | CLOSED ERRATA | QA Contact: | Corey Marthaler <cmarthal> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 6.1 | CC: | abaron, agk, cpelland, danken, dnaori, dwysocha, hateya, heinzm, jbrassow, mbroz, mgoldboi, prajnoha, prockai, thornber, ykaul, zkabelac | ||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | lvm2-2.02.86-1.el6 | Doc Type: | Bug Fix | ||||
Doc Text: |
A problem appeared when several commands that activated or deactivated an LV, or a VG as a whole, were run in parallel. The symlinks for LVs in /dev were created and removed incorrectly, so a symlink could exist after its device had already been removed, or vice versa.
This problem was caused by the fact that no write lock is held during activation to protect an individual activation command as a whole (activation does not change metadata). Combined with the non-atomicity of checking udev operations, the code made an improper decision based on stale information, which triggered the fallback code that tried to repair the symlinks.
To fix this, these checks are no longer done by default and LVM relies fully on udev. However, the old functionality can still be used for diagnosing other udev-related problems by setting the new 'verify_udev_operations' option in the 'activation' section of the lvm.conf file.
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2011-12-06 16:59:24 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 713823 | ||||||
Attachments: |
|
So what happens here is that lvchange is protected only by a read-only lock, not by a write lock. This makes it possible for lvchange commands to be processed in parallel, together with the node/symlink management code. In this case, when using udev, this scenario is possible:

  A: lvchange -ay, wait for udev to create symlinks under /dev
  B: lvchange -an, wait for udev to remove symlinks under /dev
  A: udev has created symlinks, notification received, unlocks the A process
  B: udev has removed symlinks, notification received, unlocks the B process
  A: calling "stat" to check whether udev has done its job right, but the symlink has already been removed by the B process! So we fall back to direct symlink creation.
  B: ...the same as A, just the reverse situation...

The main problem here is that the udev wait plus the subsequent "stat" call is not atomic. I'm afraid this is a hard problem to tackle simply, if it can be solved at all. I'd suggest removing the fallback code altogether when using udev. This would make the solution much cleaner, although we'd need to trust udev completely here. (udev serializes the events, so this problem should not occur with pure udev in play.)

I think the new udev code has proved itself now, and these checks can be retired (with an lvm.conf option to re-enable them if someone still needs them as a workaround for some problem they encounter).

The patches are upstream now (both for LVM2 and dmsetup):
https://www.redhat.com/archives/lvm-devel/2011-June/msg00076.html
https://www.redhat.com/archives/lvm-devel/2011-June/msg00077.html
(LVM2 v2_02_86, libdevmapper/dmsetup 1.02_65)

Adding QA ack for 6.2. Based on comment #0, a definitive reproducer w/o vdsm for this defect does not exist. This bug will most likely be marked verified (SanityOnly) once final 6.2 regression testing has been completed.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
A problem appeared when several commands that activated or deactivated an LV, or a VG as a whole, were run in parallel. The symlinks for LVs in /dev were created and removed incorrectly, so a symlink could exist after its device had already been removed, or vice versa.
This problem was caused by the fact that no write lock is held during activation to protect an individual activation command as a whole (activation does not change metadata). Combined with the non-atomicity of checking udev operations, the code made an improper decision based on stale information, which triggered the fallback code that tried to repair the symlinks.
To fix this, these checks are no longer done by default and LVM relies fully on udev. However, the old functionality can still be used for diagnosing other udev-related problems by setting the new 'verify_udev_operations' option in the 'activation' section of the lvm.conf file.

Never heard back from RHEVM QE on how this fix affected their tests. Marking verified (SanityOnly).

(In reply to comment #15)
> Never heard back from RHEVM QE on how this fix affected their tests. Marking
> verified (SanityOnly).
Excuse me for that, I didn't notice your needinfo request. The bug was found during creation of 100 VM pools via vdsm. Tested with lvm2-2.02.87-1.el6.x86_64: all 100 VMs were created successfully. Verified from our side.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2011-1522.html
|
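For readers who want to see why the udev wait followed by a separate "stat" check goes wrong, the A/B interleaving from the analysis comment can be replayed in a toy model. This is only an illustrative sketch, not LVM code: the function name simulate, the state dictionary and its keys are hypothetical, and the boolean parameter merely borrows the name of the verify_udev_operations option.

```python
def simulate(verify_udev_operations: bool) -> dict:
    """Replay the A/B interleaving from the analysis comment in a toy model.

    The dictionary stands in for the real system state; nothing here
    touches LVM, device-mapper or udev.
    """
    state = {"lv_active": False, "symlink": False}

    # A: lvchange -ay; udev handles the event and creates the symlink.
    state["lv_active"] = True
    state["symlink"] = True

    # B: lvchange -an; udev handles the event and removes the symlink.
    state["lv_active"] = False
    state["symlink"] = False

    # A only now resumes after its udev wait and checks udev's work.
    # The check is made against state that B has already changed, so the
    # "repair" recreates a symlink for an LV that is no longer active.
    if verify_udev_operations and not state["symlink"]:
        state["symlink"] = True  # fallback: direct symlink creation

    return state


if __name__ == "__main__":
    print(simulate(True))   # checks enabled: stale symlink is left behind
    print(simulate(False))  # checks disabled: state stays consistent
```

With the check enabled the model ends with an inactive LV that still has a symlink, which matches the reported symptom; with the check disabled, udev's serialized result is left alone.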
Created attachment 504406 [details]
vdsm's lvm commands.

Description of problem:
When changing lv availability yes/no simultaneously, an inactive lv's device may still be mapped under /dev/mapper/.
(Found during creation of VM volumes from a template using vdsm over iscsi storage)

# lvs:
  72f36bbc-d81c-41bc-a6a9-4401d5cd94fb a1dc6df5-97f7-44e0-a267-b22d3987d3eb -ri--- 3.00g g

#lvscan
  inactive '/dev/a1dc6df5-97f7-44e0-a267-b22d3987d3eb/72f36bbc-d81c-41bc-a6a9-4401d5cd94fb' [3.00 GiB] inherit

#ll /dev/mapper/
  brw-rw----. 1 root disk 253, 13 Jun 12 17:39 a1dc6df5--97f7--44e0--a267--b22d3987d3eb-72f36bbc--d81c--41bc--a6a9--4401d5cd94fb

# ll /dev/a1dc6df5-97f7-44e0-a267-b22d3987d3eb/
  72f36bbc-d81c-41bc-a6a9-4401d5cd94fb -> /dev/mapper/a1dc6df5--97f7--44e0--a267--b22d3987d3eb-72f36bbc--d81c--41bc--a6a9--4401d5cd94fb

# dmsetup ls
  10077DAVID3 (253, 5)
  a1dc6df5--97f7--44e0--a267--b22d3987d3eb-95b0f192--5cb9--4a38--9d35--d1cd4a9a469b (253, 13)
  a1dc6df5--97f7--44e0--a267--b22d3987d3eb-fb71e0cd--cfb5--45e5--9442--c42a6d7d7f3f (253, 14)
  10077DAVID2 (253, 4)
  a1dc6df5--97f7--44e0--a267--b22d3987d3eb-ids (253, 8)
  10077DAVID1 (253, 3)
  vg0-lv_home (253, 2)
  a1dc6df5--97f7--44e0--a267--b22d3987d3eb-inbox (253, 9)
  a1dc6df5--97f7--44e0--a267--b22d3987d3eb-leases (253, 7)
  a1dc6df5--97f7--44e0--a267--b22d3987d3eb-metadata (253, 6)
  vg0-lv_swap (253, 1)
  vg0-lv_root (253, 0)
  a1dc6df5--97f7--44e0--a267--b22d3987d3eb-master (253, 11)
  a1dc6df5--97f7--44e0--a267--b22d3987d3eb-outbox (253, 10)

d47431bc-e757-404e-b828-9bfd3c3488ba::DEBUG::2011-06-12 17:38:37,993::lvm::359::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm lvs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 filter = [ \\"a%/dev/mapper/10077DAVID1|/dev/mapper/10077DAVID2|/dev/mapper/10077DAVID3%\\", \\"r%.*%\\" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags a1dc6df5-97f7-44e0-a267-b22d3987d3eb' (cwd None)
Thread-2106::DEBUG::2011-06-12 17:38:38,002::lvm::359::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ' /dev/mapper/a1dc6df5--97f7--44e0--a267--b22d3987d3eb-72f36bbc--d81c--41bc--a6a9--4401d5cd94fb not set up by udev: Falling back to direct node creation.\n /dev/mapper/a1dc6df5--97f7--44e0--a267--b22d3987d3eb-72f36bbc--d81c--41bc--a6a9--4401d5cd94fb: open failed: No such device or address\n The link /dev/a1dc6df5-97f7-44e0-a267-b22d3987d3eb/72f36bbc-d81c-41bc-a6a9-4401d5cd94fb should had been created by udev but it was not found. Falling back to direct link creation.\

Version-Release number of selected component (if applicable):
lvm2-2.02.83-3.el6.x86_64
device-mapper-1.02.62-3.el6.x86_64

Steps to Reproduce:
1. Create 100 VMs from a template using vdsm on iscsi storage.

* Tried to reproduce without vdsm; it happens rarely with this step:
  lvchange -a n david/david1
  lvchange -a y david/david1 &
  lvchange -a y david/david1 &
  lvchange -a n david/david1

vdsm relies on the existence of the link to know whether the lv is active, so in this case vdsm will always fail to use this lv.

Actual results:
#lvchange -a n a1dc6df5-97f7-44e0-a267-b22d3987d3eb/72f36bbc-d81c-41bc-a6a9-4401d5cd94fb
#ll /dev/mapper
  brw-rw----. 1 root disk 253, 13 Jun 12 17:39 a1dc6df5--97f7--44e0--a267--b22d3987d3eb-72f36bbc--d81c--41bc--a6a9--4401d5cd94fb

Sometimes lvchange -a n does not remove the device under /dev/mapper if the fallback to direct link creation occurs.

Expected results:
The device should be removed when deactivating the lv.

Additional info:
Attached all lvm commands vdsm executed.
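The manual reproduction attempt above could be looped with a small script along the following lines. This is a rough sketch under several assumptions: it must run as root on a disposable test VG/LV (david/david1 is taken from the report), the function names are hypothetical, and it assumes the fifth character of the lvs lv_attr field is 'a' for an active LV, as in the '-ri---' (inactive) output shown in the report. As the report notes, the race is rare, so many iterations may be needed and a hit is not guaranteed.

```python
import os
import subprocess


def lv_is_active(lv_attr: str) -> bool:
    """Return True if the 5th lv_attr character marks the LV as active."""
    attr = lv_attr.strip()
    return len(attr) >= 5 and attr[4] == "a"


def is_inconsistent(lv_attr: str, link_exists: bool) -> bool:
    """True when the /dev/<vg>/<lv> link and the LV state disagree."""
    return lv_is_active(lv_attr) != link_exists


def race_once(vg: str, lv: str) -> bool:
    """Run competing lvchange commands once and report any mismatch."""
    name = f"{vg}/{lv}"
    # Start from a known inactive state, as in the manual steps.
    subprocess.run(["lvchange", "-a", "n", name], check=True)
    # Launch activation and deactivation concurrently.
    procs = [subprocess.Popen(["lvchange", "-a", flag, name])
             for flag in ("y", "y", "n")]
    for proc in procs:
        proc.wait()
    # Compare what LVM reports with what is present under /dev.
    attr = subprocess.run(
        ["lvs", "--noheadings", "-o", "lv_attr", name],
        check=True, capture_output=True, text=True,
    ).stdout
    return is_inconsistent(attr, os.path.lexists(f"/dev/{vg}/{lv}"))


if __name__ == "__main__":
    for attempt in range(1000):
        if race_once("david", "david1"):
            print(f"inconsistent state after attempt {attempt}")
            break
```

The comparison is kept in a separate pure function so it can be checked without touching LVM; only race_once needs root privileges and a real volume.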