Bug 1009812
| Field | Value |
|---|---|
| Summary | LVM logical volumes on FC SDs are activated automatically after hypervisor reboot |
| Product | Red Hat Enterprise Virtualization Manager |
| Component | vdsm |
| Version | 3.2.0 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | urgent |
| Reporter | Roman Hodain <rhodain> |
| Assignee | Nir Soffer <nsoffer> |
| QA Contact | Gadi Ickowicz <gickowic> |
| CC | abaron, acanan, amureini, bazulay, dgibson, ebenahar, fsimonce, iheim, lpeer, lyarwood, nlevinki, nsoffer, pablo.iranzo, pbandark, pzhukov, rhodain, scohen, sputhenp, yeylon |
| Target Milestone | --- |
| Target Release | 3.3.0 |
| Keywords | Triaged, ZStream |
| Flags | amureini: Triaged+ |
| Whiteboard | storage |
| Fixed In Version | is25 |
| Doc Type | Bug Fix |
| Story Points | --- |
| : | 1033123 (view as bug list) |
| Last Closed | 2014-01-21 16:16:04 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| Category | --- |
| oVirt Team | Storage |
| Cloudforms Team | --- |
| Bug Blocks | 1033123, 1038284 |

Doc Text:

When a hypervisor was rebooted, all logical volumes that were part of an FC storage domain were activated automatically. This caused problems because logical volumes should be activated only at the engine's request and deactivated as soon as they are no longer needed. The automatically activated logical volumes did not pick up changes made by the SPM on the storage, which could lead to data corruption when a virtual machine wrote to a logical volume with stale metadata. The fix checks all VDSM logical volumes during LVM bootstrap and deactivates them when possible. Special logical volumes are refreshed instead, since they are accessed early when connecting to the storage pool, before LVM bootstrap is done. Open logical volumes are skipped, because they use correct metadata when opened.
Description
Roman Hodain
2013-09-19 08:12:41 UTC
Suggested change attached to force the refresh of active LVs before use. Not sure how valid this approach is over deactivating volumes when the host initially connects to a domain... The fix will include setting flags for the vgs/lvs so they won't be activated on boot.

Yeela, from comment 10 I gather that the enclosed patch is not a proper fix for this issue? If so, please remove it from the external tracker.

The patch is required but not sufficient: it solves the symptom (what to do when the LV is already active, a state we can reach in other ways as well, which is why the patch is needed) but not the underlying problem, which is to prevent the LV from being active to begin with. LVM supports changing this configuration. We need to make sure to take care both of newly created VGs/LVs and of existing ones.

Please specify which RHEL version is installed on the hosts. If the version is earlier than 6.4, please attach these files from one of the hosts:

    /etc/rc.sysinit
    /etc/init.d/netfs

Workaround: On RHEL 6.4 or later, we can prevent auto-activation of vdsm volumes by specifying which volumes will be auto-activated; any other volume will not be auto-activated. To specify which volumes should be auto-activated, edit this line in /etc/lvm/lvm.conf:

    auto_activation_volume_list = ["vg0"]

Where "vg0" is the name of the system LVM volume group created during installation. For example, on a system where this workaround was tested:

    # vgs
      VG   #PV #LV #SN Attr   VSize   VFree
      test   2   2   0 wz--n-  39.99g 37.99g
      vg0    1   3   0 wz--n- 465.27g      0

If the hosts have other volume groups besides the vdsm volume groups, you must add them to auto_activation_volume_list as well, or they will not be activated on boot.

Background: On FC systems, physical volumes are connected to the system early during boot, when rc.sysinit or netfs runs. These scripts perform auto-activation of all LVM volume groups, which activates all LVs on shared storage. From vdsm's point of view, all logical volumes *must* stay deactivated until they are used.
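The workaround above boils down to keeping every non-vdsm volume group in auto_activation_volume_list. A minimal sketch of that selection, using hypothetical helper names (this is not vdsm code, just an illustration of the rule):

```python
def build_auto_activation_list(all_vgs, vdsm_vgs):
    """Return the VG names that should still be auto-activated on boot.

    Every VG except the vdsm (shared storage) VGs must stay in the list,
    or it will not be activated on boot.
    """
    shared = set(vdsm_vgs)
    return [vg for vg in all_vgs if vg not in shared]


def format_lvm_conf_line(vgs):
    """Render the lvm.conf setting, e.g. auto_activation_volume_list = ["vg0"]."""
    quoted = ", ".join('"%s"' % vg for vg in vgs)
    return "auto_activation_volume_list = [%s]" % quoted


# On the example system above, "test" is a vdsm storage domain VG and
# "vg0" is the system VG created during installation.
print(format_lvm_conf_line(build_auto_activation_list(["test", "vg0"], ["test"])))
# auto_activation_volume_list = ["vg0"]
```

Note that this only filters names; the admin still has to edit /etc/lvm/lvm.conf by hand, as described in the workaround.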
Currently, when vdsm tries to activate a logical volume that is already active, it does nothing. If the logical volume was modified by the SPM, the logical volume's metadata is now wrong, which may lead to data corruption when writing to the volume. We plan to fix this issue by deactivating vdsm volumes during boot.

(In reply to Nir Soffer from comment #14)
> Please specify which RHEL version is installed on the hosts.
>
> If the version is earlier than 6.4, please attach these files from one of
> the hosts:
>
> /etc/rc.sysinit
> /etc/init.d/netfs

Hi, it was RHEL 6.4.

The complete solution includes:

- Deactivate unused LVs when the service is started: this handles the root cause, LVs auto-activated during boot. This patch also ensures that there are no active LVs after an unclean shutdown of the process. With this patch, we should not see unused active LVs under normal conditions. http://gerrit.ovirt.org/#/c/21291/

- Refresh active LVs when activating volumes: without the previous patch, this ensures that we do not use an active LV without refreshing it. With the previous patch, this serves as a second layer of protection, ensuring correctness even in the abnormal condition where an LV is left active when it should not be. http://gerrit.ovirt.org/#/c/21387

(In reply to Nir Soffer from comment #19)
> The complete solution includes:
>
> [..]
>
> - Refresh active LVs when activating volumes: without the previous patch,
> this ensures that we do not use an active LV without refreshing it. With
> the previous patch, this serves as a second layer of protection, ensuring
> correctness even in the abnormal condition where an LV is left active when
> it should not be.
> http://gerrit.ovirt.org/#/c/21387

I've just NACK'd the upstream patch for this part. Thus far my testing on F19 has shown that lvchange --refresh doesn't always result in a volume being correctly updated. I'd like to look into this more and repeat this downstream before verifying the change.
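The bootstrap behaviour of the first patch, as later summarized in the Doc Text (deactivate unused LVs, refresh special LVs, skip open ones), can be sketched as a small decision function. The function name and the exact special-LV set below are assumptions for illustration, not vdsm's actual code:

```python
# Special LVs are accessed when connecting to the storage pool, before LVM
# bootstrap runs, so they cannot be deactivated and are refreshed instead.
# (Assumed set; the real list lives in vdsm's block storage domain code.)
SPECIAL_LVS = frozenset(["metadata", "ids", "leases", "inbox", "outbox", "master"])


def bootstrap_action(lv_name, is_open):
    """Decide what to do with an active LV found during LVM bootstrap."""
    if is_open:
        # An open LV was activated on request and loaded correct metadata
        # when it was opened, so it is left alone.
        return "skip"
    if lv_name in SPECIAL_LVS:
        # Already in use by the storage pool code: reload its device-mapper
        # table from current LVM metadata instead of deactivating it.
        return "refresh"
    # Unused LV, most likely auto-activated on boot with possibly stale
    # metadata: deactivate it so the next activation reloads metadata.
    return "deactivate"
```

The key property is that after bootstrap no unused LV stays active, so a later activation always reads fresh metadata, even if the SPM changed the LV while this host was down.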
In addition, the change now depends on the 'Single shot prepare' change [1] to avoid multiple refresh/activation calls. AFAIK this isn't viable for 3.2.z. The plan was to have this fixed in 3.2.5, but given the above I think we need to change our approach here. My suggestion at this point is to split this BZ in two, leaving Nir's deactivation patchset targeted for 3.2.5 with this bug, and moving my refresh patchset to a new bug targeted at 3.3 or 3.3.z. Nir, would this be acceptable?

[1] http://gerrit.ovirt.org/#/c/4220/ - One shot prepare

(In reply to Lee Yarwood from comment #22)
> (In reply to Nir Soffer from comment #19)
> Nir, would this be acceptable?

Yes. For fixing the issue of LVs auto-activated during boot, patch http://gerrit.ovirt.org/#/c/21291 is enough.

Verified using is29.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html
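For reference, the second-layer "refresh before use" idea discussed in comments #19 through #22 amounts to choosing between lvchange --refresh and plain activation depending on whether the LV is already active. A sketch as a command builder, with a hypothetical helper name (the real change was proposed for vdsm's lvm module and was ultimately split out of this bug):

```python
def activate_or_refresh_cmd(vg_name, lv_name, already_active):
    """Build the lvchange command line for making an LV usable.

    An already-active LV (e.g. auto-activated by rc.sysinit on boot) is
    refreshed so its device-mapper table is reloaded from current LVM
    metadata; an inactive LV is simply activated.
    """
    lv_path = "%s/%s" % (vg_name, lv_name)
    if already_active:
        return ["lvchange", "--refresh", lv_path]
    return ["lvchange", "--available", "y", lv_path]
```

As noted in comment #21 above, testing on F19 showed that lvchange --refresh alone did not always update the volume correctly, which is why the boot-time deactivation patch, not this refresh layer, was the fix accepted for this bug.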