Bug 1009812

Summary: LVM logical volumes on FC SDs are activated automatically after hypervisor reboot
Product: Red Hat Enterprise Virtualization Manager
Reporter: Roman Hodain <rhodain>
Component: vdsm
Assignee: Nir Soffer <nsoffer>
Status: CLOSED ERRATA
QA Contact: Gadi Ickowicz <gickowic>
Severity: urgent
Priority: urgent
Version: 3.2.0
CC: abaron, acanan, amureini, bazulay, dgibson, ebenahar, fsimonce, iheim, lpeer, lyarwood, nlevinki, nsoffer, pablo.iranzo, pbandark, pzhukov, rhodain, scohen, sputhenp, yeylon
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: 3.3.0
Flags: amureini: Triaged+
Hardware: x86_64
OS: Linux
Whiteboard: storage
Fixed In Version: is25
Doc Type: Bug Fix
Doc Text:
When a hypervisor was rebooted, all logical volumes which were part of an FC storage domain were automatically activated. This caused problems, because logical volumes should be activated only on request of the engine and deactivated immediately when they are no longer needed. These activated logical volumes did not pick up changes made by the SPM on the storage, which could lead to data corruption when a virtual machine wrote to a logical volume with stale metadata. The fix checks all VDSM logical volumes during LVM bootstrap and deactivates them if possible. Special logical volumes are refreshed instead, since they are accessed early when connecting to the storage pool, before LVM bootstrap is done. Open logical volumes are skipped because they use correct metadata when opened.
Story Points: ---
Clones: 1033123 (view as bug list)
Last Closed: 2014-01-21 16:16:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: Storage
Cloudforms Team: ---
Bug Blocks: 1033123, 1038284

Description Roman Hodain 2013-09-19 08:12:41 UTC
Description of problem:
When a hypervisor is rebooted, all LVs which are part of an FC storage domain are automatically activated. This can cause issues, as logical volumes should be activated only on request of the engine when needed (VM start) and deactivated immediately when they are no longer needed (VM stop).

Version-Release number of selected component (if applicable):
vdsm-4.10.2-25.0

How reproducible:
100%

Steps to Reproduce:
1. Create a new FC storage domain
2. Create a VM with a disk so some LVs are created
3. Put the hypervisor into maintenance mode and restart it.
4. Log on to the hypervisor and check the status of the LVs

Actual results:
All LVs within the FC storage domain are activated

Expected results:
No LVs within the FC storage domain are activated

Additional info:
This issue affects only FC SDs, as the LVs are activated by lvm-monitor, which starts early in boot as one of the first services. iSCSI is not affected because the iSCSI daemon starts later.
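
To check step 4 above, list the LV attributes of the storage domain VG on the hypervisor. A minimal check, assuming standard LVM tools; the VG name (the storage domain UUID) is a placeholder here:

    lvs -o vg_name,lv_name,lv_attr <sd-uuid>
    # An "a" in the 5th lv_attr character (e.g. "-wi-a-----") means the LV
    # is active; right after boot, every LV in the FC domain shows it.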

Comment 8 Lee Yarwood 2013-10-04 13:44:45 UTC
Suggested change attached to force the refresh of active LVs before use. Not sure how valid this approach is over deactivating volumes when the host initially connects to a domain...
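
For context, a refresh rereads the LV metadata from the VG and reloads the device-mapper table, so an already-active LV picks up changes made by the SPM. Roughly (VG/LV names are placeholders):

    lvchange --refresh <sd-uuid>/<lv-uuid>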

Comment 10 Yeela Kaplan 2013-10-09 09:17:43 UTC
The fix will include setting flags for the vgs/lvs so they won't be activated on boot.
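
One such mechanism, assuming lvm2 >= 2.02.99 which introduced it, is the activation-skip flag; LVs carrying it are skipped by normal activation commands unless explicitly overridden:

    lvchange --setactivationskip y <sd-uuid>/<lv-uuid>
    lvchange -ay -K <sd-uuid>/<lv-uuid>   # -K/--ignoreactivationskip overrides the flag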

Comment 11 Allon Mureinik 2013-10-16 08:48:54 UTC
Yeela, from comment 10 I gather that the enclosed patch is not a proper fix for this issue?
If so, please remove it from the external tracker.

Comment 12 Ayal Baron 2013-10-17 07:50:20 UTC
The patch is required but not sufficient: it addresses the symptom - what to do when the LV is already active (a state we can reach in other ways as well, which is why it is needed) - but not the underlying problem of preventing the LV from becoming active in the first place.
LVM supports changing this configuration.
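
The relevant lvm.conf knobs (names per lvm.conf(5)) restrict which VGs/LVs may be activated at all, or only which ones are auto-activated on boot:

    activation {
        # Only these VGs/LVs may be activated at all:
        # volume_list = [ "vg0" ]
        # Only these VGs/LVs are auto-activated (vgchange -a ay):
        # auto_activation_volume_list = [ "vg0" ]
    }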

Comment 13 Ayal Baron 2013-10-17 07:52:06 UTC
Need to make sure to take care of both newly created VGs/LVs and existing ones.

Comment 14 Nir Soffer 2013-10-30 10:24:57 UTC
Please specify which RHEL version is installed on the hosts.

If the version is earlier than 6.4, please attach these files from one of the hosts:

/etc/rc.sysinit
/etc/init.d/netfs

Comment 15 Nir Soffer 2013-10-30 10:32:38 UTC
Workaround:

On RHEL 6.4 or later, we can prevent auto-activation of vdsm volumes by specifying which volumes should be auto-activated; any volume not listed will not be auto-activated.

To specify which volumes should be auto-activated, edit this line in /etc/lvm/lvm.conf:

    auto_activation_volume_list = ["vg0"]

Where "vg0" is the name of the system lvm volume group created during installation.

For example, on a system where this workaround was tested:

# vgs
  VG   #PV #LV #SN Attr   VSize   VFree 
  test   2   2   0 wz--n-  39.99g 37.99g
  vg0    1   3   0 wz--n- 465.27g     0 

If the hosts have other volume groups besides the vdsm volume groups, you must add them to auto_activation_volume_list as well, or they will not be activated on boot.
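
For example, with a second local data VG alongside the system VG ("vg_data" is a hypothetical name):

    auto_activation_volume_list = ["vg0", "vg_data"]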


Background:

On FC systems, physical volumes are connected to the system early in boot, when rc.sysinit or netfs run. These scripts perform auto-activation of all LVM volume groups, which activates all LVs on the shared storage.
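
What these scripts effectively run is roughly the following (exact flags vary by release):

    /sbin/lvm vgchange -a ay --sysinit
    # With -a ay (auto-activate), only VGs/LVs matching
    # activation/auto_activation_volume_list in lvm.conf are activated,
    # which is why the workaround above is effective.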

From vdsm's point of view, all logical volumes *must* stay deactivated until they are used. Currently, when vdsm tries to activate a logical volume that is already active, it does nothing. If the logical volume was modified by the SPM in the meantime, its metadata is now stale, which may lead to data corruption when writing to the volume.

We plan to fix this issue by deactivating vdsm volumes during boot.
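
A minimal sketch of that bootstrap step, in shell rather than the actual vdsm code; "SD_VG" stands for a vdsm storage domain VG (named by its UUID), and the special LV names follow vdsm's block domain layout:

    SD_VG="<sd-uuid>"
    SPECIAL="metadata ids leases inbox outbox master"
    lvs --noheadings -o lv_name,lv_attr "$SD_VG" | while read -r lv attr; do
        case " $SPECIAL " in
        *" $lv "*) lvchange --refresh "$SD_VG/$lv"; continue ;;  # special LV: refresh
        esac
        if [ "$(printf %s "$attr" | cut -c6)" != "o" ]; then
            lvchange -an "$SD_VG/$lv"    # not open by any VM: deactivate
        fi
    done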

Comment 16 Roman Hodain 2013-11-12 07:08:48 UTC
(In reply to Nir Soffer from comment #14)
> Please specify which RHEL version is installed on the hosts.
> 
> If the version is earlier than 6.4, please attach these files from one of
> the hosts:
> 
> /etc/rc.sysinit
> /etc/init.d/netfs

Hi,

it was RHEL 6.4.

Comment 19 Nir Soffer 2013-11-19 07:31:24 UTC
The complete solution includes:

- Deactivate unused lvs when the service is started:
  This handles the root cause: lvs auto-activated during boot. This patch
  also ensures that there are no active lvs after an unclean shutdown of
  the process. With this patch, we should not see unused active lvs under
  normal conditions.
  http://gerrit.ovirt.org/#/c/21291/

- Refresh active lvs when activating volumes:
  Without the previous patch, this ensures that we do not use an active lv
  without refreshing it. With the previous patch, it serves as a second
  layer of protection, ensuring correctness even in the abnormal condition
  where an lv is left active when it should not be.
  http://gerrit.ovirt.org/#/c/21387

Comment 22 Lee Yarwood 2013-11-21 10:46:01 UTC
(In reply to Nir Soffer from comment #19)
> The complete solution includes:
>
> [..]
>
> - Refresh active lvs when activating volumes:
>   Without the previous patch, this ensures that we do not use an active lv
>   without refreshing it. With the previous patch, it serves as a second
>   layer of protection, ensuring correctness even in the abnormal condition
>   where an lv is left active when it should not be.
>   http://gerrit.ovirt.org/#/c/21387

I've just NACK'd the upstream patch for this part. Thus far my testing on F19 has shown that lvchange --refresh doesn't always result in a volume being correctly updated. I'd like to look into this more and repeat this downstream before verifying the change.

In addition, the change now depends on the 'Single shot prepare' change [1] to avoid multiple refresh / activation calls. AFAIK this isn't viable for 3.2.z.

The plan was to have this fixed in 3.2.5 but given the above I think we need to change our approach here. My suggestion at this point is to split this BZ in two, leaving Nir's deactivation patchset targeted for 3.2.5 with this bug and moving my refresh patchset to a new bug targeted at 3.3 or 3.3.z.

Nir, would this be acceptable?

Comment 23 Lee Yarwood 2013-11-21 10:47:59 UTC
[1] http://gerrit.ovirt.org/#/c/4220/ - One shot prepare

Comment 24 Nir Soffer 2013-11-21 11:46:53 UTC
(In reply to Lee Yarwood from comment #22)
> (In reply to Nir Soffer from comment #19)
> Nir, would this be acceptable?

Yes - for fixing the issue of lvs auto-activated during boot, patch  http://gerrit.ovirt.org/#/c/21291 is enough.

Comment 26 Aharon Canan 2013-12-25 16:29:34 UTC
Verified using is29.

Comment 27 errata-xmlrpc 2014-01-21 16:16:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html