Bug 1342786

Summary: [TestOnly] All VGs on passthrough disks are activated during the RHV-H boot.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Roman Hodain <rhodain>
Component: ovirt-node
Assignee: Douglas Schilling Landgraf <dougsland>
Status: CLOSED CURRENTRELEASE
QA Contact: cshao <cshao>
Severity: high
Docs Contact:
Priority: medium
Version: 3.6.6
CC: agk, cshao, dfediuck, dguo, dougsland, fdeutsch, gveitmic, huzhao, leiwang, lsurette, mgoldboi, mkalinin, nsoffer, qiyuan, rhodain, sbonazzo, weiwang, yaniwang, ycui, ykaul, ylavi, yzhao
Target Milestone: ovirt-3.6.11
Keywords: TestOnly, ZStream
Target Release: ---
Flags: lsvaty: testing_plan_complete-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-05-10 02:44:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1374545
Bug Blocks:

Description Roman Hodain 2016-06-05 10:01:17 UTC
Description of problem:
When RHEV-H boots, it activates all VGs located on all LUNs presented by the storage. These LUNs may be used as passthrough disks and may contain LVM metadata belonging to guest systems.

Version-Release number of selected component (if applicable):
RHEV 3.6

How reproducible:
100%

Steps to Reproduce:
1. Attach a LUN to the node
2. Create a VG and LV on this LUN (not as a storage domain)
3. Reboot the hypervisor
4. Check if the LVM is activated

Actual results:
    The LVs on the LUNs are activated during the boot.

Expected results:
    Only VGs/LVs associated with the host system and storage domains are activated.

Additional info:
    As a result of this issue, the host boot can take tens of minutes, depending on the number of LUNs, paths, and the amount of LVM metadata on those LUNs.

This can be mitigated by correct lvm.conf parameters and the right kernel parameters.

RHEV-H should use the following kernel parameter, as it only uses the LVM VG HostVG on the hypervisor:

     rd.lvm.vg=HostVG

lvm.conf has to be modified as well so that the VGs do not get activated during system initialisation:

In /etc/lvm/lvm.conf, in the activation section:

     volume_list = [ "HostVG" ]

Comment 2 Roman Hodain 2016-06-05 10:18:56 UTC
Just a clarification for the initial post: the parameter is not volume_list but auto_activation_volume_list.
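
A minimal sketch of the corrected configuration, assuming only HostVG is needed at boot (the grubby invocation and the activation stanza below are illustrative; vintage RHEV-H may need its own mechanism to persist kernel arguments):

     # /etc/lvm/lvm.conf, activation section
     activation {
         # only auto-activate the hypervisor's own VG during boot
         auto_activation_volume_list = [ "HostVG" ]
     }

     # limit the initramfs to activating HostVG as well
     grubby --update-kernel=ALL --args="rd.lvm.vg=HostVG"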

Comment 3 Fabian Deutsch 2016-06-06 18:45:35 UTC
This probably just affects vintage Node.
In 4.0 the kernel arguments are defined by RHEL.

Comment 4 Roman Hodain 2016-06-09 07:23:52 UTC
(In reply to Fabian Deutsch from comment #3)
> This probably just affects vintage Node.
> In 4.0 the kernel arguments are defined by RHEL.

That is true and the kernel parameter should be set, but RHEL does not modify lvm.conf. This is rather RHEV specific. I suppose that RHEL would suffer from this as well.

Comment 5 Fabian Deutsch 2016-06-16 13:50:23 UTC
In that case I suggest moving or cloning this bug to RHEL as well.

Comment 6 Fabian Deutsch 2016-06-23 10:29:45 UTC
Roman, just to clarify, this bug is about limiting the initial VG activation to the VGs which are related to the system disks?

Comment 7 Roman Hodain 2016-07-04 12:37:05 UTC
(In reply to Fabian Deutsch from comment #6)
> Roman, just to clarify, this bug is about limiting the initial VG activation
> to the VGs which are related to the system disks?

Yes

Comment 8 cshao 2016-07-05 12:20:29 UTC
RHEV-H QE can reproduce this issue.

Test version
RHEV-H 7.2 for RHEV 3.6.6 (rhev-hypervisor7-7.2-20160517.0) 
ovirt-node-3.6.1-12.0.el7ev.noarch

Test steps:
1. Attach a LUN to the node
2. Create a VG and LV on this LUN (not as a storage domain)
 1) pvcreate /dev/xxx
 2) vgcreate vg100 /dev/xxx
 3) lvcreate -l 50 -n database vg100
3. Reboot the hypervisor
4. Check if the LVM is activated

Test result:
The LVs on the LUNs are activated during the boot.

# lvs
  LV       VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  Config   HostVG -wi-ao----   4.99g                                                    
  Data     HostVG -wi-ao---- 274.81g                                                    
  Logging  HostVG -wi-ao----   2.00g                                                    
  Swap     HostVG -wi-ao----   7.64g                                                    
  database vg100  -wi-a----- 200.00m                                                    
[root@dhcp-8-139 admin]# pvs
  PV                                                  VG     Fmt  Attr PSize   PFree  
  /dev/dm-8                                           vg100  lvm2 a--    7.50g   7.30g
  /dev/mapper/Hitachi_HDT721032SLA380_STA2L7MT1ZZRKB4 HostVG lvm2 a--  289.84g 404.00m
[root@dhcp-8-139 admin]# vgs
  VG     #PV #LV #SN Attr   VSize   VFree  
  HostVG   1   4   0 wz--n- 289.84g 404.00m
  vg100    1   1   0 wz--n-   7.50g   7.30g

Comment 9 Douglas Schilling Landgraf 2016-07-21 18:40:52 UTC
Hi,

I have a few comments:

First, introducing this change in z-stream is somewhat risky, as it touches VGs/LVs on the host and might even affect existing customers who want their VGs/LVs visible.

I would also like to highlight that in rhev-h vintage we *disable* the lvmetad service (rhbz#1147217), which is required for the "auto_activation_volume_list" option to work. "auto_activation_volume_list" can also take effect if vgchange -aay is called in the boot sequence (after all PVs are visible) or via VDSM.

To validate such a change, vgs is probably not the best command; I would use lvs and look for 'a' (active) in the Attr field.
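
For example, a quick sketch of that check (VG/LV names taken from the reproduction in comment 8; purely illustrative):

     # the fifth character of Attr is 'a' when the LV is active
     lvs -o vg_name,lv_name,lv_attr vg100

     # with lvmetad disabled, auto-activation only happens when something
     # calls vgchange -aay after all PVs are visible
     vgchange -aay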

Roman, could you please test whether this works nicely in RHV-H 4.0? In 4.0 we do not disable lvmetad, so the setup should be as easy as on RHEL.



Thanks!

Comment 11 Fabian Deutsch 2016-07-26 11:55:14 UTC
Nir, if we would only enable the Node specific VG (HostVG) + LVs during boot.

Would vdsm then take care of activating an LV if it is used by a VM?

I'm thinking of the case where either an LV is used as a direct lun or storage domain - if those LVs are not enabled during boot, would vdsm then enable those?

Or does vdsm assume that all connected LVs are enabled?

Comment 12 Nir Soffer 2016-07-26 12:07:11 UTC
(In reply to Fabian Deutsch from comment #11)
> Nir, if we would only enable the Node specific VG (HostVG) + LVs during
> boot.
> 
> Would vdsm then take care of activating an LV if it is used by a VM?

Sure.

> I'm thinking of the case where either an LV is used as a direct lun or
> storage domain - if those LVs are not enabled during boot, would vdsm then
> enable those?

Yes

> or does vdsm assume that all connected LVs are enabled?

No, vdsm even deactivates LVs during startup, in case the system wrongly
activated them.

Note that *no* lv should be activated by the system except the lvs required
for boot. Vdsm and only vdsm must activate and deactivate lvs belonging 
to vdsm storage.

We have not tried this yet, but setting auto_activation_volume_list = []
in lvm.conf should make sure nothing is activated by mistake. You may need
some LVs for node boot; this should be easy to do in node.
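
As a sketch, on a host where no LVs outside vdsm's control are needed for boot, that would simply be (illustrative only):

     # /etc/lvm/lvm.conf
     activation {
         # nothing is auto-activated; vdsm activates its own LVs explicitly
         auto_activation_volume_list = []
     }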

Comment 13 Fabian Deutsch 2016-07-26 13:09:50 UTC
Thanks Nir.

Then solely using

auto_activation_volume_list = [ "HostVG" ]

should limit the VG activation to the VG used for the root LV.
According to IRC, all initscripts and systemd use vgchange -aay (which respects auto_activation_volume_list).
We don't even need to use the karg in that case.

The question is which component is responsible for handling this.

As this configuration item is relevant to both RHVH and RHEL-H hosts, I'd say it should be done by vdsm.
OTOH vdsm does not know whether an LVM volume is used for any disk needed to boot the host.
On Node, however, we do know this.
Another component could be anaconda, because it is used in both cases (RHELH and RHVH) to set up the storage. Maybe an anaconda option makes sense to limit VG activation to the VGs used/created during the installation.

Comment 14 Nir Soffer 2016-07-26 18:46:12 UTC
Yes, vdsm does not know anything about HostVG, so it cannot include it in
the autoactivation list.

For node 4.0, it would be best to handle this in node. In future vdsm may
configure lvm, so special setup on node may not be needed.

Comment 15 Fabian Deutsch 2016-12-20 11:51:18 UTC
Nir, is this the same issue as bug 1374545 ?

Comment 17 Yaniv Lavi 2017-01-10 11:04:03 UTC
Roman, please take a look at BZ #1374545 and suggest the workarounds to the customer.

Comment 20 Nir Soffer 2017-01-26 16:03:01 UTC
I think the only solution for 3.6 would be local lvm configuration:

1. Disable lvm2-lvmetad.service and socket
2. Set use_lvmetad = 0 in lvm.conf
3. Set a filter in lvm.conf, including only the devices used by the host

This hides all the other devices from lvm, so no LV will be activated by default.
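
A rough sketch of those three steps (the accepted device pattern below is a placeholder and must match the host's actual boot device):

     # 1. disable lvmetad
     systemctl stop lvm2-lvmetad.service lvm2-lvmetad.socket
     systemctl disable lvm2-lvmetad.service lvm2-lvmetad.socket

     # 2. and 3. in /etc/lvm/lvm.conf
     use_lvmetad = 0
     filter = [ "a|^/dev/mapper/<host-boot-device>|", "r|.*|" ]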

Comment 21 Douglas Schilling Landgraf 2017-01-28 04:59:06 UTC
(In reply to Nir Soffer from comment #20)
> I think the only solution for 3.6 would be local lvm configuration:
> 
> 1. Disable lvm2-lvmetad.service and socket
> 2. Set use_lvmetad = 0 in lvm.conf
> 3. Set a filter in lvm.conf, including only the devices used by the host
> 
> This hides all the other devices from lvm, so no lv will activated by
> default.

Thanks Nir for the clarification; the filter approach seems to be the only one that has worked so far. In this case, 3.6 will require manual steps/a KCS article.

1. Create a VG and LV on this node 
 1) pvcreate /dev/sdb
 2) vgcreate vg100 /dev/sdb
 3) lvcreate -l 50 -n database vg100

2. Reboot the hypervisor  
   (In my case, I had to add nompath in grub to avoid multipath)

3. Check if the LVM is activated
    # vgs
    # lvs

To hide VG/LV from /dev/sdb, added in lvm.conf the filter:
  # vi /etc/lvm/lvm.conf
  filter = [ "r|/dev/sdb|" ]
  # persist /etc/lvm/lvm.conf
    - Reboot the hypervisor  

After reboot, the VG/LV no longer appears in the vgs and lvs output.
Moving to ON_QA for a double check. Qin Yuan, could you please double check?

Thanks!

Comment 22 Yaniv Kaul 2017-01-28 07:36:10 UTC
I don't think disabling multipath is legit. We need and use it.

Comment 23 Douglas Schilling Landgraf 2017-01-29 03:59:10 UTC
(In reply to Yaniv Kaul from comment #22)
> I don't think disabling multipath is legit. We need and use it.

Thanks for pointing this out, Yaniv. I did a new test now with multipath enabled and it worked as well. The initial test was only to make sure the filter option works in rhev-h.

# pvs
  PV                                  VG     Fmt  Attr PSize  PFree  
  /dev/mapper/QEMU_HARDDISK_QM00001p4 HostVG lvm2 a--  31.75g 404.00m
  /dev/mapper/QEMU_HARDDISK_QM00003   vg100  lvm2 a--  30.00g  29.80g


# vi /etc/lvm/lvm.conf
Added:
filter = [ "r|/dev/mapper/QEMU_HARDDISK_QM00003|" ]

# persist /etc/lvm/lvm.conf
# reboot

After reboot there is no more vg100 or database LV.

# pvs
  PV                                  VG     Fmt  Attr PSize  PFree  
  /dev/mapper/QEMU_HARDDISK_QM00001p4 HostVG lvm2 a--  31.75g 404.00m

# vgs
  VG     #PV #LV #SN Attr   VSize  VFree  
  HostVG   1   4   0 wz--n- 31.75g 404.00m

# lvs
  LV      VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  Config  HostVG -wi-ao----  8.00m                                                    
  Data    HostVG -wi-ao---- 21.18g                                                    
  Logging HostVG -wi-ao----  2.00g                                                    
  Swap    HostVG -wi-ao----  8.17g                   

Please let me know if more tests are needed from my side.

Comment 24 cshao 2017-02-07 06:36:16 UTC
The Target Milestone of this bug is set to 3.6.11. Since there is no 3.6.11 build available for QE testing, I am moving the bug back to MODIFIED status.

Thanks.

Comment 25 Yihui Zhao 2017-03-30 06:58:28 UTC
Test version:
RHEV-H 7.3-20170324.0.el7ev

Test steps:

#1. Install RHEV-H via PXE
#2. 
[root@dell-per515-01 admin]# pvs
  PV                                              VG     Fmt  Attr PSize   PFree  
  /dev/mapper/360a9800050334c33424b41762d726954p4 HostVG lvm2 a--  190.75g 404.00m
  /dev/mapper/360a9800050334c33424b41762d736d45p1        lvm2 ---   99.00g  99.00g
  /dev/mapper/360a9800050334c33424b41762d745551p1        lvm2 ---   99.00g  99.00g
  /dev/mapper/36b8ca3a0e7899a001dfd500516473f47p3        lvm2 ---    3.27t   3.27t

#3. vi /etc/lvm/lvm.conf
Added:
filter = [ "r|/dev/mapper/36b8ca3a0e7899a001dfd500516473f47p3|" ]

#4. reboot
[root@dell-per515-01 admin]# pvs
  PV                                              VG     Fmt  Attr PSize   PFree  
  /dev/mapper/360a9800050334c33424b41762d726954p4 HostVG lvm2 a--  190.75g 404.00m
  /dev/mapper/360a9800050334c33424b41762d736d45p1        lvm2 ---   99.00g  99.00g
  /dev/mapper/360a9800050334c33424b41762d745551p1        lvm2 ---   99.00g  99.00g

After step4, there is no information about /dev/mapper/36b8ca3a0e7899a001dfd500516473f47p3.

So, the bug status is verified.

Comment 26 Yihui Zhao 2017-03-30 07:16:55 UTC
(In reply to Yihui Zhao from comment #25)
> Test version:
> RHEV-H 7.3-20170324.0.el7ev
> 
> Test steps:
> 
> #1. Install RHVH-H via PXE
> #2. 
> [root@dell-per515-01 admin]# pvs
>   PV                                              VG     Fmt  Attr PSize  
> PFree  
>   /dev/mapper/360a9800050334c33424b41762d726954p4 HostVG lvm2 a--  190.75g
> 404.00m
>   /dev/mapper/360a9800050334c33424b41762d736d45p1        lvm2 ---   99.00g 
> 99.00g
>   /dev/mapper/360a9800050334c33424b41762d745551p1        lvm2 ---   99.00g 
> 99.00g
>   /dev/mapper/36b8ca3a0e7899a001dfd500516473f47p3        lvm2 ---    3.27t  
> 3.27t
> 
> #3. vi /etc/lvm/lvm.conf
> Added:
> filter = [ "r|/dev/mapper/36b8ca3a0e7899a001dfd500516473f47p3|" ]
> 
> #4. reboot
> [root@dell-per515-01 admin]# pvs
>   PV                                              VG     Fmt  Attr PSize  
> PFree  
>   /dev/mapper/360a9800050334c33424b41762d726954p4 HostVG lvm2 a--  190.75g
> 404.00m
>   /dev/mapper/360a9800050334c33424b41762d736d45p1        lvm2 ---   99.00g 
> 99.00g
>   /dev/mapper/360a9800050334c33424b41762d745551p1        lvm2 ---   99.00g 
> 99.00g
> 
> After step4, there is no information about
> /dev/mapper/36b8ca3a0e7899a001dfd500516473f47p3.
> 
> So, the bug status is verified.

Add: 
Test version: 
ovirt-node-3.6.1-43.0.el7ev.noarch
RHEV-H 7.3-20170324.0.el7ev

Comment 27 Marina Kalinin 2017-05-09 23:48:33 UTC
Since this is a test only bug and it has been verified by QA, can we close it as current release of 3.6.11?
Fabian?

Comment 28 Ying Cui 2017-05-10 02:44:18 UTC
(In reply to Marina from comment #27)
> Since this is a test only bug and it is verified by QA, can we close it
> current release of 3.6.11?
> Fabian?

3.6.11 has been shipped, so I am closing the bug as the current release.

Comment 29 Marina Kalinin 2017-06-01 20:42:26 UTC
Related 4.x bug:
RHV-H starts very slowly when too many LUNs are connected to the host (lvm filter?)
https://bugzilla.redhat.com/show_bug.cgi?id=1400446

Comment 30 Marina Kalinin 2017-06-01 20:52:13 UTC
Based on comments 20 and 25, I understand that the workaround needed for RHEV 3.6 is to create a filter in /etc/lvm/lvm.conf, persist the file, and that is it.
Thus, KCS https://access.redhat.com/solutions/2662261 should be sufficient, IMO.
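
For reference, a condensed sketch of that workaround on vintage RHEV-H (the rejected device path is just the example from comment 25):

     # vi /etc/lvm/lvm.conf
     filter = [ "r|/dev/mapper/36b8ca3a0e7899a001dfd500516473f47p3|" ]
     # persist /etc/lvm/lvm.conf
     # reboot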

(In reply to Nir Soffer from comment #20)
> I think the only solution for 3.6 would be local lvm configuration:
> 
> 1. Disable lvm2-lvmetad.service and socket
> 2. Set use_lvmetad = 0 in lvm.conf
> 3. Set a filter in lvm.conf, including only the devices used by the host
> 
> This hides all the other devices from lvm, so no lv will activated by
> default.

(In reply to Yihui Zhao from comment #25)
> Test version:
> RHEV-H 7.3-20170324.0.el7ev
> 
> Test steps:
> 
> #1. Install RHVH-H via PXE
> #2. 
> [root@dell-per515-01 admin]# pvs
>   PV                                              VG     Fmt  Attr PSize  
> PFree  
>   /dev/mapper/360a9800050334c33424b41762d726954p4 HostVG lvm2 a--  190.75g
> 404.00m
>   /dev/mapper/360a9800050334c33424b41762d736d45p1        lvm2 ---   99.00g 
> 99.00g
>   /dev/mapper/360a9800050334c33424b41762d745551p1        lvm2 ---   99.00g 
> 99.00g
>   /dev/mapper/36b8ca3a0e7899a001dfd500516473f47p3        lvm2 ---    3.27t  
> 3.27t
> 
> #3. vi /etc/lvm/lvm.conf
> Added:
> filter = [ "r|/dev/mapper/36b8ca3a0e7899a001dfd500516473f47p3|" ]
> 
> #4. reboot
> [root@dell-per515-01 admin]# pvs
>   PV                                              VG     Fmt  Attr PSize  
> PFree  
>   /dev/mapper/360a9800050334c33424b41762d726954p4 HostVG lvm2 a--  190.75g
> 404.00m
>   /dev/mapper/360a9800050334c33424b41762d736d45p1        lvm2 ---   99.00g 
> 99.00g
>   /dev/mapper/360a9800050334c33424b41762d745551p1        lvm2 ---   99.00g 
> 99.00g
> 
> After step4, there is no information about
> /dev/mapper/36b8ca3a0e7899a001dfd500516473f47p3.
> 
> So, the bug status is verified.