Bug 1261083 - Using LVM on a cinder volume exposes the data to the compute host [NEEDINFO]
Using LVM on a cinder volume exposes the data to the compute host
Status: ASSIGNED
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
7.0 (Kilo)
All Linux
high Severity high
: Upstream M2
: 14.0 (Rocky)
Assigned To: Eric Harney
Avi Avraham
: Triaged
: 1499044 (view as bug list)
Depends On:
Blocks: 1518969
Reported: 2015-09-08 10:27 EDT by Jack Waterworth
Modified: 2018-05-19 22:14 EDT (History)
24 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
zkabelac: needinfo? (eharney)
jwaterwo: needinfo? (eharney)




External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3213311 None None None 2017-11-24 17:44 EST
Red Hat Knowledge Base (Solution) 3252081 None None None 2017-11-29 11:54 EST

Description Jack Waterworth 2015-09-08 10:27:25 EDT
Description of problem:
Using LVM on a cinder volume on the instance causes the compute node to pick up the LVM at the host level

How reproducible:
Every time

Steps to Reproduce:
1. Create a new cinder volume and present it to an instance
2. Use LVM against the raw device (pvcreate/vgcreate/lvcreate)
3. Run 'lvs -o +devices' on the compute

Actual results:
LVM from the guest is seen on the host

Expected results:
host should not be able to see LVM from the guest

Additional info:
This can cause problems such as conflicting VG names on the compute. It can also cause the LVM on the compute to adjust metadata that the instance is not aware of, leading to things like missing volumes.

Current workaround is to set a filter on the compute node.
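A minimal sketch of what that workaround looks like in /etc/lvm/lvm.conf on the compute node (the accepted device is illustrative and must be adjusted to whatever the host actually uses for local storage):

```
devices {
    # Accept only the host's own devices and reject everything else,
    # so iSCSI-attached cinder volumes are never scanned.
    # "/dev/sda2" is an example local PV; adjust to the real host layout.
    global_filter = [ "a|^/dev/sda2$|", "r|.*|" ]
}
```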
Comment 3 Jack Waterworth 2015-09-14 12:34:43 EDT
perhaps this is the answer? 

https://review.openstack.org/#/c/148747/

It would allow us to ship an lvm.conf file in the cinder directory, which would be pushed to the compute nodes.
Comment 4 Eric Harney 2015-09-14 12:44:37 EDT
(In reply to Jack Waterworth from comment #3)

I don't think this will help, since it only affects what Cinder sees while managing LVM, and not the system.
Comment 5 Jack Waterworth 2015-10-08 11:31:16 EDT
I accidentally reproduced this on my home box:

[root@bulldozer ~]# lvs -o lv_name,vg_name,devices
  LV                                             VG             Devices                     
  root                                           centos         /dev/sda2(0)                
  swap                                           centos         /dev/sda2(12800)            
  _snapshot-fda38971-737c-45ef-a8e3-e3efa0f05ca9 cinder-volumes                             
  cinder-volumes-pool                            cinder-volumes cinder-volumes-pool_tdata(0)
  volume-0e367765-9631-43d3-8eab-6872360acbc8    cinder-volumes                             
  volume-211434f9-e947-413c-8395-7642a2ea29cb    cinder-volumes                             
  volume-28aa705b-425b-4f95-bf85-8c8bc6bf5806    cinder-volumes                             
  volume-34b23d5a-a72b-4ecc-a668-963159476517    cinder-volumes                             
  volume-4b218153-d3e3-4674-b9e8-5fd3c0840a5f    cinder-volumes                             
  volume-5a31c6e7-a5e1-423d-b3a4-06ff7c0bd248    cinder-volumes                             
  volume-62cc6d28-1619-4b69-86cd-66030c25693e    cinder-volumes                             
  volume-7d644152-08d1-4e82-b7f5-e597f860ca4b    cinder-volumes                             
  volume-973245e1-d184-432f-a223-90521d96cd65    cinder-volumes                             
  volume-9b058e04-268d-45f7-af30-a98f74216fa6    cinder-volumes                             
  volume-be1ff916-f326-4914-a6c8-444713c5e6d7    cinder-volumes                             
  volume-d26fc2fe-2243-4f25-9f08-9e371de9caf7    cinder-volumes                             
  volume-d9c8cf52-370c-4ee2-afbc-6bc742a4425b    cinder-volumes                             
  volume-eb86aad5-7bc0-4036-84a6-3293b8e95832    cinder-volumes                             
  jack                                           data           /dev/sdc1(0)                
[root@bulldozer ~]# ll /dev/disk/by-path/*06ff7c0bd248*
lrwxrwxrwx. 1 root root  9 Oct  1 23:16 /dev/disk/by-path/ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-5a31c6e7-a5e1-423d-b3a4-06ff7c0bd248-lun-0 -> ../../sdc
lrwxrwxrwx. 1 root root 10 Oct  1 22:44 /dev/disk/by-path/ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-5a31c6e7-a5e1-423d-b3a4-06ff7c0bd248-lun-0-part1 -> ../../sdc1

Here we can see that my compute node is logged into the iSCSI target, which brings in /dev/sdc. The compute node is able to scan this device for LVM metadata, so it shows up in my lvs output. The 'data' volume is actually the data volume for one of my instances.

While that is slightly annoying, the real issue is that if my compute node already had a volume group named 'data', LVM would conflict with the local storage.
Comment 7 Jack Waterworth 2015-10-08 13:58:46 EDT
I found that there is a new lvm feature being added to 7.2 that may be able to help this issue:

http://man7.org/linux/man-pages/man7/lvmsystemid.7.html

This could allow OpenStack to create a specific system ID to be used by the compute hosts, and the compute hosts would then only access VGs carrying that system ID. Any other volumes would be ignored.

However, reading through the page seems to indicate that the guests would need to be system-ID aware, and the guests would have to set system IDs on their devices.
Comment 8 Jack Waterworth 2015-10-08 14:00:21 EDT
I'm also experimenting with using a filter to ignore devices presented via iscsi. something like this:

global_filter = [ "r|/dev/disk/by-path/*.openstack.*|" ]

although this doesn't seem to work... still playing around with it
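A likely reason it doesn't work: LVM filter patterns are regular expressions, not shell globs, so `/*.` means "zero or more slashes, then any single character" rather than "any path segment". The difference can be sketched with grep -E against a sample by-path name (the link name here is illustrative):

```shell
# Illustrative by-path symlink name as seen on a compute node
link="/dev/disk/by-path/ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-x-lun-0"

# Glob-style pattern: no match, because '/*.' only allows one
# arbitrary character before 'openstack'
echo "$link" | grep -Ec '/dev/disk/by-path/*.openstack.*'    # prints 0

# Proper regex: matches
echo "$link" | grep -Ec '.*openstack.*'                      # prints 1
```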
Comment 9 Sergey Gotliv 2015-12-17 05:33:06 EST
(In reply to Jack Waterworth from comment #5)
> I accidentally reproduced this on my home box:
> [...]
> While that is slightly annoying, the real issue is that if my compute node
> already had a volume group named 'data', LVM would conflict with the local
> storage.

We are presuming that controllers and computes are deployed on dedicated hosts. They should not have any VGs or LVs except those created and used by the OpenStack deployment. If the operator decides to create his own local VGs or LVs, then it is his responsibility to prevent name collisions. TBH, I don't see an issue here, do you?

Imagine that a cloud operator decides to deploy a 3rd-party component on the controller host listening on the same port as the Cinder API; would you ask me in that case to move the Cinder API to another port?!
Comment 10 Jack Waterworth 2016-01-04 10:11:50 EST
If an OpenStack admin provides an instance with a cinder volume to a customer, it is possible that the customer could create a volume group name that collides with a volume group created by the OpenStack admin.

Additionally, if the volume group can be detected at the host level (without collision) it is possible that operations could be submitted to the metadata on those devices, even by LVM itself.

The most common issue I see is that LVM will only see 1 out of 2 volumes, and will mark one of the volumes as missing within the metadata. This would bubble up to the instance and cause the device to be marked missing and possibly offline at the guest.
Comment 11 Eric Harney 2016-01-04 11:13:08 EST
This is something that we need to fix, I'm just not sure about the cleanest way to do it yet.

I suspect we need to modify the system LVM config to include an additional file that Cinder manipulates, set to ignore devices belonging to Cinder volumes.
Comment 14 Jack Waterworth 2016-02-21 01:51:23 EST
Found a workaround for this. In my example, sdg1 is the problematic disk: it is a cinder volume being presented via iSCSI. The VG "data" actually belongs to one of my instances and should not be seen on the compute.

I am using the following filter:

    global_filter = [ "r|.*openstack.*|" ]

This will cause LVM to completely ignore any device with 'openstack' in the name, including any devices that have symlinks with 'openstack' in the name.

[root@bulldozer by-path(keystone_admin)]# pvs
  PV         VG             Fmt  Attr PSize   PFree 
  /dev/sda2  centos         lvm2 a--   64.19g 12.19g
  /dev/sdb1  cinder-volumes lvm2 a--  465.76g 23.07g
  /dev/sdg1  data           lvm2 a--  100.00g     0 
[root@bulldozer by-path(keystone_admin)]# pvscan --cache --config 'devices{global_filter = [ "r|.*openstack.*|" ]}'
[root@bulldozer by-path(keystone_admin)]# pvs --config 'devices{global_filter = [ "r|.*openstack.*|" ]}'
  PV         VG             Fmt  Attr PSize   PFree 
  /dev/sda2  centos         lvm2 a--   64.19g 12.19g
  /dev/sdb1  cinder-volumes lvm2 a--  465.76g 23.07g
[root@bulldozer by-path(keystone_admin)]#

this works due to the /dev/disk/by-path directory:

[root@bulldozer by-path(keystone_admin)]# ll /dev/disk/by-path/
total 0
lrwxrwxrwx. 1 root root  9 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-1272b721-51cc-4a3f-818e-dde25d93ed8d-lun-0 -> ../../sdi
lrwxrwxrwx. 1 root root 10 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-1272b721-51cc-4a3f-818e-dde25d93ed8d-lun-0-part1 -> ../../sdi1
lrwxrwxrwx. 1 root root  9 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-47771a4f-9046-43f8-9f80-7e05185b0001-lun-0 -> ../../sdj
lrwxrwxrwx. 1 root root 10 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-47771a4f-9046-43f8-9f80-7e05185b0001-lun-0-part1 -> ../../sdj1
lrwxrwxrwx. 1 root root  9 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-4b218153-d3e3-4674-b9e8-5fd3c0840a5f-lun-0 -> ../../sdh
lrwxrwxrwx. 1 root root 10 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-4b218153-d3e3-4674-b9e8-5fd3c0840a5f-lun-0-part1 -> ../../sdh1
lrwxrwxrwx. 1 root root  9 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-5a31c6e7-a5e1-423d-b3a4-06ff7c0bd248-lun-0 -> ../../sdg
lrwxrwxrwx. 1 root root 10 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-5a31c6e7-a5e1-423d-b3a4-06ff7c0bd248-lun-0-part1 -> ../../sdg1
lrwxrwxrwx. 1 root root  9 Feb 21 01:37 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-5a450fdb-b8d1-4ac0-85ee-9a07ad6d48f1-lun-0 -> ../../sdk
lrwxrwxrwx. 1 root root 10 Feb 21 01:37 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-5a450fdb-b8d1-4ac0-85ee-9a07ad6d48f1-lun-0-part1 -> ../../sdk1
lrwxrwxrwx. 1 root root  9 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-7d644152-08d1-4e82-b7f5-e597f860ca4b-lun-0 -> ../../sdf
lrwxrwxrwx. 1 root root 10 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-7d644152-08d1-4e82-b7f5-e597f860ca4b-lun-0-part1 -> ../../sdf1
lrwxrwxrwx. 1 root root  9 Feb 21 01:17 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-c9aeea86-4751-429c-a85f-3a959883f0ef-lun-0 -> ../../sdd
lrwxrwxrwx. 1 root root 10 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-c9aeea86-4751-429c-a85f-3a959883f0ef-lun-0-part1 -> ../../sdd1
lrwxrwxrwx. 1 root root  9 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-d9c8cf52-370c-4ee2-afbc-6bc742a4425b-lun-0 -> ../../sde
lrwxrwxrwx. 1 root root 10 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-d9c8cf52-370c-4ee2-afbc-6bc742a4425b-lun-0-part1 -> ../../sde1
lrwxrwxrwx. 1 root root  9 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-eb86aad5-7bc0-4036-84a6-3293b8e95832-lun-0 -> ../../sdc
lrwxrwxrwx. 1 root root 10 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-eb86aad5-7bc0-4036-84a6-3293b8e95832-lun-0-part1 -> ../../sdc1

If we use the --all flag in pvs, we can see that these devices are no longer even considered for utilization by LVM:

[root@bulldozer by-path(keystone_admin)]# pvs --all
  PV               VG             Fmt  Attr PSize   PFree 
  /dev/centos/root                     ---       0      0 
  /dev/centos/swap                     ---       0      0 
  /dev/data/jack                       ---       0      0 
  /dev/sda1                            ---       0      0 
  /dev/sda2        centos         lvm2 a--   64.19g 12.19g
  /dev/sda3                            ---       0      0 
  /dev/sdb1        cinder-volumes lvm2 a--  465.76g 23.07g
  /dev/sdc1                            ---       0      0 
  /dev/sdd1                            ---       0      0 
  /dev/sde1                            ---       0      0 
  /dev/sdf1                            ---       0      0 
  /dev/sdg1        data           lvm2 a--  100.00g     0 
  /dev/sdh1                            ---       0      0 
  /dev/sdi1                            ---       0      0 
  /dev/sdj1                            ---       0      0 
  /dev/sdk1                            ---       0      0 
[root@bulldozer by-path(keystone_admin)]# pvs --all --config 'devices{global_filter = [ "r|.*openstack.*|" ]}'
  PV               VG             Fmt  Attr PSize   PFree 
  /dev/centos/root                     ---       0      0 
  /dev/centos/swap                     ---       0      0 
  /dev/data/jack                       ---       0      0 
  /dev/sda1                            ---       0      0 
  /dev/sda2        centos         lvm2 a--   64.19g 12.19g
  /dev/sda3                            ---       0      0 
  /dev/sdb1        cinder-volumes lvm2 a--  465.76g 23.07g
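To make the filter permanent rather than per-command, the same pattern can go into the host's lvm.conf (a sketch; note that on RHEL-family systems the initramfs embeds lvm.conf, so it may need regenerating for the filter to apply during early boot):

```
# /etc/lvm/lvm.conf on the compute node
devices {
    # Ignore any device whose name or symlink contains 'openstack'
    global_filter = [ "r|.*openstack.*|" ]
}
```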
Comment 15 Lee Yarwood 2016-07-07 13:53:25 EDT
IMHO we need to break this up into three separate bugs for docs, Packstack and Director.

For Director we can blacklist everything as LVM isn't used by the local hosts. 
For Packstack we can dynamically add local block devices to the whitelist while configuring the environment.
For docs we can just highlight the need to correctly configure LVM filters on any host running cinder-volume or nova-compute.

Eric, Jack, does that sound like a sane approach?
Comment 16 Vagner Farias 2016-07-07 14:29:51 EDT
When using Director, we should allow a configurable blacklist, even if the default is to blacklist everything. This is required if the overcloud nodes use LVM for some reason (e.g. /var/lib/nova/instances on a SAN).
Comment 17 Jeff Peeler 2016-07-08 11:37:35 EDT
FYI, in director land the instack host already has disabled LVM because of the same issue.

https://review.openstack.org/#/c/248174/
Comment 18 Jeff Peeler 2016-07-08 11:38:56 EDT
Actually I guess it's not LVM entirely... sorry if that wasn't helpful.
Comment 19 Elise Gafford 2016-08-22 14:46:15 EDT
Moving this issue to 11, as it is unlikely to receive work in the RHOS 10 timeframe.
Comment 25 Jack Waterworth 2017-10-26 14:59:46 EDT
I spoke with the lvm core development engineer and he stated that systemid is extremely expensive and he does not recommend its usage at all.  He states that a filter is the best way to resolve this issue.

I propose that we enable a default reject filter for openstack director deployments.  Our deployments do not utilize LVM on the hosts.  Any customer that wants to start using LVM on the hosts will need to adjust the filter.

    global_filter = [ "r|.*|" ]

This will prevent this issue from being seen entirely.
Comment 26 John Pittman 2017-10-26 17:05:59 EDT
Jack, an alternative is adding 'volume_list = []' in lvm.conf. That will prevent activation of all VGs, but the volumes will still be visible on the compute node (if that's desirable).
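A sketch of that alternative in lvm.conf (the VG names here are illustrative; host VGs must be listed explicitly, or nothing will auto-activate):

```
# /etc/lvm/lvm.conf -- activation section on the compute node
activation {
    # Empty list: no VG is auto-activated on this host.
    # volume_list = []

    # Or activate only the host's own VGs (names are examples):
    volume_list = [ "centos", "cinder-volumes" ]
}
```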
Comment 27 Jeff Peeler 2017-10-26 20:54:09 EDT
Using a global_filter is what was done a while back on the undercloud - https://review.openstack.org/#/c/343100/. The original fix disabled LVM completely and it turned out that some customers relied on it, so disabling LVM entirely might cause surprises in the overcloud as well.

I also recall trying to use systemid, but think that the guest requirements made this not a viable option.

Couldn't resist commenting, hope that's helpful.
Comment 28 Chris Fields 2017-10-31 10:36:11 EDT
*** Bug 1499044 has been marked as a duplicate of this bug. ***
Comment 29 John Pittman 2017-11-03 09:54:03 EDT
Thinking about this further, my earlier comment 26 may cause issues: for example, if there are 10 guests that all have a VG named vg1, they would not activate on the host, but they would cause a lot of chatter and warnings from LVM. Also, the global_filter is something that would need to be tweaked if LVM usage on the system changes over time.

This bug details an issue that has been around for quite some time, in RHEV and now in OpenStack. Hopefully, as we move forward with the business, RHEL will be used in more hypervisors, not fewer. So maybe being part of a hypervisor setup should be the defining aspect of some settable parameter.

For example, have a setting in the hypervisor lvm.conf (which would be set by director or RHEV at install): 'hypervisor_mode = 1'. By setting this, all LVM volumes not created on the hypervisor would be completely ignored. The details of how this is done would be under the hood and decided by the lvm team; all that would matter is that if a volume was not created on that system, it would be ignored. The only issue I can think of is volumes imported from other systems, but, as mentioned, the lvm team could probably solve that easily.
Comment 37 Jack Waterworth 2017-11-30 10:52:05 EST
(In reply to Andreas Karis from comment #35)
> We could indeed set up very restrictive default filter with Director from
> the get go. We do not support the LVM cinder backend, so the tripleo team
> could take this and just shut it down. Users will need to explicitly enable
> the filters, and thus we will reduce the number of support tickets here. We
> will initially get more support tickets for the disabled cinder LVM backend,
> but we can easily point customers to a well written part of our
> documentation. I *do* like this idea, and this is btw what the NFV team did
> for OVS DPDK and unsupported PMD drivers - simply disable them.

From what I've been told in the past, making a *block all LVM* change would be pushed upstream. This would be fine for RHOSP, which doesn't use LVM by default, but it could potentially break other distros that DO use LVM. I'm not sure how much truth there is to this, or whether there is any way to work around those kinds of issues.

(In reply to Andreas Karis from comment #35)
> Another solution, if feasible, could be that we implement logic in the
> cinder LVM volume driver which verifies that lvm.conf is set up with the
> correct filters, and otherwise the cinder LVM driver will throw an exception
> and fail with a very clear error message. 

This issue occurs outside of the LVM iSCSI cinder driver. While using this driver DOES make things more complicated, it is unsupported, as you stated. There is no point wasting brain cells on solving that puzzle, in my opinion.

(In reply to Andreas Karis from comment #35)
> Another point here is that we currently do not have sufficient documentation
> for this issue which non-storage people would understand. I created a new
> KCS specifically for the cinder case but it lacks examples and output
> because I don't understand in depth what I'm writing about. And the KCS
> which was linked here earlier is simply not easy to understand for
> non-storage people. We need a storage idiot proof ( = for people like me )
> knowledge base article or documentation to reduce the time that we spend on
> these cases. Or at least to give non-storage sbr-stack members a tool so
> that we can help customers fix this. Ideally, I'd also like to see a few
> concise paragraphs in that KCS of what this issue exactly is and where it
> comes from (I tried to put as much as possible into the KCS, but again, I
> should not be the one doing it, as I don't have enough knowledge).

I can start modifying the article to try to make things a little clearer. While default installs can easily work around this issue, anything customized (or using the LVM driver in cinder) becomes a little more complicated. I'll work on that update now.

(In reply to Zdenek Kabelac from comment #36)
> As my 'thinking' result - I could imagine something along this path:
> 
> lvcreate --guest y|n ...
> lvchange --guest y|n  vg/lv
> 
> such LV would have extra attribute - so when set 'y' - it would be created
> with UUID suffix '-guest'.

The problem here is that the problematic LVs being detected on the host are NOT created by cinder. The users/admins of the VM guests are creating LVM on top of their local (virtualized) devices. This configuration is then seen at the hypervisor level and picked up by LVM. When the OpenStack user attempts to remove the storage from the VM, the disconnection fails because LVM on the hypervisor is holding the storage open. Fixing this through awareness alone would require educating the users of the guests about the issue.

The only way I see to fix this is:

   1) an entire rejection of all LVM on the host

   2) some way for LVM to determine that the VG was not created locally, which would likely require expensive scanning operations.

This issue is currently assigned to cinder, but I think it may make more sense for it to be a tripleo or nova issue.
Comment 39 Zdenek Kabelac 2017-12-01 08:46:19 EST
(In reply to Jack Waterworth from comment #37)
> (In reply to Andreas Karis from comment #35)
> > lvcreate --guest y|n ...
> > lvchange --guest y|n  vg/lv
> > 
> > such LV would have extra attribute - so when set 'y' - it would be created
> > with UUID suffix '-guest'.
> 
> The problem here is that problematic LVs that are being detected on the host
> are NOT created by cinder. [...] When the OpenStack user attempts to remove
> the storage from the VM, the disconnection fails because LVM on the
> hypervisor is holding the storage open. [...]


Not sure if I understand this correctly - but this case is like:

1. User has 'some' attached storage on HOST.

2. This attached 'storage' (i.e. /dev/sdX) is directly used for VM.

3. Guest on this VM creates PV/VG on such device.

4. Such a VG is then picked up on the HOST, as the host has access to /dev/sdX.


If this is the case, you can easily see that the issue is completely *OUTSIDE* of lvm2.

lvm2 cannot 'deduce' which device is meant to be used where.

Unless there is some sort of authority claiming ownership of a device; lvm2 could then 'query' this authoritative tool and exclude access to the device.

Doing any 'runtime' analysis of 'who holds the device' would likely ONLY work while the 'guest' is running. When the 'guest' is offline, lvm2 has no idea that /dev/sdX should not be accessed.

Thus this universal trouble is commonly solved by placing a 'device header', just as lvm2 places a PV header to claim ownership of /dev/sdY.

Cinder could place, e.g., an XXX KiB header (commonly 1 MiB these days) and 'shift' the device seen by the guest VM by the header size.

Adding a 'header' makes such a device 'unusable' on the HOST in all cases; if the user wants to access it on the host, they have to use e.g. a loop device with an offset to get at the device locally again.
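The header-and-shift idea can be sketched with a plain file standing in for the device: reading at offset 0 sees only the header, while skipping the header size reveals the payload, which is what a loop device with an offset (losetup -o) would expose:

```shell
# Build a file with a 1 MiB header region followed by guest data
printf 'HEADER' > disk.img
truncate -s 1M disk.img            # pad the header region to exactly 1 MiB
printf 'GUEST-DATA' >> disk.img

# The host, reading from offset 0, sees only the header region
dd if=disk.img bs=1M count=1 2>/dev/null | head -c 6     # prints HEADER

# Skipping the header (as a shifted guest mapping would) exposes the payload
dd if=disk.img bs=1M skip=1 2>/dev/null                  # prints GUEST-DATA
```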
Comment 40 Jack Waterworth 2017-12-01 10:04:42 EST
Since these devices are presented straight to the VM after being presented to the host, the guest would ALSO see the device-header shift, and LVM would not pick it up at that level either. Nova would need to somehow intercept the device at the host level and then present it to the guest without the shift in place. Keep in mind that, at the moment, nova just tells libvirt to add the storage device to the VM, and it's libvirt that does all the work.
Comment 41 Zdenek Kabelac 2017-12-01 10:29:48 EST
If there is no easy way to configure an offset for a passed-through device, then it's possibly an RFE for qemu/kvm?
Comment 42 David Teigland 2017-12-01 14:09:37 EST
As Zdenek has alluded to, this is a missing feature for devices in linux in general.  There's no way to designate some devices as belonging to the system, and other devices as belonging to some application (so the system shouldn't touch them).  You might imagine that devices referenced from /etc/fstab should be used by the system, and none others.  But unfortunately, the system assumes every device belongs to it.  There is some development going on to address this because it's a recurring problem.

In the meantime, lvm provides some ways to deal with the problem. Device filters are one, but they are hard to automate. The RHEV group has worked on some scripts to attempt to automate host filter creation based on what's being used on the host, but I'm not sure it worked well enough.

Another solution is lvm system ID, which takes advantage of the fact that the host and guest are two different systems.  This would work very well in theory, and it's not "expensive".  The current issue with using it is that a VG without a system ID is accessible to everyone (both host and guest).  If lvm in the guest could be forced to use a system ID on its VGs, then the problem is solved, but the host can't enforce this.  One potential solution is a new lvm option we could add to ignore VGs without a system ID.  This would only need to be enabled for lvm on the host.
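For reference, enabling system ID on the host side is a one-line lvm.conf setting (a sketch; `system_id_source = "uname"` is an existing option documented in lvmsystemid(7), while the "ignore VGs that have no system ID" behaviour is the potential new option described above and does not exist yet):

```
# /etc/lvm/lvm.conf on the host
global {
    # New VGs created here get this host's system ID;
    # VGs owned by other systems become foreign and are not modified.
    system_id_source = "uname"
}
```

The gap remains that guest VGs created with no system ID at all stay visible to every system, which is exactly what the proposed option would close.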
Comment 43 David Teigland 2017-12-07 11:24:02 EST
> filters are one, but they are hard to automate.  The RHEV group have worked
> on some scripts to attempt to automate host filter creation based on what's
> being used on the host, but I'm not sure it worked well enough.

They have in fact written a tool for this:
https://ovirt.org/blog/2017/12/lvm-configuration-the-easy-way/

> One potential solution
> is a new lvm option we could add to ignore VGs without a system ID.  This
> would only need to be enabled for lvm on the host.

This was a simple patch and seems to work as expected, but I'll not add this to lvm until I know it'll actually be used.
Comment 51 Jack Waterworth 2018-01-09 09:28:28 EST
If LVM is on an image served up by glance, you should not see this issue. You'll only see it when LVM is on a cinder-provided volume.
