Bug 1261083
Summary: | [RFE] Using LVM on a cinder volume should not expose the data to the compute host | |||
---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Jack Waterworth <jwaterwo> | |
Component: | openstack-tripleo-heat-templates | Assignee: | Giulio Fidente <gfidente> | |
Status: | CLOSED ERRATA | QA Contact: | Tzach Shefi <tshefi> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 16.1 (Train) | CC: | abishop, acanan, agk, akaris, astillma, broose, cschwede, dmaley, eharney, fpantano, gcharot, gfidente, jbrassow, jpittman, jraju, jvisser, jwaterwo, lmarsh, ltoscano, mabrams, marjones, mburns, nkshirsa, nlevinki, nsoffer, nwolf, pablo.iranzo, pgrist, pmorey, sbaker, sclewis, scohen, sgotliv, slinaber, spower, sputhenp, srevivo, teigland, tkajinam, tshefi, tvignaud, tvvcox, vcojot, vfarias, zkabelac | |
Target Milestone: | z3 | Keywords: | FutureFeature, TechPreview, Triaged | |
Target Release: | 16.1 (Train on RHEL 8.2) | Flags: | tshefi: automate_bug+, ndeevy: needinfo? | |
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | tripleo-ansible-0.5.1-1.20200914163926.el8ost openstack-tripleo-heat-templates-11.3.2-1.20200914170167.el8ost | Doc Type: | Known Issue | |
Doc Text: |
Currently, the LVM filter is not set unless at least one device is listed in the `LVMFilterAllowlist` parameter.
+
Workaround: Set the `LVMFilterAllowlist` parameter to contain at least one device, for example, the root disk. The LVM filter is then set in `/etc/lvm/lvm.conf`.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1883643 1884225 (view as bug list) | Environment: | ||
Last Closed: | 2020-12-15 18:35:44 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1518969, 1883643, 1884225, 1896164 |
Description
Jack Waterworth
2015-09-08 14:27:25 UTC
perhaps this is the answer? https://review.openstack.org/#/c/148747/ allow us to set an lvm.conf file in the cinder directory, which will be pushed to the compute nodes.

(In reply to Jack Waterworth from comment #3) I don't think this will help, since it only affects what Cinder sees while managing LVM, and not the system.

I accidentally reproduced this on my home box:

[root@bulldozer ~]# lvs -o lv_name,vg_name,devices
  LV                                             VG             Devices
  root                                           centos         /dev/sda2(0)
  swap                                           centos         /dev/sda2(12800)
  _snapshot-fda38971-737c-45ef-a8e3-e3efa0f05ca9 cinder-volumes
  cinder-volumes-pool                            cinder-volumes cinder-volumes-pool_tdata(0)
  volume-0e367765-9631-43d3-8eab-6872360acbc8    cinder-volumes
  volume-211434f9-e947-413c-8395-7642a2ea29cb    cinder-volumes
  volume-28aa705b-425b-4f95-bf85-8c8bc6bf5806    cinder-volumes
  volume-34b23d5a-a72b-4ecc-a668-963159476517    cinder-volumes
  volume-4b218153-d3e3-4674-b9e8-5fd3c0840a5f    cinder-volumes
  volume-5a31c6e7-a5e1-423d-b3a4-06ff7c0bd248    cinder-volumes
  volume-62cc6d28-1619-4b69-86cd-66030c25693e    cinder-volumes
  volume-7d644152-08d1-4e82-b7f5-e597f860ca4b    cinder-volumes
  volume-973245e1-d184-432f-a223-90521d96cd65    cinder-volumes
  volume-9b058e04-268d-45f7-af30-a98f74216fa6    cinder-volumes
  volume-be1ff916-f326-4914-a6c8-444713c5e6d7    cinder-volumes
  volume-d26fc2fe-2243-4f25-9f08-9e371de9caf7    cinder-volumes
  volume-d9c8cf52-370c-4ee2-afbc-6bc742a4425b    cinder-volumes
  volume-eb86aad5-7bc0-4036-84a6-3293b8e95832    cinder-volumes
  jack                                           data           /dev/sdc1(0)

[root@bulldozer ~]# ll /dev/disk/by-path/*06ff7c0bd248*
lrwxrwxrwx. 1 root root  9 Oct  1 23:16 /dev/disk/by-path/ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-5a31c6e7-a5e1-423d-b3a4-06ff7c0bd248-lun-0 -> ../../sdc
lrwxrwxrwx. 1 root root 10 Oct  1 22:44 /dev/disk/by-path/ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-5a31c6e7-a5e1-423d-b3a4-06ff7c0bd248-lun-0-part1 -> ../../sdc1

Here we can see that my compute node is logged into the iscsi target which brings in /dev/sdc.
The compute node is able to scan this for LVM metadata, and it shows up in my lvs output. The 'data' volume is actually the data volume for one of my instances.

While that is slightly annoying, the real issue is that if my compute node already had a volume group named 'data', LVM would conflict with the local storage.

I found that there is a new lvm feature being added to 7.2 that may be able to help with this issue: http://man7.org/linux/man-pages/man7/lvmsystemid.7.html

This could allow openstack to create a specific systemid to be used by the compute hosts, and the compute hosts should only access that systemid. Any other volumes would be ignored. However, reading through the page seems to indicate that the guests would need to be systemid-aware, and the guests would have to set systemids on the devices.

I'm also experimenting with using a filter to ignore devices presented via iscsi, something like this:

global_filter = [ "r|/dev/disk/by-path/*.openstack.*|" ]

although this doesn't seem to work...
still playing around with it

(In reply to Jack Waterworth from comment #5)
> I accidentally reproduced this on my home box:
> [full lvs and ll output quoted in comment #5 above]
> Here we can see that my compute node is logged into the iscsi target which
> brings in /dev/sdc. the compute node is able to scan this for LVM metadata,
> and it shows up in my lvs output. the 'data' volume is actually the data
> volume for one of my instances.
>
> While that is slightly annoying, the real issue is that if my compute node
> already had a volume group named 'data', LVM would conflict with the local
> storage.

We are presuming that controllers and computes are deployed on dedicated hosts. They should not have any other VGs and LVs except those created and used by the OpenStack deployment. If the operator decides to create his own local VGs or LVs, then it is his responsibility to prevent name collisions. TBH, I don't see an issue here, do you? Imagine that a cloud operator decides to deploy a 3rd-party component on the controller host listening on the same port as the Cinder API; would you ask me in that case to move the Cinder API to another port?!

If an OpenStack admin provides an instance with a cinder volume to a customer, it is possible that that customer could create a volume group name that collides with a volume group created by the OpenStack admin. Additionally, if the volume group can be detected at the host level (without collision), it is possible that operations could be submitted to the metadata on those devices, even by LVM itself. The most common issue I see is that LVM will only see 1 out of 2 volumes, and will mark one of the volumes as missing within the metadata. This would bubble up to the instance and cause the device to be marked missing and possibly offline at the guest. This is something that we need to fix, I'm just not sure about the cleanest way to do it yet. I suspect we need to modify the system LVM config to include an additional file that Cinder manipulates, which is set to ignore devices belonging to Cinder volumes.

found a workaround for this. In my example, sdg1 is the problematic disk. It is a cinder volume being presented via iscsi. The VG "data" is actually a VG that belongs to one of my instances, and should not be seen on the compute.
I am using the following filter:

global_filter = [ "r|.*openstack.*|" ]

This will cause lvm to completely ignore any device with 'openstack' in the name, including any devices that have symlinks with openstack in the name.

[root@bulldozer by-path(keystone_admin)]# pvs
  PV         VG             Fmt  Attr PSize   PFree
  /dev/sda2  centos         lvm2 a--   64.19g 12.19g
  /dev/sdb1  cinder-volumes lvm2 a--  465.76g 23.07g
  /dev/sdg1  data           lvm2 a--  100.00g      0
[root@bulldozer by-path(keystone_admin)]# pvscan --cache --config 'devices{global_filter = [ "r|.*openstack.*|" ]}'
[root@bulldozer by-path(keystone_admin)]# pvs --config 'devices{global_filter = [ "r|.*openstack.*|" ]}'
  PV         VG             Fmt  Attr PSize   PFree
  /dev/sda2  centos         lvm2 a--   64.19g 12.19g
  /dev/sdb1  cinder-volumes lvm2 a--  465.76g 23.07g
[root@bulldozer by-path(keystone_admin)]#

this works due to the /dev/disk/by-path directory:

[root@bulldozer by-path(keystone_admin)]# ll /dev/disk/by-path/
total 0
lrwxrwxrwx. 1 root root  9 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-1272b721-51cc-4a3f-818e-dde25d93ed8d-lun-0 -> ../../sdi
lrwxrwxrwx. 1 root root 10 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-1272b721-51cc-4a3f-818e-dde25d93ed8d-lun-0-part1 -> ../../sdi1
lrwxrwxrwx. 1 root root  9 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-47771a4f-9046-43f8-9f80-7e05185b0001-lun-0 -> ../../sdj
lrwxrwxrwx. 1 root root 10 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-47771a4f-9046-43f8-9f80-7e05185b0001-lun-0-part1 -> ../../sdj1
lrwxrwxrwx. 1 root root  9 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-4b218153-d3e3-4674-b9e8-5fd3c0840a5f-lun-0 -> ../../sdh
lrwxrwxrwx. 1 root root 10 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-4b218153-d3e3-4674-b9e8-5fd3c0840a5f-lun-0-part1 -> ../../sdh1
lrwxrwxrwx. 1 root root  9 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-5a31c6e7-a5e1-423d-b3a4-06ff7c0bd248-lun-0 -> ../../sdg
lrwxrwxrwx. 1 root root 10 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-5a31c6e7-a5e1-423d-b3a4-06ff7c0bd248-lun-0-part1 -> ../../sdg1
lrwxrwxrwx. 1 root root  9 Feb 21 01:37 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-5a450fdb-b8d1-4ac0-85ee-9a07ad6d48f1-lun-0 -> ../../sdk
lrwxrwxrwx. 1 root root 10 Feb 21 01:37 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-5a450fdb-b8d1-4ac0-85ee-9a07ad6d48f1-lun-0-part1 -> ../../sdk1
lrwxrwxrwx. 1 root root  9 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-7d644152-08d1-4e82-b7f5-e597f860ca4b-lun-0 -> ../../sdf
lrwxrwxrwx. 1 root root 10 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-7d644152-08d1-4e82-b7f5-e597f860ca4b-lun-0-part1 -> ../../sdf1
lrwxrwxrwx. 1 root root  9 Feb 21 01:17 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-c9aeea86-4751-429c-a85f-3a959883f0ef-lun-0 -> ../../sdd
lrwxrwxrwx. 1 root root 10 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-c9aeea86-4751-429c-a85f-3a959883f0ef-lun-0-part1 -> ../../sdd1
lrwxrwxrwx. 1 root root  9 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-d9c8cf52-370c-4ee2-afbc-6bc742a4425b-lun-0 -> ../../sde
lrwxrwxrwx. 1 root root 10 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-d9c8cf52-370c-4ee2-afbc-6bc742a4425b-lun-0-part1 -> ../../sde1
lrwxrwxrwx. 1 root root  9 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-eb86aad5-7bc0-4036-84a6-3293b8e95832-lun-0 -> ../../sdc
lrwxrwxrwx. 1 root root 10 Feb 21 00:44 ip-192.168.1.200:3260-iscsi-iqn.2010-10.org.openstack:volume-eb86aad5-7bc0-4036-84a6-3293b8e95832-lun-0-part1 -> ../../sdc1

If we use the --all flag in pvs, we can see that these devices are no longer even considered for utilization by LVM:

[root@bulldozer by-path(keystone_admin)]# pvs --all
  PV               VG             Fmt  Attr PSize   PFree
  /dev/centos/root                      ---       0      0
  /dev/centos/swap                      ---       0      0
  /dev/data/jack                        ---       0      0
  /dev/sda1                             ---       0      0
  /dev/sda2        centos         lvm2  a--   64.19g 12.19g
  /dev/sda3                             ---       0      0
  /dev/sdb1        cinder-volumes lvm2  a--  465.76g 23.07g
  /dev/sdc1                             ---       0      0
  /dev/sdd1                             ---       0      0
  /dev/sde1                             ---       0      0
  /dev/sdf1                             ---       0      0
  /dev/sdg1        data           lvm2  a--  100.00g      0
  /dev/sdh1                             ---       0      0
  /dev/sdi1                             ---       0      0
  /dev/sdj1                             ---       0      0
  /dev/sdk1                             ---       0      0
[root@bulldozer by-path(keystone_admin)]# pvs --all --config 'devices{global_filter = [ "r|.*openstack.*|" ]}'
  PV               VG             Fmt  Attr PSize   PFree
  /dev/centos/root                      ---       0      0
  /dev/centos/swap                      ---       0      0
  /dev/data/jack                        ---       0      0
  /dev/sda1                             ---       0      0
  /dev/sda2        centos         lvm2  a--   64.19g 12.19g
  /dev/sda3                             ---       0      0
  /dev/sdb1        cinder-volumes lvm2  a--  465.76g 23.07g

IMHO we need to break this up into three separate bugs for docs, Packstack and Director. For Director we can blacklist everything, as LVM isn't used by the local hosts. For Packstack we can dynamically add local block devices to the whitelist while configuring the environment. For docs we can just highlight the need to correctly configure LVM filters on any host running cinder-volume or nova-compute. Eric, Jack, does that sound like a sane approach?

When using Director, we should allow a configurable blacklist, even if the default would be blacklisting everything. This is required if the overcloud nodes are using LVM for some reason (e.g. /var/lib/nova/instances on a SAN).

FYI, in director land the instack host already has LVM disabled because of the same issue. https://review.openstack.org/#/c/248174/ Actually I guess it's not LVM entirely... sorry if that wasn't helpful.
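The `pvs --config` checks above rely on how LVM evaluates `global_filter`: entries are regular expressions tested in order, the first matching accept (`a`) or reject (`r`) entry decides, and a device matched by no entry is accepted. A simplified offline sketch of that first-match-wins logic (the function name and device lists are illustrative, not part of LVM, and real LVM also checks every alias of a device):

```python
import re

def lvm_filter_accepts(filter_entries, devices):
    """Simplified model of LVM global_filter evaluation.

    filter_entries: list like ['r|.*openstack.*|'] - 'a'/'r' prefix,
    then a regex between delimiter characters (here assumed to be '|').
    devices: list of devices, each given as a list of path aliases.
    Returns one True (accepted) / False (rejected) verdict per device.
    """
    verdicts = []
    for aliases in devices:
        verdict = True  # no matching entry means the device is accepted
        for entry in filter_entries:
            action, pattern = entry[0], entry[2:-1]
            if any(re.search(pattern, alias) for alias in aliases):
                verdict = (action == "a")
                break  # first matching entry wins
        verdicts.append(verdict)
    return verdicts

# The reject pattern from the workaround above drops any device whose
# by-path symlink contains 'openstack', while local disks pass through.
flt = ['r|.*openstack.*|']
devs = [
    ["/dev/sda2"],  # local root disk
    ["/dev/sdc1",
     "/dev/disk/by-path/ip-192.168.1.200:3260-iscsi-iqn.2010-10."
     "org.openstack:volume-x-lun-0-part1"],  # cinder-backed iSCSI LUN
]
print(lvm_filter_accepts(flt, devs))  # → [True, False]
```

This also shows why an allowlist-first layout needs a trailing reject-all entry: without `"r|.*|"` at the end, unmatched devices fall through to the accepted default.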
Moving this issue to 11, as it is unlikely to receive work in the RHOS 10 timeframe.

I spoke with the lvm core development engineer and he stated that systemid is extremely expensive and he does not recommend its usage at all. He states that a filter is the best way to resolve this issue. I propose that we enable a default reject filter for openstack director deployments. Our deployments do not utilize LVM on the hosts. Any customer that wants to start using LVM on the hosts will need to adjust the filter.

global_filter = [ "r|.*|" ]

This will prevent this issue from being seen entirely.

Jack, an alternative is adding 'volume_list = []' in lvm.conf. That will prevent activation of all VGs, but the volumes will still be visible on the compute node (if that's desirable). Using a global_filter is what was done a while back on the undercloud - https://review.openstack.org/#/c/343100/. The original fix disabled LVM completely and it turned out that some customers relied on it, so disabling LVM entirely might cause surprises in the overcloud as well. I also recall trying to use systemid, but think that the guest requirements made this not a viable option. Couldn't resist commenting, hope that's helpful.

*** Bug 1499044 has been marked as a duplicate of this bug. ***

Thinking about this further, my earlier comment 26 may cause issues, for example if there are 10 guests that all have a VG named vg1. They would not activate on the host, but they would cause a lot of chatter and warnings from LVM. Also, the global_filter is something that would need to be tweaked if LVM usage changes on the system over time.

This bug details an issue that has been around for quite some time, in RHEV and now in openstack. Hopefully, as we move forward with the business, RHEL will be used in more hypervisors, not less. So maybe being part of a hypervisor setup should be the defining aspect of some settable parameter.
For example, have a setting in the hypervisor lvm.conf (which would be set by director or rhev at install): 'hypervisor_mode = 1'. By setting this, all LVM volumes not created on the hypervisor would be completely ignored. The details of how this is done would be under the hood and decided by the lvm team; all that would matter is that if the volume was not created on that system it would be ignored. The only issue I can think of is volumes imported from other systems, but as mentioned, the lvm team could probably solve that easily.

(In reply to Andreas Karis from comment #35)
> We could indeed set up a very restrictive default filter with Director from
> the get go. We do not support the LVM cinder backend, so the tripleo team
> could take this and just shut it down. Users will need to explicitly enable
> the filters, and thus we will reduce the number of support tickets here. We
> will initially get more support tickets for the disabled cinder LVM backend,
> but we can easily point customers to a well written part of our
> documentation. I *do* like this idea, and this is btw what the NFV team did
> for OVS DPDK and unsupported PMD drivers - simply disable them.

From what I've been told in the past, making a *block all LVM* change would be pushed upstream. This would be fine for RHOSP, which doesn't use LVM by default, but it could potentially break other distros that DO use LVM. I'm not sure how much truth there is to this or if there is any way to work around those kinds of issues.

(In reply to Andreas Karis from comment #35)
> Another solution, if feasible, could be that we implement logic in the
> cinder LVM volume driver which verifies that lvm.conf is set up with the
> correct filters, and otherwise the cinder LVM driver will throw an exception
> and fail with a very clear error message.

This issue occurs outside of the LVMIscsi cinder driver. While using this driver DOES make things more complicated, it is unsupported as you stated.
There is no point wasting brain cells on solving that puzzle, in my opinion.

(In reply to Andreas Karis from comment #35)
> Another point here is that we currently do not have sufficient documentation
> for this issue which non-storage people would understand. I created a new
> KCS specifically for the cinder case but it lacks examples and output
> because I don't understand in depth what I'm writing about. And the KCS
> which was linked here earlier is simply not easy to understand for
> non-storage people. We need a storage idiot proof ( = for people like me )
> knowledge base article or documentation to reduce the time that we spend on
> these cases. Or at least to give non-storage sbr-stack members a tool so
> that we can help customers fix this. Ideally, I'd also like to see a few
> concise paragraphs in that KCS of what this issue exactly is and where it
> comes from (I tried to put as much as possible into the KCS, but again, I
> should not be the one doing it, as I don't have enough knowledge).

I can start modifying the article to try to make things a little bit clearer. While default installs can easily work around this issue, anything customized (or using the LVM driver in cinder) will become a little more complicated. I'll work on that update now.

(In reply to Zdenek Kabelac from comment #36)
> As my 'thinking' result - I could imagine something along this path:
>
> lvcreate --guest y|n ...
> lvchange --guest y|n vg/lv
>
> such LV would have extra attribute - so when set 'y' - it would be created
> with UUID suffix '-guest'.

The problem here is that the problematic LVs being detected on the host are NOT created by cinder. The users/admins of the VM guests are creating LVM on top of their local (virtualized) devices. This configuration is then seen at the hypervisor level and picked up by LVM. When the OpenStack user attempts to remove the storage from the VM, the disconnection fails because LVM on the hypervisor is holding the storage open.
This would require educating the users of the guests about the issue. The only ways I see to fix this are:

1) an entire rejection of all LVM on the host
2) some way for LVM to determine that the VG was not created locally, which will likely require expensive scanning operations.

This issue is currently assigned to cinder, but I think it may make more sense for it to be a tripleo or nova issue.

(In reply to Jack Waterworth from comment #37)
> (In reply to Zdenek Kabelac from comment #36)
> > lvcreate --guest y|n ...
> > lvchange --guest y|n vg/lv
> >
> > such LV would have extra attribute - so when set 'y' - it would be created
> > with UUID suffix '-guest'.
>
> The problem here is that problematic LVs that are being detected on the host
> are NOT created by cinder. The users/admins of the VM guests are creating
> LVM on top of their local (virtualized) devices. This configuration is then
> seen at the hypervisor level and picked up by LVM. When the OpenStack user
> attempts to remove the storage from the VM, the disconnection fails because
> LVM on the hypervisor is holding the storage open. This would require
> educating the users of the guests about the issue.

Not sure if I understand this correctly - but this case is like:

1. User has 'some' attached storage on HOST.
2. This attached 'storage' (i.e. /dev/sdX) is directly used for VM.
3. Guest on this VM creates PV/VG on such device.
4. Such VG is then picked up on HOST as host has access to /dev/sdX

If this is the case - you can easily see it's an issue completely *OUTSIDE* of lvm2. lvm2 cannot be 'deducing' which device is meant to be used where. Unless there would be some sort of authority claiming ownership of a device - lvm2 could then 'query' this authoritative tool and exclude access to a device. Doing any 'runtime' analysis - 'who holds the device' - would likely ONLY work when the 'guest' is running. When the 'guest' is offline - lvm2 has no idea /dev/sdX should not be accessed.
Thus this universal trouble is commonly solved by placing a 'device header' - just like lvm2 places a PV header to claim ownership of /dev/sdY. Cinder may place i.e. an XXX KiB header (commonly 1MiB these days) - and 'shift' the used device by header-size for the guest VM. Adding a 'header' makes such a device 'unusable' on the HOST in all cases, and if it is wanted to be accessed on the host - the user has to use i.e. a loop device with an offset to get access to this device locally again.

Since these devices are presented straight to the VM after being presented to the host, the guest would ALSO see a device header shift and LVM would not pick it up at that level either. Nova would need to somehow intercept the device at the host level and then present it to the guest without the shift in place. Keep in mind that, at the moment, nova just tells libvirt to add the storage device to the VM, and it's libvirt that does all the work.

If there is no easy way to configure an offset for a passed device - then it's possibly an RFE for qemu/kvm?

As Zdenek has alluded to, this is a missing feature for devices in linux in general. There's no way to designate some devices as belonging to the system, and other devices as belonging to some application (so the system shouldn't touch them). You might imagine that devices referenced from /etc/fstab should be used by the system, and none others. But unfortunately, the system assumes every device belongs to it. There is some development going on to address this because it's a recurring problem.

In the meantime, lvm provides some ways to deal with the problem. Device filters are one, but they are hard to automate. The RHEV group have worked on some scripts to attempt to automate host filter creation based on what's being used on the host, but I'm not sure it worked well enough. Another solution is lvm system ID, which takes advantage of the fact that the host and guest are two different systems.
This would work very well in theory, and it's not "expensive". The current issue with using it is that a VG without a system ID is accessible to everyone (both host and guest). If lvm in the guest could be forced to use a system ID on its VGs, then the problem is solved, but the host can't enforce this. One potential solution is a new lvm option we could add to ignore VGs without a system ID. This would only need to be enabled for lvm on the host.

> filters are one, but they are hard to automate. The RHEV group have worked
> on some scripts to attempt to automate host filter creation based on what's
> being used on the host, but I'm not sure it worked well enough.

They have in fact written a tool for this: https://ovirt.org/blog/2017/12/lvm-configuration-the-easy-way/

> One potential solution
> is a new lvm option we could add to ignore VGs without a system ID. This
> would only need to be enabled for lvm on the host.

This was a simple patch and seems to work as expected, but I'll not add this to lvm until I know it'll actually be used.

If LVM is on an image served up by glance, they should not see this issue. You'll only see it when LVM is on a cinder-provided volume.

This recent commit should solve the problem if you set the new scan_lvs=0 on the host. It could be backported to rhel7 if necessary. https://sourceware.org/git/?p=lvm2.git;a=commit;h=bfcecbbce182570d040f3ec446dccb84ca07efcd

Hi, I do not think that disabling LVM altogether is the way to go. What if (for compliance reasons) one might want to still run LVM on the Compute's local disks? Couldn't we just blacklist all iscsi disks in lvm.conf to avoid LVM collisions caused by iSCSI disks exported by appliances thanks to cinder? See the discussion at: https://lists.gt.net/linuxha/users/74830

It was suggested to:

========================= QUOTE ============================
And you can construct some regexp which allows only your permanent disks to pass. F.e.
"a|/dev/disk/by-id/scsi-.*|" or "a|/dev/disk/by-path/pci-.*-scsi-.*|" and then add a reject-all rule to the end of that list - "r/.*/".
========================= QUOTE ============================

Not sure if that would be applicable here.

Also, I'd like to share some feedback. I'm using (OSP13) with LVM on the boot disk of my overcloud nodes, and the following global_filter seems to work fine for iscsi disks:

[root@krynn-ctrl-2 disk]# grep global_filter /etc/lvm/lvm.conf | grep -v \#
        global_filter = [ "r|^/dev/disk/by-path/ip.*|" ]

This controller has 3 local disks and 1 iscsi target presented:

[root@krynn-ctrl-2 disk]# lsscsi
[0:0:0:0]    disk    ATA      QEMU HARDDISK   3     /dev/sda
[1:0:0:0]    disk    ATA      QEMU HARDDISK   3     /dev/sdb
[2:0:0:0]    disk    ATA      QEMU HARDDISK   3     /dev/sdc
[8:0:0:1]    disk    SYNOLOGY iSCSI Storage   4.0   /dev/sdd
[root@krynn-ctrl-2 disk]# iscsiadm -m node
10.0.128.187:3260,1 iqn.2000-01.com.synology:dsm6.Target-1.3197dca51f

"sdd" cannot be used with that filter but /dev/sdc can:

[root@krynn-ctrl-2 disk]# lvmdiskscan
  /dev/rootdg/lv_root [  16.00 GiB]
  /dev/rootdg/lv_var  [  32.00 GiB]
  /dev/sda2           [ <128.00 GiB] LVM physical volume
  /dev/rootdg/lv_home [   2.00 GiB]
  /dev/rootdg/lv_tmp  [   2.00 GiB]
  /dev/sdc            [   8.00 TiB] LVM physical volume
  4 disks
  0 partitions
  1 LVM physical volume whole disk
  1 LVM physical volume
[root@krynn-ctrl-2 disk]# pvcreate /dev/sdc
  Physical volume "/dev/sdc" successfully created.
[root@krynn-ctrl-2 disk]# pvcreate /dev/sdd
  Device /dev/sdd excluded by a filter.

So IMHO the above regexp to disable all iscsi disks would be usable while still allowing LVM on the local disks.
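Combining the quoted mailing-list advice (accept known local disks, then reject everything else) with the iSCSI reject pattern above, a persistent filter in /etc/lvm/lvm.conf could look like the following sketch. This is illustrative only: the accept pattern must be adjusted to match the actual local disks of the host, and patterns are evaluated in order, so the reject-all entry must come last.

```
devices {
    # Accept local PCI-attached SCSI disks first (first match wins),
    # then reject everything else, including iSCSI-backed cinder volumes.
    global_filter = [ "a|^/dev/disk/by-path/pci-.*-scsi-.*|", "r|.*|" ]
}
```

With an allowlist-plus-reject-all layout like this, newly attached cinder LUNs are ignored by default instead of having to be enumerated in a blacklist.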
Please note that if 'augeas' were included in overcloud-full.qcow2, it would be sufficient to do this:

# augtool set /files/etc/lvm/lvm.conf/devices/dict/global_filter/list/1/str "r|^/dev/disk/by-path/ip.*|"

Result:

# augtool print /files/etc/lvm/lvm.conf/devices/dict/global_filter
/files/etc/lvm/lvm.conf/devices/dict/global_filter
/files/etc/lvm/lvm.conf/devices/dict/global_filter/list
/files/etc/lvm/lvm.conf/devices/dict/global_filter/list/1
/files/etc/lvm/lvm.conf/devices/dict/global_filter/list/1/str = "r|^/dev/disk/by-path/ip.*|"

(In reply to Jack Waterworth from comment #0)
> Description of problem:
> Using LVM on a cinder volume on the instance causes the compute node to pick
> up the LVM at the host level
>
> How reproducible:
> Every time
>
> Steps to Reproduce:
> 1. Create a new cinder volume and present it to an instance
> 2. Use LVM against the raw device (pvcreate/vgcreate/lvcreate)
> 3. Run 'lvs -o +devices' on the compute
>
> Actual results:
> LVM from the guest is seen on the host
>
> Expected results:
> host should not be able to see LVM from the guest
>
> Additional info:
> This can cause problems such as conflicting VG names on the compute. It can
> also cause the LVM on the compute to adjust metadata that the instance is
> not aware of, leading to things like missing volumes.
>
> Current workaround is to set a filter on the compute node.

Forgive me, I've read a good portion of this bug, but not all.

By 'compute node', you mean 'host', right?

You've talked about filters. Good.

Have you tried (perhaps this is overly clever) creating LVM volumes on the cinder volumes ON THE HOST? Then, you export those LVs (built on the cinder volumes ON THE HOST) to be used in the guests. Then you set lvm.conf:devices/scan_lvs = 0 on the host. At this point, any PVs/VGs/LVs created by the guests will be ignored. Problem solved?

FWIW, we will be switching the default value of lvm.conf:devices/scan_lvs from 1 to 0 in RHEL7.7.
Thus, the only thing you would have to do from here forward is put LVM on the cinder volumes ON THE HOST and then export the LVs to the guests, and all should be fine.

(In reply to Jonathan Earl Brassow from comment #69)
> Forgive me, I've read a good portion of this bug, but not all.
>
> By 'compute node', you mean 'host' right?

Yes, by 'compute node' I mean the Hypervisor: the host that runs the VMs.

> You've talked about filters. Good.
>
> Have you tried (perhaps this is overly clever) creating LVM volumes on the
> cinder volumes ON THE HOST? Then, you export those LVs (built on the cinder
> volumes ON THE HOST) to be used in the guests. Then you set
> lvm.conf:devices/scan_lvs = 0 on the host. At this point, any PVs/VGs/LVs
> created by the guests will be ignored. Problem solved?

I see two issues with that. First, it's not how OpenStack operates: a VM will be built on a compute node (a host) by consuming devices made available on the host:
- a virtual block device (either iscsi, FC or ceph)
- Memory, cpu & image.

It is not uncommon to have 3rd-party vendors provide images and stacks that can result in two VMs having the same LVM configuration (same VG id, same LVs, etc..) because they were launched using the same base image. Consequently, exposing all those to the host probably wouldn't work if two different VMs had identical VGs.

> FWIW, we will be switching the default value of lvm.conf:devices/scan_lvs =
> 1 to 0 in RHEL7.7. Thus, the only thing you would have to do from here
> forward is put LVM on the cinder volumes ON THE HOST and then export the LVs
> to the guests and all should be fine.

Thanks for chiming in, I didn't know that change was coming. In any case, I think scan_lvs=0 might not be sufficient to avoid VG collisions.

(In reply to Vincent S. Cojot from comment #70)
> (In reply to Jonathan Earl Brassow from comment #69)
> > Forgive me, I've read a good portion of this bug, but not all.
> >
> > By 'compute node', you mean 'host' right?
> Yes, by 'compute node' I mean the Hypervisor: the host that runs the VMs.
>
> > You've talked about filters. Good.
> >
> > Have you tried (perhaps this is overly clever) creating LVM volumes on the
> > cinder volumes ON THE HOST? Then, you export those LVs (built on the cinder
> > volumes ON THE HOST) to be used in the guests. Then you set
> > lvm.conf:devices/scan_lvs = 0 on the host. At this point, any PVs/VGs/LVs
> > created by the guests will be ignored. Problem solved?
>
> I see two issues with that. First it's not how OpenStack operates: a VM
> will be built on a compute node (a host) by consuming devices made
> available on the host:
> - a virtual block device (either iscsi, FC or ceph)
> - Memory, cpu & image.
> It is not uncommon to have 3rd party vendors provide images and stacks that
> can result in having two VMs having the same LVM configuration (same VG id,
> same LV's, etc..) because they were launched using the same base image.
> Consequently, exposing all those to the host probably wouldn't work if two
> different VMs had identical VGs.

Sorry, I'm not sure if I don't understand or if I'm not making myself clear... The issue of whether or not there are identical VG IDs is irrelevant if the host does not see the VGs. Solve the problem of visibility on the hypervisor, and you've solved your problem of conflicting VG IDs.

Thus, the idea of filtering is a good one - even if it is a pain and hard to automate at times. What I'm asking for is that before you simply pass on storage that the host can see to the guest, you encapsulate it with LVM. Export only LVs as the storage to the guests, not the storage directly. This will solve the visibility problem (i.e. that the host can see the LVM configurations on the guests).

Hi Jonathan,
You understood the issue well, but please note that we don't necessarily have control over the content of the image being used for instances (it may have LVM or not, multiple VGs or a single VG).
In other words: the images might be provided by a 3rd party (this is very common in NFV contexts) and we might not be able to control what's getting passed to the instances from the host. Thus, the only way for us to avoid running into LVM issues on the compute machine/host, notwithstanding what's in the images used for building instances, is to: # augtool set /files/etc/lvm/lvm.conf/devices/dict/global_filter/list/1/str "r|^/dev/disk/by-path/ip.*iscsi.*|" Also, I believe that consuming VGs on the compute hosts/machines and passing LVs to the instances would require modification of OpenStack Nova. I have no idea if that would be on the implementation roadmap for the compute DFG. Ultimately, device filters are the best option, but as you've found they can be difficult to use. Given how filters currently work, whitelists usually provide the best results (e.g. accept the local disks necessary for the host and reject everything else). This is because of the growing proliferation of links to block devices; using a blacklist requires you to hunt down and exclude every one of the links, which is difficult and fragile. To make whitelists more practical, it would be nice if there were a standard mechanism to identify the "local disks used by the host". This would ideally be a general capability of the system (I'm sort of repeating what I wrote in comment 42). It's outdated for the OS to assume that every block device that is connected to it is meant to be used directly by it. (In reply to Vincent S. Cojot from comment #72) > # augtool set /files/etc/lvm/lvm.conf/devices/dict/global_filter/list/1/str > "r|^/dev/disk/by-path/ip.*iscsi.*|" Please note this also affects SAN in addition to iscsi. (In reply to Tomas Von Veschler from comment #74) > (In reply to Vincent S. Cojot from comment #72) > > # augtool set /files/etc/lvm/lvm.conf/devices/dict/global_filter/list/1/str > > "r|^/dev/disk/by-path/ip.*iscsi.*|" > > Please note this also affects SAN in addition to iscsi. 
Hi Tomas, Yes, indeed. A cinder driver utilizing FC-based storage would complicate matters further, as it would be most difficult to distinguish between local and SAN-based disks (except by using VID/PID, of course). In that case, perhaps a 'whitelist' of disks to accept would be preferable. In our specific case, however, putting a generic iSCSI reject pattern in place prevents us from having to hardcode specific device paths in a non-generic whitelist. While we are able to work around this issue with filters, customers/users are not aware that the issue even exists until they run into it. Filters are great if you know exactly what you need to whitelist/blacklist. How can we build something like this into a deployment, so that the issue never happens in the first place? By default, our overcloud images do not use LVM, and I suggest that we configure these images to ignore LVM completely on the hosts. If an admin/customer needs to re-enable LVM on their systems, we can have the process documented, including how to correctly configure a filter. (In reply to Jack Waterworth from comment #76) > While we are able to work around this issue with filters, customers/users > are not aware that the issue even exists until they run into it. Filters > are great if you know exactly what you need to white list/blacklist. How can > we build something like this into a deployment, so that the issue never > happens in the first place? By default, our overcloud images do not use > LVM, and I suggest that we configure these images to ignore LVM completely > on the hosts. If an admin/customer needs to re-enable LVM on their systems, > we can have the process documented, including how to correctly configure a > filter. I agree with you Jack. Also, at the same time, providing support for whole-disk-images functionality involves LVM in almost all cases. 
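The whitelist approach described above (accept only the host's own local disks, reject everything else) can be sketched as follows. This is only an illustration: /dev/vda and /dev/vdb are placeholder device names, and the generated line is written to a scratch file rather than to /etc/lvm/lvm.conf.

```shell
# Sketch: build a whitelist-style global_filter. Accept only the listed
# local disks; the trailing "r|.*|" rejects every other device and link.
ALLOWED="/dev/vda /dev/vdb"   # illustrative: the host's own local disks

filter="global_filter = [ "
for dev in $ALLOWED; do
  filter="${filter}\"a|^${dev}\$|\", "
done
filter="${filter}\"r|.*|\" ]"

# In a real deployment this line belongs in the devices {} section of
# /etc/lvm/lvm.conf; here it only goes to a scratch file for inspection.
echo "$filter" > lvm-filter.snippet
cat lvm-filter.snippet
```

Anchoring each accept pattern with `^...$` is what makes the whitelist robust against the "proliferation of links" problem: symlink aliases of the same disk simply fall through to the final reject rule.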
Perhaps we could provide a pre-baked lvm.conf in our overcloud images that makes it so that the filter kicks in if/when the end-user ends up using LVM. It wouldn't break anything at all if LVM isn't used. > Another solution is lvm system ID, which takes advantage of the fact that
> the host and guest are two different systems. This would work very well in
> theory, and it's not "expensive". The current issue with using it is that a
> VG without a system ID is accessible to everyone (both host and guest). If
> lvm in the guest could be forced to use a system ID on its VGs, then the
> problem is solved, but the host can't enforce this. One potential solution
> is a new lvm option we could add to ignore VGs without a system ID. This
> would only need to be enabled for lvm on the host.
I believe this would work. What are the chances of getting this new lvm option added so that volume groups without system IDs can be ignored?
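The system ID mechanism discussed above can be sketched as a simple decision rule. Note this is a simulation of the proposed behavior, not real lvm.conf syntax: the host ID value and VG IDs are illustrative, and the "ignore VGs with no system ID" behavior is exactly the new option being requested, so it does not exist in lvm yet. (The real host-side setup would stamp local VGs with something like `vgchange --systemid "$(uname -n)" host_vg`.)

```shell
# Simulated host-side visibility rule under the proposed option:
# use a VG only if its system ID matches the host's own ID; ignore
# VGs with a foreign ID AND (new behavior) VGs with no ID at all.
HOST_ID="compute-0.localdomain"   # illustrative host system ID

vg_usable_by_host() {
  vg_systemid="$1"
  if [ "$vg_systemid" = "$HOST_ID" ]; then
    echo yes    # the host's own VG
  else
    echo no     # foreign system ID, or no system ID: ignored
  fi
}

vg_usable_by_host "compute-0.localdomain"  # host's own VG -> yes
vg_usable_by_host "guest-vm-17"            # guest VG, foreign ID -> no
vg_usable_by_host ""                       # guest VG, no system ID -> no
```

The third case is the crux of the thread: today a VG with no system ID is accessible to everyone, so a guest that never sets one still leaks into the host's view.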
(In reply to Jack Waterworth from comment #78) > > Another solution is lvm system ID, which takes advantage of the fact that > > the host and guest are two different systems. This would work very well in > > theory, and it's not "expensive". The current issue with using it is that a > > VG without a system ID is accessible to everyone (both host and guest). If > > lvm in the guest could be forced to use a system ID on its VGs, then the > > problem is solved, but the host can't enforce this. One potential solution > > is a new lvm option we could add to ignore VGs without a system ID. This > > would only need to be enabled for lvm on the host. > > I believe this would work. What are the chances of getting this new lvm > option added so that volume groups without systemids can be ignored? I'll see if I still have that patch somewhere; I don't remember the details offhand. One issue with this solution is that the host would have to use a system ID and have its system ID set on its own VGs. That is not the default setup (the default has no system IDs enabled). Making this change to a host would involve identifying all of the host's own local VGs and setting its own system ID on them (with vgchange). Then the new proposed setting could be used to ignore any other visible VGs that have no system ID defined (in addition to ignoring any other VGs with a foreign system ID). By default, we don't use LVM on the overcloud nodes at all. If we can set the system ID at deployment time, any custom LVM will have the system ID, correct? (In reply to Jack Waterworth from comment #80) > By default, we don't use LVM on the overcloud nodes at all. If we can set the > system ID at deployment time, any custom LVM will have the system ID, correct? I am not sure. Using glance whole-disk-images to enable LVM is typically done prior to deployment. 
(In reply to David Teigland from comment #79) > (In reply to Jack Waterworth from comment #78) > > > Another solution is lvm system ID, which takes advantage of the fact that > > > the host and guest are two different systems. This would work very well in > > > theory, and it's not "expensive". The current issue with using it is that a > > > VG without a system ID is accessible to everyone (both host and guest). If > > > lvm in the guest could be forced to use a system ID on its VGs, then the > > > problem is solved, but the host can't enforce this. One potential solution > > > is a new lvm option we could add to ignore VGs without a system ID. This > > > would only need to be enabled for lvm on the host. > > > > I believe this would work. What are the chances of getting this new lvm > > option added so that volume groups without systemids can be ignored? Wrote the patch again: https://sourceware.org/git/?p=lvm2.git;a=commit;h=af828fbc4913b081fdfd73b02e5c6b1ca9fbbec3 Nir, didn't the RHV team face the same issue, and solve it somehow with some black/white list, dynamically creating the lvm configuration? https://gerrit.ovirt.org/#/q/project:vdsm+branch:master+topic:lvm-filter ? (In reply to Yaniv Kaul from comment #83) Indeed, David mentioned it in comment 43. The basic idea is that on the host you have a very strict whitelist allowing only the devices used by the host. If you have an application using LVM on shared storage, the application uses the LVM --config command-line option to override the host filter. *** Bug 1719568 has been marked as a duplicate of this bug. *** We want to avoid adding this feature until we're certain it's effective and will be used. The biggest hurdle will likely be enabling system_id on all the hosts. That can create its own new challenges, like deciding what to use as the system ID (e.g. uname?), how to handle situations where you might want to change it (e.g. renaming a host). 
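As an aside, the RHV-style pattern described above (strict host whitelist, application overrides it per-command via --config) can be sketched as follows. The device path is illustrative, and the lvm commands are shown only as comments since they require root and real devices.

```shell
# Build a per-command LVM config override, as an application managing
# LVM on shared storage might do. The host's lvm.conf keeps a strict
# whitelist; the application widens the view only for its own commands.
DEV="/dev/mapper/example-shared-lun"   # illustrative shared device

CFG="devices { filter = [ \"a|^${DEV}\$|\", \"r|.*|\" ] }"

# The application would then run, for example:
#   lvs --config "$CFG"
#   vgchange --config "$CFG" -ay shared_vg
echo "$CFG"
```

This keeps the host blind to guest/shared VGs in day-to-day operation while still letting a managing application activate them deliberately.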
So, it's not simply a matter of applying the patch; you'll need to experiment with using it first to see how it works and discover the other changes you'll need to make to use it. This issue was a hot topic at a recent workshop with representatives from several RH groups. Engineering has a fresh idea on how to address the problem, and we hope to finally make progress on a resolution. I have to pass this off to Eric, as he's the one with the most knowledge (it's his idea we plan to pursue). Setting Doc Type as Technology Preview for Release Notes inclusion. Verified on: tripleo-ansible-0.5.1-1.20200914163930 openstack-tripleo-heat-templates-11.3.2-1.20200914170176 After hitting a new bz[0] (see comment above) and using a correct THT, things worked as expected. The THT used: (overcloud) [stack@undercloud-0 ~]$ cat virt/extra_templates.yaml --- parameter_defaults: ComputeParameters: LVMFilterEnabled: true LVMFilterAllowlist: - /dev/vda A filter was created in the compute's lvm.conf: [root@compute-0 ~]# grep -i filter /etc/lvm/lvm.conf # Configuration option devices/global_filter. # Because devices/filter may be overridden from the command line, it is # not suitable for system-wide device filtering, e.g. udev. # Use global_filter to hide devices from these LVM system components. # The syntax is the same as devices/filter. Devices rejected by # global_filter are not opened by LVM. global_filter=["a|/dev/vda|","r|.*|"] <- this was added A booted instance with an attached cinder volume, on which an LVM partition was created, no longer shows up on the compute host. It did before I used the correct THT, when no filtering was in place. 
Before using the correct THT: [root@compute-0 ~]# lvs -o +devices /dev/sda: open failed: No such device or address /dev/sda1: open failed: No such device or address /dev/sda15: open failed: No such device or address /dev/sda: open failed: No such device or address /dev/sda1: open failed: No such device or address /dev/sda15: open failed: No such device or address LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices lv1 vg1 -wi------- 1020.00m /dev/sdb(0) After using the correct THT, nothing is returned: [root@compute-0 ~]# lvs -o +devices [root@compute-0 ~]# Good to verify; just note the BZ below: until it is fixed, the allow list must be populated. [0] https://bugzilla.redhat.com/show_bug.cgi?id=1905973 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.3 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:5413 The test case covering this bz is marked as critical/automated: https://polarion.engineering.redhat.com/polarion/#/project/RHELOpenStackPlatform/workitem?id=RHELOSP-81055