Bug 1305922
| Field | Value |
|---|---|
| Summary | Set cgroup device ACLs to allow block device for NVRAM backing store |
| Product | Red Hat Enterprise Linux 7 |
| Component | libvirt |
| Version | 7.3 |
| Status | CLOSED ERRATA |
| Reporter | Martin Polednik <mpoledni> |
| Assignee | Peter Krempa <pkrempa> |
| QA Contact | Virtualization Bugs <virt-bugs> |
| Severity | unspecified |
| Priority | unspecified |
| CC | berrange, dyuan, jdenemar, jsuchane, lersek, lmen, lmiksik, meili, michal.skrivanek, pkrempa, pzhang, rbalakri, xuzhang |
| Target Milestone | rc |
| Keywords | FutureFeature, Reopened |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | libvirt-1.3.2-1.el7 |
| Doc Type | Enhancement |
| Type | Bug |
| Last Closed | 2016-11-03 18:37:39 UTC |
| Bug Depends On | 1305942 |
| Bug Blocks | 1305915 |
Description
Martin Polednik
2016-02-09 15:20:49 UTC
I think this is two separate issues under one bugzilla:

1) Block devices don't work as NVRAM sources

This is a legitimate bug: we probably don't set up cgroup block device ACLs for NVRAM images, since we did not expect that anybody would use them.

2) Allow using offsets into block devices

This is a feature request only loosely related to the first issue. Using an offset into the block device to save space does not seem to be a good idea:

*) In addition to the offset, it will require a size argument. The size is currently inferred from the image itself. Storing multiple image sizes could escalate into effectively doing a filesystem-like implementation.

*) Sharing the volume between multiple VMs might not be a good idea. If a qemu process misbehaves for some reason, separating the images will be impossible.

*) There is no actual benefit to using LVs directly. Since qemu loads and caches the complete NVRAM image into memory, there is no performance benefit from using an LV directly.

... and possibly others. For issue 2 I'd recommend that RHEV use actual files and save a lot of the hassle of keeping metadata about the placement of the images around.

A bit more context for issue 2: RHEV allows administrators to use either file-based shared storage or, more importantly, block-based shared storage. When file-based storage is used, the current implementation is fine and may be used without additional work from qemu/libvirt.

Block storage uses LVM, and data that would have been placed in directories is stored as LVs. In this case we do not require any kind of additional storage, and this is the scenario where an offset within a block device seems to be the best solution for us. For the record, the minimal allocation size in RHEV is a 1G LV, and a different solution discussed was 1 additional LV per VM, yielding (1G - 128KiB) of storage overhead.

Using files on the hosts themselves is not at all suitable, as our VMs are transient and there is no guarantee which host will be chosen for a VM's next run.

(In reply to Martin Polednik from comment #3)
> A bit more context for issue 2:
...
> Block storage uses LVM, and data that would have been placed in directories
> is stored as LVs. In this case we do not require any kind of additional
> storage, and this is the scenario where an offset within a block device seems
> to be the best solution for us. For the record, the minimal allocation size
> in RHEV is a 1G LV

This statement neither clarifies nor refutes any of the points I made above explaining why it's not a good idea. Could you elaborate on how you are going to overcome them?

> and a different solution discussed was 1 additional LV per VM, yielding
> (1G - 128KiB) of storage overhead.
>
> Using files on the hosts themselves is not at all suitable, as our VMs are
> transient and there is no guarantee which host will be chosen for a VM's
> next run.

This doesn't make much sense. Making the file available on the target host is a very similar job to making the LV available on a given host for the same task.
(In reply to Peter Krempa from comment #2)
> 2) Allow using offsets into block devices
> This is a feature request only loosely related to the first issue. Using an
> offset into the block device to save space does not seem to be a good idea:
>
> *) In addition to the offset, it will require a size argument. The size is
> currently inferred from the image itself. Storing multiple image sizes could
> escalate into effectively doing a filesystem-like implementation.
>
> *) Sharing the volume between multiple VMs might not be a good idea. If a
> qemu process misbehaves for some reason, separating the images will be
> impossible.

I agree that we really do *not* want to support a scenario where we tell QEMU to use only a subset of a volume's space, because it is impossible to provide any kind of security protection to ensure it only uses the region it is told to.

Also note that you can already achieve the same end result in a safer manner. Take your LVM logical volume and format a partition table onto it. Then create a partition that contains just the region of space you wish to use for the NVRAM storage, and give just that partition to QEMU. The kernel then enforces that QEMU can only write to that range of the underlying volume, and QEMU can still have strong sVirt security isolation.

So IMHO this is a WONTFIX from the libvirt POV, because any libvirt/QEMU solution is worse for security than what is already possible.

Speaking to one of the device mapper experts, it turns out it is possible to set up a device mapping a region of another device without even formatting a partition table. You can just do something approximately like:

$ dmsetup create $NAME --table="0 $LEN linear /path/to/blockdev $OFFSET"

and then give /dev/mapper/$NAME to QEMU.

(In reply to Daniel Berrange from comment #6)
> Speaking to one of the device mapper experts, it turns out it is possible to
> set up a device mapping a region of another device without even formatting a
> partition table. You can just do something approximately like:
>
> $ dmsetup create $NAME --table="0 $LEN linear /path/to/blockdev $OFFSET"
>
> and then give /dev/mapper/$NAME to QEMU.

That actually seems reasonable enough. It would be even better if libvirt did this.

(In reply to Peter Krempa from comment #4)
> This statement neither clarifies nor refutes any of the points I made above
> explaining why it's not a good idea. Could you elaborate on how you are going
> to overcome them?

Size argument: I don't understand this one. The file is fixed at 128 KiB (created from a template).

Isolation: considering the device mapper approach, I expect this would not allow a misbehaving qemu process to overwrite other NVRAMs.

Cache: we are not pursuing this for the performance benefit of running from an LV; we are working toward a solution without file-based storage.

> This doesn't make much sense. Making the file available on the target host
> is a very similar job to making the LV available on a given host for the
> same task.

There is no place to persist the file between VM runs.
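For concreteness, here is the dmsetup approach from comment 6 spelled out end to end. This is a sketch only: the mapping name, backing LV path, and offset are placeholders, and since the OVMF varstore is 128 KiB the length works out to 131072 / 512 = 256 sectors (dmsetup table lengths are in 512-byte sectors).

$ NAME=vm1-nvram
$ dmsetup create "$NAME" --table="0 256 linear /dev/VG/LV 0"
$ ls -l "/dev/mapper/$NAME"       # point the domain's <nvram> element at this node
$ dmsetup remove "$NAME"          # tear the mapping down after the VM is gone

The kernel confines writes through /dev/mapper/$NAME to that 256-sector window, which is what makes this safer than teaching QEMU about raw offsets.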
(In reply to Martin Polednik from comment #7)
> That actually seems reasonable enough. It would be even better if libvirt
> did this.

I don't see any compelling reason for libvirt to do this, really. RHEV already has to manage storage devices, so it is perfectly capable of dealing with this too.

Reopening for issue 1) from comment #2, which is a prerequisite for doing it the comment #6 way. Changing the subject to reflect the specific issue that remains to be addressed.

"Issue 1" was fixed upstream by:
commit d1242ba24a5ceb74c7ba21c6b2a44aaa1745fe79
Author: Peter Krempa <pkrempa>
Date: Tue Feb 16 16:26:01 2016 +0100
qemu: cgroup: Setup cgroups for bios/firmware images
oVirt wants to use OVMF images on top of lvm for their 'logical'
storage thus we should set up device ACLs for them so it will actually
work.
v1.3.1-272-gd1242ba
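With this fix, libvirt adds the NVRAM backing block device to the guest's devices cgroup when the domain starts. A quick way to spot-check that on a RHEL 7 host (a sketch: the machine.slice scope glob and the major:minor pair are illustrative and vary per host):

# ls -l /dev/dm-0
# grep 253:0 /sys/fs/cgroup/devices/machine.slice/machine-qemu*.scope/devices.list

An entry such as "b 253:0 rw" in devices.list means the block device is whitelisted for read/write by the guest's qemu process.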
Versions:
libvirt-1.3.3-1.el7.x86_64
qemu-kvm-rhev-2.5.0-4.el7.x86_64
OVMF-20160202-2.gitd7c0dfa.el7.noarch
Scenario 1: use the block device without copying the NVRAM variables template

Steps:
1. Create a block device:
# dmsetup create nvram --table="0 256 linear /dev/sdb4 0"
# ll /dev/mapper/
total 0
crw-------. 1 root root 10, 236 Mar 31 22:25 control
lrwxrwxrwx. 1 root root 7 Mar 31 22:38 nvram -> ../dm-0
2. Define a guest using this XML:
...
<os>
<type arch='x86_64' machine='pc-i440fx-rhel7.2.0'>hvm</type>
<loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
<nvram template='/usr/share/OVMF/OVMF_VARS.fd'>/dev/mapper/nvram</nvram>
</os>
...
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/var/lib/libvirt/images/r7.qcow2'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</disk>
<disk type='file' device='cdrom'>
<driver name='qemu' type='raw'/>
<source file='/root/RHEL-7.2-20151030.0-Server-x86_64-dvd1.iso'/>
<target dev='hda' bus='ide'/>
<readonly/>
<boot order='1'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
...
3. Start the guest:
# virsh start r7
Domain r7 started
4. Use virt-viewer to check the guest:
# virt-viewer r7
Actual results:
The guest screen is black; there is no output at all.
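As an aside, the dmsetup length argument in step 1 is in 512-byte sectors, so the table "0 256 linear ..." maps 256 × 512 = 131072 bytes (128 KiB), exactly the size of the OVMF varstore template. A quick sanity check of the mapping size (a sketch):

# blockdev --getsize64 /dev/mapper/nvram
131072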
Scenario 2: use the block device after copying the NVRAM variables template

Steps:
1. Create a block device:
# dmsetup create nvram --table="0 256 linear /dev/sdb4 0"
# ll /dev/mapper/
total 0
crw-------. 1 root root 10, 236 Mar 31 22:25 control
lrwxrwxrwx. 1 root root 7 Mar 31 22:38 nvram -> ../dm-0
2. Copy the NVRAM variables template:
[root@localhost ~]# dd if=/usr/share/OVMF/OVMF_VARS.fd of=/dev/mapper/nvram
256+0 records in
256+0 records out
131072 bytes (131 kB) copied, 0.00431352 s, 30.4 MB/s
3. Define a guest using this XML:
...
<os>
<type arch='x86_64' machine='pc-i440fx-rhel7.2.0'>hvm</type>
<loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
<nvram template='/usr/share/OVMF/OVMF_VARS.fd'>/dev/mapper/nvram</nvram>
</os>
...
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/var/lib/libvirt/images/r7.qcow2'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</disk>
<disk type='file' device='cdrom'>
<driver name='qemu' type='raw'/>
<source file='/root/RHEL-7.2-20151030.0-Server-x86_64-dvd1.iso'/>
<target dev='hda' bus='ide'/>
<readonly/>
<boot order='1'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
...
4. Start the guest:
# virsh start r7
Domain r7 started
5. Use virt-viewer to check the guest:
# virt-viewer r7
Actual results:
The guest screen is visible and the guest boots normally.
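To double-check that the copy in step 2 took effect, the device contents can be compared against the template (a sketch; GNU cmp with -n limits the comparison to the 128 KiB varstore and prints nothing on a match):

# cmp -n 131072 /usr/share/OVMF/OVMF_VARS.fd /dev/mapper/nvram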
So I have a question: must I copy the NVRAM variables template manually for the block device, or is there a more convenient way? Copying the template is not convenient for users. If a user doesn't know about the copy step, the guest will fail to boot (as in scenario 1).
If I use an empty file (not a block device) as the nvram file, I don't need to copy the template, and the guest boots normally.
Steps:
1)[root@localhost ~]# qemu-img create /root/test.img 128K
Formatting '/root/test.img', fmt=raw size=131072
2) Start a guest with this XML:
<os>
<type arch='x86_64' machine='pc-i440fx-rhel7.2.0'>hvm</type>
<loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
<nvram template='/usr/share/OVMF/OVMF_VARS.fd'>/root/test.img</nvram>
</os>
(In reply to lijuan men from comment #14)
> So I have a question: must I copy the NVRAM variables template manually for
> the block device, or is there a more convenient way? Copying the template is
> not convenient for users. If a user doesn't know about the copy step, the
> guest will fail to boot (as in scenario 1).

The approach you used to initialize the device mapper mapping of 'nvram' onto /dev/sdb4 does not clear the data previously present on /dev/sdb4; thus you've fed the first sectors of that partition to the NVRAM ...

> If I use an empty file (not a block device) as the nvram file, I don't need
> to copy the template, and the guest boots normally.

... so unlike the empty file here, the device is not full of NUL bytes, and the firmware fails to interpret that stale data correctly.

> Steps:
> 1) [root@localhost ~]# qemu-img create /root/test.img 128K
> Formatting '/root/test.img', fmt=raw size=131072

The failure in scenario 1 is expected. Either clear the device or, better, populate it with the appropriate source image.

Following comment 15, I tested scenario 1 again. After clearing the data on the block device, the guest starts and boots normally. This bug is verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html
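For reference, the clearing step suggested in comment 15 amounts to zeroing the mapped region before first use (a sketch, matching the 256-sector mapping from the scenarios above), or simply seeding it with the varstore template as in scenario 2:

# dd if=/dev/zero of=/dev/mapper/nvram bs=512 count=256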
Hi Peter,
When I test this case, I found that the guest fails to start when using a block device as the NVRAM backing store. This also reproduces on the released RHEL 7.4 version, so please help review this question. Thank you very much.
Version-Release number of selected component:
libvirt-3.9.0-5.el7.x86_64
qemu-kvm-rhev-2.10.0-11.el7.x86_64
Steps to Reproduce:
1. Create a block device named nvram:
# dmsetup create nvram --table="0 256 linear /dev/sdb1 0"
# ll /dev/mapper/
total 0
crw-------. 1 root root 10, 236 Mar 31 22:25 control
lrwxrwxrwx. 1 root root 7 Mar 31 22:38 nvram -> ../dm-0
2. Define a guest using this XML:
...
<os>
<type arch='x86_64' machine='pc-q35-rhel7.5.0'>hvm</type>
<loader readonly='yes' secure='no' type='pflash'>/usr/share/OVMF/OVMF_CODE.secboot.fd</loader>
<nvram template='/usr/share/OVMF/OVMF_VARS.fd'>/dev/mapper/nvram</nvram>
<bootmenu enable='yes' timeout='3000'/>
<smbios mode='sysinfo'/>
</os>
...
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none'/>
<source file='/var/lib/libvirt/images/lmo.qcow2'/>
<target dev='sda' bus='sata'/>
<boot order='1'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
...
3. Start the guest.
# virsh start lmo
error: Failed to start domain lmo
error: internal error: child reported: unable to stat: /dev/mapper/nvram: No such file or directory
Actual results:
As above.
Expected results:
The guest boots successfully.
Additional info:
The log of guest:
2017-12-07 03:11:44.832+0000: 26241: debug : virCommandHandshakeChild:435 : Notifying parent for handshake start on 30
2017-12-07 03:11:44.832+0000: 26241: debug : virCommandHandshakeChild:443 : Waiting on parent for handshake complete on 31
libvirt: error : libvirtd quit during handshake: Input/output error
2017-12-07 03:11:44.893+0000: shutting down, reason=failed
I suspect that libvirt does not set up the path in the namespace. Could you please re-try the above scenario with namespace support disabled, by setting the 'namespaces' variable to an empty list in /etc/libvirt/qemu.conf:

namespaces = [ ]

If the scenario works in that case, please file a new bug.

If I disable the qemu namespace in qemu.conf, the guest starts successfully on the host, but it does not actually boot up; that is, the display on the screen is black. I will file a new bug to track this issue.

1. Modify /etc/libvirt/qemu.conf:

namespaces = [ ]

2. Restart libvirtd:

# systemctl restart libvirtd

3. Start the guest:

# virsh start lmo
Domain lmo started

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 2     lmo                            running
 -     lmn                            shut off
 -     rhel7                          shut off

4. Check with virt-manager: the display on the screen is black.
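One way to check the namespace theory directly is to look at the qemu process's private /dev from inside its mount namespace (a sketch; the pgrep pattern is an assumption and may need adjusting for the QEMU binary name in use):

# nsenter -t "$(pgrep -f qemu-kvm | head -n1)" -m ls -l /dev/mapper/

If /dev/mapper/nvram is present on the host but missing in that listing, libvirt's namespace setup skipped creating the device node.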