Bug 1305922
| Field | Value |
|---|---|
| Summary | Set cgroup device ACLs to allow block device for NVRAM backing store |
| Product | Red Hat Enterprise Linux 7 |
| Component | libvirt |
| Version | 7.3 |
| Status | CLOSED ERRATA |
| Reporter | Martin Polednik <mpoledni> |
| Assignee | Peter Krempa <pkrempa> |
| QA Contact | Virtualization Bugs <virt-bugs> |
| Severity | unspecified |
| Priority | unspecified |
| CC | berrange, dyuan, jdenemar, jsuchane, lersek, lmen, lmiksik, meili, michal.skrivanek, pkrempa, pzhang, rbalakri, xuzhang |
| Target Milestone | rc |
| Keywords | FutureFeature, Reopened |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | libvirt-1.3.2-1.el7 |
| Doc Type | Enhancement |
| Type | Bug |
| Last Closed | 2016-11-03 18:37:39 UTC |
| Bug Depends On | 1305942 |
| Bug Blocks | 1305915 |
Description
Martin Polednik
2016-02-09 15:20:49 UTC
I think this is two separate issues under one bugzilla:

1) Block devices don't work as NVRAM sources

This is a legitimate bug: we probably don't set up cgroup block device ACLs for NVRAM images, since we did not expect that anybody would use them.

2) Allow using offsets into block devices

This is a feature request only loosely related to the first issue. Using an offset into the block device to save space does not seem to be a good idea:

*) In addition to the offset, it will require a size argument. The size is currently inferred from the image itself. Storing multiple image sizes could escalate into effectively doing a filesystem-like implementation.

*) Sharing the volume between multiple VMs might not be a good idea. If a qemu process misbehaves for some reason, separating the images will be impossible.

*) There is no actual benefit to using LVs directly. Since qemu loads and caches the complete NVRAM image into memory, there is no performance benefit from using an LV directly.

... and possibly others. For issue 2 I'd recommend that RHEV use actual files and save a lot of the hassle of keeping metadata about the placement of the images around.

A bit more context for issue 2: RHEV allows administrators to use either file-based shared storage or, more importantly, block-based shared storage. When file-based storage is used, the current implementation is fine and may be used without additional work from qemu/libvirt.

Block storage uses LVM, and data that would have been placed in directories is stored as LVs. In this case we do not require any kind of additional storage, and this is the scenario where an offset within a block device seems to be the best solution for us. For the record, the minimal allocation size in RHEV is a 1G LV, and a different solution discussed was 1 additional LV per VM, yielding (1G - 128KiB) of storage overhead.

Using files on the hosts themselves is not at all suitable, as our VMs are transient and there is no guarantee which host will be chosen for a VM's next run.

(In reply to Martin Polednik from comment #3)
> A bit more context for issue 2:
...
> Block storage uses LVM, and data that would have been placed in directories
> is stored as LVs. In this case we do not require any kind of additional
> storage, and this is the scenario where an offset within a block device seems
> to be the best solution for us. For the record, the minimal allocation size
> in RHEV is a 1G LV

This statement neither clarifies nor refutes any of the points I made above explaining why it's not a good idea. Could you elaborate on how you are going to overcome them?

> and a different solution discussed was 1 additional LV per VM, yielding
> (1G - 128KiB) of storage overhead.
>
> Using files on the hosts themselves is not at all suitable, as our VMs are
> transient and there is no guarantee which host will be chosen for a VM's
> next run.

This doesn't make much sense. Making the file available on the target host is a very similar job to making the LV available on a given host for the same task.
(In reply to Peter Krempa from comment #2)
> 2) Allow using offsets into block devices
> This is a feature request only loosely related to the first issue. Using an
> offset into the block device to save space does not seem to be a good idea:
>
> *) In addition to the offset, it will require a size argument. The size is
> currently inferred from the image itself. Storing multiple image sizes could
> escalate into effectively doing a filesystem-like implementation.
>
> *) Sharing the volume between multiple VMs might not be a good idea. If a
> qemu process misbehaves for some reason, separating the images will be
> impossible.

I agree that we really do *not* want to support a scenario where we tell QEMU to use only a subset of a volume's space, because it is impossible to provide any kind of security protection to ensure it only uses the region it is told to.

Also note that you can already achieve the same end result in a safer manner. Take your LVM logical volume and format a partition table onto it. Then create a partition that contains just the region of space you wish to use for the NVRAM storage, and give just that partition to QEMU. The kernel then enforces that QEMU can only write to that range of the underlying volume, and QEMU can still have strong sVirt security isolation.

So IMHO this is a WONTFIX from the libvirt POV, because any libvirt/QEMU solution is worse for security than what is already possible.

Speaking to one of the device mapper experts, it turns out it is possible to set up a device mapping a region of another device without even formatting a partition table. You can just do something approximately like:

$ dmsetup create $NAME --table="0 $LEN linear /path/to/blockdev $OFFSET"

and then give /dev/mapper/$NAME to QEMU.

(In reply to Daniel Berrange from comment #6)
> Speaking to one of the device mapper experts, it turns out it is possible to
> set up a device mapping a region of another device without even formatting a
> partition table. You can just do something approximately like:
>
> $ dmsetup create $NAME --table="0 $LEN linear /path/to/blockdev $OFFSET"
>
> and then give /dev/mapper/$NAME to QEMU.

That actually seems reasonable enough. It would be even better if libvirt did this.

(In reply to Peter Krempa from comment #4)
> This statement neither clarifies nor refutes any of the points I made above
> explaining why it's not a good idea. Could you elaborate on how you are going
> to overcome them?

Size argument: I don't understand this one. The file is fixed at 128 KiB (created from a template).

Isolation: considering the device mapper approach, I expect this would not allow a misbehaving qemu process to overwrite other NVRAMs.

Cache: we are not pursuing this for the performance benefit of running from an LV; we are working toward a solution without file-based storage.

> This doesn't make much sense. Making the file available on the target host
> is a very similar job to making the LV available on a given host for the
> same task.

There is no place to persist the file between VM runs.
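For concreteness, here is the dmsetup approach from comment 6 spelled out end to end. This is a sketch only: the mapping name, backing LV path, and offset are placeholders, and since the OVMF varstore is 128 KiB the length works out to 131072 / 512 = 256 sectors (dmsetup table lengths are in 512-byte sectors).

$ NAME=vm1-nvram
$ dmsetup create "$NAME" --table="0 256 linear /dev/VG/LV 0"
$ ls -l "/dev/mapper/$NAME"       # point the domain's <nvram> element at this node
$ dmsetup remove "$NAME"          # tear the mapping down after the VM is gone

The kernel confines writes through /dev/mapper/$NAME to that 256-sector window, which is what makes this safer than teaching QEMU about raw offsets.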
(In reply to Martin Polednik from comment #7)
> That actually seems reasonable enough. It would be even better if libvirt
> did this.

I don't see any compelling reason for libvirt to do this, really. RHEV already has to manage storage devices, so it is perfectly capable of dealing with this too.

Reopening for issue 1) from comment #2, which is a prerequisite for doing it the comment #6 way. Changing the subject to reflect the specific issue that remains to be addressed.

"Issue 1" was fixed upstream by:
commit d1242ba24a5ceb74c7ba21c6b2a44aaa1745fe79
Author: Peter Krempa <pkrempa>
Date: Tue Feb 16 16:26:01 2016 +0100
qemu: cgroup: Setup cgroups for bios/firmware images
oVirt wants to use OVMF images on top of lvm for their 'logical'
storage thus we should set up device ACLs for them so it will actually
work.
v1.3.1-272-gd1242ba
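With this fix, libvirt adds the NVRAM backing block device to the guest's devices cgroup when the domain starts. A quick way to spot-check that on a RHEL 7 host (a sketch: the machine.slice scope glob and the major:minor pair are illustrative and vary per host):

# ls -l /dev/dm-0
# grep 253:0 /sys/fs/cgroup/devices/machine.slice/machine-qemu*.scope/devices.list

An entry such as "b 253:0 rw" in devices.list means the block device is whitelisted for read/write by the guest's qemu process.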
Versions:
libvirt-1.3.3-1.el7.x86_64
qemu-kvm-rhev-2.5.0-4.el7.x86_64
OVMF-20160202-2.gitd7c0dfa.el7.noarch
Scenario 1: use the block device without copying the NVRAM variables template

Steps:
1. Create a block device:
# dmsetup create nvram --table="0 256 linear /dev/sdb4 0"
# ll /dev/mapper/
total 0
crw-------. 1 root root 10, 236 Mar 31 22:25 control
lrwxrwxrwx. 1 root root 7 Mar 31 22:38 nvram -> ../dm-0
2. Define a guest using this XML:
...
<os>
<type arch='x86_64' machine='pc-i440fx-rhel7.2.0'>hvm</type>
<loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
<nvram template='/usr/share/OVMF/OVMF_VARS.fd'>/dev/mapper/nvram</nvram>
</os>
...
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/var/lib/libvirt/images/r7.qcow2'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</disk>
<disk type='file' device='cdrom'>
<driver name='qemu' type='raw'/>
<source file='/root/RHEL-7.2-20151030.0-Server-x86_64-dvd1.iso'/>
<target dev='hda' bus='ide'/>
<readonly/>
<boot order='1'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
...
3. Start the guest:
# virsh start r7
Domain r7 started
4. Use virt-viewer to check the guest:
# virt-viewer r7
Actual results:
The guest screen is black; there is no output at all.
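As an aside, the dmsetup length argument in step 1 is in 512-byte sectors, so the table "0 256 linear ..." maps 256 × 512 = 131072 bytes (128 KiB), exactly the size of the OVMF varstore template. A quick sanity check of the mapping size (a sketch):

# blockdev --getsize64 /dev/mapper/nvram
131072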
Scenario 2: use the block device after copying the NVRAM variables template

Steps:
1. Create a block device:
# dmsetup create nvram --table="0 256 linear /dev/sdb4 0"
# ll /dev/mapper/
total 0
crw-------. 1 root root 10, 236 Mar 31 22:25 control
lrwxrwxrwx. 1 root root 7 Mar 31 22:38 nvram -> ../dm-0
2. Copy the NVRAM variables template:
[root@localhost ~]# dd if=/usr/share/OVMF/OVMF_VARS.fd of=/dev/mapper/nvram
256+0 records in
256+0 records out
131072 bytes (131 kB) copied, 0.00431352 s, 30.4 MB/s
3. Define a guest using this XML:
...
<os>
<type arch='x86_64' machine='pc-i440fx-rhel7.2.0'>hvm</type>
<loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
<nvram template='/usr/share/OVMF/OVMF_VARS.fd'>/dev/mapper/nvram</nvram>
</os>
...
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/var/lib/libvirt/images/r7.qcow2'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</disk>
<disk type='file' device='cdrom'>
<driver name='qemu' type='raw'/>
<source file='/root/RHEL-7.2-20151030.0-Server-x86_64-dvd1.iso'/>
<target dev='hda' bus='ide'/>
<readonly/>
<boot order='1'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
...
4. Start the guest:
# virsh start r7
Domain r7 started
5. Use virt-viewer to check the guest:
# virt-viewer r7
Actual results:
The guest screen is visible and the guest boots normally.
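To double-check that the copy in step 2 took effect, the device contents can be compared against the template (a sketch; GNU cmp with -n limits the comparison to the 128 KiB varstore and prints nothing on a match):

# cmp -n 131072 /usr/share/OVMF/OVMF_VARS.fd /dev/mapper/nvram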
So I have a question: must I copy the NVRAM variables template manually for the block device, or is there a more convenient way? Copying the template is not convenient for users. If a user doesn't know about the copy step, the guest will fail to boot (as in scenario 1).
If I use an empty file (not a block device) as the nvram file, I don't need to copy the template, and the guest boots normally.
Steps:
1)[root@localhost ~]# qemu-img create /root/test.img 128K
Formatting '/root/test.img', fmt=raw size=131072
2) Start a guest with this XML:
<os>
<type arch='x86_64' machine='pc-i440fx-rhel7.2.0'>hvm</type>
<loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
<nvram template='/usr/share/OVMF/OVMF_VARS.fd'>/root/test.img</nvram>
</os>
(In reply to lijuan men from comment #14)
> So I have a question: must I copy the NVRAM variables template manually for
> the block device, or is there a more convenient way? Copying the template is
> not convenient for users. If a user doesn't know about the copy step, the
> guest will fail to boot (as in scenario 1).

The approach you used to initialize the device mapper mapping of 'nvram' onto /dev/sdb4 does not clear the data previously present on /dev/sdb4; thus you've fed the first sectors of that partition to the NVRAM ...

> If I use an empty file (not a block device) as the nvram file, I don't need
> to copy the template, and the guest boots normally.

... so unlike the empty file here, the device is not full of NUL bytes, and the firmware fails to interpret that stale data correctly.

> Steps:
> 1) [root@localhost ~]# qemu-img create /root/test.img 128K
> Formatting '/root/test.img', fmt=raw size=131072

The failure in scenario 1 is expected. Either clear the device or, better, populate it with the appropriate source image.

Following comment 15, I tested scenario 1 again. After clearing the data on the block device, the guest starts and boots normally. This bug is verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html
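For reference, the clearing step suggested in comment 15 amounts to zeroing the mapped region before first use (a sketch, matching the 256-sector mapping from the scenarios above), or simply seeding it with the varstore template as in scenario 2:

# dd if=/dev/zero of=/dev/mapper/nvram bs=512 count=256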
Hi Peter,
When I test this case, I found that the guest fails to start when using a block device as the NVRAM backing store. This also reproduces on the released RHEL 7.4 version, so please help review this question. Thank you very much.
Version-Release number of selected component:
libvirt-3.9.0-5.el7.x86_64
qemu-kvm-rhev-2.10.0-11.el7.x86_64
Steps to Reproduce:
1. Create a block device named nvram:
# dmsetup create nvram --table="0 256 linear /dev/sdb1 0"
# ll /dev/mapper/
total 0
crw-------. 1 root root 10, 236 Mar 31 22:25 control
lrwxrwxrwx. 1 root root 7 Mar 31 22:38 nvram -> ../dm-0
2. Define a guest using this XML:
...
<os>
<type arch='x86_64' machine='pc-q35-rhel7.5.0'>hvm</type>
<loader readonly='yes' secure='no' type='pflash'>/usr/share/OVMF/OVMF_CODE.secboot.fd</loader>
<nvram template='/usr/share/OVMF/OVMF_VARS.fd'>/dev/mapper/nvram</nvram>
<bootmenu enable='yes' timeout='3000'/>
<smbios mode='sysinfo'/>
</os>
...
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none'/>
<source file='/var/lib/libvirt/images/lmo.qcow2'/>
<target dev='sda' bus='sata'/>
<boot order='1'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
...
3. Start the guest.
# virsh start lmo
error: Failed to start domain lmo
error: internal error: child reported: unable to stat: /dev/mapper/nvram: No such file or directory
Actual results:
As above.
Expected results:
The guest boots successfully.
Additional info:
The log of guest:
2017-12-07 03:11:44.832+0000: 26241: debug : virCommandHandshakeChild:435 : Notifying parent for handshake start on 30
2017-12-07 03:11:44.832+0000: 26241: debug : virCommandHandshakeChild:443 : Waiting on parent for handshake complete on 31
libvirt: error : libvirtd quit during handshake: Input/output error
2017-12-07 03:11:44.893+0000: shutting down, reason=failed
I suspect that libvirt does not set up the path in the namespace. Could you please re-try the above scenario with namespace support disabled, by setting the 'namespaces' variable to an empty list in /etc/libvirt/qemu.conf:

namespaces = [ ]

If the scenario works in that case, please file a new bug.

If I disable the qemu namespace in qemu.conf, the guest starts successfully on the host, but it does not actually boot up; that is, the display on the screen is black. I will file a new bug to track this issue.

1. Modify /etc/libvirt/qemu.conf:

namespaces = [ ]

2. Restart libvirtd:

# systemctl restart libvirtd

3. Start the guest:

# virsh start lmo
Domain lmo started

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 2     lmo                            running
 -     lmn                            shut off
 -     rhel7                          shut off

4. Check with virt-manager: the display on the screen is black.
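One way to check the namespace theory directly is to look at the qemu process's private /dev from inside its mount namespace (a sketch; the pgrep pattern is an assumption and may need adjusting for the QEMU binary name in use):

# nsenter -t "$(pgrep -f qemu-kvm | head -n1)" -m ls -l /dev/mapper/

If /dev/mapper/nvram is present on the host but missing in that listing, libvirt's namespace setup skipped creating the device node.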