Bug 1461214

Summary:	RFE: Enhance libvirt to allow existing file for memoryBacking type file
Product:	Red Hat Enterprise Linux 7	Reporter:	Zack Cornelius <zack.cornelius>
Component:	libvirt	Assignee:	Michal Privoznik <mprivozn>
Status:	CLOSED ERRATA	QA Contact:	Luyao Huang <lhuang>
Severity:	medium	Docs Contact:
Priority:	low
Version:	7.4	CC:	ailan, dyuan, jdenemar, jsuchane, kchamart, knoel, lmiksik, mprivozn, mtessun, plancast, rbalakri, sgordon, xuzhang, yalzhang, zack.cornelius
Target Milestone:	rc	Keywords:	FutureFeature
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	libvirt-3.9.0-11.el7	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	1541570		Environment:
Last Closed:	2018-04-10 10:48:37 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1460848
Bug Blocks:	1541570, 1594272, 1795933

Description Zack Cornelius 2017-06-13 23:10:07 UTC

Description of problem:
When using memoryBacking source type 'file' with qemu, libvirt passes the directory from qemu.conf's memory_backing_dir as the mem-path argument for the object. This leads to qemu using a tmpfile for the file backing the memory.

Our use case uses a libvirt hook script to create a symlink to an existing file for qemu to use as the backing store. For this to work, libvirt needs to specify a specific filename instead of just the directory for mem-path. 

I think this could be accomplished via having an option to use a predefined filename (such as the guest's UUID) or allowing the XML to specify the filename for the backing file.


Example existing XML and qemu args:

qemu.conf:
memory_backing_dir = "/var/lib/libvirt/qemu/ram"

XML snippet:
  <uuid>ef1bdff4-27f3-4e85-a807-5fb4d58463cc</uuid>
  <memory unit='KiB'>1048576</memory>
  <memoryBacking>
    <source type='file'/>
    <access mode='shared'/>
  </memoryBacking>


qemu args:
-object memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qemu/ram,share=yes,size=1073741824
-numa node,nodeid=0,cpus=0,memdev=ram-node0


Possible solution XML and qemu args (using virt UUID as filename):

qemu.conf:

memory_backing_dir = "/var/lib/libvirt/qemu/ram"
memory_backing_filename_use_uuid = 1

XML snippet:
  <uuid>ef1bdff4-27f3-4e85-a807-5fb4d58463cc</uuid>
  <memory unit='KiB'>1048576</memory>
  <memoryBacking>
    <source type='file'/>
    <access mode='shared'/>
  </memoryBacking>

qemu args:
-object memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qemu/ram/ef1bdff4-27f3-4e85-a807-5fb4d58463cc,share=yes,size=1073741824
-numa node,nodeid=0,cpus=0,memdev=ram-node0
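
A minimal Python sketch of the proposal above, only to make the intended mem-path concrete. The helper name backend_file_arg is hypothetical and is not libvirt code; it joins memory_backing_dir with the guest UUID and converts the 1048576 KiB from the XML into the byte size seen in the qemu args.

```python
import posixpath


def backend_file_arg(backing_dir, filename, size_bytes, obj_id="ram-node0"):
    """Build a memory-backend-file argument that names a specific file.

    Hypothetical helper for illustration; libvirt generates this internally.
    """
    mem_path = posixpath.join(backing_dir, filename)
    return "memory-backend-file,id={},mem-path={},share=yes,size={}".format(
        obj_id, mem_path, size_bytes
    )


uuid = "ef1bdff4-27f3-4e85-a807-5fb4d58463cc"
size = 1048576 * 1024  # <memory unit='KiB'>1048576</memory> expressed in bytes

print("-object", backend_file_arg("/var/lib/libvirt/qemu/ram", uuid, size))
```

With a file name that is known before qemu starts, a hook script has a fixed location at which to place its symlink.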

Comment 2 Michal Privoznik 2017-07-27 12:10:31 UTC

(In reply to Zack Cornelius from comment #0)
> Description of problem:
> When using memoryBacking source type 'file' with qemu, libvirt passes the
> directory from qemu.conf's memory_backing_dir as the mem-path argument for
> the object. This leads to qemu using a tmpfile for the file backing the
> memory.
> 
> Our use case uses a libvirt hook script to create a symlink to an existing
> file for qemu to use as the backing store. For this to work, libvirt needs
> to specify a specific filename instead of just the directory for mem-path. 
> 
> I think this could be accomplished via having an option to use a predefined
> filename (such as the guest's UUID) or allowing the XML to specify the
> filename for the backing file.

UUID is not enough. The thing is, a domain can have multiple file-backed memory objects. There's <memory model='dimm'/>, which can be repeated multiple times in the domain definition, and each instance needs a different path. In this light, letting users specify the filename in the domain XML looks better. However, some drivers (hypervisors) might not have a traditional UNIX path representation of objects, which is why we haven't exposed mem-path yet and have worked around it so far.
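
To illustrate why a UUID-only file name is not enough, here is a small Python sketch. Both path functions are hypothetical, and the object names (ram-node0, dimm0, dimm1) are used purely as examples; the point is only that a per-guest name collapses every memory object onto one path, while a per-object component keeps them apart.

```python
import posixpath


def uuid_only_path(backing_dir, uuid, alias):
    # One file name per guest: the object alias is ignored.
    return posixpath.join(backing_dir, uuid)


def per_object_path(backing_dir, uuid, alias):
    # Hypothetical alternative: one file per memory object.
    return posixpath.join(backing_dir, uuid, alias)


backing_dir = "/var/lib/libvirt/qemu/ram"
uuid = "ef1bdff4-27f3-4e85-a807-5fb4d58463cc"
aliases = ["ram-node0", "dimm0", "dimm1"]  # example object names

for name, fn in (("uuid only", uuid_only_path), ("per object", per_object_path)):
    paths = [fn(backing_dir, uuid, a) for a in aliases]
    print(name, "distinct paths:", len(set(paths)), "of", len(paths))
```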

http://libvirt.org/formatdomain.html#elementsMemory

Comment 3 Michal Privoznik 2017-07-28 14:16:26 UTC

Zack,

I've started discussion on the upstream list:

https://www.redhat.com/archives/libvir-list/2017-July/msg01248.html

The design is still a bit unclear. For instance, what do you need the path for? Is it enough to learn it once qemu has started, or do you need to know it upfront (e.g. because Kove creates the file and qemu then merely mmap()s it)? Also, as Dan pointed out, if you have a kernel module that implements its own version of tmpfs, shouldn't that be enough, since you'll learn the paths once the module handles the mmap() issued by qemu?

Comment 4 Zack Cornelius 2017-08-14 18:28:50 UTC

Kove dynamically creates the file(s) in a virtual filesystem used by qemu, based on allocating from a hardware backing device. We expect to then use the libvirt prepare hooks to symlink the file created to the location libvirt/qemu is expecting. With this, we'll need to know or be able to determine the filename upfront.

Because we need to allocate, and track allocations on, the hardware device, we don't act as a standard tmpfs and don't allow files to be created in the virtual filesystem outside of our allocation and connection management. So we can't point memory_backing_dir at our virtual filesystem unless we can create the files, under some form of predictable names, before running qemu.
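
A rough Python sketch of the symlink step described above, as it might appear in the body of a prepare hook. The function name and both paths are hypothetical; how the allocated file and the expected location are determined is exactly the open question in this bug, so the demo uses a scratch directory.

```python
import os
import tempfile


def link_backing_file(existing_file, expected_path):
    """Symlink expected_path -> existing_file, replacing a stale link if present."""
    os.makedirs(os.path.dirname(expected_path), exist_ok=True)
    if os.path.islink(expected_path):
        os.unlink(expected_path)
    os.symlink(existing_file, expected_path)
    return expected_path


# Demo in a scratch directory. In a real hook, the first path would come from
# the allocator and the second from whatever name qemu is told to open.
with tempfile.TemporaryDirectory() as scratch:
    allocated = os.path.join(scratch, "allocated-region")
    with open(allocated, "wb") as f:
        f.truncate(4096)
    expected = os.path.join(scratch, "ram", "ram-node0")
    link_backing_file(allocated, expected)
    print("link points at allocation:", os.path.samefile(expected, allocated))
```

The sketch only works if expected_path can be computed before qemu starts, which is why a predictable name is needed.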

Comment 5 Michal Privoznik 2017-09-12 09:47:03 UTC

(In reply to Zack Cornelius from comment #4)
>

Zack, I don't know if you follow the upstream discussion, but the gist is that upstream doesn't want to expose paths anywhere because that is Linux-specific. For instance, for hugepages we have the following:

  <memoryBacking>
    <hugepages>
      <page size='2' unit='MiB'/>
    </hugepages>
  </memoryBacking>

This is generic enough to work on any future systems (e.g. *BSDs) where hugepages are not necessarily represented as paths. Now, if we blindly allow users to set -mem-path by exposing it in the domain XML, all bets are off.

However, if Kove's kernel module created a tmpfs-like FS (as hugetlbfs does), libvirt could detect it at startup, and then no path would need to be exposed, since libvirt already puts all the files under one directory.

Anyway, it'd be great if you could join the upstream discussion:

https://www.redhat.com/archives/libvir-list/2017-September/msg00089.html

Comment 7 Eduardo Habkost 2017-10-19 19:21:16 UTC

@Zack and Kove team:

Do you have plans for management UI changes to support the new features?  Do we need Nova and/or RHEV BZs too?

Comment 8 Michal Privoznik 2017-10-23 15:45:46 UTC

After some discussion upstream, I think we finally have a clear consensus on the design. So I've implemented it:

https://www.redhat.com/archives/libvir-list/2017-October/msg01063.html

Comment 9 Michal Privoznik 2017-10-24 11:42:41 UTC

Another attempt:

https://www.redhat.com/archives/libvir-list/2017-October/msg01091.html

Comment 10 Michal Privoznik 2017-11-09 15:07:59 UTC

To POST:

http://post-office.corp.redhat.com/archives/rhvirt-patches//2017-November/msg00237.html

Comment 12 Luyao Huang 2018-01-11 07:57:40 UTC

I found a problem when trying to verify this bug:

1. configure the guest to use a file as the memory backend:

  <memoryBacking>
    <source type='file'/>
    <access mode='shared'/>
  </memoryBacking>

2. start guest:
# virsh start vm1
Domain vm1 started

3. check the memory backing file:

# ll /var/lib/libvirt/qemu/ram/libvirt/qemu/12-vm1/
total 346948
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:39 ram-node0
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:39 ram-node1
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:39 ram-node2
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:39 ram-node3

4. attach a memory device:

# cat mem.xml 
  <memory model='dimm' access='private'>
    <target>
      <size unit='MiB'>256</size>
      <node>0</node>
    </target>
  </memory>


# virsh attach-device vm1 mem.xml 
Device attached successfully

# ll /var/lib/libvirt/qemu/ram/libvirt/qemu/12-vm1/
total 373712
-rw-r--r--. 1 qemu qemu 268435456 Jan 11 01:41 dimm0
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:41 ram-node0
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:41 ram-node1
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:41 ram-node2
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:41 ram-node3

5. detach this memory device:

# virsh detach-device vm1 mem.xml 
Device detached successfully

# ll /var/lib/libvirt/qemu/ram/libvirt/qemu/12-vm1/
total 388616
-rw-r--r--. 1 qemu qemu 268435456 Jan 11 01:41 dimm0
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:41 ram-node0
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:41 ram-node1
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:41 ram-node2
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:41 ram-node3

6. attach a memory device that is larger than the first one:

# cat mem2.xml 
  <memory model='dimm' access='private'>
    <target>
      <size unit='MiB'>512</size>
      <node>0</node>
    </target>
  </memory>

# virsh attach-device vm1 mem2.xml 
error: Failed to attach device from mem2.xml
error: internal error: unable to execute QEMU command 'object-add': backing store (null) size 0x10000000 does not match 'size' option 0x20000000


You can see that on attach -> detach -> attach, libvirt reuses the same name, dimm0, and if the new memory device is larger than the one attached the first time, qemu rejects the attach request.

Since Kove will manage the files in the VFS, they may create the dimm memory backing file themselves and delete it after the device is detached, in which case this problem won't happen on a Kove system.

Hi Michal, could you please check whether this is a bug? Thanks in advance for your reply!
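
For reference, the two hex values in the error message decode as follows. This is plain arithmetic in Python; the helper name is arbitrary.

```python
MIB = 1024 * 1024


def hex_to_mib(hex_size):
    """Convert a hex byte count such as '0x10000000' to MiB."""
    return int(hex_size, 16) // MIB


existing = hex_to_mib("0x10000000")   # size of the leftover dimm0 file
requested = hex_to_mib("0x20000000")  # 'size' option for the second attach

print("backing file: {} MiB, requested: {} MiB".format(existing, requested))
```

The first value is the 268435456-byte dimm0 file from the 256 MiB attach, and the second is the 512 MiB requested in mem2.xml.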

Comment 13 Michal Privoznik 2018-01-11 09:58:35 UTC

(In reply to Luyao Huang from comment #12)
>
> # virsh attach-device vm1 mem2.xml 
> error: Failed to attach device from mem2.xml
> error: internal error: unable to execute QEMU command 'object-add': backing
> store (null) size 0x10000000 does not match 'size' option 0x20000000
> 

This is because neither qemu nor libvirt unlinks the file after the first detach, so it is left lying around. Then, when you try to hotplug again with a changed size, we advertise the new size to qemu on the monitor, but the file itself is left untouched, and this confuses qemu. I'm not quite sure who should unlink the file: libvirt, or qemu (which creates the file in the first place). Let me discuss with qemu developers and get back to you (not clearing the needinfo flag for now).
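
A toy Python model of the behaviour described above, not qemu or libvirt code. It assumes that an existing backing file is reused as-is and must already match the requested size, and it uses scaled-down sizes (256 KiB and 512 KiB) so the demo stays small. All function names are hypothetical.

```python
import os
import tempfile


def attach(path, size):
    """Toy stand-in for creating a file-backed memory object.

    Assumption: an existing file is reused and must already match `size`.
    """
    if os.path.exists(path):
        actual = os.path.getsize(path)
        if actual != size:
            raise ValueError(
                f"backing store size {actual:#x} does not match 'size' option {size:#x}"
            )
    else:
        with open(path, "wb") as f:
            f.truncate(size)


def detach(path, unlink):
    """Toy hot unplug; optionally remove the backing file."""
    if unlink:
        os.unlink(path)


def replug(unlink_on_detach):
    """Attach, detach, then attach again with a larger size."""
    with tempfile.TemporaryDirectory() as d:
        dimm = os.path.join(d, "dimm0")
        attach(dimm, 256 * 1024)
        detach(dimm, unlink_on_detach)
        try:
            attach(dimm, 512 * 1024)
            return "ok"
        except ValueError as e:
            return str(e)


print("file left behind:", replug(unlink_on_detach=False))
print("file unlinked:   ", replug(unlink_on_detach=True))
```

In this model the second attach only succeeds when the file is removed on detach, whoever ends up doing the removal.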

Comment 14 Michal Privoznik 2018-01-11 12:27:28 UTC

So after some IRC discussion I came to the conclusion that it would be best if libvirt removed the file on hot unplug. I've proposed the patch here:

https://www.redhat.com/archives/libvir-list/2018-January/msg00350.html

However, I'm not quite sure whether this fits properly into Kove's use case. Zack, can you please take a look?

Comment 15 Zack Cornelius 2018-01-11 16:56:47 UTC

This patch as proposed will work for Kove's use cases.

Comment 16 Luyao Huang 2018-02-01 08:53:22 UTC

According to comment 14, one more patch needs to be backported to fix the issue in comment 12. Moving this bug's status to ASSIGNED.

Comment 17 Michal Privoznik 2018-02-01 14:44:16 UTC

V2:

https://www.redhat.com/archives/libvir-list/2018-February/msg00051.html

Comment 18 Michal Privoznik 2018-02-01 14:44:59 UTC

(In reply to Michal Privoznik from comment #17)
> V2:
> 
> https://www.redhat.com/archives/libvir-list/2018-February/msg00051.html

Ah, sorry. Updated wrong bug. Ignore that comment please.

Comment 19 Michal Privoznik 2018-02-02 10:16:30 UTC

To POST:

http://post-office.corp.redhat.com/archives/rhvirt-patches/2018-February/msg00050.html

Comment 20 Luyao Huang 2018-02-06 09:05:52 UTC

Verified this bug with libvirt-3.9.0-11.el7.x86_64:

1. prepare a guest configured with shared memory access and a file backend:

  <memoryBacking>
    <source type='file'/>
    <access mode='shared'/>
  </memoryBacking>
...
  <cpu mode='host-model' check='full'>
    <model fallback='allow'/>
    <numa>
      <cell id='0' cpus='0' memory='1048576' unit='KiB'/>
      <cell id='1' cpus='1' memory='1048576' unit='KiB'/>
    </numa>
  </cpu>


2. start guest:

# virsh start vm1
Domain vm1 started

3. check guest cmdline:

# ps aux|grep qemu
...
 -object memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/12-vm1/ram-node0,share=yes,size=1073741824 -numa node,nodeid=0,cpus=0,memdev=ram-node0 -object memory-backend-file,id=ram-node1,mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/12-vm1/ram-node1,share=yes,size=1073741824 -numa node,nodeid=1,cpus=1,memdev=ram-node1

4. check the memory dir:

# ll -Z /var/lib/libvirt/qemu/ram/libvirt/qemu/12-vm1/
-rw-r--r--. qemu qemu system_u:object_r:svirt_image_t:s0:c41,c283 ram-node0
-rw-r--r--. qemu qemu system_u:object_r:svirt_image_t:s0:c41,c283 ram-node1

5. attach a memory device:

# cat mem.xml 
  <memory model='dimm' access='private'>
    <target>
      <size unit='MiB'>128</size>
      <node>0</node>
    </target>
  </memory>

# virsh attach-device vm1 mem.xml 
Device attached successfully

6. recheck the memory dir:

# ll -Z /var/lib/libvirt/qemu/ram/libvirt/qemu/12-vm1/
-rw-r--r--. qemu qemu system_u:object_r:svirt_image_t:s0:c41,c283 dimm0
-rw-r--r--. qemu qemu system_u:object_r:svirt_image_t:s0:c41,c283 ram-node0
-rw-r--r--. qemu qemu system_u:object_r:svirt_image_t:s0:c41,c283 ram-node1

7. detach memory device:

# virsh detach-device vm1 mem.xml 
Device detached successfully

8. recheck the memory dir; dimm0 is gone:

# ll -Z /var/lib/libvirt/qemu/ram/libvirt/qemu/12-vm1/
-rw-r--r--. qemu qemu system_u:object_r:svirt_image_t:s0:c41,c283 ram-node0
-rw-r--r--. qemu qemu system_u:object_r:svirt_image_t:s0:c41,c283 ram-node1

9. destroy the guest; the memory dir for this guest is deleted:

# virsh destroy vm1
Domain vm1 destroyed

# ll -Z /var/lib/libvirt/qemu/ram/libvirt/qemu/12-vm1/
ls: cannot access /var/lib/libvirt/qemu/ram/libvirt/qemu/12-vm1/: No such file or directory

10. change memory_backing_dir in qemu.conf and retest steps 1-9; the result is the same (except for the dir path).
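
The command line from step 3 can be checked mechanically. The Python sketch below splits one of the -object arguments into its properties; parse_object_arg is a hypothetical helper, and it deliberately ignores qemu's escaped-comma syntax, which does not occur in this argument.

```python
import posixpath


def parse_object_arg(arg):
    """Split 'type,key=value,...' into (type, properties).

    Simplified: does not handle escaped commas inside values.
    """
    kind, *pairs = arg.split(",")
    return kind, dict(p.split("=", 1) for p in pairs)


arg = (
    "memory-backend-file,id=ram-node0,"
    "mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/12-vm1/ram-node0,"
    "share=yes,size=1073741824"
)
kind, props = parse_object_arg(arg)

print("type:     ", kind)
print("file name:", posixpath.basename(props["mem-path"]))
print("size KiB: ", int(props["size"]) // 1024)
```

Here the file name matches the object id, and the size corresponds to the 1048576 KiB configured for each NUMA cell.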

Comment 26 errata-xmlrpc 2018-04-10 10:48:37 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704