Bug 1541570 - [RFE] Enable file-backed memory
Summary: [RFE] Enable file-backed memory
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: beta
Target Release: 14.0 (Rocky)
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Duplicates: 1578831
Depends On: 1461214
Blocks: 1558125 1578831 1594272 1795933
Reported: 2018-02-02 23:04 UTC by Zack Cornelius
Modified: 2023-03-21 18:44 UTC (History)

Fixed In Version: openstack-nova-18.0.0-0.20180710150340.8469fa7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1461214
Clones: 1594272
Environment:
Last Closed: 2019-01-11 11:48:46 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 567876 | 0 | None | MERGED | Implement file backed memory for instances in libvirt | 2021-02-03 12:43:16 UTC
Red Hat Product Errata | RHEA-2019:0045 | 0 | None | None | None | 2019-01-11 11:49:12 UTC

Description Zack Cornelius 2018-02-02 23:04:32 UTC
+++ This bug was initially created as a clone of Bug #1461214 +++

Description of problem:

To enable the functionality implemented in Bug #1461214, we need a new Nova host-level libvirt configuration option that adds the memoryBacking element, with source type=file and access mode=shared, to the guest XML. This is required for Kove's qemu integration with OpenStack Nova.

Additionally, we should have an option to enable qemu's 'discard-data' option for file backed memory. (discard-data implemented in Bug #1460848, libvirt implementation pending in Bug #1480668)


Example desired libvirt XML snippet:

  <memoryBacking>
    <source type='file'/>
    <access mode='shared'/>
  </memoryBacking>


(Not shown above is the flag for discard-data, which is not yet available in libvirt XML.)
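
For illustration only, here is a minimal Python sketch that generates the snippet above with the standard library. The function name build_memory_backing is an assumption made for this example; it is not Nova's or libvirt's API, and the discard-data flag is deliberately left out because it has no XML representation yet.

```python
import xml.etree.ElementTree as ET


def build_memory_backing(source_type="file", access_mode="shared"):
    """Build a <memoryBacking> element (hypothetical helper, not Nova code)."""
    backing = ET.Element("memoryBacking")
    # <source type='file'/> asks for file-backed guest memory.
    ET.SubElement(backing, "source", {"type": source_type})
    # <access mode='shared'/> asks for a shared mapping of that file.
    ET.SubElement(backing, "access", {"mode": access_mode})
    return backing


element = build_memory_backing()
print(ET.tostring(element, encoding="unicode"))
```

A host-level option would then only need to decide whether this element is added to each guest's domain XML.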





+++ Original text of Bug #1461214 +++

Description of problem:
When using memoryBacking source type 'file' with qemu, libvirt passes the directory from qemu.conf's memory_backing_dir as the mem-path argument for the object. This leads to qemu using a tmpfile for the file backing the memory.

Our use case uses a libvirt hook script to create a symlink to an existing file for qemu to use as the backing store. For this to work, libvirt needs to specify a specific filename instead of just the directory for mem-path. 

I think this could be accomplished via having an option to use a predefined filename (such as the guest's UUID) or allowing the XML to specify the filename for the backing file.


Example existing XML and qemu args:

qemu.conf:
memory_backing_dir = "/var/lib/libvirt/qemu/ram"

XML snippet:
  <uuid>ef1bdff4-27f3-4e85-a807-5fb4d58463cc</uuid>
  <memory unit='KiB'>1048576</memory>
  <memoryBacking>
    <source type='file'/>
    <access mode='shared'/>
  </memoryBacking>


qemu args:
-object memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qemu/ram,share=yes,size=1073741824
-numa node,nodeid=0,cpus=0,memdev=ram-node0
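
As a rough sketch of how the argument above relates to the XML, the following Python reproduces the -object string from the configured directory and the guest memory size. The helper memory_backend_arg is hypothetical and only mirrors the observed output; it is not libvirt's code. Note that the XML gives memory in KiB while qemu takes size in bytes.

```python
def memory_backend_arg(obj_id, mem_path, memory_kib, share=True):
    """Format a memory-backend-file object string (illustrative helper)."""
    # The domain XML uses KiB; qemu's size option is in bytes.
    size_bytes = memory_kib * 1024
    share_flag = "yes" if share else "no"
    return (
        f"memory-backend-file,id={obj_id},mem-path={mem_path},"
        f"share={share_flag},size={size_bytes}"
    )


# Values taken from the qemu.conf and XML snippet above.
arg = memory_backend_arg("ram-node0", "/var/lib/libvirt/qemu/ram", 1048576)
print("-object " + arg)
```

Because mem-path here is only a directory, qemu picks its own temporary file name inside it, which is the problem described above.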


Possible solution XML and qemu args (using virt UUID as filename):

qemu.conf:

memory_backing_dir = "/var/lib/libvirt/qemu/ram"
memory_backing_filename_use_uuid = 1

XML snippet:
  <uuid>ef1bdff4-27f3-4e85-a807-5fb4d58463cc</uuid>
  <memory unit='KiB'>1048576</memory>
  <memoryBacking>
    <source type='file'/>
    <access mode='shared'/>
  </memoryBacking>

qemu args:
-object memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qemu/ram/ef1bdff4-27f3-4e85-a807-5fb4d58463cc,share=yes,size=1073741824
-numa node,nodeid=0,cpus=0,memdev=ram-node0
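
The proposed behaviour can be sketched in a few lines of Python. The setting memory_backing_filename_use_uuid is only the proposal made in this report, and proposed_mem_path is a hypothetical helper written for this example.

```python
import posixpath


def proposed_mem_path(memory_backing_dir, guest_uuid, use_uuid):
    """Return the mem-path under the proposal (hypothetical helper)."""
    if use_uuid:
        # Proposed: a predictable per-guest file named after the UUID.
        return posixpath.join(memory_backing_dir, guest_uuid)
    # Existing behaviour: only the directory is passed to qemu.
    return memory_backing_dir


backing_dir = "/var/lib/libvirt/qemu/ram"
guest_uuid = "ef1bdff4-27f3-4e85-a807-5fb4d58463cc"

current_path = proposed_mem_path(backing_dir, guest_uuid, use_uuid=False)
proposed_path = proposed_mem_path(backing_dir, guest_uuid, use_uuid=True)
print(current_path)
print(proposed_path)
```

With a predictable file name, a hook script can place a symlink at that path before qemu starts.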

--- Additional comment from Michal Privoznik on 2017-07-27 08:10:31 EDT ---

(In reply to Zack Cornelius from comment #0)
> Description of problem:
> When using memoryBacking source type 'file' with qemu, libvirt passes the
> directory from qemu.conf's memory_backing_dir as the mem-path argument for
> the object. This leads to qemu using a tmpfile for the file backing the
> memory.
> 
> Our use case uses a libvirt hook script to create a symlink to an existing
> file for qemu to use as the backing store. For this to work, libvirt needs
> to specify a specific filename instead of just the directory for mem-path. 
> 
> I think this could be accomplished via having an option to use a predefined
> filename (such as the guest's UUID) or allowing the XML to specify the
> filename for the backing file.

UUID is not enough. A domain can have multiple memory backing files: <memory model='dimm'/> can be repeated multiple times in the domain definition, and each instance needs a different path. In this light, letting users specify the filename in the domain XML looks better. However, some drivers (hypervisors) may not have a traditional UNIX path representation of objects, which is why we haven't exposed mem-path yet and have worked around it so far.

http://libvirt.org/formatdomain.html#elementsMemory
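
To make the objection concrete, here is a small Python sketch comparing a UUID-only file name with a per-object name. Both naming functions and the directory layout are assumptions for illustration; neither is the scheme libvirt settled on at this point in the discussion.

```python
import posixpath


def uuid_only_path(backing_dir, guest_uuid, object_alias):
    """UUID-only naming: the memory object alias is ignored."""
    return posixpath.join(backing_dir, guest_uuid)


def per_object_path(backing_dir, guest_uuid, object_alias):
    """Hypothetical per-object naming: one file per memory object."""
    return posixpath.join(backing_dir, guest_uuid, object_alias)


backing_dir = "/var/lib/libvirt/qemu/ram"
guest_uuid = "ef1bdff4-27f3-4e85-a807-5fb4d58463cc"
# One NUMA node backend plus two hotplugged DIMMs (example aliases).
aliases = ["ram-node0", "dimm0", "dimm1"]

uuid_only = {uuid_only_path(backing_dir, guest_uuid, a) for a in aliases}
per_object = {per_object_path(backing_dir, guest_uuid, a) for a in aliases}

print(len(uuid_only), "distinct path(s) with UUID-only naming")
print(len(per_object), "distinct path(s) with per-object naming")
```

With UUID-only naming, all three memory objects would map to the same file, so the name has to include something that identifies the individual object.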

--- Additional comment from Michal Privoznik on 2017-07-28 10:16:26 EDT ---

Zack,

I've started discussion on the upstream list:

https://www.redhat.com/archives/libvir-list/2017-July/msg01248.html

The design is still a bit unclear. For instance, what do you need the path for? Is it enough to learn it once qemu has started, or do you need to know it upfront (e.g. because Kove creates the file and qemu then merely mmap()s it)? Also, as Dan pointed out, if you have a kernel module that implements its own version of tmpfs, shouldn't that be enough, since you'll learn the paths once the module handles the mmap() issued by qemu?

--- Additional comment from Zack Cornelius on 2017-08-14 14:28:50 EDT ---

Kove dynamically creates the file(s) in a virtual filesystem used by qemu, based on allocating from a hardware backing device. We expect to then use the libvirt prepare hooks to symlink the file created to the location libvirt/qemu is expecting. With this, we'll need to know or be able to determine the filename upfront.

Because we need to allocate, and track allocations on, the hardware device, we don't act as a standard tmpfs and do not allow files to be created in the virtual filesystem outside of our allocation and connection management. We therefore can't point memory_backing_dir at our virtual filesystem unless we can create the files, using some form of predictable names, before qemu runs.
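
The symlink step described in this comment might look roughly like the Python below. This is only a sketch: link_backing_file and every path in it are hypothetical, temporary directories stand in for both the Kove virtual filesystem and memory_backing_dir, and a real hook would run from libvirt's qemu hook script during the prepare operation.

```python
import tempfile
from pathlib import Path


def link_backing_file(allocated_file, expected_path):
    """Symlink a pre-allocated file to where qemu will look for it."""
    expected = Path(expected_path)
    expected.parent.mkdir(parents=True, exist_ok=True)
    # Replace any stale link or file left over from an earlier run.
    if expected.is_symlink() or expected.exists():
        expected.unlink()
    expected.symlink_to(allocated_file)
    return expected


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a file allocated on the hardware backing device.
    allocated = Path(tmp) / "vfs" / "allocation-0"
    allocated.parent.mkdir()
    allocated.write_bytes(b"")

    # Stand-in for the predictable name under memory_backing_dir.
    expected = Path(tmp) / "ram" / "guest" / "ram-node0"
    link = link_backing_file(allocated, expected)

    is_link = link.is_symlink()
    resolved_ok = link.resolve() == allocated.resolve()
    print(is_link, resolved_ok)
```

The sketch only works if the expected path is known before qemu starts, which is why predictable names matter for this use case.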

--- Additional comment from Michal Privoznik on 2017-09-12 05:47:03 EDT ---

(In reply to Zack Cornelius from comment #4)
>

Zack, I don't know if you follow the upstream discussion, but the gist is that upstream doesn't want to expose paths anywhere because that is Linux-specific. For instance, for hugepages we have the following:

  <memoryBacking>
    <hugepages>
      <page size='2' unit='MiB'/>
    </hugepages>
  </memoryBacking>

This is generic enough to work on any future systems (e.g. *BSDs), where hugepages are not necessarily represented as paths. Now, if we blindly allow users to set -mem-path by exposing it in the domain XML all bets are off.

However, if Kove's kernel module created a tmpfs-like FS (just as hugetlbfs does), libvirt could detect it on startup, and then no path would need to be exposed, since libvirt already puts all the files under one directory.

Anyway, it'd be great if you could join the upstream discussion:

https://www.redhat.com/archives/libvir-list/2017-September/msg00089.html

--- Additional comment from Eduardo Habkost on 2017-10-19 15:21:16 EDT ---

@Zack and Kove team:

Do you have plans for management UI changes to support the new features?  Do we need Nova and/or RHEV BZs too?

--- Additional comment from Michal Privoznik on 2017-10-23 11:45:46 EDT ---

After some discussion upstream, I think we finally have a clear consensus on the design. So I've implemented it:

https://www.redhat.com/archives/libvir-list/2017-October/msg01063.html

--- Additional comment from Michal Privoznik on 2017-10-24 07:42:41 EDT ---

Another attempt:

https://www.redhat.com/archives/libvir-list/2017-October/msg01091.html

--- Additional comment from Michal Privoznik on 2017-11-09 10:07:59 EST ---

To POST:

http://post-office.corp.redhat.com/archives/rhvirt-patches//2017-November/msg00237.html

--- Additional comment from Luyao Huang on 2018-01-11 02:57:40 EST ---

I found a problem when trying to verify this bug:

1. make guest use file as memory backend

  <memoryBacking>
    <source type='file'/>
    <access mode='shared'/>
  </memoryBacking>

2. start guest:
# virsh start vm1
Domain vm1 started

3. check the memory backing file:

# ll /var/lib/libvirt/qemu/ram/libvirt/qemu/12-vm1/
total 346948
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:39 ram-node0
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:39 ram-node1
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:39 ram-node2
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:39 ram-node3

4. attach a memory device:

# cat mem.xml 
  <memory model='dimm' access='private'>
    <target>
      <size unit='MiB'>256</size>
      <node>0</node>
    </target>
  </memory>


# virsh attach-device vm1 mem.xml 
Device attached successfully

# ll /var/lib/libvirt/qemu/ram/libvirt/qemu/12-vm1/
total 373712
-rw-r--r--. 1 qemu qemu 268435456 Jan 11 01:41 dimm0
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:41 ram-node0
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:41 ram-node1
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:41 ram-node2
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:41 ram-node3

5. detach this memory device:

# virsh detach-device vm1 mem.xml 
Device detached successfully

# ll /var/lib/libvirt/qemu/ram/libvirt/qemu/12-vm1/
total 388616
-rw-r--r--. 1 qemu qemu 268435456 Jan 11 01:41 dimm0
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:41 ram-node0
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:41 ram-node1
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:41 ram-node2
-rw-r--r--. 1 qemu qemu 524288000 Jan 11 01:41 ram-node3

6. attach a memory device whose size is bigger than the first one:

# cat mem2.xml 
  <memory model='dimm' access='private'>
    <target>
      <size unit='MiB'>512</size>
      <node>0</node>
    </target>
  </memory>

# virsh attach-device vm1 mem2.xml 
error: Failed to attach device from mem2.xml
error: internal error: unable to execute QEMU command 'object-add': backing store (null) size 0x10000000 does not match 'size' option 0x20000000


You can see that on attach->detach->attach, libvirt reuses the same name, dimm0, and if the new memory device is bigger than the one attached the first time, qemu rejects the attach request.

Since Kove will manage the files in its virtual filesystem, they may create the dimm memory backing file and delete it after the device is detached, in which case this problem won't happen on a Kove system.

Hi Michal, could you please check whether this is a bug? Thanks in advance for your reply!
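
For reference, the two hexadecimal sizes in the error from step 6 decode as follows. This is plain arithmetic in Python; the variable names are only for this example.

```python
MIB = 1024 * 1024

# Size of the leftover dimm0 file, as reported in the qemu error.
stale_file_bytes = 0x10000000
# Size requested for the second DIMM in mem2.xml.
requested_bytes = 0x20000000

stale_mib = stale_file_bytes // MIB
requested_mib = requested_bytes // MIB

print(stale_file_bytes, "bytes =", stale_mib, "MiB")
print(requested_bytes, "bytes =", requested_mib, "MiB")
```

The first value matches the 268435456-byte dimm0 file that is still listed after the detach in step 5, and the second matches the 512 MiB device in mem2.xml.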

--- Additional comment from Michal Privoznik on 2018-01-11 04:58:35 EST ---

(In reply to Luyao Huang from comment #12)
>
> # virsh attach-device vm1 mem2.xml 
> error: Failed to attach device from mem2.xml
> error: internal error: unable to execute QEMU command 'object-add': backing
> store (null) size 0x10000000 does not match 'size' option 0x20000000
> 

This is because neither qemu nor libvirt unlinks the file after the first detach, so it is left lying around. Then, when you hotplug again with a different size, we tell qemu the new size on the monitor, but the file itself is left untouched, and this confuses qemu. I'm not quite sure who should unlink the file: libvirt, or qemu (which creates the file in the first place). Let me discuss with the qemu developers and get back to you (not clearing the needinfo flag for now).

--- Additional comment from Michal Privoznik on 2018-01-11 07:27:28 EST ---

So after some IRC discussion I came to the conclusion that it'd be best if libvirt removed the file on hot unplug. I've proposed the patch here:

https://www.redhat.com/archives/libvir-list/2018-January/msg00350.html

However, I'm not quite sure whether this fits properly into Kove's use case. Zack, can you please take a look?
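
The effect of removing the file on hot unplug can be modelled with a short Python simulation. It is a toy model under stated assumptions: attach, detach and replay are hypothetical functions that imitate the size check seen in the error above using sparse files in a temporary directory; they are not libvirt or qemu code.

```python
import os
import tempfile

MIB = 1024 * 1024


def attach(path, size_bytes):
    """Create the backing file, or reject a leftover of the wrong size."""
    if os.path.exists(path):
        existing = os.path.getsize(path)
        if existing != size_bytes:
            raise ValueError(
                f"backing store size {existing:#x} does not match "
                f"'size' option {size_bytes:#x}"
            )
    else:
        with open(path, "wb") as backing:
            backing.truncate(size_bytes)  # sparse file of the given size


def detach(path, unlink_on_unplug):
    """Detach the device, removing the file only if asked to."""
    if unlink_on_unplug and os.path.exists(path):
        os.unlink(path)


def replay(unlink_on_unplug):
    """Replay attach 256 MiB, detach, attach 512 MiB; True on success."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dimm0")
        attach(path, 256 * MIB)
        detach(path, unlink_on_unplug)
        try:
            attach(path, 512 * MIB)
        except ValueError:
            return False
        return True


without_unlink = replay(unlink_on_unplug=False)
with_unlink = replay(unlink_on_unplug=True)
print(without_unlink, with_unlink)
```

Without the unlink, the second attach trips over the stale 256 MiB file; with it, the file is recreated at the new size.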

--- Additional comment from Zack Cornelius on 2018-01-11 11:56:47 EST ---

This patch as proposed will work for Kove's use cases.

--- Additional comment from Luyao Huang on 2018-02-01 03:53:22 EST ---

According to comment 14, one more patch needs to be backported to fix the issue in comment 12. Moving this bug's status to ASSIGNED.

--- Additional comment from Michal Privoznik on 2018-02-01 09:44:16 EST ---

V2:

https://www.redhat.com/archives/libvir-list/2018-February/msg00051.html

--- Additional comment from Michal Privoznik on 2018-02-01 09:44:59 EST ---

(In reply to Michal Privoznik from comment #17)
> V2:
> 
> https://www.redhat.com/archives/libvir-list/2018-February/msg00051.html

Ah, sorry. Updated wrong bug. Ignore that comment please.

--- Additional comment from Michal Privoznik on 2018-02-02 05:16:30 EST ---

To POST:

http://post-office.corp.redhat.com/archives/rhvirt-patches/2018-February/msg00050.html

Comment 2 Matthew Booth 2018-05-25 09:36:10 UTC
*** Bug 1578831 has been marked as a duplicate of this bug. ***

Comment 4 Artom Lifshitz 2018-05-31 10:59:07 UTC
*** Bug 1578831 has been marked as a duplicate of this bug. ***

Comment 10 errata-xmlrpc 2019-01-11 11:48:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045

