Bug 822712

Summary: qemu-kvm consumes vast amounts of memory during libvirt migration with --copy-storage-all
Product: [Fedora] Fedora
Reporter: Benjamin S. Scarlet <scarlet>
Component: qemu
Assignee: Fedora Virtualization Maintainers <virt-maint>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 16
CC: amit.shah, berrange, cfergeau, crobinso, dwmw2, itamar, juzhang, knoel, pbonzini, scottt.tw, virt-maint
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2013-02-11 16:54:48 EST
Type: Bug

Description Benjamin S. Scarlet 2012-05-17 17:10:19 EDT
Description of problem:
N.B. - I'm using qemu through libvirt, and am only taking an educated guess that this issue is in qemu.

When a VM is migrated along with its storage, the qemu-kvm process on the origin machine consumes memory approximately equal to the size of the storage volumes being migrated. A virtual disk larger than the available RAM can easily push the origin machine into swapping and beyond, bringing the machine to its knees.

Version-Release number of selected component (if applicable):

How reproducible:
Every time

Steps to Reproduce:
1. Create a migration environment: two similar hosts with matching networking setups and identically named LVM volume groups exposed to libvirt as identically named storage pools. Suppose the hosts are named migration-host-1 and migration-host-2.

2. Create identically named volumes in the LVM storage pools on the two hosts. Make them enough larger than the memory a guest would need that the difference is obvious. For more troubling results, make them larger than the available RAM on the hosts; for the most troubling results, make them larger than the available swap on the hosts.

In my case, I've got hosts with 16GiB of RAM and 32GiB of swap, and I'm trying to migrate a guest with a 64GiB LVM volume for its disk.

3. Create a VM on migration-host-1, drawing its storage from the LVM pool. Choose the LVM volume you created in step 2 for the disk of the guest. Install something on the guest and get it running - exactly what shouldn't matter - a minimal install of Fedora 16 should be fine. Name the guest "migration-guest".

Don't bother running anything interesting in the guest - just a running OS sitting idle should be enough to see the issue.

4. Start monitoring the qemu-kvm process for migration-guest on migration-host-1 with top.

5. With the new VM running, attempt to migrate it to the other host. On migration-host-1, execute:
$ virsh migrate --live --persistent --undefinesource --copy-storage-all migration-guest qemu+ssh://root@migration-host-2/system

Actual results:

The virtual memory consumed by the qemu process increases dramatically when the migration starts, by roughly the size of the storage volume being migrated along with the guest. The resident size grows as far as it can, up to the available RAM, and the swap consumed on the machine then starts increasing.
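
As an alternative to watching top by hand, the growth described above could be logged with a small script. This is only a sketch, assuming a Linux host where /proc/<pid>/status exposes the VmSize, VmRSS and VmSwap fields; the function names and the 5-second interval are illustrative choices, not something taken from this report.

```python
import re
import sys
import time


def parse_vm_fields(status_text):
    """Extract VmSize, VmRSS and VmSwap (in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"(VmSize|VmRSS|VmSwap):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def read_vm_fields(pid):
    """Read the memory fields of a running process from procfs."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vm_fields(status_file.read())


# Only poll when a PID is given on the command line, e.g. the PID of the
# qemu-kvm process for migration-guest on migration-host-1.
if __name__ == "__main__" and len(sys.argv) > 1:
    pid = int(sys.argv[1])
    while True:
        print(time.strftime("%H:%M:%S"), read_vm_fields(pid))
        time.sleep(5)  # arbitrary sampling interval (assumption)
```

Started just before the virsh migrate command, the log should show VmSize jumping first, then VmRSS climbing toward the available RAM, then VmSwap growing.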

Expected results:

I'd expect the additional RAM/swap consumed by the qemu process to stay within reasonable bounds: at most something proportional to the amount of data written to the disk by the VM during the migration, with a smallish constant factor, usefully less than 1. I'm very naively imagining something like a hashtable entry for every block touched.
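
To make the "entry for every block touched" idea concrete, here is a minimal sketch of that kind of bookkeeping. It illustrates the expectation only and is not a description of how qemu tracks writes; the class name, method names and the 1 MiB block size are all hypothetical.

```python
BLOCK_SIZE = 1 << 20  # hypothetical tracking granularity: 1 MiB


class DirtyBlockTracker:
    """Remember which blocks the guest wrote to while a copy is in flight."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.dirty = set()  # one entry per touched block, not per byte

    def record_write(self, offset, length):
        """Mark every block overlapped by a write of `length` bytes at `offset`."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        self.dirty.update(range(first, last + 1))

    def drain(self):
        """Return the touched block numbers in order and reset the tracker."""
        blocks = sorted(self.dirty)
        self.dirty.clear()
        return blocks
```

With bookkeeping like this, memory use scales with the number of distinct blocks written during the migration rather than with the size of the volume, so an idle guest on a 64GiB volume would need only a handful of entries.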

Additional info:
Comment 1 Fedora End Of Life 2013-01-16 15:44:11 EST
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
Comment 2 Cole Robinson 2013-02-11 16:54:48 EST
Thanks for the detailed report, Benjamin. The old-style qemu block migration that virsh --copy-storage-all invokes is known to be pretty inefficient, though what you describe sounds much worse than my understanding of it.

In F19, libvirt will be using new qemu functionality to implement --copy-storage-all in a much more performant manner:


However, F16 is going unsupported now, and qemu has changed a lot since then. If you can still reproduce this on F18, please reopen this bug, though I'll probably just move it to the upstream tracker since it's unlikely to be fixed in Fedora.