Created attachment 778295 [details]
'virsh dumpxml <domain>' output

Description of problem:
I have a VM (10 vcpus, 20GB RAM, 60GB disk on a qcow2 file), and I'm snapshotting it with 'virsh snapshot-create <name>'. This is painfully slow, taking about 10 to 20 minutes every time. Running 'dstat 10' while the snapshot is being created paints the following picture:

% dstat 10
You did not select any stats, using -cdngy by default.
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
 67   1  32   0   0   0| 58k 108k| 0 0 |2059B 7088B| 18k 2288
  2   8  82   7   0   1|8859k 270M|2719B 99B|7955k 38M|6223 52k
  2   7  80  11   0   1|1060k 360M|3098B 69B| 560k 56M|5706 41k
  2   6  85   7   0   1|4275k 254M|5316B 557B|2766k 15M|4995 40k
  2   8  81   8   0   1|1966k 359M|2400B 333B| 283k 36M|5778 58k
  3   7  75  14   0   1|4415k 414M|2611B 126B|1541k 15M|5151 41k
  2   2  87   9   0   0| 90M 206M|4559B 46B| 89M 5475k|6370 16k
  2   3  88   6   0   0| 12M 88M|3451B 70B| 11M 3674k|5707 19k
  1   1  90   7   0   0| 331k 14k|2982B 69B|6554B 0 |5066 13k
  1   1  90   7   0   0|1166k 14k|4925B 368B| 53k 0 |5177 13k
  2   2  88   7   0   0| 78k 5734B|4141B 342B| 44k 0 |5763 14k
  1   1  90   7   0   0| 100k 52k|5973B 163B| 26k 0 |5061 13k
  1   1  89   8   0   0|4872k 438k|6188B 46B|4546k 0 |5410 14k
  1   1  90   7   0   0| 258k 263k|2517B 125B| 96k 0 |5133 13k
  1   1  89   7   0   0|1420k 118k|3099B 69B| 269k 0 |5401 14k
  1   2  89   7   0   0|3016k 184k|6393B 836B| 164k 0 |5355 14k
  1   1  90   7   0   0| 58k 119k|2505B 315B| 13k 0 |5063 13k
  1   1  90   7   0   0| 556k 45k|3001B 126B| 165k 0 |5075 13k
  1   1  90   7   0   0| 19k 79k|4634B 64B| 0 0 |5082 13k
  2   2  89   7   0   0| 22k 138k|6942B 70B|3277B 0 |5462 14k
  1   1  90   7   0   0|6554B 246k|4795B 69B| 0 0 |5216 13k
  1   2  89   7   0   0| 21k 726k|7048B 368B|3277B 0 |5313 14k
  1   1  90   7   0   0| 80k 378k|6377B 534B| 13k 0 |5267 13k
  1   1  90   7   0   0| 555k 297k|4726B 320B| 81k 0 |5157 13k
  1   2  89   7   0   0| 48k 176k|4960B 98B| 0 0 |5415 14k
  1   1  90   7   0   0| 11k 126k|3119B 70B|3277B 0 |5100 13k
  2   1  89   7   0   0| 11k 514k|5188B 69B| 0 0 |5282 14k
  1   1  90   7   0   0| 377k 448k|7469B 830B| 74k 0 |5241 13k
  1   1  90   7   0   0| 0 124k|4237B 342B| 0 0 |4941 13k
  4   2  86   7   0   0| 0 628k|7080B 108B| 0 0 |5921 13k
  1   1  90   7   0   0| 0 104k|8736B 59B| 0 0 |4977 13k
  1   1  90   7   0   0|6554B 92k|2491B 110B| 0 0 |5032 13k
  1   1  90   7   0   0| 0 348k|5690B 168B| 0 0 |5169 14k
  1   2  89   7   0   0| 0 162k|7498B 457B| 0 0 |5318 14k
  1   2  89   7   0   0|1016k 254k|3821B 415B| 26k 0 |5368 14k
  2   1  89   7   0   0| 168k 162k|3944B 163B| 45k 0 |5306 13k
  1   1  90   7   0   0|6554B 52k|4624B 46B| 0 0 |5074 13k
  2   2  89   7   0   0| 11k 446k|2647B 88B| 0 0 |5405 14k
  1   1  90   7   0   0|9830B 132k|2863B 69B|9830B 0 |5348 14k
  1   1  89   7   0   0|9830B 321k|4570B 465B|3277B 0 |5365 14k
  3   2  87   7   0   0| 766k 59k|9958B 893B| 131k 0 |5705 13k
  2   1  90   7   0   0| 13k 137k|3422B 108B| 0 0 |5249 13k
  1   1  90   7   0   0|6554B 47k|6027B 118B| 0 0 |5057 13k
  1   1  90   7   0   0|6554B 40k|3115B 70B| 0 0 |5280 14k
  2   2  89   7   0   0| 536k 1020k|4818B 1285B| 13k 0 |5677 14k
  1   1  90   7   0   0|6554B 93k|5441B 592B| 0 0 |5147 13k
  1   1  90   7   0   0| 47k 222k|4959B 336B|3277B 0 |5293 14k
  1   1  90   7   0   0| 19k 370k|2989B 121B| 0 0 |5137 13k
  1   1  90   7   0   0| 0 318k|5348B 46B| 0 0 |5069 13k
  1   1  90   7   0   0| 813k 60k|3161B 70B| 144k 0 |5251 13k
  1   1  90   7   0   0|6963B 244k|3018B 87B| 0 0 |5065 13k
  2   2  89   7   0   0| 506k 646k|7041B 1968B| 210k 0 |5507 14k
  1   1  90   7   0   0|9830B 682k|2697B 166B|3277B 0 |5236 13k
  2   2  88   7   0   0|1208k 1248k| 43k 1388B| 595k 0 |5751 14k
  3   3  86   7   0   0| 460k 66k| 195k 13k| 153k 0 |6704 16k
  2   1  89   7   0   0| 13k 1201k|6810B 354B| 0 0 |5399 14k
  1   1  90   7   0   0|7373B 212k|3954B 94B| 0 0 |5104 13k
  1   1  90   7   0   0| 13k 115k|6112B 851B| 0 0 |5243 13k
  1   2  89   7   0   0| 13k 31k|2880B 60B| 0 0 |5416 14k
  2   2  88   7   0   0| 29k 1171k|2869B 465B| 19k 0 |5794 14k
  1   1  90   7   0   0| 19k 68k|4226B 92B| 0 0 |5207 13k
  2   2  88   7   0   0| 112k 14k|4070B 871B|6554B 0 |5980 15k
  3   2  87   7   0   0| 277k 1176k| 28k 6770B| 116k 0 |6382 16k
  3   3  87   7   0   0| 130k 40k|4659B 696B| 38k 0 |6118 15k
  4   2  86   7   0   0| 78k 1207k| 30k 14k| 16k 0 |6414 16k
  3   2  87   7   0   0| 19k 120k|2558B 108B| 0 0 |6088 16k
  2   2  88   7   0   0| 751k 1205k|4636B 467B| 252k 0 |5880 14k
  3   3  87   7   0   0| 486k 24k|2603B 164B| 288k 0 |5884 14k
  2   2  89   7   0   0| 664k 22k|8793B 1057B| 120k 0 |5454 14k
  2   2  89   7   0   0| 224k 61k|8581B 2907B|9421B 0 |5440 14k
  2   2  89   7   0   0| 16k 47k|2560B 0 |3277B 0 |5353 14k
  2   2  87   7   0   0| 312k 1196k|3693B 1077B| 56k 0 |5806 14k
  2   1  88   7   0   0| 642k 676k|6162B 1130B| 406k 0 |5668 14k
  3   3  87   7   0   0|1321k 678k|4803B 872B| 590k 0 |6276 16k
  3   3  86   7   0   0|1088k 1234k|8607B 429B| 202k 0 |6487 16k
  3   2  86   7   0   0| 148k 600k|9696B 2836B| 139k 0 |6080 15k
  5   4  84   7   0   0| 560k 1842k|8201B 2219B| 228k 0 |6474 15k
  2   1  89   7   0   0| 39k 603k|2793B 272B| 26k 0 |5261 13k
  2   1  89   7   0   0| 13k 618k|4935B 0 | 0 0 |5365 13k
  2   3  87   7   0   0| 406k 1174k| 15k 1474B| 68k 0 |6017 15k
  1   1  91   7   0   0| 13k 126k|2467B 69B| 0 0 |5080 13k

For the first minute qemu writes at 250-450 MB/s, and then for the next 10 minutes it does nothing noticeable. qemu CPU usage is below 20%, and it's not using the disk heavily. Maybe it's not a bug and it's supposed to be like that, but it seems excessive. I'll be happy to provide more information if necessary, but please point me in the right direction.

Version-Release number of selected component (if applicable):
qemu-kvm-1.4.2-4.fc19.x86_64
libvirt-1.0.5.2-1.fc19.x86_64

How reproducible:
Every time I make a snapshot, maybe 10 times so far.

Steps to Reproduce:
1. sudo virsh --connect qemu:///system snapshot-create schlock

Actual results:
> 10 minutes

Expected results:
Since the machine has 20GB memory, and the disk can write roughly 0.5GB/s, I'd expect the snapshot to be finished in 40s.

Additional info:
dumpxml in attachment.
This behavior looks like it's controlled by qemu, so I'm changing the component.
Internal snapshots are known to be slow; upstream qemu is working on adding new ways to do snapshots so that internal snapshots are no longer a bottleneck. However, if you want fast snapshots, I _highly_ recommend using external snapshots instead of internal, as those are lightning fast in comparison, and you don't have to wait for a new qemu.
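For reference, an external snapshot capturing both memory and disk state can be taken in a single snapshot-create-as call. The following is only a hedged sketch: "schlock" is the domain from this report, but the snapshot name "snap1" and both file paths are illustrative assumptions. The command is printed rather than executed, since running it needs a live qemu:///system connection.

```shell
# Hedged sketch: one-shot external snapshot (RAM + disk) of the
# reporter's "schlock" domain. "snap1" and the paths are illustrative.
DOM=schlock
CMD="virsh snapshot-create-as $DOM snap1 \
  --memspec file=/var/lib/libvirt/images/$DOM.mem.snap1,snapshot=external \
  --diskspec vda,snapshot=external,file=/var/lib/libvirt/images/$DOM.disk.snap1"
# Print the command instead of running it (needs a live libvirt host):
echo "$CMD"
```

Under the hood, --memspec and --diskspec effectively build the same <domainsnapshot> XML that snapshot-create accepts via --xmlfile.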
Thank you for the quick answers.

> However, if you want fast snapshots, I _highly_ recommend using external
> snapshots instead of internal, as those are lightning fast in comparison

I thought that external snapshots could only be used with --disk-only. I need to keep the whole memory state.
(In reply to Zbigniew Jędrzejewski-Szmek from comment #3)
> I thought that external snapshots cannot be used as --disk-only only. I need
> to keep the whole memory state.

Saving memory state has been possible in external snapshots since libvirt 1.0.1.
Yes, I recently did a small test where even the memory state is captured with external snapshots. Here is the info:

External system checkpoint snapshot: the guest's disk state will be saved in one file, and its RAM & device state will be saved in another file.

Version Info
============
$ uname -r ; rpm -q qemu-kvm libvirt libguestfs
3.9.4-301.fc19.x86_64
qemu-kvm-1.5.0-4.fc19.x86_64
libvirt-1.0.5.1-1.fc19.x86_64
libguestfs-1.21.38-1.fc19.x86_64

Snapshot creation
=================
1/ Start the guest:

$ virsh start fed18
Domain fed18 started

2/ List its block device in use:

$ virsh domblklist fed18
Target     Source
------------------------------------------------
vda        /var/lib/libvirt/images/fed18.qcow2

3/ Make an XML file for the snapshot:

$ cat /var/tmp/ext-disk-ram-snap.xml
<domainsnapshot>
  <memory snapshot='external' file='/var/lib/libvirt/images/fed18.mem.snap2'/>
  <disks>
    <disk name='vda'>
      <source file='/var/lib/libvirt/images/fed18.disk.snap2'/>
    </disk>
  </disks>
</domainsnapshot>

4/ Create the snapshot (disk & memory state) with the above XML file of the running guest:

$ virsh snapshot-create fed18 \
    --xmlfile /var/tmp/ext-disk-ram-snap.xml --atomic
Domain snapshot 1370930820 created from '/var/tmp/ext-disk-ram-snap.xml'

5/ List the snapshots:

$ virsh snapshot-list fed18
 Name                 Creation Time             State
 ------------------------------------------------------------
 1370930820           2013-06-11 11:37:00 +0530 running

6/ Again, list the block device in use (note: the disk image file in use is the new qcow2 file we specified in the snapshot XML above):

$ virsh domblklist fed18
Target     Source
------------------------------------------------
vda        /var/lib/libvirt/images/fed18.disk.snap2

7/ Find more information about the disk image (it can be seen that qcow2 overlays are being used):

$ qemu-img info --backing-chain /var/lib/libvirt/images/fed18.disk.snap2
image: /var/lib/libvirt/images/fed18.disk.snap2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 196K
cluster_size: 65536
backing file: /var/lib/libvirt/images/fed18.qcow2
backing file format: qcow2

image: /var/lib/libvirt/images/fed18.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 1.0G
cluster_size: 65536
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         1370847718                0 2013-06-10 12:31:58   00:00:00.000

8/ List both files which have the snapshotted data:

$ file /var/lib/libvirt/images/fed18.mem.snap2 \
    /var/lib/libvirt/images/fed18.disk.snap2
/var/lib/libvirt/images/fed18.mem.snap2:  data
/var/lib/libvirt/images/fed18.disk.snap2: QEMU QCOW Image (v2), has backing file (path /var/lib/libvirt/images/fed18.qcow2), 21474836480 bytes
Eric, Kashyap, thanks. Indeed, external snapshotting is super fast, and restoring is fast too.

But I have a problem when I try to make subsequent snapshots: when I make a second snapshot, everything seems to work, but snapshot-revert restores the *first* snapshot, completely ignoring the second one, to which it was told to revert.

I created a new ext-disk-ram-snap.xml with modified paths:

<domainsnapshot>
  <memory snapshot='external' file='/var/lib/libvirt/images/fedoratestday64.mem.snap4'/>
  <disks>
    <disk name='vda'>
      <source file='/var/lib/libvirt/images/fedoratestday64.disk.snap4'/>
    </disk>
  </disks>
</domainsnapshot>

and created the snapshot:

virsh$ snapshot-create fedoratestday64 --xmlfile /var/tmp/ext-disk-ram-snap.xml --atomic

and restored the VM to it:

virsh$ snapshot-revert fedoratestday <snapshot2>

and it returns me to <snapshot1>. Looking at the XML of snapshot2:

<domainsnapshot>
  <name>1374774685</name>
  <state>running</state>
  <parent>
    <name>1374774274</name>
  </parent>
  <creationTime>1374774685</creationTime>
  <memory snapshot='external' file='/var/lib/libvirt/images/fedoratestday64.mem.snap4'/>
  <disks>
    <disk name='vda' snapshot='external'>
      <driver type='qcow2'/>
      <source file='/var/lib/libvirt/images/fedoratestday64.disk.snap4'/>
    </disk>
    <disk name='hdc' snapshot='no'/>
  </disks>
  <domain type='kvm'>
    <name>fedoratestday64</name>
    <uuid>f1441071-9f40-68fc-13da-3f8355cbfc7b</uuid>
    <memory unit='KiB'>2097152</memory>
    <currentMemory unit='KiB'>2097152</currentMemory>
    <vcpu placement='static'>2</vcpu>
    <resource>
      <partition>/machine</partition>
    </resource>
    <os>
      <type arch='x86_64' machine='pc-i440fx-1.4'>hvm</type>
      <boot dev='hd'/>
    </os>
    <features>
      <acpi/>
      <apic/>
      <pae/>
    </features>
    <clock offset='utc'/>
    <on_poweroff>destroy</on_poweroff>
    <on_reboot>restart</on_reboot>
    <on_crash>restart</on_crash>
    <devices>
      <emulator>/usr/bin/qemu-kvm</emulator>
      <disk type='file' device='disk'>
        <driver name='qemu' type='qcow2'/>
        <source file='/var/lib/libvirt/images/fedoratestday64.disk.snap2'/>
        <target dev='vda' bus='virtio'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
      </disk>
      <disk type='file' device='cdrom'>
        <driver name='qemu' type='raw'/>
        <source file='/home/zbyszek/images/testday-20130530-x86_64.iso'/>
        <target dev='hdc' bus='ide'/>
        <readonly/>
        <address type='drive' controller='0' bus='1' target='0' unit='0'/>
      </disk>
      <controller type='ide' index='0'>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
      </controller>
      <controller type='virtio-serial' index='0'>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
      </controller>
      <controller type='usb' index='0'/>
      <controller type='pci' index='0' model='pci-root'/>
      <interface type='network'>
        <mac address='52:54:00:85:40:50'/>
        <source network='default'/>
        <model type='virtio'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
      </interface>
      <serial type='pty'>
        <target port='0'/>
      </serial>
      <console type='pty'>
        <target type='serial' port='0'/>
      </console>
      <channel type='spicevmc'>
        <target type='virtio' name='com.redhat.spice.0'/>
        <address type='virtio-serial' controller='0' bus='0' port='1'/>
      </channel>
      <input type='tablet' bus='usb'/>
      <input type='mouse' bus='ps2'/>
      <graphics type='spice' autoport='yes' listen='127.0.0.1'>
        <listen type='address' address='127.0.0.1'/>
      </graphics>
      <sound model='ich6'>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
      </sound>
      <video>
        <model type='qxl' ram='65536' vram='65536' heads='1'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
      </video>
      <memballoon model='virtio'>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
      </memballoon>
    </devices>
    <seclabel type='dynamic' model='selinux' relabel='yes'/>
  </domain>
</domainsnapshot>

Both /var/lib/libvirt/images/fedoratestday64.disk.snap2 and /var/lib/libvirt/images/fedoratestday64.disk.snap4 are there; is .snap4 in the wrong place?
Reverting to external snapshots is still under development in libvirt (see for example bug 987719); if you are trying to revert to an external snapshot, please ask on the libvirt-users list for the manual steps that cover the pieces not yet in place in libvirt proper.
Unfortunately, bug 987719 isn't public, but the gist of the bug summary is "support deletion of external snapshots". Eric, is there anywhere on the libvirt.org wiki with the manual steps?
Should this be moved to the libvirt upstream product then?
(In reply to Paolo Bonzini from comment #9)
> Should this be moved to the libvirt upstream product then?

Libvirt already has bugs tracking the need to implement snapshot-delete/revert for external snapshots, which is a secondary issue. This bug's primary issue is still about the slowness of qemu's internal snapshots, some of which may be addressed by pending upstream work for qemu 1.6 (or maybe 1.7, as we have already entered soft freeze for 1.6). I've retitled the bug accordingly.

(In reply to Dave Allan from comment #8)
> Unfortunately, bug 987719 isn't public, but the gist of the bug summary is
> "support deletion of external snapshots". Eric, is there anywhere on the
> libvirt.org wiki with the manual steps?

I'll double check that (which probably means adding such a page), and post a URL back when I'm happy with the result.
I've started a wiki page; I still need to work on it more, but as it is a wiki, hopefully it will be something we can make more useful. http://wiki.libvirt.org/page/I_created_an_external_snapshot%2C_but_libvirt_won%27t_let_me_delete_or_revert_to_it
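Until libvirt grows proper support, the rough shape of a manual revert to the external checkpoint from comment 6 looks like the following. Treat this as an assumption-laden sketch, not an exact recipe: it relies on the memory file of an external checkpoint being in the same format as a 'virsh save' image (so 'virsh restore' can resume from it), the domain and file names come from comment 6, and the commands are printed rather than executed since they need a live libvirt host.

```shell
# Hedged sketch: manually reverting to the external snapshot whose RAM
# state lives in mem.snap4 (names from comment 6). Printed, not run.
DOM=fedoratestday64
MEM=/var/lib/libvirt/images/$DOM.mem.snap4

STOP="virsh destroy $DOM"   # discard the guest's current run-state
REVERT="virsh restore $MEM" # resume the guest from the saved RAM image;
                            # the save image references the disk overlay
                            # that was current when the snapshot was taken
echo "$STOP"
echo "$REVERT"
```

Note that resuming this way writes into an image that later overlays may depend on; the safer variant is to first create a fresh qcow2 overlay on top of the snapshot-time image (qemu-img create -f qcow2 -b <base> <new-overlay>) and pass an edited domain XML via 'virsh restore --xml'. The wiki page is the place for those details.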
To avoid any scary backports, let's just track this against F20; that's where we have the virt-manager support as well. F20/qemu 1.6 is much better here, but there's apparently an issue with slowness after the first internal snapshot. Patch here:

https://lists.gnu.org/archive/html/qemu-devel/2013-09/msg02353.html
qemu-1.6.0-9.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/qemu-1.6.0-9.fc20
Package qemu-1.6.0-9.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.

Update it with:

# su -c 'yum update --enablerepo=updates-testing qemu-1.6.0-9.fc20'

as soon as you are able to. Please go to the following url:

https://admin.fedoraproject.org/updates/FEDORA-2013-18471/qemu-1.6.0-9.fc20

then log in and leave karma (feedback).
qemu-1.6.0-10.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/qemu-1.6.0-10.fc20
qemu-1.6.0-10.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.