Hi,

Description of problem:

"virsh save/restore", used to save a VM's memory state to a checkpoint file and restore from it, worked all the time until we tried it on a large-memory VM with 16GB. The error from the command line is:

[root@node5 resumeTest]# virsh restore checkpoint
error: Failed to restore domain from checkpoint
error: operation failed: failed to start VM

The qemu log shows an error:

cat: write error: Broken pipe

The libvirtd log shows:

11:25:27.030: error : internal error Timed out while reading monitor startup output
11:25:27.030: error : internal error unable to start guest: char device redirected to /dev/pts/3
11:25:27.031: error : operation failed: failed to start VM

How to reproduce this?

On a RHEL-5.5 host, start a 64-bit VM (RHEL 5 u5) with 4 CPUs and 16GB of RAM. The key is to actually use the memory; otherwise the checkpoint file stays small and restore still works. The C program below makes sure the memory is allocated and written:

[root@vrstorm ~]# cat largeAllocate.c
#include <stdio.h>
#include <stdlib.h>

//#define SIZE 2000000000  /* 2G longs * 8 bytes = 16GB */
#define SIZE 1500000000    /* 1.5G longs * 8 bytes = 12GB */

int main(void)
{
    printf("hello, large memory\n");

    long *pts = malloc(sizeof(long) * SIZE);  /* 8 bytes per long */
    if (pts == NULL) {
        perror("malloc");
        return 1;
    }

    /* Write every element so the pages are actually dirtied. */
    long i;
    for (i = 0; i < SIZE; i++)
        pts[i] = i;

    /* Dead loop: keep the memory in use until the VM is saved. */
    printf("entering dead loop\n");
    while (1)
        ;

    free(pts);  /* not reached */
    return 0;
}

Build this with gcc on the VM and run it. It will consume 12GB of memory (check with top or free). Then, on the RHEL-5.5 host, issue:

virsh save <vmname> checkpoint

This takes a while, and the resulting checkpoint file occupies 12GB of disk space. Now restore it with "virsh restore checkpoint" and you will see the error reported above.

Version-Release number of selected component (if applicable):

kvm-tools-83-164.el5_5.15
kvm-qemu-img-83-164.el5_5.15
etherboot-zroms-kvm-5.4.4-13.el5
libvirt-python-0.6.3-33.el5_5.3
kvm-83-164.el5_5.23
libvirt-0.6.3-33.el5_5.3
kmod-kvm-83-164.el5_5.15

Steps to Reproduce:
As described in the problem description above.

Actual results:
virsh restore fails with the error mentioned above.

Expected results:
virsh restore should not fail with a 16GB (RAM) guest.

--Humble
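One note on the reproducer: malloc() alone would not trigger this, because Linux overcommits memory and a page only becomes resident (and thus ends up in the checkpoint file) once it is written; that is why the loop above touches every element. A variant that touches one byte per page dirties the same ~12GB footprint much faster. This sketch is only illustrative and is not the code from the original report:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BYTES (12UL * 1024 * 1024 * 1024)  /* ~12GB, matching the reproducer */

int main(void)
{
    char *p = malloc(BYTES);
    if (p == NULL) {
        perror("malloc");
        return 1;
    }

    /* One write per page is enough to make the whole page resident. */
    long page = sysconf(_SC_PAGESIZE);
    unsigned long off;
    for (off = 0; off < BYTES; off += (unsigned long)page)
        p[off] = 1;

    printf("dirtied %lu bytes, entering dead loop\n", BYTES);
    for (;;)
        pause();  /* keep the memory resident until "virsh save" is run */
}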
Created attachment 450994 [details]
Qemu driver timeout patch

This patch increases the timeout value.
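The attachment carries the real change; as a rough illustration of the logic being tuned, here is a minimal, self-contained sketch of a poll()-based startup-output reader whose timeout has been raised. All names and values are hypothetical, not libvirt's actual identifiers:

/* Illustrative sketch only: a read loop that gives up after a timeout,
 * similar in spirit to the code behind "Timed out while reading monitor
 * startup output".  Raising MONITOR_TIMEOUT_MS is the essence of the fix. */
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical values: a short timeout can expire before a guest that is
 * loading a large state file gets its monitor up; the fix is to raise it. */
#define MONITOR_TIMEOUT_MS (30 * 1000)   /* e.g. raised from 3 * 1000 */

/* Read from fd until 'expect' appears in the output or poll() times out.
 * Returns 0 on success, -1 on timeout or error. */
static int read_startup_output(int fd, const char *expect)
{
    char buf[4096];
    size_t got = 0;

    while (got < sizeof(buf) - 1) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int rc = poll(&pfd, 1, MONITOR_TIMEOUT_MS);  /* per-chunk timeout */

        if (rc == 0) {
            fprintf(stderr, "Timed out while reading monitor startup output\n");
            return -1;
        }
        if (rc < 0)
            return -1;

        ssize_t n = read(fd, buf + got, sizeof(buf) - 1 - got);
        if (n <= 0)
            return -1;
        got += (size_t)n;
        buf[got] = '\0';

        if (strstr(buf, expect) != NULL)
            return 0;  /* saw the line we were waiting for */
    }
    return -1;  /* buffer full without a match */
}

int main(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return 1;

    /* Nothing ever writes to the pipe, so this demonstrates the timeout. */
    return read_startup_output(fds[0], "char device redirected") == 0 ? 0 : 1;
}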
> Even a guest with 1G that keeps dirtying its pages will create the same
> timeout for you. So the above BZ is not a blocker here.

I can't reproduce this even with a 4GB guest where almost all pages were dirtied. The result was a 4.2GB state file, and I could resume from it without any issues.
> I can't reproduce this even with a 4GB guest where almost all pages were
> dirtied. The result was a 4.2GB state file, and I could resume from it
> without any issues.

Ah, but I was able to reproduce it with the 0.6.3-based libvirt from RHEL-5.5. It seems the rebase fixed this issue. Humble, could you try with the most recent libvirt packages for RHEL-5? The current version is libvirt-0.8.2-12.el5.
(In reply to comment #6)
> > Even a guest with 1G that keeps dirtying its pages will create the same
> > timeout for you. So the above BZ is not a blocker here.
>
> I can't reproduce this even with a 4GB guest where almost all pages were
> dirtied. The result was a 4.2GB state file, and I could resume from it
> without any issues.

My fault: savevm/loadvm is not live migration into a file, so it doesn't matter what the guest does, since it is paused during the save.
I was able to reproduce this bug on RHEL-5.5, and I can confirm that RHEL-5.6 fixes it.