Bug 1003293 - qemu crash when boot from snapshot image file
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Jeff Cody
QA Contact: Virtualization Bugs
Depends On:
Blocks:
Reported: 2013-09-01 14:26 EDT by xu
Modified: 2013-11-05 14:28 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-05 14:28:06 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
backtrack (5.13 KB, text/plain)
2013-09-01 14:26 EDT, xu
top output when boot with sn2(not crash) (157.99 KB, text/plain)
2013-09-06 02:23 EDT, xu
top output when boot with sn1(crashed) (157.99 KB, text/plain)
2013-09-06 02:23 EDT, xu
ps command output (35.86 KB, application/x-xz)
2013-09-09 23:05 EDT, xu

Description xu 2013-09-01 14:26:46 EDT
Created attachment 792646 [details]
backtrack

Description of problem:

When booting a Windows 7 64-bit guest from a snapshot image, qemu crashes and reports "Failed to allocate 4294967296 B: Cannot allocate memory".


Version-Release number of selected component (if applicable):

qemu-kvm-1.5.3-2.el7.x86_64
kernel-3.10.0-15.el7.x86_64
glibc-2.17-27.el7.x86_64

How reproducible:

about 80%

Steps to Reproduce:
1. boot win7 guest:
/root/test/autotest-devel/client/tests/virt/qemu/qemu \
    -name 'virt-tests-vm1' \
    -nodefaults \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20130902-003659-dpmQSqE4,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=serial_id_serial1,path=/tmp/serial-serial1-20130902-003659-dpmQSqE4,server,nowait \
    -device isa-serial,chardev=serial_id_serial1 \
    -chardev socket,id=seabioslog_id_20130902-003659-dpmQSqE4,path=/tmp/seabios-20130902-003659-dpmQSqE4,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20130902-003659-dpmQSqE4,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=0x4 \
    -drive file='/root/test/autotest-devel/client/tests/virt/shared/data/images/win7-64.qcow2',index=0,if=none,id=drive-ide0-0-0,media=disk,cache=writeback,snapshot=off,format=qcow2,aio=native \
    -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0 \
    -device rtl8139,netdev=idSjgfXF,mac='9a:30:31:32:33:34',bus=pci.0,addr=0x3,id='idQb0e95' \
    -netdev tap,id=idSjgfXF,vhost=on,vhostfd=25,fd=24 \
    -m 4096 \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
    -cpu 'SandyBridge',hv_relaxed \
    -M pc \
    -drive file='/root/test/autotest-devel/client/tests/virt/shared/data/isos/windows/winutils.iso',index=1,if=none,id=drive-ide0-0-1,media=cdrom,format=raw \
    -device ide-drive,bus=ide.0,unit=1,drive=drive-ide0-0-1 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -vga std \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off  \
    -enable-kvm

2. create file in guest:
   D:\coreutils\DummyCMD.exe C:\test\image1 1048576 1
   
3. create live snapshot file sn1
    {"execute": "blockdev-snapshot-sync", "arguments": {"device": "drive-ide0-0-0", "snapshot-file": "/root/test/autotest-devel/client/tests/virt/shared/data/images/sn1.qcow2", "format": "qcow2"}, "id": "xXsGotfc"}

4. create file in guest:
   D:\coreutils\DummyCMD.exe C:\test\sn1 1048576 1
5. create live snapshot file sn2
    {"execute": "blockdev-snapshot-sync", "arguments": {"device": "drive-ide0-0-0", "snapshot-file": "/root/test/autotest-devel/client/tests/virt/shared/data/images/sn2.qcow2", "format": "qcow2"}, "id": "xXsGotfc"}

6. create file in guest:
D:\coreutils\DummyCMD.exe C:\test\sn2 1048576 1

7. shutdown guest

8. boot guest with sn2, and check file in guest, then shutdown guest

9. boot guest with sn1, and check file in guest, then shutdown guest
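
Steps 3 and 5 can also be driven from a script instead of being typed into the monitor. The sketch below is a minimal, unofficial QMP client and is not the autotest implementation: `build_snapshot_cmd` and `run_qmp` are hypothetical helper names, `/path/to/sn1.qcow2` is a placeholder, and the socket path given to `run_qmp` must be whatever was passed to qemu in `-chardev socket,...,path=...`. It assumes the usual QMP handshake, in which the server sends a greeting and the client must issue `qmp_capabilities` before any other command.

```python
import json
import socket

def build_snapshot_cmd(device, snapshot_file, fmt="qcow2"):
    """Build the blockdev-snapshot-sync request used in steps 3 and 5."""
    return {
        "execute": "blockdev-snapshot-sync",
        "arguments": {
            "device": device,
            "snapshot-file": snapshot_file,
            "format": fmt,
        },
    }

def run_qmp(sock_path, command):
    """Send one command over a QMP UNIX socket and return the reply as a dict."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        reader = sock.makefile("r", encoding="utf-8")
        json.loads(reader.readline())  # greeting banner sent by qemu
        for request in ({"execute": "qmp_capabilities"}, command):
            sock.sendall((json.dumps(request) + "\n").encode("utf-8"))
            # Skip asynchronous events; a reply carries "return" or "error".
            while True:
                reply = json.loads(reader.readline())
                if "return" in reply or "error" in reply:
                    break
        return reply

# Placeholder snapshot path; substitute the real image location.
cmd = build_snapshot_cmd("drive-ide0-0-0", "/path/to/sn1.qcow2")
print(json.dumps(cmd))
```

Calling `run_qmp(<monitor socket path>, cmd)` against a running guest would return the QMP reply, which contains an `error` key if the snapshot could not be created.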

Actual results:

qemu crashes at step 8 or step 9

Expected results:

guest works fine

Additional info:

[root@localhost qemu]# cat /proc/meminfo 
MemTotal:        7791860 kB
MemFree:         6955336 kB
Buffers:               0 kB
Cached:           646232 kB
SwapCached:         3780 kB
Active:           394932 kB
Inactive:         258260 kB
Active(anon):       2312 kB
Inactive(anon):     5160 kB
Active(file):     392620 kB
Inactive(file):   253100 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       8142844 kB
SwapFree:        8101444 kB
Dirty:                20 kB
Writeback:             0 kB
AnonPages:          3952 kB
Mapped:             4916 kB
Shmem:               460 kB
Slab:              71192 kB
SReclaimable:      22596 kB
SUnreclaim:        48596 kB
KernelStack:        1464 kB
PageTables:         4520 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    12038772 kB
Committed_AS:     220052 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      357436 kB
VmallocChunk:   34359372444 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      101840 kB
DirectMap2M:     8165376 kB
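
As a rough cross-check of the figures above, the failed request of 4294967296 B corresponds to the guest's `-m 4096` setting and can be compared with the memory the host reported. This is only a sketch: `parse_meminfo` and `SAMPLE` are illustrative names, the sample holds a few lines copied from the output above, and whether an allocation actually succeeds also depends on the kernel's overcommit policy, which is not modelled here.

```python
import re

# A few lines copied from the /proc/meminfo output above.
SAMPLE = """\
MemTotal:        7791860 kB
MemFree:         6955336 kB
SwapFree:        8101444 kB
CommitLimit:    12038772 kB
Committed_AS:     220052 kB
"""

def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (kB for sized fields)."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(\S+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields

info = parse_meminfo(SAMPLE)
request_kb = 4294967296 // 1024          # size from the qemu error message, in kB
guest_mb = 4294967296 // (1024 * 1024)   # guest RAM size in MiB, matching "-m 4096"

print("request:", request_kb, "kB =", guest_mb, "MiB")
print("fits in MemFree:", request_kb <= info["MemFree"])
print("commit headroom:", info["CommitLimit"] - info["Committed_AS"], "kB")
```

With these numbers the request is smaller than MemFree at the moment the output was captured, so this snapshot of /proc/meminfo does not by itself explain the allocation failure.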

cpu info:

...

processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
stepping	: 7
microcode	: 0x25
cpu MHz		: 1666.000
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 3
cpu cores	: 4
apicid		: 7
initial apicid	: 7
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 6784.18
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

See the full backtrace in the attachment.
Comment 2 xu 2013-09-01 14:42:10 EDT
Cannot reproduce this issue with a RHEL 6.4 guest.

Checking memory usage with top on the host shows that when qemu crashes it is using about 51% of memory (win7 x86_64 guest), but it uses only about 10% of memory when the guest boots from the sn1 image (rhel6.4 x86_64 guest);

Thanks,
Xu
Comment 3 xu 2013-09-02 03:55:54 EDT
(In reply to xu from comment #2)

After changing the Windows guest's memory to 2048M, the guest boots from the sn1 image and no crash happens.

It is strange that qemu uses so much memory when booting from a snapshot image file.

Thanks,
Xu
Comment 4 Xiaoqing Wei 2013-09-03 07:05:50 EDT
Tested a migration case with two 4G VMs on an 8G host and hit the same seg fault,

but it works for me after downgrading glibc to:

glibc-common-2.17-4.el7.x86_64
glibc-devel-2.17-4.el7.x86_64
glibc-headers-2.17-4.el7.x86_64
glibc-2.17-4.el7.x86_64
glibc-debuginfo-common-2.17-4.el7.x86_64


Regards,
Xiaoqing.
Comment 5 Xiaoqing Wei 2013-09-03 22:13:43 EDT
(In reply to Xiaoqing Wei from comment #4)

Oops, I have to take this comment back, as more rounds of testing still hit the seg fault

:(
Comment 6 Jeff Cody 2013-09-05 17:12:56 EDT
I've been unable to reproduce this, on qemu-kvm-1.5.3-2.el7.  This is on a machine with a little less than 8GB of RAM, and 8GB of swap.

A couple of questions:

In comment #4, Xiaoqing referenced "seg fault".  I assume that means the same abort that was hit in the backtrace and in the description (and not an actual SEGFAULT)?

Also, in the description /proc/meminfo shows <8GB of RAM, and ~8GB of swap.  What does the qemu memory usage look like while running the guest, with the base, sn1, and sn2?

Is there anything else running on this host (any other qemu instances, etc..)?

Thanks!
Comment 7 xu 2013-09-06 02:22:21 EDT
(In reply to Jeff Cody from comment #6)
> I've been unable to reproduce this, on qemu-kvm-1.5.3-2.el7.  This is on a
> machine with a little less than 8GB of RAM, and 8GB of swap.
> 
> A couple of questions:
> 
> In comment #4, Xiaoqing referenced "seg fault".  I assume that means the
> same abort that was hit in the backtrace and in the description (and not an
> actual SEGFAULT)?
> 
Yes, the backtrace that Xiaoqing mentioned in comment #4 is the same as the one I posted in the attachment.

> Also, in the description /proc/meminfo shows <8GB of RAM, and ~8GB of swap. 
> What does the qemu memory usage look like while running the guest, with the
> base, sn1, and sn2?
No crash with sn2, but a crash with sn1, so I am attaching the 'top' command output for both:

> 
> Is there anything else running on this host (any other qemu instances,
> etc..)?
Only one qemu instance is running on that host; you can check the output of 'top'.
> 
> Thanks!
Comment 8 xu 2013-09-06 02:23:18 EDT
Created attachment 794593 [details]
top output when boot with sn2(not crash)
Comment 9 xu 2013-09-06 02:23:53 EDT
Created attachment 794594 [details]
top output when boot with sn1(crashed)
Comment 10 Jeff Cody 2013-09-06 11:37:44 EDT
Thanks for the attachments.  Unfortunately, the memory usage does not seem to add up, and is not accounted for in the buffers / cache usage, either.

I hate to ask, but could you run it again, this time starting the following command in a different terminal window prior to initiating the qemu process? Let it run throughout the process, and then terminate it (with ^C) after the process is complete:

while [ 1 ]; do ps krsz -e -o pid,vsz,rsz,comm,args=; echo -e "\nfree:"; free; echo -e "\n\n"; sleep 1; done|tee ps_output.txt

Afterwards, could you attach the ps_output.txt for the time it aborts, and when it does not abort (you may want to bzip2 ps_output.txt prior to attaching it)? Also note that command will display the commandline arguments for all processes, so if you have something sensitive on the commandline (e.g. passwords passed by argument, etc..) you may want to censor that info.
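
Once ps_output.txt has been collected, the peak resident size of the qemu process can be pulled out with a few lines of Python. This is a sketch rather than part of the requested procedure: `peak_qemu_rsz` is a hypothetical helper, the sample text is made up purely to show the layout (`pid vsz rsz comm args`) and does not come from this bug, and it assumes the qemu command name starts with "qemu".

```python
def peak_qemu_rsz(lines, prefix="qemu"):
    """Return the largest RSZ (kB) seen for any process whose comm starts with prefix."""
    peak = 0
    for line in lines:
        parts = line.split(None, 4)   # pid, vsz, rsz, comm, args
        if len(parts) < 4:
            continue
        pid, vsz, rsz, comm = parts[:4]
        # Only ps rows have three leading integers; this skips headers and `free` output.
        if pid.isdigit() and vsz.isdigit() and rsz.isdigit() and comm.startswith(prefix):
            peak = max(peak, int(rsz))
    return peak

# Made-up sample in the layout produced by the loop above (values are NOT from this bug).
sample = """\
  PID    VSZ   RSZ COMMAND
 1234 5000000 2100000 qemu /path/to/qemu -name vm1 -m 4096
 1300  115000   1800 bash -bash

free:
             total       used       free
Mem:       7791860    3000000    4791860
 1234 5200000 2600000 qemu /path/to/qemu -name vm1 -m 4096
"""
print(peak_qemu_rsz(sample.splitlines()))
```

For a real capture, passing an open file works the same way, for example `peak_qemu_rsz(open("ps_output.txt"))`.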

Thanks again,
Jeff
Comment 11 juzhang 2013-09-09 01:25:01 EDT
Hi Xu,

Can you have a look at comment 10 and give feedback?
Comment 12 xu 2013-09-09 23:01:46 EDT
(In reply to Jeff Cody from comment #10)

Reproduction steps:

1. Boot the base image, then create the live snapshot chain base -> sn1 -> sn2 (no crash)

'ps' output file: ps_output_make_snapshot_chain.txt

2. Shut down the guest, then boot from sn2 (no crash)

'ps' output file: ps_output_boot_sn2.txt

3. Shut down the guest, then boot from sn1 (no crash)

'ps' output file: ps_output_boot_sn1.txt

4. Shut down the guest, then boot from base (crashed)

'ps' output file: ps_output_boot_base_crashed.txt

Thanks,
Xu
Comment 13 xu 2013-09-09 23:05:06 EDT
Created attachment 795811 [details]
ps command output

Decompress the attachment to get the ps output files:

ps_output.tgz.xz && tar -xzvf  ps_output.tgz
Comment 14 xu 2013-09-09 23:05:48 EDT
(In reply to xu from comment #13)
Correction: the command should be

xz -d ps_output.tgz.xz && tar -xzvf  ps_output.tgz
Comment 15 Jeff Cody 2013-11-05 14:28:06 EST
I have been unable to reproduce this problem.  I believe the issue may be with memory allocation by autotest, and residual qemu instances, rather than a bug with qemu itself.  QEMU is aborting because it is not able to allocate the memory requested.

If you are still able to reproduce the issue outside of autotest, please reopen or create a new bz.
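
When reproducing outside of autotest, one way to rule out residual qemu instances is to list them before starting the guest. The following is a sketch under stated assumptions, not a tool referenced in this bug: `find_processes` is a hypothetical helper, it relies on the Linux `/proc/<pid>/comm` files, and it assumes the leftover processes have a command name beginning with "qemu".

```python
import os

def find_processes(prefix="qemu", proc_root="/proc"):
    """Return sorted (pid, comm) pairs for processes whose command name starts with prefix."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                comm = handle.read().strip()
        except OSError:
            continue  # process exited between listdir() and open()
        if comm.startswith(prefix):
            found.append((int(entry), comm))
    return sorted(found)

# Only scan when a /proc filesystem is present.
if os.path.isdir("/proc"):
    for pid, comm in find_processes():
        print(pid, comm)
```

An empty listing before launching the guest suggests that no earlier qemu process is still holding memory on the host.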
