Bug 1003818

Summary: qemu refuses to migrate when host RAM is overcommitted: error while loading state for instance 0x0 of device 'ram'
Product: Red Hat Enterprise Linux 7 Reporter: Xiaoqing Wei <xwei>
Component: qemu-kvm Assignee: Juan Quintela <quintela>
Status: CLOSED NOTABUG QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 7.0 CC: hhuang, juzhang, michen, qzhang, rbalakri, shuang, virt-maint, xwei
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-09-09 17:59:29 UTC Type: Bug

Description Xiaoqing Wei 2013-09-03 09:43:11 UTC
Description of problem:

qemu refuses to migrate when host RAM is overcommitted: error while loading state for instance 0x0 of device 'ram'

Version-Release number of selected component (if applicable):
kernel-3.10.0-12.el7.x86_64
qemu-kvm-1.5.3-2.el7.x86_64
spice-server-0.12.4-1.el7.x86_64
seabios-bin-1.7.2.2-2.el7.noarch
seavgabios-bin-1.7.2.2-2.el7.noarch
sgabios-bin-0.20110622svn-3.el7.noarch
ipxe-roms-qemu-20130517-1.gitc4bce43.el7.noarch

How reproducible:
8/10

Steps to Reproduce:
1. On an 8GB RAM host, boot a 4GB guest (src).
2. Boot an incoming VM as the migration destination (dest) with -incoming tcp:0:5200
3. On src: migrate -d tcp:0:5200

Actual results:
dest:
QEMU 1.5.3 monitor - type 'help' for more information
(qemu) info status
VM status: paused (inmigrate)
(qemu) qemu: warning: error while loading state for instance 0x0 of device 'ram'
load of migration failed


Expected results:
Migration succeeds even when the host has a little RAM overcommitment.

Additional info:

1) host has swap enabled
[root@dhcp-8-232 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          7607       6283       1324          0         42       2106
-/+ buffers/cache:       4133       3474
Swap:         7951       7929         22


2) the same test passes on a RHEL6 host
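
As a rough check of how much room the host had, the `free -m` figures from item 1 can be compared against the size of a second 4GB guest. The sketch below is only an illustration: it assumes the snapshot was taken with the source guest already running, that the incoming guest is the same size as the source (4096 MB), and that the guest eventually touches all of its RAM. None of these assumptions is confirmed in this report.

```python
# Figures copied from the `free -m` output in item 1 (all values in MB).
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:          7607       6283       1324          0         42       2106
-/+ buffers/cache:       4133       3474
Swap:         7951       7929         22
"""

# Assumption: the incoming guest is sized like the 4GB source guest.
GUEST_RAM_MB = 4096


def parse_free(text):
    """Return (ram_available_mb, swap_free_mb) from old-style `free -m` output."""
    ram_available = swap_free = None
    for line in text.splitlines():
        fields = line.split()
        if line.startswith("-/+ buffers/cache:"):
            # Last column: free RAM once buffers and page cache are reclaimed.
            ram_available = int(fields[-1])
        elif line.startswith("Swap:"):
            # Last column: swap space still unused.
            swap_free = int(fields[-1])
    return ram_available, swap_free


def headroom_mb(text, guest_ram_mb=GUEST_RAM_MB):
    """Reclaimable RAM plus free swap, minus the RAM the new guest may need."""
    ram_available, swap_free = parse_free(text)
    return ram_available + swap_free - guest_ram_mb


if __name__ == "__main__":
    print(headroom_mb(FREE_OUTPUT))  # prints -600
```

Under those assumptions the host is about 600 MB short: 3474 MB of reclaimable RAM plus 22 MB of free swap is less than the 4096 MB the destination guest could need.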

Comment 3 Juan Quintela 2015-01-07 10:29:32 UTC
It is not a little RAM overcommit.  You are running two 4GB guests plus the host OS on the same host, which is quite a bit of overcommit.  Just run "free" after launching the source guest.  As for it working on a RHEL6.6 host, that host uses memory differently (apparently differently enough to make this work).  Notice from the free output that you have run out of swap, and without swap you have simply used too much RAM.

Could you do a ps -auxm while migration is happening?  It appears that there is a memory leak somewhere.
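
One way to collect comparable numbers repeatedly while the migration runs is to sample the resident set size of the qemu processes from /proc. The following is a minimal sketch, not part of any qemu tooling: the function names and the "qemu" name filter are hypothetical, and it assumes a Linux host where /proc/<pid>/comm and /proc/<pid>/status are readable.

```python
import os


def parse_vm_rss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line format: "VmRSS:    <number> kB"
            return int(line.split()[1])
    return None  # kernel threads have no VmRSS line


def qemu_rss_snapshot(name_hint="qemu"):
    """Return {pid: rss_kb} for processes whose command name contains name_hint."""
    snapshot = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().strip()
            if name_hint not in comm:
                continue
            with open(f"/proc/{entry}/status") as f:
                rss = parse_vm_rss_kb(f.read())
        except OSError:
            continue  # process exited between listdir() and open()
        if rss is not None:
            snapshot[int(entry)] = rss
    return snapshot
```

Calling qemu_rss_snapshot() every few seconds during the migration and logging the result would show whether the resident size of either qemu process keeps growing beyond the configured guest RAM.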

Comment 4 juzhang 2015-01-08 00:03:31 UTC
Hi Xwei,

Could you review comment3?

Best Regards,
Junyi

Comment 5 Xiaoqing Wei 2015-01-08 06:55:15 UTC
(In reply to Juan Quintela from comment #3)
> 
> Could you do a ps -auxm while migration is happening?  It appears that there
> is a memory leak somewhere.

Will do, but the original host is currently running Windows timer device tests.

I will set up a testbed for this BZ later.

Best Regards,
Xiaoqing Wei.

Comment 6 Xiaoqing Wei 2015-01-15 05:03:59 UTC
Hmm, I tried several rounds of migration but had no luck reproducing it.

I searched my test run history: it happened in a test loop with hundreds of cases. I will resubmit one to see whether it is reproducible now.


Regards,
Xiaoqing.

Comment 8 Juan Quintela 2015-09-09 17:59:29 UTC
It hasn't been reproducible since February, and the reporter didn't send a new submission.