
Bug 1088155

Summary: qemu exec migration destination qmp monitor takes a long time to be accessed with huge mem
Product: Red Hat Enterprise Linux 6
Reporter: xhan
Component: qemu-kvm
Assignee: Dr. David Alan Gilbert <dgilbert>
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.6
CC: bsarathy, chayang, dgilbert, juzhang, michen, mkenneth, qzhang, rbalakri, shuang, virt-bugs, virt-maint, xuhan
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-01-21 18:13:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
    debug.log (flags: none)
    sosreport.tar.xz (flags: none)

Description xhan 2014-04-16 07:11:32 UTC
Description of problem:

Testing the migration exec.gzip_exec case with autotest on a RHEL 6.6 host (Six-Core AMD Opteron(tm) Processor 2427, 32G memory) always fails.

If the number of CPUs and the amount of memory for the VM are decreased, the test can boot the migration destination successfully without error.

Test records:
   2 cpus, 1024M    PASS
   8 cpus, 4096M    PASS
   8 cpus, 8192M    PASS
   8 cpus, 16384M   FAIL

This issue is hard to reproduce manually, but it always happens with autotest.
The command line for the autotest execution:

python ConfigTest.py --testcase=migrate.default.exec.gzip_exec --guestname=RHEL.6.5 --clone=no --dryrun=no

Host info:
processor	: 11
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 8
model name	: Six-Core AMD Opteron(tm) Processor 2427
stepping	: 0
cpu MHz		: 800.000
cache size	: 512 KB
physical id	: 1
siblings	: 6
core id		: 5
cpu cores	: 6
apicid		: 13
initial apicid	: 13
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save pausefilter
bogomips	: 4399.62
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate


Mem status:

free -m
             total       used       free     shared    buffers     cached
Mem:         32104      14329      17775          0        175      13616
-/+ buffers/cache:        537      31567
Swap:        57343         18      57325



Version-Release number of selected component (if applicable):
kernel-2.6.32-457.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.424.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. start migrate.default.exec.gzip_exec case test with autotest



Actual results:

23:01:01 INFO | [qemu output] (Process terminated with status 0)
23:01:02 ERROR| Test failed: MonitorConnectError: Could not connect to monitor 'qmpmonitor1'
23:01:02 ERROR| /home/manual_test/stable/autotest/client/tests/virt/virttest/video_maker.py:190: DeprecationWarning: gst.Bin.add_many() is deprecated, use gst.Bin.add()
23:01:02 ERROR|   pipeline.add_many(source, decoder, encoder, container, output)
23:01:21 INFO | [qemu output] (Process terminated with status 0)
23:01:22 ERROR| Exception escaping from test:
Traceback (most recent call last):
  File "/home/manual_test/stable/autotest/client/shared/test.py", line 411, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/home/manual_test/stable/autotest/client/shared/test.py", line 830, in _call_test_function
    raise error.UnhandledTestFail(e)
UnhandledTestFail: Unhandled MonitorConnectError: Could not connect to monitor 'qmpmonitor1'    [context: migrating 'virt-tests-vm1' --> creating destination VM --> creating 'virt-tests-vm1']
Traceback (most recent call last):
  File "/home/manual_test/stable/autotest/client/shared/test.py", line 823, in _call_test_function
    return func(*args, **dargs)
  File "/home/manual_test/stable/autotest/client/shared/test.py", line 291, in execute
    postprocess_profiled_run, args, dargs)
  File "/home/manual_test/stable/autotest/client/shared/test.py", line 209, in _call_run_once
    *args, **dargs)
  File "/home/manual_test/stable/autotest/client/shared/test.py", line 313, in run_once_profiling
    self.run_once(*args, **dargs)
  File "/home/manual_test/stable/autotest/client/tests/virt/virt.py", line 139, in run_once
    run_func(self, params, env)
  File "/home/manual_test/stable/autotest/client/tests/virt/qemu/tests/migration.py", line 179, in run_migration
    migration_exec_cmd_dst=mig_exec_cmd_dst)
  File "/home/manual_test/stable/autotest/client/shared/error.py", line 141, in new_fn
    return fn(*args, **kwargs)
  File "/home/manual_test/stable/autotest/client/tests/virt/virttest/qemu_vm.py", line 3063, in migrate
    migration_exec_cmd=migration_exec_cmd_dst)
  File "/home/manual_test/stable/autotest/client/shared/error.py", line 141, in new_fn
    return fn(*args, **kwargs)
  File "/home/manual_test/stable/autotest/client/tests/virt/virttest/qemu_vm.py", line 2078, in create
    monitor_name, monitor_params, timeout)
  File "/home/manual_test/stable/autotest/client/tests/virt/virttest/qemu_monitor.py", line 151, in wait_for_create_monitor
    raise MonitorConnectError(monitor_name)
MonitorConnectError: Could not connect to monitor 'qmpmonitor1'

23:01:22 ERROR| child process failed
23:01:22 INFO |                 FAIL    virt.qemu.smp_8.16384m.repeat1.Host_RHEL.6.5.qcow2.virtio_blk.up.virtio_net.RHEL.6.5.x86_64.migrate.exec.gzip_exec    virt.qemu.smp_8.16384m.repeat1.Host_RHEL.6.5.qcow2.virtio_blk.up.virtio_net.RHEL.6.5.x86_64.migrate.exec.gzip_exec  timestamp=1397660482    localtime=Apr 16 23:01:22       Unhandled MonitorConnectError: Could not connect to monitor 'qmpmonitor1'    [context: migrating 'virt-tests-vm1' --> creating destination VM --> creating 'virt-tests-vm1']
  Traceback (most recent call last):
    File "/home/manual_test/stable/autotest/client/shared/test.py", line 823, in _call_test_function
      return func(*args, **dargs)
    File "/home/manual_test/stable/autotest/client/shared/test.py", line 291, in execute
      postprocess_profiled_run, args, dargs)
    File "/home/manual_test/stable/autotest/client/shared/test.py", line 209, in _call_run_once
      *args, **dargs)
    File "/home/manual_test/stable/autotest/client/shared/test.py", line 313, in run_once_profiling
      self.run_once(*args, **dargs)
    File "/home/manual_test/stable/autotest/client/tests/virt/virt.py", line 139, in run_once
      run_func(self, params, env)
    File "/home/manual_test/stable/autotest/client/tests/virt/qemu/tests/migration.py", line 179, in run_migration
      migration_exec_cmd_dst=mig_exec_cmd_dst)
    File "/home/manual_test/stable/autotest/client/shared/error.py", line 141, in new_fn
      return fn(*args, **kwargs)
    File "/home/manual_test/stable/autotest/client/tests/virt/virttest/qemu_vm.py", line 3063, in migrate
      migration_exec_cmd=migration_exec_cmd_dst)
    File "/home/manual_test/stable/autotest/client/shared/error.py", line 141, in new_fn
      return fn(*args, **kwargs)
    File "/home/manual_test/stable/autotest/client/tests/virt/virttest/qemu_vm.py", line 2078, in create
      monitor_name, monitor_params, timeout)
    File "/home/manual_test/stable/autotest/client/tests/virt/virttest/qemu_monitor.py", line 151, in wait_for_create_monitor
      raise MonitorConnectError(monitor_name)
  MonitorConnectError: Could not connect to monitor 'qmpmonitor1'
  
23:01:22 INFO |         END FAIL        virt.qemu.smp_8.16384m.repeat1.Host_RHEL.6.5.qcow2.virtio_blk.up.virtio_net.RHEL.6.5.x86_64.migrate.exec.gzip_exec    virt.qemu.smp_8.16384m.repeat1.Host_RHEL.6.5.qcow2.virtio_blk.up.virtio_net.RHEL.6.5.x86_64.migrate.exec.gzip_exec  timestamp=1397660482    localtime=Apr 16 23:01:22
23:01:23 INFO | END GOOD        ----    ----    timestamp=1397660483    localtime=Apr 16 23:01:23


Expected results:
The case should pass.

Additional info:

Comment 3 Dr. David Alan Gilbert 2014-04-17 15:26:16 UTC
Are there any more useful logs? virt-test normally has the qemu command line and other errors in its debug.log.
Also, is there anything in the dmesg of the host - e.g. an out-of-memory kill or anything like that?

Comment 4 xhan 2014-04-18 01:36:22 UTC
The qemu command line:

qemu \
    -S  \
    -name 'virt-tests-vm1' \
    -M rhel6.5.0  \
    -nodefaults  \
    -vga qxl  \
    -global qxl-vga.vram_size=33554432 \
    -device intel-hda,bus=pci.0,addr=03 \
    -device hda-duplex  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20140417-095106-3izSEl9f,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20140417-095106-3izSEl9f,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=04  \
    -chardev socket,id=devvs,path=/tmp/virtio_port-vs-20140417-095106-3izSEl9f,server,nowait \
    -device virtserialport,chardev=devvs,name=vs,id=vs,bus=virtio_serial_pci0.0  \
    -chardev socket,id=seabioslog_id_20140417-095106-3izSEl9f,path=/tmp/seabios-20140417-095106-3izSEl9f,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20140417-095106-3izSEl9f,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=05 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,file=/home/manual_test/stable/autotest/client/tests/virt/shared/data/images/RHEL-Server-6.5-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=06 \
    -device virtio-net-pci,mac=9a:76:77:78:79:7a,id=idvNSnUW,netdev=idLpgLMx,bus=pci.0,addr=07  \
    -netdev tap,id=idLpgLMx,vhost=on,vhostfd=36,fd=35  \
    -m 16384  \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2  \
    -cpu 'Opteron_G3' \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -spice port=3001,password=123456,addr=0,tls-port=3201,x509-dir=/tmp/spice_x509d,tls-channel=main,tls-channel=inputs,image-compression=auto_glz,zlib-glz-wan-compression=auto,streaming-video=all,agent-mouse=on,playback-compression=on,ipv4  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off  \
    -no-kvm-pit-reinjection \
    -enable-kvm \
    -incoming "exec:gzip \
    -c \
    -d /var/tmp/exec-W8BLLlhy"

Comment 5 xhan 2014-04-18 01:37:38 UTC
Created attachment 887384 [details]
debug.log

There is no relevant output in dmesg.
The attachment is the debug log, for reference.

Comment 8 xhan 2014-04-18 10:52:02 UTC
This problem can be reproduced manually. I describe it again below.

qemu exec migration destination qmp monitor takes a long time to be accessed with huge mem

Version-Release number of selected component (if applicable):
kernel-2.6.32-457.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.424.el6.x86_64

How reproducible:
always


1. Run stress on the host:
 stress --cpu 8 --io 4 --vm 4 --vm-bytes 6G --timeout 999999s
2. Launch source VM
/usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/virt/qemu/qemu \
    -vga qxl  \
    -global qxl-vga.vram_size=33554432 \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -monitor stdio \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,file=/usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/virt/shared/data/images/RHEL-Server-6.5-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=06 \
    -device virtio-net-pci,mac=9a:4e:4f:50:51:52,id=idAxWBxM,netdev=idaHwbcU,bus=pci.0,addr=07  \
    -netdev tap,id=idaHwbcU,vhost=on,script=/usr/local/staf/test/RHEV/kvm-new/autotest/client/virt/scripts/qemu-ifup-switch \
    -m 16384  \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2  \
    -cpu 'Opteron_G3' \
    -vnc :0 \
    -enable-kvm \

3. Execute migration in the source VM monitor
   (qemu) migrate -d "exec:gzip -c > mig.file"
4. Launch the migration destination VM
   /usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/virt/qemu/qemu \
    -vga qxl  \
    -global qxl-vga.vram_size=33554432 \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor2,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -monitor stdio \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,file=/usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/virt/shared/data/images/RHEL-Server-6.5-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=06 \
    -device virtio-net-pci,mac=9a:4e:4f:50:51:52,id=idAxWBxM,netdev=idaHwbcU,bus=pci.0,addr=07  \
    -netdev tap,id=idaHwbcU,vhost=on,script=/usr/local/staf/test/RHEV/kvm-new/autotest/client/virt/scripts/qemu-ifup-switch \
    -m 16384  \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2  \
    -cpu 'Opteron_G3' \
    -vnc :1 \
    -enable-kvm \
    -incoming "exec:gzip \
    -c \
    -d mig.file"
     
5. Access the destination VM qmp monitor
   nc -U /tmp/monitor-qmpmonitor2
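If you want to script this check instead of using nc, here is a minimal sketch (the socket path matches step 5; the helper name and timeout values are my own, not part of the test suite):

```python
import json
import socket
import time

def time_until_qmp_ready(path, timeout=60.0, poll=0.5):
    # Return the number of seconds until the QMP unix socket at `path`
    # accepts a connection and sends its greeting, or None on timeout.
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        try:
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
                s.settimeout(poll)
                s.connect(path)
                greeting = s.recv(4096)  # QMP greets with {"QMP": {...}}
                if greeting:
                    json.loads(greeting.decode())  # sanity-check it parses
                    return time.monotonic() - start
        except (OSError, ValueError):
            time.sleep(poll)
    return None

# e.g. elapsed = time_until_qmp_ready("/tmp/monitor-qmpmonitor2")
```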
 
 
Actual results:
   The qmp monitor cannot be accessed within the expected duration, e.g. 20s
 
Expected results:
   The qmp monitor should be accessible within the expected duration, e.g. 20s
 
Additional info:
     
    1. Host info:
    processor       : 11
    vendor_id       : AuthenticAMD
    cpu family      : 16
    model           : 8
    model name      : Six-Core AMD Opteron(tm) Processor 2427
    stepping        : 0
    cpu MHz         : 800.000
    cache size      : 512 KB
    physical id     : 1
    siblings        : 6
    core id         : 5
    cpu cores       : 6
    apicid          : 13
    initial apicid  : 13
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 5
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save pausefilter
    bogomips        : 4399.62
    TLB size        : 1024 4K pages
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 48 bits physical, 48 bits virtual
    power management: ts ttp tm stc 100mhzsteps hwpstate
     
    2. mem info
    [root@amd-2427-32-2 ~]# free -m
                 total       used       free     shared    buffers     cached
    Mem:         32104      24999       7105          0          5         31
    -/+ buffers/cache:      24962       7142
    Swap:        57343       4073      53270
    [root@amd-2427-32-2 ~]# free -m
                 total       used       free     shared    buffers     cached
    Mem:         32104      23885       8219          0          5         25
    -/+ buffers/cache:      23854       8250
    Swap:        57343       4759      52584
    [root@amd-2427-32-2 ~]# free -m
                 total       used       free     shared    buffers     cached
    Mem:         32104      26076       6028          0          5         27
    -/+ buffers/cache:      26043       6061
    Swap:        57343       4758      52585
    [root@amd-2427-32-2 ~]# free -m
                 total       used       free     shared    buffers     cached
    Mem:         32104      18658      13446          0          5         27
    -/+ buffers/cache:      18624      13480
    Swap:        57343       4758      52585

Comment 9 xhan 2014-04-18 10:53:51 UTC
Created attachment 887527 [details]
sosreport.tar.xz

Attach the sosreport.tar.xz

Comment 10 Dr. David Alan Gilbert 2014-04-22 08:23:31 UTC
Comment 8's test doesn't seem to be a fair test; 'stress' is using a lot of CPU and a reasonable amount of RAM, so it's not surprising things are slow. 'free' also shows quite a bit of swap in use, which will again make it very slow.

1) Does the problem happen on a RHEL7 host?

2) The SOS report from comment 9 shows qemu crashes/core files; can you provide a copy of these core files, or a backtrace of them, stating which tests were running at the time the core was created:

Apr 17 22:28:05 amd-2427-32-2 AutotestCrashHandler: Writing core files to ['/home/manual_test/stable/autotest/client/results/default/virt.qemu.repeat1.smp_8.16384m.Host_RHEL.6.5.spice.qcow2.virtio_blk.up.virtio_net.RHEL.6.5.x86_64.unittest/debug/crash.qemu.25663']

3) Rerun the test shown in the original comment, but before it starts run the command
   vmstat 5
It will produce some information every 5 seconds - please capture this information for the whole of the test run.

Dave

Comment 11 xhan 2014-04-23 10:58:47 UTC
1) I will test RHEL7 on the same host and paste the result later.
2) The crash is not related to this case; it is from the unittest case.
3) vmstat result:
vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
14  0  86772 5555012  67616 7732964    1    8   138   215    1    1  3  1 95  1  0	
14  0  86772 10342464  67616 7732964    0    0     0     0 12232 4050 61 38  1  0  0	
14  0  86772 16946312  67624 7732968    0    0    13     4 12645 8010 59 38  3  0  0	
15  0  86772 9575516  67624 7732968    0    0     0     0 12147 4820 57 40  3  0  0	
15  0  86772 11933160  67632 7732968    0    0     0     3 12272 4782 56 41  3  0  0	
14  0  86772 10864108  67632 7732968    0    0     0     0 12333 4554 59 41  0  0  0	
14  0  86772 12207928  67632 7732968    0    0     0     0 12338 4372 58 42  0  0  0	
15  0  86772 7566284  67632 7732968    0    0     0     0 12338 4449 58 42  0  0  0	
14  0  86772 7988424  67632 7732968    0    0     0     0 12329 4547 58 42  0  0  0	
14  0  86772 10355852  67632 7732968    0    0     0     0 12327 5609 60 40  0  0  0	
14  0  86772 9869864  67632 7732968    0    0    13     0 12615 5815 61 39  0  0  0	
14  0  86772 16282644  67632 7732968    0    0     0     0 12461 4436 62 38  0  0  0	
14  0  86772 10630492  67632 7732968    0    0     0     0 12454 4455 63 37  0  0  0	
14  0  86772 9757984  67632 7732968    0    0     0     0 12465 4449 62 38  0  0  0	
14  0  86772 9082580  67632 7732968    0    0     0     0 12451 4426 62 38  0  0  0	
15  0  86772 4386028  67632 7732968    0    0     0     0 12483 4543 63 36  1  0  0	
18  0  86772 16334580  67640 7732968    0    0     0     3 12387 4407 61 38  1  0  0

Comment 12 Dr. David Alan Gilbert 2014-04-24 13:46:55 UTC
If I try something similar I get very different looking vmstat results:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st

 1  3 174880 22412996      0 1628560    1    0     1 51488 1437  110 12  1 79  8  0
 2  0 174880 22266488      0 1766392    0    0   547 61575 1573  262  9  3 74 14  0
 1  0 174852 13701300      0 1772376  202    0  1361 23834 1404  912  5  6 85  4  0
 1  0 174852 6239080      0 1772388    0    0     0     4 1090  152  7  5 87  0  0
 0  2 174852 1444784      0 1755636    0    0     0 33502 1087  195  5  4 84  7  0
 0  1 174852 680204      0 1224016    0    0     0 73031  973  328  1  3 81 16  0
 1  5 439100 140552      0 848416  286 53012   330 92345 1634  577  1  3 65 32  0
 0  2 1001524 195132      0 663220   32 112501   182 121563 1368  524  0  2 79 19  0
 1  0 1638268 198492      0 332036   10 127362    11 127398 1859  922  1  4 76 19  0
 0  2 2294908 178736      0 163700    2 131334     2 131337 1594  738  1  4 86 10  0
 1  3 2949280 185840      0   2572   34 132481    59 132483 1755  706  0  5 76 18  0
 2  4 3585968 139880      0   2600   22 125831    90 125847 2814 1212  0 17 72 11  0
 1  3 4238060 169748      0   2180    1 130347    70 130351 2953 1187  0 18 58 24  0
 1  1 4884180 164588      0   3008    6 129227     6 129227 2889 1127  0 18 72 10  0
 1  3 5465944 141124      0   2556  128 116363   258 116368 2838 1186  0 18 66 16  0
 1  3 5949520 141380      0   3692  268 96811   446 96821 2822 1042  0 17 67 16  0
 0  4 6462316 140336      0   5480    8 102599   514 102599 2381  799  0 11 69 20  0
 0  4 7012308 140268      0   5592  266 110110   486 110110 1929  418  0  5 62 33  0
 0  2 7557772 183720      0   6184  388 109202   591 109208 2149  346  0  5 70 25  0
 1  2 8036568 204604      0   9332  481 95951   998 95951 2223  698  0  6 75 19  0
 0  2 8363644 377896      0  11868  423 65537  2477 65545 1594  861  0  6 71 23  0
 0  0 8362900 358228      0  22516  403    0  2554     2  278  474  0  0 95  4  0
 0  0 8362900 358760      0  22516    0    0     0     0   97  199  0  0 100  0  0
 0  1 209224 23624664      0 470908 2845    0 48701    29 1515 1518  1  4 86  9  0
 0  1 209224 22669972      0 1415268   11    0 94471 18026 2025 1595  1  2 87 10  0
 0  2 178592 22517564      0 1581904 4234    0 21058 71110 1846  788  0  1 79 20  0
 0  0 178588 22518336      0 1581916    1    0     1 65788 1174  148  0  0 85 14  0
 0  0 178564 22518380      0 1582624    7    0   150     0   41   70  0  0 100  0  0


In those you can see a lot of swap in use, and much less free memory during the test.

Comment 13 Dr. David Alan Gilbert 2014-04-24 14:53:56 UTC
OK, forget comment #12 - I think I understand why your vmstat isn't as bad as mine (I was using much more memory). Still, I'd like to know what you see on RHEL7 (I think it will work), but I don't currently understand why this is happening:

  1) I thought it was a problem with getting into swap - but it's not, judging from your vmstat, and since you're only using half of all memory you should be OK without swap.

  2) I thought it might be the time taken by gzip compressing the data and the whole test timing out - but looking at the log file, the actual migration has completed and the timeout is in the response of the QMP monitor on the destination.

Comment 14 xhan 2014-04-25 09:03:24 UTC
1) 
For the first item in comment #11:
This machine has an LSI SAS1068E controller and cannot install RHEL7, so I could not verify this issue in time.
Once I find a suitable machine, I will update as soon as possible.

2)
In the meantime, here is more vmstat data, in case it helps.

vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
13  0      0 18135776   1692  15856    0    0     1     5   30   52  0  0 100  0  0	
13  1      0 11893600   4552  26812    0    0  2806    34 63006 111954 57 37  6  1  0
15  0      0 20977100   5244  72112    0    0  9127     4 50280 148887 63 37  0  0  0
15  0      0 21085216   5244  77184    0    0  1050     0 56356 127965 63 37  0  0  0
15  0      0 17025412   5244  78976    0    0   358     0 55102 123056 63 37  0  0  0
14  0      0 22979644   5244  80768    0    0   358     0 50739 92749 62 38  0  0  0	
14  0      0 18961784   5244  82560    0    0   358     0 65050 134604 60 40  0  0  0
14  0      0 14542212   5244 130536    0    0  9677     0 50020 139625 62 38  0  0  0
14  0      0 20332268   5244 189744    0    0 11747     0 40875 200698 63 37  0  0  0
13  0      0 22473412   5244 189744    0    0     0     0 16490 234301 60 40  0  0  0
15  0      0 13195864   5244 189744    0    0     0     0 29798 212481 60 39  1  0  0
14  0      0 21593392   5256 189744    0    0    13     5 52339 223964 59 40  1  0  0
13  0      0 23558524   5256 189744    0    0     0     0 60888 222610 59 40  1  0  0
13  0      0 11281780   5256 189744    0    0     0     0 60607 221468 59 40  1  0  0
14  0      0 19971672   5268 189744    0    0     0     4 58802 211733 59 40  1  0  0
14  0      0 22031304   5292 189744    0    0    26    13 45420 197057 60 39  1  0  0
13  0      0 12413640   5308 189744    0    0     0     9 44629 167640 60 39  1  0  0
13  0      0 18648304   5308 189744    0    0     0     0 43551 199735 60 39  1  0  0
13  0      0 25181500   5308 189744    0    0     0     0 44836 217363 60 39  1  0  0
14  0      0 8951264   5308 189744    0    0     0     0 36832 223811 60 39  1  0  0	
14  0      0 17260380   5308 189744    0    0     0     0 29683 210714 60 39  1  0  0
14  0      0 26100216   5308 189744    0    0     0     0 36656 222195 60 39  1  0  0

Comment 17 Xu Han 2014-08-15 11:57:55 UTC
I could reproduce this issue with the following test:
-------------------------------------------------
1. Launch a VM with '-m 16384'

2. Do migration
    (qemu) migrate -d "exec:gzip -c > mig.file"

3. Launch the destination VM with below script
    # cat do_test.sh 
    #!/bin/bash
    bash qemu_cmd_line.txt &

    sleep 1
    qemu_pid=`pidof qemu-kvm`
    gzip_pid=`pidof gzip`

    top -bS -d 1 -p $qemu_pid,$gzip_pid > top.log &
    top_pid=$!

    time nc -U /tmp/monitor-qmpmonitor1-20140815-014653-7hWSzDsO << EOF
    {"execute": "qmp_capabilities"}
    {"execute": "quit"}
    EOF

    kill -9 $top_pid

Result:
Got a response from QMP after ~22s:
# ./do_test.sh 
{"QMP": {"version": {"qemu": {"micro": 1, "minor": 12, "major": 0}, "package": "(qemu-kvm-rhev-0.12.1.2-2.435.el6)"}, "capabilities": []}}
{"return": {}}
{"return": {}}
{"timestamp": {"seconds": 1408130391, "microseconds": 227388}, "event": "SHUTDOWN"}
{"timestamp": {"seconds": 1408130391, "microseconds": 249939}, "event": "DEVICE_TRAY_MOVED", "data": {"device": "drive_cd1", "tray-open": true}}
/etc/qemu-ifdown: could not launch network script

real	0m22.973s
user	0m0.000s
sys	0m0.003s


I also observed that the gzip subprocess ran for around 20s:
# grep -i pid -A 2 top.log    (the interval between each entry is 1s)
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16749 root      20   0 17.1g 160m 5656 R 83.8  0.5   0:00.83 qemu-kvm
16771 root      20   0  4432  560  392 S 25.9  0.0   0:00.71 gzip

...

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16771 root      20   0  4432  564  392 R 88.8  0.0   0:09.11 gzip
16749 root      20   0 17.1g 849m 5660 S 42.9  2.6   0:16.47 qemu-kvm
--
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16749 root      20   0 17.1g 857m 5748 R 57.9  2.7   0:26.49 qemu-kvm

So the monitor may just be waiting for the migration to finish.

Additional info:
----------------
Memory file size:
# du -h mig.file 
214M	 mig.file
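One way to check whether decompression alone accounts for the delay is to time gzip by itself and compare against the ~20s monitor delay (a sketch with a synthetic file; substitute the real mig.file, which was 214M here, to measure the actual case):

```shell
# Synthesize a compressed sample, then time raw decompression alone.
# (Illustrative only; replace /tmp/mig.sample.gz with the real mig.file.)
dd if=/dev/zero bs=1M count=32 2>/dev/null | gzip -c > /tmp/mig.sample.gz
time gzip -cd /tmp/mig.sample.gz > /dev/null
```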

QEMU command line:
# cat qemu_cmd_line.txt 
/usr/libexec/qemu-kvm \
    -name 'virt-tests-vm1' \
    -M rhel6.6.0  \
    -nodefaults  \
    -vga qxl  \
    -global qxl-vga.vram_size=33554432 \
    -device AC97,bus=pci.0,addr=03  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20140815-014653-7hWSzDsO,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20140815-014653-7hWSzDsO,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20140815-014653-7hWSzDsO,path=/tmp/seabios-20140815-014653-7hWSzDsO,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20140815-014653-7hWSzDsO,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=04 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,file=/root/autotest-devel/client/tests/virt/shared/data/images/win2012-64r2-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=05 \
    -device virtio-net-pci,mac=9a:70:71:72:73:74,id=idUtm6cI,vectors=4,netdev=idN5msFQ,bus=pci.0,addr=06  \
    -netdev tap,id=idN5msFQ,vhost=on \
    -m 16384  \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2  \
    -cpu 'Opteron_G3',hv_relaxed \
    -drive id=drive_cd1,if=none,snapshot=off,aio=native,media=cdrom,file=/root/autotest-devel/client/tests/virt/shared/data/isos/windows/winutils.iso \
    -device ide-drive,id=cd1,drive=drive_cd1,bootindex=1,bus=ide.0,unit=0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -spice port=3000,disable-ticketing \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off \
    -enable-kvm \
    -incoming "exec:gzip \
    -c \
    -d mig.file"

The host is the same one used in comment 0.

Comment 19 Ademar Reis 2015-01-21 18:13:40 UTC
The analysis so far indicates that the monitor is waiting for the migration to finish (or at least the unzipping of it), so a larger timeout should be used.

I'm closing this as WONTFIX in RHEL6, given it's a hard-to-reproduce corner case that we see as expected behavior. It would be nice if you could try the same scenario on RHEL7 and open a new BZ if it can be reproduced there.