Bug 979411 - virsh migration copy-storage-all fails with "Unable to read from monitor: Connection reset by peer"
Status: CLOSED UPSTREAM
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: x86_64 Linux
Priority: unspecified, Severity: high
Assigned To: Michal Privoznik
Depends On:
Blocks:
Reported: 2013-06-28 09:15 EDT by chandrashekar shastri
Modified: 2014-12-09 11:04 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-12-09 11:04:47 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Source-migrate-logs (698.16 KB, text/plain)
2013-06-28 09:15 EDT, chandrashekar shastri
Source Logs (49.51 KB, application/x-bzip2)
2013-06-28 09:59 EDT, chandrashekar shastri
Destination Logs (10.10 KB, application/x-bzip2)
2013-06-28 09:59 EDT, chandrashekar shastri
source sos (3.59 MB, application/x-xz)
2013-07-01 06:05 EDT, chandrashekar shastri
Destination sos (3.03 MB, application/x-xz)
2013-07-01 06:14 EDT, chandrashekar shastri
source libvirtd log (1.66 MB, text/plain)
2013-07-01 07:36 EDT, chandrashekar shastri
Destination libvird logs (3.36 MB, text/plain)
2013-07-01 07:38 EDT, chandrashekar shastri


External Trackers
Tracker ID Priority Status Summary Last Updated
Launchpad 1192499 None None None Never

Description chandrashekar shastri 2013-06-28 09:15:38 EDT
Created attachment 766573 [details]
Source-migrate-logs

Description of problem:

virsh migration with --copy-storage-all fails with "Unable to read from monitor: Connection reset by peer" and shuts down the guest on the source host.

Kernel Version: 3.10.0-rc5+

Libvirt Version: 1.0.6

Qemu Version: 1.5.50

Steps to Reproduce:

1. Created the disk image on the destination host with "qemu-img create -f qcow2 vm.qcow2 11G", matching the image on the source.
2. Started the guest on the source.
3. Started the VNC display to monitor the guest.
4. Initiated the migration: "virsh migrate --live VM1 qemu+ssh://host-ip/system tcp://host-ip --verbose --copy-storage-all"
5. The storage copy from source to destination started (monitored continuously; the destination image kept growing).
6. The guest on the destination was paused while it was still running on the source.
7. At some point the VM on the source was shut down and the migration failed with "Unable to read from monitor: Connection reset by peer".
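
For anyone re-running the steps above, the two commands can be assembled in a small script. This is only a sketch: the helper names build_image_cmd and build_migrate_cmd are hypothetical, "host-ip" is the placeholder used in step 4, and the commands are printed rather than executed.

```python
import shlex


def build_image_cmd(path="vm.qcow2", size="11G"):
    """Return the qemu-img command from step 1 as an argument list."""
    return ["qemu-img", "create", "-f", "qcow2", path, size]


def build_migrate_cmd(domain, dest_host):
    """Return the virsh migrate command from step 4 as an argument list.

    dest_host is a placeholder for the destination address; the flags are
    the ones quoted in this report.
    """
    return [
        "virsh", "migrate", "--live", domain,
        f"qemu+ssh://{dest_host}/system",
        f"tcp://{dest_host}",
        "--verbose", "--copy-storage-all",
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe
    # to execute on a machine without libvirt installed.
    print(shlex.join(build_image_cmd()))
    print(shlex.join(build_migrate_cmd("VM1", "host-ip")))
```

Passing the lists to subprocess.run would execute them; that is left out here on purpose.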

The libvirt debug logs are attached.

The debug logs show:

2013-06-19 08:43:12.253+0000: 4026: debug : virEventPollInterruptLocked:716 : Interrupting
2013-06-19 08:43:12.253+0000: 4026: debug : virEventPollAddTimeout:248 : EVENT_POLL_ADD_TIMEOUT: timer=1 frequency=0 cb=0x7fe930baa960 opaque=(nil) ff=(nil)

Note: virsh live migration works fine with NFS storage, from source to destination and vice versa.
We hit the same issue with libvirt 1.0.5 and qemu 1.5, but with those versions live migration with NFS did not work either.

Guest XML:
----------------

<domain type='kvm'>
  <name>VM1</name>
  <uuid>47feb0e1-0c23-9be9-da12-2ead34864de2</uuid>
  <memory unit='KiB'>4096000</memory>
  <currentMemory unit='KiB'>2048000</currentMemory>
  <vcpu placement='auto'>1</vcpu>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-1.5'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/home/images/VM1.qcow2'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='block' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <interface type='network'>
      <mac address='52:54:00:9d:cf:bb'/>
      <source network='default'/>
      <model type='rtl8139'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='none' model='selinux'/>
</domain>
Comment 1 Michal Privoznik 2013-06-28 09:27:20 EDT
Can you provide the destination debug logs? Especially the content of /var/log/libvirt/qemu/VM1.log, as it should contain the reason why qemu died.
Comment 2 chandrashekar shastri 2013-06-28 09:59:18 EDT
Created attachment 766578 [details]
Source Logs
Comment 3 chandrashekar shastri 2013-06-28 09:59:51 EDT
Created attachment 766579 [details]
Destination Logs
Comment 4 Michal Privoznik 2013-06-28 10:20:28 EDT
From the VM1_dest.log:

...
Completed 97 %
Completed 98 %
Completed 99 %
qemu: warning: error while loading state section id 1
load of migration failed

So qemu on the destination is unable to load the migration stream. Therefore I think this is actually a qemu bug.
On the other hand, I wonder why the guest on the source is shut down. There's no sign of that in the logs.
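
A quick way to pull those failure lines out of a per-domain qemu log is sketched below. The marker strings are taken from the excerpt in this comment; other qemu versions may word them differently, and the function name find_migration_failures is hypothetical.

```python
# Substrings copied from the VM1_dest.log excerpt above; treat them as
# examples rather than a complete list of qemu failure messages.
FAILURE_MARKERS = (
    "error while loading state section",
    "load of migration failed",
)

SAMPLE_LOG = """\
Completed 97 %
Completed 98 %
Completed 99 %
qemu: warning: error while loading state section id 1
load of migration failed
"""


def find_migration_failures(log_text):
    """Return the stripped log lines that contain a known failure marker."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(marker in line for marker in FAILURE_MARKERS)
    ]


if __name__ == "__main__":
    for hit in find_migration_failures(SAMPLE_LOG):
        print(hit)
```

To check a real log, read the file (for example /var/log/libvirt/qemu/VM1.log on the destination) and pass its text to the function.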
Comment 5 chandrashekar shastri 2013-07-01 06:02:08 EDT
When the migration fails, the guest is shut down on the source.

[root@9 images]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 5     VM1                     running

[root@9 images]# virsh migrate --live rhel64-64 qemu+ssh:/IP/system tcp://IP --verbose --copy-storage-all
 
Migration: [ 93 %]error: Unable to read from monitor: Connection reset by peer


At the destination:

[root@9 images]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     VM1                      shut off

[root@destination]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 16    VM1                      paused

[root@destination]# virsh list --all
 Id    Name                           State
----------------------------------------------------
Comment 6 chandrashekar shastri 2013-07-01 06:02:48 EDT
Attached the SOS report of the Source and Destination machines.
Comment 7 chandrashekar shastri 2013-07-01 06:05:36 EDT
Created attachment 767310 [details]
source sos
Comment 8 chandrashekar shastri 2013-07-01 06:14:21 EDT
Created attachment 767311 [details]
Destination sos
Comment 9 Michal Privoznik 2013-07-01 06:29:44 EDT
Unfortunately, the libvirtd.log is missing. I've written a guide on how to collect it:

http://wiki.libvirt.org/page/DebugLogs

Please enable debug logging as described there, reproduce the problem, and attach the new logs.
Comment 10 chandrashekar shastri 2013-07-01 07:34:34 EDT
Attached libvirtd logs of both Source and Destination
Comment 11 chandrashekar shastri 2013-07-01 07:36:13 EDT
Created attachment 767340 [details]
source libvirtd log
Comment 12 chandrashekar shastri 2013-07-01 07:38:53 EDT
Created attachment 767342 [details]
Destination libvird logs
Comment 13 Michal Privoznik 2013-07-01 08:19:24 EDT
From the *source* libvirtd log:

2013-07-01 11:30:29.582+0000: 3164: debug : virObjectRef:297 : OBJECT_REF: obj=0x7fe97000cb00
2013-07-01 11:30:29.582+0000: 3164: error : qemuMonitorIORead:511 : Unable to read from monitor: Connection reset by peer
2013-07-01 11:30:29.582+0000: 3164: debug : qemuMonitorIO:644 : Error on monitor Unable to read from monitor: Connection reset by peer
2013-07-01 11:30:29.582+0000: 3164: debug : virObjectUnref:260 : OBJECT_UNREF: obj=0x7fe97000cb00
2013-07-01 11:30:29.582+0000: 3165: debug : qemuMonitorSend:905 : Send command resulted in error Unable to read from monitor: Connection reset by peer
2013-07-01 11:30:29.582+0000: 3164: debug : qemuMonitorIO:678 : Triggering error callback
2013-07-01 11:30:29.582+0000: 3164: debug : qemuProcessHandleMonitorError:351 : Received error on 0x7fe9881337f0 'rhel64-64'

This means qemu died unexpectedly. The qemu error message should be in /var/log/libvirt/qemu/rhel64-64.log on the source.
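
To find such entries in a large libvirtd log, the lines can be parsed and filtered by level. The following is a rough sketch that assumes every line follows the "timestamp: thread: level : function:line : message" layout seen in the excerpt above; LogEntry, parse_line and monitor_errors are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional

# Pattern derived from the excerpt above; lines in another layout are skipped.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}): "
    r"(?P<thread>\d+): (?P<level>\w+) : "
    r"(?P<func>\w+):(?P<line>\d+) : (?P<message>.*)$"
)

SAMPLE_LINES = [
    "2013-07-01 11:30:29.582+0000: 3164: debug : virObjectRef:297 : OBJECT_REF: obj=0x7fe97000cb00",
    "2013-07-01 11:30:29.582+0000: 3164: error : qemuMonitorIORead:511 : Unable to read from monitor: Connection reset by peer",
    "2013-07-01 11:30:29.582+0000: 3164: debug : qemuMonitorIO:644 : Error on monitor Unable to read from monitor: Connection reset by peer",
]


class LogEntry(NamedTuple):
    timestamp: str
    thread: int
    level: str
    func: str
    line: int
    message: str


def parse_line(text: str) -> Optional[LogEntry]:
    """Parse one libvirtd log line, or return None if it does not match."""
    m = LOG_LINE.match(text.strip())
    if m is None:
        return None
    return LogEntry(
        m["timestamp"], int(m["thread"]), m["level"],
        m["func"], int(m["line"]), m["message"],
    )


def monitor_errors(lines) -> List[LogEntry]:
    """Return entries logged at level 'error' whose message mentions the monitor."""
    entries = (parse_line(text) for text in lines)
    return [e for e in entries if e and e.level == "error" and "monitor" in e.message]


if __name__ == "__main__":
    for entry in monitor_errors(SAMPLE_LINES):
        print(entry.timestamp, entry.func, entry.message)
```

The timestamp of the first monitor error is a useful starting point for reading the matching per-domain qemu log.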
Comment 14 Michal Privoznik 2014-12-09 11:04:47 EST
Since this is a qemu bug (probably fixed already), I'm closing this one. If you disagree, please reopen.
