Created attachment 1722434 [details]
logs

Description of problem:
Do migration with copy storage -> cancel the migration during the memory copy phase -> do migration with copy storage again; the second migration fails.

Version-Release number of selected component (if applicable):
libvirt-6.6.0-7.module+el8.3.0+8424+5ea525c5.x86_64
qemu-kvm-5.1.0-13.module+el8.3.0+8424+5ea525c5.x86_64

How reproducible:
100%

Steps to Reproduce:
0. Start a vm with local storage, and create a vm image on the dest host with the same size as the src vm image:
On src:
# virsh start avocado-vt-vm1
# virsh domblklist avocado-vt-vm1
 Target   Source
------------------------------------------------------------------------
 vda      /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2

On dest:
# qemu-img create -f qcow2 /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2 10G

1. Do migration with copy storage, and cancel it in the memory copy phase (one way to trigger the cancel is sketched after this comment):
[root@libvirt-rhel-8 ~]# virsh migrate avocado-vt-vm1 qemu+ssh://10.0.139.240/system --live --p2p --verbose --copy-storage-inc
Migration: [ 69 %]error: operation aborted: migration out job: canceled by client

2. Do migration again:
[root@libvirt-rhel-8 ~]# virsh migrate avocado-vt-vm1 qemu+ssh://10.0.139.240/system --live --p2p --verbose --copy-storage-inc
Migration: [ 35 %]error: internal error: qemu unexpectedly closed the monitor: 2020-10-18T03:51:03.805809Z qemu-kvm: -device cirrus-vga,id=video0,bus=pcie.0,addr=0x1: warning: 'cirrus-vga' is deprecated, please use a different VGA card instead
2020-10-18T03:51:04.984709Z qemu-kvm: check_section_footer: Read section footer failed: -5
2020-10-18T03:51:04.985272Z qemu-kvm: load of migration failed: Invalid argument

Actual results:
The second migration with --copy-storage-inc fails with "load of migration failed: Invalid argument" on the destination.

Expected results:
The second migration succeeds.

Additional info:
1. virsh event output:
Src:
2020-10-18 03:50:36.157+0000: event 'agent-lifecycle' for domain avocado-vt-vm1: state: 'disconnected' reason: 'domain started'
2020-10-18 03:50:36.558+0000: event 'lifecycle' for domain avocado-vt-vm1: Resumed Unpaused
2020-10-18 03:50:36.558+0000: event 'lifecycle' for domain avocado-vt-vm1: Started Booted
2020-10-18 03:50:56.409+0000: event 'block-job' for domain avocado-vt-vm1: Block Copy for /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2 ready
2020-10-18 03:50:56.409+0000: event 'block-job-2' for domain avocado-vt-vm1: Block Copy for vda ready
2020-10-18 03:50:56.427+0000: event 'migration-iteration' for domain avocado-vt-vm1: iteration: '1'
2020-10-18 03:50:56.621+0000: event 'block-job' for domain avocado-vt-vm1: Block Copy for /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2 failed
2020-10-18 03:50:56.621+0000: event 'block-job-2' for domain avocado-vt-vm1: Block Copy for vda failed
2020-10-18 03:51:04.774+0000: event 'migration-iteration' for domain avocado-vt-vm1: iteration: '1'
2020-10-18 03:51:05.084+0000: event 'migration-iteration' for domain avocado-vt-vm1: iteration: '2'
2020-10-18 03:51:05.085+0000: event 'lifecycle' for domain avocado-vt-vm1: Suspended Migrated
2020-10-18 03:51:05.300+0000: event 'block-job' for domain avocado-vt-vm1: Block Copy for /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2 failed
2020-10-18 03:51:05.300+0000: event 'block-job-2' for domain avocado-vt-vm1: Block Copy for vda failed
2020-10-18 03:51:05.341+0000: event 'lifecycle' for domain avocado-vt-vm1: Resumed Unpaused
2020-10-18 03:51:15.736+0000: event 'agent-lifecycle' for domain avocado-vt-vm1: state: 'connected' reason: 'channel event'

Dest:
2020-10-18 03:50:42.473+0000: event 'agent-lifecycle' for domain avocado-vt-vm1: state: 'disconnected' reason: 'domain started'
2020-10-18 03:50:42.915+0000: event 'lifecycle' for domain avocado-vt-vm1: Started Migrated
2020-10-18 03:50:56.454+0000: event 'lifecycle' for domain avocado-vt-vm1: Stopped Failed
2020-10-18 03:51:03.832+0000: event 'agent-lifecycle' for domain avocado-vt-vm1: state: 'disconnected' reason: 'domain started'
2020-10-18 03:51:04.284+0000: event 'lifecycle' for domain avocado-vt-vm1: Started Migrated
2020-10-18 03:51:05.037+0000: event 'control-error' for domain avocado-vt-vm1
2020-10-18 03:51:05.260+0000: event 'lifecycle' for domain avocado-vt-vm1: Stopped Failed

2. Not sure whether it has the same root cause as Bug 1459801 - nbd/server.c:nbd_receive_request():L706: read failed when do migration_cancel
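For step 1, one way to trigger the cancellation while the memory copy is in progress is to abort the job from a second shell on the source host, similar to the loop later used in comment 13 (a minimal sketch; the sleep length is only an illustrative choice):

# run on the source host; start the migration in the background, then abort the job
virsh migrate avocado-vt-vm1 qemu+ssh://10.0.139.240/system --live --p2p --verbose --copy-storage-inc &
sleep 5    # let the block copy finish and the memory copy start
virsh domjobabort avocado-vt-vm1
wait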
Created attachment 1722435 [details]
vm xml
Hi, I have done some analysis of the provided libvirtd-src.log: this bz is about storage vm migration, including copy-storage and block-related operations (blockdev-add and blockdev-mirror). Cong, could you help mark the right QA contact and then reproduce this bz? Thanks.
The bug could be reproduced with live migration steps, i.e. using shared storage instead of combining any blockdev-add/mirror steps. Following are the detailed steps:

1. Boot up the vm on the source host:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -machine pseries,max-cpu-compat=power8 \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2 \
    -m 1024 \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
    -cpu 'host' \
    -chardev socket,id=qmp_id_qmpmonitor1,server,nowait,path=/var/tmp/avocado_1 \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=chardev_serial0,server,nowait,path=/var/tmp/avocado_2 \
    -device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/ngu/rhel830-ppc64le-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device virtio-net-pci,mac=9a:fa:14:5e:3d:15,id=idLviSMj,netdev=idlrY22o,bus=pci.0,addr=0x5 \
    -netdev tap,id=idlrY22o,vhost=on \
    -vnc :0 \
    -rtc base=utc,clock=host \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -monitor stdio \

2. Boot up the destination qemu process using the same qemu command line as the source one, but append '-incoming defer' at the end.

3. Set migration capabilities on both the source and destination hosts:
SRC: {"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"capability": "pause-before-switchover", "state": true}]}}
DST: {"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"capability": "late-block-activate", "state": true}]}}

4. Set the incoming port on the destination host:
DST: {"execute": "migrate-incoming","arguments": {"uri": "tcp:[::]:5200"}}

5. Migrate the vm from the source host to the destination host and wait until it reaches the "pre-switchover" status (see the polling sketch after this comment):
SRC: {"execute": "migrate", "arguments": {"uri": "tcp:10.16.200.238:5200"}, "id": "DXcuz3D7"}
SRC: {"execute": "query-migrate"}
{"return": {"expected-downtime": 305, "status": "pre-switchover", "setup-time": 27, "total-time": 387578, "ram": {"total": 1090846720, "postcopy-requests": 0, "dirty-sync-count": 111, "multifd-bytes": 0, "pages-per-second": 34539, "page-size": 4096, "remaining": 27930624, "mbps": 934.542353, "transferred": 10266041346, "duplicate": 614439, "dirty-pages-rate": 33886, "skipped": 0, "normal-bytes": 10240499712, "normal": 2500122}}}

6. Cancel the migration:
{ "execute": "migrate_cancel" }
{"return": {}}
{"timestamp": {"seconds": 1603252232, "microseconds": 344845}, "event": "RESUME"}
{"timestamp": {"seconds": 1603252435, "microseconds": 939974}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"timestamp": {"seconds": 1603252436, "microseconds": 17285}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "reason": "No medium found", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1603252436, "microseconds": 17335}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "reason": "No medium found", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1603252436, "microseconds": 17380}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "reason": "No medium found", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1603252436, "microseconds": 17423}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "reason": "No medium found", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1603252436, "microseconds": 17464}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "reason": "No medium found", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1603252436, "microseconds": 17505}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "reason": "No medium found", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1603252436, "microseconds": 17546}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "reason": "No medium found", "operation": "read", "action": "report"}}

7. The bug reproduced:
DST: (qemu) qemu-kvm: check_section_footer: Read section footer failed: -5
qemu-kvm: load of migration failed: Invalid argument

Software versions:
Host kernel: 4.18.0-240.el8.ppc64le
Guest kernel: 4.18.0-240.el8.ppc64le
Qemu: qemu-kvm-5.1.0-14.module+el8.3.0+8438+644aff69.ppc64le
SLOF: SLOF-20200717-1.gite18ddad8.module+el8.3.0+7638+07cf13d2.noarch

So Xiaohui, I am reassigning the bug to you.
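For step 5, a rough way to poll query-migrate until the "pre-switchover" status instead of re-issuing it by hand; a sketch only, assuming no other QMP client is attached to the /var/tmp/avocado_1 socket from the command line above (QMP requires the qmp_capabilities handshake before other commands):

# run on the source host while the migration from step 5 is in progress
while true; do
    out=$( { printf '{"execute":"qmp_capabilities"}\n'; sleep 1; \
             printf '{"execute":"query-migrate"}\n'; sleep 1; } | nc -U /var/tmp/avocado_1 )
    echo "$out" | grep -q '"status": "pre-switchover"' && break
    sleep 2
done
echo "migration reached pre-switchover"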
Hi Nini,
I couldn't reproduce this bz with only a shared storage image. This bz is about "migration fails on the second attempt after cancelling the first migration", so the steps should be as follows: after the first migration fails (like the steps from your comment 5), start the migration again:
1. Boot the guest on the dst host again;
2. Enable late-block-activate on the dst host;
3. Set migration incoming on the dst host;
4. Start the migration on the src host again.

Check whether the migration fails after step 4: if it fails with an error like the one in comment 0, then you have reproduced the bz (a QMP sketch of steps 1-4 follows after this comment).

BTW, after the first migration fails, the HMP monitor on the dst host will quit automatically and print the following output; this is the expected behaviour since we cancelled the migration on the src host:
(qemu) qemu-kvm: check_section_footer: Read section footer failed: -5
qemu-kvm: load of migration failed: Invalid argument
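For concreteness, a sketch of steps 1-4 above using the QMP commands from comment 5 (the host address and port are the ones used there; treat them as placeholders for the actual setup):

1. Restart the destination qemu with the same command line as before, plus '-incoming defer'.
2. DST: {"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"capability": "late-block-activate", "state": true}]}}
3. DST: {"execute": "migrate-incoming", "arguments": {"uri": "tcp:[::]:5200"}}
4. SRC: {"execute": "migrate", "arguments": {"uri": "tcp:10.16.200.238:5200"}}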
Sorry for clearing two needinfo flags:
1. Needinfo from Ademar to fjin in comment 3;
2. Needinfo from Ademar to Eric in comment 4.
(In reply to Li Xiaohui from comment #6)
> Hi Nini,
> I didn't reproduce this bz only with shared storage image.
> This bz is about "migration failed with try the second time migration after
> cancel migration", so you should do like following steps:
> when the first time migration failed(like the steps from your comment 5),
> start migration again:
> 1.Boot the guest on dst host again;
> 2.Enable late-block-activate on dst host;
> 3.Set migration incoming on dst host;
> 4.Start migration on src host again;
>
> Check whether migration failed after step 4: if failed, and error like
> comment 0, then you reproduce bz.
>
> BTW the first time migration failed, the hmp on dst host will quit
> automatically and print following information,
> it's expecting action since we cancel migration on src host:
> (qemu) qemu-kvm: check_section_footer: Read section footer failed: -5
> qemu-kvm: load of migration failed: Invalid argument

Ok, thanks Xiaohui. I originally thought the error info in the dst qemu process after cancelling the migration was itself the problem, hence comment #5. Since we don't have a migration-cancel scenario in our existing storage vm migration test cases, I am figuring out the test steps and working on reproducing this.
It's not a recent regression; it is also reproduced with:
libvirt-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
qemu-kvm-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64
When doing the migration the second time, libvirt wrongly thought that "All disk mirrors are ready" and started the memory migration; that is the problem:

28920:2020-10-18 03:51:04.736+0000: 14869: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"return": {}, "id": "libvirt-420"}]
28949:2020-10-18 03:51:04.736+0000: 13827: info : qemuMonitorSend:946 : QEMU_MONITOR_SEND_MSG: mon=0x7f952803aa20 msg={"execute":"blockdev-add","arguments":{"node-name":"migration-vda-format","read-only":false,"driver":"raw","file":"migration-vda-storage"},"id":"libvirt-421"}
28958:2020-10-18 03:51:04.738+0000: 14869: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"return": {}, "id": "libvirt-421"}]
28990:2020-10-18 03:51:04.738+0000: 13827: info : qemuMonitorSend:946 : QEMU_MONITOR_SEND_MSG: mon=0x7f952803aa20 msg={"execute":"blockdev-mirror","arguments":{"job-id":"drive-virtio-disk0","device":"libvirt-1-format","target":"migration-vda-format","speed":9223372036853727232,"sync":"top","auto-finalize":true,"auto-dismiss":false},"id":"libvirt-422"}
28999:2020-10-18 03:51:04.745+0000: 14869: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"timestamp": {"seconds": 1602993064, "microseconds": 741875}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "drive-virtio-disk0"}}]
29039:2020-10-18 03:51:04.745+0000: 14869: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"timestamp": {"seconds": 1602993064, "microseconds": 741971}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "drive-virtio-disk0"}}]
29105:2020-10-18 03:51:04.761+0000: 14869: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"return": {}, "id": "libvirt-422"}]
29133:2020-10-18 03:51:04.762+0000: 13827: debug : qemuMigrationSrcNBDStorageCopyReady:551 : All disk mirrors are ready
                                                                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
29147:2020-10-18 03:51:04.762+0000: 13827: info : qemuMonitorSend:946 : QEMU_MONITOR_SEND_MSG: mon=0x7f952803aa20 msg={"execute":"query-block-jobs","id":"libvirt-423"}
29156:2020-10-18 03:51:04.763+0000: 14869: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"return": [{"auto-finalize": true, "io-status": "ok", "device": "drive-virtio-disk0", "auto-dismiss": false, "busy": true, "len": 1712586752, "offset": 0, "status": "running", "paused": false, "speed": 9223372036853727232, "ready": false, "type": "mirror"}], "id": "libvirt-423"}]
29262:2020-10-18 03:51:04.766+0000: 13827: debug : qemuMonitorSendFileHandle:2644 : fdname=migrate fd=40
29263:2020-10-18 03:51:04.766+0000: 13827: debug : qemuMonitorSendFileHandle:2646 : mon:0x7f952803aa20 vm:0x561aae1226a0 fd:38
29272:2020-10-18 03:51:04.766+0000: 13827: info : qemuMonitorSend:946 : QEMU_MONITOR_SEND_MSG: mon=0x7f952803aa20 msg={"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-424"}
29282:2020-10-18 03:51:04.767+0000: 14869: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"return": {}, "id": "libvirt-424"}]
29309:2020-10-18 03:51:04.767+0000: 13827: info : qemuMonitorSend:946 : QEMU_MONITOR_SEND_MSG: mon=0x7f952803aa20 msg={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-425"}
29318:2020-10-18 03:51:04.769+0000: 14869: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"timestamp": {"seconds": 1602993064, "microseconds": 768381}, "event": "MIGRATION", "data": {"status": "setup"}}]
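For anyone re-checking this, one way to see the mismatch while the second attempt is still in its copy phase is to query the job state directly on the source host; a minimal sketch using the domain name from this report:

# qemu's view of the mirror job (compare with "ready": false in the log above)
virsh qemu-monitor-command avocado-vt-vm1 --pretty '{"execute":"query-block-jobs"}'
# libvirt's view of the same block copy job
virsh blockjob avocado-vt-vm1 vda --info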
(In reply to Fangge Jin from comment #10)
> When do migration at the second time, libvirt thought that "All disk mirrors
> are ready" wrongly and started to do memory migrate, that's the problem:
[...]
> 29133:2020-10-18 03:51:04.762+0000: 13827: debug :
> qemuMigrationSrcNBDStorageCopyReady:551 : All disk mirrors are ready

Indeed. In blockdev mode we didn't reset the variable which holds the 'ready' state, so the job was wrongly considered done right away.

Fixed upstream:

commit 202d61db4870468a02817e64fefb5ede06b8a59a
Author: Peter Krempa <pkrempa>
Date:   Mon Jan 11 10:42:15 2021 +0100

    qemuBlockJobEventProcess: Always clear 'mirrorState' when a job finishes

    When a block job is terminated we should clear the 'mirrorState' and
    'mirrorJob' variables so that stale values are not present prior to a
    new job.

    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>

commit a09c421e3e5969582ed0a5cf9a132ecbfd9794d2
Author: Peter Krempa <pkrempa>
Date:   Mon Jan 11 10:40:43 2021 +0100

    qemuMigrationSrcNBDStorageCopyReady: Use ready-state of mirror from qemuBlockJobData

    Use the per-job state to determine when the non-shared-storage mirror
    is complete rather than the per-disk definition one. The
    qemuBlockJobData is a newer approach and is always cleared after the
    blockjob is terminated while the 'mirrorState' variable in the
    definition of the disk may be left over. In such case the disk mirror
    would be considered complete prematurely.

    https://bugzilla.redhat.com/show_bug.cgi?id=1889131

    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>

v7.0.0-rc1-10-g202d61db48
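For reference, a quick way to check where these commits landed upstream (run inside a libvirt git checkout; the expected output of the first command is the tag string quoted above):

git describe 202d61db4870468a02817e64fefb5ede06b8a59a   # -> v7.0.0-rc1-10-g202d61db48
git describe a09c421e3e5969582ed0a5cf9a132ecbfd9794d2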
Tested on libvirt v7.0.0-rc2-2-g0a6cb05e95 and qemu-5.2.0-4.fc34.x86_64 as follows:

while true; do
    virsh migrate test qemu+ssh://root/system --live --p2p --copy-storage-inc &
    sleep $((1 + $RANDOM % 7))
    virsh domjobabort test
    fg
done

Never hit an error like comment 0 after thousands of loops.
Created attachment 1748904 [details]
VM xml, libvirtd log, qmp log

An error happens when cancelling a migration with --copy-storage-all.

Version:
libvirt-7.0.0-2.module+el8.4.0+9520+ef609c5f.x86_64
qemu-kvm-5.2.0-3.module+el8.4.0+9499+42e58f08.x86_64

Steps:
Start a VM and cancel the migration with --copy-storage-all:
➜ ~ virsh migrate seabios qemu+ssh://root.redhat.com/system --live --p2p --copy-storage-all
^Cerror: End of file while reading data: : Input/output error

The error happens at:
2021-01-20 01:19:50.754+0000: 106219: error : virNetSocketReadWire:1814 : End of file while reading data: : Input/output error

It looks similar to https://bugzilla.redhat.com/show_bug.cgi?id=1821277. Please help check whether it is a regression introduced by this bugfix.
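In case the attached libvirtd log needs to be regenerated, the usual debug-log settings are sketched below; this is a commonly used configuration and not specific to this bug, so adjust the filters as needed:

# /etc/libvirt/libvirtd.conf (restart libvirtd afterwards)
log_filters="1:qemu 1:libvirt 4:object 4:json 4:event 1:util"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"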
See comment 17.
No, it's not a regression caused by the bugfix. The relevant context is:

2021-01-20 01:19:50.751+0000: 106219: debug : qemuDomainObjEndJob:1143 : Stopping job: async nested (async=migration out vm=0x7f30bc1aca70 name=seabios)
2021-01-20 01:19:50.751+0000: 106219: debug : qemuDomainObjEnterRemote:5833 : Entering remote (vm=0x7f30bc1aca70 name=seabios)
2021-01-20 01:19:50.751+0000: 106219: debug : virConnectUnregisterCloseCallback:1479 : conn=0x7f30ac00e8d0
2021-01-20 01:19:50.752+0000: 106219: debug : qemuDomainObjExitRemote:5844 : Exited remote (vm=0x7f30bc1aca70 name=seabios)
2021-01-20 01:19:50.754+0000: 106219: error : virNetSocketReadWire:1814 : End of file while reading data: : Input/output error

The error is apparently reported when closing the migration connection on one side, and this bug did not touch that at all. This bugfix was about clearing stale data after blockjob completion and has nothing to do with the remote driver.

The main cause may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1821277, but I will not go into a detailed analysis here. Given that the aforementioned bug is already crowded with comments and you have exact steps to reproduce here, filing a new bug, if you want, might be better than trying to attach the migration case there too.
(In reply to Peter Krempa from comment #19)
> No it's not a regression caused by the bugfix. The relevant context is:
>
> 2021-01-20 01:19:50.751+0000: 106219: debug : qemuDomainObjEndJob:1143 :
> Stopping job: async nested (async=migration out vm=0x7f30bc1aca70
> name=seabios)
> 2021-01-20 01:19:50.751+0000: 106219: debug : qemuDomainObjEnterRemote:5833
> : Entering remote (vm=0x7f30bc1aca70 name=seabios)
> 2021-01-20 01:19:50.751+0000: 106219: debug :
> virConnectUnregisterCloseCallback:1479 : conn=0x7f30ac00e8d0
> 2021-01-20 01:19:50.752+0000: 106219: debug : qemuDomainObjExitRemote:5844 :
> Exited remote (vm=0x7f30bc1aca70 name=seabios)
> 2021-01-20 01:19:50.754+0000: 106219: error : virNetSocketReadWire:1814 :
> End of file while reading data: : Input/output error
>
> The error is reported when closing the migration connection on one side
> apparently and this bug was not touching that at all. This bugfix was
> related to clearing stale data after blockjob completion and has nothing to
> do with the remote driver.
>
> The main cause may be related to
> https://bugzilla.redhat.com/show_bug.cgi?id=1821277 but I will not go into
> detailed analysis here. Given that aforementioned bug is too crowded with
> comments and here you have exact steps to reproduce, filing a new one, if
> you want might be better than trying to attach the migration case there too.

Thank you. New bug: https://bugzilla.redhat.com/show_bug.cgi?id=1918211
Verified on:
libvirt-7.0.0-3.module+el8.4.0+9709+a99efd61.x86_64
qemu-kvm-5.2.0-5.module+el8.4.0+9775+0937c167.x86_64
using the steps from comment 13.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2098