Bug 1631127
| Field | Value |
|---|---|
| Summary | Guest is broken when do ping-pong migration with iscsi backend via -blockdev |
| Product | Red Hat Enterprise Linux Advanced Virtualization |
| Component | qemu-kvm |
| qemu-kvm sub component | iSCSI |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | high |
| Keywords | Triaged |
| Reporter | Xueqiang Wei <xuwei> |
| Assignee | Virtualization Maintenance <virt-maint> |
| QA Contact | qing.wang <qinwang> |
| CC | aliang, chayang, coli, jinzhao, juzhang, kanderso, knoel, kwolf, mtessun, ngu, qinwang, rbalakri, virt-maint, xuwei, yhong, yuhuang |
| Target Milestone | pre-dev-freeze |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | If docs needed, set a value |
| Type | Bug |
| Last Closed | 2020-07-23 04:01:33 UTC |
| Bug Blocks | 1649160 |
Description
Xueqiang Wei
2018-09-20 03:10:20 UTC
Found errors on both the src host and the dst host.

src host:

```
# cat /var/log/messages
......
Sep 19 21:38:24 localhost kernel: connection3:0: detected conn error (1020)
Sep 19 21:38:25 localhost iscsid: Kernel reported iSCSI connection 3:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Sep 19 21:38:27 localhost iscsid: connection3:0 is operational after recovery (1 attempts)
Sep 19 21:38:56 localhost kernel: connection3:0: detected conn error (1020)
Sep 19 21:38:56 localhost iscsid: Kernel reported iSCSI connection 3:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
```

dst host:

```
# cat /var/log/messages
......
Sep 20 11:29:46 localhost iscsid: connection1:0 is operational after recovery (1 attempts)
Sep 20 11:29:47 localhost iscsid: connection2:0 is operational after recovery (1 attempts)
Sep 20 11:29:48 localhost kernel: connection1:0: detected conn error (1020)
Sep 20 11:29:48 localhost iscsid: Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Sep 20 11:29:49 localhost kernel: connection2:0: detected conn error (1020)
Sep 20 11:29:49 localhost iscsid: Kernel reported iSCSI connection 2:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Sep 20 11:29:51 localhost iscsid: connection1:0 is operational after recovery (1 attempts)
Sep 20 11:29:52 localhost iscsid: connection2:0 is operational after recovery (1 attempts)
Sep 20 11:29:52 localhost kernel: connection1:0: detected conn error (1020)
Sep 20 11:29:53 localhost iscsid: Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
```

Created attachment 1485009 [details]
screenshot1
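The conn error (1020) events in the messages log above can be extracted mechanically when scanning longer logs. A minimal sketch; the regex and function names are written for this report, not part of any tool:

```python
import re

# Matches iscsid lines such as:
#   ... iscsid: Kernel reported iSCSI connection 3:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: ...) state (3)
ERR_RE = re.compile(
    r"iscsid: Kernel reported iSCSI connection (?P<conn>\d+:\d+) "
    r"error \((?P<code>\d+) - (?P<name>\w+)"
)

def iscsi_errors(log_lines):
    """Yield (connection, error code, error name) for each iscsid error line."""
    for line in log_lines:
        m = ERR_RE.search(line)
        if m:
            yield m.group("conn"), int(m.group("code")), m.group("name")

sample = [
    "Sep 19 21:38:25 localhost iscsid: Kernel reported iSCSI connection 3:0 "
    "error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)",
    "Sep 19 21:38:27 localhost iscsid: connection3:0 is operational after recovery (1 attempts)",
]
print(list(iscsi_errors(sample)))  # [('3:0', 1020, 'ISCSI_ERR_TCP_CONN_CLOSE')]
```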
(In reply to Miya Chen from comment #3)
> Xueqiang, could you please check if there is image corruption? Thanks.

Rechecked with the qemu-img command; no errors were found.

```
# qemu-img info rhel76-64-virtio-scsi.qcow2
image: rhel76-64-virtio-scsi.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 4.4G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

# qemu-img check rhel76-64-virtio-scsi.qcow2
No errors were found on the image.
71606/327680 = 21.85% allocated, 15.93% fragmented, 0.00% compressed clusters
Image end offset: 4694147072
```

Retested without multipath and also hit this issue. Defined a new iSCSI LUN on the iscsi server and connected to it (/dev/sdd).

Host:

```
# lsblk
NAME                           MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                              8:0    0 464.7G  0 disk
├─sda1                           8:1    0     1G  0 part  /boot
└─sda2                           8:2    0 463.7G  0 part
  ├─rhel_ibm--x3100m4--02-root 253:0    0    50G  0 lvm   /
  ├─rhel_ibm--x3100m4--02-swap 253:1    0   7.9G  0 lvm   [SWAP]
  └─rhel_ibm--x3100m4--02-home 253:2    0 405.9G  0 lvm   /home
sdb                              8:16   0    40G  0 disk
└─mpathg                       253:3    0    40G  0 mpath
sdc                              8:32   0    40G  0 disk
└─mpathg                       253:3    0    40G  0 mpath
sdd                              8:48   0    20G  0 disk  /home/iscsi_mount
sr0                             11:0    1  1024M  0 rom

# mkfs.xfs /dev/sdd
# mount /dev/sdd /home/iscsi_mount/
```

You never mentioned what the storage setup looked like inside the guest. Could this be a duplicate of BZ 1673080, fixed with qemu-kvm-rhev-2.12.0-23.el7? In that case, you would see a multipath setup inside the guest. Can you try whether the problem still happens in the current build?

If that's not it, can you please check whether you can see any difference between -drive and -blockdev from inside the guest (e.g. in lsblk)?

Testing virtio-blk instead of virtio-scsi could give us another hint.

(In reply to Kevin Wolf from comment #13)
> You never mentioned what the storage setup looked like inside the guest.
> Could this be a duplicate of BZ 1673080, fixed with
> qemu-kvm-rhev-2.12.0-23.el7? In this case, you would see a multipath setup
> inside the guest. Can you try whether the problem still happens in the
> current build?

They are not the same issue. The multipath is on the host, via the connection to the iscsi server. Details below.

> If that's not it, can you please check if you can see any difference between
> -drive and -blockdev from inside the guest (e.g. in lsblk)?
>
> Testing virtio-blk instead of virtio-scsi could give us another hint.

Retested on the latest qemu-kvm-rhev (qemu-kvm-rhev-2.12.0-25.el7) via -blockdev: with both virtio-scsi and virtio-blk, the guest fails to boot after step 6 (system reset after migration). Did ping-pong migration 5 times via -drive and did not hit this issue.

Details:

Host: kernel-3.10.0-1030.el7.x86_64, qemu-kvm-rhev-2.12.0-25.el7
Guest: kernel-3.10.0-1031.el7.x86_64

1. Set up environment on host

(1) src host connects to the iscsi server.

src host:

```
# multipath -ll
mpathb (3600140560a54e0e0daf4b3ba7724b80d) dm-3 LIO-ORG ,stor0
size=40G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 6:0:0:0 sdb 8:16 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 7:0:0:0 sdc 8:32 active ready running

# lsscsi
[0:0:0:0]    disk    ATA      ST1000NX0423  LE43  /dev/sda
[6:0:0:0]    disk    LIO-ORG  stor0         4.0   /dev/sdb
[7:0:0:0]    disk    LIO-ORG  stor0         4.0   /dev/sdc
[8:0:0:0]    disk    LIO-ORG  stor1         4.0   /dev/sdd

# mkfs.xfs /dev/mapper/mpathb
# mount /dev/mapper/mpathb /home/iscsi_mount/
# copy system image and data image to /home/iscsi_mount/
```

(2) dst host connects to the iscsi server.

dst host:

```
# multipath -ll
mpathb (3600140560a54e0e0daf4b3ba7724b80d) dm-3 LIO-ORG ,stor0
size=40G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 30:0:0:0 sdc 8:32 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 31:0:0:0 sdd 8:48 active ready running

# lsscsi
[0:0:9:0]    disk    IBM-ESXS  ST9146803SS  B53C  /dev/sda
[0:0:10:0]   disk    IBM-ESXS  ST9146803SS  B53C  /dev/sdb
[30:0:0:0]   disk    LIO-ORG   stor0        4.0   /dev/sdc
[31:0:0:0]   disk    LIO-ORG   stor0        4.0   /dev/sdd
[32:0:0:0]   disk    LIO-ORG   stor1        4.0   /dev/sde

# mount /dev/mapper/mpathb /home/iscsi_mount/
```

2. Boot the guest with a system disk and a data disk on the src and dst hosts, then do ping-pong migration. Steps are in Comment 0.

/home/iscsi_mount/rhel77-64-virtio-scsi.qcow2
/home/iscsi_mount/data.qcow2

Guest via virtio-scsi:

```
# lsblk
NAME                              MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda                                 8:0    0  20G  0 disk
├─sda1                              8:1    0   1G  0 part /boot
└─sda2                              8:2    0  19G  0 part
  ├─rhel_bootp--73--227--124-root 253:0    0  17G  0 lvm  /
  └─rhel_bootp--73--227--124-swap 253:1    0   2G  0 lvm  [SWAP]
sdb                                 8:16   0   5G  0 disk

# lsscsi
[2:0:0:0]    disk    QEMU     QEMU HARDDISK  2.5+  /dev/sda
[3:0:0:0]    disk    QEMU     QEMU HARDDISK  2.5+  /dev/sdb
```

Guest via virtio-blk:

```
# lsblk
NAME                              MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda                               252:0    0  20G  0 disk
├─vda1                            252:1    0   1G  0 part /boot
└─vda2                            252:2    0  19G  0 part
  ├─rhel_bootp--73--227--124-root 253:0    0  17G  0 lvm  /
  └─rhel_bootp--73--227--124-swap 253:1    0   2G  0 lvm  [SWAP]
vdb                               252:16   0   5G  0 disk
```

Created attachment 1550951 [details]
screenshot_via_virtio-blk
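As a sanity check on the qemu-img check figures quoted earlier in this bug, the allocation percentage is simply allocated clusters over total clusters:

```python
# Figures taken from the qemu-img output quoted earlier in this bug
virtual_size = 21474836480    # 20G
cluster_size = 65536
allocated_clusters = 71606

total_clusters = virtual_size // cluster_size
pct = 100 * allocated_clusters / total_clusters
print(total_clusters)             # 327680
print(f"{pct:.2f}% allocated")    # 21.85% allocated, matching the report
```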
Does the bug occur when you use `-drive ...,node-name=foo -device ...,drive=foo`? That is, using a node-name to create the device, like with -blockdev, but still using -drive?

(In reply to Kevin Wolf from comment #16)
> Does the bug occur when you use -drive ...,node-name=foo -device
> ...,drive=foo? That is, using a node-name to create the device, like with
> -blockdev, but still using -drive?

Tested 20 times on the same environment as Comment 14; did not hit this issue with the command lines below.

```
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/iscsi_mount/rhel77-64-virtio-scsi.qcow2,node-name=image1 \
-device scsi-hd,id=image1,drive=image1,bootindex=0,bus=virtio_scsi_pci0.0 \
-device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x6 \
-drive id=drive_data,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/iscsi_mount/data.qcow2,node-name=data_disk \
-device scsi-hd,id=data-disk1,drive=data_disk,bus=scsi1.0 \
```

However, I found errors in the dmesg log; please refer to the attachment for the full log:

```
Apr 25 15:54:15 bootp-73-226-153 kernel: sd 2:0:0:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr 25 15:54:15 bootp-73-226-153 kernel: sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
Apr 25 15:54:15 bootp-73-226-153 kernel: sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
Apr 25 15:54:15 bootp-73-226-153 kernel: sd 2:0:0:0: [sda] CDB: Write(10) 2a 00 00 06 9e f0 00 04 00 00
Apr 25 15:54:15 bootp-73-226-153 kernel: blk_update_request: I/O error, dev sda, sector 433904
Apr 25 15:54:15 bootp-73-226-153 kernel: sd 2:0:0:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr 25 15:54:15 bootp-73-226-153 kernel: sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
Apr 25 15:54:15 bootp-73-226-153 kernel: sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
Apr 25 15:54:15 bootp-73-226-153 kernel: sd 2:0:0:0: [sda] CDB: Write(10) 2a 00 00 06 aa f0 00 04 00 00
Apr 25 15:54:15 bootp-73-226-153 kernel: blk_update_request: I/O error, dev sda, sector 436976
Apr 25 15:54:15 bootp-73-226-153 kernel: XFS (sda1): writeback error on sector 418544
```

Created attachment 1558608 [details]
dmesg log
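The failing CDBs in the dmesg lines above can be decoded to confirm they match the sectors blk_update_request reports. A small decoder for SCSI Write(10), written for this comment rather than taken from any tool:

```python
def decode_write10(cdb_hex: str):
    """Decode a SCSI Write(10) CDB: byte 0 is the opcode (0x2a),
    bytes 2-5 are the LBA (big-endian), bytes 7-8 the transfer length."""
    b = bytes.fromhex(cdb_hex.replace(" ", ""))
    assert b[0] == 0x2A, "not a Write(10)"
    lba = int.from_bytes(b[2:6], "big")
    nblocks = int.from_bytes(b[7:9], "big")
    return lba, nblocks

# CDBs from the dmesg output above; the decoded LBAs match the
# blk_update_request sectors 433904 and 436976
print(decode_write10("2a 00 00 06 9e f0 00 04 00 00"))  # (433904, 1024)
print(decode_write10("2a 00 00 06 aa f0 00 04 00 00"))  # (436976, 1024)
```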
QEMU has recently been split into sub-components, and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks.

Tested on:

Host A: 4.18.0-193.13.2.el8_2.x86_64, qemu-kvm-core-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64
Host B: 4.18.0-227.el8.x86_64, qemu-kvm-core-4.2.0-30.module+el8.3.0+7298+c26a06b8.x86_64

Because the same LUN cannot be connected on both hosts (https://community.spiceworks.com/topic/545423-two-computers-one-iscsi-target), the dst host mounts the src host's storage over NFS instead.

1. Set up environment

(1) src host connects to the iscsi server.

```
# multipath -ll
mpatha (360014054a19629fbe434180acbdbe077) dm-3 LIO-ORG,mpath-disk1
size=60G features='0' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 2:0:0:0 sdb 8:16 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 3:0:0:0 sdc 8:32 active undef running

# mkfs.xfs /dev/mapper/mpatha
# mount /dev/mapper/mpatha /home/iscsi_mount/
# copy system image and data image to /home/iscsi_mount/
# export /home/iscsi_mount/ with nfs-server
```

(2) dst host mounts the export from the nfs server.

```
# mount hostA:/home/iscsi_mount/ /home/iscsi_mount
```

2. Boot the src guest via -blockdev:

```
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox off \
    -machine pc \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
    -blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/home/iscsi_mount/rhel820-64-virtio-scsi.qcow2,node-name=my_file \
    -blockdev driver=qcow2,node-name=my,file=my_file \
    -device scsi-hd,drive=my,bus=virtio_scsi_pci0.0 \
    -device virtio-net-pci,mac=9a:ad:ae:af:b0:b1,id=id942Wof,vectors=4,netdev=idirzdj4,bus=pci.0,addr=0x5 \
    -netdev tap,id=idirzdj4,vhost=on \
    -m 4G \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :5 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot menu=off,strict=off,order=cdn,once=d \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:5955,server,nowait \
    -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x6 \
    -blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/home/iscsi_mount/data.qcow2,node-name=drive2 \
    -blockdev driver=qcow2,node-name=my1,file=drive2 \
    -device scsi-hd,drive=my1,id=data-disk1,bus=scsi1.0
```

3. Install stressapptest and run it in the guest.

```
# git clone https://github.com/stressapptest/stressapptest.git
# cd stressapptest && ./configure && make && make install
# stressapptest -M 100 -s 1000
```

4. Boot the dst guest in listening mode with the same command line as step 2, plus:

```
    -incoming tcp:0:5800 \
```

5. Migrate the guest from the src host to the dst host while stressapptest is running. Do system_reset after migration.

src host:

```
(qemu) migrate -d tcp:10.73.130.203:5800
(qemu) migrate_set_downtime 20
(qemu) info status
VM status: paused (postmigrate)
```

dst host:

```
(qemu) info status
VM status: paused (inmigrate)
(qemu) info status
VM status: running
(qemu) system_reset
```

6. Do ping-pong migration 5 times; start stressapptest in the guest after each system_reset.

```
# stressapptest -M 100 -s 1000
(qemu) migrate -d tcp:10.73.114.14:5800
(qemu) migrate_set_downtime 20
```

Issue not reproduced.
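Since the guests expose QMP on port 5955, the HMP steps above can also be driven programmatically. A sketch that only builds the QMP command JSON for one migration round; note that QMP's `migrate-set-parameters` takes `downtime-limit` in milliseconds, while HMP's `migrate_set_downtime` takes seconds (the host/port values are the ones used in this test):

```python
import json

def migration_round(dest_host: str, port: int = 5800, downtime_s: float = 20):
    """Build the QMP commands for one migration round: cap the downtime,
    then start a migration to the destination's -incoming port."""
    return [
        {"execute": "migrate-set-parameters",
         "arguments": {"downtime-limit": int(downtime_s * 1000)}},  # QMP uses ms
        {"execute": "migrate",
         "arguments": {"uri": f"tcp:{dest_host}:{port}"}},
    ]

for cmd in migration_round("10.73.130.203"):
    print(json.dumps(cmd))
```

Each dict would be sent over the QMP socket as one JSON line; swapping `dest_host` between the two hosts gives the ping-pong rounds.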