Bug 1257119

Summary: Blockcopy fails with iscsi disk
Product: Red Hat Enterprise Linux 7 Reporter: Dan Zheng <dzheng>
Component: libvirt Assignee: Eric Blake <eblake>
Status: CLOSED DUPLICATE QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 7.2 CC: dyuan, eblake, famz, gsun, jferlan, jsuchane, juzhang, knoel, kwolf, mazhang, mzhan, pkrempa, rbalakri, shu, shyu, xuzhang, yanyang, yisun
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-09-23 14:43:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description      Flags
for raw option   none

Description Dan Zheng 2015-08-26 10:14:49 UTC
Description of problem:
The blockcopy command fails when the guest uses an iSCSI disk. The issue occurs on both x86_64 and ppc64le.


Version-Release number of selected component (if applicable):
libvirt-1.2.17-6.el7.ppc64le
qemu-kvm-rhev-2.3.0-18.el7.ppc64le
kernel-3.10.0-302.el7.ppc64le

How reproducible:
100%

Steps to Reproduce:
Scenario A:
1. Prepare a transient guest and start it with the following iSCSI disk configuration:

    <disk type='network' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source protocol='iscsi' name='iqn.2015-08.com.virttest:emulated-iscsi.target/0'>
        <host name='127.0.0.1' port='3260'/>
      </source>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
    </disk>
2. Run blockcopy with the --wait option. The copy fails.

# /usr/bin/virsh blockcopy virt-tests-vm1 vda /tmp/blockcopy1.img --wait --verbose
Block Copy: [  0 %]
Copy failed

# echo $?
1

3. Run blockcopy without the --wait option. The command reports that the copy started, but no block job is found.
# virsh blockcopy virt-tests-vm1 vda /tmp/blockcopy2.img  
Block Copy started

# virsh blockjob virt-tests-vm1 vda --info
No current block job for vda

4. Create another iSCSI disk (for example, /dev/sdc), then copy to it with --raw --blockdev:
# /usr/bin/virsh blockcopy virt-tests-vm1 vda /dev/sdc  --raw --blockdev
Block Copy started

# echo $?
0
# virsh blockjob virt-tests-vm1 vda --info
No current block job for vda

4.1 Wait for a while and check the guest's dumpxml output. No <mirror> element is found within <disk>.
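
The <mirror> check in step 4.1 can be scripted. The following is a minimal sketch, assuming the guest XML has already been captured as a string (for example from virsh dumpxml). The function name disks_with_mirror and the <domain>/<devices> wrapper in the sample are illustrative assumptions; only the <disk> element comes from this report.

```python
import xml.etree.ElementTree as ET


def disks_with_mirror(domain_xml):
    """Map each disk target dev to True if the <disk> has a <mirror> child."""
    root = ET.fromstring(domain_xml)
    result = {}
    for disk in root.iter("disk"):
        target = disk.find("target")
        dev = target.get("dev") if target is not None else None
        result[dev] = disk.find("mirror") is not None
    return result


# Sample input: the <disk> element is the one from step 1; the surrounding
# <domain>/<devices> wrapper is an assumption added to make it parseable.
SAMPLE = """
<domain type='kvm'>
  <devices>
    <disk type='network' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source protocol='iscsi' name='iqn.2015-08.com.virttest:emulated-iscsi.target/0'>
        <host name='127.0.0.1' port='3260'/>
      </source>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

print(disks_with_mirror(SAMPLE))  # {'vda': False}
```

A value of False for vda matches the failure described here: no copy job is attached to the disk.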

Actual result:
See above

Expected result:
Blockcopy operation should be successful.


Additional information:
--

Comment 2 Dan Zheng 2015-08-26 10:39:40 UTC
Created attachment 1067218
for raw option

virsh blockcopy virt-tests-vm1 vda /dev/sdc  --raw --blockdev
See libvirtd.log.1

Comment 3 Dan Zheng 2015-08-27 02:24:52 UTC
Below are the steps to set up the iSCSI environment.

Set up the iSCSI environment:

1. # dd if=/dev/zero of=/tmp/blockdev-iscsi count=10240 bs=1024K
2. # setenforce 0
3. # targetcli /backstores/fileio/ create device.blockdev-iscsi /tmp/blockdev-iscsi
4. # targetcli /iscsi/ create iqn.2015-08.com.virttest:blockdev-iscsi.target
5. # targetcli /iscsi/iqn.2015-08.com.virttest:blockdev-iscsi.target/tpg1/portals ls
6. # targetcli /iscsi/iqn.2015-08.com.virttest:blockdev-iscsi.target/tpg1/luns/ create /backstores/fileio/device.blockdev-iscsi
7. # firewall-cmd --state
Make sure the firewall is not running.
8. # setenforce 1
9. Define access rights:
# targetcli /iscsi/iqn.2015-08.com.virttest:blockdev-iscsi.target/tpg1/ set attribute authentication=0 demo_mode_write_protect=0 generate_node_acls=1 cache_dynamic_acls=1
Parameter demo_mode_write_protect is now '0'.
Parameter authentication is now '0'.
Parameter generate_node_acls is now '1'.
Parameter cache_dynamic_acls is now '1'.

10. # targetcli / saveconfig
 Last 10 configs saved in /etc/target/backup.
 Configuration saved to /etc/target/saveconfig.json

11. Try to update the iSCSI initiator name:
# mv /etc/iscsi/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi-virt
12. # service iscsid restart
13. # iscsiadm --mode node --login --targetname iqn.2015-08.com.virttest:blockdev-iscsi.target --portal 127.0.0.1
14. Find the iSCSI device name:
# iscsiadm -m session -P 3
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 9        State: running
                scsi9 Channel 00 Id 0 Lun: 0
                        Attached scsi disk sdc                State: running

15. Create another iSCSI disk by repeating steps 1 through 10 with 'emulated-iscsi' instead of 'blockdev-iscsi'.
16. # qemu-img convert -f qcow2 -O qcow2 /var/lib/virt_test/images/jeos-19-64.qcow2 /tmp/emulated-iscsi
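
The device lookup in step 14 can also be scripted. This is a minimal sketch, assuming the text of "iscsiadm -m session -P 3" has been captured into a string and that the "Attached scsi disk" line always has the layout shown above. The function name attached_disks is a hypothetical helper, not part of any tool.

```python
import re

# Matches lines such as: "Attached scsi disk sdc    State: running"
DISK_LINE = re.compile(r"Attached scsi disk (\S+)\s+State: (\S+)")


def attached_disks(session_output):
    """Return (device, state) pairs found in iscsiadm session output."""
    return DISK_LINE.findall(session_output)


# Sample input copied from the step 14 output in this report.
SAMPLE_OUTPUT = """
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 9        State: running
                scsi9 Channel 00 Id 0 Lun: 0
                        Attached scsi disk sdc                State: running
"""

print(attached_disks(SAMPLE_OUTPUT))  # [('sdc', 'running')]
```

The returned name (sdc here) is the block device passed to blockcopy as /dev/sdc.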

Comment 4 Eric Blake 2015-08-27 20:13:38 UTC
Could this be the qemu bug fixed upstream by:

commit e424aff5f307227b1c2512bbb8ece891bb895cef
Author: Kevin Wolf <kwolf>
Date:   Thu Aug 13 10:41:50 2015 +0200

mirror: Fix coroutine reentrance

Meanwhile, I'm trying to reproduce locally.

Comment 5 Yang Yang 2015-09-02 01:33:12 UTC
It cannot be reproduced when the iSCSI server is created by tgtd on RHEL 6. It can be reproduced when the iSCSI server is created by targetcli on RHEL 7.

Comment 6 mazhang 2015-09-08 08:15:40 UTC
I just tested with qemu-kvm directly and hit a block job error twice, but I'm not sure it's the same problem.

Host:
3.10.0-314.el7.x86_64
qemu-kvm-rhev-2.3.0-22.el7.x86_64

Steps to reproduce:
1. Boot a VM with a libiscsi disk.
-drive file=iscsi://10.66.9.236/iqn.2003-01.org.linux-iscsi.dhcp-9-236.x8664:sn.aba7fca8bbd3/0,if=none,id=drive-scsi-disk,format=qcow2,cache=none,werror=stop,rerror=stop \
-device virtio-scsi-pci,id=scsi0 \
-device scsi-hd,drive=drive-scsi-disk,bus=scsi0.0,scsi-id=0,lun=0,id=scsi-disk,bootindex=1 \

2. Start a block mirror.
{ "execute": "drive-mirror", "arguments": { "device": "drive-scsi-disk", "target": "/home/snapshot1", "format": "raw","mode": "absolute-paths", "sync": "full"}}
{"return": {}}
{"timestamp": {"seconds": 1441699467, "microseconds": 93838}, "event": "BLOCK_JOB_ERROR", "data": {"device": "drive-scsi-disk", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1441699467, "microseconds": 186866}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-scsi-disk", "len": 10256711680, "offset": 774832128, "speed": 0, "type": "mirror", "error": "Input/output error"}}
{ "execute" : "query-block-jobs", "arguments" : {} }
{"return": []}

I also got an error message from HMP:

(qemu) qemu-kvm: iSCSI Failure: SENSE KEY:NOT READY(2) ASCQ:(null)(0x0800)
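
The failure can be picked out of the QMP output programmatically. This is a minimal sketch, assuming each QMP message arrives as one JSON object per line, as in the transcript above. The function name failed_block_jobs is a hypothetical helper. It relies on the "error" field being present in BLOCK_JOB_COMPLETED only when the job failed.

```python
import json


def failed_block_jobs(qmp_lines):
    """Return (device, error) for each BLOCK_JOB_COMPLETED event carrying an error."""
    failures = []
    for line in qmp_lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if msg.get("event") != "BLOCK_JOB_COMPLETED":
            continue
        data = msg.get("data", {})
        if "error" in data:
            failures.append((data["device"], data["error"]))
    return failures


# Sample input copied from the QMP responses in this comment.
QMP_OUTPUT = [
    '{"return": {}}',
    '{"timestamp": {"seconds": 1441699467, "microseconds": 93838}, "event": "BLOCK_JOB_ERROR", "data": {"device": "drive-scsi-disk", "operation": "read", "action": "report"}}',
    '{"timestamp": {"seconds": 1441699467, "microseconds": 186866}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-scsi-disk", "len": 10256711680, "offset": 774832128, "speed": 0, "type": "mirror", "error": "Input/output error"}}',
    '{"return": []}',
]

print(failed_block_jobs(QMP_OUTPUT))  # [('drive-scsi-disk', 'Input/output error')]
```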

Comment 8 Kevin Wolf 2015-09-09 12:26:34 UTC
(In reply to Eric Blake from comment #4)
> Could this be the qemu bug fixed upstream by:
> 
> commit e424aff5f307227b1c2512bbb8ece891bb895cef
> Author: Kevin Wolf <kwolf>
> Date:   Thu Aug 13 10:41:50 2015 +0200
> 
> mirror: Fix coroutine reentrance

Unlikely. The bug fixed by that commit is a qemu crash (failing assertion),
whereas this one seems to be a block job failing with -EINVAL if I'm reading the
logged BLOCK_JOB_COMPLETED event correctly; more precisely it seems to be a
failed read from the source according to the BLOCK_JOB_ERROR event.

I'm not sure how to read the libvirt logs correctly, but it seems to be talking
about removing a device before the error occurs. Might this be related?

2015-08-26 10:18:53.819+0000: 25723: debug : udevEventHandleCallback:1543 : udev action: 'remove'
2015-08-26 10:18:53.819+0000: 25723: debug : udevRemoveOneDevice:1314 : Failed to find device to remove that has udev name '/sys/devices/platform/host9/session9/target9:0:0/9:0:0:0/block/sdc/sdc1'

Comment 9 Yang Yang 2015-09-10 01:58:16 UTC
Eric and Kevin,
This bug is not the same as Bug 1251487 - qemu core dump when do drive mirror. I can still reproduce this one with the latest qemu-kvm-rhev-2.3.0-22.el7, in which the patch *mirror: Fix coroutine reentrance* has been backported.

Comment 10 mazhang 2015-09-10 02:30:57 UTC
(In reply to mazhang from comment #6)

I can't be sure it's the same problem, so I filed a new bug 1261701 to track it.

Comment 11 Ademar Reis 2015-09-14 20:11:08 UTC
(In reply to Kevin Wolf from comment #8)
> I'm not sure how to read the libvirt logs correctly, but it seems to be talking
> about removing a device before the error occurs. Might this be related?
> 
> 2015-08-26 10:18:53.819+0000: 25723: debug : udevEventHandleCallback:1543 : udev action: 'remove'
> 2015-08-26 10:18:53.819+0000: 25723: debug : udevRemoveOneDevice:1314 : Failed to find device to remove that has udev name '/sys/devices/platform/host9/session9/target9:0:0/9:0:0:0/block/sdc/sdc1'

Eric?

Comment 13 Jaroslav Suchanek 2015-09-17 14:59:29 UTC
Setting an exception flag as it may lead to data corruption.

Comment 14 Peter Krempa 2015-09-23 14:43:56 UTC
The problem manifests itself very similarly to the scsi-hd issue that was reported later here and was filed separately as https://bugzilla.redhat.com/show_bug.cgi?id=1261701

I'll close this as a duplicate of the qemu bug, since it looks like an issue in the iSCSI backend of qemu. If it's deemed that this operation should be forbidden, this bug can be reopened or the qemu bug can be moved to libvirt.

*** This bug has been marked as a duplicate of bug 1261701 ***