Bug 1445596
| Summary: | forbid setting unsupported 'write_threshold' property for direct iscsi backends | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | yisun | ||||||
| Component: | qemu-kvm-rhev | Assignee: | Stefan Hajnoczi <stefanha> | ||||||
| Status: | CLOSED WONTFIX | QA Contact: | CongLi <coli> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | 7.4 | CC: | chayang, coli, eblake, hhan, jiyan, juzhang, knoel, kwolf, lmen, michen, ngu, pkrempa, pzhang, rbalakri, stefanha, virt-maint, xuzhang, yisun | ||||||
| Target Milestone: | rc | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2018-01-08 16:17:34 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
I'm currently getting "error: Operation not supported: threshold currently can't be set for block device 'sdb'" with the above configuration.
Could you please post the output of:
virsh qemu-monitor-command --pretty $VMNAME '{"execute" : "query-named-block-nodes"}'
if you are able to set the threshold?
(In reply to Peter Krempa from comment #2)
> I'm currently getting "error: Operation not supported: threshold currently
> can't be set for block device 'sdb'" with the above configuration.
>
> Could you please post the output of:
>
> virsh qemu-monitor-command --pretty $VMNAME '{"execute" :
> "query-named-block-nodes"}'
>
> if you are able to set the threshold?

The machine I used for the test is maintained by beaker and now it's not available. I tried on my desktop machine and it also gives me the same error message you get, even if I downgrade libvirt and qemu-kvm-rhev to:
libvirt-3.2.0-2.el7.x86_64
qemu-kvm-rhev-2.9.0-1.el7.x86_64
I am not quite sure what the other differences between the machines are now :(

Well, in that case the issue is similar to the LUKS issue: the names differ, so the algorithm that looks up the node names is not able to find this one. It will be properly fixed once we provide node names manually.

Upstream fixes this in the refactor of the node detection code:
commit 0175dc6ea024d4edd0f59571c3f5fa80d1ec1c0e
Author: Peter Krempa <pkrempa>
Date: Wed Jul 26 09:36:21 2017 +0200
qemu: block: Refactor node name detection code
Remove the complex and unreliable code which inferred the node name
hierarchy only from data returned by 'query-named-block-nodes'. It turns
out that query-blockstats contain the full hierarchy of nodes as
perceived by qemu so the inference code is not necessary.
In query blockstats, the 'parent' object corresponds to the storage
behind a storage volume and 'backing' corresponds to the lower level of
backing chain. Since all have node names this data can be really easily
used to detect node names.
In addition to the code refactoring the one remaining test case needed
to be fixed along.
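As a rough illustration of the idea in the commit message above (not libvirt's actual code), the 'parent' and 'backing' links in a query-blockstats entry can be walked to collect node names. The sample entry below is hypothetical and heavily trimmed: the device name and the "#block..." node names are invented, and real entries also carry a 'stats' object.

```python
def collect_node_names(entry):
    """Walk one query-blockstats entry and return the backing chain as a
    list of (format_node, storage_node) pairs, top image first."""
    chain = []
    while entry is not None:
        # 'parent' describes the storage behind this layer; it may be absent.
        parent = entry.get("parent") or {}
        chain.append((entry.get("node-name"), parent.get("node-name")))
        # 'backing' describes the next lower layer of the backing chain.
        entry = entry.get("backing")
    return chain


# Hypothetical, trimmed entry: all names here are made up for illustration.
sample = {
    "device": "drive-scsi0-0-0-0",
    "node-name": "#block100",
    "parent": {"node-name": "#block000"},
    "backing": {
        "node-name": "#block300",
        "parent": {"node-name": "#block200"},
    },
}

chain = collect_node_names(sample)
for format_node, storage_node in chain:
    print(format_node, storage_node)
```

Each pair lists the node of one layer followed by the node of the storage behind it, so no inference from query-named-block-nodes is needed.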
The following test case shows that libvirt can detect the node name for the iSCSI volume:
Author: Peter Krempa <pkrempa>
Date: Wed May 3 08:45:00 2017 +0200
tests: qemumonitorjson: Test extraction of iSCSI device node names
Test storage was created on a rhel/centos 7 node using targetcli.
Tried this on 2 machines but it still failed, so marking as ASSIGNED for now
(why is there no "failed_qa" status now...)
ON HOST:
## rpm -qa | egrep "libvirt-3|qemu-kvm-rhev"
libvirt-3.9.0-6.el7.x86_64
qemu-kvm-rhev-2.10.0-12.el7.x86_64
## virsh dumpxml avocado-vt-vm2
...
<disk type='network' device='lun'>
<driver name='qemu' type='raw'/>
<source protocol='iscsi' name='iqn.2016-03.com.virttest:logical-pool.target/0'>
<host name='10.66.5.64' port='3260'/>
</source>
<target dev='sdb' bus='scsi'/>
<alias name='scsi0-0-0-1'/>
<address type='drive' controller='0' bus='0' target='0' unit='1'/>
</disk>
...
## virsh domblklist avocado-vt-vm2
Target Source
------------------------------------------------
sda /var/lib/libvirt/images/RHEL-6.9-x86_64-latest.qcow2
sdb iqn.2016-03.com.virttest:logical-pool.target/0
## virsh domblkthreshold avocado-vt-vm2 sdb 200M
## virsh domstats avocado-vt-vm2 --block | grep thre
block.1.threshold=209715200
## virsh event --event block-threshold --loop
IN GUEST:
[root@localhost ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 10G 0 disk
├─sda1 8:1 0 500M 0 part /boot
└─sda2 8:2 0 9.5G 0 part
├─VolGroup-lv_root (dm-0) 253:0 0 8.5G 0 lvm /
└─VolGroup-lv_swap (dm-1) 253:1 0 1G 0 lvm [SWAP]
sdb 8:16 0 1000M 0 disk
[root@localhost ~]# dd if=/dev/urandom of=/dev/sdb bs=1M count=300
300+0 records in
300+0 records out
314572800 bytes (315 MB) copied, 27.4868 s, 11.4 MB/s
[root@localhost ~]# sync
ON HOST:
nothing happens
## virsh event --event block-threshold --loop
^Cevent loop interrupted
events received: 0
Could you please check that the libvirtd debug log does not contain BLOCK_WRITE_THRESHOLD message from qemu. Also please run the following command:
virsh qemu-monitor-command avocado-vt-vm2 '{"execute":"query-named-block-nodes"}'
(In reply to Peter Krempa from comment #9)
> Could you please check that the libvirtd debug log does not contain
> BLOCK_WRITE_THRESHOLD message from qemu. Also please run the following
> command:
>
> virsh qemu-monitor-command avocado-vt-vm2
> '{"execute":"query-named-block-nodes"}'

I am replying to this comment on yisun's behalf.
version:
libvirt-3.9.0-6.el7.x86_64
qemu-kvm-rhev-2.10.0-13.el7.x86_64
I tried similar steps:
[root@lmen1 ~]# virsh domblklist test
Target Source
------------------------------------------------
vda /var/lib/libvirt/images/test.s11
sdb iqn.2016-03.com.virttest:emulated-iscsi.target/0
[root@lmen1 ~]# virsh domblkthreshold test sdb 200M
[root@lmen1 ~]# virsh domstats test --block | grep thre
block.1.threshold=209715200
[root@lmen1 ~]# virsh event --event block-threshold --loop
in the guest:
[root@localhost ~]# dd if=/dev/urandom of=/dev/sda bs=1M count=300
ON HOST:
nothing happens
After testing the above steps:
1) I searched the libvirtd log for the keyword "BLOCK_WRITE_THRESHOLD"; it does not appear in the log.
2) I ran the command:
[root@lmen1 ~]# virsh qemu-monitor-command test '{"execute":"query-named-block-nodes"}'
The result of the command has been uploaded as an attachment named "result from 'virsh qemu-monitor-command'".
Created attachment 1370314 [details]
result from 'virsh qemu-monitor-command'
Thank you very much for providing the requested data. It looks like qemu does not deliver the event for the iSCSI driver:
{
"iops_rd": 0,
"detect_zeroes": "off",
"image": {
"virtual-size": 1048576000,
"filename": "json:{\"lun\": \"0\", \"portal\": \"10.66.70.107:3260\", \"driver\": \"iscsi\", \"transport\": \"tcp\", \"target\": \"iqn.2016-03.com.virttest:emulated-iscsi.target\"}",
"format": "iscsi",
"dirty-flag": false
},
"iops_wr": 0,
"ro": false,
"node-name": "#block1208",
"backing_file_depth": 0,
"drv": "iscsi",
"iops": 0,
"bps_wr": 0,
"write_threshold": 209715200,
"encrypted": false,
"bps": 0,
"bps_rd": 0,
"cache": {
"no-flush": false,
"direct": false,
"writeback": true
},
"file": "json:{\"lun\": \"0\", \"portal\": \"10.66.70.107:3260\", \"driver\": \"iscsi\", \"transport\": \"tcp\", \"target\": \"iqn.2016-03.com.virttest:emulated-iscsi.target\"}",
"encryption_key_missing": false
},
The iSCSI driver backend still reports the write threshold set to the 200MiB mark despite writing 300MiB from the beginning of the device. Since the logs did not contain any mention of BLOCK_WRITE_THRESHOLD it looks like qemu did not ever deliver the event.
I'm moving this to qemu for further investigation.
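The check Peter describes above can be sketched as a small script that scans a query-named-block-nodes reply for nodes whose write_threshold is still non-zero. This is only a sketch: it assumes the reply is wrapped in a top-level "return" list, and the second node in the sample ("#block0001", qcow2) is invented for contrast. Only the "node-name", "drv" and "write_threshold" fields shown in the output above are used.

```python
import json


def armed_thresholds(reply_text):
    """Map node-name -> (driver, threshold) for every node whose
    write_threshold is non-zero in a query-named-block-nodes reply."""
    reply = json.loads(reply_text)
    nodes = reply.get("return", [])
    return {
        node["node-name"]: (node["drv"], node["write_threshold"])
        for node in nodes
        if node.get("write_threshold", 0) > 0
    }


# Trimmed sample: the first node mirrors the output above, the second one
# is a made-up node without a threshold.
sample_reply = json.dumps({"return": [
    {"node-name": "#block1208", "drv": "iscsi", "write_threshold": 209715200},
    {"node-name": "#block0001", "drv": "qcow2", "write_threshold": 0},
]})

still_armed = armed_thresholds(sample_reply)
for name, (driver, threshold) in still_armed.items():
    print(f"{name} ({driver}): threshold still set at {threshold} bytes")
```

A threshold of 209715200 bytes is exactly 200 MiB (200 * 1024 * 1024), so a node that still reports it after 300 MiB were written from the start of the device suggests the writes never passed through the threshold check.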
(In reply to lijuan men from comment #10)
> (In reply to Peter Krempa from comment #9)
> in the guest:
> [root@localhost ~]# dd if=/dev/urandom of=/dev/sda bs=1M count=300

This command may not send 300 MB of writes to the disk, they will stay in the guest page cache for some time. Please add oflag=direct so the writes are guaranteed to be sent to the disk.

(In reply to Stefan Hajnoczi from comment #14)
> (In reply to lijuan men from comment #10)
> > (In reply to Peter Krempa from comment #9)
> > in the guest:
> > [root@localhost ~]# dd if=/dev/urandom of=/dev/sda bs=1M count=300
>
> This command may not send 300 MB of writes to the disk, they will stay in
> the guest page cache for some time. Please add oflag=direct so the writes
> are guaranteed to be sent to the disk.

thanks for pointing out my mistake

I used the command to test again:
# dd if=/dev/urandom of=/dev/sda bs=1M count=300 oflag=direct

1) no output from command "# virsh event --event block-threshold --loop"
2) there is no keyword "BLOCK_WRITE_THRESHOLD" in libvirtd log
3) run command:
[root@lmen1 ~]# virsh qemu-monitor-command test '{"execute":"query-named-block-nodes"}'

I will upload the new result for your reference
Created attachment 1370721 [details]
new result from 'virsh qemu-monitor-command'
(In reply to lijuan men from comment #15)
> (In reply to Stefan Hajnoczi from comment #14)
> > (In reply to lijuan men from comment #10)
> > > (In reply to Peter Krempa from comment #9)
> > > in the guest:
> > > [root@localhost ~]# dd if=/dev/urandom of=/dev/sda bs=1M count=300
> >
> > This command may not send 300 MB of writes to the disk, they will stay in
> > the guest page cache for some time. Please add oflag=direct so the writes
> > are guaranteed to be sent to the disk.
>
> thanks for pointing out my mistake
>
> I used the command to test again:
> # dd if=/dev/urandom of=/dev/sda bs=1M count=300 oflag=direct
>
> 1)no output from command "# virsh event --event block-threshold --loop"
>
> 2)there is no keyword "BLOCK_WRITE_THRESHOLD" in libvirtd log
>
> 3)run command:
> [root@lmen1 ~]# virsh qemu-monitor-command test
> '{"execute":"query-named-block-nodes"}'
>
> I will upload the new result for your reference

Thanks for the info. It turns out the problem is that the disk was configured using "<disk type='network' device='lun'>".

Here is what the libvirt documentation says about device='lun':

Using "lun" (since 0.9.10) is only valid when the type is "block" or "network" for protocol='iscsi' or when the type is "volume" when using an iSCSI source pool for mode "host" or as an NPIV virtual Host Bus Adapter (vHBA) using a Fibre Channel storage pool. Configured in this manner, the LUN behaves identically to "disk", except that generic SCSI commands from the guest are accepted and passed through to the physical device. Also note that device='lun' will only be recognized for actual raw devices, but never for individual partitions or LVM partitions (in those cases, the kernel will reject the generic SCSI commands, making it identical to device='disk').

This means I/O requests are passed through to iSCSI. QEMU does not interpret them and cannot offer block layer features like write threshold or throttling.

You can verify this by looking at the QEMU command-line.
It contains -device scsi-block instead of -device scsi-hd. This is expected behavior although libvirt may wish to document it more clearly.

In such case qemu should report an error when a user or libvirt attempts to set the 'write_threshold' property for a device which does not support it. Libvirt obviously could report the error here but this would be a workaround for this and if qemu changes behavior we would need to fix it.

Reassigning back to qemu. I've also tweaked the summary to reflect the actual issue.

(In reply to Peter Krempa from comment #18)
> In such case qemu should report an error when a user or libvirt attempts to
> set the 'write_threshold' property for a device which does not support it.
> Libvirt obviously could report the error here but this would be a workaround
> for this and if qemu changes behavior we would need to fix it.
>
> Reassigning back to qemu.

I'm not sure if QEMU should prevent setting write_threshold. The write_threshold is a block driver graph node property. A node can have multiple users, like the built-in NBD server, in addition to device models like virtio-scsi-pci or virtio-blk.

The write_threshold works for the NBD server even though it does not work for scsi-block on the same block driver graph node. Depending on what the user is trying to achieve, this may be okay and shouldn't be prevented.

Additionally, it's up to the device model (virtio-scsi-pci/virtio-blk) whether write_threshold is bypassed. Some device models make this decision on a per-request basis at runtime: some I/O requests go through the write_threshold check while others do not, depending on guest activity. virtio_blk's SCSI passthrough command is an example of this.

So it's not really possible to say whether all I/O requests will go through the write_threshold check. There would be false positives.

Kevin: What do you think about this?
(In reply to Stefan Hajnoczi from comment #20)
> (In reply to Peter Krempa from comment #18)
> > In such case qemu should report an error when a user or libvirt attempts to
> > set the 'write_threshold' property for a device which does not support it.
> > Libvirt obviously could report the error here but this would be a workaround
> > for this and if qemu changes behavior we would need to fix it.
> >
> > Reassigning back to qemu.
>
> I'm not sure if QEMU should prevent setting write_threshold. The
> write_threshold is a block driver graph node property. A node can have
> multiple users, like the built-in NBD server, in addition to device models
> like virtio-scsi-pci or virtio-blk.
>
> The write_threshold works for the NBD server even though it does not work
> for scsi-block on the same block driver graph node. Depending on what the
> user is trying to achieve, this may be okay and shouldn't be prevented.
>
> Additionally, it's up to the device model (virtio-scsi-pci/virtio-blk)
> whether write_threshold is bypassed. Some device models make this decision
> on a per-request basis at runtime: some I/O requests go through the
> write_threshold check while others do not, depending on guest activity.
> virtio_blk's SCSI passthrough command is an example of this.

I'd say that in such a case this is even more of a reason to do it in qemu, since libvirt cannot keep up with knowing when this works. If qemu is not going to do anything about that, then please close it as "wontfix", since rejecting it outright for 'LUN' disks in libvirt is clearly wrong: it may work in some cases.

Before scsi-block came up, things were easy: SCSI devices were shoehorned into block devices without really supporting any block layer features besides sending ioctls to the device. We would simply disable everything for bs->sg = 1.

With scsi-block, we have devices that don't have bs->sg = 1, but still bypass the block layer functionality by sending ioctls.
They don't consistently bypass the block layer though, but issue normal I/O requests for a few SCSI commands, like the usual READ and WRITE variants.

If we can find out which SCSI command it is that writes to the image with an ioctl rather than going through the block layer, we may be able to fix this specific case. But I'm having a hard time imagining a full solution that keeps the hybrid nature of scsi-block, doesn't forbid more than necessary and still doesn't result in surprising behaviour. This doesn't only affect things like write threshold, but also block jobs etc. may not be seeing writes and therefore produce corrupted target images.

By the way, if we continue our efforts to make things like block jobs and write threshold use their own block driver nodes, we will get very visible failure because these filter block drivers don't usually forward .bdrv_aio_ioctl() (and they shouldn't, because see above, filters + ioctl = corruption).

SCSI passthrough does not support QEMU block layer features like the write threshold event. Due to the nature of passthrough, it is not feasible either to implement support or to refuse setting the write_threshold property. Users must be aware that SCSI passthrough does not combine with block layer features.
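Since neither QEMU nor libvirt rejects the setting, a user-side sanity check is one way to act on the conclusion above. The following is a minimal sketch, not part of libvirt or QEMU: it lists disks configured with device='lun' in a domain XML such as the output of virsh dumpxml, so that a threshold on them can be treated as unreliable. The helper name and the surrounding <domain>/<devices> wrapper in the sample are assumptions; the iSCSI disk element is taken from the report. As discussed above, this is only a heuristic, because passthrough is decided per request.

```python
import xml.etree.ElementTree as ET


def passthrough_targets(domain_xml):
    """Return the target names (e.g. 'sdb') of all disks configured
    with device='lun', i.e. SCSI passthrough disks."""
    root = ET.fromstring(domain_xml)
    targets = []
    for disk in root.iter("disk"):
        if disk.get("device") == "lun":
            target = disk.find("target")
            targets.append(target.get("dev") if target is not None else None)
    return targets


# Sample domain XML: the 'sda' disk and the wrapper elements are made up,
# the iSCSI 'lun' disk follows the configuration shown in this report.
sample_xml = """
<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <target dev='sda' bus='scsi'/>
    </disk>
    <disk type='network' device='lun'>
      <driver name='qemu' type='raw'/>
      <source protocol='iscsi' name='iqn.2016-03.com.virttest:logical-pool.target/0'>
        <host name='10.66.5.64' port='3260'/>
      </source>
      <target dev='sdb' bus='scsi'/>
    </disk>
  </devices>
</domain>
"""

lun_disks = passthrough_targets(sample_xml)
for dev in lun_disks:
    print(f"warning: {dev} uses device='lun'; block-threshold events may never fire")
```

Switching such a disk to device='disk' is what makes QEMU use scsi-hd instead of scsi-block, at the cost of losing generic SCSI command passthrough.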
Description of problem:
iscsi backend img cannot trigger block-threshold event

Version-Release number of selected component (if applicable):
libvirt-3.2.0-2.el7.x86_64
qemu-kvm-rhev-2.9.0-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
@Terminal 1:
## iscsiadm -m discovery -t sendtargets -p 10.73.196.113
10.73.196.113:3260,1 iqn.2016-03.com.virttest:logical-pool.target
## virsh dumpxml vm1
...
<disk type='network' device='lun'>
<driver name='qemu' type='raw'/>
<source protocol='iscsi' name='iqn.2016-03.com.virttest:logical-pool.target/0'>
<host name='10.73.196.113' port='3260'/>
</source>
<backingStore/>
<target dev='sdb' bus='scsi'/>
<alias name='scsi0-0-0-1'/>
<address type='drive' controller='0' bus='0' target='0' unit='1'/>
</disk>
...
## virsh domblkthreshold vm1 sdb 200M
## virsh domstats vm1 --block
Domain: 'vm1'
...
block.1.threshold=209715200
## virsh event --event block-threshold --loop
@Terminal 2:
## virsh console vm1
[root@localhost ~]# dd if=/dev/urandom of=/dev/sdb bs=1M count=300
300+0 records in
300+0 records out
314572800 bytes (315 MB) copied, 17.5543 s, 17.9 MB/s
[root@localhost ~]# mkfs.ext4 /dev/sdb
mke2fs 1.42.9 (28-Dec-2013)
...
Writing superblocks and filesystem accounting information: done
[root@localhost ~]# mount /dev/sdb /mnt
[root@localhost ~]# dd if=/dev/urandom of=/mnt/1 bs=1M count=300
300+0 records in
300+0 records out
314572800 bytes (315 MB) copied, 18.8775 s, 16.7 MB/s
[root@localhost ~]# sync
@Terminal 1:
## virsh event --event block-threshold --loop
<==== nothing happened here

Actual results:
block-threshold event not triggered when writing data to iscsi backend img

Expected results:
block-threshold event triggered

Additional info:
This is separated from bz 1181659