Bug 1445596 - forbid setting unsupported 'write_threshold' property for direct iscsi backends
Summary: forbid setting unsupported 'write_threshold' property for direct iscsi backends
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Stefan Hajnoczi
QA Contact: CongLi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-04-26 05:14 UTC by yisun
Modified: 2018-01-25 07:39 UTC
CC List: 18 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-01-08 16:17:34 UTC
Target Upstream Version:
Embargoed:


Attachments
result from 'virsh qemu-monitor-command' (14.48 KB, text/plain), 2017-12-20 07:13 UTC, lijuan men
new result from 'virsh qemu-monitor-command' (26.66 KB, text/plain), 2017-12-21 05:21 UTC, lijuan men

Description yisun 2017-04-26 05:14:36 UTC
Description of problem:
iSCSI backend image does not trigger the block-threshold event

Version-Release number of selected component (if applicable):
libvirt-3.2.0-2.el7.x86_64
qemu-kvm-rhev-2.9.0-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
@Terminal 1:
## iscsiadm -m discovery -t sendtargets -p 10.73.196.113
10.73.196.113:3260,1 iqn.2016-03.com.virttest:logical-pool.target

## virsh dumpxml vm1
...
    <disk type='network' device='lun'>
      <driver name='qemu' type='raw'/>
      <source protocol='iscsi' name='iqn.2016-03.com.virttest:logical-pool.target/0'>
        <host name='10.73.196.113' port='3260'/>
      </source>
      <backingStore/>
      <target dev='sdb' bus='scsi'/>
      <alias name='scsi0-0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
...


## virsh domblkthreshold vm1 sdb 200M
## virsh domstats vm1 --block
Domain: 'vm1'
  ...
  block.1.threshold=209715200

## virsh event --event block-threshold --loop

@Terminal 2:
## virsh console vm1
[root@localhost ~]# dd if=/dev/urandom of=/dev/sdb bs=1M count=300
300+0 records in
300+0 records out
314572800 bytes (315 MB) copied, 17.5543 s, 17.9 MB/s

[root@localhost ~]# mkfs.ext4 /dev/sdb
mke2fs 1.42.9 (28-Dec-2013)
...
Writing superblocks and filesystem accounting information: done

[root@localhost ~]# mount /dev/sdb /mnt

[root@localhost ~]# dd if=/dev/urandom of=/mnt/1 bs=1M count=300
300+0 records in
300+0 records out
314572800 bytes (315 MB) copied, 18.8775 s, 16.7 MB/s

[root@localhost ~]# sync

@Terminal 1:
## virsh event --event block-threshold --loop
<==== nothing happened here

Actual results:
block-threshold event is not triggered when writing data to the iSCSI backend image

Expected results:
block-threshold event triggered

Additional info:
This was split out from bz 1181659.

Comment 2 Peter Krempa 2017-05-02 15:07:51 UTC
I'm currently getting "error: Operation not supported: threshold currently can't be set for block device 'sdb'" with the above configuration.

Could you please post the output of:

virsh qemu-monitor-command --pretty $VMNAME '{"execute" : "query-named-block-nodes"}'

if you are able to set the threshold?
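
As an illustrative aside (not part of the original comment; assumes the jq utility is available and $VMNAME is the domain name), the write-threshold fields can be pulled out of that output like this:

virsh qemu-monitor-command $VMNAME '{"execute":"query-named-block-nodes"}' | jq '.return[] | {node: ."node-name", threshold: .write_threshold}'

A node that accepted the threshold shows a non-zero value here even if the event is never delivered.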

Comment 3 yisun 2017-05-05 09:20:32 UTC
(In reply to Peter Krempa from comment #2)
> I'm currently getting "error: Operation not supported: threshold currently
> can't be set for block device 'sdb'" with the above configuration.
> 
> Could you please post the output of:
> 
> virsh qemu-monitor-command --pretty $VMNAME '{"execute" :
> "query-named-block-nodes"}'
> 
> if you are able to set the threshold?

The machine I used for testing is maintained by Beaker and is not available now.
I tried on my desktop machine and it gives me the same error message you got,
even after downgrading libvirt and qemu-kvm-rhev to:
libvirt-3.2.0-2.el7.x86_64
qemu-kvm-rhev-2.9.0-1.el7.x86_64

I am not quite sure what the other differences between the machines are now :(

Comment 4 Peter Krempa 2017-05-05 09:53:45 UTC
Well, in that case the issue is similar to the LUKS issue: since the names differ, the algorithm that looks up the node names is not able to find this one. It will be properly fixed once we provide node names manually.

Comment 5 Peter Krempa 2017-07-27 11:22:56 UTC
Upstream fixes this in the refactor of the node detection code:

commit 0175dc6ea024d4edd0f59571c3f5fa80d1ec1c0e
Author: Peter Krempa <pkrempa>
Date:   Wed Jul 26 09:36:21 2017 +0200

    qemu: block: Refactor node name detection code
    
    Remove the complex and unreliable code which inferred the node name
    hierarchy only from data returned by 'query-named-block-nodes'. It turns
    out that query-blockstats contain the full hierarchy of nodes as
    perceived by qemu so the inference code is not necessary.
    
    In query blockstats, the 'parent' object corresponds to the storage
    behind a storage volume and 'backing' corresponds to the lower level of
    backing chain. Since all have node names this data can be really easily
    used to detect node names.
    
    In addition to the code refactoring the one remaining test case needed
    to be fixed along.
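
As an illustrative aside (a sketch, not from the original comment; $VMNAME stands for the domain name), the hierarchy the commit message describes can be inspected directly:

virsh qemu-monitor-command --pretty $VMNAME '{"execute":"query-blockstats"}'

In the reply, each stats entry carries a 'node-name', and its 'parent' object (the storage behind the volume) and 'backing' object (the next lower image in the backing chain) carry node names of their own; this is the data the refactored detection code walks.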

The following test case shows that libvirt can detect the node name for the iSCSI volume:

Author: Peter Krempa <pkrempa>
Date:   Wed May 3 08:45:00 2017 +0200

    tests: qemumonitorjson: Test extraction of iSCSI device node names
    
    Test storage was created on a rhel/centos 7 node using targetcli.

Comment 8 yisun 2017-12-15 07:28:44 UTC
I tried this on 2 machines but it still failed, so I am marking the bug as ASSIGNED for now
(why is there no "failed_qa" status anymore...)

ON HOST:
## rpm -qa | egrep "libvirt-3|qemu-kvm-rhev"
libvirt-3.9.0-6.el7.x86_64
qemu-kvm-rhev-2.10.0-12.el7.x86_64

## virsh dumpxml avocado-vt-vm2
...
    <disk type='network' device='lun'>
      <driver name='qemu' type='raw'/>
      <source protocol='iscsi' name='iqn.2016-03.com.virttest:logical-pool.target/0'>
        <host name='10.66.5.64' port='3260'/>
      </source>
      <target dev='sdb' bus='scsi'/>
      <alias name='scsi0-0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
...

## virsh domblklist avocado-vt-vm2
Target     Source
------------------------------------------------
sda        /var/lib/libvirt/images/RHEL-6.9-x86_64-latest.qcow2
sdb        iqn.2016-03.com.virttest:logical-pool.target/0

## virsh domblkthreshold avocado-vt-vm2 sdb 200M

## virsh domstats avocado-vt-vm2 --block | grep thre
  block.1.threshold=209715200

## virsh event --event block-threshold --loop


IN GUEST:
[root@localhost ~]# lsblk
NAME                        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                           8:0    0   10G  0 disk 
├─sda1                        8:1    0  500M  0 part /boot
└─sda2                        8:2    0  9.5G  0 part 
  ├─VolGroup-lv_root (dm-0) 253:0    0  8.5G  0 lvm  /
  └─VolGroup-lv_swap (dm-1) 253:1    0    1G  0 lvm  [SWAP]
sdb                           8:16   0 1000M  0 disk 
[root@localhost ~]# dd if=/dev/urandom of=/dev/sdb bs=1M count=300
300+0 records in
300+0 records out
314572800 bytes (315 MB) copied, 27.4868 s, 11.4 MB/s
[root@localhost ~]# sync

ON HOST:
nothing happens
## virsh event --event block-threshold --loop
^Cevent loop interrupted
events received: 0

Comment 9 Peter Krempa 2017-12-19 14:00:56 UTC
Could you please check that the libvirtd debug log does not contain a BLOCK_WRITE_THRESHOLD message from qemu. Also, please run the following command:

virsh qemu-monitor-command avocado-vt-vm2 '{"execute":"query-named-block-nodes"}'
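
As an illustrative aside (a sketch of a typical RHEL 7 setup; the exact paths and filter values are assumptions, not from the original comment), debug logging can be enabled in /etc/libvirt/libvirtd.conf and then searched for the event:

log_filters="1:qemu 1:libvirt"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"

systemctl restart libvirtd
grep BLOCK_WRITE_THRESHOLD /var/log/libvirt/libvirtd.log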

Comment 10 lijuan men 2017-12-20 07:12:51 UTC
(In reply to Peter Krempa from comment #9)
> Could you please check that the libvirtd debug log does not contain
> BLOCK_WRITE_THRESHOLD message from qemu. Also please run the following
> command:
> 
> virsh qemu-monitor-command avocado-vt-vm2
> '{"execute":"query-named-block-nodes"}'

I am replying to the comment on yisun's behalf:

version:
libvirt-3.9.0-6.el7.x86_64
qemu-kvm-rhev-2.10.0-13.el7.x86_64


I tried similar steps:

[root@lmen1 ~]# virsh domblklist test
Target     Source
------------------------------------------------
vda        /var/lib/libvirt/images/test.s11
sdb        iqn.2016-03.com.virttest:emulated-iscsi.target/0

[root@lmen1 ~]# virsh domblkthreshold test sdb 200M


[root@lmen1 ~]# virsh domstats test --block | grep thre
  block.1.threshold=209715200

[root@lmen1 ~]# virsh event --event block-threshold --loop

in the guest:
[root@localhost ~]# dd if=/dev/urandom of=/dev/sda bs=1M count=300

ON HOST:
nothing happens


After testing the above steps:
1) I checked the libvirtd log for the keyword "BLOCK_WRITE_THRESHOLD";
   the keyword is not present in the log.

2) I ran the command:
[root@lmen1 ~]# virsh qemu-monitor-command test '{"execute":"query-named-block-nodes"}'

The result of the command has been uploaded as an attachment named "result from 'virsh qemu-monitor-command'".

Comment 11 lijuan men 2017-12-20 07:13:45 UTC
Created attachment 1370314 [details]
result from 'virsh qemu-monitor-command'

Comment 12 Peter Krempa 2017-12-20 10:33:47 UTC
Thank you very much for providing the requested data. It looks like qemu does not deliver the event for the iSCSI driver:

        {
            "iops_rd": 0,
            "detect_zeroes": "off",
            "image": {
                "virtual-size": 1048576000,
                "filename": "json:{\"lun\": \"0\", \"portal\": \"10.66.70.107:3260\", \"driver\": \"iscsi\", \"transport\": \"tcp\", \"target\": \"iqn.2016-03.com.virttest:emulated-iscsi.target\"}",
                "format": "iscsi",
                "dirty-flag": false
            },
            "iops_wr": 0,
            "ro": false,
            "node-name": "#block1208",
            "backing_file_depth": 0,
            "drv": "iscsi",
            "iops": 0,
            "bps_wr": 0,
            "write_threshold": 209715200,
            "encrypted": false,
            "bps": 0,
            "bps_rd": 0,
            "cache": {
                "no-flush": false,
                "direct": false,
                "writeback": true
            },
            "file": "json:{\"lun\": \"0\", \"portal\": \"10.66.70.107:3260\", \"driver\": \"iscsi\", \"transport\": \"tcp\", \"target\": \"iqn.2016-03.com.virttest:emulated-iscsi.target\"}",
            "encryption_key_missing": false
        },

The iSCSI driver backend still reports the write threshold set at the 200 MiB mark despite 300 MiB having been written from the beginning of the device. Since the logs did not contain any mention of BLOCK_WRITE_THRESHOLD, it looks like qemu never delivered the event.

I'm moving this to qemu for further investigation.

Comment 14 Stefan Hajnoczi 2017-12-20 16:35:58 UTC
(In reply to lijuan men from comment #10)
> (In reply to Peter Krempa from comment #9)
> in the guest:
> [root@localhost ~]# dd if=/dev/urandom of=/dev/sda bs=1M count=300

This command may not send 300 MB of writes to the disk; the data will stay in the guest page cache for some time.  Please add oflag=direct so the writes are guaranteed to be sent to the disk.
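
As an illustrative aside (not part of the original comment), the difference is whether dd opens the target with O_DIRECT. Buffered writes can sit in the guest page cache, so the host-side threshold check may see nothing by the time you look:

[root@localhost ~]# dd if=/dev/urandom of=/dev/sdb bs=1M count=300

With oflag=direct, every 1M write is submitted to the virtual disk immediately, so QEMU sees the full 300 MB of writes:

[root@localhost ~]# dd if=/dev/urandom of=/dev/sdb bs=1M count=300 oflag=direct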

Comment 15 lijuan men 2017-12-21 05:19:39 UTC
(In reply to Stefan Hajnoczi from comment #14)
> (In reply to lijuan men from comment #10)
> > (In reply to Peter Krempa from comment #9)
> > in the guest:
> > [root@localhost ~]# dd if=/dev/urandom of=/dev/sda bs=1M count=300
> 
> This command may not send 300 MB of writes to the disk, they will stay in
> the guest page cache for some time.  Please add oflag=direct so the writes
> are guaranteed to be sent to the disk.

thanks for pointing out my mistake

I used the command to test again:
# dd if=/dev/urandom of=/dev/sda bs=1M count=300 oflag=direct

1) no output from the command "# virsh event --event block-threshold --loop"

2) there is no keyword "BLOCK_WRITE_THRESHOLD" in the libvirtd log

3) ran the command:
[root@lmen1 ~]# virsh qemu-monitor-command test '{"execute":"query-named-block-nodes"}'

I have uploaded the new result for your reference.

Comment 16 lijuan men 2017-12-21 05:21:29 UTC
Created attachment 1370721 [details]
new result from 'virsh qemu-monitor-command'

Comment 17 Stefan Hajnoczi 2017-12-21 16:14:24 UTC
(In reply to lijuan men from comment #15)
> (In reply to Stefan Hajnoczi from comment #14)
> > (In reply to lijuan men from comment #10)
> > > (In reply to Peter Krempa from comment #9)
> > > in the guest:
> > > [root@localhost ~]# dd if=/dev/urandom of=/dev/sda bs=1M count=300
> > 
> > This command may not send 300 MB of writes to the disk, they will stay in
> > the guest page cache for some time.  Please add oflag=direct so the writes
> > are guaranteed to be sent to the disk.
> 
> thanks for pointing out my mistake
> 
> I used the command to test again:
> # dd if=/dev/urandom of=/dev/sda bs=1M count=300 oflag=direct
> 
> 1)no output from command "# virsh event --event block-threshold --loop"
> 
> 2)there is no keyword "BLOCK_WRITE_THRESHOLD" in libvirtd log
> 
> 3)run command:
> [root@lmen1 ~]# virsh qemu-monitor-command test
> '{"execute":"query-named-block-nodes"}'
> 
> I will uploaded the new result for your reference

Thanks for the info.

It turns out the problem is that the disk was configured using "<disk type='network' device='lun'>".

Here is what the libvirt documentation says about device='lun':

  Using "lun" (since 0.9.10) is only valid when the type is "block"
  or "network" for protocol='iscsi' or when the type is "volume" when
  using an iSCSI source pool for mode "host" or as an NPIV virtual
  Host Bus Adapter (vHBA) using a Fibre Channel storage pool.
  Configured in this manner, the LUN behaves identically to "disk",
  except that generic SCSI commands from the guest are accepted and
  passed through to the physical device. Also note that device='lun'
  will only be recognized for actual raw devices, but never for
  individual partitions or LVM partitions (in those cases, the kernel
  will reject the generic SCSI commands, making it identical to
  device='disk').

This means I/O requests are passed through to iSCSI.  QEMU does not interpret them and cannot offer block layer features like write threshold or throttling.

You can verify this by looking at the QEMU command-line.  It contains -device scsi-block instead of -device scsi-hd.

This is expected behavior although libvirt may wish to document it more clearly.
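
As an illustrative aside (a sketch, not from the original comment; the grep pattern is only indicative), the device model in use can be checked from the host:

ps -ef | grep [q]emu-kvm | grep -o 'scsi-block\|scsi-hd'
scsi-block

With device='lun', libvirt starts the guest with -device scsi-block (SCSI passthrough); with device='disk' it would use -device scsi-hd, which goes through the QEMU block layer where write_threshold is implemented.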

Comment 18 Peter Krempa 2018-01-03 12:58:53 UTC
In such a case qemu should report an error when a user or libvirt attempts to set the 'write_threshold' property for a device which does not support it. Libvirt obviously could report the error here, but that would be a workaround, and if qemu changes its behavior we would need to fix it again.

Reassigning back to qemu.
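
As an illustrative aside (a sketch, not from the original comment; the node name is taken from the attachment in comment 12), the property in question is set through the QMP command block-set-write-threshold:

virsh qemu-monitor-command test '{"execute": "block-set-write-threshold", "arguments": {"node-name": "#block1208", "write-threshold": 209715200}}'

As this bug shows, qemu currently accepts the command for the iscsi node behind scsi-block and then never delivers the event, which is the silent failure discussed here.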

Comment 19 Peter Krempa 2018-01-04 07:02:44 UTC
I've also tweaked the summary to reflect the actual issue.

Comment 20 Stefan Hajnoczi 2018-01-04 10:44:26 UTC
(In reply to Peter Krempa from comment #18)
> In such case qemu should report an error when a user or libvirt attempts to
> set the 'write_threshold' property for a device which does not support it.
> Libvirt obviously could report the error here but this would be a workaround
> for this and if qemu changes behavior we would need to fix it.
> 
> Reassigning back to qemu.

I'm not sure if QEMU should prevent setting write_threshold.  The write_threshold is a block driver graph node property.  A node can have multiple users, like the built-in NBD server, in addition to device models like virtio-scsi-pci or virtio-blk.

The write_threshold works for the NBD server even though it does not work for scsi-block on the same block driver graph node.  Depending on what the user is trying to achieve, this may be okay and shouldn't be prevented.

Additionally, it's up to the device model (virtio-scsi-pci/virtio-blk) whether write_threshold is bypassed.  Some device models make this decision on a per-request basis at runtime: some I/O requests go through the write_threshold check while others do not, depending on guest activity.  virtio_blk's SCSI passthrough command is an example of this.

So it's not really possible to say whether all I/O requests will go through the write_threshold check.  There would be false positives.

Kevin: What do you think about this?

Comment 21 Peter Krempa 2018-01-04 12:45:07 UTC
(In reply to Stefan Hajnoczi from comment #20)
> (In reply to Peter Krempa from comment #18)
> > In such case qemu should report an error when a user or libvirt attempts to
> > set the 'write_threshold' property for a device which does not support it.
> > Libvirt obviously could report the error here but this would be a workaround
> > for this and if qemu changes behavior we would need to fix it.
> > 
> > Reassigning back to qemu.
> 
> I'm not sure if QEMU should prevent setting write_threshold.  The
> write_threshold is a block driver graph node property.  A node can have
> multiple users, like the built-in NBD server, in addition to device models
> like virtio-scsi-pci or virtio-blk.
> 
> The write_threshold works for the NBD server even though it does not work
> for scsi-block on the same block driver graph node.  Depending on what the
> user is trying to achieve, this may be okay and shouldn't be prevented.
> 
> Additionally, it's up to the device model (virtio-scsi-pci/virtio-blk)
> whether write_threshold is bypassed.  Some device models make this decision
> on a per-request basis at runtime: some I/O requests go through the
> write_threshold check while others do not, depending on guest activity. 
> virtio_blk's SCSI passthrough command is an example of this.

I'd say that in such a case this is even more of a reason to do it in qemu, since libvirt cannot keep track of when this works. If qemu is not going to do anything about it, then please close this as "wontfix", since unconditionally rejecting it for 'LUN' disks in libvirt would be wrong given that it may work in some cases.

Comment 22 Kevin Wolf 2018-01-08 12:33:31 UTC
Before scsi-block came up, things were easy: SCSI devices were shoehorned into block devices without really supporting any block layer features besides sending ioctls to the device. We would simply disable everything for bs->sg = 1.

With scsi-block, we have devices that don't have bs->sg = 1, but still bypass the block layer functionality by sending ioctls. They don't consistently bypass the block layer though, but issue normal I/O requests for a few SCSI commands, like the usual READ and WRITE variants.

If we can find out which SCSI command it is that writes to the image with an ioctl rather than going through the block layer, we may be able to fix this specific case.

But I'm having a hard time imagining a full solution that keeps the hybrid nature of scsi-block, doesn't forbid more than necessary, and still doesn't result in surprising behaviour. This doesn't only affect things like the write threshold; block jobs etc. may also fail to see writes and therefore produce corrupted target images.

By the way, if we continue our efforts to make things like block jobs and write threshold use their own block driver nodes, we will get very visible failures, because these filter block drivers don't usually forward .bdrv_aio_ioctl() (and they shouldn't, because, as noted above, filters + ioctl = corruption).

Comment 23 Stefan Hajnoczi 2018-01-08 16:17:34 UTC
SCSI passthrough does not support QEMU block layer features like the write threshold event.  Due to the nature of passthrough, it is not feasible either to implement support or to refuse setting the write_threshold property.

Users must be aware that SCSI passthrough does not combine with block layer features.

