Bug 1918914

Summary: qemu's NVMe blockdev backend doesn't support cache setting
Product: Red Hat Enterprise Linux 9
Component: libvirt
Version: unspecified
Status: CLOSED WONTFIX
Severity: unspecified
Priority: medium
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Reporter: Philippe Mathieu-Daudé <philmd>
Assignee: Peter Krempa <pkrempa>
QA Contact: Han Han <hhan>
Docs Contact: Jiri Herrmann <jherrman>
CC: areis, coli, jinzhao, juzhang, lmen, mprivozn, mrezanin, philmd, pkrempa, virt-maint, xuwei, xuzhang, yama, yanghliu
Keywords: Triaged
Flags: pm-rhel: mirror+
Doc Type: Technology Preview
Clone Of: 1900136
Type: Bug
Last Closed: 2021-12-07 14:35:11 UTC

Description Philippe Mathieu-Daudé 2021-01-21 17:04:07 UTC
Using the following in the domain XML:

  <iothreads>1</iothreads>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/rhel84.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
    <disk type='nvme' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native' iothread='1'/>
      <source type='pci' managed='no' namespace='1'>
        <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
      </source>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </disk>

libvirt reports:

Error starting domain: internal error: process exited while connecting to monitor: 2021-01-21T11:19:49.824211Z qemu-kvm: -blockdev {"driver":"nvme","device":"0000:04:00.0","namespace":1,"node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}: NVMe controller doesn't support write cache configuration

cache='none' and io='native' should be invalid options for the 'nvme' disk type.

Comment 1 Peter Krempa 2021-01-21 17:32:17 UTC
In addition, the blockdev props formatter shouldn't even attempt to format the cache (we use a shortcut that adds it to all backends, since all of them until now seemed to support it). That way, if the nvme device is somewhere in the backing chain, the formatter will skip formatting the cache props for it too, since we propagate disk cache settings to all layers of the backing chain (explicit cache config is not yet implemented).
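
For illustration, the skip could be keyed off the storage source type. A minimal hypothetical sketch — the names below are illustrative stand-ins, not libvirt's actual formatter code:

#include <stdbool.h>

/* Illustrative stand-in for libvirt's storage source type enum. */
typedef enum {
    STORAGE_TYPE_FILE,
    STORAGE_TYPE_BLOCK,
    STORAGE_TYPE_NETWORK,
    STORAGE_TYPE_NVME,
} StorageType;

/* QEMU's 'nvme' userspace driver talks to the controller directly,
 * so host-side cache modes don't apply to it.  The blockdev props
 * formatter would consult a predicate like this before emitting the
 * "cache":{"direct":...,"no-flush":...} members for each layer of
 * the backing chain. */
static bool
storage_source_supports_cache(StorageType type)
{
    return type != STORAGE_TYPE_NVME;
}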

Comment 3 Michal Privoznik 2021-02-11 10:55:27 UTC
Is cache really not supported? Looking into nvme_file_open() I see the following code:


    if (flags & BDRV_O_NOCACHE) {
        if (!s->write_cache_supported) {
            error_setg(errp,
                       "NVMe controller doesn't support write cache configuration");
            ret = -EINVAL;
        } else {
            ret = nvme_enable_disable_write_cache(bs, !(flags & BDRV_O_NOCACHE),
                                                  errp);
        }
        if (ret) {
            goto fail;
        }
    }


So the error is reported only if s->write_cache_supported is not set. Looking further, it appears that nvme_identify() sets write_cache_supported when the underlying NVMe device supports it (based on what NVME_ADM_CMD_IDENTIFY returns).
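
For reference, per the NVMe spec the relevant datum is bit 0 of the VWC (Volatile Write Cache) byte at offset 525 of the Identify Controller data. A minimal sketch of the check — the struct below is a stand-in, not QEMU's actual NvmeIdCtrl definition:

#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the 4096-byte Identify Controller data structure
 * returned by NVME_ADM_CMD_IDENTIFY (CNS 0x01). */
struct id_ctrl_data {
    uint8_t bytes[4096];
};

static bool
controller_has_volatile_write_cache(const struct id_ctrl_data *id)
{
    /* NVMe spec: Identify Controller byte 525 is VWC; bit 0 set means
     * a volatile write cache is present.  Only then can the cache be
     * toggled via Set Features, which is what
     * nvme_enable_disable_write_cache() does. */
    return id->bytes[525] & 0x1;
}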

Comment 4 John Ferlan 2021-09-09 16:01:35 UTC
Bulk update: moving RHEL-AV bugs to RHEL 9. If this also needs to be resolved in RHEL 8, clone to the current RHEL 8 release.

Comment 5 Peter Krempa 2021-12-07 14:35:11 UTC
Given that not all NVMe drives reject the cache setting, it seems to be a valid configuration.

Unfortunately, it doesn't seem simple for libvirt to probe the NVMe device to figure out whether cache is supported, and thus it's not feasible for us to reimplement the check.

A similar situation exists with the 'io=' setting. Libvirt's XML has only one such attribute per disk, but the setting is really per-image, so libvirt propagates the value to every layer of the backing chain, and storage protocol nodes that don't support the setting ignore it. There are a few possible scenarios where we could forbid the setting, but again it's not really simple and the benefit is questionable.

For now I'm closing this as I don't see a reasonable way forward.