Bug 1687633 - Support NVMe device (emulation) in QEMU
Summary: Support NVMe device (emulation) in QEMU
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Maxim Levitsky
QA Contact: qing.wang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-03-12 01:39 UTC by Ming Lei
Modified: 2020-01-20 10:06 UTC
CC: 10 users

Fixed In Version: qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-20 16:59:19 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:



Description Ming Lei 2019-03-12 01:39:50 UTC
Description of problem:

https://lore.kernel.org/linux-block/20190312013245.GD28841@ming.t460p/T/#t

It has been found that ext4 filesystems are easily corrupted when running the latest Linus tree (v5.0+), especially after kernel commit 6e02318eaea53eaafe6 ("nvme: add support for the Write Zeroes command").

This turns out to be a long-standing issue in QEMU's NVMe emulation.


Version-Release number of selected component (if applicable):


All QEMU versions.

How reproducible:

100%
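
For reference, a minimal reproduction sketch (image paths, serial and device names are illustrative, and this assumes a QEMU build with the emulated NVMe controller compiled in):

# host: boot a guest (kernel v5.0+) with an emulated NVMe drive
qemu-system-x86_64 -machine q35,accel=kvm -m 2G \
    -drive file=guest.qcow2,if=virtio \
    -drive file=nvme.img,if=none,format=raw,id=nvm \
    -device nvme,drive=nvm,serial=deadbeef

# guest: mkfs.ext4 zeroes inode tables via Write Zeroes on v5.0+, so the
# corruption is expected to show up as fsck errors on affected builds
mkfs.ext4 /dev/nvme0n1
fsck.ext4 -fn /dev/nvme0n1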

Comment 1 Ming Lei 2019-03-12 01:47:22 UTC
Patch has been posted in QEMU upstream list:

https://www.mail-archive.com/qemu-devel@nongnu.org/msg603903.html

Comment 8 qing.wang 2019-08-16 07:20:12 UTC
It looks like this is not fixed in qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1:

root@ibm-x3650m4-06 ~ # uname -a
Linux ibm-x3650m4-06.lab.eng.pek2.redhat.com 4.18.0-129.el8.x86_64 #1 SMP Wed Aug 7 15:14:00 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

root@ibm-x3650m4-06 ~ # /usr/libexec/qemu-kvm -version
QEMU emulator version 4.0.94 (qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1)
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers

root@ibm-x3650m4-06 ~ # /usr/libexec/qemu-kvm -device help|grep nvme

root@ibm-x3650m4-06 ~ # /usr/libexec/qemu-kvm -device nvme
qemu-kvm: -device nvme: 'nvme' is not a valid device model name


root@ibm-x3650m4-06 ~ # rpm -qa|grep qemu
qemu-kvm-block-iscsi-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-guest-agent-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-block-iscsi-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
ipxe-roms-qemu-20181214-1.git133f4c47.el8.noarch
libvirt-daemon-driver-qemu-4.5.0-31.module+el8.1.0+3808+3325c1a3.x86_64
qemu-kvm-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-core-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-core-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-common-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-block-gluster-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-img-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-block-ssh-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-guest-agent-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-block-curl-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-block-rbd-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-tests-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-tests-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-block-gluster-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-debugsource-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-img-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-block-rbd-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-block-ssh-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-block-curl-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-common-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64

Comment 9 Danilo de Paula 2019-08-20 00:03:19 UTC
$ git branch --contains 9d6459d21a6e630264ead21558d940366d2f2450
  rhel-av-8.1.0/master-4.1.0

The mentioned commit is there.
Can you take a look, Maxim?

Comment 10 Maxim Levitsky 2019-08-20 08:32:03 UTC
Sure. This bug kind of transformed from one thing into something completely different.

Initially it was about corruption caused by Write Zeroes on the QEMU virtual NVMe drive. That was indeed fixed upstream long ago, in not one but two ways (a fix in QEMU, plus a blacklist entry in the kernel so it doesn't even try Write Zeroes on this NVMe device).

However, we then discovered that Red Hat doesn't even ship the virtual NVMe drive (it is disabled in the build), so the bug was changed to first enable it in the build.

(This, by the way, might need some approval from upper management, since the virtual NVMe drive is not ready for production IMHO, by a long shot, plus there is no urgent use case for it.)

Note that we are talking about the QEMU virtual NVMe drive, which implements a fully emulated NVMe drive that the guest can use. In theory this can be used to avoid virtio drivers in the guest, and/or for whatever other uses a standard NVMe drive has in a VM.

This is not the same as the QEMU userspace NVMe driver (written by Fam). That driver binds on the _host_ to a real NVMe drive and exposes it to QEMU as a drive, just like a qcow2/raw/whatever driver would, so the guest can see that drive as anything, for example a virtio-block device, or even a SATA disk...
That driver, I believe, is included in RHEL as a Technology Preview. Its value is that it is a bit faster than going through the kernel, and in the long term, when QEMU grows whatever storage daemon is decided upon, it might become even more useful (currently it is more or less equivalent to PCI assignment, but slower).
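
To make the distinction concrete, a rough sketch of the two invocations (PCI address, image path and serial are illustrative):

# (a) emulated NVMe controller: the guest sees an NVMe device
qemu-system-x86_64 ... \
    -drive file=disk.img,if=none,format=raw,id=nvm \
    -device nvme,drive=nvm,serial=deadbeef

# (b) userspace NVMe host driver: QEMU takes over a real host controller
#     via VFIO and can expose it to the guest as anything, e.g. virtio-blk
qemu-system-x86_64 ... \
    -blockdev driver=nvme,node-name=nvme0,device=0000:01:00.0,namespace=1 \
    -device virtio-blk-pci,drive=nvme0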

Comment 11 Ademar Reis 2019-08-20 13:42:50 UTC
(In reply to Maxim Levitsky from comment #10)

Thanks for the clarification. I don't think we need to enable the NVMe emulation in downstream yet.

My suggestion would be:

 - Keep improving the NVMe VFIO driver (the one written by Fam). In downstream it's already enabled, but considered Tech Preview (host-side setup sketched below).
 - Improve the emulated NVMe driver upstream (the one referenced here), but keep it disabled in downstream until we feel it's mature enough.
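
For completeness, a rough sketch of the host-side setup the userspace NVMe driver needs (PCI address and vendor/device IDs are illustrative; the controller must first be detached from the kernel nvme driver):

modprobe vfio-pci
echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
echo "8086 0a54" > /sys/bus/pci/drivers/vfio-pci/new_id

# then point QEMU at the controller and namespace directly
qemu-system-x86_64 ... -drive file=nvme://0000:01:00.0/1,if=virtio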

Stefan: what do you think?

Comment 12 Stefan Hajnoczi 2019-08-20 15:45:21 UTC
(In reply to Ademar Reis from comment #11)
> (In reply to Maxim Levitsky from comment #10)
> Thanks for the clarification. I don't think we need to enable the NVMe
> emulation in downstream yet.
> 
> My suggestion would be:
> 
>  - Keep improving the NVMe VFIO driver (the one written by Fam). In
> downstream it's already enabled, but considered Tech Preview.

Yes.  The NVMe VFIO QEMU block driver is unrelated to this bz but improving it is a long-term goal that will help QEMU offer low-latency I/O on physical NVMe drives.

>  - Improve the emulated NVME driver upstream (the one referenced here), but
> keep it disabled in downstream until we feel it's mature enough.

Sounds good.  QEMU's emulated NVMe storage controller is not suitable for downstream; let's keep it disabled.

Comment 13 Ademar Reis 2019-08-20 16:59:19 UTC
Looks like we have consensus, so I'm closing this BZ. It's OK to make progress upstream, but there are no plans to enable it in downstream for now. And the NVMe VFIO driver is a higher priority, even though it covers a different use case.

Comment 14 CongLi 2019-08-21 02:38:33 UTC
Thanks Maxim, Ademar and Stefan for the clarification.

