Description of problem:

https://lore.kernel.org/linux-block/20190312013245.GD28841@ming.t460p/T/#t

It was found that ext4 is easily corrupted when running the latest Linus tree (v5.0+), especially after commit 6e02318eaea53eaafe6 ("nvme: add support for the Write Zeroes command"). It turns out this is a long-standing issue in QEMU's NVMe emulation.

Version-Release number of selected component (if applicable):
All QEMU versions.

How reproducible:
100%
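For reference, a minimal setup that exercises the affected path looks roughly like the following. The `-device nvme` and `-drive` options are real QEMU syntax (the `serial` property is mandatory for the emulated controller); the image paths, sizes, and serial string are made-up examples, and triggering the corruption additionally requires a v5.0+ guest kernel that issues Write Zeroes:

```shell
# Create a backing image for the emulated NVMe drive (path/size are examples).
qemu-img create -f raw nvme-test.img 10G

# Boot a guest with the emulated NVMe controller attached to that image.
qemu-system-x86_64 -m 2G -enable-kvm \
    -drive file=guest.qcow2,if=virtio \
    -drive file=nvme-test.img,if=none,format=raw,id=nvm0 \
    -device nvme,drive=nvm0,serial=deadbeef

# Inside the guest (kernel v5.0+): mkfs.ext4 on the NVMe disk, run I/O
# that triggers Write Zeroes (e.g. fallocate/fstrim workloads), then fsck.
```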
Patch has been posted in QEMU upstream list: https://www.mail-archive.com/qemu-devel@nongnu.org/msg603903.html
It looks like it is not fixed in qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1:

root@ibm-x3650m4-06 ~ # uname -a
Linux ibm-x3650m4-06.lab.eng.pek2.redhat.com 4.18.0-129.el8.x86_64 #1 SMP Wed Aug 7 15:14:00 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

root@ibm-x3650m4-06 ~ # /usr/libexec/qemu-kvm -version
QEMU emulator version 4.0.94 (qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1)
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers

root@ibm-x3650m4-06 ~ # /usr/libexec/qemu-kvm -device help|grep nvme
root@ibm-x3650m4-06 ~ # /usr/libexec/qemu-kvm -device nvme
qemu-kvm: -device nvme: 'nvme' is not a valid device model name

root@ibm-x3650m4-06 ~ # rpm -qa|grep qemu
qemu-kvm-block-iscsi-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-guest-agent-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-block-iscsi-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
ipxe-roms-qemu-20181214-1.git133f4c47.el8.noarch
libvirt-daemon-driver-qemu-4.5.0-31.module+el8.1.0+3808+3325c1a3.x86_64
qemu-kvm-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-core-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-core-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-common-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-block-gluster-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-img-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-block-ssh-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-guest-agent-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-block-curl-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-block-rbd-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-tests-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-tests-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-block-gluster-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-debugsource-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-img-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-block-rbd-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-block-ssh-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-block-curl-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
qemu-kvm-common-debuginfo-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64
$ git branch --contains 9d6459d21a6e630264ead21558d940366d2f2450
  rhel-av-8.1.0/master-4.1.0

The mentioned commit is there. Can you take a look, Maxim?
Sure. The bug has transformed from one thing into a completely different one.

Initially it was about corruption caused by Write Zeroes on the QEMU virtual NVMe drive, which was indeed fixed upstream long ago, in not one but two ways (both a fix in QEMU and a blacklist entry in the kernel so it does not even try Write Zeroes on this NVMe device).

However, we discovered that Red Hat doesn't even ship the virtual NVMe drive (it is disabled in the build), so this bug was changed to first enable it in the build. (This, by the way, might need approval from upper management, since the virtual NVMe drive is IMHO not ready for production, by a long shot, plus there is no urgent use case for it.)

Note that we are talking about the QEMU virtual NVMe drive, which implements a fully emulated NVMe drive that the guest can use. In theory this can be used to avoid virtio drivers in the guest, or for whatever other uses a standard NVMe drive has in a VM.

This is not the same as the QEMU userspace NVMe driver (written by Fam). That driver binds on the _host_ to a real NVMe drive and exposes it to QEMU as a drive, just like a qcow2/raw/whatever driver would, so the guest can see that drive as anything, for example as a virtio-block device or even as a SATA disk. That driver, I think, is included in RHEL as a Technology Preview; its value is that it is a bit faster than going through the kernel, and in the long term, once QEMU grows whatever storage daemon is decided upon, it might become even more useful (currently it is more or less equivalent to PCI assignment, but slower).
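To make the distinction concrete, here is roughly how the two show up on the QEMU command line. The `nvme` device and the `nvme://` drive protocol are real QEMU options; the image path, PCI address, namespace number, and serial string below are made-up examples:

```shell
# 1) Emulated NVMe controller: the *guest* sees an NVMe drive backed by
#    an ordinary image file on the host ('serial' is a required property).
qemu-system-x86_64 \
    -drive file=disk.img,if=none,format=raw,id=nvm0 \
    -device nvme,drive=nvm0,serial=example-serial

# 2) Userspace NVMe block driver (Fam's): QEMU binds to a real *host*
#    NVMe drive via VFIO and uses it as a block backend, so the guest
#    can see it as, e.g., a virtio-blk device.
qemu-system-x86_64 \
    -drive file=nvme://0000:01:00.0/1,if=none,id=nvm1 \
    -device virtio-blk-pci,drive=nvm1
```

In the second form the host NVMe controller must be unbound from the kernel's nvme driver and bound to vfio-pci first, which is why it is comparable to PCI assignment.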
(In reply to Maxim Levitsky from comment #10)
> Sure. The bug kind of transformed from one thing to completely different
> one. [...]

Thanks for the clarification. I don't think we need to enable the NVMe emulation in downstream yet.

My suggestion would be:

- Keep improving the NVMe VFIO driver (the one written by Fam). In downstream it's already enabled, but considered Tech Preview.
- Improve the emulated NVMe driver upstream (the one referenced here), but keep it disabled in downstream until we feel it's mature enough.

Stefan: what do you think?
(In reply to Ademar Reis from comment #11)
> - Keep improving the NVME VFIO driver (the one written by Fam). In
> downstream it's already enabled, but considered Tech Preview.

Yes. The NVMe VFIO QEMU block driver is unrelated to this BZ, but improving it is a long-term goal that will help QEMU offer low-latency I/O on physical NVMe drives.

> - Improve the emulated NVME driver upstream (the one referenced here), but
> keep it disabled in downstream until we feel it's mature enough.

Sounds good. QEMU's emulated NVMe storage controller is not suitable for downstream; let's keep it disabled.
Looks like we have consensus, so I'm closing this BZ. It's OK to make progress upstream, but there are no plans to enable it in downstream for now. The NVMe VFIO driver is a higher priority, even though it covers a different use case.
Thanks Maxim, Ademar and Stefan for the clarification.