We are working on packaging kata containers for Fedora. kata containers is an OCI container runtime that inserts a qemu/kvm VM between your host and your docker/podman container.

kata wants the KVM VM appliance to be as small as possible, so it doesn't take up much memory and it starts up quickly. There are two methods of booting the VM appliance filesystem: stuff it in an initrd, or pass it to the VM through nvdimm. The latter won't work for Fedora, though, unless the necessary drivers are built into the kernel.

Here's the minimal kernel config diff I could distill to make it work, against the f30 kernel. Is this an acceptable addition? FWIW I haven't really consulted with other virt folks yet to see if we even want to go this direction, but I just got it all working and figured I'd float the idea early.

Thanks!

--- .config	2019-09-09 18:59:33.354576361 -0400
+++ minconfig	2019-09-09 19:01:02.004509091 -0400
@@ -531,7 +531,7 @@ CONFIG_ACPI_SBS=m
 CONFIG_ACPI_HED=y
 CONFIG_ACPI_CUSTOM_METHOD=m
 CONFIG_ACPI_BGRT=y
-CONFIG_ACPI_NFIT=m
+CONFIG_ACPI_NFIT=y
 # CONFIG_NFIT_SECURITY_DEBUG is not set
 CONFIG_ACPI_HMAT=y
 CONFIG_HAVE_ACPI_APEI=y
@@ -7917,20 +7917,20 @@ CONFIG_THUNDERBOLT=m
 # CONFIG_ANDROID is not set
 # end of Android
 
-CONFIG_LIBNVDIMM=m
-CONFIG_BLK_DEV_PMEM=m
-CONFIG_ND_BLK=m
+CONFIG_LIBNVDIMM=y
+CONFIG_BLK_DEV_PMEM=y
+CONFIG_ND_BLK=y
 CONFIG_ND_CLAIM=y
-CONFIG_ND_BTT=m
+CONFIG_ND_BTT=y
 CONFIG_BTT=y
-CONFIG_ND_PFN=m
+CONFIG_ND_PFN=y
 CONFIG_NVDIMM_PFN=y
 CONFIG_NVDIMM_DAX=y
 CONFIG_NVDIMM_KEYS=y
 CONFIG_DAX_DRIVER=y
 CONFIG_DAX=y
-CONFIG_DEV_DAX=m
-CONFIG_DEV_DAX_PMEM=m
+CONFIG_DEV_DAX=y
+CONFIG_DEV_DAX_PMEM=y
 CONFIG_DEV_DAX_KMEM=m
 # CONFIG_DEV_DAX_PMEM_COMPAT is not set
 CONFIG_NVMEM=y
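For anyone who wants to check a given kernel against the request above, here is a rough Python sketch, not part of the original report. It parses kernel config text and lists which of the options changed in the diff are not built in. The helper names (`parse_kconfig`, `not_builtin`) and the `REQUIRED_BUILTIN` list are made up for illustration; the option names are taken from the diff, and the assumption is that the config uses the usual `CONFIG_FOO=y|m` / `# CONFIG_FOO is not set` line format.

```python
import re

# Options that the diff in this comment switches from =m to =y.
REQUIRED_BUILTIN = [
    "CONFIG_ACPI_NFIT",
    "CONFIG_LIBNVDIMM",
    "CONFIG_BLK_DEV_PMEM",
    "CONFIG_ND_BLK",
    "CONFIG_ND_BTT",
    "CONFIG_ND_PFN",
    "CONFIG_DEV_DAX",
    "CONFIG_DEV_DAX_PMEM",
]

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map option name -> value ('y', 'm', 'n', or a literal) from config text."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            values[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            values[match.group(1)] = "n"
    return values


def not_builtin(text, required=REQUIRED_BUILTIN):
    """Return {option: current value} for required options that are not '=y'.

    Options that do not appear in the config at all are reported as 'missing'.
    """
    values = parse_kconfig(text)
    return {
        name: values.get(name, "missing")
        for name in required
        if values.get(name) != "y"
    }
```

On an installed system this could be run against the config file shipped with the kernel, for example `not_builtin(open("/boot/config-" + os.uname().release).read())` after `import os`; an empty result would mean all of the listed options are built in.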
I'm not opposed to this change. I think this is a reasonable argument for why we want to have the drivers built in. We can leave this bug open until you make a final decision.
Thanks Laura. CCing some kata+virt folks. stefanha, dgilbert, any comments here? I'm still not that familiar with kata, so I'd appreciate a second set of eyes.
I think supporting boot from NVDIMM in Fedora is reasonable for the Kata Containers use case.
Thanks Stefan. Laura, if you could add this to f31+ that would be great, and to f30 as well if it's not an issue.
(In reply to Cole Robinson from comment #0)
> kata wants the KVM VM appliance to be as small as possible, so it doesn't
> take up much memory and it starts up quickly. There's two methods of booting
> the VM appliance filesystem: stuff it in an initrd, or pass it to the VM
> through nvdimm. The latter piece won't work for Fedora though unless the

Looks reasonable to me. How would you configure that, though? Is that something you plan to add to kata-osbuilder?
It's worth checking if the vhost-user stuff plays well with nvdimm.
(In reply to Christophe de Dinechin from comment #5)
> (In reply to Cole Robinson from comment #0)
> > kata wants the KVM VM appliance to be as small as possible, so it doesn't
> > take up much memory and it starts up quickly. There's two methods of booting
> > the VM appliance filesystem: stuff it in an initrd, or pass it to the VM
> > through nvdimm. The latter piece won't work for Fedora though unless the
>
> Looks reasonable to me. How would you configure that, though? Is that
> something you plan to add to kata-osbuilder?

nvdimm is what kata uses for the 'image' boot option in configuration-qemu.toml. The current kata-osbuilder in Fedora 31+ will generate an image and an initrd.

(In reply to Dr. David Alan Gilbert from comment #6)
> It's worth checking if the vhost-user stuff plays well with nvdimm.

I haven't tried it. I figure enabling the kernel modules is good to do regardless, since it will give us the opportunity to actually test these things with the stock kernel. If nvdimm doesn't play well with vhost-user, we can stick with initrd as the default appliance boot method (IMO we should do that anyway as a first packaging pass).
Turns out CONFIG_NVDIMM=m was an explicit request from Intel (for non-kata reasons) back in April; see https://bugzilla.redhat.com/show_bug.cgi?id=1696481. I've asked for more info there.
We are choosing to ignore the image= option in kata for the foreseeable future, so we don't need this change anymore. We will reopen in the future if it becomes relevant again.