Bug 2014484 - [RHEL9] Enable virtio-mem as tech-preview on x86-64 - QEMU
Summary: [RHEL9] Enable virtio-mem as tech-preview on x86-64 - QEMU
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
: ---
Assignee: David Hildenbrand
QA Contact: Mario Casquero
URL:
Whiteboard:
Depends On:
Blocks: 2014457 2014487 2047797
 
Reported: 2021-10-15 11:36 UTC by David Hildenbrand
Modified: 2023-05-10 07:08 UTC (History)
CC List: 18 users

Fixed In Version: qemu-kvm-6.2.0-2.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-17 12:25:10 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:


Attachments


Links
Gitlab: redhat/centos-stream/src qemu-kvm merge_requests 56 (last updated 2021-12-17 15:05:20 UTC)
Red Hat Issue Tracker: RHELPLAN-99973 (last updated 2021-10-15 11:38:59 UTC)
Red Hat Product Errata: RHBA-2022:2307 (last updated 2022-05-17 12:25:57 UTC)

Description David Hildenbrand 2021-10-15 11:36:31 UTC
We want to enable virtio-mem as tech-preview in RHEL9.0 on x86-64.

On the QEMU side, most patches are already upstream in QEMU v6.1. What's still missing is:

[PATCH v1 0/9] migration/ram: Optimize for virtio-mem via RamDiscardManager
-> https://lkml.kernel.org/r/20211011175346.15499-1-david@redhat.com

[PATCH v1] virtio-mem: Don't skip alignment checks when warning about block size
-> https://lkml.kernel.org/r/20211011173305.13778-1-david@redhat.com

Both parts are expected to go into 6.2.

As a stretch goal, it would be great to also have:
[PATCH RFC 00/15] virtio-mem: Expose device memory via separate memslots
-> https://lore.kernel.org/r/20211013103330.26869-1-david@redhat.com
but it might still require discussions before it goes upstream, in which case we'll pull it in later.
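
For reference, a minimal x86-64 invocation of the device, once enabled, could look roughly like the following (the backend id, sizes and machine type are illustrative only):

# /usr/libexec/qemu-kvm -M q35 -m 4G,maxmem=20G \
    -object memory-backend-ram,id=vmem0,size=16G \
    -device virtio-mem-pci,id=vm0,memdev=vmem0,requested-size=1G

The device can then be resized at runtime via "qom-set vm0 requested-size <size>" on the monitor.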

Comment 1 David Hildenbrand 2021-11-22 12:02:38 UTC
Now upstream is:

[PATCH v1 0/9] migration/ram: Optimize for virtio-mem via RamDiscardManager
-> https://lkml.kernel.org/r/20211011175346.15499-1-david@redhat.com

We're still missing a minor fix:

[PATCH v1] virtio-mem: Don't skip alignment checks when warning about block size
-> https://lkml.kernel.org/r/20211011173305.13778-1-david@redhat.com

But we can consider the "feature" part of this BZ done.

Fixed in qemu-6.2 commit 6fee3a1fd9ecde99c43e659cf8eb6c35c116d05e

Comment 2 David Hildenbrand 2021-11-22 12:11:51 UTC
Setting needinfo to get an Internal Target Milestone (ITM).

@QE: to test this feature we'll need the RHEL9 guest kernel support; the backport is still pending.
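
As a side note, one quick way to check whether a given guest kernel already includes the driver (assuming a standard RHEL layout and a modular build) is:

# grep CONFIG_VIRTIO_MEM /boot/config-$(uname -r)
# modinfo virtio_mem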

Comment 3 Yumei Huang 2021-11-22 13:43:39 UTC
(In reply to David Hildenbrand from comment #2)
> Setting needinfo to get a Internal Target Milestone (ITM).
> 
> @QE to test this feature we'll need the RHEL9 guest kernel support: the
> backport is still pending.

Do you have an idea when the kernel will be ready? I can set an ITM accordingly. Thanks.

Comment 4 David Hildenbrand 2021-11-24 14:35:58 UTC
(In reply to Yumei Huang from comment #3)
> (In reply to David Hildenbrand from comment #2)
> > Setting needinfo to get a Internal Target Milestone (ITM).
> > 
> > @QE to test this feature we'll need the RHEL9 guest kernel support: the
> > backport is still pending.
> 
> Do you have idea when the kernel will be ready? I can set a ITM accordingly.
> Thanks.

I assume either mid-December or, at the latest, early-to-mid January. I'm preparing the MR right now.

Of course, I could provide custom (brew) kernels.

Thanks!

Comment 5 David Hildenbrand 2021-12-02 10:42:13 UTC
Hi Yumei,

the MR for the kernel part in bz2014492 has been created and artefacts are available.

I'm planning on crafting some basic QEMU-based test cases next week and will share them here.
Further, I'll extend the QEMU documentation located at https://virtio-mem.gitlab.io/user-guide/user-guide-qemu.html

Comment 6 Yumei Huang 2021-12-10 07:07:40 UTC
(In reply to David Hildenbrand from comment #5)
> Hi Yumei,
> 
> the MR for the kernel part in bz2014492 has been created and artefacts are
> available.
> 
> I'm planning on crafting some basic QEMU-based test cases next week and will
> share them here.
> Further, I'll extend the QEMU documentation located at
> https://virtio-mem.gitlab.io/user-guide/user-guide-qemu.html

Thanks for the doc. It's been a super busy week. I will check later. Thanks!

Comment 7 Yumei Huang 2021-12-17 04:55:04 UTC
Seems virtio-mem-pci is not supported by qemu-kvm-6.2.0-1.el9. 

# /usr/libexec/qemu-kvm --version
QEMU emulator version 6.2.0 (qemu-kvm-6.2.0-1.el9)

# /usr/libexec/qemu-kvm -device virtio-mem-pci
qemu-kvm: -device virtio-mem-pci: 'virtio-mem-pci' is not a valid device model name


Hi David, would you please help check? Thanks.

Comment 8 David Hildenbrand 2021-12-17 09:15:40 UTC
(In reply to Yumei Huang from comment #7)
> Seems virtio-mem-pci is not supported by qemu-kvm-6.2.0-1.el9. 
> 
> # /usr/libexec/qemu-kvm --version
> QEMU emulator version 6.2.0 (qemu-kvm-6.2.0-1.el9)
> 
> # /usr/libexec/qemu-kvm -device virtio-mem-pci
> qemu-kvm: -device virtio-mem-pci: 'virtio-mem-pci' is not a valid device
> model name
> 
> 
> Hi David, would you please help check? Thanks.

Oh, thanks! We have to enable the device in the downstream config! It should be as easy as:

diff --git a/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak b/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak
index 1f7a9ab024..dc03fbb671 100644
--- a/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak
+++ b/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak
@@ -88,6 +88,7 @@ CONFIG_VGA_CIRRUS=y
 CONFIG_VGA_PCI=y
 CONFIG_VHOST_USER=y
 CONFIG_VHOST_USER_BLK=y
+CONFIG_VIRTIO_MEM=y
 CONFIG_VIRTIO_PCI=y
 CONFIG_VIRTIO_VGA=y
 CONFIG_VMMOUSE=y


https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=41983254

# /usr/libexec/qemu-kvm --version
QEMU emulator version 6.2.0 (qemu-kvm-6.2.0-1.el9.dhildenb202112170957)
# /usr/libexec/qemu-kvm -device virtio-mem-pci
qemu-kvm: -device virtio-mem-pci: 'memdev' property must be set
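
As an optional extra sanity check (not part of the fix itself), the list of available device models can be queried directly:

# /usr/libexec/qemu-kvm -device help | grep virtio-mem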




I *assume* we can use this BZ for the purpose of enabling that flag.

@Miroslav, do we need a new BZ (and make this one depend on the new one?) or can I reuse this one for the gitlab MR?


----------------------------------

I want to provide some ideas for test cases -- unfortunately I got pulled into something urgent last week and have 2 weeks of PTO coming up.
So I won't really be able to provide a lot of test cases before early January :( Maybe I'll find some spare minutes today. (The doc/examples at https://virtio-mem.gitlab.io/user-guide/user-guide-qemu.html might be inspiring :) )

Comment 9 Luiz Capitulino 2021-12-17 13:21:24 UTC
(In reply to David Hildenbrand from comment #8)
> (In reply to Yumei Huang from comment #7)
> > Seems virtio-mem-pci is not supported by qemu-kvm-6.2.0-1.el9. 
> > 
> > # /usr/libexec/qemu-kvm --version
> > QEMU emulator version 6.2.0 (qemu-kvm-6.2.0-1.el9)
> > 
> > # /usr/libexec/qemu-kvm -device virtio-mem-pci
> > qemu-kvm: -device virtio-mem-pci: 'virtio-mem-pci' is not a valid device
> > model name
> > 
> > 
> > Hi David, would you please help check? Thanks.
> 
> Oh, thanks! We have to enable the device in the downstream config! It should
> be as easy as:
> 
> diff --git a/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak
> b/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak
> index 1f7a9ab024..dc03fbb671 100644
> --- a/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak
> +++ b/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak
> @@ -88,6 +88,7 @@ CONFIG_VGA_CIRRUS=y
>  CONFIG_VGA_PCI=y
>  CONFIG_VHOST_USER=y
>  CONFIG_VHOST_USER_BLK=y
> +CONFIG_VIRTIO_MEM=y
>  CONFIG_VIRTIO_PCI=y
>  CONFIG_VIRTIO_VGA=y
>  CONFIG_VMMOUSE=y
> 
> 
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=41983254
> 
> # /usr/libexec/qemu-kvm --version
> QEMU emulator version 6.2.0 (qemu-kvm-6.2.0-1.el9.dhildenb202112170957)
> # /usr/libexec/qemu-kvm -device virtio-mem-pci
> qemu-kvm: -device virtio-mem-pci: 'memdev' property must be set
> 
> 
> 
> 
> I *assume* we can use this BZ for the purpose of enabling that flag.
> 
> @Miroslav, do we need a new BZ (and make this one depend on the new one?) or
> can I reuse this one for the gitlab MR?

Mirek is on PTO. Danilo, would you know who can help with this question?

> 
> 
> ----------------------------------
> 
> I want to provide some ideas for test cases -- unfortunately I got pulled
> into something urgent last week and have 2 weeks of PTO coming up.
> So I won't really be able to provide a lot of test cases before early
> January :( Maybe I'll find some spare minutes today. (the doc/examples at
> https://virtio-mem.gitlab.io/user-guide/user-guide-qemu.html might be
> inspiring :) )

Yumei, feel free to defer the ITM for January.

Comment 13 Yanan Fu 2021-12-20 12:45:45 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 16 Yanan Fu 2021-12-27 02:09:59 UTC
(In reply to Yanan Fu from comment #13)
> QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test
> pass.

Reset it as the BZ has been changed back to POST.

Comment 18 Yanan Fu 2022-01-10 05:13:16 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 30 Yanghang Liu 2022-01-18 09:19:32 UTC
(In reply to David Hildenbrand from comment #24)

> Case 5. Test with vfio-pci + virtio-mem (no vIOMMU)

> a.Boot guest with vfio-pci device and virtio-mem device
> b. Check if the passthru device works well inside guest
> c. Resize virtio-mem device
> d. Repeat step b. 


> Case 6. Test with vfio-pci + virtio-mem + vIOMMU
> a. Boot guest with vfio-pci device and virtio-mem device and vIOMMU
> b. Check if the passthru device works well inside guest
> c. Resize virtio-mem device
> d. Repeat step b.


Hi David,

I have tested Case 5 and Case 6 and updated my test results in comment 27 and comment 28.
(related document: https://virtio-mem.gitlab.io/user-guide/user-guide-qemu.html)

Could you please help review comment 27 and comment 28 and check whether I missed anything or need to do additional tests to verify this bug?

Comment 34 David Hildenbrand 2022-01-18 19:56:46 UTC
(In reply to Yanghang Liu from comment #30)
> (In reply to David Hildenbrand from comment #24)
> 
> > Case 5. Test with vfio-pci + virtio-mem (no vIOMMU)
> 
> > a.Boot guest with vfio-pci device and virtio-mem device
> > b. Check if the passthru device works well inside guest
> > c. Resize virtio-mem device
> > d. Repeat step b. 
> 
> 
> > Case 6. Test with vfio-pci + virtio-mem + vIOMMU
> > a. Boot guest with vfio-pci device and virtio-mem device and vIOMMU
> > b. Check if the passthru device works well inside guest
> > c. Resize virtio-mem device
> > d. Repeat step b.
> 
> 
> Hi David,
> 

Hi!

> I have tested the Case 5 and Case 6 and updated my test result in the
> comment 27 and comment 28.
> (related document:
> https://virtio-mem.gitlab.io/user-guide/user-guide-qemu.html)

You're also checking "info memory-devices" and "info numa" after each resize step, correct? If the size adjusts accordingly, this should be fine.

In an ideal world, we'd be stress-testing the pass-through device to see if our "hotplugged" memory behaves as expected, but the basic test should be good enough for now -- if the guest continues working as expected, fine.

> 
> Could you please help review the comment 27 and comment 28 and check if I
> miss anything or need to do more additional tests for verifying this bug ?

I think this testing is sufficient for tech-preview. We'll have to run additional tests once huge pages are supported; then we can also test what I proposed in comment 33.

Thanks!
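
For reference, one resize-and-check cycle on the HMP monitor could look like this (assuming the virtio-mem device id is vm0, as in the test command lines):

(qemu) qom-set vm0 requested-size 8G
(qemu) info memory-devices
(qemu) info numa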

Comment 37 Yanghang Liu 2022-01-19 09:33:21 UTC
(In reply to David Hildenbrand from comment #34)

> Hi!

> You're also checking "info memory-devices" and "info numa" after each resize step, correct? 
> If the size adjusts accordingly, this should be fine.

Hi David,

I have updated my related test info in comment 35 and comment 36 in case I missed something.

Here's a question I want to confirm with you:

After I changed the "requested-size" value several times and finally set it to 0, I found that the "node 0 plugged" value in the "info numa" output is "4 MB" instead of "0 MB".
(The "node 0 plugged" value in the "info numa" output has stayed at "4 MB" for a long time without any change.)

Is this the expected result?

Comment 38 Yanghang Liu 2022-01-19 10:13:06 UTC
> In an ideal world, we'd be stress-testing the pass-through device to see if our "hotplugged" memory behaves as expected, but the basic test should be good enough for now 
> -- if the guest continues working as expected, fine.

Hi David,

I can't do any stress tests in Case 5 and Case 6 because my current QL41112 vfio-pf device seems to work only in PCIe mode, but our test scenario requires it to be plugged into a pci-pcie-bridge.

May I ask what you think about the "vfio-pf/vfio-vf in pcie-root-port + virtio-mem device + stress" test scenario?

Do I need to add this test scenario to my test cases for this bug as well?

Comment 39 David Hildenbrand 2022-01-19 10:27:15 UTC
(In reply to Yanghang Liu from comment #37)
> (In reply to David Hildenbrand from comment #34)
> 
> > Hi!
> 
> > You're also checking "info memory-devices" and "info numa" after each resize step, correct? 
> > If the size adjusts accordingly, this should be fine.
> 
> Hi David,
> 
> I have updated my related test info in comment 35 and comment 36 in case I
> missed something.
> 
> Here's a question I want to confirm with you:
> 
> After I changed the "requested-size" value several times and finally set it
> to 0, I found that the ”node 0 plugged“ value in "info numa" output is "4
> MB" instead of "0 MB".
> (The ”node 0 plugged“ value in "info numa" output has been equal to "4 MB"
> for a long time without any change)
> 
> Is this the expected result?

In general, with "memhp_default_state=online_movable" we make memory hotunplug more likely to succeed. Unfortunately, there are corner cases where we can still fail to hotunplug all memory -- the same kind of corner cases that can make CMA allocations fail. The VM will keep retrying to unplug the remaining memory (e.g., after ~1min, then after ~2min, then after ~4min, ...). In that scenario, can you

1) Wait a bit (e.g., 4 minutes) to see if it manages to unplug it?
2) Share the output of "cat /proc/zoneinfo" when it happens, just so that I can verify that "memhp_default_state=online_movable" is effective in your setup?

Again, it's not unusual, but with little system activity and "memhp_default_state=online_movable", it should usually succeed.

Can this be reproduced?
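
For illustration, a few guest-side checks along those lines (the sysfs paths are the standard ones; exact output will vary):

# grep -o memhp_default_state=online_movable /proc/cmdline
# cat /sys/devices/system/memory/auto_online_blocks
# grep Movable /proc/zoneinfo

If the parameter is effective, auto_online_blocks should report "online_movable" and the Movable zone should account for the hotplugged memory.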

Comment 40 David Hildenbrand 2022-01-19 17:50:26 UTC
(In reply to Yanghang Liu from comment #38)
> > In an ideal world, we'd be stress-testing the pass-through device to see if our "hotplugged" memory behaves as expected, but the basic test should be good enough for now 
> > -- if the guest continues working as expected, fine.
> 
> Hi David,
> 
> I can't do any stress tests in Case 5 and Case 6 because my current QL41112
> vfio-pf device seems can only work in pcie mode but our test scenario
> requires it to be plugged into a pci-pcie-bridge.
> 

Interesting.

> May I ask what do you think about the "vfio-pf/vfio-vf in pcie-root-port +
> virtio-mem device + stress" test scenario ?

If there is an easy way to get it running that would be great. Do you have an idea on how to get it running or would I have to dig a bit? :)

> Do I need to add this test scenario into my test case for testing this bug
> as well?

It would be great if we could let the device do some actual work, to see if the vfio mapping in the hypervisor is set up properly.

Comment 43 Yanghang Liu 2022-01-20 06:31:39 UTC
(In reply to David Hildenbrand from comment #39)

> > Here's a question I want to confirm with you:
> > 
> > After I changed the "requested-size" value several times and finally set it
> > to 0, I found that the ”node 0 plugged“ value in "info numa" output is "4
> > MB" instead of "0 MB".
> > (The ”node 0 plugged“ value in "info numa" output has been equal to "4 MB"
> > for a long time without any change)
> > 
> > Is this the expected result?
> 
> In general, with "memhp_default_state=online_movable" we make memory
> hotunplug more likely to succeed. Unfortunately, there are corner cases
> where we can still fail to hotunplug all memory, that will similarly make
> CMA allocations fail. The VM will continue trying to unplug (e.g., after
> ~1min, then after ~2min, then after ~4min, ...) the remaining memory. In
> that scenario, can you
> 
> 1) Wait a bit (e.g., 4 minutes) to see if it manages to unplug it?
> 2) Share the output of "cat /proc/zoneinfo" when it happens, just hat I can
> verify that "memhp_default_state=online_movable" is effective in your setup?
> 
> Again, it's not unusual, but with little system activity and
> "memhp_default_state=online_movable", it should usually succeed.
> 
> Can this be reproduced?

Hi David,

Thanks a lot for your explanation.

I have re-tested this scenario 5 times: the VM unplugged the remaining 4 MB of memory successfully after I waited for a few minutes.

Comment 44 David Hildenbrand 2022-01-20 08:48:40 UTC
(In reply to Yanghang Liu from comment #43)
> (In reply to David Hildenbrand from comment #39)
> 
> > > Here's a question I want to confirm with you:
> > > 
> > > After I changed the "requested-size" value several times and finally set it
> > > to 0, I found that the ”node 0 plugged“ value in "info numa" output is "4
> > > MB" instead of "0 MB".
> > > (The ”node 0 plugged“ value in "info numa" output has been equal to "4 MB"
> > > for a long time without any change)
> > > 
> > > Is this the expected result?
> > 
> > In general, with "memhp_default_state=online_movable" we make memory
> > hotunplug more likely to succeed. Unfortunately, there are corner cases
> > where we can still fail to hotunplug all memory, that will similarly make
> > CMA allocations fail. The VM will continue trying to unplug (e.g., after
> > ~1min, then after ~2min, then after ~4min, ...) the remaining memory. In
> > that scenario, can you
> > 
> > 1) Wait a bit (e.g., 4 minutes) to see if it manages to unplug it?
> > 2) Share the output of "cat /proc/zoneinfo" when it happens, just hat I can
> > verify that "memhp_default_state=online_movable" is effective in your setup?
> > 
> > Again, it's not unusual, but with little system activity and
> > "memhp_default_state=online_movable", it should usually succeed.
> > 
> > Can this be reproduced?
> 
> Hi David,
> 
> Thanks a lot for your explanation.
> 
> I have re-tested this scenario for 5 times : The vm can unplug the remaining
> 4M memory successfully after I waited for a bit minutes.

Perfect, thanks a lot for testing!!!

Comment 46 Yanghang Liu 2022-01-20 09:16:19 UTC
(In reply to David Hildenbrand from comment #40)

> > Hi David,
> > 
> > I can't do any stress tests in Case 5 and Case 6 because my current QL41112
> > vfio-pf device seems can only work in pcie mode but our test scenario
> > requires it to be plugged into a pci-pcie-bridge.
> > 
> 
> Interesting.
> 
> > May I ask what do you think about the "vfio-pf/vfio-vf in pcie-root-port +
> > virtio-mem device + stress" test scenario ?
> 
> If there is an easy way to get it running that would be great. Do you have
> an idea on how to get it running or would I have to dig a bit? :)
> 
> > Do I need to add this test scenario into my test case for testing this bug
> > as well?
> 
> It would be great if we could let the device some actual work, to see if the
> vfio mapping in the hypervisor is setup properly.

Hi David,

I have drafted another two cases for the "vfio-pf/vfio-vf in pcie-root-port + virtio-mem device + stress" test scenario.

I would like to confirm whether it is OK with you that I also add the following two cases to my test plan for this bug.

Could you please help review them?


Case 7:  Test with vfio-pci device in pcie-root-port + virtio-mem (no vIOMMU)

  (1) Make sure the kernel command-line option "memhp_default_state=online_movable" is set in the vm

  (2) start a vm with the vfio-pf device and virtio-mem device
  ...
  -M q35 \
  -m 4G,maxmem=20G,slots=2 \
  -object memory-backend-ram,id=vmem0,size=16G  \
  -device virtio-mem-pci,id=vm0,memdev=vmem0,requested-size=1G,bus=root.3 \
  -device vfio-pci,host=0000:3b:00.0,id=pf1,bus=root.4 \  <--- the main difference from the vm qemu cmd line in Case 5
  ...

  (3) check the virtio-mem device and vfio-vf/vfio-pf status
  related cmd: (1) dmesg (2) lspci
  related qmp/hmp: (1) info memory-devices (2) info numa

  (4) Resize the virtio-mem device repeatedly and check that its size adjusts to the requested-size accordingly
  related qmp/hmp: qom-set vm0 requested-size $size

  (5) run some stress tests with the vfio-vf/vfio-pf in the vm  <--- we can do some tests with the vfio-vf/vfio-pf in the vm now because it is plugged into a pcie-root-port
  related cmd: (1) ping (2) netperf

  (6) check if the vm works well 

Case 8:  Test with vfio-pci device in pcie-root-port + virtio-mem + vIOMMU

  (1) Make sure the kernel command-line options "memhp_default_state=online_movable" and "iommu=on" are set in the vm

  (2) start a vm with the vfio-pf device, virtio-mem device and iommu device
  ...
  -m 4G,maxmem=12G,slots=2 \
  -M q35,kernel-irqchip=split \
  -device intel-iommu,intremap=on,caching-mode=on \
  -object memory-backend-ram,id=vmem0,size=8G  \
  -device virtio-mem-pci,id=vm0,memdev=vmem0,requested-size=1G,iommu_platform=on,bus=root.3 \
  -device vfio-pci,host=0000:3b:00.0,id=pf1,bus=root.4 \  <--- the main difference from the vm qemu cmd line in Case 6
  ...

  (3) check the virtio-mem device and vfio-vf/vfio-pf status
  related cmd: (1) dmesg (2) lspci
  related qmp/hmp: (1) info memory-devices (2) info numa

  (4) Resize the virtio-mem device repeatedly and check that its size adjusts to the requested-size accordingly (see the example sequence after this case)
  related qmp/hmp: qom-set vm0 requested-size $size

  (5) run some stress tests with the vfio-vf/vfio-pf in the vm  <--- we can do some tests with the vfio-vf/vfio-pf in the vm because it is plugged into a pcie-root-port
  related cmd: (1) ping (2) netperf

  (6) check if the vm works well
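
As an illustration of steps (3)-(5) above (the device id, the lsmem columns and the peer address below are assumptions to be adapted to the actual setup):

  host HMP:
  (qemu) qom-set vm0 requested-size 4G
  (qemu) info memory-devices
  (qemu) info numa

  guest:
  # lspci                            <--- the assigned PF/VF should still be visible
  # lsmem -o RANGE,SIZE,STATE,NODE   <--- hotplugged blocks should show up as online
  # ping -c 10 <peer-ip>             <--- basic traffic through the vfio NIC
  # netperf -H <peer-ip>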

Comment 49 David Hildenbrand 2022-01-20 10:34:06 UTC
> > It would be great if we could let the device some actual work, to see if the
> > vfio mapping in the hypervisor is setup properly.
> 
> Hi David,
> 
> I have drafted two another cases about the "vfio-pf/vfio-vf in
> pcie-root-port + virtio-mem device + stress" test scenario.
> 
> I would like to confirm if it's ok for you that I also add the following two
> cases into my test plan for testing this bug.
> 
> Could you please help review it ?

Absolutely, thanks for coming up with that (I'm not a vfio expert when it comes to these details, so it's highly appreciated!).

> 
> 
> Case 7:  Test with vfio-pci device in pcie-root-port + virtio-mem (no vIOMMU)
> 
>   (1) Make sure the kernel config option
> "memhp_default_state=online_movable" is enabled in the vm
> 
>   (2) start a vm with the vfio-pf device and virtio-mem device
>   ...
>   -M q35 \
>   -m 4G,maxmem=20G,slots=2 \
>   -object memory-backend-ram,id=vmem0,size=16G  \
>   -device virtio-mem-pci,id=vm0,memdev=vmem0,requested-size=1G,bus=root.3 \
>   -device vfio-pci,host=0000:3b:00.0,id=pf1,bus=root.4 \  <--- the main
> difference from the vm qemu cmd line in Case 5
>   ...
> 
>   (3) check the virtio-mem device and vfio-vf/vfio-pf status
>   related cmd:(1)dmesg(2)lspci
>   related qmp/hmp:(1)info memory-devices(2)info numa 
> 
>   (4) Resize virtio-mem device repeatedly and check if the requested-size
> adjusts accordingly
>   related qmp/hmp: qom-set vm0 requested-size $size
> 
>   (5) running some stress tests with the vfio-vf/vfio-pf in the vm  <--- we
> can do some tests with the vfio-vf/vfio-pf in the vm now because the
> vfio-vf/vfio-pf is plugged into a pcie-root-port
>   related cmd: (1)ping (2)netperf 
> 
>   (6) check if the vm works well 

Perfect! Would it actually make sense to do that *instead* of test case 5? The difference where/how a vfio device is plugged shouldn't change QEMU functionality.

> 
> Case 8:  Test with vfio-pci device in pcie-root-port + virtio-mem + vIOMMU
> 
>   (1) Make sure the kernel config option
> "memhp_default_state=online_movable" and "iommu=on" is enabled in the vm
> 
>   (2) start a vm with the vfio-pf device、virtio-mem device and iommu device
>   ...
>   -m 4G,maxmem=12G,slots=2 \
>   -M q35,kernel-irqchip=split \
>   -device intel-iommu,intremap=on,caching-mode=on \
>   -object memory-backend-ram,id=vmem0,size=8G  \
>   -device
> virtio-mem-pci,id=vm0,memdev=vmem0,requested-size=1G,iommu_platform=on,
> bus=root.3 \
>   -device vfio-pci,host=0000:3b:00.0,id=pf1,bus=root.4 \  <--- the main
> difference from the vm qemu cmd line in Case 6
>   ...
> 
>   (3) check the virtio-mem device and vfio-vf/vfio-pf status
>   related cmd:(1)dmesg(2)lspci
>   related qmp/hmp:(1)info memory-devices(2)info numa 
> 
>   (4) Resize virtio-mem device repeatedly and check if the requested-size
> adjusts accordingly
>   related qmp/hmp: qom-set vm0 requested-size $size
> 
>   (5) running some stress tests with the vfio-vf/vfio-pf in the vm  <--- we
> can do some tests with the vfio-vf/vfio-pf in the vm because the
> vfio-vf/vfio-pf is plugged into a pcie-root-port
>   related cmd: (1)ping (2)netperf 
> 
>   (6) check if the vm works well

Ditto, I assume we can just modify test case 6. That will reduce testing effort without harming test scope.

Comment 50 Yanghang Liu 2022-01-21 06:43:38 UTC
Hi David,

> Perfect! Would it actually make sense to do that *instead* of test case 5? The difference where/how a vfio device is plugged shouldn't change QEMU functionality.

If I understand correctly, you mean I only need to test Case 6, Case 7 and Case 8 to verify this bug now, right?

> I assume we can just modify test case 6. That will reduce testing effort without harming test scope.

I'm sorry, I don't quite understand what you mean here.

Could you please share details about how to merge Case 6 and Case 8 so that I can make my test steps clearer?

Comment 51 David Hildenbrand 2022-01-21 07:31:44 UTC
(In reply to Yanghang Liu from comment #50)
> Hi David,
> 
> > Perfect! Would it actually make sense to do that *instead* of test case 5? The difference where/how a vfio device is plugged shouldn't change QEMU functionality.
> 
> If I understand correctly, you mean I only needs to test Case 6, Case 7 and
> Case 8 for verifying this bug now, right ?
> 
> > I assume we can just modify test case 6. That will reduce testing effort without harming test scope.
> 
> I feel sorry that I am not understanding very clearly about what you mean
> here.
> 
> Could you please help share your details about how to merge Case 6 and Case
> 8 so that I can make my test steps more clear ?

Sorry, I think I wasn't clear enough :)

I was thinking about *replacing* case 5 by case 7 and *replacing* case 6 by case 8. Essentially wiring up the vfio device differently which will allow for a stress test of the vfio device in the VM. Does that make sense?

Comment 52 Yanghang Liu 2022-01-21 07:38:56 UTC
(In reply to David Hildenbrand from comment #51)

Thanks David for the feedback.  :)

> I was thinking about *replacing* case 5 by case 7 and *replacing* case 6 by case 8. 
> Essentially wiring up the vfio device differently which will allow for a stress test of the vfio device in the VM. Does that make sense?

I agree. That looks good to me.

Comment 53 Yanghang Liu 2022-01-21 07:54:31 UTC
The test result for Case 6 and Case 7: PASS

I will use the official kernel build to test this bug again after "Bug 2014492 - [RHEL9] Enable virtio-mem as tech-preview on x86-64 - Linux guests" is fixed.

Comment 54 Yanghang Liu 2022-01-21 08:04:57 UTC
(In reply to Yanghang Liu from comment #53)

> The test result for Case 6 and Case 7: PASS

Sorry for the typo here.

It should be "The test result for Case 7 and Case 8: PASS"

> I will use the official kernel build to test this bug again after "Bug 2014492 - [RHEL9] Enable virtio-mem as tech-preview on x86-64 - Linux guests" is fixed.

Comment 59 errata-xmlrpc 2022-05-17 12:25:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: qemu-kvm), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2307

Comment 60 Mario Casquero 2023-05-10 07:08:18 UTC
Setting qe_test_coverage+ as all the virtio_mem cases have finally been automated.

