Bug 1979440

Summary: Cannot list mdev device defined by mdevctl with a parent address different from an existing mdev device
Product: Red Hat Enterprise Linux Advanced Virtualization
Component: libvirt
Version: 8.5
Target Release: 8.5
Target Milestone: rc
Status: CLOSED ERRATA
Type: Bug
Reporter: yafu <yafu>
Assignee: Jonathon Jongsma <jjongsma>
QA Contact: yafu <yafu>
CC: bfu, jdenemar, jjongsma, jsuchane, lmen, smooney, virt-maint, xuzhang, zhguo
Fixed In Version: libvirt-7.6.0-1.module+el8.5.0+12097+2c77910b
Target Upstream Version: 7.6.0
Last Closed: 2021-11-16 07:55:01 UTC

Description yafu 2021-07-06 04:04:18 UTC
Description of problem:
An mdev device defined by mdevctl is not listed by libvirt when its parent address differs from that of an existing mdev device.

Version-Release number of selected component (if applicable):
libvirt-daemon-7.5.0-1.module+el8.5.0+11664+59f87560.x86_64
qemu-kvm-6.0.0-21.module+el8.5.0+11555+e0ab0d09.x86_64
mdevctl-0.78-1.el8.noarch

How reproducible:
100%

Steps to Reproduce:
1.Define an mdev device (mdevctl stores the definition in a per-parent directory; see the note under Additional info):
# mdevctl define --uuid=8e21b099-b7a0-4c79-ad9e-6744362e2b66 --parent=0000:9b:00.0 --type=nvidia-260 -a

2.List the defined mdev device with 'mdevctl list -d':
# mdevctl list -d
8e21b099-b7a0-4c79-ad9e-6744362e2b66 0000:9b:00.0 nvidia-260 auto

3.List the inactive node devices with 'virsh nodedev-list':
# virsh nodedev-list --inactive
mdev_8e21b099_b7a0_4c79_ad9e_6744362e2b66

4.Define another mdev device with a parent address different from the one in step 1:
# mdevctl define --uuid=8e21b099-b7a0-4c79-ad9e-6744362e2b77 --parent=0000:5b:00.0 --type=nvidia-260 -a

5.List the mdev devices with 'mdevctl list -d':
# mdevctl list -d
8e21b099-b7a0-4c79-ad9e-6744362e2b77 0000:5b:00.0 nvidia-260 auto
8e21b099-b7a0-4c79-ad9e-6744362e2b66 0000:9b:00.0 nvidia-260 auto

6.List the mdev devices with 'virsh nodedev-list --inactive':
# virsh nodedev-list --cap mdev --inactive
mdev_8e21b099_b7a0_4c79_ad9e_6744362e2b66

Actual results:
Only the first mdev device is listed; the device defined in step 4 with a different parent address is missing from the 'virsh nodedev-list' output.

Expected results:
All mdev devices defined by mdevctl should be listed, regardless of parent address.

Additional info:
1.An mdev device defined by mdevctl with the same parent address as an existing mdev device is listed correctly.
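
For context on why the parent address matters: mdevctl persists each defined device as a small JSON file in a per-parent directory, so the two definitions above live under different directories. A minimal sketch, assuming the default config root /etc/mdevctl.d:

# cat /etc/mdevctl.d/0000:9b:00.0/8e21b099-b7a0-4c79-ad9e-6744362e2b66
{
  "mdev_type": "nvidia-260",
  "start": "auto"
}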

Comment 1 yafu 2021-07-06 04:47:37 UTC
Additionally, after executing the steps in comment 0, no mdev devices at all are listed once libvirtd exits on its idle timeout (120 seconds by default on these builds):
# sleep 120
# virsh nodedev-list --cap mdev --inactive
(no output)

Comment 2 Jonathon Jongsma 2021-07-06 17:04:28 UTC
Ah, the support for non-unique UUIDs in mdevctl rears its head. Thanks, I'll investigate.

Comment 3 Jonathon Jongsma 2021-07-06 19:07:06 UTC
Potential patch sent upstream.

https://listman.redhat.com/archives/libvir-list/2021-July/msg00111.html

Comment 4 yafu 2021-07-07 00:34:33 UTC
(In reply to Jonathon Jongsma from comment #2)
> Ah, the support for non-unique UUIDs in mdevctl rears its head. Thanks, I'll
> investigate.

Hi Jonathon,

The UUIDs used in comment 0 are unique.

# mdevctl list -d
***8e21b099-b7a0-4c79-ad9e-6744362e2b77*** 0000:5b:00.0 nvidia-260 auto
***8e21b099-b7a0-4c79-ad9e-6744362e2b66*** 0000:9b:00.0 nvidia-260 auto

The same failure occurs with the following clearly distinct UUIDs:
# mdevctl list -d
c11fa1ca-18f6-4793-aa58-dcbfbb2e52db 0000:5b:00.0 nvidia-260 auto
83a54ae7-09b1-4046-b0e2-b31de31505a0 0000:9b:00.0 nvidia-264 auto

# virsh nodedev-list --cap mdev --inactive
mdev_83a54ae7_09b1_4046_b0e2_b31de31505a0

Comment 5 Jonathon Jongsma 2021-07-07 02:12:27 UTC
Oh, sorry. I read the initial report too quickly so I did not notice the difference in UUIDs. (The patch I sent upstream is still fixing a real issue, but I guess it won't solve this particular issue.)

Can you attach the output from the following command?
$ mdevctl list --defined --dumpjson

Comment 6 yafu 2021-07-07 02:57:06 UTC
(In reply to Jonathon Jongsma from comment #5)
> Oh, sorry. I read the initial report too quickly so I did not notice the
> difference in UUIDs. (The patch I sent upstream is still fixing a real
> issue, but I guess it won't solve this particular issue.)
> 
> Can you attach the output from the following command?
> $ mdevctl list --defined --dumpjson

# mdevctl list --defined --dumpjson
[
  {
    "0000:5b:00.0": [
      {
        "8e21b099-b7a0-4c79-ad9e-6744362e2b33": {
          "mdev_type": "nvidia-260",
          "start": "auto"
        }
      }
    ],
    "0000:9b:00.0": [
      {
        "8af4c8c3-959b-4596-bf0e-5eedc709b0b6": {
          "mdev_type": "nvidia-262",
          "start": "auto"
        }
      }
    ]
  }
]

Comment 7 Jonathon Jongsma 2021-07-07 18:47:06 UTC
Ah, this bug is solved by a patch that was just recently merged upstream. I can't reproduce it on git master anymore.

commit e9b534905f4fb03d8f31d007a0d1aa1c911e2a2c
Author: Jonathon Jongsma <jjongsma>
Date:   Thu Jun 10 13:15:37 2021 -0500

    nodedev: handle mdevs from multiple parents
    
    Due to a rather unfortunate misunderstanding, we were parsing the list
    of defined devices from mdevctl incorrectly. Since my primary
    development machine only has a single device capable of mdevs, I
    apparently neglected to test multiple parent devices and made some
    assumptions based on reading the mdevctl code. These assumptions turned
    out to be incorrect, so the parsing failed when devices from more than
    one parent device were returned.
    
    The details: mdevctl returns an array of objects representing the
    defined devices. But instead of an array of multiple objects (with each
    object representing a parent device), the array always contains only a
    single object. That object has a separate property for each parent
    device.
    
    Signed-off-by: Jonathon Jongsma <jjongsma>
    Reviewed-by: Michal Privoznik <mprivozn>
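
To illustrate the fix: the parser assumed one array element per parent device, while mdevctl actually emits a single object whose properties are the parent addresses. A sketch using the devices from comment 6 (payloads abbreviated; the "assumed" form never actually occurs in mdevctl output):

What the parser assumed (incorrect):
[
  { "0000:5b:00.0": [ ... ] },
  { "0000:9b:00.0": [ ... ] }
]

What mdevctl actually emits:
[
  {
    "0000:5b:00.0": [ ... ],
    "0000:9b:00.0": [ ... ]
  }
]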

Comment 8 Jonathon Jongsma 2021-07-09 13:57:55 UTC
*** Bug 1979998 has been marked as a duplicate of this bug. ***

Comment 9 Jonathon Jongsma 2021-07-09 14:01:47 UTC
*** Bug 1979951 has been marked as a duplicate of this bug. ***

Comment 12 yafu 2021-08-05 08:41:02 UTC
Verified with libvirt-7.6.0-1.module+el8.5.0+12097+2c77910b.x86_64.

Test steps:
1.Define two mdev devices with different parent addresses:
# mdevctl define --uuid=8e21b099-b7a0-4c79-ad9e-6744362e2b66 --parent=0000:21:00.0 --type=nvidia-231 -a
# mdevctl define --uuid=83a54ae7-09b1-4046-b0e2-b31de31505a0 --parent=0000:41:00.0 --type=nvidia-231 -a

2.List the mdev devices with 'virsh nodedev-list':
# virsh nodedev-list --cap mdev --all
mdev_83a54ae7_09b1_4046_b0e2_b31de31505a0
mdev_8e21b099_b7a0_4c79_ad9e_6744362e2b66

3.Start one mdev device:
# mdevctl start --uuid=8e21b099-b7a0-4c79-ad9e-6744362e2b66

4.List the active mdev devices with 'virsh nodedev-list'; the output reflects the newly started device:
# virsh nodedev-list --cap mdev
mdev_8e21b099_b7a0_4c79_ad9e_6744362e2b66

5.Destroy the mdev device and list again:
# virsh nodedev-list --cap mdev

# virsh nodedev-list --cap mdev --all
mdev_83a54ae7_09b1_4046_b0e2_b31de31505a0
mdev_8e21b099_b7a0_4c79_ad9e_6744362e2b66


6.Undefine both of the mdev devices and list again:
# virsh nodedev-undefine mdev_83a54ae7_09b1_4046_b0e2_b31de31505a0
Undefined node device 'mdev_83a54ae7_09b1_4046_b0e2_b31de31505a0'
# virsh nodedev-undefine mdev_8e21b099_b7a0_4c79_ad9e_6744362e2b66
Undefined node device 'mdev_8e21b099_b7a0_4c79_ad9e_6744362e2b66'

# virsh nodedev-list --cap mdev --all
(no output)

7.Also tested defining two mdev devices with different parent addresses via 'virsh nodedev-define'; this works well (a sketch of the XML follows below).
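
For reference, the XML fed to 'virsh nodedev-define' in step 7 would look roughly like the sketch below; the filename is illustrative, the parent name follows libvirt's node device naming for PCI address 0000:41:00.0, and the type and UUID are taken from step 1:

# cat mdev.xml
<device>
  <!-- parent is libvirt's node device name for PCI address 0000:41:00.0 -->
  <parent>pci_0000_41_00_0</parent>
  <capability type='mdev'>
    <type id='nvidia-231'/>
    <!-- uuid is optional; libvirt generates one if it is omitted -->
    <uuid>83a54ae7-09b1-4046-b0e2-b31de31505a0</uuid>
  </capability>
</device>
# virsh nodedev-define mdev.xml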

Comment 14 errata-xmlrpc 2021-11-16 07:55:01 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4684