Bug 1567041

Summary: qemu-guest-agent does not parse PCI bridge links in "build_guest_fsinfo_for_real_device" (q35)
Product: Red Hat Enterprise Linux 7
Component: qemu-guest-agent
Version: 7.6
Status: CLOSED ERRATA
Reporter: Lili Zhu <lizhu>
Assignee: Marc-Andre Lureau <marcandre.lureau>
QA Contact: FuXiangChun <xfu>
CC: chayang, dyuan, fjin, juzhang, knoel, lersek, michen, xfu, xuzhang
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qemu-guest-agent-2.12.0-2.el7
Type: Bug
Last Closed: 2018-10-30 08:08:28 UTC

Description Lili Zhu 2018-04-13 10:28:59 UTC
Description of problem:
Some of the disk info returned by the QEMU guest agent is missing in a q35+OVMF guest.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.10.0-21.el7_5.1.x86_64
OVMF-20171011-4.git92d07e48907f.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Prepare a guest:
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 12    OVMF                           running

2. Show the list of mounted filesystems within the guest:
# virsh domfsinfo OVMF
Mountpoint                           Name     Type     Target
-------------------------------------------------------------------
/                                    dm-0     xfs      
/boot                                vda2     xfs      
/boot/efi                            vda1     vfat     

The target names are not shown in the output.

3. Check the libvirtd.log:
.....
qemuAgentIOProcessLine:317 : Line [{"return": [{"name": "vda1", "mountpoint": "/boot/efi", "disk": [], "type": "vfat"}, {"name": "vda2", "mountpoint": "/boot", "disk": [], "type": "xfs"}, {"name": "dm-0", "mountpoint": "/", "disk": [], "type": "xfs"}]}]
.....

Actual results:
The "disk" array is empty for every filesystem.

Expected results:
Disk info, including the bus and the PCI controller address, should be returned.

Additional info:
For a pc-i440fx machine type guest, the disk info is returned:
2018-04-13 09:17:57.089+0000: 2055: debug : qemuAgentIOProcessLine:317 : Line [{"return": [{"name": "vda1", "mountpoint": "/boot", "disk": [{"bus-type": "virtio", "bus": 0, "unit": 0, "pci-controller": {"bus": 0, "slot": 6, "domain": 0, "function": 0}, "target": 0}], "type": "xfs"}, {"name": "dm-0", "mountpoint": "/", "disk": [{"bus-type": "virtio", "bus": 0, "unit": 0, "pci-controller": {"bus": 0, "slot": 6, "domain": 0, "function": 0}, "target": 0}], "type": "xfs"}]}]

Comment 2 Laszlo Ersek 2018-04-13 11:06:05 UTC
This is a problem with the guest agent, not OVMF.

In the guest agent, the get_pci_driver() function is used to retrieve the
driver for the disk / filesystem. get_pci_driver() fails for the following
"syspath" parameter, for example:

  /sys/devices/pci0000:00/0000:00:1e.0/0000:03:01.0/0000:04:05.0/virtio3/host6/target6:0:0/6:0:0:0/block/sda/sda3

And that's justified because there is no "driver" entry in that directory.

Because get_pci_driver() returns NULL, build_guest_fsinfo_for_real_device()
takes the early exit:

    if (!driver) {
        goto cleanup;
    }

and the fs->disk member will not be populated.
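To illustrate why that lookup comes back empty, here is a minimal standalone sketch of what such a driver lookup boils down to. The helper name and details below are made up for illustration and are not the actual qemu-ga implementation of get_pci_driver(): it resolves the "driver" symlink under a device's sysfs directory; for the bridge directory quoted above the symlink does not exist, so NULL is returned, while for the controller's directory it resolves to virtio-pci.

/*
 * Illustration only -- NOT the qemu-ga implementation of get_pci_driver();
 * the helper name is made up.  A driver lookup of this kind resolves the
 * "driver" symlink under a device's sysfs directory and returns the
 * symlink's basename; a PCI bridge has no such symlink, so the lookup
 * fails and NULL is returned.
 */
#include <libgen.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

static const char *lookup_driver(const char *devdir, char *buf, size_t buflen)
{
    char link[PATH_MAX], target[PATH_MAX];
    ssize_t len;

    snprintf(link, sizeof(link), "%s/driver", devdir);
    len = readlink(link, target, sizeof(target) - 1);
    if (len < 0) {
        return NULL;                                 /* no driver bound, e.g. a bridge */
    }
    target[len] = '\0';
    snprintf(buf, buflen, "%s", basename(target));   /* e.g. "virtio-pci" */
    return buf;
}

int main(void)
{
    char buf[64];
    const char *d;

    /* Paths taken from the syspath quoted above. */
    d = lookup_driver("/sys/devices/pci0000:00/0000:00:1e.0", buf, sizeof(buf));
    printf("bridge:     %s\n", d ? d : "(no driver)");

    d = lookup_driver("/sys/devices/pci0000:00/0000:00:1e.0/0000:03:01.0/0000:04:05.0",
                      buf, sizeof(buf));
    printf("controller: %s\n", d ? d : "(no driver)");
    return 0;
}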

Here's the --verbose log from the guest agent:

> read data, count: 31, data: {"execute":"guest-get-fsinfo"}
> process_event: called
> processing command
> Building guest fsinfo for '/'
>   parse sysfs path '/sys/devices/virtual/block/dm-1'
>  slave device 'sda3'
>   parse sysfs path '/sys/devices/pci0000:00/0000:00:1e.0/0000:03:01.0/0000:04:05.0/virtio3/host6/target6:0:0/6:0:0:0/block/sda/sda3'
> Building guest fsinfo for '/boot'
>   parse sysfs path '/sys/devices/pci0000:00/0000:00:1e.0/0000:03:01.0/0000:04:05.0/virtio3/host6/target6:0:0/6:0:0:0/block/sda/sda2'
> Building guest fsinfo for '/boot/efi'
>   parse sysfs path '/sys/devices/pci0000:00/0000:00:1e.0/0000:03:01.0/0000:04:05.0/virtio3/host6/target6:0:0/6:0:0:0/block/sda/sda1'
> sending data, count: 220

Comment 3 Laszlo Ersek 2018-04-13 11:25:48 UTC
Actually, the syspath that get_pci_driver() operates on is the following string only:

  /sys/devices/pci0000:00/0000:00:1e.0

That is the directory under which qga looks for the "driver" entry.

And, it cannot work -- there is no "driver" entry there -- because the PCI device identified above is not a disk controller. It is a PCI bridge. The *actual* syspath slice that get_pci_driver() should receive, for investigation, is:

  /sys/devices/pci0000:00/0000:00:1e.0/0000:03:01.0/0000:04:05.0

Under this, a "driver" entry does exit, and it links to ..../virtio-pci.

In short, the issue is that the following code fragment:

    p = strstr(syspath, "/devices/pci");
    if (!p || sscanf(p + 12, "%*x:%*x/%x:%x:%x.%x%n",
                     pci, pci + 1, pci + 2, pci + 3, &pcilen) < 4) {
        g_debug("only pci device is supported: sysfs path \"%s\"", syspath);
        return;
    }

    driver = get_pci_driver(syspath, (p + 12 + pcilen) - syspath, errp);

from build_guest_fsinfo_for_real_device() cannot deal with PCI bridges. And, in the Q35 setup at hand, we have two bridges (a DMI-to-PCI bridge, and a PCI-PCI bridge) before we arrive at the disk controller.

The scanning should be extended to iterate (in a loop) over the PCI bridge links.
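
A rough standalone sketch of the kind of loop that is needed (an illustration only, not necessarily the patch that was eventually merged; the helper name is made up): after the first domain:bus:slot.function component, keep consuming further "/dddd:bb:ss.f" components, so that nested bridges are walked over and the last matched component -- the disk controller behind the bridges -- is the one whose driver would then be looked up.

/*
 * Sketch of extending the scan over PCI bridge links; not the actual
 * qemu-ga patch.  In the real code the loop would also have to consult
 * get_pci_driver() at each step to decide where to stop.
 */
#include <stdio.h>
#include <string.h>

static int find_controller(const char *syspath, unsigned int pci[4])
{
    const char *p = strstr(syspath, "/devices/pci");
    int pcilen;

    /* Skip the root bus prefix ("0000:00/"), then read the first
     * domain:bus:slot.function component. */
    if (!p || sscanf(p + 12, "%*x:%*x/%x:%x:%x.%x%n",
                     pci, pci + 1, pci + 2, pci + 3, &pcilen) < 4) {
        return 0;
    }
    p += 12 + pcilen;

    /* Every further "/dddd:bb:ss.f" component is a device behind a bridge;
     * remember the last one that matches. */
    while (sscanf(p, "/%x:%x:%x.%x%n",
                  pci, pci + 1, pci + 2, pci + 3, &pcilen) == 4) {
        p += pcilen;
    }
    return 1;
}

int main(void)
{
    /* The syspath from comment 2. */
    const char *path = "/sys/devices/pci0000:00/0000:00:1e.0/0000:03:01.0/"
                       "0000:04:05.0/virtio3/host6/target6:0:0/6:0:0:0/"
                       "block/sda/sda3";
    unsigned int pci[4];

    if (find_controller(path, pci)) {
        /* Prints 0000:04:05.0, the virtio controller behind the bridges. */
        printf("controller: %04x:%02x:%02x.%x\n",
               pci[0], pci[1], pci[2], pci[3]);
    }
    return 0;
}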

Comment 4 Laszlo Ersek 2018-04-13 11:43:04 UTC
Lili, a request for the future: before filing an RHBZ for the OVMF component, please check whether the issue reproduces with SeaBIOS (using an otherwise identical domain configuration).

I see that in this case, you did check i440fx, and that's great. However, you compared the following two configs:
- i440fx + SeaBIOS
versus
- q35 + OVMF

The issue in the guest agent wasn't triggered by the SeaBIOS -> OVMF change, but by the i440fx -> q35 change. Therefore I suggest that in the future you please try to narrow down the issue as much as possible. If you see an issue manifest only with q35+OVMF, then please compare it *first* to q35+SeaBIOS. If the issue disappears, then it is likely related to the OVMF<->SeaBIOS difference. If the issue persists, then you can compare q35+SeaBIOS vs. i440fx+SeaBIOS second. If the issue disappears at that point, then it is likely related to the q35<->i440fx difference.

In other words, please change only one element of the setup each step along the way. That's how we eliminate irrelevant components. Thanks.

Comment 5 Lili Zhu 2018-04-14 06:49:40 UTC
Laszlo, sorry for the misunderstanding about Q35. Thanks very much for your detailed explanation.

Comment 6 Marc-Andre Lureau 2018-04-20 15:25:52 UTC
Sent a patch to the qemu ML:
"[PATCH] qemu-ga: make get-fsinfo work over pci bridge"

Comment 7 Marc-Andre Lureau 2018-07-18 12:57:00 UTC
Posted:
[RHEL-7.6 qemu-guest-agent PATCH 0/2] Make get-fsinfo work over pci bridges

Comment 9 Miroslav Rezanina 2018-07-24 06:52:01 UTC
Fix included in qemu-guest-agent-2.12.0-2.el7

Comment 11 FuXiangChun 2018-07-26 07:16:50 UTC
Reproduced this bug with qemu-guest-agent-2.12.0-1.el7.x86_64.

Steps:

1. Boot the guest with its system disk behind a pci-bridge:

...
-device pci-bridge,bus=pci.0,id=bridge0,chassis_nr=1 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=bridge0,addr=0x5 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel76-64-virtio-scsi.qcow2 -device scsi-hd,id=image1,drive=drive_image1
...

{"execute":"guest-get-fsinfo"}
{"return": [{"name": "sda1", "mountpoint": "/boot", "disk": [], "type": "xfs"}, {"name": "dm-0", "mountpoint": "/", "disk": [], "type": "xfs"}]}


Verified this bug with qemu-guest-agent-2.12.0-2.el7.x86_64

{"execute":"guest-get-fsinfo"}
{"return": [{"name": "sda1", "mountpoint": "/boot", "disk": [{"bus-type": "scsi", "bus": 0, "unit": 0, "pci-controller": {"bus": 1, "slot": 5, "domain": 0, "function": 0}, "target": 0}], "type": "xfs"}, {"name": "dm-0", "mountpoint": "/", "disk": [{"bus-type": "scsi", "bus": 0, "unit": 0, "pci-controller": {"bus": 1, "slot": 5, "domain": 0, "function": 0}, "target": 0}], "type": "xfs"}]}

Based on this result, this bug is fixed.

Comment 13 errata-xmlrpc 2018-10-30 08:08:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3072