Bug 2232361

Summary: The network interface could not be hot-detached in ppc64le
Product: [Fedora] Fedora
Reporter: YunmingYang <yunyang>
Component: ppc64-diag
Assignee: Than Ngo <than>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 38
CC: hhan, jcajka, jinzhao, juzhang, ksinny, pkrempa, rdossant, than, virt-maint, xuma
Target Milestone: ---
Flags: pm-rhel: mirror+
Target Release: ---
Hardware: ppc64le
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-09-12 08:45:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description YunmingYang 2023-08-16 13:37:12 UTC
Description of problem:
Create a VM with "virt-install --name test --import --disk /var/lib/libvirt/images/fedora38.qcow2 --os-variant fedora38 --memory 2048 --vcpus 2 --noautoconsole --os-variant fedora38 --print-xml > xml && virsh define xml && virsh start test" (using the Fedora 38 Cloud Base image). Wait until the VM has booted completely, then detach the network interface with "virsh detach-interface test network --mac ${mac} --live". The network interface is not detached.

Version-Release number of selected components (if applicable):
libvirt-libs-8.0.0-22.module+el8.9.0+19544+b3045133.ppc64le
libvirt-dbus-1.3.0-2.module+el8.9.0+18724+20190c23.ppc64le
libvirt-daemon-8.0.0-22.module+el8.9.0+19544+b3045133.ppc64le
libvirt-daemon-driver-network-8.0.0-22.module+el8.9.0+19544+b3045133.ppc64le
libvirt-daemon-driver-storage-disk-8.0.0-22.module+el8.9.0+19544+b3045133.ppc64le
libvirt-daemon-driver-nodedev-8.0.0-22.module+el8.9.0+19544+b3045133.ppc64le
libvirt-daemon-driver-storage-iscsi-8.0.0-22.module+el8.9.0+19544+b3045133.ppc64le
libvirt-glib-3.0.0-1.el8.ppc64le
python3-libvirt-8.0.0-2.module+el8.9.0+18724+20190c23.ppc64le
libvirt-client-8.0.0-22.module+el8.9.0+19544+b3045133.ppc64le
libvirt-daemon-driver-storage-core-8.0.0-22.module+el8.9.0+19544+b3045133.ppc64le
libvirt-daemon-config-network-8.0.0-22.module+el8.9.0+19544+b3045133.ppc64le
libvirt-daemon-driver-interface-8.0.0-22.module+el8.9.0+19544+b3045133.ppc64le
libvirt-daemon-driver-qemu-8.0.0-22.module+el8.9.0+19544+b3045133.ppc64le

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with "virt-install --name test --import --disk /var/lib/libvirt/images/fedora38.qcow2 --os-variant fedora38 --memory 2048 --vcpus 2 --noautoconsole --os-variant fedora38 --print-xml > xml && virsh define xml && virsh start test" (using the Fedora 38 Cloud Base image).
2. Wait until the VM has booted completely.
3. Detach the network interface with "virsh detach-interface test network --mac ${mac} --live".

Actual results:
1. After step 3, the network interface is not detached.

Expected results:
1. After step 3, the network interface should be detached.

Additional info:

Comment 1 Han Han 2023-08-17 02:19:57 UTC
Tested with libvirt-9.6.0-1.fc39.x86_64 and qemu-system-ppc-core-8.0.0-4.fc39.x86_64. No problem found:
# virsh dumpxml rhel-ppc64 --xpath //interface
<interface type="network">
  <mac address="52:54:00:a1:a5:96"/>
  <source network="default" portid="2aa31b6f-16d5-4fa2-bc04-d644afee71c9" bridge="virbr0"/>
  <target dev="vnet0"/>
  <model type="virtio"/>
  <alias name="net0"/>
  <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0"/>
</interface>

# virsh detach-interface rhel-ppc64 network --mac 52:54:00:a1:a5:96
Interface detached successfully


# virsh dumpxml rhel-ppc64 --xpath //interface
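
The before/after comparison above could be automated. The following is a minimal sketch, not part of the original test: it assumes the caller passes the full domain XML (for example, the output of a plain "virsh dumpxml"), and the helper names interface_macs and is_detached are hypothetical.

```python
import xml.etree.ElementTree as ET


def interface_macs(domain_xml):
    """Return the lower-cased MAC addresses of every <interface> in a domain XML."""
    root = ET.fromstring(domain_xml)
    return {
        mac.get("address", "").lower()
        for mac in root.iterfind(".//interface/mac")
    }


def is_detached(domain_xml, mac):
    """True if no <interface> in the domain XML still carries the given MAC."""
    return mac.lower() not in interface_macs(domain_xml)
```

On ppc64le the check may need to be repeated with a timeout, since the interface only disappears from the XML once the guest has completed the unplug.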

Comment 2 Peter Krempa 2023-08-17 11:56:26 UTC
If a device is not detached after a hot-unplug request, it's usually caused by the guest OS not allowing the unplug. This can be either because the device is in use, or because the OS is not responding. Since active cooperation with the guest OS is needed, there is no way to force removal.

If you're able to reproduce the problem, please attach debug logs of libvirtd, but based on my experience it'll be caused by the guest not allowing the removal. (Libvirt will successfully request the removal in qemu, which will also forward it to the OS, but no acknowledgement will be seen.)

Comment 3 YunmingYang 2023-08-22 08:04:13 UTC
Sorry for the late reply, here are the debug logs of libvirtd.

Comment 5 Peter Krempa 2023-08-22 14:36:26 UTC
So libvirt sends 'device_del' ...

2023-08-22 07:58:47.832+0000: 11056: info : qemuMonitorSend:868 : QEMU_MONITOR_SEND_MSG: mon=0x7fff4c1852f0 msg={"execute":"device_del","arguments":{"id":"net0"},"id":"libvirt-33"}
 fd=-1

... and qemu acknowledged it.

2023-08-22 07:58:47.834+0000: 11304: debug : qemuMonitorJSONIOProcessLine:225 : Line [{"return": {}, "id": "libvirt-33"}]

The DEVICE_DELETED event was never delivered, so most likely the guest OS didn't acknowledge the unplug. If you think there's a technical problem, you can move this bug to 'qemu' to investigate further, otherwise I'll close it.
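
The same check can be run over a whole libvirtd debug log. This is only a rough sketch: the function name unplug_status is hypothetical, and it assumes that a delivered event shows up in the log as a line containing DEVICE_DELETED together with a "device" field naming the device alias, as in the QMP event payload.

```python
def unplug_status(log_text, device_id):
    """Classify a hot-unplug as 'not requested', 'pending' or 'deleted'."""
    requested = False
    for line in log_text.splitlines():
        # Drop spaces so '"id": "net0"' and '"id":"net0"' match alike.
        compact = line.replace(" ", "")
        if '"execute":"device_del"' in compact and f'"id":"{device_id}"' in compact:
            requested = True
        elif (
            requested
            and "DEVICE_DELETED" in compact
            and f'"device":"{device_id}"' in compact
        ):
            return "deleted"
    # device_del was sent (and possibly acknowledged) but no event followed.
    return "pending" if requested else "not requested"
```

For the two log lines quoted in this comment the result would be "pending", which corresponds to the state where qemu accepted device_del but the guest never completed the unplug.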

Comment 6 YunmingYang 2023-08-23 13:54:41 UTC
Hi, I also tried with x86_64, and there the DEVICE_DELETED event was delivered normally, so I think moving the bug to 'qemu' to get more information would be better. Many thanks.

Comment 7 Xujun Ma 2023-09-08 15:20:19 UTC
Reproduced this issue: hot-unplugging the NIC device fails on a Fedora 38 guest.
Log:
(qemu) device_del nic0
Error: PCI device unplug already in progress for device nic0
Guest log:
[ 2181.084755] RTAS: event: 1, Type: Hotplug Event (229), Severity: 1
[root@dhcp16-215-248 ~]# lspci
00:00.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:01.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI
00:02.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:03.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)

Guest kernel: 6.2.9-300.fc38.ppc64le
Host qemu-kvm: qemu-kvm-6.2.0-39.module+el8.9.0+19787+17a83bb7.ppc64le

Comment 8 David Gibson 2023-09-12 00:48:29 UTC
Oops, updating here on Bugzilla, since this one isn't migrated to Jira yet.

Xujun,

1. Is the Fedora guest running an rtas_errd?  IIRC we need that for hotplug/unplug to be processed properly.

2. Whether or not that's the problem, the problem appears to be that the guest is not acknowledging the unplug request (comment 5).  That makes this a guest kernel bug - and therefore a Fedora bug - rather than a host or RHEL bug.  As you've noticed I barely have capacity to look at any RHEL ppc bugs, Fedora bugs are definitely not going to be investigated.

Comment 9 Xujun Ma 2023-09-12 02:00:06 UTC
(In reply to David Gibson from comment #8)
> Oops, updating here on Bugzilla, since this one isn't migrated to Jira yet.
> 
> Xujun,
> 
> 1. Is the Fedora guest running an rtas_errd?  IIRC we need that for
> hotplug/unplug to be processed properly.
rtas_errd was not running in the guest because it was not installed.
After I installed ppc64-diag-rtas, hot-unplugging the device works without problems.
> 
> 2. Whether or not that's the problem, the problem appears to be that the
> guest is not acknowledging the unplug request (comment 5).  That makes this
> a guest kernel bug - and therefore a Fedora bug - rather than a host or RHEL
> bug.  As you've noticed I barely have capacity to look at any RHEL ppc bugs,
> Fedora bugs are definitely not going to be investigated.
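
For automated tests, the rtas_errd precondition from question 1 could be checked inside the guest before attempting a hot-unplug. A minimal sketch, assuming a Linux guest with procfs mounted and that the daemon's process name is exactly "rtas_errd"; the function name process_running and the proc_root parameter are hypothetical and only there to make the check testable.

```python
import os


def process_running(name, proc_root="/proc"):
    """Return True if any process under proc_root reports `name` in its comm file."""
    for entry in os.listdir(proc_root):
        # Only numeric directory names are process IDs.
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                if fh.read().strip() == name:
                    return True
        except OSError:
            # The process exited between listdir() and open(), or is unreadable.
            continue
    return False
```

A test could call process_running("rtas_errd") in the guest and skip or flag the hot-unplug case when it returns False.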

Comment 10 Than Ngo 2023-09-12 08:45:46 UTC
Looking at https://pagure.io/fedora-comps/blob/main/f/comps-f38.xml.in, the package ppc64-diag-rtas is missing there, so it is not installed by default.

It's not a bug in ppc64-diag, but in fedora-comps. I have also reported this issue to fedora-comps.

Comment 11 YunmingYang 2023-09-12 09:28:48 UTC
Many thanks. Could you post the link to the corresponding fedora-comps issue so that I can track it too? I will just record it as a known issue in the automated/manual tests.

Comment 12 Than Ngo 2023-09-12 09:38:10 UTC
Link to the fedora-comps issue: https://pagure.io/fedora-comps/issue/889