2029380 – Incompatibilities between 8.5 virsh and libvirtd from virt:av

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 2029380 - Incompatibilities between 8.5 virsh and libvirtd from virt:av

Summary: Incompatibilities between 8.5 virsh and libvirtd from virt:av

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	libvirt
Sub Component:
Version:	8.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Michal Privoznik
QA Contact:	yalzhang@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	2038812 2053519 2053520
TreeView+	depends on / blocked

Reported:	2021-12-06 11:04 UTC by Christophe Fergeau
Modified:	2022-10-13 07:08 UTC (History)
CC List:	16 users (show)
Fixed In Version:	libvirt-8.0.0-0rc1.1.module+el8.6.0+13853+e8cd34b9
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	2038812 2053519 2053520 (view as bug list)
Environment:
Last Closed:	2022-05-10 13:24:19 UTC
Type:	Bug
Target Upstream Version:	7.2.0
Embargoed:
Dependent Products:
Flags:	toneata: needinfo- toneata: needinfo-

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	RHELPLAN-104815	0	None	None	None	2021-12-06 11:08:54 UTC

Description Christophe Fergeau 2021-12-06 11:04:42 UTC

I have a system running libvirt-daemon-7.6.0-6.module+el8.5.0+13051+7ddbe958.x86_64 (this is from virt:av)

I connect to it from a different machine with libvirt-client-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64

Trying to run 'virsh net-update' from the second system to the first one will fail because of https://gitlab.com/libvirt/libvirt/-/commit/b0f78d626a18bcecae3a4d165540ab88bfbfc9ee 'lib: Fix calling of virNetworkUpdate() driver callback'. This is entirely expected as indicated by the commit log: "Unfortunately, older client talking to newer daemon can't be fixed. Let's hope that it's less frequent scenario."

Steps to reproduce:

- set up a machine with libvirtd from virt:av (7.6.0 or newer)
- set up qemu:///system remote access on this machine (can be qemu+ssh, qemu+tcp, ...)
- run:
$ podman run -it --rm  registry.access.redhat.com/ubi8/ubi bash -c  "yum -y install libvirt-client && virsh -c qemu+tcp://$VIRT_AV_MACHINE/system net-update --network default --command modify  --section ip-dhcp-host '<host mac=\"7e:75:1a:57:b9:5b\" name=\"libvirt-test\" ip=\"192.168.122.10\"></host>'"

Result:
error: Failed to update network default
error: Operation not supported: can't update 'bridge' section of network 'default'

Expected result:
Either success, or an error message mentioning "ip-dhcp-host" instead of "bridge"

Running the same virsh command with virsh 7.6.0 works fine. Backporting the client part of b0f78d626a1 would probably fix this issue as this would enable the old virsh to communicate with both old and new daemons.

This bug is coming from https://github.com/openshift/cluster-api-provider-libvirt/issues/231, installation of a (development) OpenShift cluster on a host with libvirtd 7.6.0. Some OpenShift container images running in the cluster embed libvirt client libraries from RHEL, and try to use virNetworkUpdate, which fails after the host was upgraded to libvirtd 7.6.0.

Comment 1 Christophe Fergeau 2021-12-07 09:02:53 UTC

It's probably possible to workaround this issue in cluster-api-provider-libvirt by implementing something similar to https://github.com/dmacvicar/terraform-provider-libvirt/commit/0d74474808fea1b94e3e8cdd06aac35b10a0b596# (this codebase is using digitalocean/go-libvirt though, while the api provider uses libvirt/libvirt-go), but it's better if libvirt.so handles it instead of adding workarounds in all the users that need it.

Comment 4 yalzhang@redhat.com 2021-12-20 13:56:26 UTC

I can reproduce the bug by pure libvirt with below steps(Host1 and Host2 can be VMs):
1) Host1 install libvirt-libs-7.6.0-6.module+el8.5.0+13051+7ddbe958.x86_64
2) Host2 install libvirt-libs-7.0.0-13.module+el8.4.0+10604+5608c2b4.x86_64
3) On Host2, run(Host1 ip is 192.168.122.183):
# virsh -c qemu+ssh://192.168.122.183/system net-update default add ip-dhcp-host '<host mac="7e:75:1a:57:b9:55" name="libvirt" ip="192.168.124.100"></host>'
error: Failed to update network default
error: Operation not supported: can't update 'ip' section of network 'default'

# virsh -c qemu+ssh://192.168.122.183/system net-update default modify ip-dhcp-host '<host mac="7e:75:1a:57:b9:5b" name="libvirt-test" ip="192.168.124.90"></host>'
error: Failed to update network default
error: Operation not supported: can't update 'bridge' section of network 'default'

Comment 5 John Ferlan 2021-12-20 14:33:48 UTC

(In reply to Christophe Fergeau from comment #3)
> > Is this meant as a request for rhel-8.5 Z-stream?
> 
> Yes it is, thanks for the reminder about the process :) I set `zstream?`,
> hopefully that's the right/only flag that needs to be set?
> 

Close... Since this will be included in the libvirt rebase for RHEL 8.6, I have set ITR=8.6.0 (Internal Target Release) and moved to POST. I have also set ZTR=8.5.0 (ZStream Target Release) in order request an 8.5.z-stream fix.

I'll let Jiri and Jarda manage the rest of the libvirt assignment process, flag updates, etc. 

> > rhel-8.6 will be rebased
> 
> The fact that 8.6 will get a newer libvirt makes it even more important that
> this is fixed in zstream, it's probably not that unusual to not have all
> machines upgraded to the latest RHEL. For example, RHEL CSB usually takes a
> few months before switching to new RHEL releases. `virsh net-update` from
> CSB to up to date server would fail if this is not backported to zstream.

Comment 13 yalzhang@redhat.com 2022-01-14 16:02:19 UTC

Test libvirt-8.0.0-0rc1.1.module+el8.6.0+13853+e8cd34b9.x86_64, with the steps in bug 1870552#c8, the result is as expected.

1. Switch to split daemon mode:
# cat split_daemon.sh 
#!/bin/bash
systemctl stop libvirtd.service
systemctl stop libvirtd{,-ro,-admin,-tcp,-tls}.socket
systemctl start virtqemud
for drv in qemu interface network nodedev nwfilter secret storage proxy; 
do
systemctl start virt${drv}d{,-ro,-admin}.socket ; 
done
 
# sh split_daemon.sh
# virsh uri
qemu:///system
 
# virsh net-update default add ip-dhcp-host  '<host mac="7e:75:1a:57:b9:55" name="libvirt" ip="192.168.122.100"></host>'
Updated network default live state
 
# virsh net-dumpxml default 
<network>
  <name>default</name>
  <uuid>b50a1dfc-b343-4950-a78d-fd6cdbdcabc8</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='virbr0' stp='on' delay='0'/>
  <mac address='52:54:00:07:a6:10'/>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.200'/>
      <host mac='7e:75:1a:57:b9:55' name='libvirt' ip='192.168.122.100'/>
    </dhcp>
  </ip>
</network>
 
2. Newer libvirt client connect to older libvirt server:
Prepare another host B with libvirt-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64(without the fix) as server:
[host A] # virsh -c qemu+ssh://192.168.122.200/system net-update default add ip-dhcp-host  '<host mac="7e:75:1a:57:b9:44" name="libvirt" ip="192.168.124.100"></host>'
The authenticity of host '192.168.122.200 (192.168.122.200)' can't be established.
ECDSA key fingerprint is SHA256:2C9hWtOW4CoEBGskObgRbSTzZ9ykCepW4nHnmmym69Q.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
root.122.200's password: 
Updated network default live state
 
[host A]# virsh -c qemu+ssh://192.168.122.200/system net-dumpxml default
root.122.200's password: 
<network>
  <name>default</name>
  <uuid>08816129-7b4d-46df-a1a3-a6e5a0b532e9</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='virbr0' stp='on' delay='0'/>
  <mac address='52:54:00:d1:5e:13'/>
  <ip address='192.168.124.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.124.2' end='192.168.124.254'/>
      <host mac='7e:75:1a:57:b9:44' name='libvirt' ip='192.168.124.100'/>
    </dhcp>
  </ip>
</network>
 
3. Older libvirt client connect to libvirt-8.0.0 server
[host B] # virsh -c qemu+ssh://192.168.122.8/system net-update default add dns-txt '<txt name="example" value="example value"/>'
root.122.8's password: 
error: Failed to update network default
error: Operation not supported: can't update 'ip' section of network 'default'

Comment 14 Vincent S. Cojot 2022-02-05 01:20:26 UTC

Hi, there have been reported today (Feb 4th) that the same issue is now happening on plain RHEL8.5 with virt:rhel:

https://github.com/openshift/installer/issues/5401

@ElCoyote27 I encountered it in stock RHEL 8.5. I had to downgrade libvirt-6.0.0-37.1.module+el8.5.0+13858+39fdc467.aarch64 to 6.0.0-37.module+el8.5.0+12162+40884dd2.aarch64. So watch out for the latest 8.5 updates. CC @cfergeau

Did we just break virt:rhel too?
@xuzhang

Comment 15 Vincent S. Cojot 2022-02-05 01:20:50 UTC

Hi, there have been reported today (Feb 4th) that the same issue is now happening on plain RHEL8.5 with virt:rhel:

https://github.com/openshift/installer/issues/5401

@ElCoyote27 I encountered it in stock RHEL 8.5. I had to downgrade libvirt-6.0.0-37.1.module+el8.5.0+13858+39fdc467.aarch64 to 6.0.0-37.module+el8.5.0+12162+40884dd2.aarch64. So watch out for the latest 8.5 updates. CC @cfergeau

Did we just break virt:rhel too?
@xuzhang

Comment 16 Vincent S. Cojot 2022-02-06 01:16:23 UTC

I can confirm that on plain virt:rhel, updating from 6.0.0-37 to 6.0.0-37.1.el8 breaks ocp_libvirt_ipi in a similar fashion to the breakage we reported on virt:av two months ago.

The changelog for the faulty libvirt shows:

* Thu Jan 13 2022 Jiri Denemark <jdenemar> - 6.0.0-37.1.el8
- network: Implement virConnectSupportsFeature() (rhbz#2038812)
- lib: Fix calling of virNetworkUpdate() driver callback (rhbz#2038812)

Unfortunately, this breaks ocp_libvirt_ipi just like it broke for virt:av two months ago.

* Last working version:
6.0.0-37.module+el8.5.0+12162+40884dd2

* first broken version:
6.0.0-37.1.module+el8.5.0+13858+39fdc467

the above is for virt:rhel

for virt:av, the breakage happened there:
* Last working version:
7.0.0-14.module+el8.4.0+10886+79296686

* first broken version:
7.6.0-6.module+el8.5.0+13051+7ddbe958
@jdenemar @mprivozn

Comment 18 Vincent S. Cojot 2022-02-06 14:42:52 UTC

OK, I spoke too fast. I think we're looking at a 2nd BZ hidden behind the first issue. I will confirm with @Christophe Fergeau later

Comment 19 Vincent S. Cojot 2022-02-06 18:13:52 UTC

Here's what I am seeing:

Feb 06 12:58:43 daltigoth libvirtd[932688]: Operation not supported: can't update 'bridge' section of network 'ocp4d-5btvs'
Feb 06 12:58:44 daltigoth libvirtd[932688]: Operation not supported: can't update 'bridge' section of network 'ocp4d-5btvs'
Feb 06 12:58:46 daltigoth libvirtd[932688]: Operation not supported: can't update 'bridge' section of network 'ocp4d-5btvs'


And the OCP workers remaing 'provisionning' for ever:

[root@daltigoth ~]# oc get machines -A
NAMESPACE               NAME                         PHASE          TYPE   REGION   ZONE   AGE
openshift-machine-api   ocp4d-5btvs-master-0         Running                               105m
openshift-machine-api   ocp4d-5btvs-master-1         Running                               105m
openshift-machine-api   ocp4d-5btvs-master-2         Running                               105m
openshift-machine-api   ocp4d-5btvs-worker-0-rzq8t   Provisioning                          101m
openshift-machine-api   ocp4d-5btvs-worker-0-xw6ql   Provisioning                          101m
openshift-machine-api   ocp4d-5btvs-worker-0-z6dl7   Provisioning                          101m

[root@daltigoth ~]# rpm -q libvirt
libvirt-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64


[root@daltigoth ~]# dnf module list virt
Updating Subscription Management repositories.

[....]
Advanced Virtualization for RHEL 8 x86_64 (RPMs)
Name                              Stream                                   Profiles                               Summary                                          
virt                              av                                       common                                 Virtualization module                            
virt                              8.0                                      common                                 Virtualization module                            
virt                              8.0.0                                    common                                 Virtualization module                            
virt                              8.1                                      common                                 Virtualization module                            
virt                              8.2                                      common                                 Virtualization module                            
virt                              8.3                                      common                                 Virtualization module                            

Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs)
Name                              Stream                                   Profiles                               Summary                                          
virt                              rhel [d][e]                              common [d]                             Virtualization module                            

Hint: [d]efault, [e]nabled, [x]disabled, [i]nstalled

Comment 20 Vincent S. Cojot 2022-02-06 18:15:14 UTC

@jdenemar @mprivozn Sorry for the duplicate needinfo, please accept my apologies.

Comment 21 Vincent S. Cojot 2022-02-06 19:45:59 UTC

I downgraded libvirt on the previously non working server:
# yum downgrade libvirt-daemon-driver-storage libvirt-client libvirt-daemon-kvm libvirt-daemon-driver-secret libvirt-daemon-driver-storage-logical libvirt-daemon-driver-storage-rbd libvirt-libs libvirt-daemon-driver-qemu libvirt-daemon-driver-storage-gluster libvirt-daemon-driver-nwfilter libvirt-daemon-driver-storage-scsi libvirt-daemon-driver-storage-iscsi libvirt-bash-completion libvirt-daemon-driver-storage-core libvirt-daemon-config-network libvirt-daemon-driver-interface libvirt-daemon-driver-storage-disk libvirt-daemon-driver-storage-mpath libvirt-devel libvirt-daemon libvirt-daemon-config-nwfilter libvirt-daemon-driver-nodedev libvirt-daemon-driver-storage-iscsi-direct libvirt-daemon-driver-network libvirt
[...]
Installed products updated.

Downgraded:
  libvirt-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-bash-completion-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-client-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-config-network-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-config-nwfilter-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-driver-interface-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-driver-network-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-driver-nodedev-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-driver-nwfilter-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-driver-qemu-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-driver-secret-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-driver-storage-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-driver-storage-core-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-driver-storage-disk-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-driver-storage-gluster-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-driver-storage-iscsi-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-driver-storage-iscsi-direct-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-driver-storage-logical-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-driver-storage-mpath-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-driver-storage-rbd-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-driver-storage-scsi-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-daemon-kvm-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-devel-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
  libvirt-libs-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64

[root@daltigoth ~]# systemctl restart libvirtd

After that ocp_libvirt_ipi works again and I get this:
[root@daltigoth ~]# oc get machines -A
NAMESPACE               NAME                         PHASE     TYPE   REGION   ZONE   AGE
openshift-machine-api   ocp4d-2ks4x-master-0         Running                          29m
openshift-machine-api   ocp4d-2ks4x-master-1         Running                          29m
openshift-machine-api   ocp4d-2ks4x-master-2         Running                          29m
openshift-machine-api   ocp4d-2ks4x-worker-0-6xvqw   Running                          26m
openshift-machine-api   ocp4d-2ks4x-worker-0-mm6jp   Running                          26m
openshift-machine-api   ocp4d-2ks4x-worker-0-rwmhg   Running                          26m

Comment 22 Michal Privoznik 2022-02-07 10:24:17 UTC

Vincent, I'm not exactly sure what's going on. What is the API/virsh command that's failing? I mean, I can see:

  Operation not supported: can't update 'bridge' section of network 'ocp4d-5btvs'

but I'd love to see what arguments was the virNetworkUpdate() API called with. Do you think you can capture libvirt logs for me? And I'll need both client and daemon logs.

https://libvirt.org/kbase/debuglogs.html

Comment 23 Vincent S. Cojot 2022-02-07 13:25:22 UTC

Hi Michael,
I'm re-upgrading my server to 8.5-latest and will provide debug logs shortly.
As it is OCP talking to libvirt IPI locally, I will try to gather logs from the OC containers as well.

I looked at the libvirt kbase but what filters/outputs would you like to have me configure for runtime?

I was thinking of this:
# virt-admin daemon-log-outputs "1:file:/var/log/libvirt/libvirtd.log"
# virt-admin daemon-log-filters "3:remote 4:event 3:util.json 3:util.object 3:util.dbus 3:util.netlink 3:node_device 3:rpc 3:access 1:*"
Would that work for you?

Comment 24 Vincent S. Cojot 2022-02-07 13:35:39 UTC

To summarize the issue:
When ocp_libvirt_ipi starts, here's what happens:
- the openshift installer uses terraform to launch 1bootstrap + 3masters. This part works fine.
- Once the masters are up, they will launch a number of workers (3 for me).

This is the second part (launching the workers from the OCP machine config API) which stopped working two months ago in virt:av and two days ago in virt:rhel

Comment 25 Michal Privoznik 2022-02-07 13:48:13 UTC

(In reply to Vincent S. Cojot from comment #23)
> Hi Michael,
> I'm re-upgrading my server to 8.5-latest and will provide debug logs shortly.
> As it is OCP talking to libvirt IPI locally, I will try to gather logs from
> the OC containers as well.
> 
> I looked at the libvirt kbase but what filters/outputs would you like to
> have me configure for runtime?
> 
> I was thinking of this:
> # virt-admin daemon-log-outputs "1:file:/var/log/libvirt/libvirtd.log"
> # virt-admin daemon-log-filters "3:remote 4:event 3:util.json 3:util.object
> 3:util.dbus 3:util.netlink 3:node_device 3:rpc 3:access 1:*"
> Would that work for you?

this might work. but don't forget to set up & collect the client logs too. This is done by setting LIBVIRT_LOG_OUTPUTS env var, this should be sufficient: export LIBVIRT_LOG_OUTPUTS="1:file:/tmp/libvirt_client.log"

(In reply to Vincent S. Cojot from comment #24)
> To summarize the issue:
> When ocp_libvirt_ipi starts, here's what happens:
> - the openshift installer uses terraform to launch 1bootstrap + 3masters.
> This part works fine.
> - Once the masters are up, they will launch a number of workers (3 for me).
> 
> This is the second part (launching the workers from the OCP machine config
> API) which stopped working two months ago in virt:av and two days ago in
> virt:rhel

As I commented on the github issue ( https://github.com/openshift/installer/issues/5401#issuecomment-1031380358 ) could it be that the actual problem here is terraform which intentionally switches the arguments? I mean, comment 1 points to the commit that does exactly that. What can we do to make that bug^Wworkaround go away?

Comment 26 Vincent S. Cojot 2022-02-07 14:00:25 UTC

Here's what I am seeing in the log (I will attach the full log shortly) when I grep for errors:

2022-02-07 13:42:37.021+0000: 1193200: error : virNetSocketReadWire:1832 : End of file while reading data: Input/output error
2022-02-07 13:42:38.040+0000: 1193205: error : virProcessRunInFork:1161 : internal error: child reported (status=125):
2022-02-07 13:42:38.045+0000: 1193205: error : virProcessRunInFork:1161 : internal error: child reported (status=125): internal error: child reported (status=125):
2022-02-07 13:42:38.050+0000: 1193205: error : virProcessRunInFork:1161 : internal error: child reported (status=125): internal error: child reported (status=125): internal error: child reported (status=125):
2022-02-07 13:42:38.055+0000: 1193205: error : virProcessRunInFork:1161 : internal error: child reported (status=125): internal error: child reported (status=125): internal error: child reported (status=125): internal error: child reported (status=125):
2022-02-07 13:45:44.523+0000: 1193201: error : virNetworkDefUpdateNoSupport:2772 : Operation not supported: can't update 'bridge' section of network 'ocp4d-wrmf5'
2022-02-07 13:45:45.827+0000: 1193205: error : virNetworkDefUpdateNoSupport:2772 : Operation not supported: can't update 'bridge' section of network 'ocp4d-wrmf5'
2022-02-07 13:45:48.211+0000: 1193205: error : virNetworkDefUpdateNoSupport:2772 : Operation not supported: can't update 'bridge' section of network 'ocp4d-wrmf5'
2022-02-07 13:45:49.488+0000: 1193201: error : virNetworkDefUpdateNoSupport:2772 : Operation not supported: can't update 'bridge' section of network 'ocp4d-wrmf5'
2022-02-07 13:45:51.884+0000: 1193205: error : virNetworkDefUpdateNoSupport:2772 : Operation not supported: can't update 'bridge' section of network 'ocp4d-wrmf5'

this is on the latest libvirt for 8.5:
[root@daltigoth ~]# rpm -qa libvirt\*
libvirt-daemon-kvm-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-libs-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-daemon-driver-nwfilter-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-daemon-driver-storage-gluster-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-daemon-driver-storage-scsi-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-bash-completion-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-daemon-driver-qemu-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-daemon-driver-storage-disk-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-daemon-driver-storage-mpath-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-daemon-driver-storage-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-client-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-admin-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-daemon-driver-storage-core-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-daemon-driver-nodedev-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-daemon-driver-secret-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-daemon-driver-storage-iscsi-direct-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-glib-3.0.0-1.el8.x86_64
libvirt-daemon-driver-storage-logical-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-daemon-driver-storage-rbd-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-daemon-driver-interface-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-daemon-config-network-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-daemon-driver-storage-iscsi-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-devel-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-daemon-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-daemon-driver-network-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
libvirt-daemon-config-nwfilter-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64

Comment 27 Michal Privoznik 2022-02-07 14:02:11 UTC

(In reply to Vincent S. Cojot from comment #26)
> Here's what I am seeing in the log (I will attach the full log shortly) when
> I grep for errors:
> 

We've known this much. I'd rather look at the debug logs. Thanks.

Comment 29 Vincent S. Cojot 2022-02-07 14:09:41 UTC

@mprivozn I just attached the logs, please let me know if I did it right (not too familiar with taking libvirt logs)

Comment 30 Vincent S. Cojot 2022-02-07 14:16:46 UTC

Here's what I am seeing from the OCP logs:
[root@daltigoth ~]# oc project openshift-machine-api
Already on project "openshift-machine-api" on server "https://api.ocp4d.openshift.lasthome.solace.krynn:6443".
[root@daltigoth ~]# oc get pods
NAME                                           READY   STATUS    RESTARTS      AGE
cluster-autoscaler-operator-745d696cd7-jnhgl   2/2     Running   0             37m
cluster-baremetal-operator-649844f896-lw7mw    2/2     Running   1 (34m ago)   37m
machine-api-controllers-5546f846bd-qftdv       7/7     Running   3 (30m ago)   34m
machine-api-operator-78b4684b94-72dpw          2/2     Running   0             37m
[root@daltigoth ~]# oc logs machine-api-controllers-5546f846bd-qftdv machine-controller|tail -20
E0207 14:02:42.512612       1 actuator.go:107] Machine error: error creating domain virError(Code=84, Domain=19, Message='Operation not supported: can't update 'bridge' sec
tion of network 'ocp4d-wrmf5'')
E0207 14:02:42.512642       1 actuator.go:51] ocp4d-wrmf5-worker-0-jxrnv: error creating libvirt machine: error creating domain virError(Code=84, Domain=19, Message='Operat
ion not supported: can't update 'bridge' section of network 'ocp4d-wrmf5'')
I0207 14:02:42.512652       1 client.go:158] Freeing the client pool
I0207 14:02:42.512664       1 client.go:164] Closing libvirt connection: 0xc000b595b0
W0207 14:02:42.513068       1 controller.go:316] ocp4d-wrmf5-worker-0-jxrnv: failed to create machine: ocp4d-wrmf5-worker-0-jxrnv: error creating libvirt machine: error cre
ating domain virError(Code=84, Domain=19, Message='Operation not supported: can't update 'bridge' section of network 'ocp4d-wrmf5'')
I0207 14:02:43.514029       1 controller.go:170] ocp4d-wrmf5-worker-0-rb6pv: reconciling Machine
I0207 14:02:43.514073       1 actuator.go:220] Checking if machine ocp4d-wrmf5-worker-0-rb6pv exists.
I0207 14:02:43.517354       1 client.go:142] Created libvirt connection: 0xc000a82a80
I0207 14:02:43.517748       1 client.go:317] Check if "ocp4d-wrmf5-worker-0-rb6pv" domain exists
I0207 14:02:43.518134       1 client.go:158] Freeing the client pool
I0207 14:02:43.518160       1 client.go:164] Closing libvirt connection: 0xc000a82a80
I0207 14:02:43.518546       1 controller.go:314] ocp4d-wrmf5-worker-0-rb6pv: reconciling machine triggers idempotent create
I0207 14:02:43.518567       1 actuator.go:113] Creating machine "ocp4d-wrmf5-worker-0-rb6pv"
I0207 14:02:43.520660       1 client.go:142] Created libvirt connection: 0xc000a82d90
I0207 14:02:43.521003       1 client.go:384] Create a libvirt volume with name ocp4d-wrmf5-worker-0-rb6pv for pool ocp4d-wrmf5 from the base volume ocp4d-wrmf5-base
E0207 14:02:43.521361       1 actuator.go:107] Machine error: error creating volume storage volume 'ocp4d-wrmf5-worker-0-rb6pv' already exists
E0207 14:02:43.521378       1 actuator.go:51] ocp4d-wrmf5-worker-0-rb6pv: error creating libvirt machine: error creating volume storage volume 'ocp4d-wrmf5-worker-0-rb6pv' 
already exists
I0207 14:02:43.521383       1 client.go:158] Freeing the client pool
I0207 14:02:43.521394       1 client.go:164] Closing libvirt connection: 0xc000a82d90
W0207 14:02:43.521732       1 controller.go:316] ocp4d-wrmf5-worker-0-rb6pv: failed to create machine: ocp4d-wrmf5-worker-0-rb6pv: error creating libvirt machine: error cre
ating volume storage volume 'ocp4d-wrmf5-worker-0-rb6pv' already exists

Is it possible we're running into another issue?
root@daltigoth ~]# virsh vol-list ocp4d-wrmf5|grep worker
 ocp4d-wrmf5-worker-0-rb6pv   /var/lib/libvirt/openshift-images/ocp4d-wrmf5/ocp4d-wrmf5-worker-0-rb6pv
[root@daltigoth ~]# ls -l /var/lib/libvirt/openshift-images/ocp4d-wrmf5/ocp4d-wrmf5-worker-0-rb6pv
-rw-r--r--. 1 root root 200704 Feb  7 08:39 /var/lib/libvirt/openshift-images/ocp4d-wrmf5/ocp4d-wrmf5-worker-0-rb6pv

Comment 31 Michal Privoznik 2022-02-07 14:52:39 UTC

You probably forgot to attach client log, because what you did attach is just the daemon log. Nevertheless, from the log I can see the following:

2022-02-07 13:32:28.750+0000: 1193205: debug : virThreadJobSet:94 : Thread 1193205 (virNetServerHandleJob) is now running job remoteDispatchNetworkUpdate
2022-02-07 13:32:28.750+0000: 1193205: debug : virNetworkUpdate:534 : network=0x7f41b400cb00, section=4, parentIndex=0, xml=  <host mac="52:54:00:aa:0d:a3" name="ocp4d-wrmf5-master-2.ocp4d.openshift.lasthome.solace.krynn" ip="192.168.246.13"></host>, flags=0x3
2022-02-07 13:32:28.750+0000: 1193205: debug : virNetworkUpdate:554 : Argument order feature detection returned: 1

IOW, the virNetworkUpdate() was called with the following arguments:

  section = VIR_NETWORK_SECTION_IP_DHCP_HOST,
  parentIndex = 0,
  xml = "<host .../>"
  flags = VIR_NETWORK_UPDATE_AFFECT_LIVE | VIR_NETWORK_UPDATE_AFFECT_CONFIG

and this invocation is correct, because passed xml indeed corresponds to the IP_DHCP_HOST section. But then I also see:

2022-02-07 13:51:46.869+0000: 1193203: debug : virThreadJobSet:94 : Thread 1193203 (virNetServerHandleJob) is now running job remoteDispatchNetworkUpdate
2022-02-07 13:51:46.869+0000: 1193203: debug : virNetworkUpdate:534 : network=0x7f41bc00a220, section=1, parentIndex=-1, xml=  <host mac="1e:82:5f:77:0e:25" name="ocp4d-wrmf5-worker-0-jxrnv" ip="192.168.246.86"></host>, flags=0x0
2022-02-07 13:51:46.869+0000: 1193203: debug : virNetworkUpdate:554 : Argument order feature detection returned: 1
2022-02-07 13:51:46.869+0000: 1193203: error : virNetworkDefUpdateNoSupport:2772 : Operation not supported: can't update 'bridge' section of network 'ocp4d-wrmf5'

which is obviously wrong. The passed XML does not correspond to section 1 (VIR_NETWORK_SECTION_BRIDGE), hence the error. Therefore, my conclusion is that something is passing wrong arguments. I'll know more when I see the client log.

Comment 35 Vincent S. Cojot 2022-02-07 15:42:03 UTC

Inside of the pod, here's what I see:

[root@daltigoth ~]# oc rsh -c machine-controller machine-api-controllers-5546f846bd-qftdv
sh-4.4$ rpm -qa|grep virt
virt-what-1.18-9.el8_4.x86_64
libvirt-libs-6.0.0-35.1.module+el8.4.0+11273+64eb94ef.x86_64

Comment 38 Vincent S. Cojot 2022-02-07 19:26:17 UTC

there are probably hundreds of images floating around with that ancient and buggy libvirt.. is there not a way to make the patched libvirtd accept to talk to it in some backward compatibility mode?

Comment 41 Christophe Fergeau 2022-02-08 12:35:53 UTC

Fwiw, this can be reproduced with similar steps to the ones in the initial comment:
```
- set up a rhel 8.5 machine with libvirt-daemon-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
- set up qemu:///system remote access on this machine (can be qemu+ssh, qemu+tcp, ...)
- run:
$ podman run -it --rm  registry.access.redhat.com/ubi8/ubi bash -c  "yum -y install libvirt-client-0:6.0.0-35.module+el8.4.0+10230+7a9b21e4.x86_64 && virsh -c qemu+tcp://$VIRT_AV_MACHINE/system net-update --network default --command modify  --section ip-dhcp-host '<host mac=\"7e:75:1a:57:b9:5b\" name=\"libvirt-test\" ip=\"192.168.122.10\"></host>'"
```

Comment 43 Christophe Fergeau 2022-02-08 16:23:59 UTC

> On client side, conn->driver points to so called remote driver which does nothing more than serialize all the arguments and send them to the daemon. There, the packet is deserialized and the API is called again, but this time conn->driver points to "real" driver (e.g. qemu driver). Hence, the public API is effectively called twice and it's not possible to distinguish within the function whether we're running on client or daemon side. Also, with split daemons any daemon can be in role of a client too.

Probably missing something here, but why isn't it possible to keep the changes in the remote driver:

+        int rc;
+
+        /* Since its introduction in v0.10.2-rc1~9 the @section and @command
+         * arguments were mistakenly swapped when passed to driver's callback.
+         * Detect if the other side is fixed already or not. */
+        rc = VIR_DRV_SUPPORTS_FEATURE(conn->driver, conn,
+                                      VIR_DRV_FEATURE_NETWORK_UPDATE_HAS_CORRECT_ORDER);
+
+        VIR_DEBUG("Argument order feature detection returned: %d", rc);
+        if (rc < 0)
+            goto error;
+
+        if (rc == 0) {
+            /* Feature not supported, preserve swapped order */
+            ret = conn->networkDriver->networkUpdate(network, section, command,
+                                                     parentIndex, xml, flags);
+        } else {
+            /* Feature supported, correct order can be used */
+            ret = conn->networkDriver->networkUpdate(network, command, section,
+                                                     parentIndex, xml, flags);
+        }
+
 
but then return 0 in all other drivers connectSupportsFeature implementation?

+    case VIR_DRV_FEATURE_NETWORK_UPDATE_HAS_CORRECT_ORDER:
+        return 0;
+

Maybe this is going to fail in the split daemon case?

> Well, if the fix is backported into rhel-8.2 and rhel-8.4 then we are golden, right? That's doable.

Yup, backporting fixes for this to 8.2 and 8.4 would fix OpenShift images.

Comment 44 Vincent S. Cojot 2022-02-08 22:48:57 UTC

The problem with backporting is that this would only fix the -latest- images, not the pre-existing images e.g: I need to work on 4.6.30 to reproduce a customer bug.. so if a fix like the one suggested by Christophe would work, I would favor that too.. :)

Comment 45 Michal Privoznik 2022-02-09 11:51:52 UTC

(In reply to Christophe Fergeau from comment #43)
> > On client side, conn->driver points to so called remote driver which does nothing more than serialize all the arguments and send them to the daemon. There, the packet is deserialized and the API is called again, but this time conn->driver points to "real" driver (e.g. qemu driver). Hence, the public API is effectively called twice and it's not possible to distinguish within the function whether we're running on client or daemon side. Also, with split daemons any daemon can be in role of a client too.
> 
> Probably missing something here, but why isn't it possible to keep the
> changes in the remote driver:
> 
> +        int rc;
> +
> +        /* Since its introduction in v0.10.2-rc1~9 the @section and @command
> +         * arguments were mistakenly swapped when passed to driver's
> callback.
> +         * Detect if the other side is fixed already or not. */
> +        rc = VIR_DRV_SUPPORTS_FEATURE(conn->driver, conn,
> +                                     
> VIR_DRV_FEATURE_NETWORK_UPDATE_HAS_CORRECT_ORDER);
> +
> +        VIR_DEBUG("Argument order feature detection returned: %d", rc);
> +        if (rc < 0)
> +            goto error;
> +
> +        if (rc == 0) {
> +            /* Feature not supported, preserve swapped order */
> +            ret = conn->networkDriver->networkUpdate(network, section,
> command,
> +                                                     parentIndex, xml,
> flags);
> +        } else {
> +            /* Feature supported, correct order can be used */
> +            ret = conn->networkDriver->networkUpdate(network, command,
> section,
> +                                                     parentIndex, xml,
> flags);
> +        }
> +
>  
> but then return 0 in all other drivers connectSupportsFeature implementation?
> 
> +    case VIR_DRV_FEATURE_NETWORK_UPDATE_HAS_CORRECT_ORDER:
> +        return 0;
> +
> 
> Maybe this is going to fail in the split daemon case?

Indeed. That's exactly what I'm seeing after I've implemented this. I mean, if both client and split daemons run with the change you're suggesting then everything works. But what does not work is older (unpatched) client talking to new split daemons or even monolithic daemon. Maybe there's a bug in my implementation:

https://gitlab.com/MichalPrivoznik/libvirt/-/commit/0b3b98ed45d9514c3c9f4028ccaf40c8c23ac92f

But I honestly doubt that. Put simply, the new (=patched) daemon has no knowledge whether client is patched too and whether it sent arguments in correct order. Mind you, at this point there are also versions with the current fix that's outside of remote driver and we don't want to break those either.

Comment 46 Jiri Denemark 2022-02-09 12:43:20 UTC

Well, the only way a daemon could know whether a client is new enough would be
adding a new flag (in addition to the existing VIR_DRV_FEATURE_...) that would
be passed by a client if both the client and the server was new enough. But
since it was not done at the same time, we would break clients that already
pass the arguments in correct order but were not updated with the new flag.
That said, we're in a pretty bad situation here...

Comment 47 Michal Privoznik 2022-02-09 12:52:20 UTC

(In reply to Vincent S. Cojot from comment #44)
> The problem with backporting is that this would only fix the -latest-
> images, not the pre-existing images e.g: I need to work on 4.6.30 to
> reproduce a customer bug.. so if a fix like the one suggested by Christophe
> would work, I would favor that too.. :)

Surely there has to be a way to upgrade older images too. I mean, what if there's a CVE that needs fixing?

Comment 48 Vincent S. Cojot 2022-02-09 14:30:48 UTC

(In reply to Michal Privoznik from comment #47)
> (In reply to Vincent S. Cojot from comment #44)
> > The problem with backporting is that this would only fix the -latest-
> > images, not the pre-existing images e.g: I need to work on 4.6.30 to
> > reproduce a customer bug.. so if a fix like the one suggested by Christophe
> > would work, I would favor that too.. :)
> 
> Surely there has to be a way to upgrade older images too. I mean, what if
> there's a CVE that needs fixing?

If there's a CVE that needs fixing, the message will give customers will be: update to the latest version (which has the fixes).
Here's I am talking about non-CVE cases were a customer might be staying on an older release (without the fix) because there's no CVE pushing them to update. We (field) would still need to be able to deploy that ancient version used by the customer to test out things. I agree that the changes to libvirt are the right thing to do (I am not familiar with that code base but I gather that from our discussion here) but it just breaks every container ever produced out there unless those can be rebuilt.. and then again these containers have an sha256 so the previously produced containers could not be rebuilt without updating the sha256, which we don't do..
IMVHO, this is precisely the purpose of the multiple registries out there (ours,the customers' and the end-users') : cache previously built images so that devs can skip rebuilding containers from scratch for every code change.

Comment 49 Michal Privoznik 2022-02-09 14:57:05 UTC

(In reply to Vincent S. Cojot from comment #48)

> Here's I am talking about non-CVE cases were a customer might be staying on
> an older release (without the fix) because there's no CVE pushing them to
> update.

Well, this can be viewed as a great opportunity for them to update. I'm sorry but if this was a bare metal machine and somebody was complaining that their package is broken even though there is a fixed one in an update channel then I'd tell them nothing more than to update. And if somebody fixed particular version ("vendor in" is the term I hear people use these days), then it's their own responsibility to follow up on updates.

> We (field) would still need to be able to deploy that ancient
> version used by the customer to test out things.

Well, you would be, wouldn't you? I mean, the latest image of RHEL-8.X would work. And if you need older image, with broken libvirt, then surely there has to be a workaround. I am trying to make this as painless as possible for users, but I just don't see other way than backporting the fix.

> I agree that the changes to
> libvirt are the right thing to do (I am not familiar with that code base but
> I gather that from our discussion here) but it just breaks every container
> ever produced out there unless those can be rebuilt.. and then again these
> containers have an sha256 so the previously produced containers could not be
> rebuilt without updating the sha256, which we don't do..
> IMVHO, this is precisely the purpose of the multiple registries out there
> (ours,the customers' and the end-users') : cache previously built images so
> that devs can skip rebuilding containers from scratch for every code change.

I hoped that this is automated process.

Comment 50 Michal Privoznik 2022-02-09 15:29:07 UTC

@Yalan, can you please move back to VERIFIED? This bug was mistakenly moved to NEW.

Comment 60 errata-xmlrpc 2022-05-10 13:24:19 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1759

Note You need to log in before you can comment on or make changes to this bug.