Bug 2029380
| Field | Value |
|---|---|
| Summary | Incompatibilities between 8.5 virsh and libvirtd from virt:av |
| Product | Red Hat Enterprise Linux 8 |
| Component | libvirt |
| Version | 8.5 |
| Status | CLOSED ERRATA |
| Severity | unspecified |
| Priority | unspecified |
| Reporter | Christophe Fergeau <cfergeau> |
| Assignee | Michal Privoznik <mprivozn> |
| QA Contact | yalzhang <yalzhang> |
| CC | dafu, jdenemar, jsuchane, kanderso, kboumedh, lmen, mprivozn, prkumar, psundara, smitterl, toneata, vcojot, virt-maint, xuzhang, yalzhang, ymankad |
| Target Milestone | rc |
| Target Release | --- |
| Keywords | Triaged, ZStream |
| Flags | toneata: needinfo-, pm-rhel: mirror+ |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | libvirt-8.0.0-0rc1.1.module+el8.6.0+13853+e8cd34b9 |
| Doc Type | If docs needed, set a value |
| Clones | 2038812, 2053519, 2053520 |
| Last Closed | 2022-05-10 13:24:19 UTC |
| Type | Bug |
| Target Upstream Version | 7.2.0 |
| Bug Blocks | 2038812, 2053519, 2053520 |
Description (Christophe Fergeau, 2021-12-06 11:04:42 UTC)
It's probably possible to work around this issue in cluster-api-provider-libvirt by implementing something similar to https://github.com/dmacvicar/terraform-provider-libvirt/commit/0d74474808fea1b94e3e8cdd06aac35b10a0b596 (that codebase uses digitalocean/go-libvirt, while the API provider uses libvirt/libvirt-go), but it's better for libvirt.so to handle it than to add workarounds in every consumer that needs one.

I can reproduce the bug with pure libvirt using the steps below (Host1 and Host2 can be VMs):

1) On Host1, install libvirt-libs-7.6.0-6.module+el8.5.0+13051+7ddbe958.x86_64
2) On Host2, install libvirt-libs-7.0.0-13.module+el8.4.0+10604+5608c2b4.x86_64
3) On Host2, run (Host1's IP is 192.168.122.183):

    # virsh -c qemu+ssh://192.168.122.183/system net-update default add ip-dhcp-host '<host mac="7e:75:1a:57:b9:55" name="libvirt" ip="192.168.124.100"></host>'
    error: Failed to update network default
    error: Operation not supported: can't update 'ip' section of network 'default'

    # virsh -c qemu+ssh://192.168.122.183/system net-update default modify ip-dhcp-host '<host mac="7e:75:1a:57:b9:5b" name="libvirt-test" ip="192.168.124.90"></host>'
    error: Failed to update network default
    error: Operation not supported: can't update 'bridge' section of network 'default'

(In reply to Christophe Fergeau from comment #3)
> > Is this meant as a request for rhel-8.5 Z-stream?
>
> Yes it is, thanks for the reminder about the process :) I set `zstream?`,
> hopefully that's the right/only flag that needs to be set?

Close... Since this will be included in the libvirt rebase for RHEL 8.6, I have set ITR=8.6.0 (Internal Target Release) and moved to POST. I have also set ZTR=8.5.0 (ZStream Target Release) in order to request an 8.5.z-stream fix. I'll let Jiri and Jarda manage the rest of the libvirt assignment process, flag updates, etc.
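The misleading section names in those virsh errors fall directly out of the argument swap. The sketch below is a toy model, not libvirt code: it only assumes that the pre-fix daemon received @section and @command transposed, so the numeric value of the *command* was looked up as a *section* name. The enum values are copied from libvirt's public libvirt-network.h.

```python
# Toy model of the virNetworkUpdate() argument swap.
# Enum values match virNetworkUpdateSection / virNetworkUpdateCommand
# in libvirt's public headers.

SECTIONS = {
    0: "none", 1: "bridge", 2: "domain", 3: "ip", 4: "ip-dhcp-host",
    5: "ip-dhcp-range", 6: "forward", 7: "forward-interface",
    8: "forward-pf", 9: "portgroup", 10: "dns-host", 11: "dns-txt",
    12: "dns-srv",
}
COMMANDS = {"none": 0, "modify": 1, "delete": 2, "add-last": 3, "add-first": 4}
SECTION_IP_DHCP_HOST = 4

def broken_dispatch(section, command):
    """Model the pre-fix daemon: the two ints arrive transposed, so the
    command value is treated as a section index when building the error."""
    swapped_section = command
    # In this toy model, only the ip-dhcp-host section is updatable,
    # mirroring the failures reported above.
    if swapped_section != SECTION_IP_DHCP_HOST:
        return f"can't update '{SECTIONS[swapped_section]}' section"
    return "Updated"

# 'virsh net-update default add ...' sends add-last (3); swapped,
# 3 is read as VIR_NETWORK_SECTION_IP, hence the 'ip' error:
print(broken_dispatch(SECTION_IP_DHCP_HOST, COMMANDS["add-last"]))
# 'virsh net-update default modify ...' sends modify (1); swapped,
# 1 is read as VIR_NETWORK_SECTION_BRIDGE, hence the 'bridge' error:
print(broken_dispatch(SECTION_IP_DHCP_HOST, COMMANDS["modify"]))
```

Note how the model reproduces both error messages seen above even though the client asked for the ip-dhcp-host section both times.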
> > rhel-8.6 will be rebased
>
> The fact that 8.6 will get a newer libvirt makes it even more important that
> this is fixed in zstream; it's probably not that unusual to not have all
> machines upgraded to the latest RHEL. For example, RHEL CSB usually takes a
> few months before switching to new RHEL releases. `virsh net-update` from
> CSB to an up-to-date server would fail if this is not backported to zstream.

Tested libvirt-8.0.0-0rc1.1.module+el8.6.0+13853+e8cd34b9.x86_64 with the steps in bug 1870552#c8; the result is as expected.

1. Switch to split daemon mode:

    # cat split_daemon.sh
    #!/bin/bash
    systemctl stop libvirtd.service
    systemctl stop libvirtd{,-ro,-admin,-tcp,-tls}.socket
    systemctl start virtqemud
    for drv in qemu interface network nodedev nwfilter secret storage proxy; do
        systemctl start virt${drv}d{,-ro,-admin}.socket
    done

    # sh split_daemon.sh
    # virsh uri
    qemu:///system

    # virsh net-update default add ip-dhcp-host '<host mac="7e:75:1a:57:b9:55" name="libvirt" ip="192.168.122.100"></host>'
    Updated network default live state

    # virsh net-dumpxml default
    <network>
      <name>default</name>
      <uuid>b50a1dfc-b343-4950-a78d-fd6cdbdcabc8</uuid>
      <forward mode='nat'>
        <nat>
          <port start='1024' end='65535'/>
        </nat>
      </forward>
      <bridge name='virbr0' stp='on' delay='0'/>
      <mac address='52:54:00:07:a6:10'/>
      <ip address='192.168.122.1' netmask='255.255.255.0'>
        <dhcp>
          <range start='192.168.122.2' end='192.168.122.200'/>
          <host mac='7e:75:1a:57:b9:55' name='libvirt' ip='192.168.122.100'/>
        </dhcp>
      </ip>
    </network>

2. Newer libvirt client connecting to an older libvirt server. Prepare another host B with libvirt-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64 (without the fix) as the server:

    [host A]# virsh -c qemu+ssh://192.168.122.200/system net-update default add ip-dhcp-host '<host mac="7e:75:1a:57:b9:44" name="libvirt" ip="192.168.124.100"></host>'
    The authenticity of host '192.168.122.200 (192.168.122.200)' can't be established.
    ECDSA key fingerprint is SHA256:2C9hWtOW4CoEBGskObgRbSTzZ9ykCepW4nHnmmym69Q.
    Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
    root@192.168.122.200's password:
    Updated network default live state

    [host A]# virsh -c qemu+ssh://192.168.122.200/system net-dumpxml default
    root@192.168.122.200's password:
    <network>
      <name>default</name>
      <uuid>08816129-7b4d-46df-a1a3-a6e5a0b532e9</uuid>
      <forward mode='nat'>
        <nat>
          <port start='1024' end='65535'/>
        </nat>
      </forward>
      <bridge name='virbr0' stp='on' delay='0'/>
      <mac address='52:54:00:d1:5e:13'/>
      <ip address='192.168.124.1' netmask='255.255.255.0'>
        <dhcp>
          <range start='192.168.124.2' end='192.168.124.254'/>
          <host mac='7e:75:1a:57:b9:44' name='libvirt' ip='192.168.124.100'/>
        </dhcp>
      </ip>
    </network>

3. Older libvirt client connecting to the libvirt-8.0.0 server:

    [host B]# virsh -c qemu+ssh://192.168.122.8/system net-update default add dns-txt '<txt name="example" value="example value"/>'
    root@192.168.122.8's password:
    error: Failed to update network default
    error: Operation not supported: can't update 'ip' section of network 'default'

Hi, it was reported today (Feb 4th) that the same issue is now happening on plain RHEL 8.5 with virt:rhel: https://github.com/openshift/installer/issues/5401

@ElCoyote27 I encountered it in stock RHEL 8.5. I had to downgrade libvirt-6.0.0-37.1.module+el8.5.0+13858+39fdc467.aarch64 to 6.0.0-37.module+el8.5.0+12162+40884dd2.aarch64. So watch out for the latest 8.5 updates.

CC @cfergeau Did we just break virt:rhel too?
@xuzhang I can confirm that on plain virt:rhel, updating from 6.0.0-37 to 6.0.0-37.1.el8 breaks ocp_libvirt_ipi in a similar fashion to the breakage we reported on virt:av two months ago. The changelog for the faulty libvirt shows:

    * Thu Jan 13 2022 Jiri Denemark <jdenemar> - 6.0.0-37.1.el8
    - network: Implement virConnectSupportsFeature() (rhbz#2038812)
    - lib: Fix calling of virNetworkUpdate() driver callback (rhbz#2038812)

Unfortunately, this breaks ocp_libvirt_ipi just like it broke for virt:av two months ago.

* Last working version: 6.0.0-37.module+el8.5.0+12162+40884dd2
* First broken version: 6.0.0-37.1.module+el8.5.0+13858+39fdc467

The above is for virt:rhel. For virt:av, the breakage happened here:

* Last working version: 7.0.0-14.module+el8.4.0+10886+79296686
* First broken version: 7.6.0-6.module+el8.5.0+13051+7ddbe958

@jdenemar @mprivozn OK, I spoke too fast. I think we're looking at a 2nd BZ hidden behind the first issue. I will confirm with @Christophe Fergeau later. Here's what I am seeing:

    Feb 06 12:58:43 daltigoth libvirtd[932688]: Operation not supported: can't update 'bridge' section of network 'ocp4d-5btvs'
    Feb 06 12:58:44 daltigoth libvirtd[932688]: Operation not supported: can't update 'bridge' section of network 'ocp4d-5btvs'
    Feb 06 12:58:46 daltigoth libvirtd[932688]: Operation not supported: can't update 'bridge' section of network 'ocp4d-5btvs'

And the OCP workers remain in 'Provisioning' forever:

    [root@daltigoth ~]# oc get machines -A
    NAMESPACE               NAME                         PHASE          TYPE   REGION   ZONE   AGE
    openshift-machine-api   ocp4d-5btvs-master-0         Running                               105m
    openshift-machine-api   ocp4d-5btvs-master-1         Running                               105m
    openshift-machine-api   ocp4d-5btvs-master-2         Running                               105m
    openshift-machine-api   ocp4d-5btvs-worker-0-rzq8t   Provisioning                          101m
    openshift-machine-api   ocp4d-5btvs-worker-0-xw6ql   Provisioning                          101m
    openshift-machine-api   ocp4d-5btvs-worker-0-z6dl7   Provisioning                          101m

    [root@daltigoth ~]# rpm -q libvirt
    libvirt-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    [root@daltigoth ~]# dnf module list virt
    Updating Subscription Management repositories.
    [....]
    Advanced Virtualization for RHEL 8 x86_64 (RPMs)
    Name   Stream   Profiles   Summary
    virt   av       common     Virtualization module
    virt   8.0      common     Virtualization module
    virt   8.0.0    common     Virtualization module
    virt   8.1      common     Virtualization module
    virt   8.2      common     Virtualization module
    virt   8.3      common     Virtualization module

    Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs)
    Name   Stream        Profiles     Summary
    virt   rhel [d][e]   common [d]   Virtualization module

    Hint: [d]efault, [e]nabled, [x]disabled, [i]nstalled

@jdenemar @mprivozn Sorry for the duplicate needinfo, please accept my apologies.

I downgraded libvirt on the previously non-working server:

    # yum downgrade libvirt-daemon-driver-storage libvirt-client libvirt-daemon-kvm libvirt-daemon-driver-secret libvirt-daemon-driver-storage-logical libvirt-daemon-driver-storage-rbd libvirt-libs libvirt-daemon-driver-qemu libvirt-daemon-driver-storage-gluster libvirt-daemon-driver-nwfilter libvirt-daemon-driver-storage-scsi libvirt-daemon-driver-storage-iscsi libvirt-bash-completion libvirt-daemon-driver-storage-core libvirt-daemon-config-network libvirt-daemon-driver-interface libvirt-daemon-driver-storage-disk libvirt-daemon-driver-storage-mpath libvirt-devel libvirt-daemon libvirt-daemon-config-nwfilter libvirt-daemon-driver-nodedev libvirt-daemon-driver-storage-iscsi-direct libvirt-daemon-driver-network libvirt
    [...]
    Installed products updated.

    Downgraded:
      libvirt-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-bash-completion-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-client-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-config-network-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-config-nwfilter-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-driver-interface-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-driver-network-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-driver-nodedev-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-driver-nwfilter-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-driver-qemu-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-driver-secret-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-driver-storage-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-driver-storage-core-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-driver-storage-disk-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-driver-storage-gluster-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-driver-storage-iscsi-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-driver-storage-iscsi-direct-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-driver-storage-logical-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-driver-storage-mpath-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-driver-storage-rbd-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-driver-storage-scsi-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-daemon-kvm-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-devel-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64
      libvirt-libs-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64

    [root@daltigoth ~]# systemctl restart libvirtd

After that, ocp_libvirt_ipi works again and I get this:

    [root@daltigoth ~]# oc get machines -A
    NAMESPACE               NAME                         PHASE     TYPE   REGION   ZONE   AGE
    openshift-machine-api   ocp4d-2ks4x-master-0         Running                         29m
    openshift-machine-api   ocp4d-2ks4x-master-1         Running                         29m
    openshift-machine-api   ocp4d-2ks4x-master-2         Running                         29m
    openshift-machine-api   ocp4d-2ks4x-worker-0-6xvqw   Running                         26m
    openshift-machine-api   ocp4d-2ks4x-worker-0-mm6jp   Running                         26m
    openshift-machine-api   ocp4d-2ks4x-worker-0-rwmhg   Running                         26m

Vincent, I'm not exactly sure what's going on. What is the API/virsh command that's failing? I mean, I can see:

    Operation not supported: can't update 'bridge' section of network 'ocp4d-5btvs'

but I'd love to see what arguments the virNetworkUpdate() API was called with. Do you think you can capture libvirt logs for me? I'll need both client and daemon logs. https://libvirt.org/kbase/debuglogs.html

Hi Michael, I'm re-upgrading my server to 8.5-latest and will provide debug logs shortly. As it is OCP talking to libvirt IPI locally, I will try to gather logs from the OC containers as well.

I looked at the libvirt kbase, but what filters/outputs would you like me to configure for runtime? I was thinking of this:

    # virt-admin daemon-log-outputs "1:file:/var/log/libvirt/libvirtd.log"
    # virt-admin daemon-log-filters "3:remote 4:event 3:util.json 3:util.object 3:util.dbus 3:util.netlink 3:node_device 3:rpc 3:access 1:*"

Would that work for you?

To summarize the issue, when ocp_libvirt_ipi starts, here's what happens:

- The openshift installer uses terraform to launch 1 bootstrap + 3 masters. This part works fine.
- Once the masters are up, they launch a number of workers (3 for me).

It is this second part (launching the workers from the OCP machine config API) which stopped working two months ago in virt:av and two days ago in virt:rhel.

(In reply to Vincent S. Cojot from comment #23)
> Hi Michael,
> I'm re-upgrading my server to 8.5-latest and will provide debug logs shortly.
> As it is OCP talking to libvirt IPI locally, I will try to gather logs from
> the OC containers as well.
>
> I looked at the libvirt kbase but what filters/outputs would you like to
> have me configure for runtime?
>
> I was thinking of this:
> # virt-admin daemon-log-outputs "1:file:/var/log/libvirt/libvirtd.log"
> # virt-admin daemon-log-filters "3:remote 4:event 3:util.json 3:util.object
> 3:util.dbus 3:util.netlink 3:node_device 3:rpc 3:access 1:*"
> Would that work for you?

This might work, but don't forget to set up and collect the client logs too. This is done by setting the LIBVIRT_LOG_OUTPUTS env var; this should be sufficient:

    export LIBVIRT_LOG_OUTPUTS="1:file:/tmp/libvirt_client.log"

(In reply to Vincent S. Cojot from comment #24)
> To summarize the issue:
> When ocp_libvirt_ipi starts, here's what happens:
> - the openshift installer uses terraform to launch 1 bootstrap + 3 masters.
> This part works fine.
> - Once the masters are up, they will launch a number of workers (3 for me).
>
> This is the second part (launching the workers from the OCP machine config
> API) which stopped working two months ago in virt:av and two days ago in
> virt:rhel

As I commented on the github issue (https://github.com/openshift/installer/issues/5401#issuecomment-1031380358), could it be that the actual problem here is terraform, which intentionally switches the arguments? I mean, comment 1 points to the commit that does exactly that. What can we do to make that bug^Wworkaround go away?
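The interaction Michal is asking about can be illustrated with a toy model (an assumption-laden sketch, not libvirt or terraform code): a client-side workaround that pre-swaps the two arguments cancels the old daemon's swap, but against a fixed daemon the workaround itself becomes the bug. Both "daemons" here are plain functions standing in for the dispatch behavior.

```python
# Toy model: two transpositions cancel, one does not. The "workaround
# client" mimics the terraform-provider-libvirt commit that pre-swaps
# section/command to compensate for the old daemon-side swap.

def old_daemon(section, command):
    # Pre-fix daemon: transposes the two values before dispatching.
    return {"section": command, "command": section}

def fixed_daemon(section, command):
    # Fixed daemon: uses the arguments as received.
    return {"section": section, "command": command}

def workaround_client(daemon, section, command):
    # Client-side workaround: pre-swap so the old daemon's swap cancels.
    return daemon(command, section)

SECTION_IP_DHCP_HOST, COMMAND_ADD_LAST = 4, 3

# Against the old daemon, the double swap yields the intended call:
print(workaround_client(old_daemon, SECTION_IP_DHCP_HOST, COMMAND_ADD_LAST))
# Against a fixed daemon, the pre-swap survives and the call is wrong:
print(workaround_client(fixed_daemon, SECTION_IP_DHCP_HOST, COMMAND_ADD_LAST))
```

This is why a consumer that baked in the swap keeps working against unpatched daemons but breaks the moment the daemon is fixed.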
Here's what I am seeing in the log (I will attach the full log shortly) when I grep for errors:

    2022-02-07 13:42:37.021+0000: 1193200: error : virNetSocketReadWire:1832 : End of file while reading data: Input/output error
    2022-02-07 13:42:38.040+0000: 1193205: error : virProcessRunInFork:1161 : internal error: child reported (status=125):
    2022-02-07 13:42:38.045+0000: 1193205: error : virProcessRunInFork:1161 : internal error: child reported (status=125): internal error: child reported (status=125):
    2022-02-07 13:42:38.050+0000: 1193205: error : virProcessRunInFork:1161 : internal error: child reported (status=125): internal error: child reported (status=125): internal error: child reported (status=125):
    2022-02-07 13:42:38.055+0000: 1193205: error : virProcessRunInFork:1161 : internal error: child reported (status=125): internal error: child reported (status=125): internal error: child reported (status=125): internal error: child reported (status=125):
    2022-02-07 13:45:44.523+0000: 1193201: error : virNetworkDefUpdateNoSupport:2772 : Operation not supported: can't update 'bridge' section of network 'ocp4d-wrmf5'
    2022-02-07 13:45:45.827+0000: 1193205: error : virNetworkDefUpdateNoSupport:2772 : Operation not supported: can't update 'bridge' section of network 'ocp4d-wrmf5'
    2022-02-07 13:45:48.211+0000: 1193205: error : virNetworkDefUpdateNoSupport:2772 : Operation not supported: can't update 'bridge' section of network 'ocp4d-wrmf5'
    2022-02-07 13:45:49.488+0000: 1193201: error : virNetworkDefUpdateNoSupport:2772 : Operation not supported: can't update 'bridge' section of network 'ocp4d-wrmf5'
    2022-02-07 13:45:51.884+0000: 1193205: error : virNetworkDefUpdateNoSupport:2772 : Operation not supported: can't update 'bridge' section of network 'ocp4d-wrmf5'

This is on the latest libvirt for 8.5:

    [root@daltigoth ~]# rpm -qa libvirt\*
    libvirt-daemon-kvm-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-libs-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-daemon-driver-nwfilter-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-daemon-driver-storage-gluster-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-daemon-driver-storage-scsi-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-bash-completion-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-daemon-driver-qemu-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-daemon-driver-storage-disk-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-daemon-driver-storage-mpath-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-daemon-driver-storage-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-client-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-admin-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-daemon-driver-storage-core-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-daemon-driver-nodedev-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-daemon-driver-secret-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-daemon-driver-storage-iscsi-direct-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-glib-3.0.0-1.el8.x86_64
    libvirt-daemon-driver-storage-logical-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-daemon-driver-storage-rbd-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-daemon-driver-interface-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-daemon-config-network-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-daemon-driver-storage-iscsi-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-devel-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-daemon-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-daemon-driver-network-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
    libvirt-daemon-config-nwfilter-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64

(In reply to Vincent S. Cojot from comment #26)
> Here's what I am seeing in the log (I will attach the full log shortly) when
> I grep for errors:

We've known this much. I'd rather look at the debug logs. Thanks.

@mprivozn I just attached the logs, please let me know if I did it right (I'm not too familiar with taking libvirt logs). Here's what I am seeing from the OCP logs:

    [root@daltigoth ~]# oc project openshift-machine-api
    Already on project "openshift-machine-api" on server "https://api.ocp4d.openshift.lasthome.solace.krynn:6443".
    [root@daltigoth ~]# oc get pods
    NAME                                           READY   STATUS    RESTARTS      AGE
    cluster-autoscaler-operator-745d696cd7-jnhgl   2/2     Running   0             37m
    cluster-baremetal-operator-649844f896-lw7mw    2/2     Running   1 (34m ago)   37m
    machine-api-controllers-5546f846bd-qftdv       7/7     Running   3 (30m ago)   34m
    machine-api-operator-78b4684b94-72dpw          2/2     Running   0             37m
    [root@daltigoth ~]# oc logs machine-api-controllers-5546f846bd-qftdv machine-controller | tail -20
    E0207 14:02:42.512612 1 actuator.go:107] Machine error: error creating domain virError(Code=84, Domain=19, Message='Operation not supported: can't update 'bridge' section of network 'ocp4d-wrmf5'')
    E0207 14:02:42.512642 1 actuator.go:51] ocp4d-wrmf5-worker-0-jxrnv: error creating libvirt machine: error creating domain virError(Code=84, Domain=19, Message='Operation not supported: can't update 'bridge' section of network 'ocp4d-wrmf5'')
    I0207 14:02:42.512652 1 client.go:158] Freeing the client pool
    I0207 14:02:42.512664 1 client.go:164] Closing libvirt connection: 0xc000b595b0
    W0207 14:02:42.513068 1 controller.go:316] ocp4d-wrmf5-worker-0-jxrnv: failed to create machine: ocp4d-wrmf5-worker-0-jxrnv: error creating libvirt machine: error creating domain virError(Code=84, Domain=19, Message='Operation not supported: can't update 'bridge' section of network 'ocp4d-wrmf5'')
    I0207 14:02:43.514029 1 controller.go:170] ocp4d-wrmf5-worker-0-rb6pv: reconciling Machine
    I0207 14:02:43.514073 1 actuator.go:220] Checking if machine ocp4d-wrmf5-worker-0-rb6pv exists.
    I0207 14:02:43.517354 1 client.go:142] Created libvirt connection: 0xc000a82a80
    I0207 14:02:43.517748 1 client.go:317] Check if "ocp4d-wrmf5-worker-0-rb6pv" domain exists
    I0207 14:02:43.518134 1 client.go:158] Freeing the client pool
    I0207 14:02:43.518160 1 client.go:164] Closing libvirt connection: 0xc000a82a80
    I0207 14:02:43.518546 1 controller.go:314] ocp4d-wrmf5-worker-0-rb6pv: reconciling machine triggers idempotent create
    I0207 14:02:43.518567 1 actuator.go:113] Creating machine "ocp4d-wrmf5-worker-0-rb6pv"
    I0207 14:02:43.520660 1 client.go:142] Created libvirt connection: 0xc000a82d90
    I0207 14:02:43.521003 1 client.go:384] Create a libvirt volume with name ocp4d-wrmf5-worker-0-rb6pv for pool ocp4d-wrmf5 from the base volume ocp4d-wrmf5-base
    E0207 14:02:43.521361 1 actuator.go:107] Machine error: error creating volume storage volume 'ocp4d-wrmf5-worker-0-rb6pv' already exists
    E0207 14:02:43.521378 1 actuator.go:51] ocp4d-wrmf5-worker-0-rb6pv: error creating libvirt machine: error creating volume storage volume 'ocp4d-wrmf5-worker-0-rb6pv' already exists
    I0207 14:02:43.521383 1 client.go:158] Freeing the client pool
    I0207 14:02:43.521394 1 client.go:164] Closing libvirt connection: 0xc000a82d90
    W0207 14:02:43.521732 1 controller.go:316] ocp4d-wrmf5-worker-0-rb6pv: failed to create machine: ocp4d-wrmf5-worker-0-rb6pv: error creating libvirt machine: error creating volume storage volume 'ocp4d-wrmf5-worker-0-rb6pv' already exists

Is it possible we're running into another issue?

    [root@daltigoth ~]# virsh vol-list ocp4d-wrmf5 | grep worker
    ocp4d-wrmf5-worker-0-rb6pv   /var/lib/libvirt/openshift-images/ocp4d-wrmf5/ocp4d-wrmf5-worker-0-rb6pv
    [root@daltigoth ~]# ls -l /var/lib/libvirt/openshift-images/ocp4d-wrmf5/ocp4d-wrmf5-worker-0-rb6pv
    -rw-r--r--. 1 root root 200704 Feb 7 08:39 /var/lib/libvirt/openshift-images/ocp4d-wrmf5/ocp4d-wrmf5-worker-0-rb6pv

You probably forgot to attach the client log, because what you did attach is just the daemon log.
Nevertheless, from the log I can see the following:

    2022-02-07 13:32:28.750+0000: 1193205: debug : virThreadJobSet:94 : Thread 1193205 (virNetServerHandleJob) is now running job remoteDispatchNetworkUpdate
    2022-02-07 13:32:28.750+0000: 1193205: debug : virNetworkUpdate:534 : network=0x7f41b400cb00, section=4, parentIndex=0, xml=<host mac="52:54:00:aa:0d:a3" name="ocp4d-wrmf5-master-2.ocp4d.openshift.lasthome.solace.krynn" ip="192.168.246.13"></host>, flags=0x3
    2022-02-07 13:32:28.750+0000: 1193205: debug : virNetworkUpdate:554 : Argument order feature detection returned: 1

IOW, virNetworkUpdate() was called with the following arguments:

    section = VIR_NETWORK_SECTION_IP_DHCP_HOST
    parentIndex = 0
    xml = "<host .../>"
    flags = VIR_NETWORK_UPDATE_AFFECT_LIVE | VIR_NETWORK_UPDATE_AFFECT_CONFIG

and this invocation is correct, because the passed XML indeed corresponds to the IP_DHCP_HOST section. But then I also see:

    2022-02-07 13:51:46.869+0000: 1193203: debug : virThreadJobSet:94 : Thread 1193203 (virNetServerHandleJob) is now running job remoteDispatchNetworkUpdate
    2022-02-07 13:51:46.869+0000: 1193203: debug : virNetworkUpdate:534 : network=0x7f41bc00a220, section=1, parentIndex=-1, xml=<host mac="1e:82:5f:77:0e:25" name="ocp4d-wrmf5-worker-0-jxrnv" ip="192.168.246.86"></host>, flags=0x0
    2022-02-07 13:51:46.869+0000: 1193203: debug : virNetworkUpdate:554 : Argument order feature detection returned: 1
    2022-02-07 13:51:46.869+0000: 1193203: error : virNetworkDefUpdateNoSupport:2772 : Operation not supported: can't update 'bridge' section of network 'ocp4d-wrmf5'

which is obviously wrong. The passed XML does not correspond to section 1 (VIR_NETWORK_SECTION_BRIDGE), hence the error. Therefore, my conclusion is that something is passing wrong arguments. I'll know more when I see the client log.
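The mismatch spotted here (a `<host>` payload arriving with section=1) can be checked mechanically. Below is a small, hypothetical triage helper, not libvirt code: it assumes only that each section number expects a particular XML root element, which is how the debug-log entries above were diagnosed by eye.

```python
# Hypothetical triage helper: does the XML payload's root element match
# the numeric section in a virNetworkUpdate debug-log entry?
import xml.etree.ElementTree as ET

# Expected root element per virNetworkUpdateSection value (subset).
EXPECTED_ROOT = {
    1: "bridge",   # VIR_NETWORK_SECTION_BRIDGE
    4: "host",     # VIR_NETWORK_SECTION_IP_DHCP_HOST
    11: "txt",     # VIR_NETWORK_SECTION_DNS_TXT
}

def consistent(section, xml):
    """True when the XML root element matches the claimed section."""
    return EXPECTED_ROOT.get(section) == ET.fromstring(xml).tag

host_xml = '<host mac="1e:82:5f:77:0e:25" name="w0" ip="192.168.246.86"/>'

# The healthy log entry: section=4 with a <host> payload.
print(consistent(4, host_xml))
# The failing entry: section=1 (bridge) with the same <host> payload,
# i.e. something upstream of the daemon transposed the arguments.
print(consistent(1, host_xml))
```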
Inside of the pod, here's what I see:

    [root@daltigoth ~]# oc rsh -c machine-controller machine-api-controllers-5546f846bd-qftdv
    sh-4.4$ rpm -qa | grep virt
    virt-what-1.18-9.el8_4.x86_64
    libvirt-libs-6.0.0-35.1.module+el8.4.0+11273+64eb94ef.x86_64

There are probably hundreds of images floating around with that ancient and buggy libvirt. Is there not a way to make the patched libvirtd accept to talk to it in some backward-compatibility mode?

FWIW, this can be reproduced with steps similar to the ones in the initial comment:

```
- set up a rhel 8.5 machine with libvirt-daemon-6.0.0-37.1.module+el8.5.0+13858+39fdc467.x86_64
- set up qemu:///system remote access on this machine (can be qemu+ssh, qemu+tcp, ...)
- run:
$ podman run -it --rm registry.access.redhat.com/ubi8/ubi bash -c "yum -y install libvirt-client-0:6.0.0-35.module+el8.4.0+10230+7a9b21e4.x86_64 && virsh -c qemu+tcp://$VIRT_AV_MACHINE/system net-update --network default --command modify --section ip-dhcp-host '<host mac=\"7e:75:1a:57:b9:5b\" name=\"libvirt-test\" ip=\"192.168.122.10\"></host>'"
```

> On client side, conn->driver points to so called remote driver which does nothing more than serialize all the arguments and send them to the daemon. There, the packet is deserialized and the API is called again, but this time conn->driver points to "real" driver (e.g. qemu driver). Hence, the public API is effectively called twice and it's not possible to distinguish within the function whether we're running on client or daemon side. Also, with split daemons any daemon can be in role of a client too.

Probably missing something here, but why isn't it possible to keep the changes in the remote driver:

    + int rc;
    +
    + /* Since its introduction in v0.10.2-rc1~9 the @section and @command
    +  * arguments were mistakenly swapped when passed to driver's callback.
    +  * Detect if the other side is fixed already or not. */
    + rc = VIR_DRV_SUPPORTS_FEATURE(conn->driver, conn,
    +                               VIR_DRV_FEATURE_NETWORK_UPDATE_HAS_CORRECT_ORDER);
    +
    + VIR_DEBUG("Argument order feature detection returned: %d", rc);
    + if (rc < 0)
    +     goto error;
    +
    + if (rc == 0) {
    +     /* Feature not supported, preserve swapped order */
    +     ret = conn->networkDriver->networkUpdate(network, section, command,
    +                                              parentIndex, xml, flags);
    + } else {
    +     /* Feature supported, correct order can be used */
    +     ret = conn->networkDriver->networkUpdate(network, command, section,
    +                                              parentIndex, xml, flags);
    + }

but then return 0 in all other drivers' connectSupportsFeature implementations?

    + case VIR_DRV_FEATURE_NETWORK_UPDATE_HAS_CORRECT_ORDER:
    +     return 0;

Maybe this is going to fail in the split daemon case?

> Well, if the fix is backported into rhel-8.2 and rhel-8.4 then we are golden, right?

That's doable.

Yup, backporting fixes for this to 8.2 and 8.4 would fix the OpenShift images.

The problem with backporting is that this would only fix the -latest- images, not the pre-existing images, e.g. I need to work on 4.6.30 to reproduce a customer bug.. so if a fix like the one suggested by Christophe would work, I would favor that too.. :)

(In reply to Christophe Fergeau from comment #43)
> > On client side, conn->driver points to so called remote driver which does nothing more than serialize all the arguments and send them to the daemon. There, the packet is deserialized and the API is called again, but this time conn->driver points to "real" driver (e.g. qemu driver). Hence, the public API is effectively called twice and it's not possible to distinguish within the function whether we're running on client or daemon side. Also, with split daemons any daemon can be in role of a client too.
> Probably missing something here, but why isn't it possible to keep the
> changes in the remote driver, but then return 0 in all other drivers'
> connectSupportsFeature implementations?
>
> Maybe this is going to fail in the split daemon case?

Indeed. That's exactly what I'm seeing after I've implemented this. I mean, if both the client and the split daemons run with the change you're suggesting, then everything works. But what does not work is an older (unpatched) client talking to new split daemons, or even to a monolithic daemon. Maybe there's a bug in my implementation: https://gitlab.com/MichalPrivoznik/libvirt/-/commit/0b3b98ed45d9514c3c9f4028ccaf40c8c23ac92f But I honestly doubt that. Put simply, the new (patched) daemon has no knowledge of whether the client is patched too and whether it sent the arguments in the correct order. Mind you, at this point there are also versions with the current fix that's outside of the remote driver, and we don't want to break those either.

Well, the only way a daemon could know whether a client is new enough would be adding a new flag (in addition to the existing VIR_DRV_FEATURE_...) that would be passed by a client if both the client and the server were new enough. But since it was not done at the same time, we would break clients that already pass the arguments in correct order but were not updated with the new flag. That said, we're in a pretty bad situation here...

(In reply to Vincent S. Cojot from comment #44)
> The problem with backporting is that this would only fix the -latest-
> images, not the pre-existing images, e.g. I need to work on 4.6.30 to
> reproduce a customer bug.. so if a fix like the one suggested by Christophe
> would work, I would favor that too.. :)

Surely there has to be a way to upgrade older images too. I mean, what if there's a CVE that needs fixing?

(In reply to Michal Privoznik from comment #47)
> Surely there has to be a way to upgrade older images too. I mean, what if
> there's a CVE that needs fixing?

If there's a CVE that needs fixing, the message we will give customers is: update to the latest version (which has the fixes). Here I am talking about non-CVE cases where a customer might be staying on an older release (without the fix) because there's no CVE pushing them to update. We (field) would still need to be able to deploy the ancient version used by the customer to test things out. I agree that the changes to libvirt are the right thing to do (I am not familiar with that code base, but I gather that from our discussion here), but it just breaks every container ever produced out there unless they can be rebuilt.. and then again, these containers have a sha256, so previously produced containers could not be rebuilt without updating the sha256, which we don't do..
IMVHO, this is precisely the purpose of the multiple registries out there (ours, the customers', and the end users'): cache previously built images so that devs can skip rebuilding containers from scratch for every code change.

(In reply to Vincent S. Cojot from comment #48)
> Here I am talking about non-CVE cases where a customer might be staying on
> an older release (without the fix) because there's no CVE pushing them to
> update.

Well, this can be viewed as a great opportunity for them to update. I'm sorry, but if this were a bare-metal machine and somebody was complaining that their package is broken even though there is a fixed one in an update channel, then I'd tell them nothing more than to update. And if somebody pinned a particular version ("vendor in" is the term I hear people use these days), then it's their own responsibility to follow up on updates.

> We (field) would still need to be able to deploy the ancient
> version used by the customer to test things out.

Well, you would be, wouldn't you? I mean, the latest image of RHEL-8.X would work. And if you need an older image, with broken libvirt, then surely there has to be a workaround. I am trying to make this as painless as possible for users, but I just don't see another way than backporting the fix.

> I agree that the changes to libvirt are the right thing to do (I am not
> familiar with that code base, but I gather that from our discussion here),
> but it just breaks every container ever produced out there unless they can
> be rebuilt.. and then again, these containers have a sha256, so previously
> produced containers could not be rebuilt without updating the sha256, which
> we don't do..
> IMVHO, this is precisely the purpose of the multiple registries out there
> (ours, the customers', and the end users'): cache previously built images so
> that devs can skip rebuilding containers from scratch for every code change.

I hoped that this was an automated process.

@Yalan, can you please move back to VERIFIED?
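The impasse discussed in this thread (a patched daemon cannot tell whether an unpatched client already transposed the arguments) can be sketched as a compatibility matrix. This is a toy model of the negotiation under the thread's stated assumptions, not the actual VIR_DRV_FEATURE_NETWORK_UPDATE_HAS_CORRECT_ORDER implementation; each "swap" below stands in for one side's legacy transposition.

```python
# Toy model of the compatibility matrix: a patched client probes the
# daemon's feature and deliberately preserves the legacy wire order
# when the daemon is old; legacy sides always transpose.

def call_update(client_patched, daemon_patched, section, command):
    if client_patched and not daemon_patched:
        section, command = command, section  # preserve legacy wire order
    if not client_patched:
        section, command = command, section  # legacy client always swaps
    if not daemon_patched:
        section, command = command, section  # legacy daemon swaps back
    return (section, command)

WANT = (4, 3)  # (ip-dhcp-host, add-last)

for client in (False, True):
    for daemon in (False, True):
        got = call_update(client, daemon, *WANT)
        verdict = "OK" if got == WANT else "BROKEN"
        print(f"client_patched={client} daemon_patched={daemon} -> {verdict}")
```

The model reproduces the thread's conclusion: legacy/legacy works (two swaps cancel), patched/patched and patched-client/legacy-daemon work, but a legacy client talking to a patched daemon remains broken, which is exactly why backporting the fix to older client images was the remaining option.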
This bug was mistakenly moved to NEW.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1759