In some cases, crio fails to start on a baremetal IPv6 run, with a message like this:

Jun 10 23:14:42 master-0.ostest.test.metalkube.org crio[2878]: time="2022-06-10 23:14:42.595976451Z" level=fatal msg="Failed to start streaming server: listen tcp [fd2e:6f44:5dd8:c956::14]:10010: bind: cannot assign requested address"

Shortly before, we see configure-ovs.sh moving the IP to br-ex:

Jun 10 23:14:41 master-0.ostest.test.metalkube.org configure-ovs.sh[1758]: + echo 'Brought up connection ovs-if-br-ex successfully'

And the IP is marked tentative, which Derek Higgens tested and confirmed causes crio's bind to that address to fail:

Jun 10 23:14:41 master-0.ostest.test.metalkube.org configure-ovs.sh[2751]: inet6 fd2e:6f44:5dd8:c956::14/128 scope global tentative dynamic noprefixroute

And cri-o's service unit is set to restart on-abnormal, not on-failure (https://github.com/cri-o/cri-o/blob/main/contrib/systemd/crio.service#L25), which means it won't retry on this kind of failure:

"If set to on-abnormal, the service will be restarted when the process is terminated by a signal (including on core dump, excluding the aforementioned four signals), when an operation times out, or when the watchdog timeout is triggered."

Significant parts of NetworkManager were rewritten as of 8.6, and configure-ovs.sh was changed to compensate, so it is likely that timing changes are what make this error occur.
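To illustrate the restart-policy point above: a unit with Restart=on-failure would be retried after a non-zero exit such as this fatal bind error, whereas on-abnormal is not. The following is only a sketch of a possible local workaround, not the fix shipped for this bug. The function name, the drop-in file name, the default directory, and the RestartSec value are all assumptions chosen for illustration.

```shell
# Hypothetical helper: write a systemd drop-in that overrides crio's
# restart policy. The target directory is a parameter so the function can
# be tried out against a scratch directory first.
write_crio_restart_dropin() {
    # Assumed default location for a crio.service drop-in.
    dir="${1:-/etc/systemd/system/crio.service.d}"
    mkdir -p "$dir" || return 1
    # File name and RestartSec value are illustrative assumptions.
    cat > "$dir/10-restart-on-failure.conf" <<'EOF'
[Service]
Restart=on-failure
RestartSec=5
EOF
}
```

After writing the drop-in to the real location, `systemctl daemon-reload` would be needed for systemd to pick it up. This only papers over the race; it does not make the address usable any sooner.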
It appears that NM isn't waiting for DHCPv6 addresses to finish DAD before indicating that the connection has activated when ipv6.may-fail=no is set:

Jun 10 23:14:38 master-0.ostest.test.metalkube.org configure-ovs.sh[1758]: + nmcli c add type ovs-interface slave-type ovs-port conn.interface br-ex master ovs-port-br-ex con-name ovs-if-br-ex 802-3-ethernet.mtu 1500 802-3-ethernet.cloned-mac-address 00:5f:27:59:1f:38 ipv4.route-metric 48 ipv6.route-metric 48 ipv6.may-fail no ipv6.addr-gen-mode eui64 connection.autoconnect no
<snip>
Jun 10 23:14:40 master-0.ostest.test.metalkube.org NetworkManager[1398]: <info> [1654902880.8275] dhcp6 (br-ex): activation: beginning transaction (timeout in 45 seconds)
Jun 10 23:14:40 master-0.ostest.test.metalkube.org NetworkManager[1398]: <info> [1654902880.8284] dhcp6 (br-ex): state changed new lease, address=fd2e:6f44:5dd8:c956::14
Jun 10 23:14:40 master-0.ostest.test.metalkube.org NetworkManager[1398]: <info> [1654902880.8923] policy: set 'ovs-if-br-ex' (br-ex) as default for IPv6 routing and DNS
Jun 10 23:14:41 master-0.ostest.test.metalkube.org NetworkManager[1398]: <info> [1654902881.7901] device (br-ex): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Jun 10 23:14:41 master-0.ostest.test.metalkube.org NetworkManager[1398]: <info> [1654902881.7903] device (br-ex): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Jun 10 23:14:41 master-0.ostest.test.metalkube.org NetworkManager[1398]: <info> [1654902881.7906] device (br-ex): Activation: successful, device activated.
<snip>
Jun 10 23:14:41 master-0.ostest.test.metalkube.org configure-ovs.sh[2751]: 5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
Jun 10 23:14:41 master-0.ostest.test.metalkube.org configure-ovs.sh[2751]: link/ether 00:5f:27:59:1f:38 brd ff:ff:ff:ff:ff:ff promiscuity 1 minmtu 68 maxmtu 65535
Jun 10 23:14:41 master-0.ostest.test.metalkube.org configure-ovs.sh[2751]: openvswitch numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
Jun 10 23:14:41 master-0.ostest.test.metalkube.org configure-ovs.sh[2751]: inet6 fd2e:6f44:5dd8:c956::14/128 scope global tentative dynamic noprefixroute
Jun 10 23:14:41 master-0.ostest.test.metalkube.org configure-ovs.sh[2751]: valid_lft 3599sec preferred_lft 3599sec
Jun 10 23:14:41 master-0.ostest.test.metalkube.org configure-ovs.sh[2751]: inet6 fe80::25f:27ff:fe59:1f38/64 scope link noprefixroute
Jun 10 23:14:41 master-0.ostest.test.metalkube.org configure-ovs.sh[2751]: valid_lft forever preferred_lft forever
<snip>
Jun 10 23:14:41 master-0.ostest.test.metalkube.org systemd[1]: Starting Wait for a non-localhost hostname...
Jun 10 23:14:42 master-0.ostest.test.metalkube.org systemd[1]: Starting Container Runtime Interface for OCI (CRI-O)...
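The logs above show the global address still flagged tentative about one second before crio starts. As a rough sketch of how a script could wait for DAD to finish before letting dependent services proceed, the helpers below poll `ip -6 addr show` until no global address carries the tentative flag. The function names and the 30-second default timeout are hypothetical, and this is not necessarily how the fix referenced in this bug was implemented.

```shell
# Hypothetical helper: succeeds (exit 0) if the `ip -6 addr show` text on
# stdin contains a global-scope address still flagged as tentative.
has_tentative_global() {
    grep -Eq 'inet6 .* scope global .*tentative'
}

# Hypothetical helper: poll a device once per second until no global
# address is tentative, or give up after the timeout (assumed default: 30s).
# Note: if `ip` fails (e.g. the device does not exist), the output is empty
# and this returns 0 immediately, so callers should check the device first.
wait_for_dad() {
    dev="$1"
    timeout="${2:-30}"
    i=0
    while [ "$i" -lt "$timeout" ]; do
        if ! ip -6 addr show dev "$dev" | has_tentative_global; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}
```

A caller might use it as `wait_for_dad br-ex 30 || echo "DAD did not complete on br-ex" >&2`. Polling is the simplest approach to show here; it deliberately ignores link-local addresses, since the failing bind in this report was on the global DHCPv6 address.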
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069