Azure has been having significant problems since the re-introduction of RHEL 8.6 content. See:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregated-azure-ovn-upgrade-4.11-micro-release-openshift-release-analysis-aggregator/1516440543658250240

It appears that some nodes never come back after being rebooted into the new version of RHCOS, e.g.:

Node ci-op-l990y20q-99831-k7z6j-master-1 went unready at 2022-04-19T16:53:18Z, never became ready again

I am working on collecting more data, including serial logs, and will provide it as soon as I have it.

Unfinished Jobs
periodic-ci-openshift-release-master-ci-4.11-e2e-azure-ovn-upgrade/1516440534875377664
periodic-ci-openshift-release-master-ci-4.11-e2e-azure-ovn-upgrade/1516440538209849344
periodic-ci-openshift-release-master-ci-4.11-e2e-azure-ovn-upgrade/1516440539908542464
periodic-ci-openshift-release-master-ci-4.11-e2e-azure-ovn-upgrade/1516440540718043136
periodic-ci-openshift-release-master-ci-4.11-e2e-azure-ovn-upgrade/1516440542404153344
Created attachment 1873842 [details] First boot log from worker upgrading to RHCOS based on 8.6
From the first boot after upgrading to 411.86.202204190939-0:

* It does get a DHCP lease:

[ 7.476859] NetworkManager[809]: <info> [1650470349.9631] dhcp4 (eth0): state changed new lease, address=10.0.128.4

Later on, it runs configure-ovs.sh, which reverts the OVS configuration and reloads NetworkManager:

[ 58.840163] configure-ovs.sh[1733]: Removed nmconnection file /etc/NetworkManager/system-connections/ovs-port-phys0.nmconnection
[ 58.840671] configure-ovs.sh[1733]: + nm_config_changed=1
[ 58.841126] configure-ovs.sh[1733]: + ovs-vsctl --timeout=30 --if-exists del-br br-ex
[ 58.927356] configure-ovs.sh[1733]: + '[' -d /sys/class/net/br-ex1 ']'
[ 58.930344] configure-ovs.sh[1733]: + echo 'OVS configuration successfully reverted'
[ 58.933182] configure-ovs.sh[1733]: OVS configuration successfully reverted
[ 58.933761] configure-ovs.sh[1733]: + reload_nm
[ 58.934275] configure-ovs.sh[1733]: + '[' 1 -eq 0 ']'
[ 58.934796] configure-ovs.sh[1733]: + nm_config_changed=0
[ 58.935893] configure-ovs.sh[1733]: + echo 'Reloading NetworkManager after configuration changes...'
[ 58.936924] configure-ovs.sh[1733]: Reloading NetworkManager after configuration changes...
[ 58.937889] configure-ovs.sh[1733]: + nmcli network off
[ 58.957462] configure-ovs.sh[1733]: + echo 'Waiting for devices to disconnect...'
[ 58.960596] configure-ovs.sh[1733]: Waiting for devices to disconnect...
[ 58.964345] configure-ovs.sh[1733]: + timeout 60 bash -c 'while nmcli -g DEVICE,STATE d | grep -v :unmanaged; do sleep 5; done'

After this, the host never pulls a DHCP lease again. However, if I force-reboot the host:

$ az vm restart --force --name ci-op-547k206c-99831-fdm9f-worker-centralus3-h8mdx --resource-group ci-op-547k206c-99831-fdm9f-rg --subscription 72e3a972-58b0-4afc-bd4f-da89b39ccebd

It does reboot and come back up into RHCOS 411.86.202204190939-0 just fine (see second boot log), and the host becomes ready.

So something seems wrong with the first boot into 411.86.202204190939-0 on OVN.
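For anyone reading the last trace line above: the wait loop keeps sleeping for as long as `grep -v :unmanaged` still finds a device whose state is something other than unmanaged, and `timeout 60` caps the wait. Below is a minimal sketch of just that exit condition, run against made-up `device:state` lines rather than real `nmcli` output. The function name and the sample data are hypothetical and are not taken from configure-ovs.sh or from the failing node.

```shell
#!/usr/bin/env bash
# Sketch of the loop condition only; it does not call nmcli.
# Reads "device:state" lines on stdin (the shape `nmcli -g DEVICE,STATE d`
# prints) and outputs the devices that are NOT yet unmanaged.
# Exit status follows grep: 0 if any such device remains, 1 if none do.
devices_still_managed() {
    grep -v ':unmanaged'
}

# Hypothetical sample data for illustration.
sample_before=$'eth0:connected\nlo:unmanaged'
sample_after=$'eth0:unmanaged\nlo:unmanaged'

if printf '%s\n' "$sample_before" | devices_still_managed >/dev/null; then
    echo "before: a device is still managed, the loop would sleep and retry"
fi

if ! printf '%s\n' "$sample_after" | devices_still_managed >/dev/null; then
    echo "after: every device is unmanaged, the loop would exit"
fi
```

Note that this only illustrates when the loop stops waiting. It says nothing about why the host fails to get a new DHCP lease afterwards.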
Created attachment 1873843 [details] Second boot (after force restarting)
Moving to OVN for them to have a look.
I'm marking this as a duplicate of 2078866 as the problems are similar enough and the solution is the same.

*** This bug has been marked as a duplicate of bug 2078866 ***