Bug 2077052
| Summary: | RHEL 8.6 bump in RHCOS is preventing Azure nodes from (re)booting | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Stephen Benjamin <stbenjam> | ||||
| Component: | Networking | Assignee: | Andreas Karis <akaris> | ||||
| Networking sub component: | ovn-kubernetes | QA Contact: | Anurag saxena <anusaxen> | ||||
| Status: | CLOSED DUPLICATE | Docs Contact: | |||||
| Severity: | high | ||||||
| Priority: | unspecified | CC: | akaris, bleanhar, dornelas, jligon, jschinta, miabbott, mrussell, nstielau | ||||
| Version: | 4.11 | ||||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2022-05-03 13:54:04 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 2077605 | ||||||
| Bug Blocks: | |||||||
Description
Stephen Benjamin
2022-04-20 14:49:46 UTC
Created attachment 1873842
First boot log from worker upgrading to RHCOS based on 8.6
On the first boot after upgrading to 411.86.202204190939-0, the host does get a DHCP lease:
[ 7.476859] NetworkManager[809]: <info> [1650470349.9631] dhcp4 (eth0): state changed new lease, address=10.0.128.4
Later on, it runs configure-ovs.sh, which reverts the OVS configuration and reloads NetworkManager:
[ 58.840163] configure-ovs.sh[1733]: Removed nmconnection file /etc/NetworkManager/system-connections/ovs-port-phys0.nmconnection
[ 58.840671] configure-ovs.sh[1733]: + nm_config_changed=1
[ 58.841126] configure-ovs.sh[1733]: + ovs-vsctl --timeout=30 --if-exists del-br br-ex
[ 58.927356] configure-ovs.sh[1733]: + '[' -d /sys/class/net/br-ex1 ']'
[ 58.930344] configure-ovs.sh[1733]: + echo 'OVS configuration successfully reverted'
[ 58.933182] configure-ovs.sh[1733]: OVS configuration successfully reverted
[ 58.933761] configure-ovs.sh[1733]: + reload_nm
[ 58.934275] configure-ovs.sh[1733]: + '[' 1 -eq 0 ']'
[ 58.934796] configure-ovs.sh[1733]: + nm_config_changed=0
[ 58.935893] configure-ovs.sh[1733]: + echo 'Reloading NetworkManager after configuration changes...'
[ 58.936924] configure-ovs.sh[1733]: Reloading NetworkManager after configuration changes...
[ 58.937889] configure-ovs.sh[1733]: + nmcli network off
[ 58.957462] configure-ovs.sh[1733]: + echo 'Waiting for devices to disconnect...'
[ 58.960596] configure-ovs.sh[1733]: Waiting for devices to disconnect...
[ 58.964345] configure-ovs.sh[1733]: + timeout 60 bash -c 'while nmcli -g DEVICE,STATE d | grep -v :unmanaged; do sleep 5; done'
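The last traced command is a bounded wait: it keeps polling while any device still reports a state other than unmanaged. A minimal sketch of that loop follows, assuming bash. The function names `has_managed_devices` and `wait_for_disconnect` are hypothetical and do not come from configure-ovs.sh, and the sketch swaps the `timeout 60` wrapper for a try count so the loop can be exercised without NetworkManager.

```shell
#!/usr/bin/env bash

# Hypothetical helper: exits 0 while at least one input line does NOT contain
# ":unmanaged", i.e. while some device is still managed. Input is expected in
# the "DEVICE:STATE" form printed by `nmcli -g DEVICE,STATE d`.
has_managed_devices() {
    grep -v ':unmanaged' >/dev/null
}

# Hypothetical wrapper around the traced loop.
#   $1: command (or function) that prints DEVICE:STATE lines
#   $2: maximum number of polls
#   $3: delay between polls, in seconds
wait_for_disconnect() {
    local list_cmd=$1 tries=$2 delay=$3
    local i
    for ((i = 0; i < tries; i++)); do
        if ! "$list_cmd" | has_managed_devices; then
            return 0    # every listed device reports unmanaged
        fi
        sleep "$delay"
    done
    return 1            # managed devices remained after all polls
}
```

On a real host, a wrapper such as `list_devices() { nmcli -g DEVICE,STATE d; }` followed by `wait_for_disconnect list_devices 12 5` would approximate the 60-second bound seen in the trace. A non-zero exit would mean the devices never reached the unmanaged state within that window.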
After this, the host never obtains a DHCP lease again. However, if I force-restart the host:
$ az vm restart --force --name ci-op-547k206c-99831-fdm9f-worker-centralus3-h8mdx --resource-group ci-op-547k206c-99831-fdm9f-rg --subscription 72e3a972-58b0-4afc-bd4f-da89b39ccebd
It reboots and comes back up into RHCOS 411.86.202204190939-0 just fine (see the second boot log), and the host becomes ready.
So something seems wrong with the first boot into 411.86.202204190939-0 on OVN.
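To check a saved boot log for the symptom described above, something like the following could be used. This is only a sketch, assuming bash and a plain-text log that contains the same NetworkManager and configure-ovs.sh lines quoted in this report. The function name `lease_after_reload` and its exit codes are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical diagnostic: does the log show a DHCPv4 lease after the last
# "nmcli network off" trace line?
#   exit 0: a "new lease" line appears after the reload
#   exit 1: no lease line after the reload
#   exit 2: the log has no "nmcli network off" line at all
lease_after_reload() {
    local log=$1 off_line
    # Line number of the last "nmcli network off" entry, if any.
    off_line=$(grep -n 'nmcli network off' "$log" | tail -n 1 | cut -d: -f1)
    [ -n "$off_line" ] || return 2
    # Search only the part of the log that follows that entry.
    tail -n "+$((off_line + 1))" "$log" \
        | grep 'dhcp4 (.*): state changed new lease' >/dev/null
}
```

Called as `lease_after_reload first-boot.log` (the file name is a placeholder for a downloaded attachment), an exit status of 1 would match the behaviour reported here: a lease before the NetworkManager reload and none afterwards.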
Created attachment 1873843
Second boot (after force restarting)
Moving to OVN for them to have a look.
I'm marking this as a duplicate of 2078866 as the problems are similar enough and the solution is the same.
*** This bug has been marked as a duplicate of bug 2078866 ***