Bug 1903152 - Baremetal: Rebooting any host that uses OVN Kubernetes leaves it unable to access the network
Keywords:
Status: CLOSED DUPLICATE of bug 1898036
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.7
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Antoni Segura Puimedon
QA Contact: Victor Voronkov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-01 13:42 UTC by Antoni Segura Puimedon
Modified: 2020-12-01 14:57 UTC (History)
0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-01 14:57:11 UTC
Target Upstream Version:
Embargoed:



Description Antoni Segura Puimedon 2020-12-01 13:42:26 UTC
Description of problem:
OCP 4.7 uses OverlayFS so that any changes to the NetworkManager connection configuration made at runtime stay only at runtime. To do that:
* It mounts OverlayFS to a new directory /etc/NetworkManager/system-connections-merged and tells NetworkManager to use it as its source of system connection configuration.

    lowerdir=/etc/NetworkManager/system-connections,upperdir=/run/nm-system-connections,workdir=/run/nm-system-connections-work

The mount happens before NetworkManager runs, because NetworkManager must be started already pointing to /etc/NetworkManager/system-connections-merged. The overlay is therefore set up just after systemd finishes setting up the temporary directories.
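
For illustration, a minimal sketch of the overlay mount described above. The four paths are the ones quoted in this report; the function names and the APPLY switch are hypothetical, and the actual unit shipped in 4.7 may do this differently. By default the script only prints the mount command, since mounting requires root.

```shell
#!/bin/sh
# Sketch only: paths are taken from this report; everything else is illustrative.
LOWER=/etc/NetworkManager/system-connections
UPPER=/run/nm-system-connections
WORK=/run/nm-system-connections-work
MERGED=/etc/NetworkManager/system-connections-merged

# Build the option string passed to "mount -o".
overlay_opts() {
    printf 'lowerdir=%s,upperdir=%s,workdir=%s' "$LOWER" "$UPPER" "$WORK"
}

# Print the mount command; perform the mount only when APPLY=1 (needs root).
mount_overlay() {
    if [ "${APPLY:-0}" = "1" ]; then
        mkdir -p "$UPPER" "$WORK" "$MERGED"
        mount -t overlay overlay -o "$(overlay_opts)" "$MERGED"
    else
        echo "mount -t overlay overlay -o $(overlay_opts) $MERGED"
    fi
}

mount_overlay
```

Because upperdir and workdir live under /run, which is a tmpfs, anything NetworkManager writes to the merged directory is lost on reboot.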

Another part of the networking setup, ovs-configuration.service, is in charge of setting up NetworkManager and Open vSwitch for OVN Kubernetes. It does so by checking which NetworkManager connection is the one used for the default gateway and morphing it into a bridged connection.
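
As a rough sketch of the lookup step described above (not the actual ovs-configuration.service script), the connection used for the default gateway can be found by reading the default route and asking NetworkManager which connection is active on that device. The function names default_dev and default_connection are hypothetical.

```shell
#!/bin/sh
# Sketch only: not the real ovs-configuration.service logic.

# Read "ip route show default" output on stdin and print the interface
# that follows the "dev" keyword on the first default-route line.
default_dev() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++)
            if ($i == "dev") { print $(i + 1); exit }
    }'
}

# Print the name of the NetworkManager connection active on the
# default-gateway device (requires ip and nmcli on the host).
default_connection() {
    dev=$(ip route show default | default_dev)
    [ -n "$dev" ] || return 1
    nmcli -g GENERAL.CONNECTION device show "$dev"
}
```

On a node, calling default_connection would print the connection that gets morphed into the bridged one.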


How reproducible: 100%


Steps to Reproduce:
1. Deploy OCP 4.7 with OVN Kubernetes
2. oc debug node/mynodename
3. chroot /host
4. systemctl reboot

Actual results:
The node boots up and is unable to set up its networking, so it appears as NotReady in `oc get nodes`. It also can't be accessed via `oc debug`.

Expected results:
After a short time, mynodename shows up as Ready in `oc get nodes` and can be accessed with `oc debug node/mynodename`.

Additional info:

The reason for this is that the NetworkManager configuration that ovs-configuration.service writes ends up being ephemeral due to OverlayFS, whereas the ovsdb configuration that comes from the same service is persistent. After a reboot the two are inconsistent, which leaves the node unable to set up its networking.
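
To see which connection files would be lost on reboot, one can compare the tmpfs upper directory with the persistent lower directory. The sketch below uses the directory paths from this report; the function name ephemeral_only is hypothetical. It only reports files that are new in the upper directory and does not detect files that were modified in place.

```shell
#!/bin/sh
# Sketch only: lists files present in the overlay's upper (tmpfs) directory
# that have no counterpart in the lower (persistent) directory.
ephemeral_only() {
    lower=$1
    upper=$2
    for f in "$upper"/*; do
        [ -e "$f" ] || continue          # glob matched nothing
        name=$(basename "$f")
        [ -e "$lower/$name" ] || echo "$name"
    done
}
```

On an affected node this would be run as `ephemeral_only /etc/NetworkManager/system-connections /run/nm-system-connections`.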

Workarounds:

While the bug is being worked on, one can do the following to be able to reboot the nodes:

1. Run oc debug against each node once it appears as Ready
2. Copy the contents of /etc/NetworkManager/system-connections-merged into /etc/NetworkManager/system-connections
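
The copy step of the workaround could look roughly like the following. The function name persist_connections is hypothetical; the paths in the usage note are the ones given above.

```shell
#!/bin/sh
# Sketch only: copy every file from the merged view into the persistent
# directory, preserving ownership and permissions (-a).
persist_connections() {
    src=$1
    dst=$2
    [ -d "$src" ] || { echo "missing source directory: $src" >&2; return 1; }
    mkdir -p "$dst"
    cp -a "$src"/. "$dst"/
}
```

Inside `oc debug node/mynodename`, after `chroot /host`, this would be called as `persist_connections /etc/NetworkManager/system-connections-merged /etc/NetworkManager/system-connections`.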

Comment 1 Antoni Segura Puimedon 2020-12-01 14:57:11 UTC

*** This bug has been marked as a duplicate of bug 1898036 ***

