Bug 1766161
| Summary: | NetworkManager configuration is lost after node reboot (slave config is overridden at reboot phase) | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | oshoval |
| Component: | RHCOS | Assignee: | Colin Walters <walters> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.3.0 | CC: | bbreard, dustymabe, imcleod, jligon, miabbott, nstielau, walters |
| Target Milestone: | --- | | |
| Target Release: | 4.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-02-04 16:20:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
oshoval
2019-10-28 13:27:09 UTC
This may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1762614 or https://bugzilla.redhat.com/show_bug.cgi?id=1762509

Thanks, I was mistaken about the version (it was based on 8.0), so I need to try with a version based on 8.1 and see what happens. But first we encountered a problem there that is outside this ticket's scope:
(2019-10-29 15:13:10,455 root ERROR NM main-loop aborted: Connection update failed: error=nm-settings-error-quark: failed to update connection: Could not open file '/etc/sysconfig/network-scripts/ifcfg-ens3' for writing: Permission denied (3), dev=ens3/<enum NM_DEVICE_STATE_ACTIVATED of type NM.DeviceState>)
Also, maybe this is a possible workaround as well, if it does indeed happen: https://bugzilla.redhat.com/show_bug.cgi?id=1760262

Filed https://bugzilla.redhat.com/show_bug.cgi?id=1766739 for the SELinux issue

Manually labelling SELinux until that fix is included in a new image. Back to this bug's original title: verified, and indeed I see the problem of ifcfg files being overridden at VERSION="43.81.20191029.3" after reboot. Tried some workarounds without luck, for example (from https://bugzilla.redhat.com/show_bug.cgi?id=1760262):
sudo su
sudo echo 'omit_dracutmodules+="ifcfg network"' > /etc/dracut.conf.d/ifcfg.conf
dracut --print-cmdline
dracut -fv
Is there a known workaround for it in the meantime? Maybe grub needs to be updated as well (networkstatic=yes param)? Thanks

Please retest this with an updated installer - this was fixed in https://github.com/coreos/ignition-dracut/pull/130 and the updated bootimage should be available as of https://github.com/openshift/installer/pull/2609/commits/6f7da477e2f3392b3ee9f70df82f68b2db4dd1e2

Thanks Colin, will do

https://github.com/coreos/ignition-dracut/pull/130 is for the issue https://bugzilla.redhat.com/show_bug.cgi?id=1766739, right? For the issue of the ifcfg files being overwritten (this bugzilla), there isn't a fix yet, I think; the files will still be overwritten even if their labeling is correct. So maybe the status needs to return to open? Thanks

Can I change the status of this ticket to NEW? As the previous comment says, the bug that was fixed is a different bugzilla and not this one. Thanks

Colin, can you confirm that the fix for the SELinux labeling problem (BZ#1766739) also fixes this behavior?

@orshoval I tried to reproduce this, but was not able to using 4.3.0-0.nightly-2019-12-05-073829
However, I'm not terribly familiar with the `cluster-network-addons-operator` or `nmstate`, so perhaps I didn't do something correctly.
Have you tried to reproduce this recently?
### 1. Installed OCP 4.3.0-0.nightly-2019-12-05-073829 on AWS, 3 master/3 worker
### 2. Checked default state of worker node
```
$ oc debug node/ip-10-0-152-213.us-west-2.compute.internal
Starting pod/ip-10-0-152-213us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.152.213
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# ip a | grep ens
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
inet 10.0.152.213/20 brd 10.0.159.255 scope global dynamic noprefixroute ens5
sh-4.4# ls /etc/sysconfig/network-scripts/
sh-4.4# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt1)/ostree/rhcos-98216d725c8b567e6c5d6f0a643e5d4a8b6d76357efe0c96aba945195cf06225/vmlinuz-4.18.0-147.0.3.el8_1.x86_64 console=tty0 console=ttyS0,115200n8 rd.luks.options=discard ostree=/ostree/boot.0/rhcos/98216d725c8b567e6c5d6f0a643e5d4a8b6d76357efe0c96aba945195cf06225/0 ignition.platform.id=aws
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...
```
### 3. Installed `cluster-network-addons-operator` per - https://github.com/kubevirt/cluster-network-addons-operator#deployment
```
$ alias kubectl=oc
$ kubectl apply -f https://raw.githubusercontent.com/kubevirt/cluster-network-addons-operator/master/manifests/cluster-network-addons/0.23.0/namespace.yaml
namespace/cluster-network-addons created
$ kubectl apply -f https://raw.githubusercontent.com/kubevirt/cluster-network-addons-operator/master/manifests/cluster-network-addons/0.23.0/network-addons-config.crd.yaml
customresourcedefinition.apiextensions.k8s.io/networkaddonsconfigs.networkaddonsoperator.network.kubevirt.io created
$ kubectl apply -f https://raw.githubusercontent.com/kubevirt/cluster-network-addons-operator/master/manifests/cluster-network-addons/0.23.0/operator.yaml
serviceaccount/cluster-network-addons-operator created
clusterrole.rbac.authorization.k8s.io/cluster-network-addons-operator created
clusterrolebinding.rbac.authorization.k8s.io/cluster-network-addons-operator created
role.rbac.authorization.k8s.io/cluster-network-addons-operator created
rolebinding.rbac.authorization.k8s.io/cluster-network-addons-operator created
deployment.apps/cluster-network-addons-operator created
$ cat nac.yaml
---
apiVersion: networkaddonsoperator.network.kubevirt.io/v1alpha1
kind: NetworkAddonsConfig
metadata:
  name: cluster
spec:
  imagePullPolicy: IfNotPresent
  nmstate: {}
  ovs: {}
$ oc apply -f nac.yaml
networkaddonsconfig.networkaddonsoperator.network.kubevirt.io/cluster created
```
### 4. Applied bridge config
```
$ cat bridge.yaml
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-ens5-policy
spec:
  desiredState:
    interfaces:
    - name: br1
      description: Linux bridge with ens5 as a port
      type: linux-bridge
      state: up
      ipv4:
        dhcp: true
        enabled: true
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens5
$ oc apply -f bridge.yaml
nodenetworkconfigurationpolicy.nmstate.io/br1-ens5-policy created
```
### 5. Checked worker node again; confirmed that the bridge was created and the config files landed
```
$ oc debug node/ip-10-0-139-75.us-west-2.compute.internal
Starting pod/ip-10-0-139-75us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.139.75
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq master br1 state UP group default qlen 1000
link/ether 02:78:01:32:9e:da brd ff:ff:ff:ff:ff:ff
3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 4e:63:e8:8c:60:f3 brd ff:ff:ff:ff:ff:ff
4: br0: <BROADCAST,MULTICAST> mtu 8951 qdisc noop state DOWN group default qlen 1000
link/ether 12:9a:d9:52:26:44 brd ff:ff:ff:ff:ff:ff
5: vxlan_sys_4789: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
link/ether 62:ed:ca:f4:dd:fa brd ff:ff:ff:ff:ff:ff
inet6 fe80::60ed:caff:fef4:ddfa/64 scope link
valid_lft forever preferred_lft forever
6: tun0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether a2:31:9f:25:d1:c1 brd ff:ff:ff:ff:ff:ff
inet 10.129.2.1/23 brd 10.129.3.255 scope global tun0
valid_lft forever preferred_lft forever
inet6 fe80::a031:9fff:fe25:d1c1/64 scope link
valid_lft forever preferred_lft forever
7: veth6af7fe2f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP group default
link/ether e2:16:f0:09:31:7e brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::e016:f0ff:fe09:317e/64 scope link
valid_lft forever preferred_lft forever
8: vethd1dff741@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP group default
link/ether aa:1e:da:bb:85:b6 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::a81e:daff:febb:85b6/64 scope link
valid_lft forever preferred_lft forever
11: veth5d8d2fe3@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP group default
link/ether 7a:03:cf:ae:10:01 brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet6 fe80::7803:cfff:feae:1001/64 scope link
valid_lft forever preferred_lft forever
13: veth7b819c33@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP group default
link/ether 86:4e:19:f2:9b:71 brd ff:ff:ff:ff:ff:ff link-netnsid 3
inet6 fe80::844e:19ff:fef2:9b71/64 scope link
valid_lft forever preferred_lft forever
16: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default qlen 1000
link/ether 02:78:01:32:9e:da brd ff:ff:ff:ff:ff:ff
inet 10.0.139.75/20 brd 10.0.143.255 scope global dynamic noprefixroute br1
valid_lft 3495sec preferred_lft 3495sec
sh-4.4# ls /etc/sysconfig/network-scripts/
ifcfg-Wired_connection_1 ifcfg-br1
sh-4.4# cat /etc/sysconfig/network-scripts/ifcfg-Wired_connection_1
MACADDR=02:78:01:32:9E:DA
MTU=9001
TYPE=Ethernet
NAME="Wired connection 1"
UUID=308c7360-6348-32f5-b668-dadb8c81d704
DEVICE=ens5
ONBOOT=yes
BRIDGE=br1
sh-4.4# cat /etc/sysconfig/network-scripts/ifcfg-br1
STP=no
TYPE=Bridge
PROXY_METHOD=none
BROWSER_ONLY=no
NM_USER_NMSTATE__INTERFACE__DESCRIPTION="Linux bridge with ens5 as a port"
BOOTPROTO=dhcp
DEFROUTE=yes
DHCP_CLIENT_ID=mac
IPV4_FAILURE_FATAL=no
IPV6_DISABLED=yes
IPV6INIT=no
NAME=br1
UUID=1f1df135-20f0-4df8-8cbc-6a7ff9f2f7b0
DEVICE=br1
ONBOOT=yes
AUTOCONNECT_SLAVES=yes
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...
```
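The manual inspection of the two ifcfg files above could also be scripted. The following is only a rough sketch, not something that was run as part of this reproduction; the helper names `ifcfg_get` and `port_in_bridge` are made up for illustration, and it assumes the simple `KEY=value` layout shown in the `cat` output.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of nmstate or NetworkManager.

# Print the value of KEY ($2) from an ifcfg-style file ($1),
# stripping one pair of surrounding double quotes if present.
ifcfg_get() {
    sed -n "s/^$2=//p" "$1" | tail -n 1 | sed 's/^"\(.*\)"$/\1/'
}

# Succeed if the port's ifcfg file ($1) names the given bridge ($2) in BRIDGE=.
port_in_bridge() {
    [ "$(ifcfg_get "$1" BRIDGE)" = "$2" ]
}
```

Inside the `chroot /host` shell, something like `port_in_bridge /etc/sysconfig/network-scripts/ifcfg-Wired_connection_1 br1` would then return success only while the port file still carries `BRIDGE=br1`, which is the setting the original report says is lost.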
### 6. Rebooted node with `systemctl reboot`
### 7. Checked worker node again
```
$ oc debug node/ip-10-0-139-75.us-west-2.compute.internal
Starting pod/ip-10-0-139-75us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.139.75
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# cd /etc/sysconfig/network-scripts/
sh-4.4# ls -latr
total 12
drwxr-xr-x. 3 root root 4096 Dec 5 20:19 ..
-rw-r--r--. 1 root root 323 Dec 5 21:14 ifcfg-br1
-rw-r--r--. 1 root root 151 Dec 5 21:14 ifcfg-Wired_connection_1
drwxr-xr-x. 2 root root 55 Dec 5 21:14 .
sh-4.4# uptime
21:27:57 up 1 min, 0 users, load average: 1.67, 0.59, 0.21
sh-4.4# journalctl --list-boots
-2 edf01840bee04a119008de782af93785 Thu 2019-12-05 20:16:29 UTC—Thu 2019-12-05 20:19:32 UTC
-1 9de4caa95aa4415a9341137b9ca7527f Thu 2019-12-05 20:19:47 UTC—Thu 2019-12-05 21:25:53 UTC
0 3abdd0722c594a9bbee2d4ff7a19b405 Thu 2019-12-05 21:26:09 UTC—Thu 2019-12-05 21:28:01 UTC
sh-4.4# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ca36eace9f177983c7f6431a361d6fbf4aec724a59c56e7db572d35efb386262
CustomOrigin: Managed by machine-config-operator
Version: 43.81.201912050027.0 (2019-12-05T00:32:07Z)
ostree://e884477421640d1285c07a6dd9aaf01c9e125038ebbe6290a5e341eb3695a4d1
Version: 43.81.201911221453.0 (2019-11-22T14:58:44Z)
sh-4.4# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq master br1 state UP group default qlen 1000
link/ether 02:78:01:32:9e:da brd ff:ff:ff:ff:ff:ff
3: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default qlen 1000
link/ether 02:78:01:32:9e:da brd ff:ff:ff:ff:ff:ff
inet 10.0.139.75/20 brd 10.0.143.255 scope global dynamic noprefixroute br1
valid_lft 3527sec preferred_lft 3527sec
8: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 22:48:76:9b:09:d2 brd ff:ff:ff:ff:ff:ff
9: br0: <BROADCAST,MULTICAST> mtu 8951 qdisc noop state DOWN group default qlen 1000
link/ether de:9b:ec:84:2c:4d brd ff:ff:ff:ff:ff:ff
10: vxlan_sys_4789: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
link/ether 5e:bd:06:9f:8c:0c brd ff:ff:ff:ff:ff:ff
inet6 fe80::5cbd:6ff:fe9f:8c0c/64 scope link
valid_lft forever preferred_lft forever
11: tun0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 22:99:3a:f2:9d:b4 brd ff:ff:ff:ff:ff:ff
inet 10.129.2.1/23 brd 10.129.3.255 scope global tun0
valid_lft forever preferred_lft forever
inet6 fe80::2099:3aff:fef2:9db4/64 scope link
valid_lft forever preferred_lft forever
12: veth526b9b62@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP group default
link/ether fe:c8:d6:e9:41:c9 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::fcc8:d6ff:fee9:41c9/64 scope link
valid_lft forever preferred_lft forever
13: vethedb78f17@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP group default
link/ether 2a:f7:aa:53:98:68 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::28f7:aaff:fe53:9868/64 scope link
valid_lft forever preferred_lft forever
14: vetha412308f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP group default
link/ether 5a:7d:b6:d1:cb:13 brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet6 fe80::587d:b6ff:fed1:cb13/64 scope link
valid_lft forever preferred_lft forever
15: veth796cbc83@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP group default
link/ether 8a:f7:21:97:06:9a brd ff:ff:ff:ff:ff:ff link-netnsid 3
inet6 fe80::88f7:21ff:fe97:69a/64 scope link
valid_lft forever preferred_lft forever
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...
```
To be honest, my cluster was unhealthy at this point, but the problem that you reported did not seem to occur.
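For anyone retesting, comparing the files by eye before and after the reboot is error-prone, so a checksum snapshot may help. This is a minimal sketch under assumptions: the snapshot location `/var/tmp/ifcfg.sha256` and the function names are hypothetical, and it relies on GNU coreutils `sha256sum` being available on the host.

```shell
#!/bin/sh
# Sketch only: paths and function names are illustrative assumptions.

IFCFG_DIR="${IFCFG_DIR:-/etc/sysconfig/network-scripts}"
SNAPSHOT="${SNAPSHOT:-/var/tmp/ifcfg.sha256}"   # must be an absolute path

# Record a checksum for every ifcfg-* file (run before rebooting).
snapshot_ifcfg() {
    ( cd "$IFCFG_DIR" && sha256sum ifcfg-* ) > "$SNAPSHOT"
}

# Compare the current files against the snapshot (run after rebooting).
# Returns non-zero if any recorded file was modified or removed.
verify_ifcfg() {
    ( cd "$IFCFG_DIR" && sha256sum --check --quiet "$SNAPSHOT" )
}
```

Run `snapshot_ifcfg` after the policy is applied (step 5), reboot, then run `verify_ifcfg`. A non-zero exit would suggest the files were rewritten during boot. Note that this only catches changed or missing files; ifcfg files that appear after the snapshot are not reported.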
I wanted to point out that the Network Addons Operator/nmstate appears to conflict with the role of MCO here. The MCO is responsible for maintaining the configuration state of the node, for example the config files under /etc. So when `nmstate` (or the operator) causes ifcfg files to be written out to /etc/sysconfig/network-scripts, I'm not sure what will happen when the cluster is upgraded and the MCO tries to reconcile the config state of the node.
We are not able to reproduce this with the 4.3 content listed in comment#11, and given the approaching 4.3 deadline, we will likely be unable to make any additional changes for this BZ. Moving to 4.4. @oshoval, if you can try to reproduce this again or provide additional information, we can take another look in the 4.4 timeframe.

Sorry for the late response. We have a problem with OKD 4.3 on libvirt, so I can't check it. Will update when we are able to run it. Thanks

I believe this is fixed in 4.3. We didn't fix 4.2 but it's not worth respinning for this.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days