Description of problem:
We are trying to create a bridge that is connected to the management interface and has DHCP. All goes well, but when we reboot the node, the slave interface file under /etc/sysconfig/network-scripts gets overridden: it no longer points to its newly created master bridge, and the configuration becomes broken.

Version-Release number of selected component (if applicable):
43.81.20191027.3, but also versions before that (official untouched latest 4.3)
./hack/update-rhcos-bootimage.py https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.3/43.81.20191027.3/x86_64/meta.json

How reproducible:
Create a bridge that is connected to the management slave and has DHCP, check the `ip a` state of the management interface and the bridge and see that both have the same IP and MAC, then reboot. After the reboot, `ip a` no longer shows the same state as before, and the slave file under /etc/sysconfig/network-scripts has changed compared to its pre-reboot contents.

Steps to Reproduce:
1. Create a bridge that is connected to the management slave and has DHCP, for example with nmstate after installing cluster-network-addons (the following example assumes the management interface is ens3):

```
cat <<EOF > bridge.yaml
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-eth1-policy
spec:
  desiredState:
    interfaces:
    - name: br1
      description: Linux bridge with eth1 as a port
      type: linux-bridge
      state: up
      ipv4:
        dhcp: true
        enabled: true
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens3
EOF
```

2. Delete the network addons config if you want to reduce noise during the next reboot.
3. Back up the /etc/sysconfig/network-scripts/ifcfg-ens3 slave file and check with `ip a` that the bridge and the slave have the same IP and MAC.
4. Reboot the node.
5. Diff /etc/sysconfig/network-scripts/ifcfg-ens3 against the backup from step 3 and see that the bridge is no longer connected to the slave there, nor in `ip a`.
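The file check in steps 3–5 can be sketched as a small helper. This is a minimal illustration, not part of the reproducer: the function name and backup path are invented, and the check assumes the slave-to-bridge link is expressed by a `BRIDGE=` line in the ifcfg file, as in the examples later in this bug.

```shell
# Hypothetical helper: did the ifcfg file keep its BRIDGE= line across a reboot?
#   $1 = backup of the slave ifcfg taken before the reboot (step 3)
#   $2 = the file on disk after the reboot (step 5)
# Returns non-zero if the backup had a BRIDGE= line but the current file lost it.
bridge_link_kept() {
    backup=$1
    current=$2
    grep -q '^BRIDGE=' "$backup" || return 0   # no bridge link to lose
    grep -q '^BRIDGE=' "$current"              # fails => slave was rewritten
}
```

Run on the node after the reboot, e.g. `bridge_link_kept /root/ifcfg-ens3.bak /etc/sysconfig/network-scripts/ifcfg-ens3 || echo "slave no longer points at its bridge"` (the backup path is whatever you chose in step 3).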
Actual results:
The bridge is no longer connected to the slave and the configuration is broken.

Expected results:
The configuration should be kept intact.

Additional info:
A wild guess is that code like the one at the bottom of https://github.com/cgwalters/ignition-dracut/blob/e0ad7ecb9def2c7f05db26207b25d73fa8f844c1/dracut/30ignition/persist-ifcfg.sh overrides this folder as part of the Ignition run. However, editing that file on the node and removing the specific ifcfg-ens3 from the source folder before it overwrites the destination folder didn't help, so I'm not at all sure it comes from there.
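To make the suspicion concrete, here is a hedged sketch of what a persist-ifcfg-style step *could* look like. The function name, paths, and copy logic below are assumptions for illustration only, not the actual ignition-dracut code; the point is that an unconditional copy of early-boot ifcfg files over /etc would clobber the `BRIDGE=` edit that nmstate made after installation.

```shell
# ASSUMED sketch, not the real persist-ifcfg.sh: copy ifcfg files produced
# during early boot over the persistent network-scripts directory.
persist_ifcfg() {
    src=$1   # e.g. ifcfg files generated in the initramfs (illustrative)
    dst=$2   # e.g. /etc/sysconfig/network-scripts
    for f in "$src"/ifcfg-*; do
        [ -e "$f" ] || continue
        cp -f "$f" "$dst/"   # unconditional copy: any post-install edits are lost
    done
}
```

If something of this shape runs on every boot, the slave file rewritten by nmstate would be reset to its pre-bridge contents, which matches the diff observed in step 5.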
This may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1762614 or https://bugzilla.redhat.com/show_bug.cgi?id=1762509
Thanks, I was mistaken about the version (it was based on 8.0), so I need to try with a version based on 8.1 and see what happens. But first we encountered a problem that is out of this ticket's scope:

```
2019-10-29 15:13:10,455 root ERROR NM main-loop aborted: Connection update failed: error=nm-settings-error-quark: failed to update connection: Could not open file '/etc/sysconfig/network-scripts/ifcfg-ens3' for writing: Permission denied (3), dev=ens3/<enum NM_DEVICE_STATE_ACTIVATED of type NM.DeviceState>
```

Also, if the problem does occur, https://bugzilla.redhat.com/show_bug.cgi?id=1760262 may be a possible workaround.
Filed https://bugzilla.redhat.com/show_bug.cgi?id=1766739 for the SELinux issue
Manually labelling SELinux until that bug's fix is included in a new image. Back to this bug's original title: verified, and I indeed see ifcfg files being overridden on VERSION="43.81.20191029.3" after reboot. I tried some workarounds without luck, for example https://bugzilla.redhat.com/show_bug.cgi?id=1760262:

```
sudo su
echo 'omit_dracutmodules+="ifcfg network"' > /etc/dracut.conf.d/ifcfg.conf
dracut --print-cmdline
dracut -fv
```

Is there a known workaround for this in the meantime? Maybe GRUB needs to be updated as well (the networkstatic=yes param)? Thanks
Please retest this with an updated installer - this was fixed in https://github.com/coreos/ignition-dracut/pull/130 and the updated bootimage should be available as of https://github.com/openshift/installer/pull/2609/commits/6f7da477e2f3392b3ee9f70df82f68b2db4dd1e2
Thanks Colin, will do. https://github.com/coreos/ignition-dracut/pull/130 is for https://bugzilla.redhat.com/show_bug.cgi?id=1766739, right? For the overwriting of ifcfg files (this bugzilla) there isn't a fix yet, I think; the files will still be overwritten even if their labeling is correct. So maybe the status needs to return to open? Thanks
Can I change the status of this ticket to NEW? As the previous comment says, the bug that was fixed is another bugzilla, not this one. Thanks
Colin, can you confirm that the fix for the SELinux labeling problem (BZ#1766739) also fixes this behavior?
@orshoval I tried to reproduce this, but was not able to using 4.3.0-0.nightly-2019-12-05-073829. However, I'm not terribly familiar with the `cluster-network-addons-operator` or `nmstate`, so perhaps I didn't do something correctly. Have you tried to reproduce this recently?

### 1. Installed OCP 4.3.0-0.nightly-2019-12-05-073829 on AWS, 3 master/3 worker

### 2. Checked default state of worker node

```
$ oc debug node/ip-10-0-152-213.us-west-2.compute.internal
Starting pod/ip-10-0-152-213us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.152.213
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# ip a | grep ens
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    inet 10.0.152.213/20 brd 10.0.159.255 scope global dynamic noprefixroute ens5
sh-4.4# ls /etc/sysconfig/network-scripts/
sh-4.4# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt1)/ostree/rhcos-98216d725c8b567e6c5d6f0a643e5d4a8b6d76357efe0c96aba945195cf06225/vmlinuz-4.18.0-147.0.3.el8_1.x86_64 console=tty0 console=ttyS0,115200n8 rd.luks.options=discard ostree=/ostree/boot.0/rhcos/98216d725c8b567e6c5d6f0a643e5d4a8b6d76357efe0c96aba945195cf06225/0 ignition.platform.id=aws
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...
```

### 3.
Installed `cluster-network-addons-operator` per https://github.com/kubevirt/cluster-network-addons-operator#deployment

```
$ alias kubectl=oc
$ kubectl apply -f https://raw.githubusercontent.com/kubevirt/cluster-network-addons-operator/master/manifests/cluster-network-addons/0.23.0/namespace.yaml
namespace/cluster-network-addons created
$ kubectl apply -f https://raw.githubusercontent.com/kubevirt/cluster-network-addons-operator/master/manifests/cluster-network-addons/0.23.0/network-addons-config.crd.yaml
customresourcedefinition.apiextensions.k8s.io/networkaddonsconfigs.networkaddonsoperator.network.kubevirt.io created
$ kubectl apply -f https://raw.githubusercontent.com/kubevirt/cluster-network-addons-operator/master/manifests/cluster-network-addons/0.23.0/operator.yaml
serviceaccount/cluster-network-addons-operator created
clusterrole.rbac.authorization.k8s.io/cluster-network-addons-operator created
clusterrolebinding.rbac.authorization.k8s.io/cluster-network-addons-operator created
role.rbac.authorization.k8s.io/cluster-network-addons-operator created
rolebinding.rbac.authorization.k8s.io/cluster-network-addons-operator created
deployment.apps/cluster-network-addons-operator created
$ cat nac.yaml
---
apiVersion: networkaddonsoperator.network.kubevirt.io/v1alpha1
kind: NetworkAddonsConfig
metadata:
  name: cluster
spec:
  imagePullPolicy: IfNotPresent
  nmstate: {}
  ovs: {}
$ oc apply -f nac.yaml
networkaddonsconfig.networkaddonsoperator.network.kubevirt.io/cluster created
```

### 4. Applied bridge config

```
$ cat bridge.yaml
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-ens5-policy
spec:
  desiredState:
    interfaces:
    - name: br1
      description: Linux bridge with ens5 as a port
      type: linux-bridge
      state: up
      ipv4:
        dhcp: true
        enabled: true
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens5
$ oc apply -f bridge.yaml
nodenetworkconfigurationpolicy.nmstate.io/br1-ens5-policy created
```

### 5.
Checked worker node again; confirmed that the bridge is created and the config files have landed.

```
$ oc debug node/ip-10-0-139-75.us-west-2.compute.internal
Starting pod/ip-10-0-139-75us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.139.75
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq master br1 state UP group default qlen 1000
    link/ether 02:78:01:32:9e:da brd ff:ff:ff:ff:ff:ff
3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 4e:63:e8:8c:60:f3 brd ff:ff:ff:ff:ff:ff
4: br0: <BROADCAST,MULTICAST> mtu 8951 qdisc noop state DOWN group default qlen 1000
    link/ether 12:9a:d9:52:26:44 brd ff:ff:ff:ff:ff:ff
5: vxlan_sys_4789: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
    link/ether 62:ed:ca:f4:dd:fa brd ff:ff:ff:ff:ff:ff
    inet6 fe80::60ed:caff:fef4:ddfa/64 scope link
       valid_lft forever preferred_lft forever
6: tun0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether a2:31:9f:25:d1:c1 brd ff:ff:ff:ff:ff:ff
    inet 10.129.2.1/23 brd 10.129.3.255 scope global tun0
       valid_lft forever preferred_lft forever
    inet6 fe80::a031:9fff:fe25:d1c1/64 scope link
       valid_lft forever preferred_lft forever
7: veth6af7fe2f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP group default
    link/ether e2:16:f0:09:31:7e brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::e016:f0ff:fe09:317e/64 scope link
       valid_lft forever preferred_lft forever
8: vethd1dff741@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP group default
    link/ether aa:1e:da:bb:85:b6 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::a81e:daff:febb:85b6/64 scope link
       valid_lft forever preferred_lft forever
11: veth5d8d2fe3@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP group default
    link/ether 7a:03:cf:ae:10:01 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::7803:cfff:feae:1001/64 scope link
       valid_lft forever preferred_lft forever
13: veth7b819c33@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP group default
    link/ether 86:4e:19:f2:9b:71 brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::844e:19ff:fef2:9b71/64 scope link
       valid_lft forever preferred_lft forever
16: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default qlen 1000
    link/ether 02:78:01:32:9e:da brd ff:ff:ff:ff:ff:ff
    inet 10.0.139.75/20 brd 10.0.143.255 scope global dynamic noprefixroute br1
       valid_lft 3495sec preferred_lft 3495sec
sh-4.4# ls /etc/sysconfig/network-scripts/
ifcfg-Wired_connection_1  ifcfg-br1
sh-4.4# cat /etc/sysconfig/network-scripts/ifcfg-Wired_connection_1
MACADDR=02:78:01:32:9E:DA
MTU=9001
TYPE=Ethernet
NAME="Wired connection 1"
UUID=308c7360-6348-32f5-b668-dadb8c81d704
DEVICE=ens5
ONBOOT=yes
BRIDGE=br1
sh-4.4# cat /etc/sysconfig/network-scripts/ifcfg-br1
STP=no
TYPE=Bridge
PROXY_METHOD=none
BROWSER_ONLY=no
NM_USER_NMSTATE__INTERFACE__DESCRIPTION="Linux bridge with ens5 as a port"
BOOTPROTO=dhcp
DEFROUTE=yes
DHCP_CLIENT_ID=mac
IPV4_FAILURE_FATAL=no
IPV6_DISABLED=yes
IPV6INIT=no
NAME=br1
UUID=1f1df135-20f0-4df8-8cbc-6a7ff9f2f7b0
DEVICE=br1
ONBOOT=yes
AUTOCONNECT_SLAVES=yes
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...
```

### 6. Rebooted node with `systemctl reboot`

### 7. Checked worker node again

```
$ oc debug node/ip-10-0-139-75.us-west-2.compute.internal
Starting pod/ip-10-0-139-75us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.139.75
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# cd /etc/sysconfig/network-scripts/
sh-4.4# ls -latr
total 12
drwxr-xr-x. 3 root root 4096 Dec  5 20:19 ..
-rw-r--r--. 1 root root  323 Dec  5 21:14 ifcfg-br1
-rw-r--r--. 1 root root  151 Dec  5 21:14 ifcfg-Wired_connection_1
drwxr-xr-x. 2 root root   55 Dec  5 21:14 .
sh-4.4# uptime
 21:27:57 up 1 min,  0 users,  load average: 1.67, 0.59, 0.21
sh-4.4# journalctl --list-boots
-2 edf01840bee04a119008de782af93785 Thu 2019-12-05 20:16:29 UTC—Thu 2019-12-05 20:19:32 UTC
-1 9de4caa95aa4415a9341137b9ca7527f Thu 2019-12-05 20:19:47 UTC—Thu 2019-12-05 21:25:53 UTC
 0 3abdd0722c594a9bbee2d4ff7a19b405 Thu 2019-12-05 21:26:09 UTC—Thu 2019-12-05 21:28:01 UTC
sh-4.4# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ca36eace9f177983c7f6431a361d6fbf4aec724a59c56e7db572d35efb386262
              CustomOrigin: Managed by machine-config-operator
                   Version: 43.81.201912050027.0 (2019-12-05T00:32:07Z)
  ostree://e884477421640d1285c07a6dd9aaf01c9e125038ebbe6290a5e341eb3695a4d1
                   Version: 43.81.201911221453.0 (2019-11-22T14:58:44Z)
sh-4.4# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq master br1 state UP group default qlen 1000
    link/ether 02:78:01:32:9e:da brd ff:ff:ff:ff:ff:ff
3: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default qlen 1000
    link/ether 02:78:01:32:9e:da brd ff:ff:ff:ff:ff:ff
    inet 10.0.139.75/20 brd 10.0.143.255 scope global dynamic noprefixroute br1
       valid_lft 3527sec preferred_lft 3527sec
8: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 22:48:76:9b:09:d2 brd ff:ff:ff:ff:ff:ff
9: br0: <BROADCAST,MULTICAST> mtu 8951 qdisc noop state DOWN group default qlen 1000
    link/ether de:9b:ec:84:2c:4d brd ff:ff:ff:ff:ff:ff
10: vxlan_sys_4789: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
    link/ether 5e:bd:06:9f:8c:0c brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5cbd:6ff:fe9f:8c0c/64 scope link
       valid_lft forever preferred_lft forever
11: tun0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 22:99:3a:f2:9d:b4 brd ff:ff:ff:ff:ff:ff
    inet 10.129.2.1/23 brd 10.129.3.255 scope global tun0
       valid_lft forever preferred_lft forever
    inet6 fe80::2099:3aff:fef2:9db4/64 scope link
       valid_lft forever preferred_lft forever
12: veth526b9b62@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP group default
    link/ether fe:c8:d6:e9:41:c9 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::fcc8:d6ff:fee9:41c9/64 scope link
       valid_lft forever preferred_lft forever
13: vethedb78f17@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP group default
    link/ether 2a:f7:aa:53:98:68 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::28f7:aaff:fe53:9868/64 scope link
       valid_lft forever preferred_lft forever
14: vetha412308f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP group default
    link/ether 5a:7d:b6:d1:cb:13 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::587d:b6ff:fed1:cb13/64 scope link
       valid_lft forever preferred_lft forever
15: veth796cbc83@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP group default
    link/ether 8a:f7:21:97:06:9a brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::88f7:21ff:fe97:69a/64 scope link
       valid_lft forever preferred_lft forever
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...
```

To be honest, my cluster was unhealthy at this point, but the problem that you reported did not seem to occur.

I also wanted to point out that the Network Addons Operator/nmstate appears to conflict with the role of the MCO here. The MCO is responsible for maintaining the configuration state of the node, for example the config files under /etc. So when `nmstate` (or the operator) causes ifcfg files to be written out to /etc/sysconfig/network-scripts, I'm not sure what will happen when the cluster is upgraded and the MCO tries to reconcile the config state of the node.
We are not able to reproduce this with the 4.3 content listed in comment#11, and given the approaching 4.3 deadline, we will likely be unable to make any additional changes for this BZ. Moving to 4.4. @oshoval, if you can try to reproduce this again or provide additional information, we can take another look in the 4.4 timeframe.
Sorry for the late response; we have a problem with OKD 4.3 on libvirt, so I can't check it. Will update when we are able to run it. Thanks
I believe this is fixed in 4.3. We didn't fix 4.2 but it's not worth respinning for this.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days