Bug 1901859
| Summary: | NodeNetworkConfigurationPolicy failed to retrieve default gw - create VLAN interface | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Robert Bohne <rbohne> |
| Component: | Networking | Assignee: | Quique Llorente <ellorent> |
| Status: | CLOSED ERRATA | QA Contact: | Meni Yakove <myakove> |
| Severity: | unspecified | Priority: | unspecified |
| Version: | 2.5.1 | CC: | cnv-qe-bugs, ellorent, fge, gcofano, mapandey, mbagga, phoracek, swasthan, ysegev |
| Target Release: | 4.8.0 | Type: | Bug |
| Hardware: | Unspecified | OS: | Unspecified |
| Doc Type: | If docs needed, set a value | Last Closed: | 2021-07-27 14:21:17 UTC |
Description
Robert Bohne
2020-11-26 09:38:29 UTC
Created attachment 1733677: NodeNetworkConfigurationEnactment
Created attachment 1733678: NodeNetworkConfigurationEnactment-with-ip
Created attachment 1733679: NodeNetworkState-with-ip
Here is the NNCP with IP:
oc apply -f - <<EOF
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: vlan-ens3f1-policy
spec:
  nodeSelector:
    kubernetes.io/hostname: "ocp-master1"
  desiredState:
    interfaces:
    - name: ens3f1.602
      description: VLAN using ens3f1
      type: vlan
      state: up
      ipv4:
        enabled: true
        dhcp: false
      vlan:
        base-iface: ens3f1
        id: 602
EOF
The NNS says ens3f1 is down, but ip link show reports state UP?
[root@ocp-lb ~]# oc debug node/ocp-master1
Creating debug namespace/openshift-debug-node-qmms2 ...
Starting pod/ocp-master1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.29.26.52
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# ip link show dev ens3f1
3: ens3f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether d4:f5:ef:1a:25:78 brd ff:ff:ff:ff:ff:ff
[root@ocp-lb ~]# oc get nns ocp-master1 -o jsonpath="{.status.currentState.interfaces[?(@.name=='ens3f1')]}" | jq
{
  "ethernet": {
    "auto-negotiation": false,
    "duplex": "full",
    "speed": 10000,
    "sr-iov": {
      "total-vfs": 0,
      "vfs": []
    }
  },
  "ipv4": {
    "enabled": false
  },
  "ipv6": {
    "enabled": false
  },
  "mac-address": "D4:F5:EF:1A:25:78",
  "mtu": 1500,
  "name": "ens3f1",
  "state": "down",
  "type": "ethernet"
}
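To confirm the mismatch between what the NNS reports and what the kernel reports, the jsonpath filter above can be reproduced in a few lines of Python and compared with the `operstate` field printed by `ip -j link show dev ens3f1` on the node (after `chroot /host`). This is a minimal sketch: the function names are hypothetical, and it assumes the node's iproute2 supports JSON output (`-j`).

```python
from typing import Optional


def nns_interface(nns: dict, name: str) -> Optional[dict]:
    """Return the interface entry called `name` from a NodeNetworkState object.

    Mirrors the jsonpath filter
    {.status.currentState.interfaces[?(@.name=='<name>')]}.
    """
    interfaces = nns.get("status", {}).get("currentState", {}).get("interfaces", [])
    return next((iface for iface in interfaces if iface.get("name") == name), None)


def link_state_mismatch(nns_iface: dict, ip_link_entry: dict) -> bool:
    """True when nmstate's `state` and the kernel's `operstate` disagree.

    `ip_link_entry` is one element of the JSON array printed by
    `ip -j link show dev <name>`.
    """
    nmstate_up = nns_iface.get("state") == "up"
    kernel_up = ip_link_entry.get("operstate") == "UP"
    return nmstate_up != kernel_up


# Values taken from the output above: the NNS says "down", ip link says UP.
nns = {"status": {"currentState": {"interfaces": [
    {"name": "ens3f1", "type": "ethernet", "state": "down"},
]}}}
iface = nns_interface(nns, "ens3f1")
print(link_state_mismatch(iface, {"ifname": "ens3f1", "operstate": "UP"}))  # True
```

With real data, load the output of `oc get nns ocp-master1 -o json` and of the `ip -j` command with `json.load`, and pass the first element of the `ip` array.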
Hello Robert,
Could you try it with the following? "ipv4: {enabled: true}" requires an IP to be set on the interface. Since you have neither DHCP nor a static address configured, there is none.
ipv4:
  enabled: false
This would create the interface without an IP.
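Petr's rule of thumb can be checked mechanically before applying a policy. The sketch below is a hypothetical lint (not part of kubernetes-nmstate) that takes a `desiredState` already parsed into a dict and flags interfaces with `ipv4.enabled: true` but neither DHCP nor a static `address`. Treat hits as warnings only: the primary-interface entry in the workaround later in this bug deliberately sets `enabled: true` without an address and relies on inheriting the existing one.

```python
from typing import List


def ipv4_enabled_without_address(desired_state: dict) -> List[str]:
    """Names of interfaces that enable IPv4 without DHCP or a static address."""
    offenders = []
    for iface in desired_state.get("interfaces", []):
        ipv4 = iface.get("ipv4") or {}
        has_source = ipv4.get("dhcp") or ipv4.get("address")
        if ipv4.get("enabled") and not has_source:
            offenders.append(iface.get("name", "<unnamed>"))
    return offenders


# desiredState from the original policy: IPv4 enabled, DHCP off, no address.
original = {"interfaces": [{
    "name": "ens3f1.602", "type": "vlan", "state": "up",
    "ipv4": {"enabled": True, "dhcp": False},
    "vlan": {"base-iface": "ens3f1", "id": 602},
}]}
print(ipv4_enabled_without_address(original))  # ['ens3f1.602']
```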
Hello Petr,
I tried, and it failed too:
oc apply -f - <<EOF
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: vlan-ens3f1-policy
spec:
  nodeSelector:
    kubernetes.io/hostname: "ocp-master1"
  desiredState:
    interfaces:
    - name: ens3f1.602
      description: VLAN using ens3f1
      type: vlan
      state: up
      ipv4:
        enabled: false
        dhcp: false
      vlan:
        base-iface: ens3f1
        id: 602
EOF
Error:
[root@ocp-lb ~]# oc get nodenetworkconfigurationenactment.nmstate.io/ocp-master1.vlan-ens3f1-policy -o jsonpath='{.status.conditions[?(@.type=="Failing")].message}'
error reconciling NodeNetworkConfigurationPolicy at desired state apply: , rolling back desired state configuration: failed runnig probes after network changes: failed to retrieve default gw at runProbes: timed out waiting for the condition
I will attach NodeNetworkState-with-ip-false and NodeNetworkConfigurationEnactment-with-ip-false in a second.
Created attachment 1733695: NodeNetworkState-with-ip-false
Created attachment 1733696: NodeNetworkConfigurationEnactment-with-ip-false
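The error above comes from the connectivity probes that run after the policy is applied; when they fail, the desired state is rolled back. The real probe is Go code in kubernetes-nmstate, so the Python below is only a hypothetical model of the step named in the message. It assumes that "retrieve default gw" means "find an IPv4 `0.0.0.0/0` route with a next hop in `currentState.routes.running`", polled until a timeout; the `timeout` and `interval` values are placeholders, not the handler's real settings.

```python
import time
from typing import Callable, Optional


def find_default_gw(current_state: dict) -> Optional[str]:
    """Next-hop address of the first running IPv4 default route, or None."""
    for route in current_state.get("routes", {}).get("running", []):
        if route.get("destination") == "0.0.0.0/0" and route.get("next-hop-address"):
            return route["next-hop-address"]
    return None


def wait_for_default_gw(get_state: Callable[[], dict],
                        timeout: float = 10.0,
                        interval: float = 1.0) -> str:
    """Poll `get_state` until a default gateway shows up or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        gw = find_default_gw(get_state())
        if gw is not None:
            return gw
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "failed to retrieve default gw: timed out waiting for the condition")
        time.sleep(interval)
```

In this model, `get_state` would wrap something like `oc get nns <node> -o json` and return `status.currentState`; with the post-policy state shown further down (no default route), the poll can only end in the timeout.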
Robert, would you please provide the knmstate log from the most recent run? It should hopefully show us whether nmstatectl did something to the enp2s0 default interface (it did in your original setup, but that may have been due to DHCP).
So, before the configuration, we have the management IP set on ens2f1.602, and we also have a 0.0.0.0/0 route on it:
- ipv4:
    address:
    - ip: 172.29.26.52
      prefix-length: 24
    dhcp: false
    enabled: true
- destination: 0.0.0.0/0
  metric: 402
  next-hop-address: 172.29.26.33
  next-hop-interface: ens2f1.602
  table-id: 254
After the configuration, when we run our connectivity probes, we still see the static IP, but we don't have any default route set:
"ipv4": {
  "address": [
    {
      "ip": "172.29.26.52",
      "prefix-length": 24
    }
  ],
  "dhcp": false,
  "enabled": true
},
The new connection has DHCP clearly disabled:
"ipv4": {
  "dhcp": false,
  "enabled": false
},
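The before/after comparison above can be automated: given the NNS `currentState` captured before and after the policy (for example from the attachments), list the IPv4 default routes that disappeared. A minimal sketch with hypothetical function names; it identifies a route by destination, next-hop interface and next-hop address, and deliberately ignores the metric.

```python
from typing import List, Tuple

RouteKey = Tuple[str, str, str]  # (destination, next-hop-interface, next-hop-address)


def default_routes(state: dict) -> List[RouteKey]:
    """IPv4 default routes listed under routes.running of an nmstate state."""
    return [
        (r.get("destination", ""), r.get("next-hop-interface", ""), r.get("next-hop-address", ""))
        for r in state.get("routes", {}).get("running", [])
        if r.get("destination") == "0.0.0.0/0"
    ]


def lost_default_routes(before: dict, after: dict) -> List[RouteKey]:
    """Default routes present in `before` but missing from `after`."""
    remaining = set(default_routes(after))
    return [route for route in default_routes(before) if route not in remaining]


before = {"routes": {"running": [{
    "destination": "0.0.0.0/0", "metric": 402, "next-hop-address": "172.29.26.33",
    "next-hop-interface": "ens2f1.602", "table-id": 254,
}]}}
after = {"routes": {"running": []}}
print(lost_default_routes(before, after))
# [('0.0.0.0/0', 'ens2f1.602', '172.29.26.33')]
```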
-------------
It seems that nmstatectl removed the default GW even though your default interface was not explicitly touched.
We have a workaround for this: setting the default GW explicitly in the policy. However, this is not the first time we have seen this issue, so we should investigate it properly.
Quique, could you find some time to assist Robert with the workaround? But please don't let him get away until we have a proper fix for this bug <.<
I have been able to ensure that nmstate does not remove the VLAN's default gw by adding the whole configuration to the policy.
The values are from my environment, so you will have to adapt them to yours:
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: vlan-bug
spec:
  nodeSelector:
    kubernetes.io/hostname: "node02"
  desiredState:
    routes:
      config:
      - destination: 0.0.0.0/0
        next-hop-address: 192.168.66.2
        next-hop-interface: eth0.602
    interfaces:
    - name: eth0.602
      description: VLAN using ens3f1
      type: vlan
      state: up
      ipv4:
        address:
        - ip: 172.29.25.52
          prefix-length: 24
        dhcp: false
        enabled: true
      vlan:
        base-iface: eth0
        id: 602
    - name: eth1.602
      description: VLAN using ens3f1
      type: vlan
      state: up
      ipv4:
        address:
        - ip: 172.29.26.52
          prefix-length: 24
        dhcp: false
        enabled: true
      vlan:
        base-iface: eth1
        id: 602
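Writing such a policy by hand means copying the primary interface and its gateway out of the NNS. The sketch below derives that part of the `desiredState` from `currentState`. It is hypothetical helper code, not part of kubernetes-nmstate: it assumes the primary interface has static IPv4 addresses (it refuses DHCP interfaces), copies a `vlan` block if present, and emits the default route in the absent-then-add form that Gris Ge recommends later in this bug, so that a differing metric does not leave two default gateways. The output is JSON, which `oc apply -f -` accepts just like YAML.

```python
import json
from typing import List


def pin_primary_interface(current_state: dict, extra_interfaces: List[dict]) -> dict:
    """Build a desiredState that restates the interface carrying the IPv4 default
    route (with its static addresses) and that route, plus `extra_interfaces`."""
    routes = current_state.get("routes", {}).get("running", [])
    default = next((r for r in routes if r.get("destination") == "0.0.0.0/0"), None)
    if default is None:
        raise ValueError("no IPv4 default route in currentState")

    name = default["next-hop-interface"]
    interfaces = current_state.get("interfaces", [])
    primary = next((i for i in interfaces if i.get("name") == name), None)
    if primary is None:
        raise ValueError(f"interface {name} not found in currentState")
    ipv4 = primary.get("ipv4", {})
    if ipv4.get("dhcp"):
        raise ValueError(f"{name} uses DHCP; pinning static addresses would be wrong")

    pinned = {
        "name": name,
        "type": primary["type"],
        "state": "up",
        "ipv4": {"enabled": True, "dhcp": False, "address": ipv4.get("address", [])},
    }
    if "vlan" in primary:  # keep base-iface/id for VLAN primaries such as ens2f1.602
        pinned["vlan"] = primary["vlan"]

    return {
        "routes": {"config": [
            {"destination": "0.0.0.0/0", "state": "absent"},
            {"destination": "0.0.0.0/0",
             "next-hop-address": default["next-hop-address"],
             "next-hop-interface": name},
        ]},
        "interfaces": [pinned] + list(extra_interfaces),
    }


def nncp(policy_name: str, hostname: str, desired_state: dict) -> dict:
    """Wrap a desiredState in a NodeNetworkConfigurationPolicy manifest."""
    return {
        "apiVersion": "nmstate.io/v1alpha1",
        "kind": "NodeNetworkConfigurationPolicy",
        "metadata": {"name": policy_name},
        "spec": {"nodeSelector": {"kubernetes.io/hostname": hostname},
                 "desiredState": desired_state},
    }


# Example currentState shaped like the ocp-master1 NNS excerpts in this bug.
current = {
    "interfaces": [{
        "name": "ens2f1.602", "type": "vlan", "state": "up",
        "ipv4": {"enabled": True, "dhcp": False,
                 "address": [{"ip": "172.29.26.52", "prefix-length": 24}]},
        "vlan": {"base-iface": "ens2f1", "id": 602},
    }],
    "routes": {"running": [{"destination": "0.0.0.0/0", "metric": 402,
                            "next-hop-address": "172.29.26.33",
                            "next-hop-interface": "ens2f1.602", "table-id": 254}]},
}
new_vlan = {"name": "ens3f1.602", "type": "vlan", "state": "up",
            "ipv4": {"enabled": False}, "vlan": {"base-iface": "ens3f1", "id": 602}}
manifest = nncp("network-config-ocp-master1", "ocp-master1",
                pin_primary_interface(current, [new_vlan]))
print(json.dumps(manifest, indent=2))
```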
After some investigation together with Quique (thanks again!): the workaround is to add the primary interface to the NNCP; its static IP address is then inherited from the existing configuration.
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: network-config-ocp-master1
spec:
  nodeSelector:
    kubernetes.io/hostname: "ocp-master1"
  desiredState:
    interfaces:
    - name: ens2f1.602  # Primary interface, configured during installation via live ISO + nmcli/nmtui (static IP).
      description: VLAN using ens2f1
      type: vlan
      state: up
      ipv4:
        dhcp: false
        enabled: true  # Enabled because the configuration is inherited from the existing NM config.
      vlan:
        base-iface: ens2f1
        id: 602
    - name: ens3f1.602
      description: VLAN using ens3f1
      type: vlan
      state: up
      ipv4:
        dhcp: false
        enabled: false
      vlan:
        base-iface: ens3f1
        id: 602
    - name: br1
      description: Linux bridge with ens3f1.602 as a port
      type: linux-bridge
      state: up
      ipv4:
        enabled: false
        dhcp: false
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens3f1.602
@Quique Sadly, I no longer have access to the customer environment. If I remember correctly, you were able to reproduce the bug. Thank you very much again for your support!
CNV seems to ignore the metric of the route (default gateway), but nmstate does not. This causes nmstate to merge the desired state with the current state and generate two default gateways. nmstate-1.0.0-1.el8 supports multiple gateways, so there is no problem there.

Gris, this indeed explains the issues with the route setup workaround. However, that issue was secondary. The main problem is that configuring a VLAN interface (as described in https://bugzilla.redhat.com/show_bug.cgi?id=1901859#c0) deletes the system's default route. IIUIC there is no connection between the two interfaces, and the VLAN config should not affect the host's connectivity. Any clue what might have caused this? I understand if it is impossible to figure this out without a system reproducing it.

No idea. Let me backport the multiple gateway support into nmstate-0.3, and we can try again there.

I took a second look at this bug.

Initially, the issue in https://bugzilla.redhat.com/show_bug.cgi?id=1901859#c0 is caused by IPv4 being enabled without an IP address.

Then, the multiple default gateways are caused by the default gateway in the desired state having metric 402, which differs from the running config; hence nmstate treats it as two default gateways, which leads to the NmstateNotImplementedError error.
The better way to set a default gateway is:

```yml
routes:
  config:
  - destination: 0.0.0.0/0
    state: absent
  - destination: 0.0.0.0/0
    next-hop-address: 192.0.2.1
    next-hop-interface: eth1
```

I tried this in my VM. nmstate-0.3 does not remove the default gateways when adding a new VLAN if the default gateways were created by NetworkManager.

For IP addresses and routes created by other tools (like the ip command or kernel/dracut options), nmstate-0.3 does not support this yet. 1.0 in RHEL 8.4 should work well there.

To continue debugging this issue, the nmstate logs and the pre-nmstate network state would help.

(In reply to Gris Ge from comment #24)
> I take a second look on this bug.
> [...]
> For ip address and routes created by other tool(like ip command or
> kernel/dracut option), nmstate-0.3 does not support it yet. 1.0 in RHEL 8.4
> should works well there.

But it has been configured with NetworkManager, as stated in https://bugzilla.redhat.com/show_bug.cgi?id=1901859#c13, so nmstate should be aware of it.

> To continue debuging this issue, the nmstate logs and pre-nmstate network
> state could helps.

We don't have this environment anymore, but we have a similar BZ where maybe we can get the information needed: https://bugzilla.redhat.com/show_bug.cgi?id=1879458. It would also be nice to retest this with the fix for the duplicate default gw issue in nmstate (https://bugzilla.redhat.com/show_bug.cgi?id=1909729) to see what happens.

@fge, thanks for your help here. Would you please check Quique's questions and suggestions in the comment above?

*** Bug 1879458 has been marked as a duplicate of this bug. ***

Hi Quique,

Bug 1909729 has been shipped to RHEL 8.3.0.z. Could you check again whether it fixes this bug or not?
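Bug 1909729 concerns the duplicate default gateway problem Gris describes in comment #24 above: a desired default route whose metric differs from the running one is treated as a second gateway. The toy model below illustrates that explanation and why the absent-then-add form avoids it. It is not nmstate's real merge code: it assumes a route is identified by destination, next-hop interface, next-hop address and metric (unset metric kept as `None`), and that an `absent` entry matches every route agreeing on the properties it sets. Here the running route carries metric 402, as in the NNS output earlier, and the desired one sets none.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, str, Optional[int]]


def route_key(route: dict) -> Key:
    """Identity of a route in this model; the metric is part of it (None if unset)."""
    return (route.get("destination", ""), route.get("next-hop-interface", ""),
            route.get("next-hop-address", ""), route.get("metric"))


def matches_absent(route: dict, absent: dict) -> bool:
    """An absent entry matches every route that agrees on the properties it sets."""
    return all(route.get(k) == v for k, v in absent.items() if k != "state")


def merge_routes(current: List[dict], desired: List[dict]) -> List[dict]:
    """Drop current routes matched by absent entries, then add the desired routes."""
    absent = [r for r in desired if r.get("state") == "absent"]
    wanted = [r for r in desired if r.get("state") != "absent"]
    merged: Dict[Key, dict] = {
        route_key(r): r for r in current if not any(matches_absent(r, a) for a in absent)
    }
    for r in wanted:
        merged[route_key(r)] = r
    return list(merged.values())


def count_default_gws(routes: List[dict]) -> int:
    return sum(1 for r in routes if r.get("destination") == "0.0.0.0/0")


running = [{"destination": "0.0.0.0/0", "metric": 402,
            "next-hop-address": "172.29.26.33", "next-hop-interface": "ens2f1.602"}]
gateway = {"destination": "0.0.0.0/0",
           "next-hop-address": "172.29.26.33", "next-hop-interface": "ens2f1.602"}

print(count_default_gws(merge_routes(running, [gateway])))  # 2
print(count_default_gws(merge_routes(
    running, [{"destination": "0.0.0.0/0", "state": "absent"}, gateway])))  # 1
```

Keying on the metric is what produces the duplicate in this model; the wildcard `absent` entry clears every existing default route first, so only the restated gateway remains.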
@ysegev is going to verify it.

Hi Gris,
Can you please specify in which nmstate (or knmstate-handler) version it was fixed, so we can be sure the fix exists on our cluster before trying to verify? Thanks.

Hi Yossi: nmstate-0.3.4-25.el8_3

Verified on a cluster with the following versions:
nmstate-0.3.4-25.el8_3.noarch
kubernetes-nmstate-handler-container-v4.8.0-3 (6a66d2c9e338103d5573289afdeb856c4d1f2b86669206851eef189ea5d0e88f)
OCP: 4.8.0-0.nightly-2021-03-04-014703
CNV: 4.8.0
Verified by running the original scenario from the bug description (adjusted to the cluster in use: the selected worker hostname and NIC name), applying this NNCP:
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: vlan-ens8-policy
spec:
  nodeSelector:
    kubernetes.io/hostname: "network01-9rfjb-worker-0-jscck"
  desiredState:
    interfaces:
    - name: ens8.602
      description: VLAN using ens8
      type: vlan
      state: up
      vlan:
        base-iface: ens8
        id: 602
Results:
1. NNCP successfully configured.
2. No error message.
3. Configured VLAN interface (ens8.602) exists on the selected node.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2920