Bug 2047445
| Summary: | ovs-configure mis-detecting the ipv6 status on IPv4 only cluster causing Deployment failure | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Victor Voronkov <vvoronko> |
| Component: | Networking | Assignee: | Ben Nemec <bnemec> |
| Networking sub component: | ovn-kubernetes | QA Contact: | Anurag saxena <anusaxen> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | anusaxen, bnemec, calfonso, ccrum, cgoncalves, elevin, mcornea, omichael, sasha, vpickard, wking, yporagpa, yprokule |
| Version: | 4.10 | Keywords: | Regression |
| Target Milestone: | --- | | |
| Target Release: | 4.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-10 10:44:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2048836 | | |
master-0-0:
[core@master-0-0 ~]$ nmcli con show
NAME UUID TYPE DEVICE
Wired Connection 9d5c7c3b-9130-4a40-b31f-c99cad4da283 ethernet enp0s3
Wired Connection 84a523ff-ee8a-4a29-94ca-47590eb0cb76 ethernet enp0s4
[core@master-0-0 ~]$ nmcli -m multiline --get-values ip6.address conn show 84a523ff-ee8a-4a29-94ca-47590eb0cb76
IP6.ADDRESS[1]:fe80::5054:ff:fe6e:6923/64
[core@master-0-0 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:72:6c:90 brd ff:ff:ff:ff:ff:ff
3: enp0s4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:6e:69:23 brd ff:ff:ff:ff:ff:ff
inet 192.168.123.109/24 brd 192.168.123.255 scope global dynamic noprefixroute enp0s4
valid_lft 2621sec preferred_lft 2621sec
inet6 fe80::5054:ff:fe6e:6923/64 scope link noprefixroute
valid_lft forever preferred_lft forever
22: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 82:8d:01:2c:b3:41 brd ff:ff:ff:ff:ff:ff
23: br-int: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN group default qlen 1000
link/ether 9a:6d:68:a6:d5:36 brd ff:ff:ff:ff:ff:ff
24: ovn-k8s-mp0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 5a:e5:2b:53:5c:5e brd ff:ff:ff:ff:ff:ff
inet 10.130.0.2/23 brd 10.130.1.255 scope global ovn-k8s-mp0
valid_lft forever preferred_lft forever
inet6 fe80::58e5:2bff:fe53:5c5e/64 scope link
valid_lft forever preferred_lft forever
25: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
link/ether 3e:b2:5b:bc:a6:7f brd ff:ff:ff:ff:ff:ff
inet6 fe80::3cb2:5bff:febc:a67f/64 scope link
valid_lft forever preferred_lft forever
[core@master-0-0 ~]$ journalctl -u NetworkManager-wait-online.service
-- Logs begin at Thu 2022-01-27 16:39:23 UTC, end at Thu 2022-01-27 21:15:28 UTC. --
Jan 27 16:39:37 master-0-0 systemd[1]: Starting Network Manager Wait Online...
Jan 27 16:39:37 master-0-0 systemd[1]: Started Network Manager Wait Online.
Jan 27 16:41:05 master-0-0 systemd[1]: NetworkManager-wait-online.service: Succeeded.
Jan 27 16:41:05 master-0-0 systemd[1]: Stopped Network Manager Wait Online.
Jan 27 16:41:05 master-0-0 systemd[1]: NetworkManager-wait-online.service: Consumed 0 CPU time
-- Reboot --
Jan 27 16:41:26 localhost systemd[1]: Starting Network Manager Wait Online...
Jan 27 16:42:27 master-0-0 systemd[1]: NetworkManager-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Jan 27 16:42:27 master-0-0 systemd[1]: NetworkManager-wait-online.service: Failed with result 'exit-code'.
Jan 27 16:42:27 master-0-0 systemd[1]: Failed to start Network Manager Wait Online.
Jan 27 16:42:27 master-0-0 systemd[1]: NetworkManager-wait-online.service: Consumed 43ms CPU time
*** Bug 2048535 has been marked as a duplicate of this bug. ***

*** Bug 2048966 has been marked as a duplicate of this bug. ***

Verified on IPv4 cluster with OVN at build 4.11.0-0.nightly-2022-02-01-062253

[kni@provisionhost-0-0 ~]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-02-01-062253   True        False         134m    Cluster version is 4.11.0-0.nightly-2022-02-01-062253

[kni@provisionhost-0-0 ~]$ oc get nodes -o wide
NAME         STATUS   ROLES    AGE    VERSION           INTERNAL-IP       EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
master-0-0   Ready    master   162m   v1.23.3+b63be7f   192.168.123.53    <none>        Red Hat Enterprise Linux CoreOS 410.84.202201312356-0 (Ootpa)   4.18.0-305.34.2.el8_4.x86_64   cri-o://1.23.0-108.rhaos4.10.gitb15fee5.el8
master-0-1   Ready    master   162m   v1.23.3+b63be7f   192.168.123.74    <none>        Red Hat Enterprise Linux CoreOS 410.84.202201312356-0 (Ootpa)   4.18.0-305.34.2.el8_4.x86_64   cri-o://1.23.0-108.rhaos4.10.gitb15fee5.el8
master-0-2   Ready    master   162m   v1.23.3+b63be7f   192.168.123.117   <none>        Red Hat Enterprise Linux CoreOS 410.84.202201312356-0 (Ootpa)   4.18.0-305.34.2.el8_4.x86_64   cri-o://1.23.0-108.rhaos4.10.gitb15fee5.el8
worker-0-0   Ready    worker   144m   v1.23.3+b63be7f   192.168.123.72    <none>        Red Hat Enterprise Linux CoreOS 410.84.202201312356-0 (Ootpa)   4.18.0-305.34.2.el8_4.x86_64   cri-o://1.23.0-108.rhaos4.10.gitb15fee5.el8
worker-0-1   Ready    worker   144m   v1.23.3+b63be7f   192.168.123.62    <none>        Red Hat Enterprise Linux CoreOS 410.84.202201312356-0 (Ootpa)   4.18.0-305.34.2.el8_4.x86_64   cri-o://1.23.0-108.rhaos4.10.gitb15fee5.el8

[kni@provisionhost-0-0 ~]$ oc get pods -n openshift-ovn-kubernetes
NAME                   READY   STATUS    RESTARTS       AGE
ovnkube-master-f8pgx   6/6     Running   6 (162m ago)   163m
ovnkube-master-wrh7c   6/6     Running   6 (162m ago)   163m
ovnkube-master-zt2cw   6/6     Running   1 (155m ago)   163m
ovnkube-node-25zqv     5/5     Running   0              163m
ovnkube-node-6m9z7     5/5     Running   0              146m
ovnkube-node-8f6ql     5/5     Running   0              163m
ovnkube-node-ftlhc     5/5     Running   0              145m
ovnkube-node-n2xqq     5/5     Running   0              163m

[core@master-0-0 ~]$ nmcli con show
NAME              UUID                                   TYPE            DEVICE
ovs-if-br-ex      c106d856-c8bb-453b-8209-f0b3db2c832f   ovs-interface   br-ex
Wired Connection  cbae6a1a-a769-479d-9f75-4977c21c3d62   ethernet        enp0s3
br-ex             752a8af0-a3c8-470c-aec4-7202781a5ffe   ovs-bridge      br-ex
ovs-if-phys0      11c7f8eb-add1-4043-96c8-53a657e3dc36   ethernet        enp0s4
ovs-port-br-ex    407bb2d3-8fee-4022-b660-7452d4d65a8b   ovs-port        br-ex
ovs-port-phys0    4ccdb004-d433-4ccf-b34e-4dc3e531c09d   ovs-port        enp0s4
Wired Connection  dcc8f7c6-e9fa-456b-b52d-36747fc9d24e   ethernet        --

*** Bug 2048776 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
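For reference, the connection profiles in the verified nmcli output above form the usual NetworkManager OVS topology: an ovs-bridge, an ovs-port plus ovs-interface carrying the node IP, and a second ovs-port enslaving the physical NIC. A rough hand-written equivalent is sketched below as an illustration of that shape only; configure-ovs.sh itself builds these profiles with settings cloned from the original connection (which is where the ipv6.may-fail decision comes in), and enp0s4 is simply the device name seen in the node output above.

# Illustration only - not the configure-ovs.sh implementation.
nmcli conn add type ovs-bridge    conn.interface br-ex  con-name br-ex
nmcli conn add type ovs-port      conn.interface br-ex  master br-ex          con-name ovs-port-br-ex
nmcli conn add type ovs-interface slave-type ovs-port conn.interface br-ex master ovs-port-br-ex \
      con-name ovs-if-br-ex ipv4.method dhcp ipv6.method auto ipv6.may-fail yes
nmcli conn add type ovs-port      conn.interface enp0s4 master br-ex          con-name ovs-port-phys0
nmcli conn add type ethernet      conn.interface enp0s4 master ovs-port-phys0 con-name ovs-if-phys0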
Created attachment 1857210 [details]
ovs-configure service journal

Description of problem:

We're mis-detecting the ipv6 status and setting ipv6.may-fail to false, which causes the connection to fail to come up since there is no ipv6 address available here:

Jan 27 16:42:28 master-0-0 configure-ovs.sh[1837]: ++ nmcli -m multiline --get-values ip6.address conn show 84a523ff-ee8a-4a29-94ca-47590eb0cb76
Jan 27 16:42:28 master-0-0 configure-ovs.sh[1837]: ++ wc -l
Jan 27 16:42:28 master-0-0 configure-ovs.sh[1837]: + num_ip6_addrs=2
Jan 27 16:42:28 master-0-0 configure-ovs.sh[1837]: + '[' 2 -gt 1 ']'
Jan 27 16:42:28 master-0-0 configure-ovs.sh[1837]: + extra_if_brex_args+='ipv6.may-fail no '

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-01-27-104747

How reproducible:
Deploy IPv4 cluster

Steps to Reproduce:
1.
2.
3.

Actual results:
Deployment bootstrap failure, masters NotReady, Network operator degraded, ovnkube-node pods in CrashLoopBackOff

oc logs network-operator-78ccc94f66-mww95 -n openshift-network-operator
I0127 19:57:34.823444       1 log.go:184] Set ClusterOperator conditions:
- lastTransitionTime: "2022-01-27T16:54:02Z"
  message: |-
    DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-d2hpt is in CrashLoopBackOff State
    DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-5djsx is in CrashLoopBackOff State
    DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-smpm6 is in CrashLoopBackOff State
    DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2022-01-27T16:51:32Z
  reason: RolloutHung
  status: "True"
  type: Degraded
- lastTransitionTime: "2022-01-27T16:51:05Z"
  status: "False"
  type: ManagementStateDegraded
- lastTransitionTime: "2022-01-27T16:51:05Z"
  status: "True"
  type: Upgradeable

Expected results:
Deployment to succeed

Additional info:
[kni@provisionhost-0-0 ~]$ oc get pods -n openshift-ovn-kubernetes
NAME                   READY   STATUS             RESTARTS         AGE
ovnkube-master-l8x2m   6/6     Running            6 (3h19m ago)    3h20m
ovnkube-master-mmqs7   6/6     Running            6 (3h19m ago)    3h20m
ovnkube-master-xjqfx   6/6     Running            1 (76m ago)      3h20m
ovnkube-node-5djsx     4/5     CrashLoopBackOff   43 (2m56s ago)   3h20m
ovnkube-node-d2hpt     4/5     CrashLoopBackOff   43 (2m49s ago)   3h20m
ovnkube-node-smpm6     4/5     CrashLoopBackOff   43 (2m56s ago)   3h20m
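For context on the detection logic: the journal excerpt above shows configure-ovs.sh deriving num_ip6_addrs from a raw wc -l over the nmcli output, so a connection that only has a link-local address (plus a trailing blank line) still trips the ipv6.may-fail no branch. Below is a minimal, hypothetical sketch of a more defensive count; uuid stands in for the connection UUID, extra_if_brex_args is the variable seen in the journal, and this is not necessarily the change that shipped in 4.11.

# Hypothetical sketch only - not the actual configure-ovs.sh patch.
# Count only global IPv6 addresses: drop link-local (fe80::/10) entries and
# blank lines before counting, so an IPv4-only node is not treated as dual stack.
num_ip6_addrs=$(nmcli -m multiline --get-values ip6.address conn show "$uuid" \
                | grep -v 'fe80::' | grep -c 'IP6.ADDRESS' || true)

# Require IPv6 on br-ex only when a non-link-local address actually exists.
if [ "$num_ip6_addrs" -gt 0 ]; then
    extra_if_brex_args+="ipv6.may-fail no "
fi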