Bug 2047445 - ovs-configure mis-detecting the ipv6 status on IPv4 only cluster causing Deployment failure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.11.0
Assignee: Ben Nemec
QA Contact: Anurag saxena
URL:
Whiteboard:
Duplicates: 2048535 2048776 2048966
Depends On:
Blocks: 2048836
Reported: 2022-01-27 21:01 UTC by Victor Voronkov
Modified: 2022-08-10 10:44 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 10:44:20 UTC
Target Upstream Version:
Embargoed:


Links
System                   ID                                            Private   Priority   Status   Summary                                                   Last Updated
Github                   openshift machine-config-operator pull 2934   0         None       open     Bug 2047445: Use ip command to check for ipv6 addresses   2022-01-27 22:09:40 UTC
Red Hat Product Errata   RHSA-2022:5069                                0         None       None     None                                                      2022-08-10 10:44:38 UTC

Description Victor Voronkov 2022-01-27 21:01:03 UTC
Created attachment 1857210: ovs-configure service journal

Description of problem:
configure-ovs.sh mis-detects the IPv6 status and sets ipv6.may-fail to no. Because no IPv6 address is available on this IPv4-only node, the connection then fails to come up:

Jan 27 16:42:28 master-0-0 configure-ovs.sh[1837]: ++ nmcli -m multiline --get-values ip6.address conn show 84a523ff-ee8a-4a29-94ca-47590eb0cb76
Jan 27 16:42:28 master-0-0 configure-ovs.sh[1837]: ++ wc -l
Jan 27 16:42:28 master-0-0 configure-ovs.sh[1837]: + num_ip6_addrs=2
Jan 27 16:42:28 master-0-0 configure-ovs.sh[1837]: + '[' 2 -gt 1 ']'
Jan 27 16:42:28 master-0-0 configure-ovs.sh[1837]: + extra_if_brex_args+='ipv6.may-fail no '
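
The count of 2 in the trace above can be reproduced without NetworkManager. The following is a minimal sketch, not the configure-ovs.sh code itself. It assumes the nmcli output consists of one link-local address line followed by an empty line, as the output in Comment 1 suggests; the variable name sample_output is hypothetical.

```bash
#!/bin/bash
# Sample text modelled on the nmcli output from the affected node:
# one link-local address line, then an empty line (assumption).
sample_output=$'IP6.ADDRESS[1]:fe80::5054:ff:fe6e:6923/64\n'

# Same counting approach as in the trace: every line counts,
# including the empty one, so a single link-local address yields 2.
num_ip6_addrs=$(printf '%s\n' "$sample_output" | wc -l)
num_ip6_addrs=$((num_ip6_addrs))   # drop padding that some wc builds emit
echo "num_ip6_addrs=$num_ip6_addrs"

# The check from the trace then passes although no routable
# IPv6 address exists on the interface.
if [ "$num_ip6_addrs" -gt 1 ]; then
  echo "would add: ipv6.may-fail no"
fi
```

Under that assumption, line counting cannot tell a link-local-only interface apart from one with an assigned IPv6 address.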

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-01-27-104747

How reproducible:
Deploy IPv4 cluster

Steps to Reproduce:
1.
2.
3.

Actual results:
Deployment bootstrap failure, masters NotReady, network operator degraded, ovnkube-node pods in CrashLoopBackOff

oc logs network-operator-78ccc94f66-mww95 -n openshift-network-operator

I0127 19:57:34.823444       1 log.go:184] Set ClusterOperator conditions:
- lastTransitionTime: "2022-01-27T16:54:02Z"
  message: |-
    DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-d2hpt is in CrashLoopBackOff State
    DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-5djsx is in CrashLoopBackOff State
    DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-smpm6 is in CrashLoopBackOff State
    DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2022-01-27T16:51:32Z
  reason: RolloutHung
  status: "True"
  type: Degraded
- lastTransitionTime: "2022-01-27T16:51:05Z"
  status: "False"
  type: ManagementStateDegraded
- lastTransitionTime: "2022-01-27T16:51:05Z"
  status: "True"
  type: Upgradeable

Expected results:
Deployment to succeed

Additional info:
[kni@provisionhost-0-0 ~]$ oc get pods -n openshift-ovn-kubernetes
NAME                   READY   STATUS             RESTARTS         AGE
ovnkube-master-l8x2m   6/6     Running            6 (3h19m ago)    3h20m
ovnkube-master-mmqs7   6/6     Running            6 (3h19m ago)    3h20m
ovnkube-master-xjqfx   6/6     Running            1 (76m ago)      3h20m
ovnkube-node-5djsx     4/5     CrashLoopBackOff   43 (2m56s ago)   3h20m
ovnkube-node-d2hpt     4/5     CrashLoopBackOff   43 (2m49s ago)   3h20m
ovnkube-node-smpm6     4/5     CrashLoopBackOff   43 (2m56s ago)   3h20m

Comment 1 Victor Voronkov 2022-01-27 21:16:15 UTC
master-0-0:
[core@master-0-0 ~]$ nmcli con show
NAME              UUID                                  TYPE      DEVICE 
Wired Connection  9d5c7c3b-9130-4a40-b31f-c99cad4da283  ethernet  enp0s3 
Wired Connection  84a523ff-ee8a-4a29-94ca-47590eb0cb76  ethernet  enp0s4 
[core@master-0-0 ~]$ nmcli -m multiline --get-values ip6.address conn show 84a523ff-ee8a-4a29-94ca-47590eb0cb76
IP6.ADDRESS[1]:fe80::5054:ff:fe6e:6923/64

[core@master-0-0 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:72:6c:90 brd ff:ff:ff:ff:ff:ff
3: enp0s4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:6e:69:23 brd ff:ff:ff:ff:ff:ff
    inet 192.168.123.109/24 brd 192.168.123.255 scope global dynamic noprefixroute enp0s4
       valid_lft 2621sec preferred_lft 2621sec
    inet6 fe80::5054:ff:fe6e:6923/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
22: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 82:8d:01:2c:b3:41 brd ff:ff:ff:ff:ff:ff
23: br-int: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN group default qlen 1000
    link/ether 9a:6d:68:a6:d5:36 brd ff:ff:ff:ff:ff:ff
24: ovn-k8s-mp0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 5a:e5:2b:53:5c:5e brd ff:ff:ff:ff:ff:ff
    inet 10.130.0.2/23 brd 10.130.1.255 scope global ovn-k8s-mp0
       valid_lft forever preferred_lft forever
    inet6 fe80::58e5:2bff:fe53:5c5e/64 scope link 
       valid_lft forever preferred_lft forever
25: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
    link/ether 3e:b2:5b:bc:a6:7f brd ff:ff:ff:ff:ff:ff
    inet6 fe80::3cb2:5bff:febc:a67f/64 scope link 
       valid_lft forever preferred_lft forever
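
The linked pull request is titled "Use ip command to check for ipv6 addresses". As a rough illustration of that idea, and not the code from the pull request, the sketch below counts inet6 entries in `ip addr show`-style text while ignoring link-local and host scope. The helper name count_routable_ip6 and the dual-stack sample (using the 2001:db8::/32 documentation prefix) are made up for this example.

```bash
#!/bin/bash
# Hypothetical helper: reads `ip addr show`-style text on stdin and
# prints the number of inet6 lines that are not link or host scoped.
count_routable_ip6() {
  awk '$1 == "inet6" && $0 !~ /scope (link|host)/ { n++ } END { print n + 0 }'
}

# enp0s4 as shown in the output above: only a link-local IPv6 address.
ipv4_only='3: enp0s4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:6e:69:23 brd ff:ff:ff:ff:ff:ff
    inet 192.168.123.109/24 brd 192.168.123.255 scope global dynamic noprefixroute enp0s4
    inet6 fe80::5054:ff:fe6e:6923/64 scope link noprefixroute'

# Made-up dual-stack variant with one global address added.
dual_stack="$ipv4_only
    inet6 2001:db8::10/64 scope global noprefixroute"

v4_count=$(printf '%s\n' "$ipv4_only" | count_routable_ip6)
ds_count=$(printf '%s\n' "$dual_stack" | count_routable_ip6)
echo "ipv4-only: $v4_count, dual-stack: $ds_count"
```

On a live node the same helper could be fed from `ip addr show dev enp0s4`; a result of 0 would mean ipv6.may-fail should not be set to no.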

[core@master-0-0 ~]$ journalctl -u NetworkManager-wait-online.service
-- Logs begin at Thu 2022-01-27 16:39:23 UTC, end at Thu 2022-01-27 21:15:28 UTC. --
Jan 27 16:39:37 master-0-0 systemd[1]: Starting Network Manager Wait Online...
Jan 27 16:39:37 master-0-0 systemd[1]: Started Network Manager Wait Online.
Jan 27 16:41:05 master-0-0 systemd[1]: NetworkManager-wait-online.service: Succeeded.
Jan 27 16:41:05 master-0-0 systemd[1]: Stopped Network Manager Wait Online.
Jan 27 16:41:05 master-0-0 systemd[1]: NetworkManager-wait-online.service: Consumed 0 CPU time
-- Reboot --
Jan 27 16:41:26 localhost systemd[1]: Starting Network Manager Wait Online...
Jan 27 16:42:27 master-0-0 systemd[1]: NetworkManager-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Jan 27 16:42:27 master-0-0 systemd[1]: NetworkManager-wait-online.service: Failed with result 'exit-code'.
Jan 27 16:42:27 master-0-0 systemd[1]: Failed to start Network Manager Wait Online.
Jan 27 16:42:27 master-0-0 systemd[1]: NetworkManager-wait-online.service: Consumed 43ms CPU time

Comment 5 Bob Fournier 2022-02-01 17:08:09 UTC
*** Bug 2048535 has been marked as a duplicate of this bug. ***

Comment 6 Bob Fournier 2022-02-01 17:17:13 UTC
*** Bug 2048966 has been marked as a duplicate of this bug. ***

Comment 7 Victor Voronkov 2022-02-01 18:28:17 UTC
Verified on IPv4 cluster with OVN at build 4.11.0-0.nightly-2022-02-01-062253

[kni@provisionhost-0-0 ~]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-02-01-062253   True        False         134m    Cluster version is 4.11.0-0.nightly-2022-02-01-062253

[kni@provisionhost-0-0 ~]$ oc get nodes -o wide
NAME         STATUS   ROLES    AGE    VERSION           INTERNAL-IP       EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
master-0-0   Ready    master   162m   v1.23.3+b63be7f   192.168.123.53    <none>        Red Hat Enterprise Linux CoreOS 410.84.202201312356-0 (Ootpa)   4.18.0-305.34.2.el8_4.x86_64   cri-o://1.23.0-108.rhaos4.10.gitb15fee5.el8
master-0-1   Ready    master   162m   v1.23.3+b63be7f   192.168.123.74    <none>        Red Hat Enterprise Linux CoreOS 410.84.202201312356-0 (Ootpa)   4.18.0-305.34.2.el8_4.x86_64   cri-o://1.23.0-108.rhaos4.10.gitb15fee5.el8
master-0-2   Ready    master   162m   v1.23.3+b63be7f   192.168.123.117   <none>        Red Hat Enterprise Linux CoreOS 410.84.202201312356-0 (Ootpa)   4.18.0-305.34.2.el8_4.x86_64   cri-o://1.23.0-108.rhaos4.10.gitb15fee5.el8
worker-0-0   Ready    worker   144m   v1.23.3+b63be7f   192.168.123.72    <none>        Red Hat Enterprise Linux CoreOS 410.84.202201312356-0 (Ootpa)   4.18.0-305.34.2.el8_4.x86_64   cri-o://1.23.0-108.rhaos4.10.gitb15fee5.el8
worker-0-1   Ready    worker   144m   v1.23.3+b63be7f   192.168.123.62    <none>        Red Hat Enterprise Linux CoreOS 410.84.202201312356-0 (Ootpa)   4.18.0-305.34.2.el8_4.x86_64   cri-o://1.23.0-108.rhaos4.10.gitb15fee5.el8

[kni@provisionhost-0-0 ~]$ oc get pods -n openshift-ovn-kubernetes
NAME                   READY   STATUS    RESTARTS       AGE
ovnkube-master-f8pgx   6/6     Running   6 (162m ago)   163m
ovnkube-master-wrh7c   6/6     Running   6 (162m ago)   163m
ovnkube-master-zt2cw   6/6     Running   1 (155m ago)   163m
ovnkube-node-25zqv     5/5     Running   0              163m
ovnkube-node-6m9z7     5/5     Running   0              146m
ovnkube-node-8f6ql     5/5     Running   0              163m
ovnkube-node-ftlhc     5/5     Running   0              145m
ovnkube-node-n2xqq     5/5     Running   0              163m


[core@master-0-0 ~]$ nmcli con show
NAME              UUID                                  TYPE           DEVICE 
ovs-if-br-ex      c106d856-c8bb-453b-8209-f0b3db2c832f  ovs-interface  br-ex  
Wired Connection  cbae6a1a-a769-479d-9f75-4977c21c3d62  ethernet       enp0s3 
br-ex             752a8af0-a3c8-470c-aec4-7202781a5ffe  ovs-bridge     br-ex  
ovs-if-phys0      11c7f8eb-add1-4043-96c8-53a657e3dc36  ethernet       enp0s4 
ovs-port-br-ex    407bb2d3-8fee-4022-b660-7452d4d65a8b  ovs-port       br-ex  
ovs-port-phys0    4ccdb004-d433-4ccf-b34e-4dc3e531c09d  ovs-port       enp0s4 
Wired Connection  dcc8f7c6-e9fa-456b-b52d-36747fc9d24e  ethernet       --

Comment 8 Surya Seetharaman 2022-02-03 11:07:30 UTC
*** Bug 2048776 has been marked as a duplicate of this bug. ***

Comment 11 errata-xmlrpc 2022-08-10 10:44:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

