Bug 1902674 - [BM][IPI] Nodes lose connectivity after reboot
Keywords:
Status: CLOSED DUPLICATE of bug 1898036
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.7.0
Assignee: Ricardo Carrillo Cruz
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-11-30 11:27 UTC by Yurii Prokulevych
Modified: 2020-12-01 16:28 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-01 14:53:52 UTC
Target Upstream Version:
Embargoed:



Description Yurii Prokulevych 2020-11-30 11:27:04 UTC
Description of problem:
-----------------------
Cluster nodes lose connectivity after reboot
The console shows:

[   93.172414] configure-ovs.sh[1778]: ++ awk '{ if ($4 == "dev") { print $5; exit } }'
[   93.173243] configure-ovs.sh[1778]: + iface=
[   93.173621] configure-ovs.sh[1778]: + [[ -n '' ]]
[   93.174028] configure-ovs.sh[1778]: + counter=11
[   93.174436] configure-ovs.sh[1778]: + echo 'No default route found on attempt: 11'
[   93.174835] configure-ovs.sh[1778]: No default route found on attempt: 11
[   93.175250] configure-ovs.sh[1778]: + sleep 5
[   98.175302] configure-ovs.sh[1778]: + '[' 11 -lt 12 ']'
[   98.179798] configure-ovs.sh[1778]: ++ ip route show default
[   98.179978] configure-ovs.sh[1778]: ++ awk '{ if ($4 == "dev") { print $5; exit } }'
[   98.181993] configure-ovs.sh[1778]: + iface=
[   98.182091] configure-ovs.sh[1778]: + [[ -n '' ]]
[   98.182955] configure-ovs.sh[1778]: ++ ip -6 route show default
[   98.183423] configure-ovs.sh[1778]: ++ awk '{ if ($4 == "dev") { print $5; exit } }'
[   98.186087] configure-ovs.sh[1778]: + iface=
[   98.186198] configure-ovs.sh[1778]: + [[ -n '' ]]
[   98.186591] configure-ovs.sh[1778]: + counter=12
[   98.186963] configure-ovs.sh[1778]: + echo 'No default route found on attempt: 12'
[   98.187350] configure-ovs.sh[1778]: No default route found on attempt: 12
[   98.187817] configure-ovs.sh[1778]: + sleep 5
[  103.189427] configure-ovs.sh[1778]: + '[' 12 -lt 12 ']'
[  103.197737] configure-ovs.sh[1778]: + '[' '' = br-ex ']'
[  103.197898] configure-ovs.sh[1778]: + '[' -z '' ']'
[  103.199085] configure-ovs.sh[1778]: + echo 'ERROR: Unable to find default gateway interface'
[  103.199668] configure-ovs.sh[1778]: ERROR: Unable to find default gateway interface
[  103.200514] configure-ovs.sh[1778]: + exit 1
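
The trace above suggests a polling loop: up to 12 attempts, 5 seconds apart, checking the IPv4 default route first and then the IPv6 one. The following is a minimal sketch reconstructed from the xtrace only, not the actual configure-ovs.sh source; the function names `default_route_dev` and `find_default_iface` and their parameters are hypothetical.

```shell
#!/bin/bash
# Sketch inferred from the xtrace in this report; NOT the real configure-ovs.sh.
# Function names and parameters are illustrative assumptions.

# Print the device of the first default route read on stdin.
# Matches lines shaped like: "default via <gateway> dev <iface> ..."
default_route_dev() {
  awk '{ if ($4 == "dev") { print $5; exit } }'
}

# Poll for a default route, IPv4 first, then IPv6.
# $1 = maximum attempts (the trace shows 12)
# $2 = seconds to sleep between attempts (the trace shows 5)
find_default_iface() {
  local max_attempts="${1:-12}" delay="${2:-5}" counter=0 iface=""
  while [ "$counter" -lt "$max_attempts" ]; do
    iface=$(ip route show default | default_route_dev)
    if [ -n "$iface" ]; then break; fi
    iface=$(ip -6 route show default | default_route_dev)
    if [ -n "$iface" ]; then break; fi
    counter=$((counter + 1))
    echo "No default route found on attempt: ${counter}" >&2
    sleep "$delay"
  done
  if [ -z "$iface" ]; then
    echo "ERROR: Unable to find default gateway interface" >&2
    return 1
  fi
  echo "$iface"
}
```

With no default route in either address family, as on the rebooted node, `find_default_iface` exhausts its attempts and returns 1, which matches the `exit 1` at the end of the trace.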

After logging in via the console, the following connections are listed:
-----------------------------------------------------------------------
[root@worker-0-1 ~]# nmcli connection show
NAME              UUID                                  TYPE      DEVICE
Wired Connection  f8789ab5-d345-4ea5-9b53-3a084d336ff9  ethernet  enp4s0
Wired Connection  f8789ab5-d345-4ea5-9b53-3a084d336ff9  ethernet  enp5s0

Before the reboot we had:
-------------------------
[core@worker-0-1 ~]$ nmcli connection show
NAME              UUID                                  TYPE           DEVICE
ovs-if-br-ex      2a8ffc43-17fd-41a4-bf13-b22d45810b62  ovs-interface  br-ex
br-ex             c47f7225-6970-43bc-9718-bd31abb9d858  ovs-bridge     br-ex
ovs-if-phys0      39c4944a-a231-4a96-aef3-9b704ccb056c  ethernet       enp5s0
ovs-port-br-ex    1bed73b1-e8a6-42e4-b2cf-d872a9cf2387  ovs-port       br-ex
ovs-port-phys0    df8f6166-328f-42a1-bd6c-d59bcacb12cf  ovs-port       enp5s0
Wired Connection  f8789ab5-d345-4ea5-9b53-3a084d336ff9  ethernet       --


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
4.7.0-0.nightly-2020-11-29-133728

How reproducible:
-----------------
so far 100%

Steps to Reproduce:
-------------------
1. Deploy BM IPI 4.7
2. Log in to a node via ssh and reboot it


Actual results:
---------------
Node loses its connectivity and becomes NotReady


Expected results:
-----------------
Node is successfully rebooted


Additional info:
----------------
Virtual deployment: 3 masters + 2 workers; the baremetal and provisioning networks are IPv6

Comment 1 Ross Brattain 2020-12-01 02:55:21 UTC
Do you have two default routes maybe?

Can you check the outputs of

ip route show default | awk '{ if ($4 == "dev") { print $5; exit } }' 

ip -6 route show default | awk '{ if ($4 == "dev") { print $5; exit } }'
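
To answer the "two default routes" question directly, the routes can be counted rather than read by eye. A minimal sketch, assuming default route lines begin with the word `default`; the helper name `count_default_routes` is hypothetical.

```shell
#!/bin/bash
# Count default routes in `ip route show default` style output read on stdin.
# The helper name is an illustrative assumption.
count_default_routes() {
  awk '$1 == "default" { n++ } END { print n + 0 }'
}

# On a node:
#   ip route show default    | count_default_routes
#   ip -6 route show default | count_default_routes
```

A count above 1 would indicate multiple default routes; a count of 0 in both families means the awk filters above have nothing to match.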

Comment 2 Yurii Prokulevych 2020-12-01 05:55:51 UTC
(In reply to Ross Brattain from comment #1)
> Do you have two default routes maybe?
> 
> Can you check the outputs of
> 
> ip route show default | awk '{ if ($4 == "dev") { print $5; exit } }' 
> 
> ip -6 route show default | awk '{ if ($4 == "dev") { print $5; exit } }'

Hi Ross,

[root@worker-0-1 system]# ip route

[root@worker-0-1 system]# ip -6 route
::1 dev lo proto kernel metric 256 pref medium
fe80::/64 dev enp4s0 proto kernel metric 100 pref medium
fe80::/64 dev enp5s0 proto kernel metric 101 pref medium
fe80::/64 dev genev_sys_6081 proto kernel metric 256 pref medium
[root@worker-0-1 system]#

The problem is that some connections created by configure-ovs.sh disappeared after the reboot.

E.g.:
configure-ovs.sh[1646]: + nmcli c add type ovs-bridge con-name br-ex conn.interface br-ex 802-3-ethernet.mtu 1500 802-3-ethernet.cloned-mac-address 52:54:00:b3:ba:87 ipv4.route-metric 100 ipv6.route-metric 100

configure-ovs.sh[1646]: + nmcli c add type ovs-port conn.interface enp5s0 master br-ex con-name ovs-port-phys0

configure-ovs.sh[1646]: + nmcli c add type ovs-port conn.interface br-ex master br-ex con-name ovs-port-br-ex

configure-ovs.sh[1646]: + nmcli c add type 802-3-ethernet conn.interface enp5s0 master ovs-port-phys0 con-name  ovs-if-phys0 connection.autoconnect-priority 100 802-3-ethernet.mtu 1500
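
A quick way to confirm which profiles went missing is to compare the current connection names against the pre-reboot listing. This is a rough sketch: the expected names are copied from the `nmcli connection show` output earlier in this report, and the helper name `missing_connections` is hypothetical.

```shell
#!/bin/bash
# Connection names taken from the pre-reboot listing in this report.
EXPECTED="ovs-if-br-ex br-ex ovs-if-phys0 ovs-port-br-ex ovs-port-phys0"

# Read connection names (one per line) on stdin and print every expected
# name that is absent. The helper name is an illustrative assumption.
missing_connections() {
  local present name
  present=$(cat)
  for name in $EXPECTED; do
    # -F fixed string, -x whole-line match, -q quiet
    printf '%s\n' "$present" | grep -Fxq -- "$name" || echo "$name"
  done
}

# On a node:
#   nmcli -t -f NAME connection show | missing_connections
```

On the rebooted node, where only the two "Wired Connection" entries remain, this would print all five OVS profile names.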

Comment 4 Antoni Segura Puimedon 2020-12-01 14:53:52 UTC

*** This bug has been marked as a duplicate of bug 1898036 ***

