Bug 1979391 - Baremetal IPI deployment with control plane on bond device fails during bootstrap phase
Summary: Baremetal IPI deployment with control plane on bond device fails during bootstrap phase
Keywords:
Status: CLOSED DUPLICATE of bug 1976110
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.9.0
Assignee: Beth White
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-07-05 20:10 UTC by Marius Cornea
Modified: 2021-08-06 09:10 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-06 09:10:34 UTC
Target Upstream Version:
Embargoed:


Attachments
ovs-configuration.journal (24.14 KB, text/plain), 2021-07-05 20:10 UTC, Marius Cornea

Description Marius Cornea 2021-07-05 20:10:22 UTC
Created attachment 1798331 [details]
ovs-configuration.journal

Description of problem:

Baremetal IPI deployment with control plane on bond device fails during bootstrap phase.

The bond configuration is passed in via a MachineConfig manifest at deploy time, as follows:

---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  creationTimestamp: null
  labels:
    machineconfiguration.openshift.io/role: master
  name: 11-master-bonding
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:;base64,W2Nvbm5lY3Rpb25dCmlkPWVuczRmMAp0eXBlPWV0aGVybmV0CmludGVyZmFjZS1uYW1lPWVuczRmMAptYXN0ZXI9Ym9uZDAKc2xhdmUtdHlwZT1ib25kCmF1dG9jb25uZWN0PXRydWU=
        path: /etc/NetworkManager/system-connections/ens4f0.nmconnection
        filesystem: root
        mode: 0600
      - contents:
          source: data:;base64,W2Nvbm5lY3Rpb25dCmlkPWVuczRmMQp0eXBlPWV0aGVybmV0CmludGVyZmFjZS1uYW1lPWVuczRmMQptYXN0ZXI9Ym9uZDAKc2xhdmUtdHlwZT1ib25kCmF1dG9jb25uZWN0PXRydWU=
        path: /etc/NetworkManager/system-connections/ens4f1.nmconnection
        filesystem: root
        mode: 0600
      - contents:
          source: data:;base64,W2Nvbm5lY3Rpb25dCmlkPWJvbmQwCnR5cGU9Ym9uZAppbnRlcmZhY2UtbmFtZT1ib25kMAphdXRvY29ubmVjdD10cnVlCmNvbm5lY3Rpb24uYXV0b2Nvbm5lY3Qtc2xhdmVzPTEKCltib25kXQptb2RlPTgwMi4zYWQKbWlpbW9uPTEwMAoKW2lwdjRdCm1ldGhvZD1hdXRvCmRoY3AtdGltZW91dD0yMTQ3NDgzNjQ3CgpbaXB2Nl0KbWV0aG9kPWRpc2FibGVkCg==
        path: /etc/NetworkManager/system-connections//bond0.nmconnection
        filesystem: root
        mode: 0600
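
For reference, the keyfiles embedded in the data: URLs above can be inspected by decoding the base64 payloads, e.g. for the ens4f0 file:

$ echo 'W2Nvbm5lY3Rpb25dCmlkPWVuczRmMAp0eXBlPWV0aGVybmV0CmludGVyZmFjZS1uYW1lPWVuczRmMAptYXN0ZXI9Ym9uZDAKc2xhdmUtdHlwZT1ib25kCmF1dG9jb25uZWN0PXRydWU=' | base64 -d

The decoded contents match the files found on the node (shown further below).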


Looking at the master nodes, we can see that the bond0 device includes only the ens4f1 interface, while the ens4f0 interface is configured independently and holds the DHCP IP address (10.46.58.20/24). The br-ex bridge includes the bond0 device and ends up without an IPv4 address, since that address is assigned to ens4f0.

[root@openshift-master-0 core]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno1np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether bc:97:e1:29:9c:80 brd ff:ff:ff:ff:ff:ff
3: eno2np1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether bc:97:e1:29:9c:81 brd ff:ff:ff:ff:ff:ff
4: ens4f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 1c:34:da:75:bc:fa brd ff:ff:ff:ff:ff:ff
    inet 10.46.58.20/24 brd 10.46.58.255 scope global dynamic noprefixroute ens4f0
       valid_lft 29820sec preferred_lft 29820sec
    inet6 2620:52:0:2e3a:1e34:daff:fe75:bcfa/64 scope global dynamic noprefixroute 
       valid_lft 2591926sec preferred_lft 604726sec
    inet6 fe80::1e34:daff:fe75:bcfa/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
5: ens4f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 1c:34:da:75:bc:fb brd ff:ff:ff:ff:ff:ff
7: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovs-system state UP group default qlen 1000
    link/ether 1c:34:da:75:bc:fb brd ff:ff:ff:ff:ff:ff
8: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 0e:5f:b5:bd:02:67 brd ff:ff:ff:ff:ff:ff
12: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 1c:34:da:75:bc:fa brd ff:ff:ff:ff:ff:ff
    inet6 2620:52:0:2e3a:5af4:7e2:a2b0:a8a6/64 scope global dynamic noprefixroute 
       valid_lft 2591528sec preferred_lft 604328sec
    inet6 fe80::ce05:109f:3bec:d287/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever


[root@openshift-master-0 core]# nmcli con
NAME              UUID                                  TYPE           DEVICE 
Wired Connection  74efc6ae-ae0b-46a2-b40c-7fc096b261ff  ethernet       ens4f0 
ovs-if-br-ex      1460f265-cbc6-4435-8f96-8dc5d1608640  ovs-interface  br-ex  
br-ex             091be9cc-88c3-4617-a38f-c1c050a4b9ef  ovs-bridge     br-ex  
ens4f1            9b808a8b-13ba-3749-8b8b-6f6f208666a2  ethernet       ens4f1 
ovs-if-phys0      fe52c005-e7e7-4b5c-9306-6884f8e79b4c  bond           bond0  
ovs-port-br-ex    c22be679-27bc-401a-a7da-fecd0776cd9e  ovs-port       br-ex  
ovs-port-phys0    d1b534b2-afeb-42b4-8a89-6d01b96a789c  ovs-port       bond0  
bond0             52eecf5a-df5e-30ae-9ca1-6297f0239027  bond           --     
ens4f0            8950883b-a416-360a-a597-cb308946aaa0  ethernet       --     
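
Note in the output above that ens4f0 is activated by the default "Wired Connection" profile while the intended ens4f0 keyfile profile stays inactive, i.e. the interface never joins bond0. A quick way to confirm which profile owns the device (standard nmcli invocation; the expected output is inferred from the table above, not taken from the attached logs):

$ nmcli -g GENERAL.CONNECTION device show ens4f0
Wired Connection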

[root@openshift-master-0 core]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v4.18.0-305.7.1.el8_4.x86_64

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 1c:34:da:75:bc:fb
Active Aggregator Info:
	Aggregator ID: 1
	Number of ports: 1
	Actor Key: 21
	Partner Key: 3
	Partner Mac Address: 50:c7:09:34:6a:a0

Slave Interface: ens4f1
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 1c:34:da:75:bc:fb
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 1c:34:da:75:bc:fb
    port key: 21
    port priority: 255
    port number: 1
    port state: 61
details partner lacp pdu:
    system priority: 127
    system mac address: 50:c7:09:34:6a:a0
    oper key: 3
    port priority: 127
    port number: 3
    port state: 63

[root@openshift-master-0 core]# cat /etc/NetworkManager/system-connections/ens4f0.nmconnection 
[connection]
id=ens4f0
type=ethernet
interface-name=ens4f0
master=bond0
slave-type=bond
autoconnect=true

[root@openshift-master-0 core]# cat /etc/NetworkManager/system-connections/ens4f1.nmconnection 
[connection]
id=ens4f1
type=ethernet
interface-name=ens4f1
master=bond0
slave-type=bond
autoconnect=true

[root@openshift-master-0 core]# cat /etc/NetworkManager/system-connections/bond0.nmconnection 
[connection]
id=bond0
type=bond
interface-name=bond0
autoconnect=true
connection.autoconnect-slaves=1

[bond]
mode=802.3ad
miimon=100

[ipv4]
method=auto
dhcp-timeout=2147483647

[ipv6]
method=disabled
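
The bond0 profile sets connection.autoconnect-slaves=1, so activating it is expected to pull up both slave profiles. A manual sanity check (standard nmcli commands, not taken from the attached logs) would be:

$ nmcli connection up bond0
$ nmcli -f NAME,DEVICE connection show --active

after which both ens4f0 and ens4f1 should be listed as active slave profiles of bond0.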


Version-Release number of selected component (if applicable):
4.8.0-rc.3

How reproducible:
100%

Steps to Reproduce:
1. Deploy a baremetal IPI environment with the control plane running on a bond device.
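
For reference, the bond is configured by adding the MachineConfig shown above to the manifests generated by the installer (standard installer flow; the file name and directory here are illustrative, not taken from this report):

$ openshift-install create manifests --dir <install-dir>
$ cp 11-master-bonding.yaml <install-dir>/openshift/
$ openshift-install create cluster --dir <install-dir>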

Actual results:
The br-ex bridge on the master nodes does not get an IPv4 address via DHCP and the bootstrap phase fails.


Expected results:
ens4f0 is part of the bond device, as defined by the NetworkManager system-connections configuration, and the br-ex bridge gets the DHCP IPv4 address.
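
A minimal check of the expected state (assuming the standard bonding sysfs/procfs layout) would be:

$ grep 'Slave Interface' /proc/net/bonding/bond0   # should list both ens4f0 and ens4f1
$ ip -4 addr show br-ex                            # should show the DHCP-assigned IPv4 address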

Additional info:

Attaching NetworkManager ovs-configuration.service journal logs.

Comment 2 Marius Cornea 2021-07-05 20:13:33 UTC
Note: the same configuration deploys successfully on 4.7.

Comment 15 Steven Hardy 2021-08-06 09:10:34 UTC

*** This bug has been marked as a duplicate of bug 1976110 ***

