Bug 2005240

Summary: Fail to attach linux bond interface to ovs-bridge
Product: Red Hat Enterprise Linux 8
Reporter: Radim Hrazdil <rhrazdil>
Component: nmstate
Assignee: Gris Ge <fge>
Status: CLOSED ERRATA
QA Contact: Mingyu Shi <mshi>
Severity: medium
Docs Contact:
Priority: high
Version: 8.5
CC: acabral, bgalvani, bstinson, ferferna, fge, jiji, jishi, jwboyer, lrintel, mshi, network-qe, rkhan, sfaye, sukulkar, till
Target Milestone: rc
Keywords: Triaged, ZStream
Target Release: 8.6
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Clones: 2128233 (view as bug list)
Environment:
Last Closed: 2022-11-08 09:17:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2128233

Attachments:
NetworkManager+nmstatectl logs (flags: none)

Description Radim Hrazdil 2021-09-17 08:15:32 UTC
Created attachment 1823747 [details]
NetworkManager+nmstatectl logs

Description of problem:
When attaching linux bond interface to an ovs-bridge, nmstatectl sometimes fails with an error:
libnmstate.error.NmstateLibnmError: Activate profile uuid:cf81a923-5999-4a21-bd7e-dc96c7977451 iface:eth1 type: ethernet failed: reason=<enum NM_ACTIVE_CONNECTION_STATE_REASON_DEVICE_DISCONNECTED of type NM.ActiveConnectionStateReason><enum NM_DEVICE_STATE_REASON_DEPENDENCY_FAILED of type NM.DeviceStateReason>




Version-Release number of selected component (if applicable):
NetworkManager-1.32.10-2.el8.x86_64
NetworkManager-ovs-1.32.10-2.el8.x86_64
NetworkManager-config-server-1.32.8-1.el8.noarch
NetworkManager-libnm-1.32.10-2.el8.x86_64
NetworkManager-team-1.32.10-2.el8.x86_64
NetworkManager-tui-1.32.10-2.el8.x86_64

python3-libnmstate-1.1.0-3.el8.noarch
nmstate-1.1.0-3.el8.noarch

openvswitch2.15-2.15.0-35.el8s.x86_6



How reproducible:
50%

Steps to Reproduce:
1.
cat ds.yaml
interfaces:
  - link-aggregation:
      mode: active-backup
      options:
        miimon: 140
        primary: eth1
      port:
        - eth1
        - eth2
    name: bond101
    state: up
    type: bond
  - bridge:
      options:
        stp: false
      port:
        - name: bond101
    name: br22
    state: up
    type: ovs-bridge

2. nmstatectl set ds.yaml
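
For reference, the same desired state can also be applied through the
libnmstate Python API (the library behind nmstatectl). A minimal sketch,
assuming the ds.yaml from step 1 is in the current directory:

import yaml
import libnmstate

# Load the desired state from step 1 and apply it; roughly equivalent
# to `nmstatectl set ds.yaml`.
with open("ds.yaml") as fd:
    desired_state = yaml.safe_load(fd)

libnmstate.apply(desired_state)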



Additional info:
NetworkManager trace logs and nmstatectl output attached


Current state before applying the desired state:

[vagrant@node02 ~]$ nmstatectl show
Unhandled IFLA_INFO_DATA for iface type Other("IpTun")
2021-09-15 19:38:09,623 root         DEBUG    NetworkManager version 1.32.10
2021-09-15 19:38:09,626 root         DEBUG    Async action: Retrieve applied config: ethernet eth0 started
2021-09-15 19:38:09,627 root         DEBUG    Async action: Retrieve applied config: ethernet eth1 started
2021-09-15 19:38:09,627 root         DEBUG    Async action: Retrieve applied config: ethernet eth2 started
2021-09-15 19:38:09,630 root         DEBUG    Async action: Retrieve applied config: ethernet eth0 finished
2021-09-15 19:38:09,631 root         DEBUG    Async action: Retrieve applied config: ethernet eth1 finished
2021-09-15 19:38:09,631 root         DEBUG    Async action: Retrieve applied config: ethernet eth2 finished
2021-09-15 19:38:09,634 root         DEBUG    Interface ethernet.eth0 found. Merging the interface information.
2021-09-15 19:38:09,634 root         DEBUG    Interface ethernet.eth1 found. Merging the interface information.
2021-09-15 19:38:09,634 root         DEBUG    Interface ethernet.eth2 found. Merging the interface information.
Unhandled IFLA_INFO_DATA for iface type Other("IpTun")
Unhandled IFLA_INFO_DATA for iface type Other("IpTun")
---
dns-resolver:
  config: {}
  running:
    search: []
    server:
    - 192.168.66.2
    - fe80::cd:2dff:feda:21a%eth0
    - fd00::1
route-rules:
  config: []
routes:
  config:
  - destination: fd10:244::8c40/128
    metric: 1024
    next-hop-address: '::'
    next-hop-interface: cali100ec759187
    table-id: 254
  - destination: fd10:244::bac0/122
    metric: 1024
    next-hop-address: fd00::103
    next-hop-interface: eth0
    table-id: 254
  - destination: fd10:244::c480/122
    metric: 1024
    next-hop-address: fd00::101
    next-hop-interface: eth0
    table-id: 254
  - destination: fd10:244::f8c0/122
    metric: 1024
    next-hop-address: fd00::104
    next-hop-interface: eth0
    table-id: 254
  - destination: 10.244.140.64/26
    metric: 0
    next-hop-address: 0.0.0.0
    next-hop-interface: ''
    table-id: 254
  - destination: 10.244.140.65/32
    metric: 0
    next-hop-address: 0.0.0.0
    next-hop-interface: cali100ec759187
    table-id: 254
  - destination: 10.244.186.192/26
    metric: 0
    next-hop-address: 192.168.66.103
    next-hop-interface: tunl0
    table-id: 254
  - destination: 10.244.196.128/26
    metric: 0
    next-hop-address: 192.168.66.101
    next-hop-interface: tunl0
    table-id: 254
  - destination: 10.244.248.192/26
    metric: 0
    next-hop-address: 192.168.66.104
    next-hop-interface: tunl0
    table-id: 254
  running:
  - destination: fd00::102/128
    metric: 103
    next-hop-address: '::'
    next-hop-interface: eth0
    table-id: 254
  - destination: fd00::/64
    metric: 103
    next-hop-address: '::'
    next-hop-interface: eth0
    table-id: 254
  - destination: fd10:244::8c40/128
    metric: 1024
    next-hop-address: '::'
    next-hop-interface: cali100ec759187
    table-id: 254
  - destination: fd10:244::bac0/122
    metric: 1024
    next-hop-address: fd00::103
    next-hop-interface: eth0
    table-id: 254
  - destination: fd10:244::c480/122
    metric: 1024
    next-hop-address: fd00::101
    next-hop-interface: eth0
    table-id: 254
  - destination: fd10:244::f8c0/122
    metric: 1024
    next-hop-address: fd00::104
    next-hop-interface: eth0
    table-id: 254
  - destination: fe80::/64
    metric: 103
    next-hop-address: '::'
    next-hop-interface: eth0
    table-id: 254
  - destination: fe80::/64
    metric: 256
    next-hop-address: '::'
    next-hop-interface: cali100ec759187
    table-id: 254
  - destination: ::/0
    metric: 103
    next-hop-address: fe80::cd:2dff:feda:21a
    next-hop-interface: eth0
    table-id: 254
  - destination: 0.0.0.0/0
    metric: 103
    next-hop-address: 192.168.66.2
    next-hop-interface: eth0
    table-id: 254
  - destination: 10.244.140.64/26
    metric: 0
    next-hop-address: 0.0.0.0
    next-hop-interface: ''
    table-id: 254
  - destination: 10.244.140.65/32
    metric: 0
    next-hop-address: 0.0.0.0
    next-hop-interface: cali100ec759187
    table-id: 254
  - destination: 10.244.186.192/26
    metric: 0
    next-hop-address: 192.168.66.103
    next-hop-interface: tunl0
    table-id: 254
  - destination: 10.244.196.128/26
    metric: 0
    next-hop-address: 192.168.66.101
    next-hop-interface: tunl0
    table-id: 254
  - destination: 10.244.248.192/26
    metric: 0
    next-hop-address: 192.168.66.104
    next-hop-interface: tunl0
    table-id: 254
  - destination: 192.168.66.0/24
    metric: 103
    next-hop-address: 0.0.0.0
    next-hop-interface: eth0
    table-id: 254
interfaces:
- name: cali100ec759187
  type: veth
  state: up
  accept-all-mac-addresses: false
  ethtool:
    feature:
      highdma: true
      rx-checksum: true
      rx-gro: false
      rx-gro-list: false
      rx-udp-gro-forwarding: false
      rx-vlan-hw-parse: true
      rx-vlan-stag-hw-parse: true
      tx-checksum-ip-generic: true
      tx-checksum-sctp: true
      tx-generic-segmentation: true
      tx-gre-csum-segmentation: true
      tx-gre-segmentation: true
      tx-ipxip4-segmentation: true
      tx-ipxip6-segmentation: true
      tx-nocache-copy: false
      tx-scatter-gather-fraglist: true
      tx-sctp-segmentation: true
      tx-tcp-ecn-segmentation: true
      tx-tcp-mangleid-segmentation: true
      tx-tcp-segmentation: true
      tx-tcp6-segmentation: true
      tx-udp_tnl-csum-segmentation: true
      tx-udp_tnl-segmentation: true
      tx-vlan-hw-insert: true
      tx-vlan-stag-hw-insert: true
  ipv4:
    enabled: false
    address: []
  ipv6:
    enabled: true
    address:
    - ip: fe80::ecee:eeff:feee:eeee
      prefix-length: 64
  mac-address: EE:EE:EE:EE:EE:EE
  mtu: 1480
  veth:
    peer: eth2
- name: eth0
  type: ethernet
  state: up
  accept-all-mac-addresses: false
  ethernet:
    auto-negotiation: false
  ethtool:
    feature:
      rx-gro: true
      rx-gro-list: false
      rx-udp-gro-forwarding: false
      tx-checksum-ip-generic: true
      tx-generic-segmentation: true
      tx-nocache-copy: false
      tx-tcp-ecn-segmentation: true
      tx-tcp-mangleid-segmentation: false
      tx-tcp-segmentation: true
      tx-tcp6-segmentation: true
    ring:
      rx: 256
      tx: 256
  ipv4:
    enabled: true
    address:
    - ip: 192.168.66.102
      prefix-length: 24
    auto-dns: true
    auto-gateway: true
    auto-route-table-id: 0
    auto-routes: true
    dhcp: true
  ipv6:
    enabled: true
    address:
    - ip: fd00::102
      prefix-length: 128
    - ip: fe80::909:a9f1:bca7:3b3c
      prefix-length: 64
    auto-dns: true
    auto-gateway: true
    auto-route-table-id: 0
    auto-routes: true
    autoconf: true
    dhcp: true
  lldp:
    enabled: false
  mac-address: 52:55:00:D1:55:02
  mtu: 1500
- name: eth1
  type: ethernet
  state: up
  accept-all-mac-addresses: false
  ethernet:
    auto-negotiation: false
  ethtool:
    feature:
      rx-gro: true
      rx-gro-list: false
      rx-udp-gro-forwarding: false
      tx-checksum-ip-generic: true
      tx-generic-segmentation: true
      tx-nocache-copy: false
      tx-tcp-ecn-segmentation: true
      tx-tcp-mangleid-segmentation: false
      tx-tcp-segmentation: true
      tx-tcp6-segmentation: true
    ring:
      rx: 256
      tx: 256
  ipv4:
    enabled: false
    address: []
    dhcp: false
  ipv6:
    enabled: false
    address: []
    autoconf: false
    dhcp: false
  lldp:
    enabled: false
  mac-address: 52:55:00:D1:56:02
  mtu: 1500
- name: eth2
  type: ethernet
  state: up
  accept-all-mac-addresses: false
  ethernet:
    auto-negotiation: false
  ethtool:
    feature:
      rx-gro: true
      rx-gro-list: false
      rx-udp-gro-forwarding: false
      tx-checksum-ip-generic: true
      tx-generic-segmentation: true
      tx-nocache-copy: false
      tx-tcp-ecn-segmentation: true
      tx-tcp-mangleid-segmentation: false
      tx-tcp-segmentation: true
      tx-tcp6-segmentation: true
    ring:
      rx: 256
      tx: 256
  ipv4:
    enabled: false
    address: []
    dhcp: false
  ipv6:
    enabled: false
    address: []
    autoconf: false
    dhcp: false
  lldp:
    enabled: false
  mac-address: 52:55:00:D1:56:03
  mtu: 1500
- name: lo
  type: unknown
  state: up
  accept-all-mac-addresses: false
  ethtool:
    feature:
      rx-gro: true
      rx-gro-list: false
      rx-udp-gro-forwarding: false
      tx-generic-segmentation: true
      tx-sctp-segmentation: true
      tx-tcp-ecn-segmentation: true
      tx-tcp-mangleid-segmentation: true
      tx-tcp-segmentation: true
      tx-tcp6-segmentation: true
  ipv4:
    enabled: true
    address:
    - ip: 127.0.0.1
      prefix-length: 8
  ipv6:
    enabled: true
    address:
    - ip: ::1
      prefix-length: 128
  mac-address: 00:00:00:00:00:00
  mtu: 65536
- name: tunl0
  type: unknown
  state: up
  accept-all-mac-addresses: false
  ethtool:
    feature:
      highdma: true
      rx-gro: true
      rx-gro-list: false
      rx-udp-gro-forwarding: false
      tx-checksum-ip-generic: true
      tx-generic-segmentation: true
      tx-nocache-copy: false
      tx-scatter-gather-fraglist: true
      tx-sctp-segmentation: true
      tx-tcp-ecn-segmentation: true
      tx-tcp-mangleid-segmentation: true
      tx-tcp-segmentation: true
      tx-tcp6-segmentation: true
  ipv4:
    enabled: true
    address:
    - ip: 10.244.140.64
      prefix-length: 32
  ipv6:
    enabled: false
    address: []
  mac-address: 00:00:00:00
  mtu: 1480
[vagrant@node02 ~]$ nmcli con
NAME  UUID                                  TYPE      DEVICE
eth0  1c45a8cb-36d0-406f-80fa-fae1fd8d9ec1  ethernet  eth0  
eth1  cf81a923-5999-4a21-bd7e-dc96c7977451  ethernet  eth1  
eth2  cdb090be-2087-43e6-8e54-c973a2393b65  ethernet  eth2

Comment 3 Gris Ge 2022-06-09 06:35:12 UTC
Hi Beniamino,

Could you take a quick look at the above NM logs for this error:

reason=<enum NM_ACTIVE_CONNECTION_STATE_REASON_DEVICE_DISCONNECTED of type NM.ActiveConnectionStateReason><enum NM_DEVICE_STATE_REASON_DEPENDENCY_FAILED of type NM.DeviceStateReason>

Comment 4 Gris Ge 2022-06-09 08:06:25 UTC
The RHEL 9.x nmstate 2.x Rust implementation does not have this problem, as it retries on activation failure.

For 8.x, nmstate 1.x is far too complex to get this retry working properly, so I will wait for the NM team to fix this issue on their end.
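
As a possible stopgap on 8.x (a sketch of the idea only, not something
nmstate 1.x does today), a caller could retry the apply when this
particular libnm activation error shows up:

import libnmstate
from libnmstate.error import NmstateLibnmError

def apply_with_retry(desired_state, retries=3):
    # Retry on the intermittent activation failure described above;
    # re-raise if the last attempt still fails.
    for attempt in range(retries):
        try:
            libnmstate.apply(desired_state)
            return
        except NmstateLibnmError:
            if attempt == retries - 1:
                raise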

Comment 5 Gris Ge 2022-06-27 07:41:05 UTC
Reassigning to the NetworkManager component in the hope that they can fix this `NM_ACTIVE_CONNECTION_STATE_REASON_DEVICE_DISCONNECTED` failure.

To reproduce this problem on RHEL 8:

sudo ip netns add tmp
sudo ip link add eth2 type veth peer name eth2peer
sudo ip link add eth1 type veth peer name eth1peer
sudo ip link set eth1 up
sudo ip link set eth2 up
sudo ip link set eth1peer netns tmp
sudo ip link set eth2peer netns tmp
sudo ip netns exec tmp ip link set eth1peer up
sudo ip netns exec tmp ip link set eth2peer up

sudo nmcli device set eth1 managed yes
sudo nmcli device set eth2 managed yes

echo 'interfaces:
  - link-aggregation:
      mode: active-backup
      options:
        miimon: 140
        primary: eth1
      port:
        - eth1
        - eth2
    name: bond101
    state: up
    type: bond
  - bridge:
      options:
        stp: false
      port:
        - name: bond101
    name: br22
    state: up
    type: ovs-bridge' | sudo nmstatectl set  -



Also revoking dev_ack, as we are not sure whether this can be fixed in NM or not.

Comment 7 Beniamino Galvani 2022-07-05 11:55:46 UTC
Hi,

the desired configuration is:

        br22
    (ovs-bridge)
         ^
         |
  ovs-port-bond101
    (ovs-port)
         ^
         |
      bond101
      (bond)
      ^     ^
      |     |
    eth1   eth2

and the sequence of operations performed by nmstate is:

     [1631736148.8830] audit: op="checkpoint-create" arg="/org/freedesktop/NM/Checkpoint/15" pid=377695 uid=0 result="success"

     [1631736148.9020] audit: op="connection-add"      uuid="2baca9ad-a992-476b-880c-6f377246403b" name="bond101"
     [1631736148.9037] audit: op="connection-add"      uuid="ae3e6fa3-f8f9-447b-b940-15184c8dfddd" name="br22"
     [1631736148.9081] audit: op="connection-add"      uuid="7223b3ae-2270-48ea-b960-0f926b2f1e03" name="ovs-port-bond101"

 (1) [1631736148.9128] audit: op="connection-activate" uuid="ae3e6fa3-f8f9-447b-b940-15184c8dfddd" name="br22"
     [1631736148.9394] audit: op="connection-activate" uuid="2baca9ad-a992-476b-880c-6f377246403b" name="bond101"
 (2) [1631736149.0126] audit: op="connection-activate" uuid="7223b3ae-2270-48ea-b960-0f926b2f1e03" name="ovs-port-bond101"
 (3) [1631736149.0670] audit: op="connection-activate" uuid="cf81a923-5999-4a21-bd7e-dc96c7977451" name="eth1"
     [1631736149.0881] audit: op="connection-activate" uuid="cdb090be-2087-43e6-8e54-c973a2393b65" name="eth2"

     [1631736149.1707] audit: op="checkpoint-rollback" arg="/org/freedesktop/NM/Checkpoint/15" pid=377695 uid=0 result="success"

At (1) br22 is activated manually and, since it has
connection.autoconnect-slaves=yes, ovs-port-bond101 also gets
activated (all the remaining ports are activated recursively as well).

Then at (2) nmstate asks to activate ovs-port-bond101 again. This
disconnects the device and also causes the disconnection of bond101,
which enters the deactivating state.

At (3), eth1 is activated. When the device enters the prepare state,
the state of the controller interface (bond101) is still deactivating
and thus the activation fails with reason "dependency-failed".
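
For reference, the recorded state reason can be read back directly from
NM. A minimal PyGObject sketch (illustrative only, not taken from the
attached logs):

import gi
gi.require_version("NM", "1.0")
from gi.repository import NM

# Print the current state of eth1 and the reason NM recorded for it,
# e.g. "dependency-failed" in the scenario above.
client = NM.Client.new(None)
dev = client.get_device_by_iface("eth1")
if dev is not None:
    print("eth1: state=%s reason=%s"
          % (dev.get_state().value_nick, dev.get_state_reason().value_nick))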

I don't think this is easily solvable by NM, as there are activations
interrupting each other. In particular, eth1 gets activated when the
controller (bond101) has not settled yet and is still deactivating.

Can nmstate wait until the controller is ready before starting a new
activation?

Comment 8 Gris Ge 2022-07-06 08:35:32 UTC
Sure. Let me try from my end.

Comment 9 Gris Ge 2022-07-07 07:11:31 UTC
Patch sent to upstream https://github.com/nmstate/nmstate/pull/1963

Previously, we treated a controller profile in the `IP_CONFIG` state as activated, which led to the race problem mentioned above.

Since OVS bridge and OVS port interfaces are not allowed to hold an IP address, we now wait for their activation to reach `NM.ActiveConnectionState.ACTIVATED`.
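
For illustration only (a minimal sketch of the idea, not the code from
the upstream pull request), waiting for the controller's active
connection to reach ACTIVATED with the NM GObject API could look like:

import time
import gi
gi.require_version("NM", "1.0")
from gi.repository import NM, GLib

def wait_controller_activated(client, controller_name, timeout=35):
    # Poll the controller's active connection until it reaches ACTIVATED,
    # iterating the default GLib context so NM.Client sees property updates.
    deadline = time.monotonic() + timeout
    context = GLib.MainContext.default()
    while time.monotonic() < deadline:
        context.iteration(False)
        for ac in client.get_active_connections():
            if (ac.get_id() == controller_name
                    and ac.get_state() == NM.ActiveConnectionState.ACTIVATED):
                return True
    return False

client = NM.Client.new(None)
if not wait_controller_activated(client, "ovs-port-bond101"):
    raise TimeoutError("controller did not reach ACTIVATED in time")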

Comment 11 Gris Ge 2022-07-20 09:15:00 UTC
*** Bug 1966478 has been marked as a duplicate of this bug. ***

Comment 14 Mingyu Shi 2022-08-10 07:14:18 UTC
Verified with:
nmstate-1.3.2-1.el8.x86_64
nispor-1.2.7-1.el8.x86_64
NetworkManager-1.39.12-1.el8.x86_64
openvswitch2.15-2.15.0-114.el8fdp.x86_64

Ran the test 50 times; all passed.

Comment 23 errata-xmlrpc 2022-11-08 09:17:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (nmstate bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7465