Bug 2053027

Summary: nmpolicy cannot clone IP config of the default NIC carrying static IPv6
Product: Container Native Virtualization (CNV) Reporter: Petr Horáček <phoracek>
Component: Networking Assignee: Quique Llorente <ellorent>
Status: CLOSED ERRATA QA Contact: Adi Zavalkovsky <azavalko>
Severity: high Docs Contact:
Priority: urgent    
Version: 4.10.0 CC: amalykhi, azavalko, cnv-qe-bugs, ellorent, ferferna, fge, jiji, jishi, mshi, network-qe, phoracek, rnetser, thaller, till
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kubernetes-nmstate-handler-container-v4.10.0-48 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2048988 Environment:
Last Closed: 2022-03-16 16:07:15 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 1 Petr Horáček 2022-02-11 08:39:28 UTC
Summary:
An nmpolicy that attempts to create a bridge on a NIC with a static IP configuration fails. During this process, nmpolicy tries to copy all existing routes (IPv4 and IPv6) from the NIC to the bridge. However, because the NIC has multiple IPv6 addresses (static and link-local) with the same destination, nmstate fails to apply the change due to its lack of multipath support. This probably affects all default OpenShift bare-metal installations.
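
The conflict described above can be illustrated with a minimal Python sketch. It assumes routes are plain dictionaries shaped like entries of nmstate's routes.running list (destination, next-hop-interface, next-hop-address). The helper name find_duplicate_destinations, the interface name eth0 and the sample addresses are hypothetical, and this is not nmstate's actual validation code.

```python
from collections import defaultdict


def find_duplicate_destinations(routes):
    """Return route groups that share both interface and destination.

    Such groups would need multipath support to be applied together.
    """
    groups = defaultdict(list)
    for route in routes:
        key = (route["next-hop-interface"], route["destination"])
        groups[key].append(route)
    # Keep only the groups that contain more than one route.
    return {key: group for key, group in groups.items() if len(group) > 1}


# Hypothetical sample data: two IPv6 routes with the same destination on one NIC.
routes = [
    {"destination": "0.0.0.0/0", "next-hop-interface": "eth0",
     "next-hop-address": "192.0.2.1"},
    {"destination": "::/0", "next-hop-interface": "eth0",
     "next-hop-address": "2001:db8::1"},
    {"destination": "::/0", "next-hop-interface": "eth0",
     "next-hop-address": "fe80::1"},
]

duplicates = find_duplicate_destinations(routes)
for (iface, destination), group in duplicates.items():
    print(f"{iface}: {len(group)} routes to {destination}")
```

Copying every route of the NIC onto the bridge would carry such a group over unchanged, which is the situation the report says nmstate cannot apply.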

Current plan forward:
We will attempt to introduce a new operator, !=, to nmpolicy, which would allow us to filter out these IPv6 routes and work around the limitation. This is currently being tested by QE. The mid-term solution is a fix on the nmstate side via https://bugzilla.redhat.com/show_bug.cgi?id=2048988; the long-term solution is a fix of the core issue in NetworkManager, https://bugzilla.redhat.com/show_bug.cgi?id=1837254.
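
The effect of such an inequality filter can be sketched in Python. This report does not show the nmpolicy expression that would use !=, so the sketch only models the semantics on plain dictionaries. The helper filter_not_equal, the choice of field and the sample routes are all hypothetical.

```python
def filter_not_equal(routes, field, value):
    """Keep only the routes whose `field` differs from `value`.

    Routes that lack the field entirely are kept as well.
    """
    return [route for route in routes if route.get(field) != value]


# Hypothetical sample data for one NIC.
nic_routes = [
    {"destination": "0.0.0.0/0", "next-hop-interface": "eth0",
     "next-hop-address": "192.0.2.1"},
    {"destination": "::/0", "next-hop-interface": "eth0",
     "next-hop-address": "2001:db8::1"},
    {"destination": "::/0", "next-hop-interface": "eth0",
     "next-hop-address": "fe80::1"},
]

# Drop the route that goes through the link-local next hop (assumed value).
filtered = filter_not_equal(nic_routes, "next-hop-address", "fe80::1")
for route in filtered:
    print(route["destination"], "via", route["next-hop-address"])
```

Which field and value the real policy would compare against depends on the routes present on the node; the ones used here are placeholders.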

Comment 2 Adi Zavalkovsky 2022-02-16 16:08:30 UTC
Verified.
OCP version - 4.10. kubernetes-nmstate-handler-container-v4.10.0-48

deployment yaml:
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: capture-br1-deployment
spec:
  capture:
    default-gw: routes.running.destination=="0.0.0.0/0"
    default-gw-routes-takeover: capture.primary-nic-routes | routes.running.next-hop-interface
      := "capture-br1"
    primary-nic: interfaces.name==capture.default-gw.routes.running.0.next-hop-interface
    primary-nic-routes: routes.running.next-hop-interface==capture.primary-nic.interfaces.0.name
  desiredState:
    interfaces:
    - bridge:
        options:
          stp:
            enabled: false
        port:
        - name: '{{ capture.primary-nic.interfaces.0.name }}'
          vlan: {}
      ipv4: '{{ capture.primary-nic.interfaces.0.ipv4 }}'
      ipv6: '{{ capture.primary-nic.interfaces.0.ipv6 }}'
      name: capture-br1
      state: up
      type: linux-bridge
    routes:
      config: '{{ capture.default-gw-routes-takeover.routes.running }}'
  nodeSelector:
    capture: allow
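
The chain of capture expressions in the deployment policy can be read as a small pipeline: find the default route, resolve the NIC it points at, collect that NIC's routes, then rewrite their next-hop interface to the bridge. The following Python sketch models that reading on plain dictionaries. It is not nmpolicy's implementation; the function name resolve_captures and the sample state (eth0, eth1, the addresses) are hypothetical.

```python
def resolve_captures(state, bridge_name):
    """Model the four capture entries of the deployment policy."""
    running = state["routes"]["running"]

    # default-gw: routes.running.destination=="0.0.0.0/0"
    default_gw = [r for r in running if r["destination"] == "0.0.0.0/0"]

    # primary-nic: interfaces.name==<next-hop-interface of the first default route>
    nic_name = default_gw[0]["next-hop-interface"]
    primary_nic = [i for i in state["interfaces"] if i["name"] == nic_name]

    # primary-nic-routes: routes.running.next-hop-interface==<primary NIC name>
    nic_routes = [r for r in running
                  if r["next-hop-interface"] == primary_nic[0]["name"]]

    # default-gw-routes-takeover: same routes, next-hop-interface := bridge
    takeover = [{**r, "next-hop-interface": bridge_name} for r in nic_routes]

    return {
        "default-gw": default_gw,
        "primary-nic": primary_nic,
        "primary-nic-routes": nic_routes,
        "default-gw-routes-takeover": takeover,
    }


# Hypothetical node state with two NICs; eth0 carries the default route.
state = {
    "interfaces": [
        {"name": "eth0", "type": "ethernet"},
        {"name": "eth1", "type": "ethernet"},
    ],
    "routes": {"running": [
        {"destination": "0.0.0.0/0", "next-hop-interface": "eth0",
         "next-hop-address": "192.0.2.1"},
        {"destination": "192.0.2.0/24", "next-hop-interface": "eth0",
         "next-hop-address": "0.0.0.0"},
        {"destination": "198.51.100.0/24", "next-hop-interface": "eth1",
         "next-hop-address": "0.0.0.0"},
    ]},
}

captures = resolve_captures(state, "capture-br1")
for route in captures["default-gw-routes-takeover"]:
    print(route["destination"], "->", route["next-hop-interface"])
```

In this model the routes of eth1 are left alone, and every route that previously used eth0 ends up pointing at capture-br1, which matches what the desiredState section then applies through routes.config.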

teardown yaml:
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: capture-br1-teardown
spec:
  capture:
    capture-br1: interfaces.name == "capture-br1"
    capture-br1-routes: routes.running.next-hop-interface == "capture-br1"
    capture-br1-routes-takeover: capture.capture-br1-routes | routes.running.next-hop-interface
      := capture.capture-br1.interfaces.0.bridge.port.0.name
  desiredState:
    interfaces:
    - bridge:
        options:
          stp:
            enabled: false
        port: []
      ipv4:
        auto-dns: true
        dhcp: false
        enabled: false
      ipv6:
        auto-dns: true
        autoconf: false
        dhcp: false
        enabled: false
      name: capture-br1
      state: absent
      type: linux-bridge
    - ipv4: '{{ capture.capture-br1.interfaces.0.ipv4 }}'
      ipv6: '{{ capture.capture-br1.interfaces.0.ipv6 }}'
      name: '{{ capture.capture-br1.interfaces.0.bridge.port.0.name }}'
      state: up
      type: ethernet
    routes:
      config: '{{ capture.capture-br1-routes-takeover.routes.running }}'
  nodeSelector:
    capture: allow

Applied on BM SR-IOV nodes; both deployment and teardown work as expected.

Comment 5 errata-xmlrpc 2022-03-16 16:07:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0947