Bug 2165874

Summary: ovs-interface and mac cloning not working as expected
Product: Red Hat Enterprise Linux 9 Reporter: Quique Llorente <ellorent>
Component: nmstate Assignee: Gris Ge <fge>
Status: CLOSED DUPLICATE QA Contact: Mingyu Shi <mshi>
Severity: high Docs Contact:
Priority: unspecified    
Version: 9.2 CC: ferferna, jiji, jishi, network-qe, sfaye, till
Target Milestone: rc Keywords: Triaged
Target Release: --- Flags: pm-rhel: mirror+
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-02-21 05:19:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2168477    
Bug Blocks:    

Description Quique Llorente 2023-01-31 10:41:08 UTC
Description of problem:

Cloning the mac-address onto an ovs-interface does not work as expected, and DHCP does not assign the expected address.


Version-Release number of selected component (if applicable): nmstate-2.2.3-3.el9.x86_64


How reproducible: Always


Steps to Reproduce:
1. Retrieve the mac address from eth0 (let's say it's 52:55:00:D1:55:02)
2. Configure an ovs-bridge with eth0 as one port and an ovs-interface with the cloned MAC as the other port:
- ipv4:
    dhcp: true
    enabled: true
  mac-address: 52:55:00:D1:55:02
  name: ovs0
  state: up
  type: ovs-interface
- bridge:
    options:
      stp: true
    port:
    - name: eth0
    - name: ovs0
  name: br69
  state: up
  type: ovs-bridge
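
The two reproduction steps can be sketched in Python as below. This is only an illustration, not part of the original test: the helper names read_mac and build_desired_state are hypothetical, reading the MAC from /sys/class/net is one possible way to do step 1, and the sketch assumes the YAML fragment above sits under the top-level "interfaces" key of an nmstate state.

```python
import json
from pathlib import Path


def read_mac(ifname: str, sysfs_root: str = "/sys/class/net") -> str:
    """Step 1: return the MAC of ifname in upper case, read from sysfs."""
    return Path(sysfs_root, ifname, "address").read_text().strip().upper()


def build_desired_state(mac: str, port: str = "eth0",
                        ovs_iface: str = "ovs0", bridge: str = "br69") -> dict:
    """Step 2: build the state from the report as a plain dict.

    Assumption: the interface entries live under the "interfaces" key.
    """
    return {
        "interfaces": [
            {
                "name": ovs_iface,
                "type": "ovs-interface",
                "state": "up",
                # Cloned MAC: the address previously owned by the port.
                "mac-address": mac,
                "ipv4": {"enabled": True, "dhcp": True},
            },
            {
                "name": bridge,
                "type": "ovs-bridge",
                "state": "up",
                "bridge": {
                    "options": {"stp": True},
                    "port": [{"name": port}, {"name": ovs_iface}],
                },
            },
        ]
    }


if __name__ == "__main__":
    # Uses the example MAC from step 1 instead of touching the real system.
    print(json.dumps(build_desired_state("52:55:00:D1:55:02"), indent=2))
```

The printed state could then be saved to a file and applied with nmstatectl apply; that step is left out here because it changes the host network configuration.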

Actual results:
ovs0 ovs-interface is not getting the expected IP from DHCP server


Expected results:
ovs0 ovs-interface has the same address previously owned by eth0


Additional info:
CI job: https://prow.ci.kubevirt.io/view/gs/kubevirt-prow/pr-logs/pull/nmstate_kubernetes-nmstate/1059/pull-kubernetes-nmstate-e2e-handler-k8s/1615696734632022016

NetworkManager logs https://gcsweb.ci.kubevirt.io/gcs/kubevirt-prow/pr-logs/pull/nmstate_kubernetes-nmstate/1059/pull-kubernetes-nmstate-e2e-handler-k8s/1615696734632022016/artifacts/NodeNetworkConfigurationPolicy_default_ovs-bridged_network_when_there_is_a_default_interface_with_dynamic_address_and_ovs_bridge_on_top_of_the_default_interface/

The git repo build nmstate-2.2.4-0.alpha.20230118.8ec213b8 appears to work fine.

Comment 1 Quique Llorente 2023-02-01 06:51:29 UTC
We were pinning to Gris's personal repo for a temporary fix on CentOS Stream 8; that fix is for Python. We will have results with the latest CentOS Stream 9 soon.

Comment 2 Quique Llorente 2023-02-02 13:53:50 UTC
We have reproduced it locally. It looks like a race between NetworkManager cloning the MAC and sending the DHCPDISCOVER.

These are the dnsmasq logs when the test works fine:

dnsmasq-dhcp: DHCPDISCOVER(br0) 52:55:00:d1:55:02 
dnsmasq-dhcp: DHCPOFFER(br0) 192.168.66.102 52:55:00:d1:55:02 
dnsmasq-dhcp: DHCPREQUEST(br0) 192.168.66.102 52:55:00:d1:55:02 
dnsmasq-dhcp: DHCPACK(br0) 192.168.66.102 52:55:00:d1:55:02 node02
dnsmasq-dhcp: DHCPDISCOVER(br0) 192.168.66.102 52:55:00:d1:55:02 
dnsmasq-dhcp: DHCPOFFER(br0) 192.168.66.102 52:55:00:d1:55:02 
dnsmasq-dhcp: DHCPREQUEST(br0) 192.168.66.102 52:55:00:d1:55:02 
dnsmasq-dhcp: DHCPACK(br0) 192.168.66.102 52:55:00:d1:55:02 node02
dnsmasq-dhcp: DHCPDISCOVER(br0) 192.168.66.102 52:55:00:d1:55:02 
dnsmasq-dhcp: DHCPOFFER(br0) 192.168.66.102 52:55:00:d1:55:02 
dnsmasq-dhcp: DHCPREQUEST(br0) 192.168.66.102 52:55:00:d1:55:02 
dnsmasq-dhcp: DHCPACK(br0) 192.168.66.102 52:55:00:d1:55:02 node02
dnsmasq-dhcp: DHCPREQUEST(br0) 192.168.66.102 52:55:00:d1:55:02 
dnsmasq-dhcp: DHCPACK(br0) 192.168.66.102 52:55:00:d1:55:02 node02

These are the dnsmasq logs when it fails:

dnsmasq-dhcp: DHCPDISCOVER(br0) 52:55:00:d1:55:02 
dnsmasq-dhcp: DHCPOFFER(br0) 192.168.66.102 52:55:00:d1:55:02 
dnsmasq-dhcp: DHCPREQUEST(br0) 192.168.66.102 52:55:00:d1:55:02 
dnsmasq-dhcp: DHCPACK(br0) 192.168.66.102 52:55:00:d1:55:02 node02
dnsmasq-dhcp: DHCPDISCOVER(br0) 192.168.66.102 52:55:00:d1:55:02 
dnsmasq-dhcp: DHCPOFFER(br0) 192.168.66.102 52:55:00:d1:55:02 
dnsmasq-dhcp: DHCPREQUEST(br0) 192.168.66.102 52:55:00:d1:55:02 
dnsmasq-dhcp: DHCPACK(br0) 192.168.66.102 52:55:00:d1:55:02 node02
dnsmasq-dhcp: DHCPDISCOVER(br0) 192.168.66.102 aa:c2:13:bc:b9:43 
dnsmasq-dhcp: DHCPOFFER(br0) 192.168.66.79 aa:c2:13:bc:b9:43 
dnsmasq-dhcp: DHCPREQUEST(br0) 192.168.66.79 aa:c2:13:bc:b9:43 
dnsmasq-dhcp: DHCPACK(br0) 192.168.66.79 aa:c2:13:bc:b9:43 
dnsmasq-dhcp: DHCPREQUEST(br0) 192.168.66.102 52:55:00:d1:55:02 
dnsmasq-dhcp: DHCPACK(br0) 192.168.66.102 52:55:00:d1:55:02 node02

So this is clearly the unexpected DHCPDISCOVER sent by NetworkManager:

dnsmasq-dhcp: DHCPDISCOVER(br0) 192.168.66.102 aa:c2:13:bc:b9:43 <- this is the generated mac not the cloned one from ovs0 interface
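
To spot this kind of stray request without reading the log by eye, a small filter like the following could be used. It is a rough sketch that assumes the dnsmasq line layout seen in the excerpts above; the function name unexpected_clients is made up for illustration.

```python
import re

# Matches lines such as:
#   dnsmasq-dhcp: DHCPDISCOVER(br0) 192.168.66.102 aa:c2:13:bc:b9:43
# The IP address is optional (the very first DISCOVER carries none).
LINE_RE = re.compile(
    r"dnsmasq-dhcp: (?P<msg>DHCP[A-Z]+)\((?P<iface>[^)]+)\)"
    r"(?: (?P<ip>\d+\.\d+\.\d+\.\d+))?"
    r" (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def unexpected_clients(log_text: str, expected_mac: str) -> list:
    """Return (message, ip, mac) for every DHCP line whose client MAC
    differs from the expected (cloned) MAC."""
    expected = expected_mac.lower()
    hits = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m and m.group("mac").lower() != expected:
            hits.append((m.group("msg"), m.group("ip"), m.group("mac").lower()))
    return hits


# Excerpt from the failing run above.
SAMPLE = """\
dnsmasq-dhcp: DHCPACK(br0) 192.168.66.102 52:55:00:d1:55:02 node02
dnsmasq-dhcp: DHCPDISCOVER(br0) 192.168.66.102 aa:c2:13:bc:b9:43
dnsmasq-dhcp: DHCPOFFER(br0) 192.168.66.79 aa:c2:13:bc:b9:43
dnsmasq-dhcp: DHCPREQUEST(br0) 192.168.66.79 aa:c2:13:bc:b9:43
dnsmasq-dhcp: DHCPACK(br0) 192.168.66.79 aa:c2:13:bc:b9:43
dnsmasq-dhcp: DHCPREQUEST(br0) 192.168.66.102 52:55:00:d1:55:02
"""

if __name__ == "__main__":
    for msg, ip, mac in unexpected_clients(SAMPLE, "52:55:00:D1:55:02"):
        print(msg, ip, mac)
```

On the passing log every line carries the cloned MAC, so the function would return an empty list there.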

Then we have the following logs from NetworkManager:

Feb 02 13:21:41 node02 NetworkManager[1697]: <info>  [1675344101.8489] dhcp4 (ovs0): activation: beginning transaction (no timeout)
Feb 02 13:21:41 node02 NetworkManager[1697]: <info>  [1675344101.8498] device (ovs0): set-hw-addr: set-cloned MAC address to 52:55:00:D1:55:02 (52:55:00:D1:55:02)
Feb 02 13:21:42 localhost.localdomain NetworkManager[1697]: <info>  [1675344102.9608] audit: op="checkpoint-adjust-rollback-timeout" arg="/org/freedesktop/NetworkManager/Checkpoint/1" pid=4017 uid=0 result="success"
Feb 02 13:22:18 localhost.localdomain NetworkManager[1697]: <info>  [1675344138.4800] dhcp4 (ovs0): state changed new lease, address=192.168.66.79
Feb 02 13:22:18 localhost.localdomain NetworkManager[1697]: <info>  [1675344138.4817] policy: set 'ovs0-if' (ovs0) as default for IPv4 routing and DNS

So NetworkManager does clone the MAC address, but the IP address received from DHCP is the one it would get if the MAC had not been cloned at all.
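
The ordering in the NetworkManager excerpt can be checked mechanically. The sketch below assumes the message wording shown in the log lines above and uses made-up helper names (find_events, dhcp_started_before_mac_set); it only compares the bracketed timestamps, so a True result is a hint that the race is possible, not proof of it.

```python
import re

# NetworkManager prefixes each message with "[<epoch seconds>.<fraction>]".
TS_RE = re.compile(r"\[(?P<ts>\d+\.\d+)\]\s+(?P<body>.*)$")


def find_events(log_text: str, ifname: str):
    """Return (dhcp_start, mac_set) timestamps for ifname, or None if absent."""
    dhcp_prefix = f"dhcp4 ({ifname}): activation: beginning transaction"
    mac_prefix = f"device ({ifname}): set-hw-addr: set-cloned MAC address"
    dhcp_start = None
    mac_set = None
    for line in log_text.splitlines():
        m = TS_RE.search(line)
        if not m:
            continue
        ts, body = float(m.group("ts")), m.group("body")
        if dhcp_start is None and body.startswith(dhcp_prefix):
            dhcp_start = ts
        elif mac_set is None and body.startswith(mac_prefix):
            mac_set = ts
    return dhcp_start, mac_set


def dhcp_started_before_mac_set(log_text: str, ifname: str) -> bool:
    """True when the DHCP transaction is logged before the cloned MAC is set."""
    dhcp_start, mac_set = find_events(log_text, ifname)
    return dhcp_start is not None and mac_set is not None and dhcp_start < mac_set


# First two lines of the NetworkManager excerpt above.
SAMPLE = """\
Feb 02 13:21:41 node02 NetworkManager[1697]: <info>  [1675344101.8489] dhcp4 (ovs0): activation: beginning transaction (no timeout)
Feb 02 13:21:41 node02 NetworkManager[1697]: <info>  [1675344101.8498] device (ovs0): set-hw-addr: set-cloned MAC address to 52:55:00:D1:55:02 (52:55:00:D1:55:02)
"""

if __name__ == "__main__":
    print(dhcp_started_before_mac_set(SAMPLE, "ovs0"))
```

In the excerpt the DHCP transaction is logged about a millisecond before the set-cloned message, which fits the suspected race.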

The same test with the nmstate-git copr works fine, although it also installs a lot of dependencies in the nmstate-handler container.

Comment 3 Gris Ge 2023-02-09 07:03:26 UTC
The problem also reproduces on the latest NetworkManager git build.

This might be NetworkManager bug https://bugzilla.redhat.com/show_bug.cgi?id=2168477. Waiting for the debug result from the NM team.

Comment 4 Gris Ge 2023-02-21 05:19:44 UTC

*** This bug has been marked as a duplicate of bug 2168477 ***