Bug 1733709

Summary: After restarting NetworkManager, openvswitch objects need to be manually brought up
Product: Red Hat Enterprise Linux 7 Reporter: David Critch <dcritch>
Component: NetworkManager Assignee: Beniamino Galvani <bgalvani>
Status: CLOSED ERRATA QA Contact: Desktop QE <desktop-qa-list>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.0 CC: atragler, bgalvani, dornelas, fgiudici, lrintel, pasik, ptalbert, rkhan, sukulkar, thaller, vbenes
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: NetworkManager-1.18.0-6.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1734032 1735706 1735707 Environment:
Last Closed: 2020-03-31 20:07:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1734032, 1735706, 1735707    
Attachments:
Description | Flags
NetworkManager journal w/ 'trace' level logging | none
still present in RHEL7.7 | none
Testing script for RHEL 7.6 / 7.7 / 7.8 | none
Output from the testing script | none
System journal from reproducing the bug under RHEL-7.8-20190912.3 | none
NM trace log from reproducing the bug under RHEL-7.8-20190912.3 | none
Testing script for RHEL 7.6 / 7.7 / 7.8 | none
Testing script for RHEL 7.6 / 7.7 / 7.8 | none
Output from the testing script | none
System journal from reproducing the bug under RHEL-7.8-20190912.3 | none
NM trace log from reproducing the bug under RHEL-7.8-20190912.3 | none
Output from the testing script | none
System journal from reproducing the bug under RHEL-7.8-20190912.3 | none
NM trace log from reproducing the bug under RHEL-7.8-20190912.3 | none

Description David Critch 2019-07-27 19:51:22 UTC
Created attachment 1593891 [details]
NetworkManager journal w/ 'trace' level logging

Description of problem:

We are using NetworkManager with the OVS plugin to configure a bridge on top of a bonded pair of interfaces. After restarting NetworkManager, only one slave interface is connected. All other ports/bridges/interfaces need to be manually restored with 'nmcli conn up ...'

Version-Release number of selected component (if applicable):
NetworkManager-ovs-1.12.0-10.el7_6.x86_64
NetworkManager-1.12.0-10.el7_6.x86_64


How reproducible:
Always

Steps to Reproduce:
1. Install openvswitch and the NetworkManager-ovs plugin
2. Configure bonds and a bridge as below:
nmcli conn add type ovs-bridge conn.interface brcnv
nmcli conn add type ovs-port conn.interface port0 master brcnv
nmcli conn add type ovs-port conn.interface brcnv-port master brcnv
nmcli conn add type ovs-interface conn.id brcnv-iface conn.interface brcnv master brcnv-port ipv4.method manual ipv4.address $ip ipv4.gateway $gw ipv4.dns "$dns"
nmcli conn add type ovs-port conn.interface bond0 master brcnv ovs-port.bond-mode balance-slb
nmcli conn add type ethernet conn.interface $default_device master bond0
nmcli conn add type ethernet conn.interface $secondary_device master bond0
nmcli conn down "System $default_device"
nmcli conn mod "System $default_device" connection.autoconnect no

3. systemctl restart NetworkManager

Actual results:
# nmcli conn
NAME                  UUID                                  TYPE           DEVICE     
ovs-slave-em3         9f8bf683-619d-40de-9174-d6d296733d80  ethernet       em3         <green>
<...snip / all others 'down' >

Expected results:
# nmcli conn
NAME                  UUID                                  TYPE           DEVICE     
brcnv-iface           cbfe1830-31ff-4a6b-ac52-86f5a1be60e7  ovs-interface  brcnv      
ovs-bridge-brcnv      6c65d0db-e7fe-42b0-8b3c-8be1e5dc508e  ovs-bridge     brcnv      
ovs-slave-bond0       f3ec130f-2163-4093-8479-085a0ac815e4  ovs-port       bond0      
ovs-slave-brcnv-port  1f6e5b0a-5a02-4e06-a85c-c1c55f27e183  ovs-port       brcnv-port 
ovs-slave-em1         b6badd81-f431-48cb-9916-5e1d73b5e1ad  ethernet       em1        
ovs-slave-em3         cb51ba17-8e74-411b-a489-903606da4320  ethernet       em3        
ovs-slave-port0       aaf0ad18-0ad6-4a66-9cbf-ed6e3c6284c2  ovs-port       port0      
< ... snip ... >

Additional info:
Can restore configuration by:
nmcli conn up ovs-bridge-brcnv
nmcli conn up brcnv-iface
nmcli conn up ovs-slave-em1
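
The manual restore above could be scripted along these lines. This is only a sketch of a workaround, not part of the fix: the function names are made up for illustration, and it assumes that `nmcli -t -f NAME,DEVICE connection show` leaves the DEVICE field empty for inactive profiles and that profile names contain no ':' characters.

```shell
# Print the names of profiles that have no device attached, given the output of
#   nmcli -t -f NAME,DEVICE connection show
# on stdin. Assumption: profile names contain no ':' (nmcli escapes those as '\:').
inactive_connections() {
    awk -F: 'NF >= 2 && $NF == "" { print $1 }'
}

# Try to activate every inactive profile whose name matches the extended
# regular expression in $1 (default: match everything).
# A failed activation is reported on stderr and the loop continues.
bring_up_inactive() {
    nmcli -t -f NAME,DEVICE connection show |
        inactive_connections |
        grep -E -- "${1:-.}" |
        while IFS= read -r name; do
            nmcli connection up "$name" </dev/null ||
                echo "could not activate: $name" >&2
        done
}
```

For the setup in this report, something like `bring_up_inactive '^(ovs-|brcnv-)'` would restrict the loop to the OVS-related profiles and leave the deliberately deactivated "System ..." profile alone. Activation order may matter, so a single pass is not guaranteed to restore everything.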

Comment 2 Vladimir Benes 2019-07-29 08:01:51 UTC
Let's try this on a recent 7.7 build. We recently made some changes to how restarts are handled. I will update this bug shortly.

Comment 3 Vladimir Benes 2019-07-29 08:50:29 UTC
Created attachment 1594203 [details]
still present in RHEL7.7

nmcli conn add type ovs-bridge conn.interface brcnv
nmcli conn add type ovs-port conn.interface port0 master brcnv
nmcli conn add type ovs-port conn.interface brcnv-port master brcnv
nmcli conn add type ovs-interface conn.id brcnv-iface conn.interface brcnv master brcnv-port ipv4.method manual ipv4.address 192.168.100.154/24 ipv4.gateway 192.168.100.1 ipv4.dns 1.1.1.1
nmcli conn add type ovs-port conn.interface bond0 master brcnv ovs-port.bond-mode balance-slb
nmcli conn add type ethernet conn.interface eth1 master bond0

Comment 4 Beniamino Galvani 2019-07-29 14:26:08 UTC
Proposed fix at:

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/merge_requests/215

Comment 10 Beniamino Galvani 2019-07-30 07:42:13 UTC
I see, the duplicate 'brcnv' interface name causes other issues on restart. I didn't realize before because I was using the script attached to the mail where the brcnv ovs interface is commented out. I'm debugging the problem.

Comment 11 Beniamino Galvani 2019-07-30 09:38:25 UTC
Merge request:

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/merge_requests/216

Comment 13 David Critch 2019-07-30 13:29:49 UTC
Working like a champ now. Huge thanks for the fix!

Comment 18 Pavlin Georgiev 2019-09-25 08:53:32 UTC
Created attachment 1618946 [details]
Testing script for RHEL 7.6 / 7.7 / 7.8

Comment 19 Pavlin Georgiev 2019-09-25 09:00:40 UTC
TEST SETUP
KVM based local VM
Distro: RHEL 7.6
Component version:
  NetworkManager-1.12.0-6.el7
  NetworkManager-ovs-1.12.0-6.el7

  
PREPARATION
Add 1 Ethernet interface of type virtio to the testing VM.

Set kernel parameters to disable Consistent Network Device Naming:
# grubby --update-kernel=ALL --args="biosdevname=0 net.ifnames=0"
# yum install -y NetworkManager-ovs
# reboot

TEST PROCEDURE
1. Run the testing script.
2. Get results.


RESULT
# nmcli con
NAME                  UUID                                  TYPE           DEVIC
eth1                  6477ca5a-6986-43fe-9c09-6a361d7cf13a  ethernet       eth1 
ovs-bridge-brcnv      3fd02a35-4c20-4ad8-bde5-5c6c7139f283  ovs-bridge     brcnv
ovs-slave-bond0       eae35fd6-62e9-487c-aae6-66c6613a9303  ovs-port       bond0
ovs-slave-brcnv-port  49cd9a79-37ee-4a82-89a3-4e59d20aee4f  ovs-port       brcnv
ovs-slave-port0       0ed8ffff-ecb9-41b4-b90b-a9dbb4a65d7b  ovs-port       port0
virbr0                d8e0628f-86cb-45fd-a89e-1a39fadc6018  bridge         virbr
brcnv-iface           20a45655-e480-4710-9990-7a110a63748c  ovs-interface  --   
eth0                  afe238f8-6895-486f-a898-1da642a2c120  ethernet       --   
ovs-slave-eth0        d4e8d943-4867-48b8-931e-09b14e8314e9  ethernet       --   
ovs-slave-eth1        d5c7e40f-82df-49a0-a833-44e7a212459c  ethernet       --

Comment 20 Pavlin Georgiev 2019-09-25 09:13:55 UTC
TEST SETUP 2
Distro: RHEL-7.8-20190912.3 Server x86_64
Component version: 
  NetworkManager-1.18.0-5.el7_7.1
  NetworkManager-ovs-1.18.0-5.el7_7.1


PREPARATION
Make sure you have console connection to the testing machine.
The SSH connection will be interrupted.


TEST PROCEDURE 2
1. Create a Beaker job to reproduce the environment:
   https://beaker.engineering.redhat.com/jobs/3807638
2. Run the testing script.
3. Get results.


RESULT 2
[root@qe-dell-ovs5-vm-59 bin]# ./bug_reproducer_vm.sh 
IP=10.16.134.65/23
GW=10.16.135.254
DNS=10.19.43.29 10.11.5.19 10.5.30.160

Connection 'ovs-bridge-brcnv' (d02b9bfe-ad58-46e8-ad29-cc32e0f5b8e1) successfully added.
Connection 'ovs-slave-port0' (a607ddb1-54ee-4811-b163-63572739a220) successfully added.
Connection 'ovs-slave-brcnv-port' (dc8f1916-6b4b-4401-a16d-0017b6904f10) successfully added.
Connection 'brcnv-iface' (d5dd5975-c460-46c9-94a6-ae2290480fd9) successfully added.
Connection 'ovs-slave-bond0' (99f105a3-bc66-4eae-b378-bbc2289099f4) successfully added.
Connection 'ovs-slave-eth0' (4a7ba3f2-8b09-4378-a931-f3fd3ea33794) successfully added.
Connection 'ovs-slave-eth1' (90d6d470-9aec-41de-97a8-d7317790e633) successfully added.
Connection 'eth0' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/1)

NAME                  UUID                                  TYPE           DEVIC
ovs-bridge-brcnv      d02b9bfe-ad58-46e8-ad29-cc32e0f5b8e1  ovs-bridge     brcnv
ovs-slave-bond0       99f105a3-bc66-4eae-b378-bbc2289099f4  ovs-port       bond0
ovs-slave-brcnv-port  dc8f1916-6b4b-4401-a16d-0017b6904f10  ovs-port       brcnv
ovs-slave-port0       a607ddb1-54ee-4811-b163-63572739a220  ovs-port       port0
brcnv-iface           d5dd5975-c460-46c9-94a6-ae2290480fd9  ovs-interface  --   
eth0                  b42dda1e-b09e-4d30-a784-c2d3c1f64af9  ethernet       --   
eth1                  78243c55-38db-4d81-82e8-c2fdc4babf47  ethernet       --   
eth10                 e4a8b7fe-5beb-475d-9919-442e42c8af59  ethernet       --   
eth2                  fde08446-2609-480c-9fe6-af8bf36c8ad7  ethernet       --   
eth3                  db30f0ca-be62-40c2-ad12-2a3996388d25  ethernet       --   
eth4                  ce3576b2-7a5a-46c2-8da9-94a5a8b1011b  ethernet       --   
eth5                  0d815d82-6b27-4e64-b005-8188c74b8d91  ethernet       --   
eth6                  15a17c08-57b7-45a8-96eb-4eec8b944c45  ethernet       --   
eth7                  1b59d383-1298-4005-b16f-c0b45d97f33c  ethernet       --   
eth8                  bddeab30-ce90-4659-ba8f-17de15a3a190  ethernet       --   
eth9                  c43d6c36-ea66-467a-8e18-cf4083c95004  ethernet       --   
ovs-slave-eth0        4a7ba3f2-8b09-4378-a931-f3fd3ea33794  ethernet       --   
ovs-slave-eth1        90d6d470-9aec-41de-97a8-d7317790e633  ethernet       --   

NetworkManager-1.18.0-5.el7_7.1.x86_64
NetworkManager-ovs-1.18.0-5.el7_7.1.x86_64

Comment 21 Beniamino Galvani 2019-09-25 14:00:10 UTC
Hi,

The script works for me with NM 1.18.0-6, which is the latest version available in RHEL 7.8.

Can you please enable trace logging and attach NM logs? Thanks!

Comment 23 Pavlin Georgiev 2019-09-26 04:35:47 UTC
Created attachment 1619342 [details]
Output from the testing script

TEST SETUP 3
Distro: RHEL-7.8-20190912.3 Server x86_64
Component version: 
  NetworkManager-1.18.0-5.el7_7.1.x86_64
  NetworkManager-ovs-1.18.0-5.el7_7.1.x86_64


TEST PROCEDURE 3
1. Create a Beaker job to reproduce the environment:
   https://beaker.engineering.redhat.com/jobs/3809424

2. Install Brew build NetworkManager-1.18.0-6.el7:
   https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=969104

3. To troubleshoot NetworkManager, follow the steps from:
   https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf#n28 

   # sed -i "/RateLimitInterval/d" /etc/systemd/journald.conf
   # sed -i "/RateLimitBurst/d" /etc/systemd/journald.conf
   # echo -e "RateLimitInterval=0s\nRateLimitBurst=0" >> /etc/systemd/journald.conf
   # systemctl restart systemd-journald

4. Run the testing script.
5. Get messages from the system journal. 
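
The journald commands above only remove rate limiting. Trace-level logging itself is normally switched on through the `[logging]` section of the NetworkManager configuration, as the linked file describes. A minimal sketch follows, assuming a drop-in directory is used; the function name and the file name `95-trace-logging.conf` are hypothetical.

```shell
# Write a NetworkManager drop-in that raises the log level to TRACE.
# $1: configuration directory (normally /etc/NetworkManager/conf.d)
# The file name 95-trace-logging.conf is an arbitrary, hypothetical choice.
write_trace_dropin() {
    conf_dir=$1
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/95-trace-logging.conf" <<'EOF'
[logging]
level=TRACE
domains=ALL
EOF
}
```

A possible use is `write_trace_dropin /etc/NetworkManager/conf.d && systemctl restart NetworkManager`, followed by `journalctl -u NetworkManager -b` to collect the log for attachment.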


RESULT 3
Expected result NOT achieved. Output is attached.

Comment 24 Pavlin Georgiev 2019-09-26 04:37:28 UTC
Created attachment 1619343 [details]
System journal from reproducing the bug under RHEL-7.8-20190912.3

Comment 25 Pavlin Georgiev 2019-09-26 04:38:54 UTC
Created attachment 1619344 [details]
NM trace log from reproducing the bug under RHEL-7.8-20190912.3

Comment 27 Pavlin Georgiev 2019-09-26 12:54:33 UTC
Created attachment 1619522 [details]
Testing script for RHEL 7.6 / 7.7 / 7.8

Updated the testing script.

Comment 28 Pavlin Georgiev 2019-09-26 13:21:22 UTC
Created attachment 1619564 [details]
Testing script for RHEL 7.6 / 7.7 / 7.8

Comment 29 Pavlin Georgiev 2019-09-26 15:13:45 UTC
Created attachment 1619630 [details]
Output from the testing script

TEST SETUP 4
Distro: RHEL-7.8-20190912.3 Server x86_64
Component version: 
  NetworkManager-1.18.0-5.el7_7.1.x86_64
  NetworkManager-ovs-1.18.0-5.el7_7.1.x86_64


TEST PROCEDURE 4
1. Create a Beaker job to reproduce the environment:
   https://beaker.engineering.redhat.com/jobs/3810651

2. Install Brew build NetworkManager-1.18.0-6.el7:
   https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=969104

3. Install Brew builds containing packages of Open vSwitch suitable for RHEL 7.8:
   - openvswitch2.11-2.11.0-23.el7fdn
   - openvswitch-selinux-extra-policy-1.0-13.el7fdp
  
4. To troubleshoot NetworkManager, follow the steps from:
   https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf#n28 

   # sed -i "/RateLimitInterval/d" /etc/systemd/journald.conf
   # sed -i "/RateLimitBurst/d" /etc/systemd/journald.conf
   # echo -e "RateLimitInterval=0s\nRateLimitBurst=0" >> /etc/systemd/journald.conf
   # systemctl restart systemd-journald

5. Run the testing script.
6. Get messages from the system journal. 


RESULT 4
I do NOT get the expected result from that testing environment.
Output of the script is attached.

Comment 30 Pavlin Georgiev 2019-09-26 15:14:47 UTC
Created attachment 1619631 [details]
System journal from reproducing the bug under RHEL-7.8-20190912.3

Comment 31 Pavlin Georgiev 2019-09-26 15:16:27 UTC
Created attachment 1619633 [details]
NM trace log from reproducing the bug under RHEL-7.8-20190912.3

Comment 32 Beniamino Galvani 2019-09-27 07:51:41 UTC
From the log:

 <info>  [1569521361.6054] ovsdb: Could not connect: No such file or directory
 <debug> [1569521361.6054] ovsdb: disconnecting from ovsdb
 <warn>  [1569521361.6055] device brcnv-iface could not be added to a ovs port: Request cancelled

Is the openvswitch service running?
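
The same check can be run against any captured log. A small sketch, with a hypothetical function name, that only looks for the message quoted above:

```shell
# Succeeds if the log text on stdin contains the message NetworkManager
# printed here when it could not reach ovsdb.
ovsdb_unreachable() {
    grep -q 'ovsdb: Could not connect'
}
```

For example, `journalctl -u NetworkManager -b | ovsdb_unreachable && systemctl status openvswitch` would show the service state only when the message is present.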

Comment 33 Pavlin Georgiev 2019-09-27 08:26:11 UTC
(In reply to Beniamino Galvani from comment #32)
> From the log:
> 
>  <info>  [1569521361.6054] ovsdb: Could not connect: No such file or
> directory
>  <debug> [1569521361.6054] ovsdb: disconnecting from ovsdb
>  <warn>  [1569521361.6055] device brcnv-iface could not be added to a ovs
> port: Request cancelled
> 
> Is the openvswitch service running?

[root@qe-dell-ovs5-vm-2 ~]# systemctl status openvswitch
● openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Comment 34 Beniamino Galvani 2019-09-27 08:30:40 UTC
Please retest after starting the service with:

 systemctl start openvswitch
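
A test script could guard against this situation before touching any connections. The sketch below is an illustration only: the function names are invented, and the default socket path `/var/run/openvswitch/db.sock` is an assumption that may differ between Open vSwitch packages.

```shell
# Succeeds if a UNIX socket exists at the given path.
# The default path is an assumption; adjust it for your Open vSwitch packaging.
ovsdb_socket_ready() {
    [ -S "${1:-/var/run/openvswitch/db.sock}" ]
}

# Refuse to continue unless the openvswitch service is active and the
# ovsdb socket is present.
preflight() {
    if ! systemctl is-active --quiet openvswitch; then
        echo "openvswitch service is not active" >&2
        return 1
    fi
    if ! ovsdb_socket_ready "$@"; then
        echo "ovsdb socket not found" >&2
        return 1
    fi
}
```

Calling `preflight || exit 1` at the top of the reproducer would make a stopped service fail fast instead of surfacing later as inactive OVS profiles.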

Comment 35 Pavlin Georgiev 2019-09-27 08:31:19 UTC
Created attachment 1619933 [details]
Output from the testing script

I enabled the openvswitch service.
I used the same testing setup and procedure as written in comment #29.

RESULT 5
The expected result has been achieved.
Output of the script is attached.

Comment 36 Pavlin Georgiev 2019-09-27 08:32:49 UTC
Created attachment 1619935 [details]
System journal from reproducing the bug under RHEL-7.8-20190912.3

Comment 37 Pavlin Georgiev 2019-09-27 08:33:20 UTC
Created attachment 1619937 [details]
NM trace log from reproducing the bug under RHEL-7.8-20190912.3

Comment 38 Pavlin Georgiev 2019-09-27 08:38:31 UTC
OUTCOME
The following steps, taken together, fixed the bug:

1. Upgrading component: NetworkManager
    from: 1.12.0-10.el7_6
      to: 1.18.0-6.el7

2. Installing packages from the Brew server:
  - openvswitch2.11-2.11.0-23.el7fdn
  - openvswitch-selinux-extra-policy-1.0-13.el7fdp

3. Starting the openvswitch service.
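
To confirm that a machine already carries the fixed build, the installed version can be compared against 1.18.0-6. The helper below is a rough sketch with a hypothetical name. It relies on GNU `sort -V`, which approximates RPM version ordering but does not implement it exactly.

```shell
# Succeeds if version string $1 sorts at or after version string $2.
# Uses GNU sort -V, which approximates but does not exactly implement
# RPM's version comparison.
version_at_least() {
    have=$1
    want=$2
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}
```

One way to use it is `version_at_least "$(rpm -q --qf '%{VERSION}-%{RELEASE}' NetworkManager)" 1.18.0-6.el7`.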

Comment 40 errata-xmlrpc 2020-03-31 20:07:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1162