Bug 1806249

Summary: when trying to create ovs-bond cmd gets stuck
Product: Red Hat Enterprise Linux 8 Reporter: Ram Lavi <ralavi>
Component: nmstate Assignee: Fernando F. Mancera <ferferna>
Status: CLOSED ERRATA QA Contact: Mingyu Shi <mshi>
Severity: high Docs Contact:
Priority: unspecified    
Version: 8.2 CC: edwardh, ferferna, jiji, jishi, lmiksik, network-qe, till
Target Milestone: rc   
Target Release: 8.2   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: nmstate-0.2.6-2.7.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-04-28 16:00:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
tar-files none

Description Ram Lavi 2020-02-23 09:30:02 UTC
Created attachment 1665169 [details]
tar-files

Description of problem:
When trying to create an ovs-bond, the command gets stuck. It gets stuck after trying to set the state twice (the first attempt was not successful).

Version-Release number of selected component (if applicable):
NetworkManager-1.22.8-1.el8.x86_64, nmstate-0.2.5-1.el8.noarch

How reproducible:

Steps to Reproduce:
1. start with vm centos8, yum install NetworkManager-1.22.8-1.el8.x86_64, nmstate-0.2.5-1.el8.noarch
2. add 2 nics eth1, eth2
3. create ovs-bond on the nics  (yaml attached). (you may need to do this more than once)

Actual results:
cmd gets stuck

Expected results:


Additional info:
[root@localhost ~]# rpm -q NetworkManager
NetworkManager-1.22.8-1.el8.x86_64
[root@localhost ~]# rpm -q nmstate
nmstate-0.2.5-1.el8.noarch

Comment 1 Gris Ge 2020-02-24 06:45:07 UTC
Hi Ram,

Can you provide the output of the command below when nmstate fails?


for BR_NAME in `sudo ovs-vsctl list-br`; do
    sudo ovs-vsctl list-ports $BR_NAME
    sudo ovs-vsctl list-ifaces $BR_NAME
    sudo ovs-ofctl dump-ports $BR_NAMEfor BR_NAME in `sudo ovs-vsctl list-br`; do
    sudo ovs-vsctl list-ports $BR_NAME
    sudo ovs-vsctl list-ifaces $BR_NAME
    sudo ovs-ofctl dump-ports $BR_NAME
done
done

Comment 2 Ram Lavi 2020-02-24 07:59:00 UTC
Found the issue while running the script you requested.
ovs-vsctl wasn't started:
`[root@localhost ~]# ovs-vsctl show
ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)`

After starting it with `sudo /usr/share/openvswitch/scripts/ovs-ctl start`, I retried setting the ovs-bond and it worked:
`[root@localhost ~]# nmstatectl set ovsbridge_bond_create.yml 
2020-02-24 07:51:17,548 root         DEBUG    Checkpoint /org/freedesktop/NetworkManager/Checkpoint/6 created for all devices: 60
2020-02-24 07:51:17,549 root         DEBUG    Adding new interfaces: ['ovs-br0']
2020-02-24 07:51:17,550 root         DEBUG    Editing interfaces: ['eth1', 'eth2']
2020-02-24 07:51:17,551 root         WARNING  IPv6 link local address fe80::b0a8:13fb:ee4d:846b/64 is ignored when applying desired state
2020-02-24 07:51:17,551 root         WARNING  IPv6 link local address fe80::537d:6595:2caa:cab2/64 is ignored when applying desired state
2020-02-24 07:51:17,553 root         DEBUG    Executing NM action: func=add_connection_async
2020-02-24 07:51:17,563 root         DEBUG    Connection adding succeeded: dev=ovs-br0
2020-02-24 07:51:17,563 root         DEBUG    Executing NM action: func=commit_changes_async
2020-02-24 07:51:17,575 root         DEBUG    Connection update succeeded: dev=eth1
2020-02-24 07:51:17,575 root         DEBUG    Executing NM action: func=commit_changes_async
2020-02-24 07:51:17,579 root         DEBUG    Connection update succeeded: dev=eth2
2020-02-24 07:51:17,579 root         DEBUG    Executing NM action: func=add_connection_async
2020-02-24 07:51:17,599 root         DEBUG    Connection adding succeeded: dev=ovs-bond1
2020-02-24 07:51:17,599 root         DEBUG    Executing NM action: func=safe_activate_async
2020-02-24 07:51:17,639 root         DEBUG    Connection activation initiated: dev=ovs-br0, con-state=<enum NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>
2020-02-24 07:51:17,654 root         DEBUG    Connection activation succeeded: dev=ovs-br0, con-state=<enum NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>, dev-state=<enum NM_DEVICE_STATE_IP_CONFIG of type NM.DeviceState>, state-flags=<flags NM_ACTIVATION_STATE_FLAG_IS_MASTER | NM_ACTIVATION_STATE_FLAG_LAYER2_READY | NM_ACTIVATION_STATE_FLAG_MASTER_HAS_SLAVES of type NM.ActivationStateFlags>
2020-02-24 07:51:17,655 root         DEBUG    Executing NM action: func=safe_activate_async
2020-02-24 07:51:17,682 root         DEBUG    Connection activation initiated: dev=ovs-bond1, con-state=<enum NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>
2020-02-24 07:51:17,722 root         DEBUG    Connection activation succeeded: dev=ovs-bond1, con-state=<enum NM_ACTIVE_CONNECTION_STATE_ACTIVATED of type NM.ActiveConnectionState>, dev-state=<enum NM_DEVICE_STATE_ACTIVATED of type NM.DeviceState>, state-flags=<flags NM_ACTIVATION_STATE_FLAG_IS_MASTER | NM_ACTIVATION_STATE_FLAG_IS_SLAVE | NM_ACTIVATION_STATE_FLAG_LAYER2_READY | NM_ACTIVATION_STATE_FLAG_MASTER_HAS_SLAVES of type NM.ActivationStateFlags>
2020-02-24 07:51:17,722 root         DEBUG    Executing NM action: func=_safe_modify_async
2020-02-24 07:51:17,731 root         DEBUG    Device reapply succeeded: dev=eth2
2020-02-24 07:51:17,731 root         DEBUG    Executing NM action: func=_safe_modify_async
2020-02-24 07:51:17,735 root         DEBUG    Device reapply succeeded: dev=eth1
2020-02-24 07:51:18,236 root         DEBUG    NM action queue exhausted, quiting mainloop
2020-02-24 07:51:18,284 root         DEBUG    Checkpoint /org/freedesktop/NetworkManager/Checkpoint/6 destroyed
Desired state applied: 
---
interfaces:
- name: ovs-br0
  type: ovs-bridge
  state: up
  bridge:
    options:
      stp: false
    port:
    - link-aggregation:
        mode: balance-slb
        slaves:
        - name: eth1
        - name: eth2
      name: ovs-bond1
`

Now your script shows this:
ovs-bond1
eth1
eth2
OFPST_PORT reply (xid=0x2): 2 ports
  port  eth1: rx pkts=1724, bytes=112964, drop=3016, errs=0, frame=0, over=0, crc=0
           tx pkts=68, bytes=6758, drop=0, errs=0, coll=0
  port  eth2: rx pkts=1726, bytes=113184, drop=3016, errs=0, frame=0, over=0, crc=0
           tx pkts=66, bytes=6538, drop=0, errs=0, coll=0

Comment 3 Ram Lavi 2020-02-24 07:59:41 UTC
(In reply to Gris Ge from comment #1)
> Hi Ram,
> 
> Can you provides the output of below command when got failure from nmstate?
> 
> 
> for BR_NAME in `sudo ovs-vsctl list-br`; do
>     sudo ovs-vsctl list-ports $BR_NAME
>     sudo ovs-vsctl list-ifaces $BR_NAME
>     sudo ovs-ofctl dump-ports $BR_NAMEfor BR_NAME in `sudo ovs-vsctl
> list-br`; do
>     sudo ovs-vsctl list-ports $BR_NAME
>     sudo ovs-vsctl list-ifaces $BR_NAME
>     sudo ovs-ofctl dump-ports $BR_NAME
> done
> done

The script seems to be duplicated.

Comment 4 Ram Lavi 2020-02-24 09:15:44 UTC
(In reply to Ram Lavi from comment #2)
> Found the issue while running the script you requested.
> ovs-vsctl wasn't started:

I mean the openvswitch service.

> `root@localhost ~]# ovs-vsctl show
> ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No
> such file or directory)`
> 
> after starting it `sudo /usr/share/openvswitch/scripts/ovs-ctl start` - I

To be more accurate the cmd that fixed the issue is: systemctl start openvswitch 

> retried to setting the ovs-bond and it worked:
> `[root@localhost ~]# nmstatectl set ovsbridge_bond_create.yml 
> 2020-02-24 07:51:17,548 root         DEBUG    Checkpoint
> /org/freedesktop/NetworkManager/Checkpoint/6 created for all devices: 60
> 2020-02-24 07:51:17,549 root         DEBUG    Adding new interfaces:
> ['ovs-br0']
> 2020-02-24 07:51:17,550 root         DEBUG    Editing interfaces: ['eth1',
> 'eth2']
> 2020-02-24 07:51:17,551 root         WARNING  IPv6 link local address
> fe80::b0a8:13fb:ee4d:846b/64 is ignored when applying desired state
> 2020-02-24 07:51:17,551 root         WARNING  IPv6 link local address
> fe80::537d:6595:2caa:cab2/64 is ignored when applying desired state
> 2020-02-24 07:51:17,553 root         DEBUG    Executing NM action:
> func=add_connection_async
> 2020-02-24 07:51:17,563 root         DEBUG    Connection adding succeeded:
> dev=ovs-br0
> 2020-02-24 07:51:17,563 root         DEBUG    Executing NM action:
> func=commit_changes_async
> 2020-02-24 07:51:17,575 root         DEBUG    Connection update succeeded:
> dev=eth1
> 2020-02-24 07:51:17,575 root         DEBUG    Executing NM action:
> func=commit_changes_async
> 2020-02-24 07:51:17,579 root         DEBUG    Connection update succeeded:
> dev=eth2
> 2020-02-24 07:51:17,579 root         DEBUG    Executing NM action:
> func=add_connection_async
> 2020-02-24 07:51:17,599 root         DEBUG    Connection adding succeeded:
> dev=ovs-bond1
> 2020-02-24 07:51:17,599 root         DEBUG    Executing NM action:
> func=safe_activate_async
> 2020-02-24 07:51:17,639 root         DEBUG    Connection activation
> initiated: dev=ovs-br0, con-state=<enum
> NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>
> 2020-02-24 07:51:17,654 root         DEBUG    Connection activation
> succeeded: dev=ovs-br0, con-state=<enum
> NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>,
> dev-state=<enum NM_DEVICE_STATE_IP_CONFIG of type NM.DeviceState>,
> state-flags=<flags NM_ACTIVATION_STATE_FLAG_IS_MASTER |
> NM_ACTIVATION_STATE_FLAG_LAYER2_READY |
> NM_ACTIVATION_STATE_FLAG_MASTER_HAS_SLAVES of type NM.ActivationStateFlags>
> 2020-02-24 07:51:17,655 root         DEBUG    Executing NM action:
> func=safe_activate_async
> 2020-02-24 07:51:17,682 root         DEBUG    Connection activation
> initiated: dev=ovs-bond1, con-state=<enum
> NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>
> 2020-02-24 07:51:17,722 root         DEBUG    Connection activation
> succeeded: dev=ovs-bond1, con-state=<enum
> NM_ACTIVE_CONNECTION_STATE_ACTIVATED of type NM.ActiveConnectionState>,
> dev-state=<enum NM_DEVICE_STATE_ACTIVATED of type NM.DeviceState>,
> state-flags=<flags NM_ACTIVATION_STATE_FLAG_IS_MASTER |
> NM_ACTIVATION_STATE_FLAG_IS_SLAVE | NM_ACTIVATION_STATE_FLAG_LAYER2_READY |
> NM_ACTIVATION_STATE_FLAG_MASTER_HAS_SLAVES of type NM.ActivationStateFlags>
> 2020-02-24 07:51:17,722 root         DEBUG    Executing NM action:
> func=_safe_modify_async
> 2020-02-24 07:51:17,731 root         DEBUG    Device reapply succeeded:
> dev=eth2
> 2020-02-24 07:51:17,731 root         DEBUG    Executing NM action:
> func=_safe_modify_async
> 2020-02-24 07:51:17,735 root         DEBUG    Device reapply succeeded:
> dev=eth1
> 2020-02-24 07:51:18,236 root         DEBUG    NM action queue exhausted,
> quiting mainloop
> 2020-02-24 07:51:18,284 root         DEBUG    Checkpoint
> /org/freedesktop/NetworkManager/Checkpoint/6 destroyed
> Desired state applied: 
> ---
> interfaces:
> - name: ovs-br0
>   type: ovs-bridge
>   state: up
>   bridge:
>     options:
>       stp: false
>     port:
>     - link-aggregation:
>         mode: balance-slb
>         slaves:
>         - name: eth1
>         - name: eth2
>       name: ovs-bond1
> `
> 
> Now your script shows this:
> ovs-bond1
> eth1
> eth2
> OFPST_PORT reply (xid=0x2): 2 ports
>   port  eth1: rx pkts=1724, bytes=112964, drop=3016, errs=0, frame=0,
> over=0, crc=0
>            tx pkts=68, bytes=6758, drop=0, errs=0, coll=0
>   port  eth2: rx pkts=1726, bytes=113184, drop=3016, errs=0, frame=0,
> over=0, crc=0
>            tx pkts=66, bytes=6538, drop=0, errs=0, coll=0

Comment 5 Till Maas 2020-02-24 10:48:59 UTC
The need to run openvswitch is mentioned in the installation documentation:

https://github.com/nmstate/nmstate/blob/master/README.install.md#post-package-installation

Do we need to add some extra information about this in a README file for the RPM package?

Comment 6 Till Maas 2020-02-24 21:41:50 UTC
*** Bug 1806251 has been marked as a duplicate of this bug. ***

Comment 7 Edward Haas 2020-02-25 02:32:14 UTC
(In reply to Till Maas from comment #5)
> The need to run openvswitch is mentioned in the installation documentation:
> 
> https://github.com/nmstate/nmstate/blob/master/README.install.md#post-
> package-installation
> 
> Do we need to add some extra information about this in a README file for the RPM package?

I think we should focus on the core issues and not the documentation part.
- If the OVS service is down, unexpected and uninformative errors are reported to the user. This is a heavy maintenance/support burden.
- This specific issue was seen as a total freeze: nmstatectl did not return and got stuck.

I also do not think the referenced duplicate is correct.
The fact that both are resolved by starting the OVS service does not mean it is the same resolution/fix.
This BZ should track why the transaction got stuck.
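
A pre-flight check is one way to turn the stopped-service case into a readable message. The sketch below is not nmstate code: the function names and the error text are hypothetical, and it assumes a systemd host where the unit is called `openvswitch`, as in comment 4. It relies only on `systemctl is-active --quiet`, which exits with status 0 when the unit is active.

```python
import subprocess


def unit_is_active(unit, run=subprocess.run):
    """Return True if `systemctl is-active --quiet UNIT` exits with 0."""
    try:
        result = run(["systemctl", "is-active", "--quiet", unit])
    except FileNotFoundError:
        # systemctl is not installed; treat the unit as not active.
        return False
    return result.returncode == 0


def require_ovs(unit="openvswitch", run=subprocess.run):
    """Raise a readable error when the OVS service is not running.

    The unit name is an assumption taken from comment 4.
    """
    if not unit_is_active(unit, run=run):
        raise RuntimeError(
            f"{unit} service is not active; start it with "
            f"'systemctl start {unit}' before applying OVS state"
        )
```

Calling `require_ovs()` before `nmstatectl set` would fail fast with the hint above rather than with an error unrelated to the service. The `run` parameter exists only so the check can be exercised without systemd.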

Comment 8 Gris Ge 2020-02-25 08:32:45 UTC
Hi Edward and Ram,

If you want to focus on fixing the infinite hang in this bug,
please provide reproduction steps or logs from when nmstate is stuck forever.
I cannot reproduce this infinite hang using the VM Ram provided.

Without reproduction steps or logs, I am afraid I have to close this bug as insufficient data.

Comment 9 Ram Lavi 2020-02-25 08:40:35 UTC
If you want to reproduce the bug the same way I did (from scratch), here are the steps you should take:

1. install virt-manager: sudo dnf install virt-manager
2. download centos8 cloud version: CentOS-8-GenericCloud-8.1.1911-20200113.3.x86_64.qcow2 (https://cloud.centos.org/centos/8/x86_64/images/CentOS-8-GenericCloud-8.1.1911-20200113.3.x86_64.qcow2) 
3. uninstall cloud-init from the image (it causes the installation to get stuck) and set the password with this command: virt-customize --root-password password:changeme \
    --uninstall cloud-init \
    --selinux-relabel \
    -a rhel-guest-image-8.0-1.x86_64.qcow2 (more info in link: https://access.redhat.com/solutions/3798671)
5. install the vm via the qcow in virt-manager gui
6. ALL the following cmds should be performed in the vm cli:
7. make sure external network connection: ping google.com
8. enable ssh to the vm: edit /etc/ssh/sshd_config and uncomment PasswordAuthentication yes
9. restart ssh service: service sshd restart.  (you can now proceed via ssh using the root user and the password set in step 3)
10. enable copr repositories: 
yum copr enable nmstate/nmstate-0.2
yum install nmstate
dnf copr enable nmstate/ovs-el8 -y
dnf copr enable networkmanager/NetworkManager-1.22 -y
11. now dnf/yum install them:
yum install NetworkManager-ovs.x86_64
yum install nmstate
yum install openvswitch2.11.x86_64
dnf update NetworkManager-1.22
yum install NetworkManager-ovs.x86_64
dnf update
12. reboot vm
13. shutdown vm and add 2 nics to the vm from virt-manager gui: eth1, eth2.
14. start the vm and see you have the 2 nic in ip a
15. add the nics to nmstate so that it could manage them: nmtui (and then add them manually using this link: https://lintut.com/how-to-setup-network-after-rhelcentos-7-minimal-installation/)
16. make sure the nics are managed using: nmcli con
17. (at this point you would normally start the openvswitch service, but if you want to recreate the bug then don't.)
18. download the ovs-bridge yaml: wget https://raw.githubusercontent.com/nmstate/nmstate/master/examples/ovsbridge_bond_create.yml
19.  try to add the ovs bridge using nmstatectl: nmstatectl set ovsbridge_bond_create.yml
- first time would give an error, but if you try 2-3 more times then the command will get stuck with all the errors & logs already attached to this BZ.
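
Since the hang only shows up after a few attempts, it can help to repeat the last step under a timeout so a stuck run is detected rather than waited on. This is a rough sketch, not part of nmstate: the helper names, the attempt count and the timeout value are all assumptions.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it exceeded timeout_s."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return None


def repeat_until_hang(cmd, attempts=5, timeout_s=120):
    """Return the 1-based attempt that timed out, or None if none did."""
    for attempt in range(1, attempts + 1):
        if run_with_timeout(cmd, timeout_s) is None:
            return attempt
    return None
```

For the reproduction above, the call would be `repeat_until_hang(["nmstatectl", "set", "ovsbridge_bond_create.yml"])`. A non-zero exit code (the error seen on the first attempt) is not treated as a hang; only a timeout is.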

Comment 10 Till Maas 2020-02-25 08:41:35 UTC
(In reply to Gris Ge from comment #8)
> Hi Edward and Ram,
> 
> If you want to focus on fix the infinity hang in this, 
> please provide reproduce steps or logs when nmstate stuck forever.
> I cannot reproduce this infinity hang using the VM Ram provided.
> 
> Without reproduce steps or logs, I am afraid I have to close this bug as
> insufficient data.

The logfile shows a crash and not a hang:

2020-02-23 08:51:50,591 root         DEBUG    Connection activation initiated: dev=eth1, con-state=<enum NM_ACTIVE_CONNECTION_STATE_UNKNOWN of type NM.ActiveConnectionState>
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/connection.py", line 226, in _active_connection_callback
    ac.devname, ac.reason
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/active_connection.py", line 171, in devname
    return self._nmdev.get_iface()
AttributeError: 'NoneType' object has no attribute 'get_iface'

This is a problem in logging that the activation failed. I have a simple patch ready for this: https://github.com/nmstate/nmstate/pull/852
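
For readers who do not want to open the pull request: the traceback shows a `devname` accessor calling `get_iface()` on a device reference that is `None`. A minimal sketch of a guard against that follows. The class is a hypothetical stand-in, not the libnmstate class, and this is not necessarily how the pull request fixes it.

```python
class ActiveConnectionSketch:
    """Hypothetical stand-in for an object that wraps an NM device."""

    def __init__(self, nmdev=None):
        # Assumption: nmdev can be None when activation fails before a
        # device is associated with the active connection.
        self._nmdev = nmdev

    @property
    def devname(self):
        # Return None for a missing device instead of raising
        # AttributeError while a log message is being built.
        if self._nmdev is None:
            return None
        return self._nmdev.get_iface()
```

With this guard, the failure callback can still log the activation failure (with an unknown device name) and carry on to its remaining work.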

Comment 12 Till Maas 2020-02-25 08:43:34 UTC
Another thought: Since the crash happens when Nmstate tries to quit the mainloop, it might also result in a hang I guess.
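
To illustrate that guess with a toy model: if a callback raises before it reaches the call that stops the loop, and the loop only reports the exception and keeps going, nothing ever stops it. The loop below is a deliberately tiny stand-in written for this note, not GLib and not nmstate code; the idle-tick limit exists only so the example terminates.

```python
from collections import deque
import traceback


class ToyLoop:
    """Tiny stand-in for an event loop; not GLib."""

    def __init__(self):
        self.queue = deque()
        self.running = False

    def quit(self):
        self.running = False

    def run(self, max_idle_ticks=3):
        """Return True if quit() was called, False if the loop went idle
        with nobody calling quit() (which would be a hang in real life)."""
        self.running = True
        idle = 0
        while self.running and idle < max_idle_ticks:
            if not self.queue:
                idle += 1
                continue
            callback = self.queue.popleft()
            try:
                callback()
            except Exception:
                # Report the error and keep the loop alive.
                traceback.print_exc()
        return not self.running


def log_then_quit(loop, nmdev):
    # The logging step raises when nmdev is None, so quit() is skipped.
    print("activation failed on", nmdev.get_iface())
    loop.quit()
```

Queuing `lambda: log_then_quit(loop, None)` and calling `loop.run()` returns `False`: the traceback is printed, `quit()` is never reached, and a real loop without the idle limit would keep waiting.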

Comment 14 Till Maas 2020-02-25 08:48:58 UTC
@Ram: Could you please test the patch from https://github.com/nmstate/nmstate/pull/852 to check if it fixes your issue?

Comment 15 Ram Lavi 2020-02-25 14:16:38 UTC
(In reply to Till Maas from comment #14)
> @Ram: Could you please test the patch from
> https://github.com/nmstate/nmstate/pull/852 to check if it fixes your issue?

I'd be happy to check when you have an approved patch

Comment 16 Gris Ge 2020-02-27 13:20:34 UTC
(In reply to Ram Lavi from comment #15)
> (In reply to Till Maas from comment #14)
> > @Ram: Could you please test the patch from
> > https://github.com/nmstate/nmstate/pull/852 to check if it fixes your issue?
> 
> I'd be happy to check when you have an approved patch

Can you try these commands to install the patched rpm?

```
sudo dnf install dnf-plugins-core -y
sudo dnf copr enable packit/nmstate-nmstate-852 -y
sudo dnf install nmstate
```

Comment 17 Till Maas 2020-02-27 17:52:30 UTC
The patch has been verified and merged upstream.

Comment 28 errata-xmlrpc 2020-04-28 16:00:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1696