Bug 1806249
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | when trying to create ovs-bond cmd gets stuck | | |
| Product | Red Hat Enterprise Linux 8 | Reporter | Ram Lavi <ralavi> |
| Component | nmstate | Assignee | Fernando F. Mancera <ferferna> |
| Status | CLOSED ERRATA | QA Contact | Mingyu Shi <mshi> |
| Severity | high | Priority | unspecified |
| Version | 8.2 | CC | edwardh, ferferna, jiji, jishi, lmiksik, network-qe, till |
| Target Milestone | rc | Target Release | 8.2 |
| Hardware | x86_64 | OS | Linux |
| Fixed In Version | nmstate-0.2.6-2.7.el8 | Doc Type | If docs needed, set a value |
| Last Closed | 2020-04-28 16:00:37 UTC | Type | Bug |

Hi Ram,

Can you provide the output of the commands below when nmstate fails?

for BR_NAME in `sudo ovs-vsctl list-br`; do
    sudo ovs-vsctl list-ports $BR_NAME
    sudo ovs-vsctl list-ifaces $BR_NAME
    sudo ovs-ofctl dump-ports $BR_NAMEfor BR_NAME in `sudo ovs-vsctl list-br`; do
    sudo ovs-vsctl list-ports $BR_NAME
    sudo ovs-vsctl list-ifaces $BR_NAME
    sudo ovs-ofctl dump-ports $BR_NAME
done
done

Found the issue while running the script you requested.
ovs-vsctl wasn't started:
`[root@localhost ~]# ovs-vsctl show
ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)`

After starting it with `sudo /usr/share/openvswitch/scripts/ovs-ctl start`, I retried setting the ovs-bond and it worked:
`[root@localhost ~]# nmstatectl set ovsbridge_bond_create.yml
2020-02-24 07:51:17,548 root DEBUG Checkpoint /org/freedesktop/NetworkManager/Checkpoint/6 created for all devices: 60
2020-02-24 07:51:17,549 root DEBUG Adding new interfaces: ['ovs-br0']
2020-02-24 07:51:17,550 root DEBUG Editing interfaces: ['eth1', 'eth2']
2020-02-24 07:51:17,551 root WARNING IPv6 link local address fe80::b0a8:13fb:ee4d:846b/64 is ignored when applying desired state
2020-02-24 07:51:17,551 root WARNING IPv6 link local address fe80::537d:6595:2caa:cab2/64 is ignored when applying desired state
2020-02-24 07:51:17,553 root DEBUG Executing NM action: func=add_connection_async
2020-02-24 07:51:17,563 root DEBUG Connection adding succeeded: dev=ovs-br0
2020-02-24 07:51:17,563 root DEBUG Executing NM action: func=commit_changes_async
2020-02-24 07:51:17,575 root DEBUG Connection update succeeded: dev=eth1
2020-02-24 07:51:17,575 root DEBUG Executing NM action: func=commit_changes_async
2020-02-24 07:51:17,579 root DEBUG Connection update succeeded: dev=eth2
2020-02-24 07:51:17,579 root DEBUG Executing NM action: func=add_connection_async
2020-02-24 07:51:17,599 root DEBUG Connection adding succeeded: dev=ovs-bond1
2020-02-24 07:51:17,599 root DEBUG Executing NM action: func=safe_activate_async
2020-02-24 07:51:17,639 root DEBUG Connection activation initiated: dev=ovs-br0, con-state=<enum NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>
2020-02-24 07:51:17,654 root DEBUG Connection activation succeeded: dev=ovs-br0, con-state=<enum NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>, dev-state=<enum NM_DEVICE_STATE_IP_CONFIG of type NM.DeviceState>, state-flags=<flags NM_ACTIVATION_STATE_FLAG_IS_MASTER | NM_ACTIVATION_STATE_FLAG_LAYER2_READY | NM_ACTIVATION_STATE_FLAG_MASTER_HAS_SLAVES of type NM.ActivationStateFlags>
2020-02-24 07:51:17,655 root DEBUG Executing NM action: func=safe_activate_async
2020-02-24 07:51:17,682 root DEBUG Connection activation initiated: dev=ovs-bond1, con-state=<enum NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>
2020-02-24 07:51:17,722 root DEBUG Connection activation succeeded: dev=ovs-bond1, con-state=<enum NM_ACTIVE_CONNECTION_STATE_ACTIVATED of type NM.ActiveConnectionState>, dev-state=<enum NM_DEVICE_STATE_ACTIVATED of type NM.DeviceState>, state-flags=<flags NM_ACTIVATION_STATE_FLAG_IS_MASTER | NM_ACTIVATION_STATE_FLAG_IS_SLAVE | NM_ACTIVATION_STATE_FLAG_LAYER2_READY | NM_ACTIVATION_STATE_FLAG_MASTER_HAS_SLAVES of type NM.ActivationStateFlags>
2020-02-24 07:51:17,722 root DEBUG Executing NM action: func=_safe_modify_async
2020-02-24 07:51:17,731 root DEBUG Device reapply succeeded: dev=eth2
2020-02-24 07:51:17,731 root DEBUG Executing NM action: func=_safe_modify_async
2020-02-24 07:51:17,735 root DEBUG Device reapply succeeded: dev=eth1
2020-02-24 07:51:18,236 root DEBUG NM action queue exhausted, quiting mainloop
2020-02-24 07:51:18,284 root DEBUG Checkpoint /org/freedesktop/NetworkManager/Checkpoint/6 destroyed
Desired state applied:
---
interfaces:
- name: ovs-br0
  type: ovs-bridge
  state: up
  bridge:
    options:
      stp: false
    port:
    - link-aggregation:
        mode: balance-slb
        slaves:
        - name: eth1
        - name: eth2
      name: ovs-bond1
`

Now your
script shows this:
ovs-bond1
eth1
eth2
OFPST_PORT reply (xid=0x2): 2 ports
  port eth1: rx pkts=1724, bytes=112964, drop=3016, errs=0, frame=0, over=0, crc=0
             tx pkts=68, bytes=6758, drop=0, errs=0, coll=0
  port eth2: rx pkts=1726, bytes=113184, drop=3016, errs=0, frame=0, over=0, crc=0
             tx pkts=66, bytes=6538, drop=0, errs=0, coll=0

(In reply to Gris Ge from comment #1)
> Hi Ram,
>
> Can you provide the output of the commands below when nmstate fails?
>
> for BR_NAME in `sudo ovs-vsctl list-br`; do
>     sudo ovs-vsctl list-ports $BR_NAME
>     sudo ovs-vsctl list-ifaces $BR_NAME
>     sudo ovs-ofctl dump-ports $BR_NAMEfor BR_NAME in `sudo ovs-vsctl list-br`; do
>     sudo ovs-vsctl list-ports $BR_NAME
>     sudo ovs-vsctl list-ifaces $BR_NAME
>     sudo ovs-ofctl dump-ports $BR_NAME
> done
> done

The script seems to be duplicated.

(In reply to Ram Lavi from comment #2)
> Found the issue while running the script you requested.
> ovs-vsctl wasn't started:

I mean the openvswitch service.

> `[root@localhost ~]# ovs-vsctl show
> ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)`
>
> after starting it `sudo /usr/share/openvswitch/scripts/ovs-ctl start` - I

To be more accurate, the command that fixed the issue is:
systemctl start openvswitch

> retried to setting the ovs-bond and it worked:
> [...]

The need to run openvswitch is mentioned in the installation documentation:

https://github.com/nmstate/nmstate/blob/master/README.install.md#post-package-installation

Do we need to add some extra information about this in a README file for the RPM package?

*** Bug 1806251 has been marked as a duplicate of this bug. ***

(In reply to Till Maas from comment #5)
> The need to run openvswitch is mentioned in the installation documentation:
>
> https://github.com/nmstate/nmstate/blob/master/README.install.md#post-package-installation
>
> Do we need to add some extra information about this in a README file for the RPM package?

I think we should focus on the core issues and not the documentation part.
- If the OVS service is down, unexpected and uninformative errors are reported to the user. This is a heavy maintenance/support burden.
- This specific issue was seen with a total freeze. nmstatectl did not return and got stuck.

I also do not think the referenced duplicate is correct. The fact that both are resolved by starting the OVS service does not mean it is the same resolution/fix.
This BZ should track why the transaction got stuck.

Hi Edward and Ram,

If you want to focus on fixing the infinite hang in this bug, please provide reproduction steps or logs from when nmstate gets stuck forever.
I cannot reproduce this infinite hang using the VM Ram provided.

Without reproduction steps or logs, I am afraid I have to close this bug as insufficient data.

If you want to reproduce the bug the same way I did (from scratch), here are the steps you should take:
1. install virt-manager:
   sudo dnf install virt-manager
2. download centos8 cloud version: CentOS-8-GenericCloud-8.1.1911-20200113.3.x86_64.qcow2 (https://cloud.centos.org/centos/8/x86_64/images/CentOS-8-GenericCloud-8.1.1911-20200113.3.x86_64.qcow2)
3. uninstall cloud-init from the image (it causes the installation to get stuck) and set the password with this command:
   virt-customize --root-password password:changeme \
       --uninstall cloud-init \
       --selinux-relabel \
       -a rhel-guest-image-8.0-1.x86_64.qcow2
   (more info in link: https://access.redhat.com/solutions/3798671)
5. install the vm via the qcow in virt-manager gui
6. ALL the following cmds should be performed in the vm cli:
7. make sure there is an external network connection: ping google.com
8. enable ssh to the vm: edit /etc/ssh/sshd_config and uncomment PasswordAuthentication yes
9. restart ssh service: service sshd restart. (you can now proceed via ssh using the root user and the password set in step 3)
10. enable copr repositories:
    yum copr enable nmstate/nmstate-0.2
    yum install nmstate
    dnf copr enable nmstate/ovs-el8 -y
    dnf copr enable networkmanager/NetworkManager-1.22 -y
11. now dnf/yum install them:
    yum install NetworkManager-ovs.x86_64
    yum install nmstate
    yum install openvswitch2.11.x86_64
    dnf update NetworkManager-1.22
    yum install NetworkManager-ovs.x86_64
    dnf update
12. reboot vm
13. shutdown vm and add 2 nics to the vm from virt-manager gui: eth1, eth2.
14. start the vm and see you have the 2 nics in ip a
15. add the nics to nmstate so that it could manage them: nmtui (and then add them manually using this link: https://lintut.com/how-to-setup-network-after-rhelcentos-7-minimal-installation/)
16. make sure the nics are managed using: nmcli con
17. (at this point you would start the openvswitch service, but if you want to recreate the bug then don't.)
18. download the ovs-bridge yaml:
    wget https://raw.githubusercontent.com/nmstate/nmstate/master/examples/ovsbridge_bond_create.yml
19. try to add the ovs bridge using nmstatectl:
    nmstatectl set ovsbridge_bond_create.yml
    - the first time would give an error, but if you try 2-3 more times then the command will get stuck with all the errors & logs already attached to this BZ.

(In reply to Gris Ge from comment #8)
> Hi Edward and Ram,
>
> If you want to focus on fixing the infinite hang in this bug,
> please provide reproduction steps or logs from when nmstate gets stuck forever.
> I cannot reproduce this infinite hang using the VM Ram provided.
>
> Without reproduction steps or logs, I am afraid I have to close this bug as
> insufficient data.
The logfile shows a crash and not a hang:

2020-02-23 08:51:50,591 root DEBUG Connection activation initiated: dev=eth1, con-state=<enum NM_ACTIVE_CONNECTION_STATE_UNKNOWN of type NM.ActiveConnectionState>
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/connection.py", line 226, in _active_connection_callback
    ac.devname, ac.reason
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/active_connection.py", line 171, in devname
    return self._nmdev.get_iface()
AttributeError: 'NoneType' object has no attribute 'get_iface'

This is a problem in logging that the activation failed. I have a simple patch ready for this:
https://github.com/nmstate/nmstate/pull/852

Another thought: Since the crash happens when Nmstate tries to quit the mainloop, it might also result in a hang I guess.

@Ram: Could you please test the patch from https://github.com/nmstate/nmstate/pull/852 to check if it fixes your issue?

(In reply to Till Maas from comment #14)
> @Ram: Could you please test the patch from
> https://github.com/nmstate/nmstate/pull/852 to check if it fixes your issue?

I'd be happy to check when you have an approved patch

(In reply to Ram Lavi from comment #15)
> (In reply to Till Maas from comment #14)
> > @Ram: Could you please test the patch from
> > https://github.com/nmstate/nmstate/pull/852 to check if it fixes your issue?
>
> I'd be happy to check when you have an approved patch

Can you try these commands to install the patched rpm?
```
sudo dnf install dnf-plugins-core -y
sudo dnf copr enable packit/nmstate-nmstate-852 -y
sudo dnf install nmstate
```

The patch has been verified and merged upstream.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1696
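For readers following the traceback earlier in this thread: the `AttributeError` comes from `devname` calling `get_iface()` on a device object that is `None`. The snippet below is only a minimal sketch of that kind of guard, not the actual change from pull request 852, whose contents are not shown here. The class `ActiveConnectionSketch` and the helper `describe_failure` are hypothetical names made up for illustration; only `_nmdev`, `devname` and `get_iface` come from the traceback.

```python
class ActiveConnectionSketch:
    """Hypothetical stand-in for the wrapper named in the traceback."""

    def __init__(self, nmdev=None):
        # nmdev may be None when no device is attached to the active
        # connection, which is the case the AttributeError points at.
        self._nmdev = nmdev

    @property
    def devname(self):
        # Guard against a missing device instead of dereferencing None.
        if self._nmdev is None:
            return None
        return self._nmdev.get_iface()


def describe_failure(ac, reason):
    """Build a log message that stays safe when the device is unknown."""
    return "Connection activation failed: dev={}, reason={}".format(
        ac.devname or "<unknown>", reason
    )
```

With a guard like this, logging a failed activation would produce a message with `dev=<unknown>` rather than raising inside the callback.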
Created attachment 1665169
tar-files

Description of problem:
When trying to create an ovs-bond, the cmd gets stuck. The cmd gets stuck after trying to set it twice (the first time was not successful).

Version-Release number of selected component (if applicable):
NetworkManager-1.22.8-1.el8.x86_64, nmstate-0.2.5-1.el8.noarch

How reproducible:

Steps to Reproduce:
1. start with vm centos8, yum install NetworkManager-1.22.8-1.el8.x86_64, nmstate-0.2.5-1.el8.noarch
2. add 2 nics eth1, eth2
3. create ovs-bond on the nics (yaml attached). (you may need to do this more than once)

Actual results:
cmd gets stuck

Expected results:

Additional info:
[root@localhost ~]# rpm -q NetworkManager
NetworkManager-1.22.8-1.el8.x86_64
[root@localhost ~]# rpm -q nmstate
nmstate-0.2.5-1.el8.noarch
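Since the report comes down to a stopped openvswitch service and a command that never returns, a reproduction script can check for the OVS database socket first and run `nmstatectl set` under a timeout. This is a rough sketch under assumptions: the socket path is the one from the `ovs-vsctl` error message in this report, while the function names and the 60-second default are hypothetical and not part of nmstate.

```python
import os
import stat
import subprocess

# Socket path taken from the ovs-vsctl error message quoted in this report.
OVS_DB_SOCK = "/var/run/openvswitch/db.sock"


def ovs_db_socket_present(path=OVS_DB_SOCK):
    """Return True if a UNIX socket exists at the given path."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it exceeded timeout_s."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising.
        return None


def apply_state(yaml_path, timeout_s=60):
    """Hypothetical wrapper: skip the run without OVS, report a hang."""
    if not ovs_db_socket_present():
        return "openvswitch not running"
    code = run_with_timeout(["nmstatectl", "set", yaml_path], timeout_s)
    return "timed out" if code is None else "exit code {}".format(code)
```

Calling `apply_state("ovsbridge_bond_create.yml")` a few times in a row would then distinguish the three outcomes seen in this bug (service down, command error, command stuck) without blocking the terminal.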