Bug 1722578
Summary: | Loss of network connectivity of a compute node after reboot due to wrong network services startup sequence | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Aviv Guetta <aguetta> |
Component: | openstack-neutron | Assignee: | Slawek Kaplonski <skaplons> |
Status: | CLOSED ERRATA | QA Contact: | Candido Campos <ccamposr> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 13.0 (Queens) | CC: | amoralej, amuller, astupnik, bcafarel, ccamposr, chrisw, ealcaniz, mburns, pmorey, rhos-maint, scohen, skaplons, tfreger |
Target Milestone: | z9 | Keywords: | Triaged, ZStream |
Target Release: | 13.0 (Queens) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openstack-neutron-12.0.6-11.el7ost | Doc Type: | No Doc Update |
Last Closed: | 2019-11-07 14:00:05 UTC | Type: | Bug |
Description
Aviv Guetta
2019-06-20 16:35:21 UTC
After troubleshooting, this issue appears to be caused by neutron-openvswitch-agent. After the compute node reboots, the br-ex interface comes online for a short interval, during which other hosts in the same network can ping the affected compute node. It goes offline again once openvswitch establishes its connection to neutron-openvswitch-agent [1].

We have performed the following troubleshooting steps:
- We sent ICMP echo requests from a second compute node to the affected one and collected timestamps to isolate the time intervals of this issue. Results:
  - Aug 2 13:24:46 --> outage (compute was rebooted, boot process started at 2019-08-02 13:31:08)
  - Aug 2 13:31:12 --> successful ping
  - Aug 2 13:31:26 --> last successful ping
  - Aug 2 13:31:27 --> outage
- We can see that the outage occurred when OVS connected to neutron-openvswitch-agent [1].

Next steps:
- It would be great to have an update from the neutron developers: sosreports are available, and the data we have collected is in the collect-data.tar.gz archive.
- Support will enable debug for neutron services and provide detailed logs for the neutron OVS agent at the time of the outage.

[1]
2019-08-02T11:31:27.013Z|00396|rconn|INFO|br-uplink1<->tcp:127.0.0.1:6633: connected
2019-08-02T11:31:27.014Z|00397|rconn|INFO|br-ex<->tcp:127.0.0.1:6633: connected
2019-08-02T11:31:27.014Z|00398|rconn|INFO|br-int<->tcp:127.0.0.1:6633: connected

Sorry, I did not include the root cause of the networking outage in my previous comment: there are no flows on the br-ex bridge. This issue occurs only if the customer reboots a compute node with some of the bonds in the down state.

Short summary:
- Before the reboot, the customer shuts down one of the bond interfaces.
- After the reboot, the IP address on the br-ex interface is reachable for a short period of time.
- After [1], this IP address becomes unreachable because there are no flows in the br-ex table.
[1]
2019-08-02T11:31:27.013Z|00396|rconn|INFO|br-uplink1<->tcp:127.0.0.1:6633: connected
2019-08-02T11:31:27.014Z|00397|rconn|INFO|br-ex<->tcp:127.0.0.1:6633: connected
2019-08-02T11:31:27.014Z|00398|rconn|INFO|br-int<->tcp:127.0.0.1:6633: connected

I just reported a related bug upstream: https://bugs.launchpad.net/neutron/+bug/1840443
I think it will be easy to fix this upstream.

If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text. If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:3803
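
For anyone repeating the timing analysis from the description, the correlation between the ping results and the rconn "connected" messages in [1] can be sketched in a few lines of Python. This is only an illustration, not part of the fix. The function names are hypothetical, and the UTC+2 offset is an assumption: the OVS log lines are in UTC (trailing Z), while the ping timestamps appear to be local time, which matches only if the host was two hours ahead of UTC.

```python
import re
from datetime import datetime, timedelta, timezone

# Log lines quoted in [1] of this report (ovs-vswitchd, timestamps in UTC).
OVS_LOG = """\
2019-08-02T11:31:27.013Z|00396|rconn|INFO|br-uplink1<->tcp:127.0.0.1:6633: connected
2019-08-02T11:31:27.014Z|00397|rconn|INFO|br-ex<->tcp:127.0.0.1:6633: connected
2019-08-02T11:31:27.014Z|00398|rconn|INFO|br-int<->tcp:127.0.0.1:6633: connected
"""

# ASSUMPTION: the ping timestamps in the description are local time at UTC+2.
LOCAL_TZ = timezone(timedelta(hours=2))

# Matches "<UTC timestamp>Z|<seq>|rconn|INFO|<bridge><-><target>: connected".
RCONN_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\|\d+\|rconn\|INFO\|"
    r"(?P<bridge>[^<]+)<->(?P<target>\S+): connected$"
)


def parse_rconn_connects(log_text):
    """Return (bridge, UTC datetime) pairs for every rconn 'connected' line."""
    events = []
    for line in log_text.splitlines():
        match = RCONN_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
            events.append((match.group("bridge"), ts.replace(tzinfo=timezone.utc)))
    return events


def first_connect(events, bridge):
    """Earliest controller connection time for one bridge, or None."""
    times = [ts for name, ts in events if name == bridge]
    return min(times) if times else None


events = parse_rconn_connects(OVS_LOG)
br_ex_connected = first_connect(events, "br-ex")

# Last successful ping from the description (Aug 2 13:31:26, assumed local time).
last_ok_ping = datetime(2019, 8, 2, 13, 31, 26, tzinfo=LOCAL_TZ)

# Timezone-aware datetimes can be subtracted directly across zones.
gap = br_ex_connected - last_ok_ping
print(f"br-ex connected to the controller {gap.total_seconds():.3f}s "
      f"after the last successful ping")
```

Under the UTC+2 assumption, the br-ex connection lands about one second after the last successful ping, which is consistent with the outage starting when OVS connected to the agent. It does not by itself show that the flows are missing; that still has to be checked on the bridge.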