| Summary: | openvswitch-agent timing out on "ovs-vsctl set port" operations | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Matt Flusche <mflusche> |
| Component: | puppet-neutron | Assignee: | Brent Eagles <beagles> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | nlevinki <nlevinki> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 6.0 (Juno) | CC: | amuller, beagles, chrisw, ihrachys, jjoyce, jlibosva, jschluet, mflusche, nyechiel, rcernin, sclewis, scorcora, slinaber, srevivo, tvignaud, twilson |
| Target Milestone: | --- | Keywords: | Triaged, ZStream |
| Target Release: | 12.0 (Pike) | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-11-16 17:36:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
> I think we want to increase the default timeout so other customers will not be affected by this. What do you think Terry?
I'd be ok with increasing the timeout; though 10 seconds seems like a long time for an ovs-vsctl execution. I'd like to see some profiling to see what is taking so long.
Terry
(In reply to Terry Wilson from comment #13)
> > I think we want to increase the default timeout so other customers will not be affected by this. What do you think, Terry?
>
> I'd be ok with increasing the timeout; though 10 seconds seems like a long time for an ovs-vsctl execution. I'd like to see some profiling to see what is taking so long.

This came up again in https://access.redhat.com/support/cases/#/case/01787838. I suspect that the original default of 10 was not chosen via scientific methods, and that 20 would be equally arbitrary, but it would let our customers avoid these types of issues. I'd suggest we either:

1) Bump the default significantly.
2) Stick to 10 but improve the error handling, so that when we do hit a timeout, DEBUG-level logs dump profiling and hypervisor CPU consumption data to help us understand why we hit it.

Obviously (1) is significantly more feasible :)

BTW, I proposed to rename the option: https://review.openstack.org/#/c/518391/ so any fix applied here should count on the new name.

What's the expected fix here? A neutron bump? A bump via tripleo/puppet? No bump but better error handling? Nothing at all?

This keeps falling to the back of my queue. It should be a simple fix to puppet-neutron; I'll look.

We've switched to the ovsdb native interface. If you still experience timeout issues in a newer OSP product, please feel free to re-open this bug.
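For reference, bumping the timeout as suggested in option (1) is a config-file change on the node running the agent. This is only a sketch: in Juno-era releases the option was `ovs_vsctl_timeout` (seconds, default 10, matching the `--timeout=10` seen in the logs), and per the rename review above it becomes `ovsdb_timeout` in newer releases. The section placement shown below is an assumption; check the configuration reference for your release before applying.

```ini
; Sketch only -- option name and section depend on the release.
; Juno-era agent config:
[DEFAULT]
ovs_vsctl_timeout = 20

; Newer releases (after the rename in review 518391):
; [OVS]
; ovsdb_timeout = 20
```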
Description of problem:

On servers with a relatively high number of neutron net namespaces, we are seeing the following in openvswitch-agent.log during restart:

Command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ovs-vsctl', '--timeout=10', 'set', 'Port', 'ha-f1ff1111-1f', 'tag=236']
Exit code: 242
Stdout: ''
Stderr: '2016-02-25T23:18:18Z|00002|fatal_signal|WARN|terminating with signal 14 (Alarm clock)\n'

On a system containing 1022 namespaces, this error was seen 52 times during startup (for different ports); however, the ports that generated the error appear to have been configured correctly despite it.

Version-Release number of selected component (if applicable):
openvswitch-2.4.0-1.el7.x86_64
openstack-neutron-openvswitch-2014.2.3-26.el7ost.rhbz1281583.noarch
python-neutron-2014.2.3-26.el7ost.rhbz1281583.noarch

How reproducible:
I believe the customer can reproduce on demand.

Steps to Reproduce:
1. Restart the neutron server running l3-agent, dhcp-agent, lbaas-agent.

Actual results:
Exit code: 242
Stdout: ''
Stderr: '2016-02-25T23:18:18Z|00002|fatal_signal|WARN|terminating with signal 14 (Alarm clock)\n'

Expected results:
No error.

Additional info:
The ovs port does appear to get set correctly:

Port "ha-f1ff1111-1f"
    tag: 236
    Interface "ha-f1ff1111-1f"
        type: internal
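To gauge how widespread the timeouts are after a restart (e.g. the "52 times" count above), the agent log can be scanned for the SIGALRM failures and the affected ports. This is a hedged sketch: it runs against a small synthetic log excerpt so it is self-contained; in practice, point LOG at the real agent log (a path like /var/log/neutron/openvswitch-agent.log is typical but deployment-specific), and note the second port name below is invented for the example.

```shell
# Synthetic excerpt of openvswitch-agent.log so the commands are runnable
# as-is; replace with the real log path in practice. The second port name
# is a made-up example value.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
Command: ['ovs-vsctl', '--timeout=10', 'set', 'Port', 'ha-f1ff1111-1f', 'tag=236']
Stderr: '2016-02-25T23:18:18Z|00002|fatal_signal|WARN|terminating with signal 14 (Alarm clock)'
Command: ['ovs-vsctl', '--timeout=10', 'set', 'Port', 'ha-a2bb2222-2a', 'tag=118']
Stderr: '2016-02-25T23:18:21Z|00002|fatal_signal|WARN|terminating with signal 14 (Alarm clock)'
EOF

# How many ovs-vsctl invocations died on SIGALRM (the --timeout alarm)?
count=$(grep -c 'signal 14 (Alarm clock)' "$LOG")
echo "timeouts: $count"

# Which ports were involved? Pull the argument following 'Port' out of the
# logged rootwrap command line.
ports=$(grep -o "'Port', '[^']*'" "$LOG" | awk -F"'" '{print $4}' | sort -u)
echo "$ports"

rm -f "$LOG"
```

Comparing the per-port list against `ovs-vsctl show` output is what confirmed here that the tags were applied despite the alarm.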