Bug 1483962
| Summary: | Network on controllers is down due to neutron-ovs-cleanup failure | |||
|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Pradipta Kumar Sahoo <psahoo> | |
| Component: | openstack-neutron | Assignee: | Terry Wilson <twilson> | |
| Status: | CLOSED ERRATA | QA Contact: | Toni Freger <tfreger> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 7.0 (Kilo) | CC: | amuller, ccollett, chrisw, nyechiel, psahoo, srevivo, twilson | |
| Target Milestone: | zstream | Keywords: | Triaged, ZStream | |
| Target Release: | 7.0 (Kilo) | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | openstack-neutron-2015.1.4-31.el7ost | Doc Type: | If docs needed, set a value | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| Clones: | 1531141 1531180 | Environment: | | |
| Last Closed: | 2018-03-07 15:22:24 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1531141, 1531143, 1531144, 1531180 | |||
Description
Pradipta Kumar Sahoo
2017-08-22 11:39:12 UTC
Can we have an sosreport from a node that includes the ovs-cleanup failures? Yeah, we'll need the logs. ovs_cleanup could definitely be sped up by not calling ovs_lib's delete_port for each port and instead creating a single transaction that deletes all of the ports that need deleting. But without the logs, I'm not sure exactly why things are timing out.

The sosreports don't seem to include the /var/log/neutron/ovs-cleanup.log files. Do these files exist somewhere?

Testing by creating dummy Neutron ports via:

    $ sudo rmmod dummy; sudo modprobe dummy numdummies=1000
    $ for ((i=0;i<1000;i++));do echo "-- add-port br-int dummy$i -- set Interface dummy$i external-ids:attached-mac=00:01:02:03:04:05 external-ids:iface-id=1";done|xargs sudo ovs-vsctl
    $ time sudo systemctl stop neutron-ovs-cleanup.service

results in a timeout at 90 s and leftover ports. Re-running cleans up the ports. Doing the same with:

    $ time sudo systemctl start neutron-ovs-cleanup.service

results in all ports being cleaned up after ~130 s. So the stop and start timeouts seem to differ, at least on my RHEL 7.4/Kilo install.

Modifying /usr/lib/systemd/system/neutron-ovs-cleanup.service to contain:

    [Service]
    ...
    TimeoutSec=0

allows systemctl stop neutron-ovs-cleanup to run until completion. I think this would be a sufficient solution (several other OpenStack services have a stop/start timeout of 0).

I should note, however, that on one occasion I did not reload the dummy module and recreated the ports in OVS. There were errors from vswitchd, although the ports existed in the OVSDB. Running neutron-ovs-cleanup start was *very* slow in this case (after 10 minutes, only about 100 ports had been deleted). strace showed that it was waiting on the rootwrap daemon to send responses. So setting an infinite timeout could, in theory, make a server take an extremely long time to reboot.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0463
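The comments above suggest that ovs_cleanup could be sped up by deleting all stale ports in one transaction instead of making one ovs_lib delete_port call per port. The shell sketch below illustrates that idea with plain ovs-vsctl, which runs every `--`-separated command as a single OVSDB transaction. It is not the change shipped in RHBA-2018:0463. Assumptions: the function name `del_ports_in_one_txn` and the `DRY_RUN` variable are hypothetical; the caller must already know which ports to delete (neutron-ovs-cleanup selects them by their external-ids, which this sketch does not do); and it calls `sudo ovs-vsctl` directly rather than going through rootwrap.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper, not neutron code): delete many OVS
# ports with ONE ovs-vsctl invocation. ovs-vsctl executes all commands
# separated by "--" as a single OVSDB transaction, so this costs one
# round trip instead of one per port.
#
# Usage: del_ports_in_one_txn BRIDGE PORT [PORT...]
# Set DRY_RUN=1 to print the command instead of running it.
del_ports_in_one_txn() {
    local bridge="$1"
    shift
    local -a args=()
    local port
    for port in "$@"; do
        # --if-exists: a port that has already disappeared must not
        # make the whole transaction fail.
        args+=(-- --if-exists del-port "$bridge" "$port")
    done

    # Nothing to delete: do not invoke ovs-vsctl at all.
    if [[ ${#args[@]} -eq 0 ]]; then
        return 0
    fi

    if [[ -n "${DRY_RUN:-}" ]]; then
        # Print the single command line that would be executed.
        printf '%s\n' "ovs-vsctl ${args[*]}"
    else
        # Runs ovs-vsctl directly via sudo; neutron itself would go
        # through rootwrap (assumption of this sketch).
        sudo ovs-vsctl "${args[@]}"
    fi
}
```

For example, after the dummy-port reproducer above, `del_ports_in_one_txn br-int dummy{0..999}` would remove all 1000 ports with a single ovs-vsctl call; prefixing it with `DRY_RUN=1` prints the command instead. The trade-off is one large transaction instead of many small round trips; for very large port counts the list could be split into chunks.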