The watcher agent (neutron-agent-watch) periodically polls the OpenStack Networking service to retrieve the list of networks for a specific host. If a network is no longer available (for example, because it was deleted), the watcher agent is supposed to remove it from the 'known' resources dictionary, which lists the networks that are available for scheduling.
In previous releases, the watcher agent did not actually remove deleted networks from the 'known' resources dictionary. As a result, the agent could crash if a network that was scheduled to a host's DHCP agent or L3 agent was deleted.
With this update, the watcher agent now cleans up the 'known' resources dictionary as expected, so deleting networks no longer causes the agent to crash.
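The cleanup described above amounts to reconciling the 'known' dictionary against each poll result. The following is a minimal sketch of that idea, not the actual neutron-agent-watch code: the class `KnownNetworks`, its `sync()` method, and the choice to store each network's pid file path as the dictionary value are hypothetical. The pid file layout follows the path quoted in the report. The sketch targets Python 3, whereas the affected hosts run Python 2.6.

```python
from typing import Dict, Iterable, List


class KnownNetworks:
    """Hypothetical stand-in for the watcher's 'known' resources dictionary.

    Keys are network IDs; values are the per-network state the watcher tracks
    (here, only the expected dnsmasq pid file path, as an assumption).
    """

    def __init__(self) -> None:
        self.known: Dict[str, str] = {}

    def sync(self, current_network_ids: Iterable[str]) -> List[str]:
        """Reconcile 'known' with the network list just polled for this host.

        Returns the IDs that were pruned because they are no longer listed.
        """
        current = set(current_network_ids)
        # Track networks that are newly scheduled to this host.
        for net_id in current:
            self.known.setdefault(net_id, "/var/lib/neutron/dhcp/%s/pid" % net_id)
        # Collect stale IDs first, then delete, so the dict is not mutated
        # while it is being iterated.
        stale = sorted(net_id for net_id in self.known if net_id not in current)
        for net_id in stale:
            del self.known[net_id]
        return stale
```

For example, after `sync(["net-a", "net-b"])` followed by `sync(["net-a"])`, the second call returns `["net-b"]` and only `net-a` remains in `known`, so later checks never look for state belonging to the deleted network.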
On RHEL 6.5:
python-neutron-2013.2.3-4.el6ost.noarch
python-neutronclient-2.3.4-1.el6ost.noarch
openstack-neutron-openvswitch-2013.2.3-4.el6ost.noarch
openstack-neutron-2013.2.3-4.el6ost.noarch
Description of problem:
Following the steps listed in https://bugzilla.redhat.com/show_bug.cgi?id=1051444#c15, deleting the failed network (the one for which dnsmasq was killed) prints the following to agent_watch.log:
2014-04-23 12:46:09.030 118482 ERROR root [-] Unexpected exception occurred 51 time(s)... retrying.
2014-04-23 12:46:09.030 118482 TRACE root Traceback (most recent call last):
2014-04-23 12:46:09.030 118482 TRACE root File "/usr/lib/python2.6/site-packages/neutron/openstack/common/excutils.py", line 62, in inner_func
2014-04-23 12:46:09.030 118482 TRACE root return infunc(*args, **kwargs)
2014-04-23 12:46:09.030 118482 TRACE root File "/usr/bin/neutron-agent-watch", line 627, in run
2014-04-23 12:46:09.030 118482 TRACE root watcher.run(context)
2014-04-23 12:46:09.030 118482 TRACE root File "/usr/bin/neutron-agent-watch", line 426, in run
2014-04-23 12:46:09.030 118482 TRACE root self._run() # run method implemented in child class
2014-04-23 12:46:09.030 118482 TRACE root File "/usr/bin/neutron-agent-watch", line 515, in _run
2014-04-23 12:46:09.030 118482 TRACE root self._remove_old_known_pidfiles(expected_pid_files)
2014-04-23 12:46:09.030 118482 TRACE root File "/usr/bin/neutron-agent-watch", line 388, in _remove_old_known_pidfiles
2014-04-23 12:46:09.030 118482 TRACE root self._remove_expected_pid_file(known)
2014-04-23 12:46:09.030 118482 TRACE root AttributeError: 'DhcpAgentWatcher' object has no attribute '_remove_expected_pid_file'
This happens even though the pid file /var/lib/neutron/dhcp/<network-id>/pid was removed when the network was deleted.
Also:
# /etc/init.d/neutron-dhcp-agent status ; echo $?
neutron-dhcp-agent (pid 20898) is running...
neutron-dhcp-agent health is not good
150
This is reported even though the agents are now OK.
Only restarting the agent-watch service returns it to a normal state.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHSA-2014-1686.html