Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1090421

Summary: neutron-agent-watch fails when deleting failed resource
Product: Red Hat OpenStack
Reporter: yfried
Component: openstack-neutron
Assignee: Miguel Angel Ajo <majopela>
Status: CLOSED ERRATA
QA Contact: yfried
Severity: medium
Docs Contact:
Priority: medium
Version: 4.0
CC: chrisw, ddomingo, lpeer, majopela, nyechiel, oblaut, sclewis, sgordon, tfreger, yeylon
Target Milestone: z5
Keywords: OtherQA, ZStream
Target Release: 4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-neutron-2013.2.3-17.el6ost
Doc Type: Bug Fix
Doc Text:
The watcher agent (neutron-agent-watch) periodically polls the OpenStack Networking service to get a network list for a specific host. If a network is no longer available (for example, if it was deleted), the watcher agent is supposed to remove it from the 'known' resources dictionary, which lists the networks available for scheduling. In previous releases, the watcher agent did not actually remove deleted networks from this dictionary. As a result, the agent could crash when a network scheduled to a host's DHCP agent or L3 agent was deleted. With this update, the watcher agent cleans the 'known' resources dictionary as expected, so deleting networks no longer causes the agent to crash.
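
The cleanup described in the Doc Text can be sketched as follows. This is a minimal illustration, not the shipped neutron-agent-watch code: the function name `prune_known_resources`, the dictionary layout, and the sample network IDs are all assumptions made for the example.

```python
def prune_known_resources(known, current_ids):
    """Drop entries from `known` whose IDs were not returned by the last poll.

    `known` maps a network ID to whatever state the watcher keeps for it
    (hypothetical layout). `current_ids` is the set of network IDs reported
    by the most recent poll. Returns the list of IDs that were removed.
    """
    # Collect the stale IDs first, then delete, so the dictionary is never
    # modified while it is being iterated.
    stale = [net_id for net_id in known if net_id not in current_ids]
    for net_id in stale:
        del known[net_id]
    return stale


# Hypothetical example data: "net-b" has been deleted since the last poll.
known = {"net-a": {"pidfile": "/tmp/a.pid"}, "net-b": {"pidfile": "/tmp/b.pid"}}
removed = prune_known_resources(known, current_ids=set(["net-a"]))
```

After the call, only the networks that still exist remain in `known`, so later passes have no stale entries to act on.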
Last Closed: 2014-10-22 17:22:58 UTC
Type: Bug
Attachments:
Description            Flags
Patch for the error    none
Patch                  none

Description yfried 2014-04-23 09:50:41 UTC
On RHEL 6.5:

python-neutron-2013.2.3-4.el6ost.noarch
python-neutronclient-2.3.4-1.el6ost.noarch
openstack-neutron-openvswitch-2013.2.3-4.el6ost.noarch
openstack-neutron-2013.2.3-4.el6ost.noarch

Description of problem:
Following the steps listed in https://bugzilla.redhat.com/show_bug.cgi?id=1051444#c15,
deleting the failed network (the one for which dnsmasq was killed) prints the following to agent_watch.log:

2014-04-23 12:46:09.030 118482 ERROR root [-] Unexpected exception occurred 51 time(s)... retrying.
2014-04-23 12:46:09.030 118482 TRACE root Traceback (most recent call last):
2014-04-23 12:46:09.030 118482 TRACE root   File "/usr/lib/python2.6/site-packages/neutron/openstack/common/excutils.py", line 62, in inner_func
2014-04-23 12:46:09.030 118482 TRACE root     return infunc(*args, **kwargs)
2014-04-23 12:46:09.030 118482 TRACE root   File "/usr/bin/neutron-agent-watch", line 627, in run
2014-04-23 12:46:09.030 118482 TRACE root     watcher.run(context)
2014-04-23 12:46:09.030 118482 TRACE root   File "/usr/bin/neutron-agent-watch", line 426, in run
2014-04-23 12:46:09.030 118482 TRACE root     self._run()  # run method implemented in child class
2014-04-23 12:46:09.030 118482 TRACE root   File "/usr/bin/neutron-agent-watch", line 515, in _run
2014-04-23 12:46:09.030 118482 TRACE root     self._remove_old_known_pidfiles(expected_pid_files)
2014-04-23 12:46:09.030 118482 TRACE root   File "/usr/bin/neutron-agent-watch", line 388, in _remove_old_known_pidfiles
2014-04-23 12:46:09.030 118482 TRACE root     self._remove_expected_pid_file(known)
2014-04-23 12:46:09.030 118482 TRACE root AttributeError: 'DhcpAgentWatcher' object has no attribute '_remove_expected_pid_file'

This happens even though the pid file at /var/lib/neutron/dhcp/<network-id>/pid was removed when the network was deleted.
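
The traceback points at a call to a method that the watcher object does not have, on a code path that runs only when a known resource disappears. A minimal sketch of that failure mode follows. It assumes nothing about the real script beyond the two method names visible in the traceback; the class bodies, the `known_pid_files` attribute, the `_forget` helper, and the sample path are hypothetical.

```python
class BrokenWatcher(object):
    """Hypothetical stand-in for a watcher with a misnamed method call."""

    def __init__(self):
        self.known_pid_files = set()

    def _forget(self, pid_file):
        self.known_pid_files.discard(pid_file)

    def _remove_old_known_pidfiles(self, expected_pid_files):
        # Iterate over a copy because entries may be removed inside the loop.
        for known in list(self.known_pid_files):
            if known not in expected_pid_files:
                # No method with this name exists on the class, so this line
                # raises AttributeError, but only once a resource vanishes.
                self._remove_expected_pid_file(known)


class FixedWatcher(BrokenWatcher):
    def _remove_old_known_pidfiles(self, expected_pid_files):
        for known in list(self.known_pid_files):
            if known not in expected_pid_files:
                self._forget(known)  # call a method that actually exists


def exercise(watcher_cls):
    """Simulate one pass after a network was deleted.

    Returns the name of the exception raised, or None if the pass succeeds.
    """
    watcher = watcher_cls()
    watcher.known_pid_files.add("/var/lib/neutron/dhcp/net-1/pid")
    try:
        watcher._remove_old_known_pidfiles(expected_pid_files=set())
    except AttributeError as exc:
        return type(exc).__name__
    return None
```

As long as every known pid file is still expected, the faulty branch never executes, which is consistent with the error appearing only after a network is deleted.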

also:
# /etc/init.d/neutron-dhcp-agent status ; echo $?
neutron-dhcp-agent (pid  20898) is running...
neutron-dhcp-agent health is not good
150

This status is reported even though the agents are now OK.
Only restarting the agent-watch service returns it to a normal state.
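
For scripted monitoring, the exit status shown above can be checked programmatically. The sketch below is an assumption-laden illustration: the value 150 and its meaning are taken only from the output in this report (LSB reserves 150-199 for application-specific status codes), and the names `describe_status` and `check` are hypothetical.

```python
import subprocess
import sys

# Exit status observed in this report alongside "health is not good".
HEALTH_NOT_GOOD = 150


def describe_status(returncode):
    """Map an init-script status exit code to a short description."""
    if returncode == 0:
        return "running"
    if returncode == HEALTH_NOT_GOOD:
        return "running, health not good"
    return "not running or unknown (exit %d)" % returncode


def check(command):
    """Run `command` (a list of arguments) and describe its exit status."""
    return describe_status(subprocess.call(command))


# On an affected host the command would be
# ["/etc/init.d/neutron-dhcp-agent", "status"]; a stand-in process that
# exits with 150 is used here so the sketch runs anywhere.
result = check([sys.executable, "-c", "import sys; sys.exit(150)"])
```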

Comment 2 Miguel Angel Ajo 2014-06-02 09:14:31 UTC
Created attachment 901387
Patch for the error

It seems that this case was not covered by my manual testing.

Comment 3 Miguel Angel Ajo 2014-06-02 09:18:17 UTC
This agent should go away as soon as I get the agent-status patches merged upstream, which will have proper functional and unit testing.

Comment 6 Miguel Angel Ajo 2014-09-18 05:52:10 UTC
Created attachment 938753
Patch

Comment 12 errata-xmlrpc 2014-10-22 17:22:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2014-1686.html