Bug 1792417 - HA routers not cleaned properly
Summary: HA routers not cleaned properly
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z1
Target Release: 16.0 (Train on RHEL 8.1)
Assignee: Nate Johnston
QA Contact: Alex Katz
URL:
Whiteboard:
Depends On:
Blocks: 1787632
Reported: 2020-01-17 16:00 UTC by Slawek Kaplonski
Modified: 2020-03-03 09:45 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-0.20200128213922.1b63547.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-03 09:45:05 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1860155 0 None None None 2020-01-17 16:55:28 UTC
OpenStack gerrit 703123 0 None MERGED Fix substitution in kill-script 2020-12-19 22:22:55 UTC
OpenStack gerrit 703128 0 None MERGED Add handling of signal 15 in kill script 2020-12-19 22:23:26 UTC
OpenStack gerrit 704463 0 None MERGED Fix kill-script 2020-12-19 22:22:55 UTC
Red Hat Product Errata RHBA-2020:0655 0 None None None 2020-03-03 09:45:47 UTC

Description Slawek Kaplonski 2020-01-17 16:00:00 UTC
I noticed in the logs from job https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/DFG-network-neutron-16_director-rhel-virthost-3cont_2comp-ipv4-vxlan-ml2ovs-non_dvr/28/artifact/ that L3 HA routers aren't cleaned up properly because killing the keepalived containers fails.

Error in L3 agent logs:

2020-01-16 04:46:55.521 121536 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['keepalived-kill', '15', '888052'] execute_rootwrap_daemon /usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py:103
2020-01-16 04:46:55.686 121536 ERROR neutron.agent.linux.utils [-] Exit code: 1; Stdin: ; Stdout: ; Stderr: + exec
+ trap 'exec 2>&4 1>&3' 0 1 2 3
+ exec

2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent [-] Error while deleting router 82565834-bf99-431f-9092-e68fae912344: neutron_lib.exceptions.ProcessExecutionError: Exit code: 1; Stdin: ; Stdout: ; Stderr: + exec
+ trap 'exec 2>&4 1>&3' 0 1 2 3
+ exec
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/l3/agent.py", line 506, in _safe_router_removed
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent     self._router_removed(ri, router_id)
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/l3/agent.py", line 542, in _router_removed
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent     self.router_info[router_id] = ri
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent     self.force_reraise()
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent     six.reraise(self.type_, self.value, self.tb)
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent     raise value
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/l3/agent.py", line 539, in _router_removed
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent     ri.delete()
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/l3/ha_router.py", line 479, in delete
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent     self.disable_keepalived()
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/l3/ha_router.py", line 190, in disable_keepalived
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent     self.keepalived_manager.disable()
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/keepalived.py", line 453, in disable
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent     pm.disable(sig='15')
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/external_process.py", line 113, in disable
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent     utils.execute(cmd, run_as_root=self.run_as_root)
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py", line 147, in execute
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent     returncode=returncode)
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent neutron_lib.exceptions.ProcessExecutionError: Exit code: 1; Stdin: ; Stdout: ; Stderr: + exec
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent + trap 'exec 2>&4 1>&3' 0 1 2 3
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent + exec
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent 
2020-01-16 04:46:55.687 121536 ERROR neutron.agent.l3.agent 


Error in kill-script.log:
+ SIG=15
+ PID=881230
++ ip netns identify 881230
+ NETNS=qrouter-82565834-bf99-431f-9092-e68fae912344
+ '[' xqrouter-82565834-bf99-431f-9092-e68fae912344 == x ']'
+ CLI='nsenter --net=/run/netns/qrouter-82565834-bf99-431f-9092-e68fae912344 --preserve-credentials -m -t 1 podman'
+ '[' -f /proc/881230/cgroup ']'
++ awk 'BEGIN {FS="[-.]"} /name=/{print $3}' /proc/881230/cgroup
+ CT_ID=31d9a79a18faa70cff94cbe1ea96073ff2a96932f45b31aaa6792792c7e589e3
++ nsenter --net=/run/netns/qrouter-82565834-bf99-431f-9092-e68fae912344 --preserve-credentials -m -t 1 podman inspect -f '{{.Name}}' 31d9a79a18faa70cff94cbe1ea96073ff2a96932f45b31aaa6792792c7e589e3
+ CT_NAME=neutron-keepalived-qrouter-82565834-bf99-431f-9092-e68fae912344
+ case $SIG in
/etc/neutron/kill_scripts/keepalived-kill: line 50: Unknown action ${SIG} for ${$CT_NAME} ${CT_ID}: bad substitution
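
The last line of the trace suggests that signal 15 fell through to the fallback branch of the case statement, and that the fallback message itself then failed because ${$CT_NAME} is not a valid parameter expansion. A minimal sketch of a dispatch that avoids both problems might look like the following. It is not the shipped keepalived-kill script: the function name choose_action, its arguments, and the printed action strings are hypothetical, and no podman call is made.

```shell
# Hypothetical sketch, not the real /etc/neutron/kill_scripts/keepalived-kill.
# choose_action prints the action a kill script could take for a signal.
choose_action() {
    sig="$1"      # signal passed by the L3 agent, e.g. 9 or 15
    ct_name="$2"  # container name (assumed to be resolved already)
    ct_id="$3"    # container ID (assumed to be resolved already)

    case "$sig" in
        9|15)
            # Both termination signals are handled explicitly.
            echo "send signal ${sig} to ${ct_name}"
            ;;
        *)
            # ${ct_name} is a valid expansion; ${$ct_name} would be
            # rejected by the shell as a "bad substitution".
            echo "Unknown action ${sig} for ${ct_name} ${ct_id}" >&2
            return 1
            ;;
    esac
}
```

Calling choose_action 15 demo-ct abc prints the action line and returns 0, while an unhandled signal prints the "Unknown action" message on stderr and returns 1 instead of aborting on the substitution error.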



I'm not sure what real problems this may cause for users. At minimum, keepalived processes will be left running on the host. It may also be the cause of the failures in some tests from tempest.scenario.test_network_basic_ops.TestNetworkBasicOps, which are reported in https://bugzilla.redhat.com/show_bug.cgi?id=1787632

Comment 10 Alex McLeod 2020-02-19 12:39:25 UTC
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to '-'.

Comment 12 errata-xmlrpc 2020-03-03 09:45:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0655

