Description of problem:
The neutron-keepalived-qrouter and neutron-haproxy-qrouter containers are not removed even after the corresponding router is deleted.
~~~
(test) [stack@undercloud ~]$ openstack router delete a2dc8d5e-2806-4b94-8483-1515377c78b7
[root@overcloud-controller-2 ~]# podman ps -a |grep a2dc8d5e-2806-4b94-8483-1515377c78b7
dc212cfbf236 undercloud.ctlplane.yamato.example.com:8787/rhosp-rhel8/openstack-neutron-l3-agent:16.2 /usr/sbin/keepali... 2 minutes ago Exited (0) About a minute ago neutron-keepalived-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7
a9b7500be6fd undercloud.ctlplane.yamato.example.com:8787/rhosp-rhel8/openstack-neutron-l3-agent:16.2 /bin/bash -c HAPR... 2 minutes ago Exited (143) About a minute ago neutron-haproxy-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7
~~~
No SIGKILL was sent. kill-script.log shows only HUP and SIGTERM (15), and the containers are not removed.
~~~
[root@overcloud-controller-2 ~]# grep a2dc8d5e-2806-4b94-8483-1515377c78b7 /var/log/containers/neutron/kill-script.log
Mon May 2 03:42:16 UTC 2022 Sending signal 'HUP' to neutron-keepalived-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7 (dc212cfbf2360fa852c8e7d1bc634764fd1be2053e2163f21f2e95625584de71)
Mon May 2 03:42:26 UTC 2022 Sending signal 'HUP' to neutron-keepalived-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7 (dc212cfbf2360fa852c8e7d1bc634764fd1be2053e2163f21f2e95625584de71)
Mon May 2 03:42:36 UTC 2022 Sending signal '15' to neutron-haproxy-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7 (a9b7500be6fdf6f433b8c34d05fa63b04fac5539f5cf0fb46ac46026668a3723)
Mon May 2 03:42:38 UTC 2022 Sending signal '15' to neutron-keepalived-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7 (dc212cfbf2360fa852c8e7d1bc634764fd1be2053e2163f21f2e95625584de71)
~~~
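As a rough aid for checking other routers, the log excerpt above can be summarized per container. This is only a sketch: the regular expression is inferred from the four log lines shown in this report, and the helper names (`signals_by_container`, `containers_without_signal`) are hypothetical, not part of kill-script or neutron.

```python
import re
from collections import defaultdict

# Line format inferred from the kill-script.log excerpt in this report.
LOG_LINE = re.compile(r"Sending signal '([^']+)' to (\S+) \(([0-9a-f]+)\)")

SAMPLE_LOG = [
    "Mon May 2 03:42:16 UTC 2022 Sending signal 'HUP' to neutron-keepalived-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7 (dc212cfbf2360fa852c8e7d1bc634764fd1be2053e2163f21f2e95625584de71)",
    "Mon May 2 03:42:26 UTC 2022 Sending signal 'HUP' to neutron-keepalived-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7 (dc212cfbf2360fa852c8e7d1bc634764fd1be2053e2163f21f2e95625584de71)",
    "Mon May 2 03:42:36 UTC 2022 Sending signal '15' to neutron-haproxy-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7 (a9b7500be6fdf6f433b8c34d05fa63b04fac5539f5cf0fb46ac46026668a3723)",
    "Mon May 2 03:42:38 UTC 2022 Sending signal '15' to neutron-keepalived-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7 (dc212cfbf2360fa852c8e7d1bc634764fd1be2053e2163f21f2e95625584de71)",
]


def signals_by_container(lines):
    """Map each container name to the signals logged for it, in order."""
    result = defaultdict(list)
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            sig, name, _container_id = match.groups()
            result[name].append(sig)
    return dict(result)


def containers_without_signal(lines, sig):
    """Return the names of containers that were never sent the given signal."""
    return sorted(
        name for name, sigs in signals_by_container(lines).items()
        if sig not in sigs
    )
```

For the excerpt above, `containers_without_signal(SAMPLE_LOG, "9")` lists both containers, which matches the observation that SIGKILL was never sent.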
The containers are either removed or sent a signal by the following script:
https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/train/deployment/neutron/kill-script
The script is invoked from the following code:
- https://github.com/openstack/neutron/blob/050273ca210e5d4a08d39bf7012b15e929844cf6/neutron/agent/linux/external_process.py#L128
- https://github.com/openstack/neutron/blob/stable/train/neutron/agent/linux/external_process.py#L102
- https://github.com/openstack/neutron/blob/04e345f2d5788ccde1567084ca8d6a6e35e080fa/neutron/agent/metadata/driver.py#L250-L268
- https://github.com/openstack/neutron/blob/stable/train/neutron/agent/linux/keepalived.py#L457-L470
If I understand correctly, only SIGTERM is sent on this code path, so the containers exit but are never removed.
https://github.com/openstack/neutron/blob/stable/train/neutron/agent/linux/keepalived.py#L457-L470
~~~
def disable(self):
    self.process_monitor.unregister(uuid=self.resource_id,
                                    service_name=KEEPALIVED_SERVICE_NAME)
    pm = self.get_process()
    pm.disable(sig=str(int(signal.SIGTERM)))  # <== (*) send SIGTERM
    try:
        utils.wait_until_true(lambda: not pm.active,
                              timeout=SIGTERM_TIMEOUT)
    except utils.WaitTimeout:
        LOG.warning('Keepalived process %s did not finish after SIGTERM '
                    'signal in %s seconds, sending SIGKILL signal',
                    pm.pid, SIGTERM_TIMEOUT)
        # <== (*) remove container. But this is not called if the SIGTERM
        # was sent correctly.
        pm.disable(sig=str(int(signal.SIGKILL)))
~~~
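The control flow described above can be illustrated with a small stand-alone model. This is a sketch only: `FakeProcessManager` and `disable_keepalived` are hypothetical stand-ins, not neutron classes, and the `if pm.active` check replaces the real `wait_until_true` timeout for simplicity.

```python
import signal


class FakeProcessManager:
    """Hypothetical stand-in for neutron's process manager; records signals."""

    def __init__(self, dies_on_sigterm):
        self.dies_on_sigterm = dies_on_sigterm
        self.active = True
        self.sent = []

    def disable(self, sig):
        self.sent.append(sig)
        # SIGKILL always stops the process; SIGTERM only if it cooperates.
        if sig == str(int(signal.SIGKILL)) or self.dies_on_sigterm:
            self.active = False


def disable_keepalived(pm):
    """Simplified model of the disable() flow quoted in this report."""
    pm.disable(sig=str(int(signal.SIGTERM)))
    if pm.active:  # models the WaitTimeout branch
        pm.disable(sig=str(int(signal.SIGKILL)))
    return pm.sent
```

When the process exits on SIGTERM, as it did here, only signal 15 is recorded and the SIGKILL branch is never reached. If container removal happens only on that branch, as this report suggests, a clean shutdown leaves the exited container behind.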
Version-Release number of selected component (if applicable):
RHOSP 16.2.1 (my customer's env)
RHOSP 16.2.2 (my lab env)
How reproducible:
Steps to Reproduce:
1. Create a router
$ openstack router create router1
2. Add a subnet
$ openstack router add subnet router1 subnet1
3. Remove the subnet
$ openstack router remove subnet router1 subnet1
4. Delete the router
$ openstack router delete router1
5. Log in to a Controller node
6. The containers corresponding to the deleted router still remain
$ sudo podman ps -a |grep a2dc8d5e-2806-4b94-8483-1515377c78b7
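The check in step 6 could also be scripted. The sketch below assumes the container naming seen in this report (`neutron-keepalived-qrouter-<router id>` and `neutron-haproxy-qrouter-<router id>`); the function names are hypothetical, and `list_container_names` needs podman on the host and sufficient privileges.

```python
import subprocess


def leftover_containers(names, router_id):
    """Return container names that belong to the given router ID."""
    suffix = "qrouter-" + router_id
    return [n for n in names if n.startswith("neutron-") and n.endswith(suffix)]


def list_container_names():
    """List all container names, including exited ones, via podman."""
    out = subprocess.run(
        ["podman", "ps", "-a", "--format", "{{.Names}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()
```

On a Controller node, `leftover_containers(list_container_names(), "<router id>")` should return an empty list once the router has been deleted and cleanup works as expected.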
Actual results:
The containers remain (in the Exited state) after the router is deleted
Expected results:
The containers are removed
Additional info:
The following bugs are similar to this one, but they were reported against different versions.
- https://bugzilla.redhat.com/show_bug.cgi?id=1839071
- https://bugzilla.redhat.com/show_bug.cgi?id=1816657