Bug 2080811

Summary: [ML2/OVS] neutron-keepalived-qrouter and neutron-haproxy-qrouter containers are not removed even if the corresponding router is deleted.
Product: Red Hat OpenStack
Reporter: yatanaka
Component: openstack-tripleo-heat-templates
Assignee: Slawek Kaplonski <skaplons>
Status: VERIFIED
QA Contact: Maor <mblue>
Severity: high
Priority: high
Version: 16.2 (Train)
CC: averdagu, beagles, chrisw, ekuris, gthiemon, mblue, mburns, scohen, skaplons, tkajinam, tvignaud, ykarel
Target Milestone: z5
Keywords: Triaged
Target Release: 16.2 (Train on RHEL 8.4)
Flags: skaplons: needinfo-
Hardware: x86_64
OS: Linux
Fixed In Version: openstack-tripleo-heat-templates-11.6.1-2.20230717085025.1608f56.el8ost
Doc Type: No Doc Update
Type: Bug

Description yatanaka 2022-05-02 03:52:19 UTC
Description of problem:

The neutron-keepalived-qrouter and neutron-haproxy-qrouter containers are not removed even if the corresponding router is deleted.

~~~
(test) [stack@undercloud ~]$ openstack router delete a2dc8d5e-2806-4b94-8483-1515377c78b7

[root@overcloud-controller-2 ~]# podman ps -a  |grep a2dc8d5e-2806-4b94-8483-1515377c78b7
dc212cfbf236  undercloud.ctlplane.yamato.example.com:8787/rhosp-rhel8/openstack-neutron-l3-agent:16.2           /usr/sbin/keepali...  2 minutes ago      Exited (0) About a minute ago            neutron-keepalived-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7
a9b7500be6fd  undercloud.ctlplane.yamato.example.com:8787/rhosp-rhel8/openstack-neutron-l3-agent:16.2           /bin/bash -c HAPR...  2 minutes ago      Exited (143) About a minute ago          neutron-haproxy-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7
~~~

Apart from the HUP signals, only SIGTERM (signal 15) was sent.
The containers were not removed.

~~~
[root@overcloud-controller-2 ~]# grep a2dc8d5e-2806-4b94-8483-1515377c78b7 /var/log/containers/neutron/kill-script.log 
Mon May  2 03:42:16 UTC 2022 Sending signal 'HUP' to neutron-keepalived-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7 (dc212cfbf2360fa852c8e7d1bc634764fd1be2053e2163f21f2e95625584de71)
Mon May  2 03:42:26 UTC 2022 Sending signal 'HUP' to neutron-keepalived-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7 (dc212cfbf2360fa852c8e7d1bc634764fd1be2053e2163f21f2e95625584de71)
Mon May  2 03:42:36 UTC 2022 Sending signal '15' to neutron-haproxy-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7 (a9b7500be6fdf6f433b8c34d05fa63b04fac5539f5cf0fb46ac46026668a3723)
Mon May  2 03:42:38 UTC 2022 Sending signal '15' to neutron-keepalived-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7 (dc212cfbf2360fa852c8e7d1bc634764fd1be2053e2163f21f2e95625584de71)
~~~
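The log excerpt above can be summarised per container to confirm that no signal 9 was ever sent. The following is a minimal sketch, assuming the line format is exactly the one shown in this report ("Sending signal '<sig>' to <name> (<id>)"); the function name `signals_by_container` and the `SAMPLE` constant are illustrative and not part of kill-script or Neutron.

```python
import re
from collections import defaultdict

# Pattern inferred only from the log lines quoted in this report.
SIGNAL_LINE = re.compile(
    r"Sending signal '(?P<sig>[^']+)' to (?P<name>\S+) \((?P<cid>[0-9a-f]+)\)"
)


def signals_by_container(log_text):
    """Return {container_name: [signals in the order they were logged]}."""
    sent = defaultdict(list)
    for line in log_text.splitlines():
        match = SIGNAL_LINE.search(line)
        if match:
            sent[match.group("name")].append(match.group("sig"))
    return dict(sent)


# The four lines quoted from /var/log/containers/neutron/kill-script.log.
SAMPLE = """\
Mon May  2 03:42:16 UTC 2022 Sending signal 'HUP' to neutron-keepalived-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7 (dc212cfbf2360fa852c8e7d1bc634764fd1be2053e2163f21f2e95625584de71)
Mon May  2 03:42:26 UTC 2022 Sending signal 'HUP' to neutron-keepalived-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7 (dc212cfbf2360fa852c8e7d1bc634764fd1be2053e2163f21f2e95625584de71)
Mon May  2 03:42:36 UTC 2022 Sending signal '15' to neutron-haproxy-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7 (a9b7500be6fdf6f433b8c34d05fa63b04fac5539f5cf0fb46ac46026668a3723)
Mon May  2 03:42:38 UTC 2022 Sending signal '15' to neutron-keepalived-qrouter-a2dc8d5e-2806-4b94-8483-1515377c78b7 (dc212cfbf2360fa852c8e7d1bc634764fd1be2053e2163f21f2e95625584de71)
"""

if __name__ == "__main__":
    for name, sigs in signals_by_container(SAMPLE).items():
        print(name, sigs)
```

On a controller, the same function could be fed the full contents of kill-script.log instead of `SAMPLE`.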

The following script either removes the containers or sends them a signal.
https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/train/deployment/neutron/kill-script

The above script is invoked from the following code.
  - https://github.com/openstack/neutron/blob/050273ca210e5d4a08d39bf7012b15e929844cf6/neutron/agent/linux/external_process.py#L128
  - https://github.com/openstack/neutron/blob/stable/train/neutron/agent/linux/external_process.py#L102
  - https://github.com/openstack/neutron/blob/04e345f2d5788ccde1567084ca8d6a6e35e080fa/neutron/agent/metadata/driver.py#L250-L268
  - https://github.com/openstack/neutron/blob/stable/train/neutron/agent/linux/keepalived.py#L457-L470

If I understand correctly, only SIGTERM is sent, so the containers are never removed.

https://github.com/openstack/neutron/blob/stable/train/neutron/agent/linux/keepalived.py#L457-L470
~~~
    def disable(self):
        self.process_monitor.unregister(uuid=self.resource_id,
                                        service_name=KEEPALIVED_SERVICE_NAME)

        pm = self.get_process()
        pm.disable(sig=str(int(signal.SIGTERM)))  # (*) sends SIGTERM
        try:
            utils.wait_until_true(lambda: not pm.active,
                                  timeout=SIGTERM_TIMEOUT)
        except utils.WaitTimeout:
            LOG.warning('Keepalived process %s did not finish after SIGTERM '
                        'signal in %s seconds, sending SIGKILL signal',
                        pm.pid, SIGTERM_TIMEOUT)
            pm.disable(sig=str(int(signal.SIGKILL)))  # (*) removes the container, but is not reached if SIGTERM stopped the process in time
~~~
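The control flow described above can be modelled in a few lines. This is only a sketch under the assumption stated in this report: the kill script removes a container when it receives signal 9, and merely forwards HUP and 15. `FakeContainer`, `kill_script_model` and `disable_model` are hypothetical names for illustration; they are not the real kill-script or the Neutron `disable()` method.

```python
import signal


class FakeContainer:
    """Hypothetical stand-in for a sidecar container."""

    def __init__(self, name):
        self.name = name
        self.running = True   # main process still alive
        self.exists = True    # container still listed by "podman ps -a"


def kill_script_model(container, sig):
    """Model of the dispatch as described in this report (an assumption)."""
    if sig == str(int(signal.SIGKILL)):
        # Signal 9: stop the container and remove it.
        container.running = False
        container.exists = False
    elif sig == str(int(signal.SIGTERM)):
        # Signal 15: the process exits, but the container is left behind.
        container.running = False
    elif sig == "HUP":
        # HUP: reload only, nothing stops.
        pass
    else:
        raise ValueError("unknown signal: %s" % sig)


def disable_model(container):
    """Mirror the SIGTERM-then-SIGKILL fallback from the excerpt above."""
    kill_script_model(container, str(int(signal.SIGTERM)))
    if container.running:
        # Stands in for the WaitTimeout branch; only reached when
        # SIGTERM did not stop the process.
        kill_script_model(container, str(int(signal.SIGKILL)))
    return container


if __name__ == "__main__":
    c = disable_model(FakeContainer("neutron-keepalived-qrouter-example"))
    print(c.name, "running:", c.running, "exists:", c.exists)
```

In this model a clean SIGTERM always leaves `exists` set to True, which matches the "Exited" containers shown in the `podman ps -a` output.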





Version-Release number of selected component (if applicable):

RHOSP 16.2.1 (my customer's env)
RHOSP 16.2.2 (my lab env)


How reproducible:

Steps to Reproduce:
1. Create a router
  $ openstack router create router1
2. Add a subnet
  $ openstack router add subnet router1 subnet1
3. Remove the subnet
  $ openstack router remove subnet router1 subnet1
4. Delete the router
  $ openstack router delete router1
5. Log in to a Controller node
6. The containers corresponding to the deleted router still remain
  $ sudo podman ps -a  |grep a2dc8d5e-2806-4b94-8483-1515377c78b7
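
The check in step 6 can also be scripted. The sketch below assumes the sidecar naming seen in this report (neutron-keepalived-qrouter-<router id> and neutron-haproxy-qrouter-<router id>) and that `podman ps -a --format '{{.Names}}'` is run with sufficient privileges on the controller; the helper names are illustrative.

```python
import subprocess

# Name prefixes taken from the container names shown in this report.
SIDECAR_PREFIXES = (
    "neutron-keepalived-qrouter-",
    "neutron-haproxy-qrouter-",
)


def leftover_sidecars(container_names, router_id):
    """Return the sidecar container names that belong to router_id."""
    return [
        name
        for name in container_names
        if name.startswith(SIDECAR_PREFIXES) and name.endswith(router_id)
    ]


def list_container_names():
    """List all container names, including exited ones, via podman."""
    result = subprocess.run(
        ["podman", "ps", "-a", "--format", "{{.Names}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.split()


if __name__ == "__main__":
    router_id = "a2dc8d5e-2806-4b94-8483-1515377c78b7"
    for name in leftover_sidecars(list_container_names(), router_id):
        print("leftover:", name)
```

An empty result after the router is deleted would mean that the sidecars were cleaned up.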


Actual results:

The containers remain.


Expected results:

The containers are removed.


Additional info:

The following bugs are similar to this one, but they were reported against a different version.

  - https://bugzilla.redhat.com/show_bug.cgi?id=1839071
  - https://bugzilla.redhat.com/show_bug.cgi?id=1816657

Comment 1 Takashi Kajinami 2022-05-03 06:19:38 UTC
IIUC, this is expected behavior, and these orphaned containers are deleted when a new sidecar container is started.
 https://github.com/openstack/puppet-tripleo/blob/stable/train/templates/neutron/keepalived.epp#L36-L45
 https://github.com/openstack/puppet-tripleo/blob/stable/train/templates/neutron/haproxy.epp#L37-L46

Comment 6 Lon Hohberger 2023-01-10 11:33:03 UTC
According to our records, this should be resolved by openstack-tripleo-heat-templates-11.6.1-2.20221010235135.el8ost.  This build is available now.