1566463 – OSP containers do not re-spawned after killing them by docker

Summary: OSP containers do not re-spawned after killing them by docker

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo-common
Sub Component:
Version:	13.0 (Queens)
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	urgent
Target Milestone:	rc
Target Release:	13.0 (Queens)
Assignee:	Alex Schultz
QA Contact:	Omri Hochman
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1433537

Reported:	2018-04-12 11:19 UTC by Eran Kuris
Modified:	2018-06-27 13:51 UTC
CC List:	15 users
Fixed In Version:	openstack-tripleo-common-8.6.1-6.el7ost
Doc Text:	The 'docker kill' command does not exit if the container is set to automatically restart. If a user attempts to run 'docker kill <container>', it may hang indefinitely. In this case, CTRL+C will stop the command. To avoid the problem, use 'docker stop' (instead of 'docker kill') to stop a containerized service.
Clone Of:
Environment:
Last Closed:	2018-06-27 13:50:58 UTC
Target Upstream Version:
Embargoed:

Attachments
nova_api default inspect info (14.34 KB, text/plain) 2018-04-19 01:03 UTC, Alex Schultz
nova_api with --stop-signal SIGTERM inspect info (18.23 KB, text/plain) 2018-04-19 01:03 UTC, Alex Schultz
nova_api inspect before killing (83.90 KB, text/plain) 2018-04-19 15:33 UTC, Alex Schultz
nova_api inspect after killing (16.50 KB, text/plain) 2018-04-19 15:34 UTC, Alex Schultz

Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1765802	None	None	None	2018-04-20 22:05:22 UTC
OpenStack gerrit	563322	None	MERGED	Add in a STOPSIGNAL configuration	2020-09-30 08:53:02 UTC
OpenStack gerrit	565527	None	MERGED	Add in a STOPSIGNAL configuration	2020-09-30 08:53:03 UTC
Red Hat Product Errata	RHEA-2018:2086	None	None	None	2018-06-27 13:51:40 UTC

Description Eran Kuris 2018-04-12 11:19:03 UTC

Description of problem:
When OSP containers such as ovn-controller or nova-api are killed with docker kill, Docker does not respawn them.

$ docker kill ddfcef7e7692
$ docker ps | grep ovn
$ docker ps -a | grep ovn
1efca8564495        192.168.24.1:8787/rhosp13/openstack-ovn-northd:2018-04-03.3                "/bin/bash /usr/lo..."   13 minutes ago      Up 13 minutes                                    ovn-dbs-bundle-docker-0
ddfcef7e7692        192.168.24.1:8787/rhosp13/openstack-ovn-controller:2018-04-03.3            "kolla_start"            43 hours ago        Exited (137) 5 minutes ago                       ovn_controller

Version-Release number of selected component (if applicable):

OSP 13, puddle 2018-04-03.3
[root@controller-0 ~]# rpm -qa |grep ovn 
openvswitch-ovn-central-2.9.0-15.el7fdp.x86_64
openvswitch-ovn-common-2.9.0-15.el7fdp.x86_64
python-networking-ovn-4.0.1-0.20180315174741.a57c70e.el7ost.noarch
openvswitch-ovn-host-2.9.0-15.el7fdp.x86_64
openstack-nova-novncproxy-17.0.2-0.20180323024604.0390d5f.el7ost.noarch
novnc-0.6.1-1.el7ost.noarch
python-networking-ovn-metadata-agent-4.0.1-0.20180315174741.a57c70e.el7ost.noarch
puppet-ovn-12.3.1-0.20180221062110.4b16f7c.el7ost.noarch



How reproducible:
100%

Steps to Reproduce:
1. Deploy an OSP 13 OVN HA setup.
2. Kill the ovn-controller container on the controller or compute node.
3. The container does not come back up.

The same can be reproduced with any container that is not managed by Pacemaker, for example Nova.
Actual results:


Expected results:


Additional info:

Comment 1 Assaf Muller 2018-04-18 13:41:41 UTC

This is a TripleO-wide issue, possibly by design; nevertheless, moving to DF DFG.

Comment 2 Alex Schultz 2018-04-18 20:25:07 UTC

It should restart if restart: always is configured in THT. Will have to look to see if it's launched.  It might be an issue with docker though.

Comment 3 Alex Schultz 2018-04-18 22:04:11 UTC

Confirmed that --restart always does not seem to take effect on any of our containers. However, a container pulled from docker.io (nginx) does honor --restart always.

Comment 4 Steve Baker 2018-04-18 23:12:16 UTC

Just to help diagnose what docker is doing, could you please attach the output of "docker inspect <container>" both before and after killing it?
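
One possible way to make the before/after comparison requested here easier to read is to reduce each saved "docker inspect" dump to the few fields relevant to this bug. The sketch below is only an illustration: the function name inspect_summary and the file names in the usage comment are hypothetical, and the field list is an assumption about which keys matter.

```shell
#!/bin/sh
# Hypothetical helper: print only the lines of a saved "docker inspect" dump
# that describe the container state, restart count, and stop signal.
# $1 is the path to a file containing the JSON output of "docker inspect".
inspect_summary() {
    grep -E '"(Status|Running|ExitCode|RestartCount|StopSignal)":' "$1" \
        | sed 's/^[[:space:]]*//'
}

# Usage (not executed here; file names are placeholders):
#   docker inspect nova_api > before.json
#   docker kill nova_api
#   docker inspect nova_api > after.json
#   inspect_summary before.json > before.txt
#   inspect_summary after.json  > after.txt
#   diff before.txt after.txt
```

The helper works on saved files rather than calling docker itself, so the same dumps can also be attached to the bug unchanged.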

Comment 5 Alex Schultz 2018-04-19 01:03:08 UTC

Created attachment 1423839
nova_api default inspect info

Comment 6 Alex Schultz 2018-04-19 01:03:42 UTC

Created attachment 1423840
nova_api with --stop-signal SIGTERM inspect info

Comment 7 Alex Schultz 2018-04-19 01:05:33 UTC

I reproduced this, and it appears that the stop signal is not properly configured in the kolla containers. When the stop signal is set to SIGTERM, the container restarts properly, but the docker kill command hangs. That said, if you kill -9 the docker process, the container restarts just fine. I'm not sure whether the correct fix is to change the kolla containers (probably) or to append --stop-signal=SIGTERM to the paunch run command. I was also able to reproduce the docker kill hang with the official nginx container, so I don't think the hang is related to our containers.
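
As a rough illustration of the second option mentioned in this comment, the sketch below only prints a docker run command line that combines --restart always with --stop-signal SIGTERM. The helper name build_run_cmd, the container name, and the image reference are hypothetical; this is not the paunch implementation, nor the change that was eventually merged.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a docker run command line that sets
# both a restart policy and an explicit stop signal.
# $1 = container name, $2 = image reference.
build_run_cmd() {
    printf 'docker run --detach --name %s --restart always --stop-signal SIGTERM %s\n' \
        "$1" "$2"
}

# Example with placeholder values; the printed line can be reviewed before use.
build_run_cmd nova_api_test registry.example/nova-api:tag
```

Printing the command instead of executing it keeps the sketch safe to run on a node that hosts production containers.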

Comment 8 Alex Schultz 2018-04-19 15:33:58 UTC

Created attachment 1424191
nova_api inspect before killing

Comment 9 Alex Schultz 2018-04-19 15:34:25 UTC

Created attachment 1424192
nova_api inspect after killing

Comment 17 Omri Hochman 2018-05-09 14:32:27 UTC

Issue reproduced with: openstack-tripleo-common-8.6.1-7

[root@undercloud75 ~]# rpm -qa | grep openstack-tripleo-common
openstack-tripleo-common-containers-8.6.1-7.el7ost.noarch
openstack-tripleo-common-8.6.1-7.el7ost.noarch


[root@overcloud-controller-0 ~]# docker ps | grep nova_api
5c55e3dd569f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours (healthy)                       nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron
[root@overcloud-controller-0 ~]#
[root@overcloud-controller-0 ~]# docker kill 5c55e3dd569f
5c55e3dd569f
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron

Comment 18 Omri Hochman 2018-05-09 14:38:54 UTC

[root@overcloud-controller-0 ~]# docker images | grep api
192.168.0.1:8787/rhosp13/openstack-nova-api                    13.0-24             b343fcded56c        41 hours ago        875 MB

[root@overcloud-controller-0 ~]# docker inspect nova_api | grep Status
            "Status": "exited",
                "Status": "healthy",

[root@overcloud-controller-0 ~]# docker inspect nova_api | grep Running
            "Running": false,

[root@overcloud-controller-0 ~]# docker inspect nova_api | grep StopSignal
[root@overcloud-controller-0 ~]#

Comment 19 Alex Schultz 2018-05-09 14:47:11 UTC

The RPM is fine; the containers were not built with the kolla changes from the openstack-tripleo-common fixes. Moving back to MODIFIED. We'll need containers rebuilt with the tripleo-common from this BZ.

Comment 22 Artem Hrechanychenko 2018-05-14 18:54:38 UTC

VERIFIED
puddle - 2018-05-10.3
openstack-tripleo-common-8.6.1-9.el7ost.noarch

[heat-admin@controller-0 ~]$ sudo docker ps |grep nova_api
879ea02f91eb        192.168.24.1:8787/rhosp13/openstack-nova-api:2018-05-10.3                    "kolla_start"            44 minutes ago      Up 5 minutes (healthy)                        nova_api
542eb0c9125b        192.168.24.1:8787/rhosp13/openstack-nova-api:2018-05-10.3                    "kolla_start"            45 minutes ago      Up 45 minutes                                 nova_api_cron
[heat-admin@controller-0 ~]$ sudo docker kill 879ea02f91eb

[heat-admin@controller-0 ~]$ sudo docker ps |grep nova_api
879ea02f91eb        192.168.24.1:8787/rhosp13/openstack-nova-api:2018-05-10.3                    "kolla_start"            51 minutes ago      Up 6 minutes (healthy)

[heat-admin@controller-1 ~]$ sudo docker inspect nova_api | grep StopSignal
            "StopSignal": "SIGTERM"
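
The StopSignal check used for verification in this bug (empty grep output on the unfixed containers, "SIGTERM" on the rebuilt ones) could be wrapped in a small filter so that it can be scripted across several containers. This is a minimal sketch; the function name has_stop_signal is hypothetical, and it assumes the JSON layout seen in the "docker inspect" output quoted in this report.

```shell
#!/bin/sh
# Hypothetical helper: read "docker inspect" JSON on stdin and succeed only if
# a non-empty StopSignal value is present.
has_stop_signal() {
    grep -q '"StopSignal": *"[A-Z0-9]\{1,\}"'
}

# Usage (not executed here):
#   sudo docker inspect nova_api | has_stop_signal \
#       && echo "stop signal set" \
#       || echo "stop signal missing"
```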

Comment 24 errata-xmlrpc 2018-06-27 13:50:58 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086
