Bug 1566463 - OSP containers are not respawned after being killed with docker kill
Summary: OSP containers are not respawned after being killed with docker kill
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: 13.0 (Queens)
Assignee: Alex Schultz
QA Contact: Omri Hochman
URL:
Whiteboard:
Depends On:
Blocks: 1433537
Reported: 2018-04-12 11:19 UTC by Eran Kuris
Modified: 2018-06-27 13:51 UTC (History)

Fixed In Version: openstack-tripleo-common-8.6.1-6.el7ost
Doc Type: If docs needed, set a value
Doc Text:
The 'docker kill' command does not exit if the container is set to automatically restart. If a user attempts to run 'docker kill <container>', it may hang indefinitely. In this case, CTRL+C will stop the command. To avoid the problem, use 'docker stop' (instead of 'docker kill') to stop a containerized service.
Clone Of:
Environment:
Last Closed: 2018-06-27 13:50:58 UTC
Target Upstream Version:


Attachments (Terms of Use)
nova_api default inspect info (14.34 KB, text/plain)
2018-04-19 01:03 UTC, Alex Schultz
nova_api with --stop-signal SIGTERM inspect info (18.23 KB, text/plain)
2018-04-19 01:03 UTC, Alex Schultz
nova_api inspect before killing (83.90 KB, text/plain)
2018-04-19 15:33 UTC, Alex Schultz
nova_api inspect after killing (16.50 KB, text/plain)
2018-04-19 15:34 UTC, Alex Schultz


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1765802 0 None None None 2018-04-20 22:05:22 UTC
OpenStack gerrit 563322 0 None MERGED Add in a STOPSIGNAL configuration 2020-09-30 08:53:02 UTC
OpenStack gerrit 565527 0 None MERGED Add in a STOPSIGNAL configuration 2020-09-30 08:53:03 UTC
Red Hat Product Errata RHEA-2018:2086 0 None None None 2018-06-27 13:51:40 UTC

Description Eran Kuris 2018-04-12 11:19:03 UTC
Description of problem:
When OSP containers such as ovn-controller or nova-api are killed with docker kill, Docker does not respawn them.

$ docker kill ddfcef7e7692
$ docker ps | grep ovn
$ docker ps -a | grep ovn
1efca8564495        192.168.24.1:8787/rhosp13/openstack-ovn-northd:2018-04-03.3                "/bin/bash /usr/lo..."   13 minutes ago      Up 13 minutes                                    ovn-dbs-bundle-docker-0
ddfcef7e7692        192.168.24.1:8787/rhosp13/openstack-ovn-controller:2018-04-03.3            "kolla_start"            43 hours ago        Exited (137) 5 minutes ago                       ovn_controller

Version-Release number of selected component (if applicable):

13   -p 2018-04-03.3
[root@controller-0 ~]# rpm -qa |grep ovn 
openvswitch-ovn-central-2.9.0-15.el7fdp.x86_64
openvswitch-ovn-common-2.9.0-15.el7fdp.x86_64
python-networking-ovn-4.0.1-0.20180315174741.a57c70e.el7ost.noarch
openvswitch-ovn-host-2.9.0-15.el7fdp.x86_64
openstack-nova-novncproxy-17.0.2-0.20180323024604.0390d5f.el7ost.noarch
novnc-0.6.1-1.el7ost.noarch
python-networking-ovn-metadata-agent-4.0.1-0.20180315174741.a57c70e.el7ost.noarch
puppet-ovn-12.3.1-0.20180221062110.4b16f7c.el7ost.noarch



How reproducible:
100%

Steps to Reproduce:
1. Deploy an OSP 13 OVN HA setup.
2. Kill the ovn-controller container with docker kill on a controller or compute node.
3. The container does not come back up.

This can be reproduced with any container that is not managed by Pacemaker, for example Nova.
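The check in step 3 can be sketched as a small polling helper. This is only an illustration: the function name wait_for_respawn and the 30-try default are hypothetical and not part of the report; docker inspect and its State.Running field are the only Docker features used.

```shell
# Sketch (hypothetical helper): poll until a container reports Running=true
# again, or give up after a number of one-second tries.
# Usage: wait_for_respawn <container> [tries]
wait_for_respawn() {
    wfr_name="$1"
    wfr_tries="${2:-30}"
    while [ "$wfr_tries" -gt 0 ]; do
        # State.Running is "true" only while the container process is alive.
        if [ "$(docker inspect --format '{{.State.Running}}' "$wfr_name" 2>/dev/null)" = "true" ]; then
            echo "respawned"
            return 0
        fi
        wfr_tries=$((wfr_tries - 1))
        sleep 1
    done
    echo "not respawned"
    return 1
}

# Example (not run here): kill the container, then check whether it returns.
#   docker kill ovn_controller
#   wait_for_respawn ovn_controller 60
```

On an affected deployment the helper would be expected to report "not respawned", which matches the "Exited (137)" state shown above.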
Actual results:


Expected results:


Additional info:

Comment 1 Assaf Muller 2018-04-18 13:41:41 UTC
This is a TripleO wide issue, possibly by design, nevertheless moving to DF DFG.

Comment 2 Alex Schultz 2018-04-18 20:25:07 UTC
It should restart if restart: always is configured in THT. Will have to look to see if it's launched.  It might be an issue with docker though.

Comment 3 Alex Schultz 2018-04-18 22:04:11 UTC
Confirmed: --restart always does not seem to take effect on any of our containers. However, a container pulled down from docker.io (nginx) does honor --restart always.

Comment 4 Steve Baker 2018-04-18 23:12:16 UTC
Just to help diagnose what docker is doing, could you please attach the output of "docker inspect <container>" both before and after killing it?
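One way to collect the requested output is sketched below. The helper name snapshot_inspect and the file naming scheme are assumptions made for illustration; only docker inspect itself comes from the request above.

```shell
# Sketch (hypothetical helper): save "docker inspect" output for a container
# under a label such as "before" or "after", and print the file path.
# Usage: snapshot_inspect <container> <label> [directory]
snapshot_inspect() {
    si_name="$1"
    si_label="$2"
    si_file="${3:-.}/${si_name}-${si_label}.json"
    docker inspect "$si_name" > "$si_file" || return 1
    echo "$si_file"
}

# Example (not run here):
#   before=$(snapshot_inspect nova_api before)
#   docker kill nova_api
#   after=$(snapshot_inspect nova_api after)
#   diff -u "$before" "$after"
```

Diffing the two files narrows the comparison to the fields that changed, such as the State section.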

Comment 5 Alex Schultz 2018-04-19 01:03:08 UTC
Created attachment 1423839 [details]
nova_api default inspect info

Comment 6 Alex Schultz 2018-04-19 01:03:42 UTC
Created attachment 1423840 [details]
nova_api with --stop-signal SIGTERM inspect info

Comment 7 Alex Schultz 2018-04-19 01:05:33 UTC
I reproduced this, and it appears that the stop signal (STOPSIGNAL) is not properly configured in the kolla containers. When the stop signal is set to SIGTERM, the container restarts properly, but the docker kill command hangs. That said, if you kill -9 the docker process, the container restarts just fine. I'm not sure whether the correct fix is to change the kolla containers (probably) or to append --stop-signal=SIGTERM to the paunch run command. I was also able to reproduce the docker kill hang with the official nginx container, so I don't think the hang is related to our containers.
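The two settings discussed in this comment can be read back from a container with docker inspect. The sketch below assumes the standard inspect fields HostConfig.RestartPolicy.Name and Config.StopSignal; the helper name show_restart_config is hypothetical, and an empty value is printed as "unset".

```shell
# Sketch (hypothetical helper): print "<restart-policy> <stop-signal>" for a
# container. An empty field is shown as "unset".
# Usage: show_restart_config <container>
show_restart_config() {
    src_name="$1"
    src_policy=$(docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' "$src_name") || return 1
    src_signal=$(docker inspect --format '{{.Config.StopSignal}}' "$src_name") || return 1
    echo "${src_policy:-unset} ${src_signal:-unset}"
}

# Example (not run here):
#   show_restart_config nova_api
# Starting a test container with both options set explicitly:
#   docker run -d --restart always --stop-signal SIGTERM nginx
```

Based on this comment, an affected container would be expected to show a restart policy of always with the stop signal unset, while a container started with --stop-signal SIGTERM would show both values.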

Comment 8 Alex Schultz 2018-04-19 15:33:58 UTC
Created attachment 1424191 [details]
nova_api inspect before killing

Comment 9 Alex Schultz 2018-04-19 15:34:25 UTC
Created attachment 1424192 [details]
nova_api inspect after killing

Comment 17 Omri Hochman 2018-05-09 14:32:27 UTC
Issue reproduced with: openstack-tripleo-common-8.6.1-7

[root@undercloud75 ~]# rpm -qa | grep openstack-tripleo-common
openstack-tripleo-common-containers-8.6.1-7.el7ost.noarch
openstack-tripleo-common-8.6.1-7.el7ost.noarch


[root@overcloud-controller-0 ~]# docker ps | grep nova_api
5c55e3dd569f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours (healthy)                       nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron
[root@overcloud-controller-0 ~]# docker kill 5c55e3dd569f
5c55e3dd569f
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron

Comment 18 Omri Hochman 2018-05-09 14:38:54 UTC
[root@overcloud-controller-0 ~]# docker images | grep api
192.168.0.1:8787/rhosp13/openstack-nova-api                    13.0-24             b343fcded56c        41 hours ago        875 MB

[root@overcloud-controller-0 ~]# docker inspect nova_api | grep Status
            "Status": "exited",
                "Status": "healthy",

[root@overcloud-controller-0 ~]# docker inspect nova_api | grep Running
            "Running": false,

[root@overcloud-controller-0 ~]# docker inspect nova_api | grep StopSignal
[root@overcloud-controller-0 ~]#

Comment 19 Alex Schultz 2018-05-09 14:47:11 UTC
The rpm is fine; the containers were not built with the kolla changes from the openstack-tripleo-common fixes. Moving back to MODIFIED. We'll need the containers rebuilt with the tripleo-common from this BZ.

Comment 22 Artem Hrechanychenko 2018-05-14 18:54:38 UTC
VERIFIED
puddle - 2018-05-10.3
openstack-tripleo-common-8.6.1-9.el7ost.noarch

[heat-admin@controller-0 ~]$ sudo docker ps |grep nova_api
879ea02f91eb        192.168.24.1:8787/rhosp13/openstack-nova-api:2018-05-10.3                    "kolla_start"            44 minutes ago      Up 5 minutes (healthy)                        nova_api
542eb0c9125b        192.168.24.1:8787/rhosp13/openstack-nova-api:2018-05-10.3                    "kolla_start"            45 minutes ago      Up 45 minutes                                 nova_api_cron
[heat-admin@controller-0 ~]$ sudo docker kill 879ea02f91eb

[heat-admin@controller-0 ~]$ sudo docker ps |grep nova_api
879ea02f91eb        192.168.24.1:8787/rhosp13/openstack-nova-api:2018-05-10.3                    "kolla_start"            51 minutes ago      Up 6 minutes (healthy)

[heat-admin@controller-1 ~]$ sudo docker inspect nova_api | grep StopSignal
            "StopSignal": "SIGTERM"

Comment 24 errata-xmlrpc 2018-06-27 13:50:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086

