Bug 1566463

Summary: OSP containers are not respawned after being killed with docker kill
Product: Red Hat OpenStack Reporter: Eran Kuris <ekuris>
Component: openstack-tripleo-common Assignee: Alex Schultz <aschultz>
Status: CLOSED ERRATA QA Contact: Omri Hochman <ohochman>
Severity: urgent Docs Contact:
Priority: high    
Version: 13.0 (Queens) CC: ahrechan, amuller, aschultz, astafeye, bcafarel, bdobreli, jamsmith, jcoufal, jschluet, knylande, m.andre, mburns, michele, ohochman, slinaber
Target Milestone: rc Keywords: Triaged
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-common-8.6.1-6.el7ost Doc Type: If docs needed, set a value
Doc Text:
The 'docker kill' command does not exit if the container is set to automatically restart. If a user attempts to run 'docker kill <container>', it may hang indefinitely. In this case, CTRL+C will stop the command. To avoid the problem, use 'docker stop' (instead of 'docker kill') to stop a containerized service.
Last Closed: 2018-06-27 13:50:58 UTC Type: Bug
Bug Blocks: 1433537    
Attachments:
Description                                         Flags
nova_api default inspect info                       none
nova_api with --stop-signal SIGTERM inspect info    none
nova_api inspect before killing                     none
nova_api inspect after killing                      none

Description Eran Kuris 2018-04-12 11:19:03 UTC
Description of problem:
When OSP containers such as ovn-controller or nova-api are killed with 'docker kill', docker does not respawn them.

$ docker kill ddfcef7e7692
$ docker ps | grep ovn
$ docker ps -a | grep ovn
1efca8564495        192.168.24.1:8787/rhosp13/openstack-ovn-northd:2018-04-03.3                "/bin/bash /usr/lo..."   13 minutes ago      Up 13 minutes                                    ovn-dbs-bundle-docker-0
ddfcef7e7692        192.168.24.1:8787/rhosp13/openstack-ovn-controller:2018-04-03.3            "kolla_start"            43 hours ago        Exited (137) 5 minutes ago                       ovn_controller

Version-Release number of selected component (if applicable):

OSP 13, puddle 2018-04-03.3
[root@controller-0 ~]# rpm -qa |grep ovn 
openvswitch-ovn-central-2.9.0-15.el7fdp.x86_64
openvswitch-ovn-common-2.9.0-15.el7fdp.x86_64
python-networking-ovn-4.0.1-0.20180315174741.a57c70e.el7ost.noarch
openvswitch-ovn-host-2.9.0-15.el7fdp.x86_64
openstack-nova-novncproxy-17.0.2-0.20180323024604.0390d5f.el7ost.noarch
novnc-0.6.1-1.el7ost.noarch
python-networking-ovn-metadata-agent-4.0.1-0.20180315174741.a57c70e.el7ost.noarch
puppet-ovn-12.3.1-0.20180221062110.4b16f7c.el7ost.noarch



How reproducible:
100%

Steps to Reproduce:
1. Deploy an OSP 13 OVN HA setup.
2. Kill the ovn-controller container on the controller or compute node.
3. The container does not come back up.

This can be reproduced with any container that is not managed by Pacemaker, for example Nova.
Actual results:


Expected results:


Additional info:

Comment 1 Assaf Muller 2018-04-18 13:41:41 UTC
This is a TripleO-wide issue, possibly by design. Nevertheless, moving to the DF DFG.

Comment 2 Alex Schultz 2018-04-18 20:25:07 UTC
It should restart if 'restart: always' is configured in THT. Will have to look to see if it's launched. It might be an issue with docker, though.

Comment 3 Alex Schultz 2018-04-18 22:04:11 UTC
Confirmed: '--restart always' does not seem to take effect on any of the containers. However, I noticed that a container pulled down from docker.io (nginx) does honor '--restart always'.

Comment 4 Steve Baker 2018-04-18 23:12:16 UTC
Just to help diagnose what docker is doing, could you please attach the output of "docker inspect <container>" both before and after killing it?

Comment 5 Alex Schultz 2018-04-19 01:03:08 UTC
Created attachment 1423839
nova_api default inspect info

Comment 6 Alex Schultz 2018-04-19 01:03:42 UTC
Created attachment 1423840
nova_api with --stop-signal SIGTERM inspect info

Comment 7 Alex Schultz 2018-04-19 01:05:33 UTC
So I reproduced this, and it appears that the stop signal is not properly configured in the kolla containers. When the stop signal is set to SIGTERM, the container restarts properly, but the docker kill command hangs. That said, if you kill -9 the docker process, the container restarts just fine. I'm not sure whether the correct fix is to change the kolla containers (probably) or to append --stop-signal=SIGTERM to the paunch run command. I was also able to reproduce the kill command hang with the official nginx container, so I don't think the docker kill hang is related to our containers.
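
A minimal sketch of how the two settings discussed in this comment could be checked on a node, assuming the docker CLI is available and the container is named nova_api as elsewhere in this report. The helper name container_restart_info is hypothetical and not part of TripleO or paunch; it only reads two fields through docker inspect --format.

```shell
# Hypothetical helper: print the restart policy and stop signal that docker
# has recorded for a container.
# Usage: container_restart_info <container-name-or-id>
container_restart_info() {
    local name="$1"
    local policy signal
    # HostConfig.RestartPolicy.Name holds the --restart value (e.g. "always").
    policy=$(docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' "$name") || return 1
    # Config.StopSignal is empty when neither the image nor the run command set one.
    signal=$(docker inspect --format '{{.Config.StopSignal}}' "$name") || return 1
    echo "restart=${policy:-unset} stop_signal=${signal:-unset}"
}
```

Calling container_restart_info nova_api prints one line; a stop_signal of "unset" would correspond to the empty 'grep StopSignal' result seen on containers built without the fix.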

Comment 8 Alex Schultz 2018-04-19 15:33:58 UTC
Created attachment 1424191
nova_api inspect before killing

Comment 9 Alex Schultz 2018-04-19 15:34:25 UTC
Created attachment 1424192
nova_api inspect after killing

Comment 17 Omri Hochman 2018-05-09 14:32:27 UTC
Issue reproduced with: openstack-tripleo-common-8.6.1-7

[root@undercloud75 ~]# rpm -qa | grep openstack-tripleo-common
openstack-tripleo-common-containers-8.6.1-7.el7ost.noarch
openstack-tripleo-common-8.6.1-7.el7ost.noarch


[root@overcloud-controller-0 ~]# docker ps | grep nova_api
5c55e3dd569f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours (healthy)                       nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron
[root@overcloud-controller-0 ~]# docker kill 5c55e3dd569f
5c55e3dd569f
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron

Comment 18 Omri Hochman 2018-05-09 14:38:54 UTC
[root@overcloud-controller-0 ~]# docker images | grep api
192.168.0.1:8787/rhosp13/openstack-nova-api                    13.0-24             b343fcded56c        41 hours ago        875 MB

[root@overcloud-controller-0 ~]# docker inspect nova_api | grep Status
            "Status": "exited",
                "Status": "healthy",

[root@overcloud-controller-0 ~]# docker inspect nova_api | grep Running
            "Running": false,

[root@overcloud-controller-0 ~]# docker inspect nova_api | grep StopSignal
[root@overcloud-controller-0 ~]#

Comment 19 Alex Schultz 2018-05-09 14:47:11 UTC
The rpm is fine; the containers were not built with the kolla changes from the openstack-tripleo-common fixes. Moving back to MODIFIED. We'll need the containers rebuilt with the tripleo-common from this BZ.

Comment 22 Artem Hrechanychenko 2018-05-14 18:54:38 UTC
VERIFIED
puddle - 2018-05-10.3
openstack-tripleo-common-8.6.1-9.el7ost.noarch

[heat-admin@controller-0 ~]$ sudo docker ps |grep nova_api
879ea02f91eb        192.168.24.1:8787/rhosp13/openstack-nova-api:2018-05-10.3                    "kolla_start"            44 minutes ago      Up 5 minutes (healthy)                        nova_api
542eb0c9125b        192.168.24.1:8787/rhosp13/openstack-nova-api:2018-05-10.3                    "kolla_start"            45 minutes ago      Up 45 minutes                                 nova_api_cron
[heat-admin@controller-0 ~]$ sudo docker kill 879ea02f91eb

[heat-admin@controller-0 ~]$ sudo docker ps |grep nova_api
879ea02f91eb        192.168.24.1:8787/rhosp13/openstack-nova-api:2018-05-10.3                    "kolla_start"            51 minutes ago      Up 6 minutes (healthy)

[heat-admin@controller-1 ~]$ sudo docker inspect nova_api | grep StopSignal
            "StopSignal": "SIGTERM"
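
The manual check above (kill, then look at docker ps) could be scripted roughly as follows. This is only a sketch under assumptions: wait_for_running is a hypothetical helper name, the 60-second default timeout is arbitrary, and the only docker call used is docker inspect --format on State.Running.

```shell
# Hypothetical helper: poll until docker reports the container as running again.
# Usage: wait_for_running <container> [timeout-seconds]
wait_for_running() {
    local name="$1"
    local timeout="${2:-60}"   # arbitrary default, not taken from this report
    local waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # State.Running is "true" once docker has restarted the container.
        if [ "$(docker inspect --format '{{.State.Running}}' "$name" 2>/dev/null)" = "true" ]; then
            echo "$name is running after ${waited}s"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$name did not come back within ${timeout}s" >&2
    return 1
}
```

Since 'docker kill' may hang on a container that is set to restart automatically (see the Doc Text), a check such as wait_for_running nova_api 120 may need to be run from a second shell.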

Comment 24 errata-xmlrpc 2018-06-27 13:50:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086