Bug 1566463 - OSP containers are not respawned after being killed with docker kill
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 13.0 (Queens)
Hardware: Unspecified Unspecified
Priority: high Severity: urgent
Target Milestone: rc
Target Release: 13.0 (Queens)
Assigned To: Alex Schultz
QA Contact: Omri Hochman
Keywords: Triaged
Depends On:
Blocks: 1433537
Reported: 2018-04-12 07:19 EDT by Eran Kuris
Modified: 2018-06-27 09:51 EDT

See Also:
Fixed In Version: openstack-tripleo-common-8.6.1-6.el7ost
Doc Type: If docs needed, set a value
Doc Text:
The 'docker kill' command does not exit if the container is set to automatically restart. If a user attempts to run 'docker kill <container>', it may hang indefinitely. In this case, CTRL+C will stop the command. To avoid the problem, use 'docker stop' (instead of 'docker kill') to stop a containerized service.
Last Closed: 2018-06-27 09:50:58 EDT
Type: Bug

Attachments
nova_api default inspect info (14.34 KB, text/plain), 2018-04-18 21:03 EDT, Alex Schultz
nova_api with --stop-signal SIGTERM inspect info (18.23 KB, text/plain), 2018-04-18 21:03 EDT, Alex Schultz
nova_api inspect before killing (83.90 KB, text/plain), 2018-04-19 11:33 EDT, Alex Schultz
nova_api inspect after killing (16.50 KB, text/plain), 2018-04-19 11:34 EDT, Alex Schultz


External Trackers
Tracker ID Priority Status Summary Last Updated
Launchpad 1765802 None None None 2018-04-20 18:05 EDT
OpenStack gerrit 563322 None master: MERGED tripleo-common: Add in a STOPSIGNAL configuration (I1939f9e6b2c432a672c7426ddabdcfca6ce150b7) 2018-05-09 10:57 EDT
OpenStack gerrit 565527 None stable/queens: MERGED tripleo-common: Add in a STOPSIGNAL configuration (I1939f9e6b2c432a672c7426ddabdcfca6ce150b7) 2018-05-09 10:59 EDT
Red Hat Product Errata RHEA-2018:2086 None None None 2018-06-27 09:51 EDT

Description Eran Kuris 2018-04-12 07:19:03 EDT
Description of problem:
When OSP containers such as ovn-controller or nova-api are killed with docker kill, docker does not respawn them.

$ docker kill ddfcef7e7692
$ docker ps | grep ovn
$ docker ps -a | grep ovn
1efca8564495        192.168.24.1:8787/rhosp13/openstack-ovn-northd:2018-04-03.3                "/bin/bash /usr/lo..."   13 minutes ago      Up 13 minutes                                    ovn-dbs-bundle-docker-0
ddfcef7e7692        192.168.24.1:8787/rhosp13/openstack-ovn-controller:2018-04-03.3            "kolla_start"            43 hours ago        Exited (137) 5 minutes ago                       ovn_controller

Version-Release number of selected component (if applicable):

13   -p 2018-04-03.3
[root@controller-0 ~]# rpm -qa |grep ovn 
openvswitch-ovn-central-2.9.0-15.el7fdp.x86_64
openvswitch-ovn-common-2.9.0-15.el7fdp.x86_64
python-networking-ovn-4.0.1-0.20180315174741.a57c70e.el7ost.noarch
openvswitch-ovn-host-2.9.0-15.el7fdp.x86_64
openstack-nova-novncproxy-17.0.2-0.20180323024604.0390d5f.el7ost.noarch
novnc-0.6.1-1.el7ost.noarch
python-networking-ovn-metadata-agent-4.0.1-0.20180315174741.a57c70e.el7ost.noarch
puppet-ovn-12.3.1-0.20180221062110.4b16f7c.el7ost.noarch



How reproducible:
100%

Steps to Reproduce:
1. Deploy an OSP 13 OVN HA setup.
2. Kill the ovn-controller container on the controller or compute node.
3. The container does not come back up.

The same happens with any container that is not managed by Pacemaker, for example Nova.
Comment 1 Assaf Muller 2018-04-18 09:41:41 EDT
This is a TripleO-wide issue, possibly by design; nevertheless, moving to DF DFG.
Comment 2 Alex Schultz 2018-04-18 16:25:07 EDT
It should restart if 'restart: always' is configured in THT. I will have to check whether it is actually launched that way. It might be an issue with docker, though.
Comment 3 Alex Schultz 2018-04-18 18:04:11 EDT
Confirmed that --restart always does not seem to take effect on any of the containers. However, I noticed that a container pulled down from docker.io (nginx) does honor --restart always.
Comment 4 Steve Baker 2018-04-18 19:12:16 EDT
Just to help diagnose what docker is doing, could you please attach the output of "docker inspect <container>" both before and after killing it?
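The before/after capture requested above could be scripted roughly as follows. This is only a sketch: the helper name capture_inspect_around_kill and the output file names are hypothetical and do not come from this report; only 'docker inspect' and 'docker kill' are real commands. As the Doc Text notes, 'docker kill' may hang on a container that is set to restart automatically, in which case CTRL+C stops it.

```shell
# Hypothetical helper: save "docker inspect" output before and after
# killing a container, then show what changed between the two snapshots.
# Usage: capture_inspect_around_kill <container> [output-dir]
capture_inspect_around_kill() {
    container="$1"
    outdir="${2:-.}"

    # Snapshot of the container state while it is still running.
    docker inspect "$container" > "$outdir/inspect_before.json"

    # Kill the container (this may hang, see the note above).
    docker kill "$container"

    # Snapshot of the container state after the kill.
    docker inspect "$container" > "$outdir/inspect_after.json"

    # diff exits non-zero when the files differ, which is expected here.
    diff -u "$outdir/inspect_before.json" "$outdir/inspect_after.json" || true
}
```

For example, 'capture_inspect_around_kill nova_api /tmp' would leave inspect_before.json and inspect_after.json in /tmp, ready to attach to the bug.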
Comment 5 Alex Schultz 2018-04-18 21:03 EDT
Created attachment 1423839
nova_api default inspect info
Comment 6 Alex Schultz 2018-04-18 21:03 EDT
Created attachment 1423840
nova_api with --stop-signal SIGTERM inspect info
Comment 7 Alex Schultz 2018-04-18 21:05:33 EDT
I reproduced this, and it appears that the stop signal (STOPSIGNAL) is not properly configured in the kolla containers. When the stop signal is set to SIGTERM, the container restarts properly, but the docker kill command hangs. That said, if you kill -9 the docker process, the container restarts just fine. I'm not sure whether the correct thing to do is to fix the kolla containers (probably) or to append --stop-signal=SIGTERM to the paunch run command. I was also able to reproduce the kill command hang with the official nginx container, so I don't think the docker kill hang is related to our containers.
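To see which restart policy and stop signal docker has recorded for a container, something like the following could be used. This is a sketch, not part of the fix: the helper name show_restart_settings is hypothetical, while the HostConfig.RestartPolicy.Name and Config.StopSignal fields are the ones 'docker inspect' reports. An empty stopsignal value is assumed to mean that no stop signal was set on the image or at run time.

```shell
# Hypothetical helper: print the restart policy and stop signal that
# docker has stored for a container, on a single line.
# Usage: show_restart_settings <container>
show_restart_settings() {
    container="$1"
    docker inspect \
        --format 'restart={{.HostConfig.RestartPolicy.Name}} stopsignal={{.Config.StopSignal}}' \
        "$container"
}
```

Running 'show_restart_settings nova_api' before and after rebuilding the containers gives a quick way to confirm whether the stop signal made it into the image.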
Comment 8 Alex Schultz 2018-04-19 11:33 EDT
Created attachment 1424191
nova_api inspect before killing
Comment 9 Alex Schultz 2018-04-19 11:34 EDT
Created attachment 1424192
nova_api inspect after killing
Comment 17 Omri Hochman 2018-05-09 10:32:27 EDT
Issue reproduced with: openstack-tripleo-common-8.6.1-7

[root@undercloud75 ~]# rpm -qa | grep openstack-tripleo-common
openstack-tripleo-common-containers-8.6.1-7.el7ost.noarch
openstack-tripleo-common-8.6.1-7.el7ost.noarch


[root@overcloud-controller-0 ~]# docker ps | grep nova_api
5c55e3dd569f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours (healthy)                       nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron
[root@overcloud-controller-0 ~]# docker kill 5c55e3dd569f
5c55e3dd569f
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f        192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24                    "kolla_start"            15 hours ago        Up 15 hours                                 nova_api_cron
Comment 18 Omri Hochman 2018-05-09 10:38:54 EDT
[root@overcloud-controller-0 ~]# docker images | grep api
192.168.0.1:8787/rhosp13/openstack-nova-api                    13.0-24             b343fcded56c        41 hours ago        875 MB

[root@overcloud-controller-0 ~]# docker inspect nova_api | grep Status
            "Status": "exited",
                "Status": "healthy",

[root@overcloud-controller-0 ~]# docker inspect nova_api | grep Running
            "Running": false,

[root@overcloud-controller-0 ~]# docker inspect nova_api | grep StopSignal
[root@overcloud-controller-0 ~]#
Comment 19 Alex Schultz 2018-05-09 10:47:11 EDT
The rpm is fine; the containers were not built with the kolla changes from the openstack-tripleo-common fixes. Moving back to MODIFIED. We'll need containers rebuilt with the tripleo-common from this BZ.
Comment 22 Artem Hrechanychenko 2018-05-14 14:54:38 EDT
VERIFIED
puddle - 2018-05-10.3
openstack-tripleo-common-8.6.1-9.el7ost.noarch

[heat-admin@controller-0 ~]$ sudo docker ps |grep nova_api
879ea02f91eb        192.168.24.1:8787/rhosp13/openstack-nova-api:2018-05-10.3                    "kolla_start"            44 minutes ago      Up 5 minutes (healthy)                        nova_api
542eb0c9125b        192.168.24.1:8787/rhosp13/openstack-nova-api:2018-05-10.3                    "kolla_start"            45 minutes ago      Up 45 minutes                                 nova_api_cron
[heat-admin@controller-0 ~]$ sudo docker kill 879ea02f91eb

[heat-admin@controller-0 ~]$ sudo docker ps |grep nova_api
879ea02f91eb        192.168.24.1:8787/rhosp13/openstack-nova-api:2018-05-10.3                    "kolla_start"            51 minutes ago      Up 6 minutes (healthy)

[heat-admin@controller-1 ~]$ sudo docker inspect nova_api | grep StopSignal
            "StopSignal": "SIGTERM"
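The manual check above (kill the container, then look for it in 'docker ps') could be automated along these lines. This is a sketch under stated assumptions: the helper name wait_for_restart, the default of 30 attempts, and the one-second polling interval are all hypothetical. Only 'docker inspect' and its State.Running field are real.

```shell
# Hypothetical helper: poll until docker reports the container as running
# again, or give up after a number of attempts.
# Usage: wait_for_restart <container> [attempts] [interval-seconds]
# Returns 0 if the container is running, 1 if it never came back.
wait_for_restart() {
    container="$1"
    tries="${2:-30}"
    interval="${3:-1}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        running=$(docker inspect --format '{{.State.Running}}' "$container" 2>/dev/null)
        if [ "$running" = "true" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}
```

After 'docker kill <container>', a call such as 'wait_for_restart nova_api 60' would succeed on a fixed image and fail on an image that still has the problem.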
Comment 24 errata-xmlrpc 2018-06-27 09:50:58 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086
