Description of problem:
When an OSP container such as ovn-controller or nova-api is killed with docker kill, Docker does not restart it.

$ docker kill ddfcef7e7692
$ docker ps | grep ovn
$ docker ps -a | grep ovn
1efca8564495 192.168.24.1:8787/rhosp13/openstack-ovn-northd:2018-04-03.3 "/bin/bash /usr/lo..." 13 minutes ago Up 13 minutes ovn-dbs-bundle-docker-0
ddfcef7e7692 192.168.24.1:8787/rhosp13/openstack-ovn-controller:2018-04-03.3 "kolla_start" 43 hours ago Exited (137) 5 minutes ago ovn_controller

Version-Release number of selected component (if applicable):
13 -p 2018-04-03.3

[root@controller-0 ~]# rpm -qa | grep ovn
openvswitch-ovn-central-2.9.0-15.el7fdp.x86_64
openvswitch-ovn-common-2.9.0-15.el7fdp.x86_64
python-networking-ovn-4.0.1-0.20180315174741.a57c70e.el7ost.noarch
openvswitch-ovn-host-2.9.0-15.el7fdp.x86_64
openstack-nova-novncproxy-17.0.2-0.20180323024604.0390d5f.el7ost.noarch
novnc-0.6.1-1.el7ost.noarch
python-networking-ovn-metadata-agent-4.0.1-0.20180315174741.a57c70e.el7ost.noarch
puppet-ovn-12.3.1-0.20180221062110.4b16f7c.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy an OSP 13 OVN HA setup.
2. Kill the ovn-controller container on a controller or compute node.
3. The container does not come back. The same happens for any container that is not managed by Pacemaker, for example Nova.

Actual results:

Expected results:

Additional info:
This is a TripleO-wide issue, possibly by design; nevertheless, moving to the DF DFG.
It should restart if restart: always is configured in THT. I will have to check whether it is actually launched that way. It might be an issue with Docker, though.
Confirmed: --restart always does not seem to take effect on any of the containers. However, I noticed that a container pulled from docker.io (nginx) does honor --restart always.
Just to help diagnose what docker is doing, could you please attach the output of "docker inspect <container>" both before and after killing it?
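A minimal way to capture the fields most relevant here, before and after the kill, is a single docker inspect --format template that prints the restart policy, the stop signal and the container state on one line. This is a sketch only: the helper name inspect_restart_fields is hypothetical (not part of TripleO or paunch), and it assumes the docker CLI on the node can reach the daemon.

```shell
# Hypothetical helper (not part of TripleO/paunch): print restart policy,
# stop signal and state of one container as "<policy>|<signal>|<status>".
# Assumes the docker CLI is on PATH and can reach the daemon.
inspect_restart_fields() {
  local container="$1"
  # .Config.StopSignal prints as an empty field when no stop signal is set,
  # so an unset value shows up as "||" between policy and status.
  docker inspect --format \
    '{{.HostConfig.RestartPolicy.Name}}|{{.Config.StopSignal}}|{{.State.Status}}' \
    "$container"
}

# Usage sketch (as root on the node):
#   inspect_restart_fields nova_api > before.txt
#   docker kill nova_api
#   sleep 10
#   inspect_restart_fields nova_api > after.txt
#   diff before.txt after.txt
```

Comparing the two lines shows at a glance whether the policy is "always", whether a stop signal is configured, and whether the container came back.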
Created attachment 1423839: nova_api default inspect info
Created attachment 1423840: nova_api with --stop-signal SIGTERM inspect info
I reproduced this, and it appears that the stop signal is not properly set in the kolla containers. When the stop signal is configured to SIGTERM, the container restarts properly, but the docker kill command hangs. That said, if you kill -9 the Docker process, it does restart the container just fine. I'm not sure whether the correct fix is to change the kolla containers (probably) or to append --stop-signal=SIGTERM to the paunch run command. I was also able to reproduce the docker kill hang with the official nginx container, so I don't think the hang is related to our containers.
Created attachment 1424191: nova_api inspect before killing
Created attachment 1424192: nova_api inspect after killing
Issue reproduced with openstack-tripleo-common-8.6.1-7:

[root@undercloud75 ~]# rpm -qa | grep openstack-tripleo-common
openstack-tripleo-common-containers-8.6.1-7.el7ost.noarch
openstack-tripleo-common-8.6.1-7.el7ost.noarch

[root@overcloud-controller-0 ~]# docker ps | grep nova_api
5c55e3dd569f 192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24 "kolla_start" 15 hours ago Up 15 hours (healthy) nova_api
b5636462b69f 192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24 "kolla_start" 15 hours ago Up 15 hours nova_api_cron
[root@overcloud-controller-0 ~]# docker kill 5c55e3dd569f
5c55e3dd569f
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f 192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24 "kolla_start" 15 hours ago Up 15 hours nova_api_cron
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f 192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24 "kolla_start" 15 hours ago Up 15 hours nova_api_cron
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f 192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24 "kolla_start" 15 hours ago Up 15 hours nova_api_cron
[root@overcloud-controller-0 ~]# docker ps | grep nova_api
b5636462b69f 192.168.0.1:8787/rhosp13/openstack-nova-api:13.0-24 "kolla_start" 15 hours ago Up 15 hours nova_api_cron
[root@overcloud-controller-0 ~]# docker images | grep api
192.168.0.1:8787/rhosp13/openstack-nova-api 13.0-24 b343fcded56c 41 hours ago 875 MB
[root@overcloud-controller-0 ~]# docker inspect nova_api | grep Status
"Status": "exited",
"Status": "healthy",
[root@overcloud-controller-0 ~]# docker inspect nova_api | grep Running
"Running": false,
[root@overcloud-controller-0 ~]# docker inspect nova_api | grep StopSignal
[root@overcloud-controller-0 ~]#
The RPM is fine; the containers were not built with the kolla changes from the openstack-tripleo-common fixes. Moving back to MODIFIED: we'll need the containers rebuilt with the tripleo-common from this BZ.
VERIFIED
puddle - 2018-05-10.3
openstack-tripleo-common-8.6.1-9.el7ost.noarch

[heat-admin@controller-0 ~]$ sudo docker ps | grep nova_api
879ea02f91eb 192.168.24.1:8787/rhosp13/openstack-nova-api:2018-05-10.3 "kolla_start" 44 minutes ago Up 5 minutes (healthy) nova_api
542eb0c9125b 192.168.24.1:8787/rhosp13/openstack-nova-api:2018-05-10.3 "kolla_start" 45 minutes ago Up 45 minutes nova_api_cron
[heat-admin@controller-0 ~]$ sudo docker kill 879ea02f91eb
[heat-admin@controller-0 ~]$ sudo docker ps | grep nova_api
879ea02f91eb 192.168.24.1:8787/rhosp13/openstack-nova-api:2018-05-10.3 "kolla_start" 51 minutes ago Up 6 minutes (healthy)
[heat-admin@controller-1 ~]$ sudo docker inspect nova_api | grep StopSignal
"StopSignal": "SIGTERM"
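The repeated manual docker ps | grep nova_api checks in the reproduction and verification above can be scripted as a simple poll on the container's .State.Running field. This is a sketch under stated assumptions: wait_until_running is a hypothetical helper name, the attempt count and interval defaults are arbitrary, and the docker CLI is assumed to be on PATH.

```shell
# Hypothetical helper: poll a container until Docker reports it running again
# (e.g. after "docker kill"). Returns 0 as soon as .State.Running is "true",
# or 1 if it is still not running after <attempts> checks.
# Arguments: <container> [attempts, default 30] [interval in seconds, default 2]
wait_until_running() {
  local container="$1"
  local attempts="${2:-30}"
  local interval="${3:-2}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(docker inspect --format '{{.State.Running}}' "$container" 2>/dev/null)" = "true" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Usage sketch:
#   docker kill nova_api
#   if wait_until_running nova_api 30 2; then echo "restarted"; else echo "not restarted"; fi
```

Polling the inspect field rather than grepping docker ps avoids matching sibling containers such as nova_api_cron.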
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086