Description of problem:

Cannot restart the neutron_ovs_agent container: container kill fails because of 'container not found', and neutron_ovs_agent containers won't restart. Running `docker restart neutron_ovs_agent` yields a container that shows up as `unhealthy`.

~~~
[root@compute-02 ~]# docker restart neutron_ovs_agent
neutron_ovs_agent
[root@compute-02 ~]# docker ps | grep neutron_ovs_agent
bcb806bbeed9  registry.access.redhat.com/rhosp13/openstack-neutron-openvswitch-agent:13.0-105  "dumb-init --singl..."  2 minutes ago  Up 2 minutes (unhealthy)  neutron_ovs_agent
~~~

Workaround: if the container is unhealthy, run:

~~~
docker rm -f neutron_ovs_agent
docker network disconnect -f host neutron_ovs_agent
# if that doesn't work, run: systemctl restart docker
~~~

Then restart the container:

~~~
docker run --name neutron_ovs_agent --detach=true \
  --env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS --net=host --pid=host \
  --ulimit=nofile=16384 --health-cmd="/openstack/healthcheck 5672" \
  --privileged=true --restart=always \
  --volume=/etc/hosts:/etc/hosts:ro \
  --volume=/etc/localtime:/etc/localtime:ro \
  --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro \
  --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro \
  --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro \
  --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro \
  --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro \
  --volume=/dev/log:/dev/log \
  --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro \
  --volume=/etc/puppet:/etc/puppet:ro \
  --volume=/var/log/containers/neutron:/var/log/neutron \
  --volume=/var/lib/kolla/config_files/neutron_ovs_agent.json:/var/lib/kolla/config_files/config.json:ro \
  --volume=/var/lib/config-data/puppet-generated/neutron/:/var/lib/kolla/config_files/src:ro \
  --volume=/var/lib/docker-config-scripts/neutron_ovs_agent_launcher.sh:/neutron_ovs_agent_launcher.sh:ro \
  --volume=/lib/modules:/lib/modules:ro \
  --volume=/run/openvswitch:/run/openvswitch \
  registry.access.redhat.com/rhosp13/openstack-neutron-openvswitch-agent:13.0-105
~~~

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
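The manual workaround above can be sketched as one small script. The function name and the DRY_RUN switch are ours; the docker/systemctl commands are the ones quoted in the description. With DRY_RUN=1 the script only prints the commands it would run:

```shell
# Sketch of the manual workaround above. The helper names and the DRY_RUN
# switch are ours; the docker/systemctl commands come from the description.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"            # dry-run mode: print instead of executing
  else
    "$@"
  fi
}

recover_ovs_agent() {
  run docker rm -f neutron_ovs_agent
  run docker network disconnect -f host neutron_ovs_agent
  # Last resort if docker's internal state is still stuck:
  run systemctl restart docker
}

# Demo: print the commands without executing them.
DRY_RUN=1
recover_ovs_agent
```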
We seem to be hitting this:

https://access.redhat.com/solutions/4307991
https://bugzilla.redhat.com/show_bug.cgi?id=1697619

~~~
Jan 17 17:19:58 . dockerd-current[135840]: time="2020-01-17T17:19:58.833169362+01:00" level=error msg="containerd: deleting container" error="exit status 1: \"container f4589ecf883aa24f6562195f5dcbc47ce9700ffe77baa173f70f24459418e950 is not exist\\none or more of the container deletions failed\\n\""
Jan 17 17:19:59 . dockerd-current[135840]: time="2020-01-17T17:19:59.190807658+01:00" level=warning msg="f4589ecf883aa24f6562195f5dcbc47ce9700ffe77baa173f70f24459418e950 cleanup: failed to unmount secrets: invalid argument"
~~~

But in our case, the container is unhealthy and does not recover after a restart.
By the way, another workaround is to:

~~~
docker rm -f neutron_ovs_agent
systemctl restart docker
docker run --name neutron_ovs_agent --detach=true \
  --env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS --net=host --pid=host \
  --ulimit=nofile=16384 --health-cmd="/openstack/healthcheck 5672" \
  --privileged=true --restart=always \
  --volume=/etc/hosts:/etc/hosts:ro \
  --volume=/etc/localtime:/etc/localtime:ro \
  --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro \
  --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro \
  --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro \
  --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro \
  --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro \
  --volume=/dev/log:/dev/log \
  --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro \
  --volume=/etc/puppet:/etc/puppet:ro \
  --volume=/var/log/containers/neutron:/var/log/neutron \
  --volume=/var/lib/kolla/config_files/neutron_ovs_agent.json:/var/lib/kolla/config_files/config.json:ro \
  --volume=/var/lib/config-data/puppet-generated/neutron/:/var/lib/kolla/config_files/src:ro \
  --volume=/var/lib/docker-config-scripts/neutron_ovs_agent_launcher.sh:/neutron_ovs_agent_launcher.sh:ro \
  --volume=/lib/modules:/lib/modules:ro \
  --volume=/run/openvswitch:/run/openvswitch \
  registry.access.redhat.com/rhosp13/openstack-neutron-openvswitch-agent:13.0-105
a875dd9a965cb75c298e9cb1d3c9e93981ae13da8871c436ccbfaebe1a4e3148
~~~

The log messages (which may or may not be a red herring) also look similar to:
https://github.com/moby/moby/issues/31768
Issue identified. This seems to be a problem with the latest version of docker (in combination with neutron_ovs_agent), and downgrading to -104 fixes it:

~~~
yum downgrade docker-1.13.1-104.git4ef4b30.el7.x86_64 \
  docker-client-1.13.1-104.git4ef4b30.el7.x86_64 \
  docker-common-1.13.1-104.git4ef4b30.el7.x86_64 \
  docker-rhel-push-plugin-1.13.1-104.git4ef4b30.el7.x86_64
~~~

~~~
[root@hostname ~]# docker restart neutron_ovs_agent
neutron_ovs_agent
[root@hostname ~]# docker ps | grep neutron_ovs_agent
e8d3e046a5e2  registry.access.redhat.com/rhosp13/openstack-neutron-openvswitch-agent:13.0-105  "dumb-init --singl..."  About an hour ago  Up About a minute (healthy)  neutron_ovs_agent
~~~

- Andreas
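After the downgrade, pinning the packages keeps a routine `yum update` from pulling the broken build back in. A sketch, assuming the versionlock plugin (yum-plugin-versionlock) is installed; the NVRs are the ones from the downgrade command above, and the helper name is ours:

```shell
# Emit the four package NVRs from the downgrade command above, then pin
# them so `yum update` does not reinstall the broken build. Assumes the
# versionlock plugin (yum-plugin-versionlock) is installed on the host.
locked_pkgs() {
  for p in docker docker-client docker-common docker-rhel-push-plugin; do
    printf '%s-1.13.1-104.git4ef4b30.el7\n' "$p"
  done
}

locked_pkgs
# On the affected host:
#   locked_pkgs | xargs yum versionlock add
```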
So this must have been introduced by one of the following:

https://access.redhat.com/downloads/content/docker/1.13.1-108.git4ef4b30.el7/x86_64/fd431d51/package-changelog

~~~
2019-12-13 Jindrich Novy <jnovy> - 2:1.13.1-107.git4ef4b30
- revert fix for #1766665 as RHEL 7 systemd does not have the CollectMode property

2019-12-13 Jindrich Novy <jnovy> - 2:1.13.1-108.git4ef4b30
- bump release to not to clash with RHEL7.8

2019-12-05 Jindrich Novy <jnovy> - 2:1.13.1-106.git4ef4b30
- fix "libcontainerd: failed to receive event from containerd:" error (#1636244)
- fix "Pods stuck in terminating state with rpc error: code = 2" (#1653292)
- fix "Docker panics when performing `docker search` due to potential Search bug when using multiple registries" (#1732626)
- fix race condition in kubelet cgroup destroy process (#1766665)

2019-11-21 Jindrich Novy <jnovy> - 2:1.13.1-105.git4ef4b30
- update runc
- Resolves: #1718441
~~~
This can be reproduced consistently on this customer environment:

- upgrading to docker minor release 108: neutron_ovs_agent cannot be restarted and stays unhealthy
- downgrading to docker minor release 104: neutron_ovs_agent can be restarted
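When verifying the reproduction, the health state can be polled after a restart instead of eyeballing `docker ps`. A sketch; the helper name, retry count, and interval are ours, while the `docker inspect --format '{{.State.Health.Status}}'` query is standard docker CLI:

```shell
# Poll a container's health state after `docker restart` until it reports
# healthy or the retries run out. Prints the last observed state.
# Helper name, retry count, and interval are our choices.
wait_healthy() {
  name="$1"
  tries="${2:-12}"      # default: 12 attempts
  interval="${3:-5}"    # default: 5 seconds between attempts
  status=""
  while [ "$tries" -gt 0 ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)
    if [ "$status" = "healthy" ]; then
      echo healthy
      return 0
    fi
    tries=$((tries - 1))
    sleep "$interval"
  done
  echo "${status:-unknown}"
  return 1
}

# Usage on the affected host:
#   docker restart neutron_ovs_agent && wait_healthy neutron_ovs_agent
```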
The same issue was identified here, although the paunch team is trying to work around it on their side:
https://bugzilla.redhat.com/show_bug.cgi?id=1790792
Check https://bugzilla.redhat.com/show_bug.cgi?id=1790792#c19 for a solution to this issue with paunch and for what seems to trigger the issue.
Taking for tracking. This will probably end up as a duplicate of bug 1790792 or bug 1795376, but keeping it open until we know which one carries the "final" fix and can confirm it solves the customer issue.
*** Bug 1797534 has been marked as a duplicate of this bug. ***
@Andreas I am closing this bug as a duplicate of 1793455, as everything should be good now with the fixed docker release; cases have been updated and a KB article is in place. Feel free to reopen if specific steps are still required!

*** This bug has been marked as a duplicate of bug 1795376 ***