Bug 1793455 - Cannot restart neutron_ovs_agent container: container kill failed because of 'container not found'
Summary: Cannot restart neutron_ovs_agent container: container kill failed because of 'container not found'
Keywords:
Status: CLOSED DUPLICATE of bug 1795376
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Bernard Cafarelli
QA Contact: Eran Kuris
URL:
Whiteboard:
Duplicates: 1797534
Depends On:
Blocks:
 
Reported: 2020-01-21 11:44 UTC by Andreas Karis
Modified: 2023-09-07 21:34 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-24 15:55:11 UTC
Target Upstream Version:
Embargoed:




Links
System ID                                    Last Updated
Launchpad 1861694                            2020-02-06 14:21:06 UTC
Red Hat Issue Tracker OSP-28320              2023-09-07 21:34:17 UTC
Red Hat Knowledge Base (Solution) 4759761    2020-01-21 13:17:15 UTC

Description Andreas Karis 2020-01-21 11:44:59 UTC
Description of problem:
Cannot restart neutron_ovs_agent container: container kill failed because of 'container not found' 

The neutron_ovs_agent container does not restart cleanly. Running `docker restart neutron_ovs_agent` brings the container back up, but it shows up as `unhealthy`.

~~~
[root@compute-02 ~]# docker restart neutron_ovs_agent
neutron_ovs_agent
[root@compute-02 ~]# docker ps | grep neutron_ovs_agent
bcb806bbeed9        registry.access.redhat.com/rhosp13/openstack-neutron-openvswitch-agent:13.0-105   "dumb-init --singl..."   2 minutes ago       Up 2 minutes (unhealthy)                       neutron_ovs_agent
~~~
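
To see why the restarted container is flagged unhealthy, the health check state and its recent output can be read straight from docker (standard `docker inspect` usage; the actual output depends on the /openstack/healthcheck command the container runs):
~~~
# current health state (starting / healthy / unhealthy)
docker inspect --format '{{.State.Health.Status}}' neutron_ovs_agent
# last few health check runs, with exit codes and output
docker inspect --format '{{json .State.Health.Log}}' neutron_ovs_agent | python -m json.tool
~~~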

Workaround:

If the container is unhealthy, run:
~~~
docker rm -f neutron_ovs_agent
docker network disconnect -f host neutron_ovs_agent
# if that doesn't work, run systemctl restart docker
~~~

Then, restart the container:
~~~
docker run --name neutron_ovs_agent --detach=true --env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS --net=host --pid=host --ulimit=nofile=16384 --health-cmd="/openstack/healthcheck 5672" --privileged=true --restart=always --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/var/log/containers/neutron:/var/log/neutron --volume=/var/lib/kolla/config_files/neutron_ovs_agent.json:/var/lib/kolla/config_files/config.json:ro --volume=/var/lib/config-data/puppet-generated/neutron/:/var/lib/kolla/config_files/src:ro --volume=/var/lib/docker-config-scripts/neutron_ovs_agent_launcher.sh:/neutron_ovs_agent_launcher.sh:ro --volume=/lib/modules:/lib/modules:ro --volume=/run/openvswitch:/run/openvswitch registry.access.redhat.com/rhosp13/openstack-neutron-openvswitch-agent:13.0-105
~~~
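
To confirm the re-created container actually comes back healthy and that the agent checks back in with neutron, something like the following works (the health check needs a minute or two to settle; the agent host name is an assumption based on the prompt above):
~~~
docker ps --filter name=neutron_ovs_agent
docker inspect --format '{{.State.Health.Status}}' neutron_ovs_agent

# from a node with the overcloud credentials sourced; adjust the host name
openstack network agent list --host compute-02.localdomain
~~~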

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 6 Andreas Karis 2020-01-21 12:12:30 UTC
We seem to get this:
https://access.redhat.com/solutions/4307991
https://bugzilla.redhat.com/show_bug.cgi?id=1697619

~~~
Jan 17 17:19:58 . dockerd-current[135840]: time="2020-01-17T17:19:58.833169362+01:00" level=error msg="containerd: deleting container" error="exit status 1: \"container f4589ecf883aa24f6562195f5dcbc47ce9700ffe77baa173f70f24459418e950 is not exist\\none or more of the container deletions failed\\n\""
Jan 17 17:19:59 . dockerd-current[135840]: time="2020-01-17T17:19:59.190807658+01:00" level=warning msg="f4589ecf883aa24f6562195f5dcbc47ce9700ffe77baa173f70f24459418e950 cleanup: failed to unmount secrets: invalid argument"
~~~

But in our case, the container is unhealthy and does not respond after the restart.
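
To check whether a node is hitting the same containerd deletion error, grepping the docker unit journal for those messages is enough (plain journalctl, assuming docker logs to journald as in the snippet above):
~~~
journalctl -u docker --since "1 hour ago" | grep -E 'deleting container|failed to unmount secrets'
~~~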

Comment 7 Andreas Karis 2020-01-21 12:18:30 UTC
By the way, another workaround is to:
~~~
docker rm -f neutron_ovs_agent
systemctl restart docker
docker run --name neutron_ovs_agent --detach=true --env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS --net=host --pid=host --ulimit=nofile=16384 --health-cmd="/openstack/healthcheck 5672" --privileged=true --restart=always --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/var/log/containers/neutron:/var/log/neutron --volume=/var/lib/kolla/config_files/neutron_ovs_agent.json:/var/lib/kolla/config_files/config.json:ro --volume=/var/lib/config-data/puppet-generated/neutron/:/var/lib/kolla/config_files/src:ro --volume=/var/lib/docker-config-scripts/neutron_ovs_agent_launcher.sh:/neutron_ovs_agent_launcher.sh:ro --volume=/lib/modules:/lib/modules:ro --volume=/run/openvswitch:/run/openvswitch registry.access.redhat.com/rhosp13/openstack-neutron-openvswitch-agent:13.0-105
a875dd9a965cb75c298e9cb1d3c9e93981ae13da8871c436ccbfaebe1a4e3148
~~~

The log messages (which may or may not be a red herring) also look similar to: https://github.com/moby/moby/issues/31768

Comment 11 Andreas Karis 2020-01-21 13:01:42 UTC
Issue identified. This appears to be a problem with the latest docker build (in combination with neutron_ovs_agent); downgrading to release 104 fixes it:
~~~
yum downgrade docker-1.13.1-104.git4ef4b30.el7.x86_64 docker-client-1.13.1-104.git4ef4b30.el7.x86_64 docker-common-1.13.1-104.git4ef4b30.el7.x86_64 docker-rhel-push-plugin-1.13.1-104.git4ef4b30.el7.x86_64
~~~

~~~
[root@hostname ~]# docker restart neutron_ovs_agent
neutron_ovs_agent
[root@hostname ~]# docker ps | grep neutron_ovs_agent
e8d3e046a5e2        registry.access.redhat.com/rhosp13/openstack-neutron-openvswitch-agent:13.0-105   "dumb-init --singl..."   About an hour ago   Up About a minute (healthy)                       neutron_ovs_agent
~~~
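
If the downgrade needs to stick until a fixed docker build ships, version-locking the four packages is one option (just a sketch, assuming yum-plugin-versionlock is available; this was not required on the case):
~~~
yum install -y yum-plugin-versionlock
yum versionlock add docker docker-client docker-common docker-rhel-push-plugin
yum versionlock list     # confirm the locked versions
# yum versionlock clear  # drop the locks once a fixed build is out
~~~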

- Andreas

Comment 12 Andreas Karis 2020-01-21 13:03:29 UTC
So this must have been introduced by one of the following: 

https://access.redhat.com/downloads/content/docker/1.13.1-108.git4ef4b30.el7/x86_64/fd431d51/package-changelog
~~~
2019-12-13 Jindrich Novy <jnovy> - 2:1.13.1-107.git4ef4b30

    - revert fix for #1766665 as RHEL 7 systemd does not have the CollectMode
    property
2019-12-13 Jindrich Novy <jnovy> - 2:1.13.1-108.git4ef4b30

    - bump release to not to clash with RHEL7.8
2019-12-05 Jindrich Novy <jnovy> - 2:1.13.1-106.git4ef4b30

    - fix "libcontainerd: failed to receive event from containerd:" error (#1636244)
    - fix "Pods stuck in terminating state with rpc error: code = 2" (#1653292)
    - fix "Docker panics when performing `docker search` due to potential
    Search bug when using multiple registries" (#1732626)
    - fix race condition in kubelet cgroup destroy process (#1766665)
2019-11-21 Jindrich Novy <jnovy> - 2:1.13.1-105.git4ef4b30

    - update runc
    - Resolves: #1718441
~~~
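
To check which build a node is actually running and read the same changelog locally (plain rpm queries, nothing case-specific):
~~~
rpm -q docker docker-client docker-common docker-rhel-push-plugin
rpm -q --changelog docker | head -40
~~~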

Comment 14 Andreas Karis 2020-01-21 13:11:21 UTC
This can be reproduced consistently in this customer environment:

- upgrading to docker release 108: neutron_ovs_agent cannot be restarted and stays unhealthy
- downgrading to docker release 104: neutron_ovs_agent can be restarted again
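
For anyone else trying to reproduce, the per-node check boils down to something like this (a sketch; the health check needs a moment to run after the restart):
~~~
rpm -q docker                     # confirm which build is installed
docker restart neutron_ovs_agent
sleep 120
docker inspect --format '{{.State.Health.Status}}' neutron_ovs_agent
~~~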

Comment 18 Andreas Karis 2020-01-28 11:18:09 UTC
The same issue was identified in https://bugzilla.redhat.com/show_bug.cgi?id=1790792, although the paunch team is trying to work around it on their side.

Comment 19 Andreas Karis 2020-01-28 15:41:55 UTC
See https://bugzilla.redhat.com/show_bug.cgi?id=1790792#c19 for a paunch-side solution to this issue and for what appears to trigger it.

Comment 20 Bernard Cafarelli 2020-01-30 09:17:54 UTC
Taking this for tracking. It will probably end up as a duplicate of bug 1790792 or bug 1795376, but I am keeping it open until we know which one carries the "final" fix and we can confirm that it solves the customer issue.

Comment 21 Mike Burns 2020-02-04 12:41:04 UTC
*** Bug 1797534 has been marked as a duplicate of this bug. ***

Comment 22 Bernard Cafarelli 2020-02-24 15:55:11 UTC
@Andreas I am closing this bug as a duplicate of bug 1795376, as everything should be good now with the fixed docker release, the updated cases, and the KB article. Feel free to reopen if specific steps are still required!

*** This bug has been marked as a duplicate of bug 1795376 ***

