Description of problem:
After a compute node is powered off and then powered on, the neutron-haproxy-ovnmeta containers are left in "Created" status, which causes the metadata service to be unavailable.

Version-Release number of selected component (if applicable):
rhosp16-openstack-neutron-metadata-agent-ovn:20200124.1
puppet-ovn-15.4.1-0.20191014133046.192ac4e.el8ost.noarch
RHOS_TRUNK-16.0-RHEL-8-20200124.n.1

How reproducible:
Intermittent; the issue is not hit on every reboot (see the verification comment below).

Steps to Reproduce:
1. Power off a compute node that hosts VMs, then power it back on.

Actual results:
The neutron-haproxy-ovnmeta containers on the rebooted compute stay in "Created" status and the metadata service is unavailable.

Expected results:
The neutron-haproxy-ovnmeta containers are respawned and the metadata service is available.

Additional info:
The ovn-metadata-agent traceback from the affected compute is included below.
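The symptom can be checked directly on the affected compute node. This is an illustrative command, not from the original report; the name filter matches the proxy container naming visible in the logs below:

    # Stuck proxies show a Status of "Created" instead of "Up ..."
    sudo podman ps -a --filter name=neutron-haproxy-ovnmeta --format '{{.ID}}:{{.Names}}:{{.Status}}'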
+ echo 'Starting a new child container neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c'
+ nsenter --net=/run/netns/ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:z,shared -v /dev/log:/dev/log --net host --pid host --privileged -u root --name neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:20200124.1 /bin/bash -c 'HAPROXY="$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)"; exec $HAPROXY -f /var/lib/neutron/ovn-metadata-proxy/f1af5172-627c-4e51-b1fd-5f6524e2876c.conf'
Error: error creating container storage: the container name "neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c" is already in use by "d2eaaa321e37a377e6c550204fb7823204f4438b55822b618510323b4f8f726f". You have to remove that container to be able to reuse that name.: that name is already in use

2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event Traceback (most recent call last):
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/ovsdbapp/event.py", line 143, in notify_loop
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event     match.run(event, row, updates)
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 93, in run
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event     self.agent.update_datapath(str(row.datapath.uuid))
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 303, in update_datapath
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event     self.provision_datapath(datapath)
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 417, in provision_datapath
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event     self.conf, bind_address=METADATA_DEFAULT_IP, network_id=datapath)
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/driver.py", line 200, in spawn_monitored_metadata_proxy
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event     pm.enable()
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/external_process.py", line 90, in enable
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event     run_as_root=self.run_as_root)
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/ip_lib.py", line 713, in execute
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event     run_as_root=run_as_root)
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py", line 147, in execute
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event     returncode=returncode)
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event neutron_lib.exceptions.ProcessExecutionError: Exit code: 125; Stdin: ; Stdout: Starting a new child container neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event ; Stderr: + export DOCKER_HOST=
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + DOCKER_HOST=
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + ARGS='-f /var/lib/neutron/ovn-metadata-proxy/f1af5172-627c-4e51-b1fd-5f6524e2876c.conf'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event ++ ip netns identify
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + NETNS=ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + NAME=neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + HAPROXY_CMD='$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + CLI='nsenter --net=/run/netns/ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c --preserve-credentials -m -t 1 podman'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + LOGGING='--log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c.log'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + CMD='$HAPROXY'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event ++ nsenter --net=/run/netns/ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c --preserve-credentials -m -t 1 podman ps -a --filter name=neutron-haproxy- --format '{{.ID}}:{{.Names}}:{{.Status}}'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event ++ awk '{print $1}'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + LIST='d2eaaa321e37:neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c:Created
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event c3732bcd836a:neutron-haproxy-ovnmeta-8bff1973-cfb0-4422-b61e-1ee4d24d6398:Created'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event ++ printf '%s\n' 'd2eaaa321e37:neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c:Created
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event c3732bcd836a:neutron-haproxy-ovnmeta-8bff1973-cfb0-4422-b61e-1ee4d24d6398:Created'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event ++ grep :Exited
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + ORPHANTS=
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + '[' -n '' ']'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + printf '%s\n' 'd2eaaa321e37:neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c:Created
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event c3732bcd836a:neutron-haproxy-ovnmeta-8bff1973-cfb0-4422-b61e-1ee4d24d6398:Created'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + grep -q 'neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c$'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + echo 'Starting a new child container neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + nsenter --net=/run/netns/ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:z,shared -v /dev/log:/dev/log --net host --pid host --privileged -u root --name neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:20200124.1 /bin/bash -c 'HAPROXY="$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)"; exec $HAPROXY -f /var/lib/neutron/ovn-metadata-proxy/f1af5172-627c-4e51-b1fd-5f6524e2876c.conf'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event Error: error creating container storage: the container name "neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c" is already in use by "d2eaaa321e37a377e6c550204fb7823204f4438b55822b618510323b4f8f726f". You have to remove that container to be able to reuse that name.: that name is already in use
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event
The wrapper script doesn't clean up containers in "Created" status, only "Exited" ones:

    ORPHANTS=$(printf "%s\n" "${LIST}" | grep ":Exited")

We can either grep for both :Exited and :Created, or avoid spawning a new container when one already exists in "Created" status and instead start the existing one. A minimal sketch of the first option follows.
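The sketch below assumes the wrapper's variable names (LIST, ORPHANTS, CLI) visible in the trace above; it is an illustration of the proposed change, not the actual merged patch:

    # Treat containers in both Exited and Created status as orphans to remove,
    # so a fresh container with the same name can be created afterwards.
    ORPHANTS=$(printf "%s\n" "${LIST}" | grep -E ":(Exited|Created)")
    if [ -n "${ORPHANTS}" ]; then
        for orphant in $(printf "%s\n" "${ORPHANTS}" | awk -F':' '{print $1}'); do
            echo "Removing orphaned container ${orphant}"
            ${CLI} rm -f "${orphant}"
        done
    fi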
The impact is that the affected compute node won't be able to serve metadata for already existing networks, meaning new instances won't get their SSH keys. I'm setting the blocker flag to raise awareness.
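To illustrate the impact (a hedged example using the standard OpenStack metadata endpoint, not taken from this report): the in-guest request that cloud-init uses to fetch instance metadata, including the SSH key, fails while the proxy is stuck.

    # Run inside a guest on the affected network; normally returns JSON with
    # the instance metadata, but times out while the haproxy container sits
    # in "Created" status on the compute node.
    curl http://169.254.169.254/openstack/latest/meta_data.json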
It looks like this is a regression between OSP 14 and OSP 15.
It comes from a bug in podman (https://github.com/containers/libpod/issues/1703), fixed upstream a year ago: https://github.com/containers/libpod/pull/2273
The setup is OSP 16 with podman 1.6.4. In my case it's a hybrid setup (controllers and undercloud run as VMs on one bare-metal host, and the compute is bare metal).
(In reply to Jakub Libosvar from comment #5)
> It comes from a bug in podman (https://github.com/containers/libpod/issues/1703), fixed upstream a year ago: https://github.com/containers/libpod/pull/2273

If this is the case, can we consider it not a blocker? According to Edu's and Roman's testing on an OVN virt deployment without shiftonstack, the issue does not reproduce.
(In reply to Eran Kuris from comment #7)
> (In reply to Jakub Libosvar from comment #5)
> > It comes from a bug in podman (https://github.com/containers/libpod/issues/1703), fixed upstream a year ago: https://github.com/containers/libpod/pull/2273
>
> If this is the case, can we consider it not a blocker?
> According to Edu's and Roman's testing on an OVN virt deployment without shiftonstack, the issue does not reproduce.

I was able to reproduce it on Edu's virtual environment without shiftonstack. I'd keep the blocker flag for now, until we know more about the impact. Also, I checked that upstream podman 1.6.0 contains the fix from comment 5, but the "Created" status still happens even with podman-1.6.4-2.module+el8.1.1+5363+bf8ff1af.x86_64.
I have a fix for this; I'll send it upstream soon.
Merged to Train
Verified on:
puddle: RHOS_TRUNK-16.0-RHEL-8-20200213.n.1
rpm: puppet-tripleo-11.4.1-0.20200205150840.71ff36d.el8ost

1- create network, subnet and VM instances
2- check VMs running on compute nodes
3- check neutron-haproxy-ovnmeta container running on computes, too
4- hard reboot one compute node: virsh destroy + virsh start
5- start VMs from the rebooted compute node (openstack server start <vm-name>)
6- check neutron-haproxy-ovnmeta correctly started, no error logs at ovn-metadata-agent.log and VMs connectivity is correct

I repeated this procedure several times successfully, since the original issue was not always reproduced.
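The same flow expressed as commands (a hedged sketch: resource names like net1/vm1, the compute libvirt domain name, and the agent log path are illustrative assumptions, not taken from the report):

    # 1-3: create resources and confirm where the VM landed
    openstack network create net1
    openstack subnet create --network net1 --subnet-range 192.168.100.0/24 subnet1
    openstack server create --image cirros --flavor m1.small --network net1 vm1
    openstack server show vm1 -c OS-EXT-SRV-ATTR:host

    # 4: hard reboot the compute node (run on the hypervisor hosting it)
    virsh destroy compute-0 && virsh start compute-0

    # 5: restart the VM once the compute is back up
    openstack server start vm1

    # 6: on the compute, confirm the proxy is Up and the agent log is clean
    sudo podman ps --filter name=neutron-haproxy-ovnmeta --format '{{.Names}}: {{.Status}}'
    sudo grep ERROR /var/log/containers/neutron/ovn-metadata-agent.log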
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0655