Bug 1797892 - neutron-haproxy-ovnmeta containers are not up after compute node restarted
Summary: neutron-haproxy-ovnmeta containers are not up after compute node restarted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: z1
Target Release: 16.0 (Train on RHEL 8.1)
Assignee: Jakub Libosvar
QA Contact: Eduardo Olivares
URL:
Whiteboard:
Depends On:
Blocks: 1843821
 
Reported: 2020-02-04 07:36 UTC by Itzik Brown
Modified: 2020-07-15 11:10 UTC
CC List: 18 users

Fixed In Version: puppet-tripleo-11.4.1-0.20200205150840.71ff36d.el8ost
Doc Type: Known Issue
Doc Text:
There is a known issue in Red Hat OpenStack Platform 16.0: when a node experiences a hard shutdown, containers that were previously running are left in the `Created` state in podman after the node reboots. As a workaround, you can run the following Ansible command to remove all haproxy containers that are in the `Created` state:

`ansible -b <nodes> -i /usr/bin/tripleo-ansible-inventory -m shell -a "podman ps -a --format {{.ID}} -f name=haproxy,status=created | xargs podman rm -f || :"`

Replace `<nodes>` with a single host from the inventory, a group of hosts, or `all`. After you run this command, the metadata-agent spawns a new container for the given network.
Clone Of:
: 1843821 (view as bug list)
Environment:
Last Closed: 2020-03-03 09:45:05 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1862010 0 None None None 2020-02-05 12:33:00 UTC
OpenStack gerrit 705777 0 None MERGED Remove side-car containers in Create status 2020-09-20 03:34:09 UTC
Red Hat Knowledge Base (Solution) 5224571 0 None None None 2020-07-15 11:10:57 UTC
Red Hat Product Errata RHBA-2020:0655 0 None None None 2020-03-03 09:45:28 UTC

Description Itzik Brown 2020-02-04 07:36:24 UTC
Description of problem:
After a compute node is powered off and then powered on, the neutron-haproxy-ovnmeta containers are left in the Created status, which causes the metadata service to be unavailable.

Version-Release number of selected component (if applicable):
rhosp16-openstack-neutron-metadata-agent-ovn:20200124.1
puppet-ovn-15.4.1-0.20191014133046.192ac4e.el8ost.noarch
RHOS_TRUNK-16.0-RHEL-8-20200124.n.1

How reproducible:


Steps to Reproduce:
1. Power off and then power on a compute node, as described above.

Actual results:


Expected results:


Additional info:
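A quick way to confirm the symptom on an affected compute node is to list the proxy containers directly with podman (an illustrative check only; the name filter is taken from the container names in the logs below):

  podman ps -a --filter name=neutron-haproxy-ovnmeta --format '{{.Names}} {{.Status}}'

Containers reported as "Created" rather than "Up ..." indicate the stuck state described here.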

Comment 1 Jakub Libosvar 2020-02-04 09:46:24 UTC

+ echo 'Starting a new child container neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c'
+ nsenter --net=/run/netns/ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:z,shared -v /dev/log:/dev/log --net host --pid host --privileged -u root --name neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:20200124.1 /bin/bash -c 'HAPROXY="$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)"; exec $HAPROXY -f /var/lib/neutron/ovn-metadata-proxy/f1af5172-627c-4e51-b1fd-5f6524e2876c.conf'
Error: error creating container storage: the container name "neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c" is already in use by "d2eaaa321e37a377e6c550204fb7823204f4438b55822b618510323b4f8f726f". You have to remove that container to be able to reuse that name.: that name is already in use
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event Traceback (most recent call last):
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/ovsdbapp/event.py", line 143, in notify_loop
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event     match.run(event, row, updates)
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 93, in run
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event     self.agent.update_datapath(str(row.datapath.uuid))
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 303, in update_datapath
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event     self.provision_datapath(datapath)
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 417, in provision_datapath
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event     self.conf, bind_address=METADATA_DEFAULT_IP, network_id=datapath)
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/driver.py", line 200, in spawn_monitored_metadata_proxy
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event     pm.enable()
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/external_process.py", line 90, in enable
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event     run_as_root=self.run_as_root)
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/ip_lib.py", line 713, in execute
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event     run_as_root=run_as_root)
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py", line 147, in execute
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event     returncode=returncode)
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event neutron_lib.exceptions.ProcessExecutionError: Exit code: 125; Stdin: ; Stdout: Starting a new child container neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event ; Stderr: + export DOCKER_HOST=
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + DOCKER_HOST=
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + ARGS='-f /var/lib/neutron/ovn-metadata-proxy/f1af5172-627c-4e51-b1fd-5f6524e2876c.conf'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event ++ ip netns identify
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + NETNS=ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + NAME=neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + HAPROXY_CMD='$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + CLI='nsenter --net=/run/netns/ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c --preserve-credentials -m -t 1 podman'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + LOGGING='--log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c.log'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + CMD='$HAPROXY'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event ++ nsenter --net=/run/netns/ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c --preserve-credentials -m -t 1 podman ps -a --filter name=neutron-haproxy- --format '{{.ID}}:{{.Names}}:{{.Status}}'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event ++ awk '{print $1}'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + LIST='d2eaaa321e37:neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c:Created
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event c3732bcd836a:neutron-haproxy-ovnmeta-8bff1973-cfb0-4422-b61e-1ee4d24d6398:Created'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event ++ printf '%s\n' 'd2eaaa321e37:neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c:Created
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event c3732bcd836a:neutron-haproxy-ovnmeta-8bff1973-cfb0-4422-b61e-1ee4d24d6398:Created'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event ++ grep :Exited
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + ORPHANTS=
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + '[' -n '' ']'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + printf '%s\n' 'd2eaaa321e37:neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c:Created
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event c3732bcd836a:neutron-haproxy-ovnmeta-8bff1973-cfb0-4422-b61e-1ee4d24d6398:Created'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + grep -q 'neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c$'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + echo 'Starting a new child container neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + nsenter --net=/run/netns/ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:z,shared -v /dev/log:/dev/log --net host --pid host --privileged -u root --name neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:20200124.1 /bin/bash -c 'HAPROXY="$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)"; exec $HAPROXY -f /var/lib/neutron/ovn-metadata-proxy/f1af5172-627c-4e51-b1fd-5f6524e2876c.conf'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event Error: error creating container storage: the container name "neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c" is already in use by "d2eaaa321e37a377e6c550204fb7823204f4438b55822b618510323b4f8f726f". You have to remove that container to be able to reuse that name.: that name is already in use
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event

Comment 2 Jakub Libosvar 2020-02-04 10:39:02 UTC
The wrapper script doesn't clean up containers in "Created" status, only "Exited":

ORPHANTS=$(printf "%s\n" "${LIST}" | grep ":Exited")

We can either grep for both :Exited and :Created, or, when a container already exists in the Created status, skip spawning a new one and start the existing container instead.
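
A minimal sketch of the first option, assuming the wrapper keeps the same ID:Name:Status list format and the $LIST/$CLI variables shown in the trace above (illustrative only, not the final patch):

  # Remove containers left in either Exited or Created state so the
  # container name can be reused after a hard reboot.
  ORPHANTS=$(printf "%s\n" "${LIST}" | grep -E ":(Exited|Created)")
  if [ -n "${ORPHANTS}" ]; then
      for orphant in $(printf "%s\n" "${ORPHANTS}" | awk -F':' '{print $1}'); do
          echo "Removing orphaned container ${orphant}"
          ${CLI} rm -f "${orphant}" || true
      done
  fi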

Comment 3 Jakub Libosvar 2020-02-04 10:41:38 UTC
The impact is that an affected compute node won't be able to provide metadata for already existing networks, meaning new instances won't get their SSH keys. I'm setting the blocker flag to raise awareness.

Comment 4 Jakub Libosvar 2020-02-04 11:04:20 UTC
It looks like this is a regression between OSP 14 and OSP 15

Comment 5 Jakub Libosvar 2020-02-04 11:10:24 UTC
It comes from a bug in podman: https://github.com/containers/libpod/issues/1703 fixed upstream a year ago: https://github.com/containers/libpod/pull/2273

Comment 6 Itzik Brown 2020-02-04 11:29:48 UTC
The setup is OSP16, podman 1.6.4.
In my case it's a hybrid setup (the controllers and the undercloud run as VMs on one bare metal host, and the compute node is bare metal).

Comment 7 Eran Kuris 2020-02-04 11:31:49 UTC
(In reply to Jakub Libosvar from comment #5)
> It comes from a bug in podman:
> https://github.com/containers/libpod/issues/1703 fixed upstream a year ago:
> https://github.com/containers/libpod/pull/2273

So if this is the case, can we consider it not a blocker?
According to Edu & Roman's testing on an OVN virt deployment without OpenShift, the issue does not reproduce.

Comment 8 Jakub Libosvar 2020-02-04 11:50:42 UTC
(In reply to Eran Kuris from comment #7)
> (In reply to Jakub Libosvar from comment #5)
> > It comes from a bug in podman:
> > https://github.com/containers/libpod/issues/1703 fixed upstream a year ago:
> > https://github.com/containers/libpod/pull/2273
> 
> So if this is the case, can we consider it not a blocker?
> According to Edu & Roman's testing on an OVN virt deployment without
> OpenShift, the issue does not reproduce.

I was able to reproduce it on Edu's virtual environment without shiftonstack. I'd keep the blocker flag for now until we know more about the impact. Also, I checked that upstream podman 1.6.0 contains the fix from comment 5, but the Created status still happens even with podman-1.6.4-2.module+el8.1.1+5363+bf8ff1af.x86_64.

Comment 9 Jakub Libosvar 2020-02-04 12:08:54 UTC
I have a fix for this, I'll send it upstream soon.

Comment 14 Jakub Libosvar 2020-02-05 12:33:01 UTC
Merged to Train

Comment 18 Eduardo Olivares 2020-02-18 09:40:37 UTC
Verified on:
puddle: RHOS_TRUNK-16.0-RHEL-8-20200213.n.1
rpm: puppet-tripleo-11.4.1-0.20200205150840.71ff36d.el8ost

1. Create a network, a subnet, and VM instances.
2. Check the VMs are running on the compute nodes.
3. Check the neutron-haproxy-ovnmeta containers are running on the computes, too.
4. Hard reboot one compute node: virsh destroy + virsh start.
5. Start the VMs from the rebooted compute node (openstack server start <vm-name>).
6. Check that neutron-haproxy-ovnmeta started correctly, there are no error logs in ovn-metadata-agent.log, and VM connectivity is correct.

I repeated this procedure several times successfully, since the original issue was not always reproduced.
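
For reference, a rough shell rendering of steps 4-6 (the compute domain name "compute-0" and the log path are assumptions; adjust to the actual environment):

  # On the hypervisor hosting the compute node: hard reboot it
  virsh destroy compute-0
  virsh start compute-0

  # From a host with overcloud credentials: restart the affected instance
  openstack server start <vm-name>

  # On the rebooted compute node: verify the haproxy side-car is Up again
  # and the metadata agent log shows no new errors
  podman ps --filter name=neutron-haproxy-ovnmeta
  grep ERROR /var/log/containers/neutron/ovn-metadata-agent.log | tail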

Comment 20 errata-xmlrpc 2020-03-03 09:45:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0655

