Bug 1797892
| Summary: | neutron-haproxy-ovnmeta containers are not up after compute node restarted | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Itzik Brown <itbrown> |
| Component: | puppet-tripleo | Assignee: | Jakub Libosvar <jlibosva> |
| Status: | CLOSED ERRATA | QA Contact: | Eduardo Olivares <eolivare> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 16.0 (Train) | CC: | akaris, amcleod, amuller, apevec, bcafarel, bperkins, gregraka, jjoyce, jlibosva, jschluet, lhh, lpeer, majopela, njohnston, pgrist, scohen, slinaber, tvignaud |
| Target Milestone: | z1 | Keywords: | Triaged |
| Target Release: | 16.0 (Train on RHEL 8.1) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | puppet-tripleo-11.4.1-0.20200205150840.71ff36d.el8ost | Doc Type: | Known Issue |
| Story Points: | --- | | |
| Clone Of: | | | |
| | 1843821 (view as bug list) | Environment: | |
| Last Closed: | 2020-03-03 09:45:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1843821 | | |

Doc Text:

There is a known issue in Red Hat OpenStack Platform 16.0: when a node experiences a hard shutdown, containers that were previously running on it are left in the `Created` state in Podman after the node reboots.

As a workaround, you can run the following Ansible command to clean up all haproxy containers in the `Created` state:

`ansible -b <nodes> -i /usr/bin/tripleo-ansible-inventory -m shell -a "podman ps -a --format {{'{{'}}.ID{{'}}'}} -f name=haproxy,status=created | xargs podman rm -f || :"`

Replace `<nodes>` with a single host from the inventory, a group of hosts, or `all`. After you run this command, the metadata agent spawns a new container for the given network.
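
For reference, on a single affected node the Ansible task above amounts to the following Podman pipeline (a minimal sketch; the two separate `-f` filters are an equivalent spelling of the combined filter in the command above):

```
# Remove haproxy containers stuck in the "Created" state on this node.
# Run as root on the affected compute node; does nothing if no container matches.
podman ps -a --format '{{.ID}}' -f name=haproxy -f status=created \
  | xargs --no-run-if-empty podman rm -f
```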
Description
Itzik Brown
2020-02-04 07:36:24 UTC
+ echo 'Starting a new child container neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c'
+ nsenter --net=/run/netns/ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:z,shared -v /dev/log:/dev/log --net host --pid host --privileged -u root --name neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:20200124.1 /bin/bash -c 'HAPROXY="$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)"; exec $HAPROXY -f /var/lib/neutron/ovn-metadata-proxy/f1af5172-627c-4e51-b1fd-5f6524e2876c.conf'
Error: error creating container storage: the container name "neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c" is already in use by "d2eaaa321e37a377e6c550204fb7823204f4438b55822b618510323b4f8f726f". You have to remove that container to be able to reuse that name.: that name is already in use
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event Traceback (most recent call last):
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event File "/usr/lib/python3.6/site-packages/ovsdbapp/event.py", line 143, in notify_loop
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event match.run(event, row, updates)
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 93, in run
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event self.agent.update_datapath(str(row.datapath.uuid))
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 303, in update_datapath
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event self.provision_datapath(datapath)
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 417, in provision_datapath
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event self.conf, bind_address=METADATA_DEFAULT_IP, network_id=datapath)
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/driver.py", line 200, in spawn_monitored_metadata_proxy
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event pm.enable()
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event File "/usr/lib/python3.6/site-packages/neutron/agent/linux/external_process.py", line 90, in enable
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event run_as_root=self.run_as_root)
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event File "/usr/lib/python3.6/site-packages/neutron/agent/linux/ip_lib.py", line 713, in execute
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event run_as_root=run_as_root)
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event File "/usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py", line 147, in execute
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event returncode=returncode)
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event neutron_lib.exceptions.ProcessExecutionError: Exit code: 125; Stdin: ; Stdout: Starting a new child container neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event ; Stderr: + export DOCKER_HOST=
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + DOCKER_HOST=
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + ARGS='-f /var/lib/neutron/ovn-metadata-proxy/f1af5172-627c-4e51-b1fd-5f6524e2876c.conf'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event ++ ip netns identify
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + NETNS=ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + NAME=neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + HAPROXY_CMD='$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + CLI='nsenter --net=/run/netns/ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c --preserve-credentials -m -t 1 podman'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + LOGGING='--log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c.log'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + CMD='$HAPROXY'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event ++ nsenter --net=/run/netns/ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c --preserve-credentials -m -t 1 podman ps -a --filter name=neutron-haproxy- --format '{{.ID}}:{{.Names}}:{{.Status}}'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event ++ awk '{print $1}'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + LIST='d2eaaa321e37:neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c:Created
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event c3732bcd836a:neutron-haproxy-ovnmeta-8bff1973-cfb0-4422-b61e-1ee4d24d6398:Created'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event ++ printf '%s\n' 'd2eaaa321e37:neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c:Created
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event c3732bcd836a:neutron-haproxy-ovnmeta-8bff1973-cfb0-4422-b61e-1ee4d24d6398:Created'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event ++ grep :Exited
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + ORPHANTS=
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + '[' -n '' ']'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + printf '%s\n' 'd2eaaa321e37:neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c:Created
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event c3732bcd836a:neutron-haproxy-ovnmeta-8bff1973-cfb0-4422-b61e-1ee4d24d6398:Created'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + grep -q 'neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c$'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + echo 'Starting a new child container neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event + nsenter --net=/run/netns/ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:z,shared -v /dev/log:/dev/log --net host --pid host --privileged -u root --name neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:20200124.1 /bin/bash -c 'HAPROXY="$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)"; exec $HAPROXY -f /var/lib/neutron/ovn-metadata-proxy/f1af5172-627c-4e51-b1fd-5f6524e2876c.conf'
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event Error: error creating container storage: the container name "neutron-haproxy-ovnmeta-f1af5172-627c-4e51-b1fd-5f6524e2876c" is already in use by "d2eaaa321e37a377e6c550204fb7823204f4438b55822b618510323b4f8f726f". You have to remove that container to be able to reuse that name.: that name is already in use
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event
2020-02-04 07:31:41.625 4565 ERROR ovsdbapp.event
The wrapper script doesn't clean up containers in "Created" status, only "Exited":
ORPHANTS=$(printf "%s\n" "${LIST}" | grep ":Exited")
We can either grep for both `:Exited` and `:Created`, or skip spawning a new container when one already exists in the `Created` state and start that one instead.
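
A minimal sketch of the first option, reusing the `LIST`, `ORPHANTS` and `CLI` variables visible in the wrapper trace above (an illustration of the idea, not the merged patch):

```
# Hypothetical cleanup variant: treat both Exited and Created containers as
# orphans, so a stale "Created" entry is removed before the new "podman run".
ORPHANTS=$(printf "%s\n" "${LIST}" | grep -E ":(Exited|Created)" || :)
if [ -n "${ORPHANTS}" ]; then
    for orphant in $(printf "%s\n" "${ORPHANTS}" | awk -F ':' '{print $1}'); do
        echo "Removing orphaned container ${orphant}"
        ${CLI} rm -f "${orphant}" || :
    done
fi
```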
The impact is that the affected compute node won't be able to serve metadata for already existing networks, meaning new instances won't get their SSH keys. I'm putting a blocker flag to raise awareness.

It looks like this is a regression between OSP 14 and OSP 15.

It comes from a bug in podman: https://github.com/containers/libpod/issues/1703, fixed upstream a year ago: https://github.com/containers/libpod/pull/2273

The setup is OSP 16 with podman 1.6.4. In my case it's a hybrid setup (controllers and undercloud as VMs on one bare-metal host, and the compute is bare metal).

(In reply to Jakub Libosvar from comment #5)
> It comes from a bug in podman: https://github.com/containers/libpod/issues/1703, fixed upstream a year ago: https://github.com/containers/libpod/pull/2273

So if this is the case, can we consider it not a blocker? According to Edu's and Roman's testing on an OVN virt deployment without OpenShift, the issue does not reproduce.

(In reply to Eran Kuris from comment #7)
> So if this is the case, can we consider it not a blocker? According to Edu's and Roman's testing on an OVN virt deployment without OpenShift, the issue does not reproduce.

I was able to reproduce it on Edu's virtual environment without shiftonstack. I'd keep the blocker flag for now until we know more about the impact. Also, I checked that upstream podman 1.6.0 contains the fix from comment 5, but the Created status still happens even with podman-1.6.4-2.module+el8.1.1+5363+bf8ff1af.x86_64.

I have a fix for this; I'll send it upstream soon.

Merged to Train.

Verified on:
puddle: RHOS_TRUNK-16.0-RHEL-8-20200213.n.1
rpm: puppet-tripleo-11.4.1-0.20200205150840.71ff36d.el8ost

1- create network, subnet and VM instances
2- check VMs running on compute nodes
3- check neutron-haproxy-ovnmeta container running on computes, too
4- hard reboot one compute node: virsh destroy + virsh start
5- start VMs from the rebooted compute node (openstack server start <vm-name>)
6- check neutron-haproxy-ovnmeta correctly started, no error logs at ovn-metadata-agent.log and VMs connectivity is correct

I repeated this procedure several times successfully, since the original issue was not always reproduced. (A command-level sketch of these steps appears at the end of this report.)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0655
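
For reference, a hedged command-level sketch of the verification procedure quoted above; resource names such as `net1`, `vm1` and `compute-0`, the image/flavor, and the `heat-admin` SSH user are illustrative assumptions, not taken from the original environment:

```
# 1-2. Create a network, a subnet and an instance, then confirm which compute hosts it.
openstack network create net1
openstack subnet create --network net1 --subnet-range 192.168.100.0/24 subnet1
openstack server create --image cirros --flavor m1.tiny --network net1 --wait vm1
openstack server show vm1 -c OS-EXT-SRV-ATTR:host -c status

# 3. The metadata proxy container should be running on that compute node.
ssh heat-admin@compute-0 'sudo podman ps --filter name=neutron-haproxy-ovnmeta'

# 4. Hard reboot the compute node (virtual deployment: run from the hypervisor).
virsh destroy compute-0 && virsh start compute-0

# 5. Start the instance again once the node is back.
openstack server start vm1

# 6. Check that the container came back and the agent logged no errors.
ssh heat-admin@compute-0 'sudo podman ps --filter name=neutron-haproxy-ovnmeta'
ssh heat-admin@compute-0 'sudo grep ERROR /var/log/containers/neutron/ovn-metadata-agent.log'
```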