Bug 2231893 - Networking-ovn-metadata-agent keeps restarting when some haproxy service can't be started
Summary: Networking-ovn-metadata-agent keeps restarting when some haproxy service can'...
Keywords:
Status: CLOSED DUPLICATE of bug 2229779
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: z2
: 17.1
Assignee: Miro Tomaska
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-08-14 14:51 UTC by Ihar Hrachyshka
Modified: 2024-01-08 12:17 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
The metadata service can become unavailable after the metadata agent fails in multiple attempts to start a malfunctioning HAProxy child container. The metadata agent logs an error message similar to: `ProcessExecutionError: Exit code: 125; Stdin: ; Stdout: Starting a new child container neutron-haproxy-ovnmeta-<uuid>”. + *Workaround:* Run `podman kill <_container name_>` to stop the problematic haproxy child container.
Clone Of:
Environment:
Last Closed: 2023-10-11 17:26:37 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 890986 0 None NEW OVN Metadata handle process exceptions 2023-09-05 08:11:52 UTC
Red Hat Issue Tracker OSP-27481 0 None None None 2023-08-14 15:17:44 UTC

Description Ihar Hrachyshka 2023-08-14 14:51:45 UTC
This bug was initially created as a copy of Bug #2228822

I am copying this bug because: 

We'll need to track the fix in 17.1 too.

It was found in the tobiko job: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/gate-sanity-16.2_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko_faults/775/test_results/tobiko_gate_1/tobiko_gate_1_03_faults_faults.html where some tests failed as metadata service wasn't reachable after networking-ovn-metadata-agent was restarted.

Looking at logs of the networking-ovn-metadata-agent it looks that after it was stopped and then started again (around 2023-07-24 00:33:33.105 in file: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/gate-sanity-16.2_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko_faults/775/compute-0/var/log/containers/neutron/ovn-metadata-agent.log.gz) it started restarting all the time after hitting error:

2023-07-24 00:33:34.936 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): DelPortCommand(port=tap9043e251-90, bridge=br-ex, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:88
2023-07-24 00:33:34.936 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=1): DelPortCommand(port=tap9043e251-90, bridge=br-tenant, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:88
2023-07-24 00:33:34.937 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Transaction caused no change do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:127
2023-07-24 00:33:34.938 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): AddPortCommand(bridge=br-int, port=tap9043e251-90, may_exist=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:88
2023-07-24 00:33:34.943 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): DbSetCommand(table=Interface, record=tap9043e251-90, col_values=(('external_ids', {'iface-id': 'd3f56cd5-a022-4046-b440-9c5cc7fab3b4'}),)) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:88
2023-07-24 00:33:34.949 38604 DEBUG networking_ovn.agent.metadata.driver [-] haproxy_cfg = 
global
    log         /dev/log local0 debug
    user        neutron
    group       neutron
    maxconn     1024
    pidfile     /var/lib/neutron/external/pids/9043e251-95bd-404c-b018-16f06156d353.pid.haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor
    retries                 3
    timeout http-request    30s
    timeout connect         30s
    timeout client          32s
    timeout server          32s
    timeout http-keep-alive 30s

listen listener
    bind 169.254.169.254:80
    server metadata /var/lib/neutron/metadata_proxy
    http-request add-header X-OVN-Network-ID 9043e251-95bd-404c-b018-16f06156d353
 create_config_file /usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/driver.py:151
2023-07-24 00:33:34.950 38604 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'ovnmeta-9043e251-95bd-404c-b018-16f06156d353', 'haproxy', '-f', '/var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'] create_process /usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py:88
2023-07-24 00:33:35.388 38604 ERROR neutron.agent.linux.utils [-] Exit code: 125; Stdin: ; Stdout: Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
; Stderr: + export DOCKER_HOST=
+ DOCKER_HOST=
+ ARGS='-f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
++ ip netns identify
+ NETNS=ovnmeta-9043e251-95bd-404c-b018-16f06156d353
+ NAME=neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
+ HAPROXY_CMD='$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)'
+ CLI='nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman'
+ LOGGING='--log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log'
+ CMD='$HAPROXY'
++ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman ps -a --filter name=neutron-haproxy- --format '{{.ID}}:{{.Names}}:{{.Status}}'
++ awk '{print $1}'
+ LIST=5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
++ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
++ grep -E ':(Exited|Created)'
+ ORPHANTS=
+ '[' -n '' ']'
+ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
+ grep -q 'neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353$'
+ echo 'Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353'
+ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:shared -v /dev/log:/dev/log --net host --pid host --privileged -u root --name neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:16.2_20230526.1 /bin/bash -c 'HAPROXY="$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)"; exec $HAPROXY -f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
Error: error creating container storage: the container name "neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353" is already in use by "5ad5b68dac8c3ff8988b399e462fb222c352f2c00714f4301475586c23bdb07d". You have to remove that container to be able to reuse that name.: that name is already in use

2023-07-24 00:33:35.389 38604 CRITICAL neutron [-] Unhandled error: neutron_lib.exceptions.ProcessExecutionError: Exit code: 125; Stdin: ; Stdout: Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
; Stderr: + export DOCKER_HOST=
+ DOCKER_HOST=
+ ARGS='-f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
++ ip netns identify
+ NETNS=ovnmeta-9043e251-95bd-404c-b018-16f06156d353
+ NAME=neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
+ HAPROXY_CMD='$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)'
+ CLI='nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman'
+ LOGGING='--log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log'
+ CMD='$HAPROXY'
++ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman ps -a --filter name=neutron-haproxy- --format '{{.ID}}:{{.Names}}:{{.Status}}'
++ awk '{print $1}'
+ LIST=5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
++ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
++ grep -E ':(Exited|Created)'
+ ORPHANTS=
+ '[' -n '' ']'
+ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
+ grep -q 'neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353$'
+ echo 'Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353'
+ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:shared -v /dev/log:/dev/log --net host --pid host --privileged -u root --name neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:16.2_20230526.1 /bin/bash -c 'HAPROXY="$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)"; exec $HAPROXY -f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
Error: error creating container storage: the container name "neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353" is already in use by "5ad5b68dac8c3ff8988b399e462fb222c352f2c00714f4301475586c23bdb07d". You have to remove that container to be able to reuse that name.: that name is already in use
2023-07-24 00:33:35.389 38604 ERROR neutron Traceback (most recent call last):
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/bin/networking-ovn-metadata-agent", line 10, in <module>
2023-07-24 00:33:35.389 38604 ERROR neutron     sys.exit(main())
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/cmd/eventlet/agents/metadata.py", line 17, in main
2023-07-24 00:33:35.389 38604 ERROR neutron     metadata_agent.main()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata_agent.py", line 42, in main
2023-07-24 00:33:35.389 38604 ERROR neutron     agt.start()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 608, in start
2023-07-24 00:33:35.389 38604 ERROR neutron     super().start()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 254, in start
2023-07-24 00:33:35.389 38604 ERROR neutron     self.sync()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 65, in wrapped
2023-07-24 00:33:35.389 38604 ERROR neutron     return f(*args, **kwargs)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 334, in sync
2023-07-24 00:33:35.389 38604 ERROR neutron     self.provision_datapath(datapath)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 603, in provision_datapath
2023-07-24 00:33:35.389 38604 ERROR neutron     network_id=net_name)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/driver.py", line 238, in spawn_monitored_metadata_proxy
2023-07-24 00:33:35.389 38604 ERROR neutron     pm.enable()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/external_process.py", line 90, in enable
2023-07-24 00:33:35.389 38604 ERROR neutron     run_as_root=self.run_as_root)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/ip_lib.py", line 743, in execute
2023-07-24 00:33:35.389 38604 ERROR neutron     run_as_root=run_as_root)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py", line 148, in execute
2023-07-24 00:33:35.389 38604 ERROR neutron     returncode=returncode)
2023-07-24 00:33:35.389 38604 ERROR neutron neutron_lib.exceptions.ProcessExecutionError: Exit code: 125; Stdin: ; Stdout: Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
2023-07-24 00:33:35.389 38604 ERROR neutron ; Stderr: + export DOCKER_HOST=
2023-07-24 00:33:35.389 38604 ERROR neutron + DOCKER_HOST=
2023-07-24 00:33:35.389 38604 ERROR neutron + ARGS='-f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
2023-07-24 00:33:35.389 38604 ERROR neutron ++ ip netns identify
2023-07-24 00:33:35.389 38604 ERROR neutron + NETNS=ovnmeta-9043e251-95bd-404c-b018-16f06156d353
2023-07-24 00:33:35.389 38604 ERROR neutron + NAME=neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
2023-07-24 00:33:35.389 38604 ERROR neutron + HAPROXY_CMD='$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)'
2023-07-24 00:33:35.389 38604 ERROR neutron + CLI='nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman'
2023-07-24 00:33:35.389 38604 ERROR neutron + LOGGING='--log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log'
2023-07-24 00:33:35.389 38604 ERROR neutron + CMD='$HAPROXY'
2023-07-24 00:33:35.389 38604 ERROR neutron ++ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman ps -a --filter name=neutron-haproxy- --format '{{.ID}}:{{.Names}}:{{.Status}}'
2023-07-24 00:33:35.389 38604 ERROR neutron ++ awk '{print $1}'
2023-07-24 00:33:35.389 38604 ERROR neutron + LIST=5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
2023-07-24 00:33:35.389 38604 ERROR neutron ++ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
2023-07-24 00:33:35.389 38604 ERROR neutron ++ grep -E ':(Exited|Created)'
2023-07-24 00:33:35.389 38604 ERROR neutron + ORPHANTS=
2023-07-24 00:33:35.389 38604 ERROR neutron + '[' -n '' ']'
2023-07-24 00:33:35.389 38604 ERROR neutron + printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
2023-07-24 00:33:35.389 38604 ERROR neutron + grep -q 'neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353$'
2023-07-24 00:33:35.389 38604 ERROR neutron + echo 'Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353'
2023-07-24 00:33:35.389 38604 ERROR neutron + nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:shared -v /dev/log:/dev/log --net host --pid host --privileged -u root --name neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:16.2_20230526.1 /bin/bash -c 'HAPROXY="$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)"; exec $HAPROXY -f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
2023-07-24 00:33:35.389 38604 ERROR neutron Error: error creating container storage: the container name "neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353" is already in use by "5ad5b68dac8c3ff8988b399e462fb222c352f2c00714f4301475586c23bdb07d". You have to remove that container to be able to reuse that name.: that name is already in use
2023-07-24 00:33:35.389 38604 ERROR neutron 
2023-07-24 00:33:35.389 38604 ERROR neutron 
2023-07-24 00:33:35.644 38631 INFO oslo_service.service [-] Parent process has died unexpectedly, exiting
2023-07-24 00:33:35.644 38630 INFO oslo_service.service [-] Parent process has died unexpectedly, exiting

Comment 10 Miro Tomaska 2023-10-11 17:26:37 UTC

*** This bug has been marked as a duplicate of bug 2229779 ***


Note You need to log in before you can comment on or make changes to this bug.