Bug 2228822 - Networking-ovn-metadata-agent keeps restarting when some haproxy service can't be started
Summary: Networking-ovn-metadata-agent keeps restarting when some haproxy service can'...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: z6
: 16.2 (Train on RHEL 8.4)
Assignee: Miro Tomaska
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-08-03 11:01 UTC by Slawek Kaplonski
Modified: 2024-09-12 19:31 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-09-12 19:31:10 UTC
Target Upstream Version:
Embargoed:
mtomaska: needinfo-


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 890986 0 None MERGED Metadata: handle process exceptions 2024-09-12 19:26:40 UTC
Red Hat Issue Tracker OSP-27171 0 None None None 2023-08-03 11:01:54 UTC

Description Slawek Kaplonski 2023-08-03 11:01:18 UTC
It was found in the tobiko job: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/gate-sanity-16.2_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko_faults/775/test_results/tobiko_gate_1/tobiko_gate_1_03_faults_faults.html where some tests failed as metadata service wasn't reachable after networking-ovn-metadata-agent was restarted.

Looking at logs of the networking-ovn-metadata-agent it looks that after it was stopped and then started again (around 2023-07-24 00:33:33.105 in file: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/gate-sanity-16.2_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko_faults/775/compute-0/var/log/containers/neutron/ovn-metadata-agent.log.gz) it started restarting all the time after hitting error:

2023-07-24 00:33:34.936 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): DelPortCommand(port=tap9043e251-90, bridge=br-ex, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:88
2023-07-24 00:33:34.936 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=1): DelPortCommand(port=tap9043e251-90, bridge=br-tenant, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:88
2023-07-24 00:33:34.937 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Transaction caused no change do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:127
2023-07-24 00:33:34.938 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): AddPortCommand(bridge=br-int, port=tap9043e251-90, may_exist=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:88
2023-07-24 00:33:34.943 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): DbSetCommand(table=Interface, record=tap9043e251-90, col_values=(('external_ids', {'iface-id': 'd3f56cd5-a022-4046-b440-9c5cc7fab3b4'}),)) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:88
2023-07-24 00:33:34.949 38604 DEBUG networking_ovn.agent.metadata.driver [-] haproxy_cfg = 
global
    log         /dev/log local0 debug
    user        neutron
    group       neutron
    maxconn     1024
    pidfile     /var/lib/neutron/external/pids/9043e251-95bd-404c-b018-16f06156d353.pid.haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor
    retries                 3
    timeout http-request    30s
    timeout connect         30s
    timeout client          32s
    timeout server          32s
    timeout http-keep-alive 30s

listen listener
    bind 169.254.169.254:80
    server metadata /var/lib/neutron/metadata_proxy
    http-request add-header X-OVN-Network-ID 9043e251-95bd-404c-b018-16f06156d353
 create_config_file /usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/driver.py:151
2023-07-24 00:33:34.950 38604 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'ovnmeta-9043e251-95bd-404c-b018-16f06156d353', 'haproxy', '-f', '/var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'] create_process /usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py:88
2023-07-24 00:33:35.388 38604 ERROR neutron.agent.linux.utils [-] Exit code: 125; Stdin: ; Stdout: Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
; Stderr: + export DOCKER_HOST=
+ DOCKER_HOST=
+ ARGS='-f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
++ ip netns identify
+ NETNS=ovnmeta-9043e251-95bd-404c-b018-16f06156d353
+ NAME=neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
+ HAPROXY_CMD='$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)'
+ CLI='nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman'
+ LOGGING='--log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log'
+ CMD='$HAPROXY'
++ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman ps -a --filter name=neutron-haproxy- --format '{{.ID}}:{{.Names}}:{{.Status}}'
++ awk '{print $1}'
+ LIST=5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
++ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
++ grep -E ':(Exited|Created)'
+ ORPHANTS=
+ '[' -n '' ']'
+ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
+ grep -q 'neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353$'
+ echo 'Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353'
+ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:shared -v /dev/log:/dev/log --net host --pid host --privileged -u root --name neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:16.2_20230526.1 /bin/bash -c 'HAPROXY="$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)"; exec $HAPROXY -f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
Error: error creating container storage: the container name "neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353" is already in use by "5ad5b68dac8c3ff8988b399e462fb222c352f2c00714f4301475586c23bdb07d". You have to remove that container to be able to reuse that name.: that name is already in use

2023-07-24 00:33:35.389 38604 CRITICAL neutron [-] Unhandled error: neutron_lib.exceptions.ProcessExecutionError: Exit code: 125; Stdin: ; Stdout: Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
; Stderr: + export DOCKER_HOST=
+ DOCKER_HOST=
+ ARGS='-f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
++ ip netns identify
+ NETNS=ovnmeta-9043e251-95bd-404c-b018-16f06156d353
+ NAME=neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
+ HAPROXY_CMD='$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)'
+ CLI='nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman'
+ LOGGING='--log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log'
+ CMD='$HAPROXY'
++ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman ps -a --filter name=neutron-haproxy- --format '{{.ID}}:{{.Names}}:{{.Status}}'
++ awk '{print $1}'
+ LIST=5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
++ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
++ grep -E ':(Exited|Created)'
+ ORPHANTS=
+ '[' -n '' ']'
+ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
+ grep -q 'neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353$'
+ echo 'Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353'
+ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:shared -v /dev/log:/dev/log --net host --pid host --privileged -u root --name neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:16.2_20230526.1 /bin/bash -c 'HAPROXY="$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)"; exec $HAPROXY -f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
Error: error creating container storage: the container name "neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353" is already in use by "5ad5b68dac8c3ff8988b399e462fb222c352f2c00714f4301475586c23bdb07d". You have to remove that container to be able to reuse that name.: that name is already in use
2023-07-24 00:33:35.389 38604 ERROR neutron Traceback (most recent call last):
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/bin/networking-ovn-metadata-agent", line 10, in <module>
2023-07-24 00:33:35.389 38604 ERROR neutron     sys.exit(main())
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/cmd/eventlet/agents/metadata.py", line 17, in main
2023-07-24 00:33:35.389 38604 ERROR neutron     metadata_agent.main()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata_agent.py", line 42, in main
2023-07-24 00:33:35.389 38604 ERROR neutron     agt.start()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 608, in start
2023-07-24 00:33:35.389 38604 ERROR neutron     super().start()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 254, in start
2023-07-24 00:33:35.389 38604 ERROR neutron     self.sync()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 65, in wrapped
2023-07-24 00:33:35.389 38604 ERROR neutron     return f(*args, **kwargs)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 334, in sync
2023-07-24 00:33:35.389 38604 ERROR neutron     self.provision_datapath(datapath)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 603, in provision_datapath
2023-07-24 00:33:35.389 38604 ERROR neutron     network_id=net_name)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/driver.py", line 238, in spawn_monitored_metadata_proxy
2023-07-24 00:33:35.389 38604 ERROR neutron     pm.enable()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/external_process.py", line 90, in enable
2023-07-24 00:33:35.389 38604 ERROR neutron     run_as_root=self.run_as_root)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/ip_lib.py", line 743, in execute
2023-07-24 00:33:35.389 38604 ERROR neutron     run_as_root=run_as_root)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py", line 148, in execute
2023-07-24 00:33:35.389 38604 ERROR neutron     returncode=returncode)
2023-07-24 00:33:35.389 38604 ERROR neutron neutron_lib.exceptions.ProcessExecutionError: Exit code: 125; Stdin: ; Stdout: Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
2023-07-24 00:33:35.389 38604 ERROR neutron ; Stderr: + export DOCKER_HOST=
2023-07-24 00:33:35.389 38604 ERROR neutron + DOCKER_HOST=
2023-07-24 00:33:35.389 38604 ERROR neutron + ARGS='-f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
2023-07-24 00:33:35.389 38604 ERROR neutron ++ ip netns identify
2023-07-24 00:33:35.389 38604 ERROR neutron + NETNS=ovnmeta-9043e251-95bd-404c-b018-16f06156d353
2023-07-24 00:33:35.389 38604 ERROR neutron + NAME=neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
2023-07-24 00:33:35.389 38604 ERROR neutron + HAPROXY_CMD='$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)'
2023-07-24 00:33:35.389 38604 ERROR neutron + CLI='nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman'
2023-07-24 00:33:35.389 38604 ERROR neutron + LOGGING='--log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log'
2023-07-24 00:33:35.389 38604 ERROR neutron + CMD='$HAPROXY'
2023-07-24 00:33:35.389 38604 ERROR neutron ++ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman ps -a --filter name=neutron-haproxy- --format '{{.ID}}:{{.Names}}:{{.Status}}'
2023-07-24 00:33:35.389 38604 ERROR neutron ++ awk '{print $1}'
2023-07-24 00:33:35.389 38604 ERROR neutron + LIST=5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
2023-07-24 00:33:35.389 38604 ERROR neutron ++ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
2023-07-24 00:33:35.389 38604 ERROR neutron ++ grep -E ':(Exited|Created)'
2023-07-24 00:33:35.389 38604 ERROR neutron + ORPHANTS=
2023-07-24 00:33:35.389 38604 ERROR neutron + '[' -n '' ']'
2023-07-24 00:33:35.389 38604 ERROR neutron + printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
2023-07-24 00:33:35.389 38604 ERROR neutron + grep -q 'neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353$'
2023-07-24 00:33:35.389 38604 ERROR neutron + echo 'Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353'
2023-07-24 00:33:35.389 38604 ERROR neutron + nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:shared -v /dev/log:/dev/log --net host --pid host --privileged -u root --name neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:16.2_20230526.1 /bin/bash -c 'HAPROXY="$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)"; exec $HAPROXY -f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
2023-07-24 00:33:35.389 38604 ERROR neutron Error: error creating container storage: the container name "neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353" is already in use by "5ad5b68dac8c3ff8988b399e462fb222c352f2c00714f4301475586c23bdb07d". You have to remove that container to be able to reuse that name.: that name is already in use
2023-07-24 00:33:35.389 38604 ERROR neutron 
2023-07-24 00:33:35.389 38604 ERROR neutron 
2023-07-24 00:33:35.644 38631 INFO oslo_service.service [-] Parent process has died unexpectedly, exiting
2023-07-24 00:33:35.644 38630 INFO oslo_service.service [-] Parent process has died unexpectedly, exiting

Comment 1 Ihar Hrachyshka 2023-08-07 15:22:33 UTC
The metadata agent should be more robust in handling the exception when the container exists more gracefully.


Note You need to log in before you can comment on or make changes to this bug.