Bug 2228822

Summary: Networking-ovn-metadata-agent keeps restarting when some haproxy service can't be started
Product: Red Hat OpenStack Reporter: Slawek Kaplonski <skaplons>
Component: python-networking-ovnAssignee: Miro Tomaska <mtomaska>
Status: CLOSED CURRENTRELEASE QA Contact: Eran Kuris <ekuris>
Severity: high Docs Contact:
Priority: high    
Version: 16.2 (Train)CC: apevec, ihrachys, lhh, majopela, mariel, mburns, mtomaska, scohen, tvignaud
Target Milestone: z6Keywords: Triaged
Target Release: 16.2 (Train on RHEL 8.4)Flags: mtomaska: needinfo-
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-09-12 19:31:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Slawek Kaplonski 2023-08-03 11:01:18 UTC
It was found in the tobiko job: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/gate-sanity-16.2_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko_faults/775/test_results/tobiko_gate_1/tobiko_gate_1_03_faults_faults.html where some tests failed as metadata service wasn't reachable after networking-ovn-metadata-agent was restarted.

Looking at logs of the networking-ovn-metadata-agent it looks that after it was stopped and then started again (around 2023-07-24 00:33:33.105 in file: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/gate-sanity-16.2_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko_faults/775/compute-0/var/log/containers/neutron/ovn-metadata-agent.log.gz) it started restarting all the time after hitting error:

2023-07-24 00:33:34.936 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): DelPortCommand(port=tap9043e251-90, bridge=br-ex, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:88
2023-07-24 00:33:34.936 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=1): DelPortCommand(port=tap9043e251-90, bridge=br-tenant, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:88
2023-07-24 00:33:34.937 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Transaction caused no change do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:127
2023-07-24 00:33:34.938 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): AddPortCommand(bridge=br-int, port=tap9043e251-90, may_exist=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:88
2023-07-24 00:33:34.943 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): DbSetCommand(table=Interface, record=tap9043e251-90, col_values=(('external_ids', {'iface-id': 'd3f56cd5-a022-4046-b440-9c5cc7fab3b4'}),)) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:88
2023-07-24 00:33:34.949 38604 DEBUG networking_ovn.agent.metadata.driver [-] haproxy_cfg = 
global
    log         /dev/log local0 debug
    user        neutron
    group       neutron
    maxconn     1024
    pidfile     /var/lib/neutron/external/pids/9043e251-95bd-404c-b018-16f06156d353.pid.haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor
    retries                 3
    timeout http-request    30s
    timeout connect         30s
    timeout client          32s
    timeout server          32s
    timeout http-keep-alive 30s

listen listener
    bind 169.254.169.254:80
    server metadata /var/lib/neutron/metadata_proxy
    http-request add-header X-OVN-Network-ID 9043e251-95bd-404c-b018-16f06156d353
 create_config_file /usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/driver.py:151
2023-07-24 00:33:34.950 38604 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'ovnmeta-9043e251-95bd-404c-b018-16f06156d353', 'haproxy', '-f', '/var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'] create_process /usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py:88
2023-07-24 00:33:35.388 38604 ERROR neutron.agent.linux.utils [-] Exit code: 125; Stdin: ; Stdout: Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
; Stderr: + export DOCKER_HOST=
+ DOCKER_HOST=
+ ARGS='-f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
++ ip netns identify
+ NETNS=ovnmeta-9043e251-95bd-404c-b018-16f06156d353
+ NAME=neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
+ HAPROXY_CMD='$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)'
+ CLI='nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman'
+ LOGGING='--log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log'
+ CMD='$HAPROXY'
++ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman ps -a --filter name=neutron-haproxy- --format '{{.ID}}:{{.Names}}:{{.Status}}'
++ awk '{print $1}'
+ LIST=5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
++ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
++ grep -E ':(Exited|Created)'
+ ORPHANTS=
+ '[' -n '' ']'
+ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
+ grep -q 'neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353$'
+ echo 'Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353'
+ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:shared -v /dev/log:/dev/log --net host --pid host --privileged -u root --name neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:16.2_20230526.1 /bin/bash -c 'HAPROXY="$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)"; exec $HAPROXY -f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
Error: error creating container storage: the container name "neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353" is already in use by "5ad5b68dac8c3ff8988b399e462fb222c352f2c00714f4301475586c23bdb07d". You have to remove that container to be able to reuse that name.: that name is already in use

2023-07-24 00:33:35.389 38604 CRITICAL neutron [-] Unhandled error: neutron_lib.exceptions.ProcessExecutionError: Exit code: 125; Stdin: ; Stdout: Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
; Stderr: + export DOCKER_HOST=
+ DOCKER_HOST=
+ ARGS='-f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
++ ip netns identify
+ NETNS=ovnmeta-9043e251-95bd-404c-b018-16f06156d353
+ NAME=neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
+ HAPROXY_CMD='$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)'
+ CLI='nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman'
+ LOGGING='--log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log'
+ CMD='$HAPROXY'
++ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman ps -a --filter name=neutron-haproxy- --format '{{.ID}}:{{.Names}}:{{.Status}}'
++ awk '{print $1}'
+ LIST=5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
++ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
++ grep -E ':(Exited|Created)'
+ ORPHANTS=
+ '[' -n '' ']'
+ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
+ grep -q 'neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353$'
+ echo 'Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353'
+ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:shared -v /dev/log:/dev/log --net host --pid host --privileged -u root --name neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:16.2_20230526.1 /bin/bash -c 'HAPROXY="$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)"; exec $HAPROXY -f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
Error: error creating container storage: the container name "neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353" is already in use by "5ad5b68dac8c3ff8988b399e462fb222c352f2c00714f4301475586c23bdb07d". You have to remove that container to be able to reuse that name.: that name is already in use
2023-07-24 00:33:35.389 38604 ERROR neutron Traceback (most recent call last):
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/bin/networking-ovn-metadata-agent", line 10, in <module>
2023-07-24 00:33:35.389 38604 ERROR neutron     sys.exit(main())
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/cmd/eventlet/agents/metadata.py", line 17, in main
2023-07-24 00:33:35.389 38604 ERROR neutron     metadata_agent.main()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata_agent.py", line 42, in main
2023-07-24 00:33:35.389 38604 ERROR neutron     agt.start()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 608, in start
2023-07-24 00:33:35.389 38604 ERROR neutron     super().start()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 254, in start
2023-07-24 00:33:35.389 38604 ERROR neutron     self.sync()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 65, in wrapped
2023-07-24 00:33:35.389 38604 ERROR neutron     return f(*args, **kwargs)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 334, in sync
2023-07-24 00:33:35.389 38604 ERROR neutron     self.provision_datapath(datapath)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 603, in provision_datapath
2023-07-24 00:33:35.389 38604 ERROR neutron     network_id=net_name)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/driver.py", line 238, in spawn_monitored_metadata_proxy
2023-07-24 00:33:35.389 38604 ERROR neutron     pm.enable()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/external_process.py", line 90, in enable
2023-07-24 00:33:35.389 38604 ERROR neutron     run_as_root=self.run_as_root)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/ip_lib.py", line 743, in execute
2023-07-24 00:33:35.389 38604 ERROR neutron     run_as_root=run_as_root)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py", line 148, in execute
2023-07-24 00:33:35.389 38604 ERROR neutron     returncode=returncode)
2023-07-24 00:33:35.389 38604 ERROR neutron neutron_lib.exceptions.ProcessExecutionError: Exit code: 125; Stdin: ; Stdout: Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
2023-07-24 00:33:35.389 38604 ERROR neutron ; Stderr: + export DOCKER_HOST=
2023-07-24 00:33:35.389 38604 ERROR neutron + DOCKER_HOST=
2023-07-24 00:33:35.389 38604 ERROR neutron + ARGS='-f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
2023-07-24 00:33:35.389 38604 ERROR neutron ++ ip netns identify
2023-07-24 00:33:35.389 38604 ERROR neutron + NETNS=ovnmeta-9043e251-95bd-404c-b018-16f06156d353
2023-07-24 00:33:35.389 38604 ERROR neutron + NAME=neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
2023-07-24 00:33:35.389 38604 ERROR neutron + HAPROXY_CMD='$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)'
2023-07-24 00:33:35.389 38604 ERROR neutron + CLI='nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman'
2023-07-24 00:33:35.389 38604 ERROR neutron + LOGGING='--log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log'
2023-07-24 00:33:35.389 38604 ERROR neutron + CMD='$HAPROXY'
2023-07-24 00:33:35.389 38604 ERROR neutron ++ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman ps -a --filter name=neutron-haproxy- --format '{{.ID}}:{{.Names}}:{{.Status}}'
2023-07-24 00:33:35.389 38604 ERROR neutron ++ awk '{print $1}'
2023-07-24 00:33:35.389 38604 ERROR neutron + LIST=5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
2023-07-24 00:33:35.389 38604 ERROR neutron ++ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
2023-07-24 00:33:35.389 38604 ERROR neutron ++ grep -E ':(Exited|Created)'
2023-07-24 00:33:35.389 38604 ERROR neutron + ORPHANTS=
2023-07-24 00:33:35.389 38604 ERROR neutron + '[' -n '' ']'
2023-07-24 00:33:35.389 38604 ERROR neutron + printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
2023-07-24 00:33:35.389 38604 ERROR neutron + grep -q 'neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353$'
2023-07-24 00:33:35.389 38604 ERROR neutron + echo 'Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353'
2023-07-24 00:33:35.389 38604 ERROR neutron + nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:shared -v /dev/log:/dev/log --net host --pid host --privileged -u root --name neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:16.2_20230526.1 /bin/bash -c 'HAPROXY="$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)"; exec $HAPROXY -f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
2023-07-24 00:33:35.389 38604 ERROR neutron Error: error creating container storage: the container name "neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353" is already in use by "5ad5b68dac8c3ff8988b399e462fb222c352f2c00714f4301475586c23bdb07d". You have to remove that container to be able to reuse that name.: that name is already in use
2023-07-24 00:33:35.389 38604 ERROR neutron 
2023-07-24 00:33:35.389 38604 ERROR neutron 
2023-07-24 00:33:35.644 38631 INFO oslo_service.service [-] Parent process has died unexpectedly, exiting
2023-07-24 00:33:35.644 38630 INFO oslo_service.service [-] Parent process has died unexpectedly, exiting

Comment 1 Ihar Hrachyshka 2023-08-07 15:22:33 UTC
The metadata agent should be more robust in handling the exception when the container exists more gracefully.