Bug 2229779 - Networking-ovn-metadata-agent keeps restarting when some haproxy service can't be started
Summary: Networking-ovn-metadata-agent keeps restarting when some haproxy service can'...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
high
medium
Target Milestone: z3
: 17.1
Assignee: Miro Tomaska
QA Contact: Elvira
URL:
Whiteboard:
: 2231893 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-08-07 16:15 UTC by Ihar Hrachyshka
Modified: 2024-05-22 20:39 UTC (History)
9 users (show)

Fixed In Version: openstack-neutron-18.6.1-17.1.20231025110810.el9ost
Doc Type: Bug Fix
Doc Text:
This update improved the OVN metadata agent process exception handling in an event when a child process is not able to start. With this update, the child process exception is logged and the main OVN metadata agent continues running.
Clone Of:
Environment:
Last Closed: 2024-05-22 20:39:28 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 890986 0 None MERGED Metadata: handle process exceptions 2024-04-16 13:10:20 UTC
Red Hat Issue Tracker OSP-27270 0 None None None 2023-08-07 16:17:06 UTC
Red Hat Product Errata RHBA-2024:2741 0 None None None 2024-05-22 20:39:30 UTC

Description Ihar Hrachyshka 2023-08-07 16:15:11 UTC
This bug was initially created as a copy of Bug #2228822

I am copying this bug because: we'll need to track the fix in all releases.


It was found in the tobiko job: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/gate-sanity-16.2_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko_faults/775/test_results/tobiko_gate_1/tobiko_gate_1_03_faults_faults.html where some tests failed as metadata service wasn't reachable after networking-ovn-metadata-agent was restarted.

Looking at logs of the networking-ovn-metadata-agent it looks that after it was stopped and then started again (around 2023-07-24 00:33:33.105 in file: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/gate-sanity-16.2_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko_faults/775/compute-0/var/log/containers/neutron/ovn-metadata-agent.log.gz) it started restarting all the time after hitting error:

2023-07-24 00:33:34.936 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): DelPortCommand(port=tap9043e251-90, bridge=br-ex, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:88
2023-07-24 00:33:34.936 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=1): DelPortCommand(port=tap9043e251-90, bridge=br-tenant, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:88
2023-07-24 00:33:34.937 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Transaction caused no change do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:127
2023-07-24 00:33:34.938 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): AddPortCommand(bridge=br-int, port=tap9043e251-90, may_exist=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:88
2023-07-24 00:33:34.943 38604 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): DbSetCommand(table=Interface, record=tap9043e251-90, col_values=(('external_ids', {'iface-id': 'd3f56cd5-a022-4046-b440-9c5cc7fab3b4'}),)) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:88
2023-07-24 00:33:34.949 38604 DEBUG networking_ovn.agent.metadata.driver [-] haproxy_cfg = 
global
    log         /dev/log local0 debug
    user        neutron
    group       neutron
    maxconn     1024
    pidfile     /var/lib/neutron/external/pids/9043e251-95bd-404c-b018-16f06156d353.pid.haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor
    retries                 3
    timeout http-request    30s
    timeout connect         30s
    timeout client          32s
    timeout server          32s
    timeout http-keep-alive 30s

listen listener
    bind 169.254.169.254:80
    server metadata /var/lib/neutron/metadata_proxy
    http-request add-header X-OVN-Network-ID 9043e251-95bd-404c-b018-16f06156d353
 create_config_file /usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/driver.py:151
2023-07-24 00:33:34.950 38604 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'ovnmeta-9043e251-95bd-404c-b018-16f06156d353', 'haproxy', '-f', '/var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'] create_process /usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py:88
2023-07-24 00:33:35.388 38604 ERROR neutron.agent.linux.utils [-] Exit code: 125; Stdin: ; Stdout: Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
; Stderr: + export DOCKER_HOST=
+ DOCKER_HOST=
+ ARGS='-f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
++ ip netns identify
+ NETNS=ovnmeta-9043e251-95bd-404c-b018-16f06156d353
+ NAME=neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
+ HAPROXY_CMD='$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)'
+ CLI='nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman'
+ LOGGING='--log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log'
+ CMD='$HAPROXY'
++ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman ps -a --filter name=neutron-haproxy- --format '{{.ID}}:{{.Names}}:{{.Status}}'
++ awk '{print $1}'
+ LIST=5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
++ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
++ grep -E ':(Exited|Created)'
+ ORPHANTS=
+ '[' -n '' ']'
+ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
+ grep -q 'neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353$'
+ echo 'Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353'
+ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:shared -v /dev/log:/dev/log --net host --pid host --privileged -u root --name neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:16.2_20230526.1 /bin/bash -c 'HAPROXY="$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)"; exec $HAPROXY -f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
Error: error creating container storage: the container name "neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353" is already in use by "5ad5b68dac8c3ff8988b399e462fb222c352f2c00714f4301475586c23bdb07d". You have to remove that container to be able to reuse that name.: that name is already in use

2023-07-24 00:33:35.389 38604 CRITICAL neutron [-] Unhandled error: neutron_lib.exceptions.ProcessExecutionError: Exit code: 125; Stdin: ; Stdout: Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
; Stderr: + export DOCKER_HOST=
+ DOCKER_HOST=
+ ARGS='-f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
++ ip netns identify
+ NETNS=ovnmeta-9043e251-95bd-404c-b018-16f06156d353
+ NAME=neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
+ HAPROXY_CMD='$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)'
+ CLI='nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman'
+ LOGGING='--log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log'
+ CMD='$HAPROXY'
++ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman ps -a --filter name=neutron-haproxy- --format '{{.ID}}:{{.Names}}:{{.Status}}'
++ awk '{print $1}'
+ LIST=5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
++ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
++ grep -E ':(Exited|Created)'
+ ORPHANTS=
+ '[' -n '' ']'
+ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
+ grep -q 'neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353$'
+ echo 'Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353'
+ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:shared -v /dev/log:/dev/log --net host --pid host --privileged -u root --name neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:16.2_20230526.1 /bin/bash -c 'HAPROXY="$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)"; exec $HAPROXY -f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
Error: error creating container storage: the container name "neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353" is already in use by "5ad5b68dac8c3ff8988b399e462fb222c352f2c00714f4301475586c23bdb07d". You have to remove that container to be able to reuse that name.: that name is already in use
2023-07-24 00:33:35.389 38604 ERROR neutron Traceback (most recent call last):
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/bin/networking-ovn-metadata-agent", line 10, in <module>
2023-07-24 00:33:35.389 38604 ERROR neutron     sys.exit(main())
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/cmd/eventlet/agents/metadata.py", line 17, in main
2023-07-24 00:33:35.389 38604 ERROR neutron     metadata_agent.main()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata_agent.py", line 42, in main
2023-07-24 00:33:35.389 38604 ERROR neutron     agt.start()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 608, in start
2023-07-24 00:33:35.389 38604 ERROR neutron     super().start()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 254, in start
2023-07-24 00:33:35.389 38604 ERROR neutron     self.sync()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 65, in wrapped
2023-07-24 00:33:35.389 38604 ERROR neutron     return f(*args, **kwargs)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 334, in sync
2023-07-24 00:33:35.389 38604 ERROR neutron     self.provision_datapath(datapath)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 603, in provision_datapath
2023-07-24 00:33:35.389 38604 ERROR neutron     network_id=net_name)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/driver.py", line 238, in spawn_monitored_metadata_proxy
2023-07-24 00:33:35.389 38604 ERROR neutron     pm.enable()
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/external_process.py", line 90, in enable
2023-07-24 00:33:35.389 38604 ERROR neutron     run_as_root=self.run_as_root)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/ip_lib.py", line 743, in execute
2023-07-24 00:33:35.389 38604 ERROR neutron     run_as_root=run_as_root)
2023-07-24 00:33:35.389 38604 ERROR neutron   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py", line 148, in execute
2023-07-24 00:33:35.389 38604 ERROR neutron     returncode=returncode)
2023-07-24 00:33:35.389 38604 ERROR neutron neutron_lib.exceptions.ProcessExecutionError: Exit code: 125; Stdin: ; Stdout: Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
2023-07-24 00:33:35.389 38604 ERROR neutron ; Stderr: + export DOCKER_HOST=
2023-07-24 00:33:35.389 38604 ERROR neutron + DOCKER_HOST=
2023-07-24 00:33:35.389 38604 ERROR neutron + ARGS='-f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
2023-07-24 00:33:35.389 38604 ERROR neutron ++ ip netns identify
2023-07-24 00:33:35.389 38604 ERROR neutron + NETNS=ovnmeta-9043e251-95bd-404c-b018-16f06156d353
2023-07-24 00:33:35.389 38604 ERROR neutron + NAME=neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353
2023-07-24 00:33:35.389 38604 ERROR neutron + HAPROXY_CMD='$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)'
2023-07-24 00:33:35.389 38604 ERROR neutron + CLI='nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman'
2023-07-24 00:33:35.389 38604 ERROR neutron + LOGGING='--log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log'
2023-07-24 00:33:35.389 38604 ERROR neutron + CMD='$HAPROXY'
2023-07-24 00:33:35.389 38604 ERROR neutron ++ nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman ps -a --filter name=neutron-haproxy- --format '{{.ID}}:{{.Names}}:{{.Status}}'
2023-07-24 00:33:35.389 38604 ERROR neutron ++ awk '{print $1}'
2023-07-24 00:33:35.389 38604 ERROR neutron + LIST=5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
2023-07-24 00:33:35.389 38604 ERROR neutron ++ printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
2023-07-24 00:33:35.389 38604 ERROR neutron ++ grep -E ':(Exited|Created)'
2023-07-24 00:33:35.389 38604 ERROR neutron + ORPHANTS=
2023-07-24 00:33:35.389 38604 ERROR neutron + '[' -n '' ']'
2023-07-24 00:33:35.389 38604 ERROR neutron + printf '%s\n' 5ad5b68dac8c:neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353:stopping
2023-07-24 00:33:35.389 38604 ERROR neutron + grep -q 'neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353$'
2023-07-24 00:33:35.389 38604 ERROR neutron + echo 'Starting a new child container neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353'
2023-07-24 00:33:35.389 38604 ERROR neutron + nsenter --net=/run/netns/ovnmeta-9043e251-95bd-404c-b018-16f06156d353 --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:shared -v /dev/log:/dev/log --net host --pid host --privileged -u root --name neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:16.2_20230526.1 /bin/bash -c 'HAPROXY="$(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)"; exec $HAPROXY -f /var/lib/neutron/ovn-metadata-proxy/9043e251-95bd-404c-b018-16f06156d353.conf'
2023-07-24 00:33:35.389 38604 ERROR neutron Error: error creating container storage: the container name "neutron-haproxy-ovnmeta-9043e251-95bd-404c-b018-16f06156d353" is already in use by "5ad5b68dac8c3ff8988b399e462fb222c352f2c00714f4301475586c23bdb07d". You have to remove that container to be able to reuse that name.: that name is already in use
2023-07-24 00:33:35.389 38604 ERROR neutron 
2023-07-24 00:33:35.389 38604 ERROR neutron 
2023-07-24 00:33:35.644 38631 INFO oslo_service.service [-] Parent process has died unexpectedly, exiting
2023-07-24 00:33:35.644 38630 INFO oslo_service.service [-] Parent process has died unexpectedly, exiting

Comment 7 Miro Tomaska 2023-10-11 17:26:37 UTC
*** Bug 2231893 has been marked as a duplicate of this bug. ***

Comment 23 errata-xmlrpc 2024-05-22 20:39:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 17.1.3 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:2741


Note You need to log in before you can comment on or make changes to this bug.