Description of problem: The nova-conductor container restarts continuously if access to keystone fails. While keystone access is essential to the service, losing it should not send the service into an indefinite restart loop.
# Tested in a multi-cell environment:
[root@cell1-controller-0 heat-admin]# podman ps -a | grep nova_conductor
WARN[0000] binary not found, container dns will not be enabled
3661ce0e351f undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-nova-conductor:17.0_20220714.1 /bin/bash -c chow... 2 hours ago Exited (0) 2 hours ago nova_conductor_init_log
6714ba9d5a69 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-nova-conductor:17.0_20220714.1 kolla_start 2 hours ago Up Less than a second ago (unhealthy) nova_conductor
[root@cell1-controller-0 heat-admin]# podman logs nova_conductor (continuously restarts)
...
+ exec /usr/bin/nova-conductor
+ sudo -E kolla_set_configs
sudo: unable to send audit message: Operation not permitted
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/my.cnf.d/tripleo.cnf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/my.cnf.d/tripleo.cnf to /etc/my.cnf.d/tripleo.cnf
INFO:__main__:Deleting /etc/nova/nova.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/nova/nova.conf to /etc/nova/nova.conf
INFO:__main__:Writing out command to execute
INFO:__main__:Setting permission for /var/log/nova
INFO:__main__:Setting permission for /var/log/nova/nova-manage.log
INFO:__main__:Setting permission for /var/log/nova/nova-conductor.log
INFO:__main__:Setting permission for /var/log/nova/nova-novncproxy.log
INFO:__main__:Setting permission for /var/log/nova/nova-metadata-api.log
INFO:__main__:Setting permission for /var/log/nova/nova-conductor.log.1
++ cat /run_command
+ CMD='/usr/bin/nova-conductor '
+ ARGS=
+ [[ ! -n '' ]]
+ . kolla_extend_start
+ echo 'Running command: '\''/usr/bin/nova-conductor '\'''
Running command: '/usr/bin/nova-conductor '
+ exec /usr/bin/nova-conductor
# From /var/log/containers/nova/nova-conductor.log
2022-07-20 18:35:24.376 2 WARNING keystoneauth.identity.generic.base [-] Failed to discover available identity versions when contacting https://overcloud.internalapi.redhat.local:5000. Attempting to parse version from URL.: keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to https://overcloud.internalapi.redhat.local:5000: HTTPSConnectionPool(host='overcloud.internalapi.redhat.local', port=5000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7efcf5088850>: Failed to establish a new connection: [Errno 111] ECONNREFUSED'))
2022-07-20 18:35:24.379 2 CRITICAL nova [-] Unhandled error: keystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. Unable to establish connection to https://overcloud.internalapi.redhat.local:5000: HTTPSConnectionPool(host='overcloud.internalapi.redhat.local', port=5000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7efcf5088850>: Failed to establish a new connection: [Errno 111] ECONNREFUSED'))
2022-07-20 18:35:24.379 2 ERROR nova Traceback (most recent call last):
2022-07-20 18:35:24.379 2 ERROR nova File "/usr/lib/python3.9/site-packages/urllib3/connection.py", line 169, in _new_conn
2022-07-20 18:35:24.379 2 ERROR nova conn = connection.create_connection(
2022-07-20 18:35:24.379 2 ERROR nova File "/usr/lib/python3.9/site-packages/urllib3/util/connection.py", line 96, in create_connection
2022-07-20 18:35:24.379 2 ERROR nova raise err
2022-07-20 18:35:24.379 2 ERROR nova File "/usr/lib/python3.9/site-packages/urllib3/util/connection.py", line 86, in create_connection
2022-07-20 18:35:24.379 2 ERROR nova sock.connect(sa)
2022-07-20 18:35:24.379 2 ERROR nova File "/usr/lib/python3.9/site-packages/eventlet/greenio/base.py", line 253, in connect
2022-07-20 18:35:24.379 2 ERROR nova socket_checkerr(fd)
2022-07-20 18:35:24.379 2 ERROR nova File "/usr/lib/python3.9/site-packages/eventlet/greenio/base.py", line 51, in socket_checkerr
2022-07-20 18:35:24.379 2 ERROR nova raise socket.error(err, errno.errorcode[err])
2022-07-20 18:35:24.379 2 ERROR nova ConnectionRefusedError: [Errno 111] ECONNREFUSED
2022-07-20 18:35:24.379 2 ERROR nova
2022-07-20 18:35:24.379 2 ERROR nova During handling of the above exception, another exception occurred:
2022-07-20 18:35:24.379 2 ERROR nova
2022-07-20 18:35:24.379 2 ERROR nova Traceback (most recent call last):
2022-07-20 18:35:24.379 2 ERROR nova File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
2022-07-20 18:35:24.379 2 ERROR nova httplib_response = self._make_request(
2022-07-20 18:35:24.379 2 ERROR nova File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 382, in _make_request
2022-07-20 18:35:24.379 2 ERROR nova self._validate_conn(conn)
2022-07-20 18:35:24.379 2 ERROR nova File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
2022-07-20 18:35:24.379 2 ERROR nova conn.connect()
2022-07-20 18:35:24.379 2 ERROR nova File "/usr/lib/python3.9/site-packages/urllib3/connection.py", line 353, in connect
2022-07-20 18:35:24.379 2 ERROR nova conn = self._new_conn()
2022-07-20 18:35:24.379 2 ERROR nova File "/usr/lib/python3.9/site-packages/urllib3/connection.py", line 181, in _new_conn
2022-07-20 18:35:24.379 2 ERROR nova raise NewConnectionError(
2022-07-20 18:35:24.379 2 ERROR nova urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7efcf5088850>: Failed to establish a new connection: [Errno 111] ECONNREFUSED
2022-07-20 18:35:24.379 2 ERROR nova
2022-07-20 18:35:24.379 2 ERROR nova During handling of the above exception, another exception occurred:
2022-07-20 18:35:24.379 2 ERROR nova
2022-07-20 18:35:24.379 2 ERROR nova Traceback (most recent call last):
2022-07-20 18:35:24.379 2 ERROR nova File "/usr/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
2022-07-20 18:35:24.379 2 ERROR nova resp = conn.urlopen(
2022-07-20 18:35:24.379 2 ERROR nova File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen
2022-07-20 18:35:24.379 2 ERROR nova retries = retries.increment(
2022-07-20 18:35:24.379 2 ERROR nova File "/usr/lib/python3.9/site-packages/urllib3/util/retry.py", line 574, in increment
2022-07-20 18:35:24.379 2 ERROR nova raise MaxRetryError(_pool, url, error or ResponseError(cause))
2022-07-20 18:35:24.379 2 ERROR nova urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='overcloud.internalapi.redhat.local', port=5000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7efcf5088850>: Failed to establish a new connection: [Errno 111] ECONNREFUSED'))
Version-Release number of selected component (if applicable):
RHOS-17
How reproducible:
Always; the restart loop begins whenever keystone access fails.
Steps to Reproduce:
1. This came out of debugging [1]; it should be reproducible by breaking network access to keystone (the internal API endpoint on port 5000) for nova-conductor.
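The ECONNREFUSED in the conductor log can be shown in isolation; this is a hypothetical stand-in (port 1 on localhost in place of the unreachable overcloud.internalapi.redhat.local:5000 endpoint), not the actual reproduction:

```python
import errno
import socket

# Connecting to a port with no listener raises ConnectionRefusedError
# (errno 111, ECONNREFUSED) -- the same low-level error keystoneauth
# wraps into the DiscoveryFailure that kills nova-conductor.
try:
    socket.create_connection(("127.0.0.1", 1), timeout=2)
    refused = False
except ConnectionRefusedError as exc:
    refused = exc.errno == errno.ECONNREFUSED

print("connection refused:", refused)
```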
Actual results:
When access to keystone fails, the nova-conductor container restarts continuously.
Expected results:
When access to keystone fails, nova-conductor should retry the connection rather than letting the whole service exit and restart.
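A minimal sketch of that expected behavior, assuming a hypothetical `call_with_retry` helper wrapped around the auth call (illustrative only, not nova's actual code; the real fix is the upstream review under Additional info):

```python
import time

def call_with_retry(fn, attempts=5, base_delay=0.01, exc=(ConnectionError,)):
    """Retry fn with exponential backoff instead of letting the
    exception propagate and terminate the process (hypothetical)."""
    for i in range(attempts):
        try:
            return fn()
        except exc:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i))

# Simulated keystone client: refuses twice, then succeeds.
state = {"calls": 0}
def fake_authenticate():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError(111, "ECONNREFUSED")
    return "token-abc"

token = call_with_retry(fake_authenticate)
print("authenticated after", state["calls"], "attempts")
```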
Additional info:
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2089512
We're blockers-only for 17.0.1.
Copying to 17.1 at BZ 2152711 to make sure the upstream stable/wallaby backport at [1] makes it into 17.1.
As for 17.0, if there's ever a z2 or a customer asks for it, we can merge and build. Leaving ON_DEV for now.
[1] https://review.opendev.org/c/openstack/nova/+/859002