Bug 2152711

Summary: [RHOS-17] Nova-conductor container service continuously restarts if it cannot reach keystone
Product: Red Hat OpenStack
Reporter: Artom Lifshitz <alifshit>
Component: openstack-nova
Assignee: OSP DFG:Compute <osp-dfg-compute>
Status: CLOSED DUPLICATE
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 17.1 (Wallaby)
CC: dasmith, eglynn, jhakimra, kchamart, mwitt, sbauza, sgordon, vromanso
Target Milestone: z1
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-01-04 03:09:33 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Artom Lifshitz 2022-12-12 18:59:40 UTC
This bug was initially created as a copy of Bug #2109646

I am copying this bug because: 

Need to fix in 17.1.

Description of problem: The nova-conductor container restarts continuously when it cannot reach keystone. Access to keystone is important for the service, but the service should not restart indefinitely when keystone is unreachable.

# Tested in multi-cell env:
[root@cell1-controller-0 heat-admin]# podman ps -a | grep nova_conductor
WARN[0000]  binary not found, container dns will not be enabled 
3661ce0e351f  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-nova-conductor:17.0_20220714.1   /bin/bash -c chow...  2 hours ago  Exited (0) 2 hours ago                             nova_conductor_init_log
6714ba9d5a69  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-nova-conductor:17.0_20220714.1   kolla_start           2 hours ago  Up Less than a second ago (unhealthy)              nova_conductor

[root@cell1-controller-0 heat-admin]# podman logs nova_conductor (continuously restarts)
...
+ exec /usr/bin/nova-conductor
+ sudo -E kolla_set_configs
sudo: unable to send audit message: Operation not permitted
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/my.cnf.d/tripleo.cnf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/my.cnf.d/tripleo.cnf to /etc/my.cnf.d/tripleo.cnf
INFO:__main__:Deleting /etc/nova/nova.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/nova/nova.conf to /etc/nova/nova.conf
INFO:__main__:Writing out command to execute
INFO:__main__:Setting permission for /var/log/nova
INFO:__main__:Setting permission for /var/log/nova/nova-manage.log
INFO:__main__:Setting permission for /var/log/nova/nova-conductor.log
INFO:__main__:Setting permission for /var/log/nova/nova-novncproxy.log
INFO:__main__:Setting permission for /var/log/nova/nova-metadata-api.log
INFO:__main__:Setting permission for /var/log/nova/nova-conductor.log.1
++ cat /run_command
+ CMD='/usr/bin/nova-conductor '
+ ARGS=
+ [[ ! -n '' ]]
+ . kolla_extend_start
+ echo 'Running command: '\''/usr/bin/nova-conductor '\'''
Running command: '/usr/bin/nova-conductor '
+ exec /usr/bin/nova-conductor
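
To put a number on the restart loop, the start lines in the log above can be counted. This is only a rough sketch: the function name is made up for illustration, and it assumes each container start prints exactly one "+ exec /usr/bin/nova-conductor" line, as in the excerpt.

```python
def count_service_starts(log_text: str,
                         marker: str = "+ exec /usr/bin/nova-conductor") -> int:
    """Count lines that exactly match the service start marker.

    Assumption: the kolla start script prints the marker once per container
    start (this is what the excerpt above shows).
    """
    return sum(1 for line in log_text.splitlines() if line.strip() == marker)
```

The input would be the captured text of `podman logs nova_conductor`. A count that keeps growing between two captures suggests the container is still restarting.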

# From /var/log/containers/nova/nova-conductor.log
2022-07-20 18:35:24.376 2 WARNING keystoneauth.identity.generic.base [-] Failed to discover available identity versions when contacting https://overcloud.internalapi.redhat.local:5000. Attempting to parse version from URL.: keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to https://overcloud.internalapi.redhat.local:5000: HTTPSConnectionPool(host='overcloud.internalapi.redhat.local', port=5000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7efcf5088850>: Failed to establish a new connection: [Errno 111] ECONNREFUSED'))
2022-07-20 18:35:24.379 2 CRITICAL nova [-] Unhandled error: keystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. Unable to establish connection to https://overcloud.internalapi.redhat.local:5000: HTTPSConnectionPool(host='overcloud.internalapi.redhat.local', port=5000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7efcf5088850>: Failed to establish a new connection: [Errno 111] ECONNREFUSED'))
2022-07-20 18:35:24.379 2 ERROR nova Traceback (most recent call last):
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/connection.py", line 169, in _new_conn
2022-07-20 18:35:24.379 2 ERROR nova     conn = connection.create_connection(
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/util/connection.py", line 96, in create_connection
2022-07-20 18:35:24.379 2 ERROR nova     raise err
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/util/connection.py", line 86, in create_connection
2022-07-20 18:35:24.379 2 ERROR nova     sock.connect(sa)
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/eventlet/greenio/base.py", line 253, in connect
2022-07-20 18:35:24.379 2 ERROR nova     socket_checkerr(fd)
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/eventlet/greenio/base.py", line 51, in socket_checkerr
2022-07-20 18:35:24.379 2 ERROR nova     raise socket.error(err, errno.errorcode[err])
2022-07-20 18:35:24.379 2 ERROR nova ConnectionRefusedError: [Errno 111] ECONNREFUSED
2022-07-20 18:35:24.379 2 ERROR nova
2022-07-20 18:35:24.379 2 ERROR nova During handling of the above exception, another exception occurred:
2022-07-20 18:35:24.379 2 ERROR nova
2022-07-20 18:35:24.379 2 ERROR nova Traceback (most recent call last):
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
2022-07-20 18:35:24.379 2 ERROR nova     httplib_response = self._make_request(
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 382, in _make_request
2022-07-20 18:35:24.379 2 ERROR nova     self._validate_conn(conn)
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
2022-07-20 18:35:24.379 2 ERROR nova     conn.connect()
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/connection.py", line 353, in connect
2022-07-20 18:35:24.379 2 ERROR nova     conn = self._new_conn()
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/connection.py", line 181, in _new_conn
2022-07-20 18:35:24.379 2 ERROR nova     raise NewConnectionError(
2022-07-20 18:35:24.379 2 ERROR nova urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7efcf5088850>: Failed to establish a new connection: [Errno 111] ECONNREFUSED
2022-07-20 18:35:24.379 2 ERROR nova
2022-07-20 18:35:24.379 2 ERROR nova During handling of the above exception, another exception occurred:
2022-07-20 18:35:24.379 2 ERROR nova
2022-07-20 18:35:24.379 2 ERROR nova Traceback (most recent call last):
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
2022-07-20 18:35:24.379 2 ERROR nova     resp = conn.urlopen(
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen
2022-07-20 18:35:24.379 2 ERROR nova     retries = retries.increment(
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/util/retry.py", line 574, in increment
2022-07-20 18:35:24.379 2 ERROR nova     raise MaxRetryError(_pool, url, error or ResponseError(cause))
2022-07-20 18:35:24.379 2 ERROR nova urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='overcloud.internalapi.redhat.local', port=5000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7efcf5088850>: Failed to establish a new connection: [Errno 111] ECONNREFUSED'))

Version-Release number of selected component (if applicable):
RHOS-17

How reproducible:
Happens whenever access to keystone fails.

Steps to Reproduce:
1. Break network access to keystone for nova-conductor. (Observed while debugging [1]; this should be enough to reproduce it.)
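
To confirm that keystone is actually unreachable from the conductor host before looking at the container, a plain TCP probe is enough. A minimal sketch follows; the function name is hypothetical, and it only checks that the port accepts a connection, not that keystone answers correctly over TLS.

```python
import socket


def keystone_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Connection refused, timeouts, and name resolution failures are all
    subclasses of OSError, so each of them is reported as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the environment in this report the call would be `keystone_reachable("overcloud.internalapi.redhat.local", 5000)`, using the host and port from the log.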

Actual results:
When access to keystone fails, the nova-conductor container restarts continuously.

Expected results:
When access to keystone fails, the service should retry the connection instead of restarting entirely.
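
The expected behaviour could look roughly like the sketch below: retry the failing call with exponential backoff inside the process instead of exiting. This is not nova or keystoneauth code. The function name and parameters are hypothetical, and the builtin ConnectionError stands in for keystoneauth1's ConnectFailure, which is a different exception class.

```python
import time


def call_with_retry(func, attempts=5, base_delay=1.0, max_delay=30.0,
                    sleep=time.sleep):
    """Call func(), retrying on ConnectionError with exponential backoff.

    Assumption: ConnectionError is a stand-in for the real keystone
    connection failure. The last failure is re-raised once the attempts
    are used up, so the caller still sees the error.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts:
                raise
            sleep(delay)
            # Double the wait each time, capped at max_delay.
            delay = min(delay * 2, max_delay)
```

The `sleep` parameter is there only so the waiting can be replaced in a test. A long-running service would more likely keep retrying with a capped delay than give up after a fixed number of attempts.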


Additional info:
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2089512

Comment 2 melanie witt 2023-01-04 03:09:33 UTC

*** This bug has been marked as a duplicate of bug 2129207 ***