Bug 2109646

Summary: [RHOS-17] Nova-conductor container service continuously restarts if it cannot reach keystone
Product: Red Hat OpenStack
Reporter: James Parker <jparker>
Component: openstack-nova
Assignee: melanie witt <mwitt>
Status: MODIFIED ---
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: medium
Docs Contact:
Priority: medium
Version: 17.0 (Wallaby)
CC: alifshit, bdobreli, bgibizer, dasmith, eglynn, jhakimra, kchamart, mwitt, pweeks, sbauza, sgordon, vromanso
Target Milestone: z2
Keywords: Patch, Triaged
Target Release: 17.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-nova-23.2.2-0.20230126210326.7074ac0.el9osttrunk
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2129207
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2129207
Bug Blocks:

Description James Parker 2022-07-21 17:24:06 UTC
Description of problem:
The nova-conductor container restarts continuously if it cannot reach keystone. Access to keystone is important for the service, but the service should not restart indefinitely while keystone is unreachable.

# Tested in multi-cell env:
[root@cell1-controller-0 heat-admin]# podman ps -a | grep nova_conductor
WARN[0000]  binary not found, container dns will not be enabled 
3661ce0e351f  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-nova-conductor:17.0_20220714.1   /bin/bash -c chow...  2 hours ago  Exited (0) 2 hours ago                             nova_conductor_init_log
6714ba9d5a69  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-nova-conductor:17.0_20220714.1   kolla_start           2 hours ago  Up Less than a second ago (unhealthy)              nova_conductor

[root@cell1-controller-0 heat-admin]# podman logs nova_conductor (continuously restarts)
...
+ exec /usr/bin/nova-conductor
+ sudo -E kolla_set_configs
sudo: unable to send audit message: Operation not permitted
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/my.cnf.d/tripleo.cnf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/my.cnf.d/tripleo.cnf to /etc/my.cnf.d/tripleo.cnf
INFO:__main__:Deleting /etc/nova/nova.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/nova/nova.conf to /etc/nova/nova.conf
INFO:__main__:Writing out command to execute
INFO:__main__:Setting permission for /var/log/nova
INFO:__main__:Setting permission for /var/log/nova/nova-manage.log
INFO:__main__:Setting permission for /var/log/nova/nova-conductor.log
INFO:__main__:Setting permission for /var/log/nova/nova-novncproxy.log
INFO:__main__:Setting permission for /var/log/nova/nova-metadata-api.log
INFO:__main__:Setting permission for /var/log/nova/nova-conductor.log.1
++ cat /run_command
+ CMD='/usr/bin/nova-conductor '
+ ARGS=
+ [[ ! -n '' ]]
+ . kolla_extend_start
+ echo 'Running command: '\''/usr/bin/nova-conductor '\'''
Running command: '/usr/bin/nova-conductor '
+ exec /usr/bin/nova-conductor

# From /var/log/containers/nova/nova-conductor.log
2022-07-20 18:35:24.376 2 WARNING keystoneauth.identity.generic.base [-] Failed to discover available identity versions when contacting https://overcloud.internalapi.redhat.local:5000. Attempting to parse version from URL.: keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to https://overcloud.internalapi.redhat.local:5000: HTTPSConnectionPool(host='overcloud.internalapi.redhat.local', port=5000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7efcf5088850>: Failed to establish a new connection: [Errno 111] ECONNREFUSED'))
2022-07-20 18:35:24.379 2 CRITICAL nova [-] Unhandled error: keystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. Unable to establish connection to https://overcloud.internalapi.redhat.local:5000: HTTPSConnectionPool(host='overcloud.internalapi.redhat.local', port=5000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7efcf5088850>: Failed to establish a new connection: [Errno 111] ECONNREFUSED'))
2022-07-20 18:35:24.379 2 ERROR nova Traceback (most recent call last):
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/connection.py", line 169, in _new_conn
2022-07-20 18:35:24.379 2 ERROR nova     conn = connection.create_connection(
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/util/connection.py", line 96, in create_connection
2022-07-20 18:35:24.379 2 ERROR nova     raise err
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/util/connection.py", line 86, in create_connection
2022-07-20 18:35:24.379 2 ERROR nova     sock.connect(sa)
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/eventlet/greenio/base.py", line 253, in connect
2022-07-20 18:35:24.379 2 ERROR nova     socket_checkerr(fd)
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/eventlet/greenio/base.py", line 51, in socket_checkerr
2022-07-20 18:35:24.379 2 ERROR nova     raise socket.error(err, errno.errorcode[err])
2022-07-20 18:35:24.379 2 ERROR nova ConnectionRefusedError: [Errno 111] ECONNREFUSED
2022-07-20 18:35:24.379 2 ERROR nova
2022-07-20 18:35:24.379 2 ERROR nova During handling of the above exception, another exception occurred:
2022-07-20 18:35:24.379 2 ERROR nova
2022-07-20 18:35:24.379 2 ERROR nova Traceback (most recent call last):
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
2022-07-20 18:35:24.379 2 ERROR nova     httplib_response = self._make_request(
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 382, in _make_request
2022-07-20 18:35:24.379 2 ERROR nova     self._validate_conn(conn)
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
2022-07-20 18:35:24.379 2 ERROR nova     conn.connect()
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/connection.py", line 353, in connect
2022-07-20 18:35:24.379 2 ERROR nova     conn = self._new_conn()
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/connection.py", line 181, in _new_conn
2022-07-20 18:35:24.379 2 ERROR nova     raise NewConnectionError(
2022-07-20 18:35:24.379 2 ERROR nova urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7efcf5088850>: Failed to establish a new connection: [Errno 111] ECONNREFUSED
2022-07-20 18:35:24.379 2 ERROR nova
2022-07-20 18:35:24.379 2 ERROR nova During handling of the above exception, another exception occurred:
2022-07-20 18:35:24.379 2 ERROR nova
2022-07-20 18:35:24.379 2 ERROR nova Traceback (most recent call last):
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
2022-07-20 18:35:24.379 2 ERROR nova     resp = conn.urlopen(
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen
2022-07-20 18:35:24.379 2 ERROR nova     retries = retries.increment(
2022-07-20 18:35:24.379 2 ERROR nova   File "/usr/lib/python3.9/site-packages/urllib3/util/retry.py", line 574, in increment
2022-07-20 18:35:24.379 2 ERROR nova     raise MaxRetryError(_pool, url, error or ResponseError(cause))
2022-07-20 18:35:24.379 2 ERROR nova urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='overcloud.internalapi.redhat.local', port=5000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7efcf5088850>: Failed to establish a new connection: [Errno 111] ECONNREFUSED'))

Version-Release number of selected component (if applicable):
RHOS-17

How reproducible:
Always, whenever nova-conductor cannot reach keystone.

Steps to Reproduce:
1. Found while debugging [1]. It should be reproducible by breaking network access from nova-conductor to keystone.
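Before attributing the restarts to this bug, it can help to confirm that the keystone endpoint really is unreachable from the controller. The following is a minimal sketch of such a check, not part of the original report: the function name `endpoint_reachable` is hypothetical, and the host and port in the usage note are the ones that appear in the nova-conductor log above.

```python
import socket


def endpoint_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration only. It checks TCP reachability
    and nothing else (no TLS handshake, no keystone API call).
    """
    try:
        # create_connection raises OSError subclasses such as
        # ConnectionRefusedError (ECONNREFUSED) or a timeout on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `endpoint_reachable("overcloud.internalapi.redhat.local", 5000)` would be expected to return False on a node in the state shown in the log above, where connections are refused.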

Actual results:
When access to keystone fails, the nova-conductor container restarts continuously.

Expected results:
When access to keystone fails, nova-conductor should retry the connection rather than restarting the entire service.
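The expected behaviour can be illustrated with a generic retry-with-backoff wrapper. This is only a sketch under stated assumptions and is not the change tracked by this bug: `call_with_retry` and its parameters are hypothetical names, and a real service would pass the relevant keystoneauth exception (for example the `keystoneauth1.exceptions.connection.ConnectFailure` seen in the log above) as `retry_on` instead of the built-in `ConnectionError`.

```python
import time


def call_with_retry(func, retry_on=(ConnectionError,), max_attempts=5,
                    base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call func() until it succeeds, backing off between failed attempts.

    Hypothetical helper for illustration. The delay doubles after each
    failure, capped at max_delay. The last failure is re-raised so the
    caller can decide what to do instead of the process exiting at once.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

With the defaults, a call that keeps failing is attempted five times with waits of 1, 2, 4 and 8 seconds in between, after which the original exception propagates to the caller.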


Additional info:
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2089512

Comment 14 Artom Lifshitz 2022-12-12 19:02:30 UTC
We're blockers-only for 17.0.1.

Copying to 17.1 at BZ 2152711 to make sure the upstream stable/wallaby backport at [1] makes it into 17.1.

As for 17.0, if there's ever a z2 or a customer asks for it, we can merge and build. Leaving ON_DEV for now.

[1] https://review.opendev.org/c/openstack/nova/+/859002

Comment 15 pweeks 2023-01-23 16:17:04 UTC
Removing DF. If you need our assistance, please reach out.