Introspection in OC gets stuck with ironic composable role.

Environment:
python-ironic-lib-2.14.0-0.20180810074837.344161b.el7ost.noarch
puppet-ironic-13.3.1-0.20181013115248.d85f830.el7ost.noarch
python2-ironicclient-2.5.0-0.20180810135843.fb94fb8.el7ost.noarch
openstack-ironic-inspector-8.0.1-0.20180924215820.e89450c.el7ost.noarch

Steps to reproduce:
1. Deploy OC with introspection enabled and ironic composable role using:
   - OS::TripleO::Services::IronicConductor
   - OS::TripleO::Services::IronicInspector
   - OS::TripleO::Services::IronicPxe
2. Try to introspect nodes in OC.

Result:
The introspection gets stuck, although the node booted into the introspection image.

On the booted node's console:
"error Node state mismatch detected between the DB and the cached node_info object"

Some errors in logs:
2018-11-01 21:46:48.570 1 ERROR ironic_inspector.process (table.description, len(records), rows))
2018-11-01 21:46:48.570 1 ERROR ironic_inspector.process StaleDataError: UPDATE statement on table 'nodes' expected to update 1 row(s); 0 were matched.
2018-11-01 21:46:48.570 1 ERROR ironic_inspector.process
2018-11-01 21:46:48.578 1 ERROR ironic_inspector.utils [req-e730e8e1-4b87-48d6-8105-821433e55226 - - - - -] [node: e66ba519-c596-45bb-8b42-34968914d907 state error] Node state mismatch detected between the DB and the cached node_info object: NoResultFound: No row was found for one()
2018-11-01 21:46:48.581 1 INFO ironic_inspector.process [req-50132c8c-95c2-4c80-8e29-a060657498f2 - - - - -] [node: e66ba519-c596-45bb-8b42-34968914d907 state error MAC 52:54:00:6f:99:98] Ramdisk logs were stored in file e66ba519-c596-45bb-8b42-34968914d907_20181101-214648.580332.tar.gz
2018-11-01 21:46:48.582 1 ERROR ironic_inspector.utils [req-50132c8c-95c2-4c80-8e29-a060657498f2 - - - - -] [node: e66ba519-c596-45bb-8b42-34968914d907 state error MAC 52:54:00:6f:99:98] Unexpected exception ConnectionError during processing: HTTPConnectionPool(host='172.17.3.12', port=8080): Max retries exceeded with url: /v1/AUTH_e54d5948b91d4a98b0ac886358c24c93/ironic-inspector (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1b973f2b90>: Failed to establish a new connection: [Errno 110] ETIMEDOUT',)): ConnectionError: HTTPConnectionPool(host='172.17.3.12', port=8080): Max retries exceeded with url: /v1/AUTH_e54d5948b91d4a98b0ac886358c24c93/ironic-inspector (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1b973f2b90>: Failed to establish a new connection: [Errno 110] ETIMEDOUT',))
The introspection later fails with:

| error    | Unexpected exception ConnectionError during processing: HTTPConnectionPool(host='172.17.3.12', port=8080): Max retries exceeded with url: /v1/AUTH_e54d5948b91d4a98b0ac886358c24c93/ironic-inspector (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1b973f2b90>: Failed to establish a new connection: [Errno 110] ETIMEDOUT',)) |
| finished | True |
Created attachment 1500262: ironic logs
Is 172.17.3.12:8080 the correct Swift endpoint? Is it public or internal?
(overcloud) [stack@undercloud-0 ~]$ openstack catalog show swift
+-----------+------------------------------------------------------------------------------+
| Field     | Value                                                                        |
+-----------+------------------------------------------------------------------------------+
| endpoints | regionOne                                                                    |
|           |   public: https://10.0.0.101:13808/v1/AUTH_04039b5f242c403da838cea741856684  |
|           | regionOne                                                                    |
|           |   admin: http://172.17.3.12:8080                                             |
|           | regionOne                                                                    |
|           |   internal: http://172.17.3.12:8080/v1/AUTH_04039b5f242c403da838cea741856684 |
|           |                                                                              |
| id        | 94f0bbc345fd44898f2a51a86b912c7f                                             |
| name      | swift                                                                        |
| type      | object-store                                                                 |
+-----------+------------------------------------------------------------------------------+
It seems we either need to add the Storage network to the ironic nodes (nodes with the ironic role) or document that OS::TripleO::Services::IronicInspector must stay on the controllers?
By default we have valid interfaces internal and public:

# List of interfaces, in order of preference, for endpoint URL. (list
# value)
#valid_interfaces = internal,public

The auth URL uses the internal endpoint:
https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/ironic-inspector.yaml#L220

Why do we need the storage network?

From the logs above it timed out connecting to the internal endpoint?
HTTPConnectionPool(host='172.17.3.12', port=8080): Max retries exceeded with url: /v1/AUTH_e54d5948b91d4a98b0ac886358c24c93/ironic-inspector
(In reply to Harald Jensås from comment #6)
> By default we have valid interfaces internal and public:
> # List of interfaces, in order of preference, for endpoint URL. (list
> # value)
> #valid_interfaces = internal,public
>
> The auth URL uses the internal endpoint:
>
> https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/ironic-inspector.yaml#L220
>
> Why do we need the storage network?
>
> From the logs above it timed out connecting to the internal endpoint?
> HTTPConnectionPool(host='172.17.3.12', port=8080): Max retries exceeded with url: /v1/AUTH_e54d5948b91d4a98b0ac886358c24c93/ironic-inspector

<sasha> hjensas: InternalApiNetCidr: 172.17.1.0/24, StorageNetCidr: 172.17.3.0/24

So the internal and admin endpoints are on the storage network. Adding the storage network to the role when it includes the Inspector is a good fix.

The Keystone internal endpoint is used by default for ironic::inspector::swift_auth_url; overriding this to use the public endpoint might also work.
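The reasoning above can be checked mechanically. This is a minimal sketch using Python's standard `ipaddress` module; the CIDRs and the host address are the ones quoted in this report, while the `NETWORKS` table and the `network_for` helper are names made up for illustration.

```python
import ipaddress

# CIDRs quoted in this report for the overcloud networks.
NETWORKS = {
    "InternalApi": ipaddress.ip_network("172.17.1.0/24"),
    "Storage": ipaddress.ip_network("172.17.3.0/24"),
}


def network_for(host):
    """Return the name of the network that contains host, or None."""
    address = ipaddress.ip_address(host)
    for name, network in NETWORKS.items():
        if address in network:
            return name
    return None


# Host taken from the failing HTTPConnectionPool URL in the logs.
print(network_for("172.17.3.12"))  # Storage
```

The Swift internal endpoint host falls in StorageNetCidr, so a node that only has a leg in InternalApi cannot reach it.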
Okay, so we have three options:
1. Add storage network to the ironic-inspector node.
2. Change [swift]valid_interfaces to "public".
3. Set endpoint_override for swift explicitly.

I'd prefer to avoid #3, since this is not how things are designed to work. Does anyone have opinions on which of #1 and #2 is preferred? Ramon?

> Keystone internal endpoint is used by default for ironic::inspector::swift_auth_url, overriding this to use the public endpoint might also work

I doubt it. I don't think they change the catalog based on which endpoint you use to hit it.
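To illustrate what option #2 would change, here is a simplified model of "interfaces in order of preference". It is not the actual keystoneauth code, only a sketch of the selection rule described in the config comment; `SWIFT_ENDPOINTS` is copied from the catalog output in this report, and `select_endpoint` is a hypothetical helper.

```python
# Endpoints copied from the `openstack catalog show swift` output in this report.
SWIFT_ENDPOINTS = {
    "public": "https://10.0.0.101:13808/v1/AUTH_04039b5f242c403da838cea741856684",
    "admin": "http://172.17.3.12:8080",
    "internal": "http://172.17.3.12:8080/v1/AUTH_04039b5f242c403da838cea741856684",
}


def select_endpoint(endpoints, valid_interfaces):
    """Return (interface, url) for the first preferred interface that exists."""
    for interface in valid_interfaces:
        if interface in endpoints:
            return interface, endpoints[interface]
    raise LookupError("no endpoint for interfaces: %s" % ", ".join(valid_interfaces))


# Default preference order: the internal endpoint (storage network) wins.
print(select_endpoint(SWIFT_ENDPOINTS, ["internal", "public"]))
# Option #2: only the public endpoint is considered.
print(select_endpoint(SWIFT_ENDPOINTS, ["public"]))
```

Under this model, the default order always yields the 172.17.3.12 address as long as an internal endpoint is registered, which is consistent with the timeout seen in the logs.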
By the way, Ironic also accesses Swift with the direct deploy interface. So I think this problem is not limited to Inspector. Would it be enough to add Storage to https://github.com/openstack/tripleo-heat-templates/blob/master/roles/IronicConductor.yaml#L8? Sasha, do you know?
Adding the storage network to the role used by ironic node(s) and actually having a leg in that network on the respective host(s) is enough. Was able to successfully introspect nodes in OC.
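One way to confirm that a host actually has a working leg in the storage network is a plain TCP connection attempt to the Swift internal endpoint. The following is a minimal sketch using only the standard `socket` module; `can_connect` is a hypothetical helper, and the check says nothing about Swift itself beyond the port accepting connections.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False
```

Run on the node carrying the ironic role, `can_connect("172.17.3.12", 8080)` would be expected to return False before the storage network is added (matching the ETIMEDOUT in the logs) and True afterwards.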
We can adjust /usr/share/openstack-tripleo-heat-templates/roles/IronicConductor.yaml by:
1. Adding the storage network.
2. Adding '- OS::TripleO::Services::IronicInspector'.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0446