Bug 1645307 - Introspection in OC fails with ironic composable role.
Summary: Introspection in OC fails with ironic composable role.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z1
Target Release: 14.0 (Rocky)
Assignee: Harald Jensås
QA Contact: mlammon
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-11-01 21:57 UTC by Alexander Chuzhoy
Modified: 2019-03-18 13:03 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-9.2.1-0.20190119154860.fe11ade.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-18 13:03:10 UTC
Target Upstream Version:
Embargoed:


Attachments
ironic logs (70.04 KB, application/x-xz)
2018-11-01 22:05 UTC, Alexander Chuzhoy


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
OpenStack gerrit        617987          0        None      None    None     2018-11-14 15:39:44 UTC
OpenStack gerrit        621456          0        None      None    None     2018-12-04 16:25:51 UTC
Red Hat Product Errata  RHBA-2019:0446  0        None      None    None     2019-03-18 13:03:17 UTC

Description Alexander Chuzhoy 2018-11-01 21:57:20 UTC
Introspection in OC gets stuck with ironic composable role.

Environment:

python-ironic-lib-2.14.0-0.20180810074837.344161b.el7ost.noarch
puppet-ironic-13.3.1-0.20181013115248.d85f830.el7ost.noarch
python2-ironicclient-2.5.0-0.20180810135843.fb94fb8.el7ost.noarch
openstack-ironic-inspector-8.0.1-0.20180924215820.e89450c.el7ost.noarch


Steps to reproduce:
1. Deploy OC with introspection enabled and ironic composable role using:
    - OS::TripleO::Services::IronicConductor
    - OS::TripleO::Services::IronicInspector
    - OS::TripleO::Services::IronicPxe

2. Try to introspect nodes in OC.

Result:

Introspection gets stuck even though the node boots into the introspection image.
On the booted node's console:
"error Node state mismatch detected between the DB and the cached node_info object"


Some errors in logs:
2018-11-01 21:46:48.570 1 ERROR ironic_inspector.process     (table.description, len(records), rows))
2018-11-01 21:46:48.570 1 ERROR ironic_inspector.process StaleDataError: UPDATE statement on table 'nodes' expected to update 1 row(s); 0 were matched.
2018-11-01 21:46:48.570 1 ERROR ironic_inspector.process 
2018-11-01 21:46:48.578 1 ERROR ironic_inspector.utils [req-e730e8e1-4b87-48d6-8105-821433e55226 - - - - -] [node: e66ba519-c596-45bb-8b42-34968914d907 state error] Node state mismatch detected between the DB and the cached node_info object: NoResultFound: No row was found for one()
2018-11-01 21:46:48.581 1 INFO ironic_inspector.process [req-50132c8c-95c2-4c80-8e29-a060657498f2 - - - - -] [node: e66ba519-c596-45bb-8b42-34968914d907 state error MAC 52:54:00:6f:99:98] Ramdisk logs were stored in file e66ba519-c596-45bb-8b42-34968914d907_20181101-214648.580332.tar.gz
2018-11-01 21:46:48.582 1 ERROR ironic_inspector.utils [req-50132c8c-95c2-4c80-8e29-a060657498f2 - - - - -] [node: e66ba519-c596-45bb-8b42-34968914d907 state error MAC 52:54:00:6f:99:98] Unexpected exception ConnectionError during processing: HTTPConnectionPool(host='172.17.3.12', port=8080): Max retries exceeded with url: /v1/AUTH_e54d5948b91d4a98b0ac886358c24c93/ironic-inspector (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1b973f2b90>: Failed to establish a new connection: [Errno 110] ETIMEDOUT',)): ConnectionError: HTTPConnectionPool(host='172.17.3.12', port=8080): Max retries exceeded with url: /v1/AUTH_e54d5948b91d4a98b0ac886358c24c93/ironic-inspector (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1b973f2b90>: Failed to establish a new connection: [Errno 110] ETIMEDOUT',))

Comment 1 Alexander Chuzhoy 2018-11-01 21:59:35 UTC
The introspection later fails with:
 
| error       | Unexpected exception ConnectionError during processing: HTTPConnectionPool(host='172.17.3.12', port=8080): Max retries exceeded with url: /v1/AUTH_e54d5948b91d4a98b0ac886358c24c93/ironic-inspector (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1b973f2b90>: Failed to establish a new connection: [Errno 110] ETIMEDOUT',)) |
| finished    | True |

Comment 2 Alexander Chuzhoy 2018-11-01 22:05:50 UTC
Created attachment 1500262
ironic logs

Comment 3 Dmitry Tantsur 2018-11-05 11:42:45 UTC
Is 172.17.3.12:8080 the correct Swift endpoint? Is it public or internal?

Comment 4 Alexander Chuzhoy 2018-11-05 17:00:16 UTC
(overcloud) [stack@undercloud-0 ~]$ openstack catalog show swift
+-----------+------------------------------------------------------------------------------+
| Field     | Value                                                                        |
+-----------+------------------------------------------------------------------------------+
| endpoints | regionOne                                                                    |
|           |   public: https://10.0.0.101:13808/v1/AUTH_04039b5f242c403da838cea741856684  |
|           | regionOne                                                                    |
|           |   admin: http://172.17.3.12:8080                                             |
|           | regionOne                                                                    |
|           |   internal: http://172.17.3.12:8080/v1/AUTH_04039b5f242c403da838cea741856684 |
|           |                                                                              |
| id        | 94f0bbc345fd44898f2a51a86b912c7f                                             |
| name      | swift                                                                        |
| type      | object-store                                                                 |
+-----------+------------------------------------------------------------------------------+

Comment 5 Alexander Chuzhoy 2018-11-05 17:09:54 UTC
It seems we should either add the Storage network to the ironic nodes (nodes with the ironic role) or document that OS::TripleO::Services::IronicInspector must stay on the controllers?

Comment 6 Harald Jensås 2018-11-06 16:56:14 UTC
By default we have valid interfaces internal and public:
  # List of interfaces, in order of preference, for endpoint URL. (list
  # value)
  #valid_interfaces = internal,public

The auth URL uses the internal endpoint:
  https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/ironic-inspector.yaml#L220



Why do we need the storage network?

From the logs above it timed out connecting to the internal endpoint?
 HTTPConnectionPool(host='172.17.3.12', port=8080): Max retries exceeded with url: /v1/AUTH_e54d5948b91d4a98b0ac886358c24c93/ironic-inspector
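
To make the question above concrete, here is a minimal Python sketch that compares the host and port from the failing log line with the Swift endpoints listed in comment 4. The dictionary and the helper name interfaces_for are illustrative assumptions, not part of any OpenStack tool. Only host and port are compared, because the AUTH_ project ID in the log differs from the one in the catalog output.

```python
from urllib.parse import urlsplit

# Endpoints copied from the `openstack catalog show swift` output in comment 4.
SWIFT_ENDPOINTS = {
    "public": "https://10.0.0.101:13808/v1/AUTH_04039b5f242c403da838cea741856684",
    "admin": "http://172.17.3.12:8080",
    "internal": "http://172.17.3.12:8080/v1/AUTH_04039b5f242c403da838cea741856684",
}


def interfaces_for(host, port):
    """Return the catalog interfaces whose URL points at host:port (hypothetical helper)."""
    matches = []
    for name, url in SWIFT_ENDPOINTS.items():
        parts = urlsplit(url)
        if parts.hostname == host and parts.port == port:
            matches.append(name)
    return sorted(matches)


# Host and port taken from the HTTPConnectionPool(...) error in the logs.
print(interfaces_for("172.17.3.12", 8080))
```

Under these assumptions the failing address matches the internal and admin interfaces, not the public one.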

Comment 7 Harald Jensås 2018-11-06 17:42:54 UTC
(In reply to Harald Jensås from comment #6)
> By default we have valid interfaces internal and public:
>   # List of interfaces, in order of preference, for endpoint URL. (list
>   # value)
>   #valid_interfaces = internal,public
> 
> The auth URL uses the internal endpoint:
> https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/ironic-inspector.yaml#L220
> 
> 
> 
> Why do we need the storage network?
> 
> From the logs above it timed out connecting to the internal endpoint?
>  HTTPConnectionPool(host='172.17.3.12', port=8080): Max retries exceeded with url: /v1/AUTH_e54d5948b91d4a98b0ac886358c24c93/ironic-inspector

<sasha> hjensas: InternalApiNetCidr: 172.17.1.0/24, StorageNetCidr: 172.17.3.0/24

So the internal and admin endpoints are on the storage network.
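
The conclusion can be checked with a small sketch using Python's standard ipaddress module. The two CIDRs are the ones quoted above. The helper name networks_containing is a hypothetical name chosen for this illustration.

```python
import ipaddress

# CIDRs quoted in this comment; the keys mirror the TripleO parameter names.
NETWORKS = {
    "InternalApiNetCidr": ipaddress.ip_network("172.17.1.0/24"),
    "StorageNetCidr": ipaddress.ip_network("172.17.3.0/24"),
}


def networks_containing(host):
    """Return the names of the networks whose CIDR contains the given IP (hypothetical helper)."""
    addr = ipaddress.ip_address(host)
    return [name for name, net in NETWORKS.items() if addr in net]


# 172.17.3.12 is the Swift host from the failing connection in the logs.
print(networks_containing("172.17.3.12"))
```

The Swift host falls inside StorageNetCidr only, so a role without the Storage network has no direct route to it.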


Adding the storage network to the role when it includes the Inspector is a good fix.

The Keystone internal endpoint is used by default for ironic::inspector::swift_auth_url; overriding this to use the public endpoint might also work.

Comment 8 Dmitry Tantsur 2018-11-08 12:12:51 UTC
Okay, so we have three options:

1. Add storage network to the ironic-inspector node.
2. Change [swift]valid_interfaces to "public".
3. Set endpoint_override for swift explicitly.

I'd prefer to avoid #3, since this is not how things are designed to work. Does anyone have opinions on which of #1 and #2 is preferred? Ramon?
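
For illustration only, option 2 amounts to the following change in an INI-style configuration. This is a rough sketch using Python's standard configparser on a made-up sample string. The function name prefer_public_swift and the sample text are assumptions, and in a real TripleO deployment the value would be set through the deployment tooling rather than by rewriting the file like this.

```python
import configparser
import io


def prefer_public_swift(conf_text):
    """Return conf_text with [swift] valid_interfaces set to "public" (hypothetical helper).

    Note: configparser drops comment lines when writing the result back out.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("swift"):
        parser.add_section("swift")
    parser.set("swift", "valid_interfaces", "public")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Made-up sample mirroring the commented-out default quoted in comment 6.
SAMPLE = "[swift]\n#valid_interfaces = internal,public\n"
print(prefer_public_swift(SAMPLE))
```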

> Keystone internal endpoint is used by default for ironic::inspector::swift_auth_url, overriding this to use the public endpoint might also work

I doubt it. I don't think they change the catalog based on which endpoint you use to hit it.

Comment 9 Dmitry Tantsur 2018-11-09 10:21:19 UTC
By the way, Ironic also accesses Swift when using the direct deploy interface, so I think this problem is not limited to Inspector. Would it be enough to add Storage to https://github.com/openstack/tripleo-heat-templates/blob/master/roles/IronicConductor.yaml#L8? Sasha, do you know?

Comment 10 Alexander Chuzhoy 2018-11-14 14:55:47 UTC
Adding the storage network to the role used by ironic node(s) and actually having a leg in that network on the respective host(s) is enough.
Was able to successfully introspect nodes in OC.
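
One way to confirm that a host really has a leg in the network is a plain TCP connect to the Swift endpoint from that host. The sketch below uses only the standard socket module. The function name can_connect and the default timeout are assumptions for illustration. It checks TCP reachability only, not that Swift answers correctly.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


# Example, run on the node hosting the ironic role, with the endpoint from the logs:
#   can_connect("172.17.3.12", 8080)
```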

Comment 11 Alexander Chuzhoy 2018-11-14 15:03:20 UTC
We can adjust /usr/share/openstack-tripleo-heat-templates/roles/IronicConductor.yaml by:
1. Adding the Storage network.
2. Adding '- OS::TripleO::Services::IronicInspector'.

Comment 16 errata-xmlrpc 2019-03-18 13:03:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0446

