Bug 2039515 - [osp17]Internal Public endpoints connection are not getting setup properly by tripleo for ironic-inspector
Summary: [osp17]Internal Public endpoints connection are not getting setup properly by...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: Alpha
Target Release: 17.0
Assignee: Julia Kreger
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-01-11 20:36 UTC by Paras Babbar
Modified: 2022-09-21 12:18 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-14.3.1-0.20220204022106.9b9ecb3.el9ost openstack-ironic-inspector-10.6.2-0.20220118051837.8f97076.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-21 12:18:11 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 824251 | 0 | None | MERGED | Add ironic-inspector TLS endpoint port to be reachable | 2022-01-31 20:51:03 UTC
OpenStack gerrit | 824955 | 0 | None | MERGED | Add ironic-inspector TLS endpoint port to be reachable | 2022-01-31 20:51:05 UTC
Red Hat Issue Tracker | OSP-12099 | 0 | None | None | None | 2022-01-11 20:43:33 UTC
Red Hat Product Errata | RHEA-2022:6543 | 0 | None | None | None | 2022-09-21 12:18:34 UTC

Description Paras Babbar 2022-01-11 20:36:18 UTC
Description of problem:

Ironic in the overcloud (OC) is broken in OSP 17: introspection fails because of what looks like connection leakage, and our assumption is that it might be caused by an iptables or firewall rule.

First it failed here:

failed: [undercloud-0] (item=ironic-0) => {
    "ansible_loop_var": "item",
    "changed": true,
    "cmd": "source /home/stack/overcloudrc\nopenstack baremetal introspection start --wait ironic-0\n",
    "delta": "0:02:15.574174",
    "end": "2022-01-11 18:22:46.488638",
    "item": "ironic-0",
    "rc": 1,
    "start": "2022-01-11 18:20:30.914464"
}

STDERR:

/usr/lib64/python3.6/site-packages/_yaml/__init__.py:23: DeprecationWarning: The _yaml extension module is now located at yaml._yaml and its location is subject to change.  To use the LibYAML-based parser and emitter, import from `yaml`: `from yaml import CLoader as Loader, CDumper as Dumper`.
  DeprecationWarning
Unable to establish connection to https://10.0.0.127:13050: HTTPSConnectionPool(host='10.0.0.127', port=13050): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f7d8f84a978>: Failed to establish a new connection: [Errno 110] Connection timed out',))


We tried adding the following rule to the controller:

"sudo iptables -I INPUT -p tcp -m tcp --dport 13050 -m conntrack --ctstate NEW -j ACCEPT"
but introspection failed again at a later stage:

 [stack@undercloud-0 ~]$ openstack baremetal introspection start --wait ironic-0
/usr/lib64/python3.6/site-packages/_yaml/__init__.py:23: DeprecationWarning: The _yaml extension module is now located at yaml._yaml and its location is subject to change.  To use the LibYAML-based parser and emitter, import from `yaml`: `from yaml import CLoader as Loader, CDumper as Dumper`.
  DeprecationWarning
Waiting for introspection to finish...
 
 
 
 
Unable to establish connection to https://10.0.0.127:13050/v1/introspection/ironic-0: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
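
The two errors above are different failure modes. The first is a TCP connect timeout (Errno 110), which is what a firewall silently dropping packets typically looks like. The second means the TCP connection was established and then closed before any response arrived. A minimal sketch for telling these cases apart from the undercloud, assuming only the Python 3 standard library; the helper name `probe_tcp` and its result labels are illustrative and not part of any OpenStack tool:

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        # create_connection completes the TCP handshake or raises.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Actively rejected: the host is reachable, but nothing accepts
        # on that port (or a firewall rejects rather than drops).
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all, which is typical of packets being dropped.
        return "timeout"
    except OSError:
        # Anything else, for example no route to the host.
        return "error"


# Example call, using the address and port from the error messages in this report:
#   probe_tcp("10.0.0.127", 13050)
```

A "timeout" result before the iptables rule is added and an "open" result afterwards would be consistent with the firewall being the first problem, leaving the aborted connection as a separate issue.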


Version-Release number of selected component (if applicable):


How reproducible:
Every time for BMaaS

Steps to Reproduce:
1. Deploy the overcloud (OC) with ironic services enabled in the OC
2. Deploy the BM node
3. Run the introspection

Actual results:

Introspection failed
Expected results:

Introspection passed
Additional info:

Comment 1 Julia Kreger 2022-01-11 22:30:00 UTC
A few different things going on.

1) Firewall rules are explicitly not opened for the port that is being set up for the inspector service. Looking through the commit logs, they never were. Period. It does look like a template that overrode the endpoint to use port 5050 disappeared between Train and Wallaby. I think we should just fix the firewall rule for the container, which is not a big deal IMHO, granted all of the issues that can exist with TLS when there are no custom ramdisks that know about the TLS certificates.
2) It looks like we've got a fun issue intertwined between python-requests->urllib3->haproxy->ironic-inspector API. Inspector's API is fairly simple and closes the socket when the request is done, but python-requests thinks it is still open. To remedy this as quickly as possible, I've posted a change to force the connection closed, which is fine in this use case and pattern.
3) You're explicitly missing br-baremetal, and without that network interface your BMaaS deployment will just not work, as your ironic-0 and ironic-1 test VMs will not be able to introspect or deploy.
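
Point 2 above describes a stale keep-alive connection: the client believes the connection is still open after the other side has closed it, which surfaces as RemoteDisconnected on the next request. The change referenced there is not shown in this report, so the following is only a sketch of the general technique of opting out of connection reuse by sending a "Connection: close" header. It is written against the Python standard library rather than python-requests, and the helper names `no_keepalive_headers` and `get_once` are hypothetical.

```python
import http.client


def no_keepalive_headers(extra=None):
    """Return request headers that ask the peer to close the connection after replying."""
    headers = dict(extra or {})
    headers["Connection"] = "close"
    return headers


def get_once(host, port, path="/", use_tls=True, timeout=10.0):
    """Issue one GET on a fresh connection and close it, so no socket is reused."""
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers=no_keepalive_headers())
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        # Always release the socket, even if the request failed.
        conn.close()
```

The trade-off is one TCP (and TLS) handshake per request, which is acceptable for a low-volume API such as introspection status polling. With python-requests, setting the same header on a session (for example `session.headers["Connection"] = "close"`) should have a comparable effect, though that is an assumption about how one might apply it and not a description of the posted change.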

Comment 2 Paras Babbar 2022-01-12 17:19:08 UTC
(In reply to Julia Kreger from comment #1)
> A few different things going on.
> 
> 1) Firewall rules explicitly are not opened for the port which is being
> setup for the inspector service. Looking through the commit logs, they never
> were. Period. It does look like there was a template which disappeared
> between train->wallaby which overrode the endpoint to use port 5050.  I
> think we should just fix the firewall rule for the container, which is not a
> big deal imho. Granted all of the issues which can exist with TLS without
> custom ramdisks to know about the TLS certificates.

Do we need to involve someone from DFG: security for this concern, or is it a lag on the infrared side?

> 2) It looks like we've got a fun issue intertwined between
> python-requests->urllib3->haproxy->ironic-inspector API. Inspector's api is
> fairly simple and closes out a socket when the request is done, but
> python-requests is thinking it is open. To remedy this as quickly as
> possible, I've posted a change to force the connection closed which is fine
> in this use case and pattern.
ack.
> 3) You're explicitly missing br-baremetal, and without that network interface
> your BMaaS deployment will just not work as your ironic-0 and ironic-1 test
> VMs will not be able to introspect or deploy.

Umm, that seems like another thing to look at in the infrared code :/

Comment 3 Julia Kreger 2022-01-12 17:34:40 UTC
(In reply to Paras Babbar from comment #2)
> (In reply to Julia Kreger from comment #1)
> > A few different things going on.
> > 
> > 1) Firewall rules explicitly are not opened for the port which is being
> > setup for the inspector service. Looking through the commit logs, they never
> > were. Period. It does look like there was a template which disappeared
> > between train->wallaby which overrode the endpoint to use port 5050.  I
> > think we should just fix the firewall rule for the container, which is not a
> > big deal imho. Granted all of the issues which can exist with TLS without
> > custom ramdisks to know about the TLS certificates.
> 
> Do we have to involve someone from DFG: security for this concern here or is
> it some lag from the infrared?

Security has indicated they are not responsible for the firewall rules. It looks like it works fine on underclouds, so it should just be a rule change in the templates to make it work on overclouds.

This doesn't seem to be infrared lag as much as a lag in pulling all of the distinct pieces together before getting into the complex testing cases.
> 
> > 2) It looks like we've got a fun issue intertwined between
> > python-requests->urllib3->haproxy->ironic-inspector API. Inspector's api is
> > fairly simple and closes out a socket when the request is done, but
> > python-requests is thinking it is open. To remedy this as quickly as
> > possible, I've posted a change to force the connection closed which is fine
> > in this use case and pattern.
> ack.
> > 3) You're explicitly missing br-baremetal, and without that network interface
> > your BMaaS deployment will just not work as your ironic-0 and ironic-1 test
> > VMs will not be able to introspect or deploy.
> 
> Umm that seems like another thing to look at from the infrared code:/

On the plus side, we appear to have several issues with WSGI/Python/Eventlet/python-requests, which are making this rather painful across a few different BZ items.

Comment 4 Julia Kreger 2022-03-15 14:03:26 UTC
Patches related to this issue have merged and are present in the builds. Moving to MODIFIED state.

Comment 12 errata-xmlrpc 2022-09-21 12:18:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

