Description of problem:
Ironic in the overcloud (OC) is broken in 17: introspection is failing because of connection leakage, and our assumption is that it might be due to an iptables or firewall rule.

First it failed here:

failed: [undercloud-0] (item=ironic-0) => {
    "ansible_loop_var": "item",
    "changed": true,
    "cmd": "source /home/stack/overcloudrc\nopenstack baremetal introspection start --wait ironic-0\n",
    "delta": "0:02:15.574174",
    "end": "2022-01-11 18:22:46.488638",
    "item": "ironic-0",
    "rc": 1,
    "start": "2022-01-11 18:20:30.914464"
}

STDERR:

/usr/lib64/python3.6/site-packages/_yaml/__init__.py:23: DeprecationWarning: The _yaml extension module is now located at yaml._yaml and its location is subject to change. To use the LibYAML-based parser and emitter, import from `yaml`: `from yaml import CLoader as Loader, CDumper as Dumper`.
  DeprecationWarning
Unable to establish connection to https://10.0.0.127:13050: HTTPSConnectionPool(host='10.0.0.127', port=13050): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f7d8f84a978>: Failed to establish a new connection: [Errno 110] Connection timed out',))

We tried adding the following rule to the controller:

    sudo iptables -I INPUT -p tcp -m tcp --dport 13050 -m conntrack --ctstate NEW -j ACCEPT

but it failed again at a later stage:

[stack@undercloud-0 ~]$ openstack baremetal introspection start --wait ironic-0
/usr/lib64/python3.6/site-packages/_yaml/__init__.py:23: DeprecationWarning: The _yaml extension module is now located at yaml._yaml and its location is subject to change. To use the LibYAML-based parser and emitter, import from `yaml`: `from yaml import CLoader as Loader, CDumper as Dumper`.
  DeprecationWarning
Waiting for introspection to finish...
Unable to establish connection to https://10.0.0.127:13050/v1/introspection/ironic-0: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

Version-Release number of selected component (if applicable):

How reproducible:
Every time for BMaaS

Steps to Reproduce:
1. Deploy the OC with ironic services in the OC
2. Deploy the BM node
3. Run the introspection

Actual results:
Introspection failed

Expected results:
Introspection passed

Additional info:
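For anyone triaging a similar report, the first failure ([Errno 110] Connection timed out) can be told apart from a service that is simply not listening with a small probe. This is only a sketch: probe_port is a hypothetical helper, not part of any OpenStack tooling, and the 3 second timeout is an arbitrary assumption. A "timeout" result is what one would typically expect when a firewall silently drops the packets, while "refused" usually means the host was reached but nothing accepted the connection.

```python
import errno
import socket


def probe_port(host, port, timeout=3.0):
    """Classify a TCP connection attempt as open, refused, timeout, or error.

    Hypothetical helper for triage; it only opens and closes a TCP socket.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer within `timeout` seconds (packets are probably dropped).
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening or a rule rejects it.
        return "refused"
    except OSError as exc:
        # The kernel itself may give up with ETIMEDOUT (Errno 110 on Linux).
        if exc.errno == errno.ETIMEDOUT:
            return "timeout"
        return "error"
```

Run from the undercloud against the address and port in the log above, this would be called as probe_port("10.0.0.127", 13050).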
A few different things are going on.

1) The firewall rules are explicitly not opened for the port that is being set up for the inspector service. Looking through the commit logs, they never were. Period. It does look like there was a template, which disappeared between train->wallaby, that overrode the endpoint to use port 5050. I think we should just fix the firewall rule for the container, which is not a big deal imho. Granted, that still leaves all of the issues that can exist with TLS when there are no custom ramdisks that know about the TLS certificates.

2) It looks like we've got a fun issue intertwined between python-requests->urllib3->haproxy->ironic-inspector API. Inspector's API is fairly simple and closes out a socket when the request is done, but python-requests thinks it is still open. To remedy this as quickly as possible, I've posted a change to force the connection closed, which is fine in this use case and pattern.

3) You're explicitly missing br-baremetal, and without that network interface your BMaaS deployment will just not work, as your ironic-0 and ironic-1 test VMs will not be able to introspect or deploy.
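The failure mode in point 2 can be reproduced in isolation. The sketch below is not the posted change (which is not shown in this report); it is a stand-in that uses only the standard library's http.client instead of python-requests so that it is self-contained. The toy server, start_closing_server, and both client helpers are hypothetical names. The server closes its socket after every response without saying so, the first client helper reuses the connection and fails on the second request, and the second helper forces the connection closed after each request by sending "Connection: close" and opening a fresh connection every time.

```python
import http.client
import socket
import threading


def start_closing_server():
    """Start a toy HTTP server that drops the socket after each response.

    It never sends "Connection: close", so an HTTP/1.1 client assumes the
    connection is still reusable. Returns the port it listens on.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(5)

    def serve():
        while True:
            client, _ = listener.accept()
            with client:  # leaving this block closes the socket
                try:
                    client.recv(65536)  # read and ignore the request
                    client.sendall(
                        b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
                    )
                except OSError:
                    pass

    threading.Thread(target=serve, daemon=True).start()
    return listener.getsockname()[1]


def get_reusing(conn):
    """Send a GET on an existing connection, as a pooled client would."""
    conn.request("GET", "/")
    return conn.getresponse().read()


def get_with_forced_close(host, port):
    """Send a GET on a fresh connection and always close it afterwards."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/", headers={"Connection": "close"})
        return conn.getresponse().read()
    finally:
        conn.close()


def demo():
    port = start_closing_server()
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    first = get_reusing(conn)
    try:
        get_reusing(conn)  # the server has already dropped this socket
        reuse_error = None
    except ConnectionError as exc:
        reuse_error = type(exc).__name__
    finally:
        conn.close()
    fresh = [get_with_forced_close("127.0.0.1", port) for _ in range(3)]
    return first, reuse_error, fresh
```

The exact exception raised on the reused connection depends on timing and platform, which is why the sketch catches the ConnectionError family rather than one specific class. With python-requests the analogous approach would be to send a "Connection: close" request header, at the cost of a new TCP and TLS handshake per call, which is acceptable for an infrequent polling pattern like introspection status checks.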
(In reply to Julia Kreger from comment #1)
> A few different things going on.
>
> 1) Firewall rules explicitly are not opened for the port which is being
> setup for the inspector service. Looking through the commit logs, they never
> were. Period. It does look like there was a template which disappeared
> between train->wallaby which overrode the endpoint to use port 5050. I
> think we should just fix the firewall rule for the container, which is not a
> big deal imho. Granted all of the issues which can exist with TLS without
> custom ramdisks to know about the TLS certificates.

Do we have to involve someone from DFG: security for this concern here or is it some lag from the infrared?

> 2) It looks like we've got a fun issue intertwined between
> python-requests->urllib3->haproxy->ironic-inspector API. Inspector's api is
> fairly simple and closes out a socket when the request is done, but
> python-requests is thinking it is open. To remedy this as quickly as
> possible, I've posted a change to force the connection closed which is fine
> in this use case and pattern.

ack.

> 3) You're explicitly missing br-baremetal, and without that network interface
> your BMaaS deployment will just not work as your ironic-0 and ironic-1 test
> VMs will not be able to introspect or deploy.

Umm that seems like another thing to look at from the infrared code :/
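A quick way to confirm the br-baremetal point on a given node, before digging into the infrared code, is to ask the kernel which interfaces it knows about. This is a minimal sketch: has_interface and missing_interfaces are hypothetical helpers, and it assumes the bridge shows up as an ordinary kernel network interface on the host where it is run.

```python
import socket


def has_interface(name):
    """Return True if the kernel lists a network interface with this name."""
    return any(ifname == name for _, ifname in socket.if_nameindex())


def missing_interfaces(required):
    """Return the names from `required` that are not present on this host."""
    return [name for name in required if not has_interface(name)]
```

Calling missing_interfaces(["br-baremetal"]) on the affected node would return a non-empty list if the bridge is absent.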
(In reply to Paras Babbar from comment #2)
> (In reply to Julia Kreger from comment #1)
> > A few different things going on.
> >
> > 1) Firewall rules explicitly are not opened for the port which is being
> > setup for the inspector service. Looking through the commit logs, they never
> > were. Period. It does look like there was a template which disappeared
> > between train->wallaby which overrode the endpoint to use port 5050. I
> > think we should just fix the firewall rule for the container, which is not a
> > big deal imho. Granted all of the issues which can exist with TLS without
> > custom ramdisks to know about the TLS certificates.
>
> Do we have to involve someone from DFG: security for this concern here or is
> it some lag from the infrared?

Security has indicated they are not responsible for the firewall rules. It looks like it works fine on underclouds, so it should just be a rule change in the templates so that it works in overclouds. This doesn't seem to be infrared lag as much as it is a lag in taking all of the distinct things and then getting into the complex testing cases.

>
> > 2) It looks like we've got a fun issue intertwined between
> > python-requests->urllib3->haproxy->ironic-inspector API. Inspector's api is
> > fairly simple and closes out a socket when the request is done, but
> > python-requests is thinking it is open. To remedy this as quickly as
> > possible, I've posted a change to force the connection closed which is fine
> > in this use case and pattern.
> ack.
> > 3) You're explicitly missing br-baremetal, and without that network interface
> > your BMaaS deployment will just not work as your ironic-0 and ironic-1 test
> > VMs will not be able to introspect or deploy.
>
> Umm that seems like another thing to look at from the infrared code :/

On the plus side, we appear to have several issues with WSGI/Python/Eventlet/Python Requests, which is making this rather painful across a few different BZ items.
Patches related to this issue have merged and are present in the builds. Moving to MODIFIED state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543