Bug 1988154 - Scaling up machineset fails when using virtualmedia via external network
Summary: Scaling up machineset fails when using virtualmedia via external network
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Caleb Boylan
QA Contact: Aleksandra Malykhin
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-07-29 21:41 UTC by Caleb Boylan
Modified: 2021-11-22 21:47 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-22 21:47:05 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2021:4712 | 0 | None | None | None | 2021-11-22 21:47:12 UTC

Description Caleb Boylan 2021-07-29 21:41:36 UTC
When a machineset is scaled up to deploy hosts with virtualmedia via the external network, the image is still served from the provisioning network, so provisioning fails if the host cannot access the provisioning network.
Manually setting the baremetalhost image to use the host IP makes provisioning work.

2021-07-29 15:03:16.468 673 ERROR root [-] Command failed: prepare_image, error: HTTPConnectionPool(host='172.22.0.3', port=6181): Max retries exceeded with url: /images/rhcos-49.84.202107010027-0-openstack.x86_64.qcow2/cached-rhcos-49.84.202107010027-0-openstack.x86_64.qcow2.md5sum (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f432cebbc88>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',)): requests.exceptions.ConnectionError: HTTPConnectionPool(host='172.22.0.3', port=6181): Max retries exceeded with url: /images/rhcos-49.84.202107010027-0-openstack.x86_64.qcow2/cached-rhcos-49.84.202107010027-0-openstack.x86_64.qcow2.md5sum (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f432cebbc88>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',))
2021-07-29 15:03:16.468 673 ERROR root Traceback (most recent call last):
2021-07-29 15:03:16.468 673 ERROR root   File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 162, in _new_conn
2021-07-29 15:03:16.468 673 ERROR root     (self._dns_host, self.port), self.timeout, **extra_kw)
2021-07-29 15:03:16.468 673 ERROR root   File "/usr/lib/python3.6/site-packages/urllib3/util/connection.py", line 80, in create_connection
2021-07-29 15:03:16.468 673 ERROR root     raise err
2021-07-29 15:03:16.468 673 ERROR root   File "/usr/lib/python3.6/site-packages/urllib3/util/connection.py", line 70, in create_connection
2021-07-29 15:03:16.468 673 ERROR root     sock.connect(sa)
2021-07-29 15:03:16.468 673 ERROR root   File "/usr/lib/python3.6/site-packages/eventlet/greenio/base.py", line 267, in connect
2021-07-29 15:03:16.468 673 ERROR root     socket_checkerr(fd)
2021-07-29 15:03:16.468 673 ERROR root   File "/usr/lib/python3.6/site-packages/eventlet/greenio/base.py", line 51, in socket_checkerr
2021-07-29 15:03:16.468 673 ERROR root     raise socket.error(err, errno.errorcode[err])
2021-07-29 15:03:16.468 673 ERROR root ConnectionRefusedError: [Errno 111] ECONNREFUSED
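
One way to confirm this failure mode is to check, from a machine on the same network as the host being provisioned, whether the image server accepts TCP connections at all. The following is a minimal sketch, not part of any OpenShift or Ironic tooling: the function name `image_endpoint_reachable` is hypothetical, and the URL in the usage section is assembled from the host, port, and path shown in the error message above.

```python
import socket
from urllib.parse import urlsplit


def image_endpoint_reachable(image_url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds.

    This only tests TCP connectivity; it does not download the image.
    (Hypothetical helper, for illustration only.)
    """
    parts = urlsplit(image_url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused (as in the log above), timeouts,
        # and unreachable networks.
        return False


if __name__ == "__main__":
    # URL reconstructed from the host, port, and path in the error message.
    url = (
        "http://172.22.0.3:6181/images/"
        "rhcos-49.84.202107010027-0-openstack.x86_64.qcow2/"
        "cached-rhcos-49.84.202107010027-0-openstack.x86_64.qcow2.md5sum"
    )
    print(url, "reachable:", image_endpoint_reachable(url))
```

A `False` result from the host's network, alongside a `True` result from a control plane node, would be consistent with the image URL pointing at a network the host cannot reach.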

Comment 1 Zane Bitter 2021-07-29 21:49:14 UTC
If we always use the API VIP in the installer then that should resolve this issue for new clusters, since the image is served from the image cache (which runs on every control plane node). A downside of this is that IPA will use the external network to download the qcow2 image to write to disk. This may be undesirable in some environments, so it may not make sense to make this change (given that the option to use the external network while also enabling a provisioning network is not exposed in the installer, but can only happen on Day 2).

Nothing currently modifies existing MachineSets, so for existing clusters, or for clusters installed with the provisioning network enabled, the user needs to take an extra step to make sure that new Machines are created using the API VIP in the image URL.
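
The change to the image URL amounts to swapping the host part while leaving the rest intact. The following is a rough sketch of that rewrite, under some assumptions: the helper name `with_host` is hypothetical, `192.0.2.5` is a placeholder standing in for a cluster's API VIP, and it assumes the image cache serves the same port and path at the API VIP as at the provisioning IP. It does not edit a MachineSet; it only computes the new URL string.

```python
from urllib.parse import urlsplit, urlunsplit


def with_host(image_url: str, new_host: str) -> str:
    """Return image_url with its host replaced by new_host.

    Scheme, port, path, and query are preserved. Any userinfo in the
    original URL is dropped. (Hypothetical helper, for illustration only.)
    """
    parts = urlsplit(image_url)
    # IPv6 literals must be wrapped in brackets inside a URL.
    host = f"[{new_host}]" if ":" in new_host else new_host
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    # Provisioning-network URL, built from the host, port, and path in
    # the error log of this bug.
    old_url = (
        "http://172.22.0.3:6181/images/"
        "rhcos-49.84.202107010027-0-openstack.x86_64.qcow2/"
        "cached-rhcos-49.84.202107010027-0-openstack.x86_64.qcow2"
    )
    # 192.0.2.5 is a placeholder, not a real API VIP.
    print(with_host(old_url, "192.0.2.5"))
```

If the MachineSet also carries a checksum URL for the image, the same rewrite would presumably need to be applied to it; the linked documentation PR is the authoritative source for the exact fields to change.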

Comment 2 Caleb Boylan 2021-08-16 19:46:22 UTC
We have documentation in a PR to explain how to make this change to the machineset https://github.com/openshift/openshift-docs/pull/35304

Comment 5 zhaozhanqi 2021-11-12 02:25:30 UTC
(In reply to Caleb Boylan from comment #2)
> We have documentation in a PR to explain how to make this change to the
> machineset https://github.com/openshift/openshift-docs/pull/35304

Hi, I see that the above PR is closed, but the bug has been changed to ON_QA. Could you point to the correct PR for this bug?

Comment 7 Zane Bitter 2021-11-12 03:30:51 UTC
It looks like it's probably https://github.com/openshift/openshift-docs/pull/36089

Comment 11 errata-xmlrpc 2021-11-22 21:47:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.8 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4712

