Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1988154

Summary:	Scaling up machineset fails when using virtualmedia via external network
Product:	OpenShift Container Platform
Component:	Bare Metal Hardware Provisioning
Bare Metal Hardware Provisioning sub component:	cluster-baremetal-operator
Version:	4.9
Target Release:	4.9.0
Status:	CLOSED ERRATA
Severity:	high
Priority:	high
Keywords:	Triaged
Type:	Bug
Hardware:	Unspecified
OS:	Unspecified
Reporter:	Caleb Boylan <cboylan>
Assignee:	Caleb Boylan <cboylan>
QA Contact:	Aleksandra Malykhin <amalykhi>
CC:	amalykhi, aos-bugs, tsedovic, zbitter, zzhao
Last Closed:	2021-11-22 21:47:05 UTC

Description Caleb Boylan 2021-07-29 21:41:36 UTC

When a MachineSet is scaled up and the new hosts are deployed with virtual media over the external network, the image URL still points at an address on the provisioning network, so provisioning fails if the host cannot reach the provisioning network.
If you manually set the BareMetalHost image URL to use the host IP instead, provisioning succeeds.
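A quick way to tell whether a host is affected is to check whether the image URL it was given points into the provisioning network. This is a minimal sketch, assuming the provisioning network is 172.22.0.0/24 (consistent with the 172.22.0.3 address in the log below; check your cluster's actual provisioning CIDR). The function name is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumed provisioning network CIDR; adjust to match your cluster.
PROVISIONING_CIDR = ipaddress.ip_network("172.22.0.0/24")


def served_from_provisioning_network(image_url: str,
                                     cidr=PROVISIONING_CIDR) -> bool:
    """Return True if the URL's host is an IP literal inside `cidr`.

    DNS names return False: resolving them is out of scope for this sketch.
    """
    host = urlsplit(image_url).hostname
    if host is None:
        return False
    try:
        return ipaddress.ip_address(host) in cidr
    except ValueError:
        # Host is not an IP literal (for example, a DNS name).
        return False


if __name__ == "__main__":
    url = ("http://172.22.0.3:6181/images/"
           "rhcos-49.84.202107010027-0-openstack.x86_64.qcow2/"
           "cached-rhcos-49.84.202107010027-0-openstack.x86_64.qcow2")
    print(served_from_provisioning_network(url))  # prints True
```

A host that cannot reach that CIDR will fail exactly as in the log below.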

2021-07-29 15:03:16.468 673 ERROR root [-] Command failed: prepare_image, error: HTTPConnectionPool(host='172.22.0.3', port=6181): Max retries exceeded with url: /images/rhcos-49.84.202107010027-0-openstack.x86_64.qcow2/cached-rhcos-49.84.202107010027-0-openstack.x86_64.qcow2.md5sum (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f432cebbc88>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',)): requests.exceptions.ConnectionError: HTTPConnectionPool(host='172.22.0.3', port=6181): Max retries exceeded with url: /images/rhcos-49.84.202107010027-0-openstack.x86_64.qcow2/cached-rhcos-49.84.202107010027-0-openstack.x86_64.qcow2.md5sum (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f432cebbc88>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',))
2021-07-29 15:03:16.468 673 ERROR root Traceback (most recent call last):
2021-07-29 15:03:16.468 673 ERROR root   File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 162, in _new_conn
2021-07-29 15:03:16.468 673 ERROR root     (self._dns_host, self.port), self.timeout, **extra_kw)
2021-07-29 15:03:16.468 673 ERROR root   File "/usr/lib/python3.6/site-packages/urllib3/util/connection.py", line 80, in create_connection
2021-07-29 15:03:16.468 673 ERROR root     raise err
2021-07-29 15:03:16.468 673 ERROR root   File "/usr/lib/python3.6/site-packages/urllib3/util/connection.py", line 70, in create_connection
2021-07-29 15:03:16.468 673 ERROR root     sock.connect(sa)
2021-07-29 15:03:16.468 673 ERROR root   File "/usr/lib/python3.6/site-packages/eventlet/greenio/base.py", line 267, in connect
2021-07-29 15:03:16.468 673 ERROR root     socket_checkerr(fd)
2021-07-29 15:03:16.468 673 ERROR root   File "/usr/lib/python3.6/site-packages/eventlet/greenio/base.py", line 51, in socket_checkerr
2021-07-29 15:03:16.468 673 ERROR root     raise socket.error(err, errno.errorcode[err])
2021-07-29 15:03:16.468 673 ERROR root ConnectionRefusedError: [Errno 111] ECONNREFUSED
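The traceback boils down to a plain TCP connection to 172.22.0.3:6181 being refused before any HTTP request is sent. A minimal sketch for reproducing that check from the affected host, using only the standard library (the function name and default timeout are illustrative assumptions):

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Attempt a TCP connection, the step that failed in the traceback.

    Returns False on ECONNREFUSED, timeouts and unreachable networks.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError (errno 111), socket.timeout and
        # "network unreachable" errors are all OSError subclasses.
        return False


if __name__ == "__main__":
    # Address and port taken from the log above.
    print(can_connect("172.22.0.3", 6181))
```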

Comment 1 Zane Bitter 2021-07-29 21:49:14 UTC

If the installer always used the API VIP in the image URL, that should resolve this issue for new clusters, since the image is served from the image cache, which runs on every control plane node. The downside is that IPA (Ironic Python Agent) would then use the external network to download the qcow2 image it writes to disk. That may be undesirable in some environments, so it may not make sense to make this change, given that the installer does not expose the option to use the external network while also enabling a provisioning network; that combination can only arise on Day 2.
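Judging by the URL in the log, the image cache serves each image at `/images/<image>/cached-<image>` on port 6181, with an `.md5sum` file next to it. Assuming that layout (inferred from this one log line, not from documentation), pointing a host at the API VIP instead of the provisioning address amounts to building the same URLs with a different host. The function name and the example VIP 192.168.111.5 are hypothetical.

```python
from typing import Tuple


def image_cache_urls(host: str, image: str, port: int = 6181) -> Tuple[str, str]:
    """Return (image_url, checksum_url) for an image in the image cache.

    Assumed layout, taken from the log: /images/<image>/cached-<image>[.md5sum]
    """
    # IPv6 literals must be wrapped in brackets inside a URL.
    netloc_host = f"[{host}]" if ":" in host else host
    base = f"http://{netloc_host}:{port}/images/{image}/cached-{image}"
    return base, base + ".md5sum"


if __name__ == "__main__":
    url, checksum = image_cache_urls(
        "192.168.111.5",  # hypothetical API VIP
        "rhcos-49.84.202107010027-0-openstack.x86_64.qcow2")
    print(url)
    print(checksum)
```

Passing the provisioning address (172.22.0.3) instead reproduces the checksum URL seen in the failing request.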

Nothing currently modifies existing MachineSets, so for existing clusters, or clusters installed with the provisioning network enabled, the user will need an extra step: updating the MachineSet so that new Machines are created with the API VIP in the image URL.
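For existing clusters, that extra step amounts to editing each MachineSet so its image URL and checksum URL use the API VIP. A minimal sketch, operating on a MachineSet already loaded as a Python dict (for example from `oc get machineset <name> -n openshift-machine-api -o json`). The field path `spec.template.spec.providerSpec.value.image.{url,checksum}` is an assumption about the bare metal provider spec, so verify it against your cluster before applying anything; the function names are hypothetical.

```python
import copy
from urllib.parse import urlsplit, urlunsplit


def _with_host(url: str, new_host: str) -> str:
    """Replace the host in `url`, keeping scheme, port, path and query.

    Any userinfo (credentials) in the original URL is dropped.
    """
    parts = urlsplit(url)
    host = f"[{new_host}]" if ":" in new_host else new_host
    netloc = f"{host}:{parts.port}" if parts.port is not None else host
    return urlunsplit(parts._replace(netloc=netloc))


def point_machineset_at_vip(machineset: dict, api_vip: str) -> dict:
    """Return a copy of `machineset` whose image URLs use `api_vip`.

    Assumed layout: spec.template.spec.providerSpec.value.image.{url,checksum}
    """
    updated = copy.deepcopy(machineset)
    image = (updated["spec"]["template"]["spec"]
             ["providerSpec"]["value"]["image"])
    for key in ("url", "checksum"):
        value = image.get(key)
        # Only rewrite values that are actually HTTP(S) URLs.
        if value and urlsplit(value).scheme in ("http", "https"):
            image[key] = _with_host(value, api_vip)
    return updated


if __name__ == "__main__":
    base = "http://172.22.0.3:6181/images/rhcos.qcow2/cached-rhcos.qcow2"
    ms = {"spec": {"template": {"spec": {"providerSpec": {"value": {"image": {
        "url": base, "checksum": base + ".md5sum"}}}}}}}
    new = point_machineset_at_vip(ms, "192.168.111.5")  # hypothetical VIP
    print(new["spec"]["template"]["spec"]["providerSpec"]["value"]["image"]["url"])
    # prints http://192.168.111.5:6181/images/rhcos.qcow2/cached-rhcos.qcow2
```

Returning a modified copy rather than editing in place makes it easy to diff the old and new objects before applying the change.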

Comment 2 Caleb Boylan 2021-08-16 19:46:22 UTC

We have documentation in a PR to explain how to make this change to the machineset https://github.com/openshift/openshift-docs/pull/35304

Comment 5 zhaozhanqi 2021-11-12 02:25:30 UTC

(In reply to Caleb Boylan from comment #2)
> We have documentation in a PR to explain how to make this change to the
> machineset https://github.com/openshift/openshift-docs/pull/35304

Hi, I see the PR above is closed, but the bug has been moved to ON_QA. Could you point to the correct PR for this bug?

Comment 7 Zane Bitter 2021-11-12 03:30:51 UTC

It's probably https://github.com/openshift/openshift-docs/pull/36089

Comment 11 errata-xmlrpc 2021-11-22 21:47:05 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.8 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4712