1971308 – [4.8.0] AI KubeAPI AgentClusterInstall confusing "Validated" condition about VIP not matching machine network

Summary: [4.8.0] AI KubeAPI AgentClusterInstall confusing "Validated" condition about ...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	assisted-installer
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	high
Target Milestone:	---
Target Release:	4.8.0
Assignee:	Ori Amizur
QA Contact:	bjacot
Docs Contact:
URL:
Whiteboard:	KNI-EDGE-JUKE-4.8 AI-Team-Core
Depends On:	1970134
Blocks:

Reported:	2021-06-13 14:16 UTC by Ronnie Lazar
Modified:	2021-07-27 23:13 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1970134
Environment:
Last Closed:	2021-07-27 23:12:53 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift assisted-service pull 2056	0	None	open	[ocm-2.3] Bug 1971308: AI KubeAPI AgentClusterInstall confusing "Validated" condition about VIP not matching machine net...	2021-06-23 09:15:14 UTC
Red Hat Bugzilla	1970134	1	urgent	CLOSED	[master] AI KubeAPI AgentClusterInstall confusing "Validated" condition about VIP not matching machine network	2022-08-28 08:47:34 UTC
Red Hat Product Errata	RHSA-2021:2438	0	None	None	None	2021-07-27 23:13:09 UTC

Description Ronnie Lazar 2021-06-13 14:16:01 UTC

+++ This bug was initially created as a clone of Bug #1970134 +++

Description of problem:
On a multi-node cluster AI installation, when the VIPs are set in an AgentClusterInstall before the hosts boot up, the machine network CIDR is still undetermined. This leads to many confusing validation error messages:

api vip 192.168.111.202 does not belong to the Machine CIDR or is already in use.,ingress vip 192.168.111.203 does not belong to the Machine CIDR or is already in use.,The Machine Network CIDR is undefined; the Machine Network CIDR can be defined by setting either the API or Ingress virtual IPs.,The Cluster Machine CIDR  is different than the calculated CIDR .

In reality, the only actual "problem" is that the hosts have not booted yet.
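To make the failure mode concrete, here is a minimal Go sketch. The helper name `vipInMachineNetwork` and the 192.168.111.0/24 CIDR are hypothetical, not taken from assisted-service. A plain membership check like this has no way to say "unknown yet": with an empty machine CIDR it can only return false, and that false is what gets reported as the VIP not belonging to the Machine CIDR.

```go
package main

import (
	"fmt"
	"net"
)

// vipInMachineNetwork reports whether vip lies inside machineCIDR.
// Hypothetical helper for illustration only, not the assisted-service code.
// If machineCIDR is empty (no host has booted, so nothing was calculated),
// net.ParseCIDR returns an error and the function can only answer false.
func vipInMachineNetwork(vip, machineCIDR string) bool {
	ip := net.ParseIP(vip)
	_, ipNet, err := net.ParseCIDR(machineCIDR)
	if ip == nil || err != nil {
		return false
	}
	return ipNet.Contains(ip)
}

func main() {
	// Before any host reports an inventory the machine CIDR is still "".
	fmt.Println(vipInMachineNetwork("192.168.111.202", "")) // prints false
	// Once a CIDR is known (hypothetical value), the same VIP passes.
	fmt.Println(vipInMachineNetwork("192.168.111.202", "192.168.111.0/24")) // prints true
}
```

The same "false" result covers both "the VIP is wrong" and "there is nothing to compare against yet", which is why the messages above read as real errors.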

Version-Release number of selected component (if applicable):
assisted-service 481b03a775007b927dd0ed108f22f98b7f76db9d

How reproducible:
100%

Steps to Reproduce:
1. Create a multi-node AgentClusterInstall
2. Set the VIPs as required
3. The validation errors above are shown

Actual results:
Many confusing validation errors

Expected results:
Better UX. The exact approach is still open, but the user should be told clearly that they just need to wait for the hosts to boot up.

Additional info:

--- Additional comment from mfilanov on 20210613T12:06:46

We have all the logic and the validations in the backend; kube-api is just a translation layer that is not aware of which validations are failing.
Because this is a validation issue, I think it can easily be resolved in the validation logic.

`clusterValidator` handles a specific host, so it can probably store state. When running the validation, it could store a specific error and then use it in `printIsApiVipValid`.
In this case, the validation can check whether the cluster has registered hosts and give a better reply.

@oamizur @alazar What do you think? It will require some changes in the logic, but I think this is not the only case that will require different types of errors.
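A minimal Go sketch of the suggestion above, assuming a validator that records the reason the API VIP check failed and a print function that picks its message from that stored reason. Only the name `printIsApiVipValid` comes from the comment; the struct fields, the sentinel errors, the method signatures, and the "cannot be validated until hosts are registered" wording are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical failure reasons; the real service may model these differently.
var (
	errNoHosts      = errors.New("no registered hosts")
	errVipNotUsable = errors.New("vip outside machine CIDR or already in use")
)

// clusterValidator is a stand-in for the validator mentioned in the comment.
// It remembers the specific error from the last API VIP check.
type clusterValidator struct {
	hostCount int
	apiVipErr error
}

// isApiVipValid runs the check and stores the reason for any failure.
// vipUsable is assumed to be computed elsewhere (CIDR membership, not in use).
func (v *clusterValidator) isApiVipValid(vipUsable bool) bool {
	switch {
	case v.hostCount == 0:
		v.apiVipErr = errNoHosts
	case !vipUsable:
		v.apiVipErr = errVipNotUsable
	default:
		v.apiVipErr = nil
	}
	return v.apiVipErr == nil
}

// printIsApiVipValid chooses the user-facing text from the stored error,
// so "no hosts yet" no longer reads like a bad VIP.
func (v *clusterValidator) printIsApiVipValid(vip string) string {
	switch {
	case errors.Is(v.apiVipErr, errNoHosts):
		return "api vip " + vip + " cannot be validated until hosts are registered"
	case errors.Is(v.apiVipErr, errVipNotUsable):
		return "api vip " + vip + " does not belong to the Machine CIDR or is already in use."
	default:
		return "api vip " + vip + " is valid"
	}
}

func main() {
	v := &clusterValidator{} // no hosts registered yet
	v.isApiVipValid(true)
	// prints: api vip 192.168.111.202 cannot be validated until hosts are registered
	fmt.Println(v.printIsApiVipValid("192.168.111.202"))
}
```

Keeping the reason next to the result lets each validation explain itself without the kube-api layer needing to know which validations failed.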

--- Additional comment from alazar on 20210613T13:22:17

@oamizur Maybe, when there are no hosts, these validation errors should not be displayed, or they should show a "pending" status?

--- Additional comment from oamizur on 20210613T14:09:00

@alazar Basically, this is right. Validations that depend on pending inputs should not fail; they should simply be pending. All of the above validations should be pending if there are no hosts with inventories.
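The "pending" behavior described above can be sketched as a three-state result. This is an illustration under assumptions: `ValidationStatus`, `host`, `validateVip`, and the status strings are hypothetical names, and the CIDR membership result is passed in as a boolean computed elsewhere. The key point is ordering: the missing-inventory case is checked first, so the VIP check never reports failure while its input is still unknown.

```go
package main

import "fmt"

// ValidationStatus is a hypothetical three-state result; the real
// service's type and values may differ.
type ValidationStatus string

const (
	StatusSuccess ValidationStatus = "success"
	StatusFailure ValidationStatus = "failure"
	StatusPending ValidationStatus = "pending"
)

// host is a minimal stand-in: only whether it has reported an inventory matters.
type host struct {
	Name         string
	HasInventory bool
}

// validateVip returns pending while no host has an inventory, because the
// machine CIDR cannot be calculated yet. Only after that can it pass or fail.
// vipInCIDR is assumed to be the result of a membership check done elsewhere.
func validateVip(hosts []host, vipInCIDR bool) ValidationStatus {
	anyInventory := false
	for _, h := range hosts {
		if h.HasInventory {
			anyInventory = true
			break
		}
	}
	if !anyInventory {
		return StatusPending
	}
	if vipInCIDR {
		return StatusSuccess
	}
	return StatusFailure
}

func main() {
	fmt.Println(validateVip(nil, false)) // prints pending: no hosts yet
	fmt.Println(validateVip([]host{{Name: "master-0", HasInventory: true}}, false)) // prints failure
}
```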

Comment 2 Trey West 2021-07-06 19:50:12 UTC

I am seeing this message now: 

The cluster's validations are pending for user: Clusters must have exactly 3 dedicated masters. Please either add hosts, or disable the worker host,Hosts have not been discovered yet,Hosts have not been discovered yet,Hosts have not been discovered yet,Hosts have not been discovered yet,At least one of the CIDRs (Machine Network, Cluster Network, Service Network) is undefined.

Not sure why the discovery message is being displayed 4 times. 

@oamizur Does this look acceptable? Is there any way to reduce how many times that message is displayed?
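The thread does not explain why "Hosts have not been discovered yet" appears four times. If the cause is that several pending validations each emit the same text (an assumption), one way to shorten the condition message is to drop exact duplicates before joining. A minimal Go sketch with a hypothetical `joinUnique` helper, using the "," separator seen in the message above:

```go
package main

import (
	"fmt"
	"strings"
)

// joinUnique joins validation messages with "," while dropping exact
// repeats and keeping first-seen order. Hypothetical helper: it hides the
// repetition in the aggregated text but does not change the validations.
func joinUnique(msgs []string) string {
	seen := make(map[string]bool, len(msgs))
	out := make([]string, 0, len(msgs))
	for _, m := range msgs {
		if seen[m] {
			continue
		}
		seen[m] = true
		out = append(out, m)
	}
	return strings.Join(out, ",")
}

func main() {
	msgs := []string{
		"Hosts have not been discovered yet",
		"Hosts have not been discovered yet",
		"Hosts have not been discovered yet",
		"Hosts have not been discovered yet",
		"At least one of the CIDRs (Machine Network, Cluster Network, Service Network) is undefined.",
	}
	// prints: Hosts have not been discovered yet,At least one of the CIDRs (Machine Network, Cluster Network, Service Network) is undefined.
	fmt.Println(joinUnique(msgs))
}
```

Whether the underlying validations should instead be merged into one is a separate design question.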

Comment 4 Ori Amizur 2021-07-15 13:31:26 UTC

Usually, pending validations are not displayed by the UI. In general, a pending validation means it cannot be evaluated until these issues are fixed.

Comment 7 errata-xmlrpc 2021-07-27 23:12:53 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
