Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1968175

Summary: [4.8.0] Agent validation not including reason for being insufficient
Product: OpenShift Container Platform Reporter: Fred Rolland <frolland>
Component: assisted-installer Assignee: Fred Rolland <frolland>
assisted-installer sub component: assisted-service QA Contact: Yuri Obshansky <yobshans>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: aos-bugs, atraeger, ccrum, frolland, mfilanov, mhrivnak
Version: 4.8 Keywords: Triaged
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: AI-Team-Hive KNI-EDGE-4.8 KNI-EDGE-JUKE-4.8
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1968067 Environment:
Last Closed: 2021-07-27 23:11:38 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1968067    
Bug Blocks:    

Description Fred Rolland 2021-06-06 13:52:01 UTC
+++ This bug was initially created as a clone of Bug #1968067 +++

Created attachment 1789025
cluster resource from REST API

Description of problem:

When a cluster's validations are pending, but not failed, a related Agent's status can show the following condition:


    - lastTransitionTime: "2021-06-03T15:05:29Z"
      message: 'The agent''s validations are failing: '
      reason: ValidationsFailing
      status: "False"
      type: Validated


The lack of any reasons leaves the user unable to understand what's wrong or how to proceed.

Code that ignores pending validations: https://github.com/openshift/assisted-service/blob/80d9b5f/internal/controller/controllers/agent_controller.go#L439-L448

I'll attach the full Cluster representation.

Version-Release number of selected component (if applicable):
0.0.5-rc1


How reproducible:
always

Steps to Reproduce:
1. Create a ClusterDeployment and an AgentClusterInstall such that validations are pending
2. Create an InfraEnv
3. Boot the ISO, approve the resulting Agent, and watch its conditions

--- Additional comment from mfilanov on 20210606T06:19:21

Pending validations are not handled.

--- Additional comment from frolland on 20210606T10:18:26

A new Reason for better clarity can be added: ValidationsUserPending:

AgentClusterInstall:
Validated 	False 	ValidationsFailing 	The cluster's validations are failing: "summary of failed validations" 	If the cluster status is "insufficient"
Validated 	False 	ValidationsUserPending 	The cluster's validations are pending for user: "summary of failed validations" 	If the cluster status is "pending-for-input"

Agent:
Validated 	False 	ValidationsFailing 	The agent's validations are failing: "summary of failed validations" 	If the host status is "insufficient" or "pending-for-input"
Validated 	False 	ValidationsUserPending 	The agent's validations are pending for user: "summary of failed validations" 	If the host status is "pending-for-input"


Also, the summary will include all "not-succeeded" validations (ValidationFailure, ValidationPending, ValidationError) excluding ValidationSuccess & ValidationDisabled
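
The proposal above can be sketched roughly as follows. This is only an illustration, not the assisted-service code: the Validation type, the status string values, and the helpers summarize and agentCondition are hypothetical names made up for this sketch. The comma separator is likewise an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// ValidationStatus and its values are hypothetical stand-ins for the
// statuses named in the comment above; the real string values may differ.
type ValidationStatus string

const (
	ValidationSuccess  ValidationStatus = "success"
	ValidationFailure  ValidationStatus = "failure"
	ValidationPending  ValidationStatus = "pending"
	ValidationError    ValidationStatus = "error"
	ValidationDisabled ValidationStatus = "disabled"
)

// Validation is a hypothetical, simplified validation result.
type Validation struct {
	Status  ValidationStatus
	Message string
}

// summarize joins the messages of every validation that did not succeed
// (failure, pending, error) and skips success and disabled ones.
func summarize(vs []Validation) string {
	var msgs []string
	for _, v := range vs {
		switch v.Status {
		case ValidationFailure, ValidationPending, ValidationError:
			msgs = append(msgs, v.Message)
		}
	}
	return strings.Join(msgs, ",")
}

// agentCondition picks the reason and message for a Validated=False
// condition from the host status. Any other status returns empty strings.
func agentCondition(hostStatus string, vs []Validation) (reason, message string) {
	switch hostStatus {
	case "pending-for-input":
		return "ValidationsUserPending",
			"The agent's validations are pending for user: " + summarize(vs)
	case "insufficient":
		return "ValidationsFailing",
			"The agent's validations are failing: " + summarize(vs)
	}
	return "", ""
}

func main() {
	vs := []Validation{
		{Status: ValidationSuccess, Message: "Host has enough memory"},
		{Status: ValidationPending, Message: "Missing inventory or machine network CIDR"},
		{Status: ValidationFailure, Message: "Host couldn't synchronize with any NTP server"},
	}
	reason, message := agentCondition("pending-for-input", vs)
	fmt.Println(reason)
	fmt.Println(message)
}
```

With pending validations included in the summary, the message is no longer empty when the host is waiting for user input, which is the case reported in this bug.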

@atraeger WDYT?

--- Additional comment from atraeger on 20210606T10:54:13

@frolland sounds good to me

Comment 3 Chad Crum 2021-06-19 13:56:40 UTC
This has been validated on:
- 2.3.0-DOWNSTREAM-2021-06-17-01-26-58
- Hub = 4.8.0-fc.7

Steps:
- Created CRs for an SNO-type cluster, except that the ACI (AgentClusterInstall) requested multiple control plane agents:
  provisionRequirements:
    controlPlaneAgents: 7
- The machine started but did not pass validations because of the multiple control plane agents, and stayed pending.
- The pending validations showed the reason for pending:
  - lastTransitionTime: "2021-06-19T13:52:21Z"
    message: 'The agent''s validations are pending for user: Machine Network CIDR
      is undefined; the Machine Network CIDR can be defined by setting either the
      API or Ingress virtual IPs,Missing inventory or machine network CIDR,Machine
      Network CIDR or Connectivity Majority Groups missing,Host couldn''t synchronize
      with any NTP server'

Comment 5 errata-xmlrpc 2021-07-27 23:11:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438