Bug 2002495
| Summary: | Deploying OCP on RHOS with assisted-installer Failing validation blocks deployment | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Alexander Chuzhoy <sasha> |
| Component: | assisted-installer | Assignee: | Eran Cohen <ercohen> |
| assisted-installer sub component: | assisted-service | QA Contact: | Yuri Obshansky <yobshans> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | urgent | ||
| Priority: | medium | CC: | ercohen, jkilzi, jtomasek, mfilanov, tjelinek, venkatasubramanian.b |
| Version: | 4.9 | Keywords: | TestBlocker |
| Target Milestone: | --- | ||
| Target Release: | 4.10.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-08-28 08:45:59 UTC | Type: | Bug |
Description
Alexander Chuzhoy
2021-09-09 02:08:59 UTC
cc: @jtomasek @tjelinek can you please take a look? We enabled installing on OpenStack when users set the platform to none, but it looks like this collides with the UI wizard steps.
@mfilanov, if the `valid-platform` host validation is in neither of the states `disabled` or `success`, and it has not been explicitly marked in the UI as a `softValidation`, it fails the validation check and prevents the user from moving to the next wizard step.
@sasha do you have an environment where we can reproduce this? I would like to see what we receive from the BE when polling /v1/clusters/:cluster_id
This is the cluster validation info:
{'configuration': [{'id': 'pull-secret-set',
'message': 'The pull secret is set.',
'status': 'success'}],
'hosts-data': [{'id': 'all-hosts-are-ready-to-install',
'message': 'The cluster has hosts that are not ready to '
'install.',
'status': 'failure'},
{'id': 'sufficient-masters-count',
'message': 'The cluster has a sufficient number of master '
'candidates.',
'status': 'success'}],
'network': [{'id': 'api-vip-defined',
'message': 'The API virtual IP is undefined; IP allocation from '
'the DHCP server timed out.',
'status': 'failure'},
{'id': 'api-vip-valid',
'message': 'The API virtual IP is undefined.',
'status': 'pending'},
{'id': 'cluster-cidr-defined',
'message': 'The Cluster Network CIDR is defined.',
'status': 'success'},
{'id': 'dns-domain-defined',
'message': 'The base domain is defined.',
'status': 'success'},
{'id': 'ingress-vip-defined',
'message': 'The Ingress virtual IP is undefined; IP allocation '
'from the DHCP server timed out.',
'status': 'failure'},
{'id': 'ingress-vip-valid',
'message': 'The Ingress virtual IP is undefined.',
'status': 'pending'},
{'id': 'machine-cidr-defined',
'message': 'The Machine Network CIDR is defined.',
'status': 'success'},
{'id': 'machine-cidr-equals-to-calculated-cidr',
'message': 'The Machine Network CIDR, API virtual IP, or Ingress '
'virtual IP is undefined.',
'status': 'pending'},
{'id': 'network-prefix-valid',
'message': 'The Cluster Network prefix is valid.',
'status': 'success'},
{'id': 'network-type-valid',
'message': 'The cluster has a valid network type',
'status': 'success'},
{'id': 'no-cidrs-overlapping',
'message': 'No CIDRS are overlapping.',
'status': 'success'},
{'id': 'ntp-server-configured',
'message': 'No ntp problems found',
'status': 'success'},
{'id': 'service-cidr-defined',
'message': 'The Service Network CIDR is defined.',
'status': 'success'}],
'operators': [{'id': 'cnv-requirements-satisfied',
'message': 'cnv is disabled',
'status': 'success'},
{'id': 'lso-requirements-satisfied',
'message': 'lso is disabled',
'status': 'success'},
{'id': 'ocs-requirements-satisfied',
'message': 'ocs is disabled',
'status': 'success'}]}
And this is the host validation info:
{'hardware': [{'id': 'has-inventory',
'message': 'Valid inventory exists for the host',
'status': 'success'},
{'id': 'has-min-cpu-cores',
'message': 'Sufficient CPU cores',
'status': 'success'},
{'id': 'has-min-memory',
'message': 'Sufficient minimum RAM',
'status': 'success'},
{'id': 'has-min-valid-disks',
'message': 'Sufficient disk capacity',
'status': 'success'},
{'id': 'has-cpu-cores-for-role',
'message': 'Sufficient CPU cores for role master',
'status': 'success'},
{'id': 'has-memory-for-role',
'message': 'Sufficient RAM for role master',
'status': 'success'},
{'id': 'hostname-unique',
'message': 'Hostname '
'ci-vm-10-0-97-34.hosted.upshift.rdu2.redhat.com is '
'unique in cluster',
'status': 'success'},
{'id': 'hostname-valid',
'message': 'Hostname '
'ci-vm-10-0-97-34.hosted.upshift.rdu2.redhat.com is '
'allowed',
'status': 'success'},
{'id': 'valid-platform',
'message': 'Platform OpenStack Compute is allowed only for '
'Single Node OpenShift or user-managed networking',
'status': 'failure'},
{'id': 'sufficient-installation-disk-speed',
'message': 'Speed of installation disk has not yet been '
'measured',
'status': 'success'},
{'id': 'compatible-with-cluster-platform',
'message': 'Host is compatible with cluster platform baremetal',
'status': 'success'}],
'network': [{'id': 'connected',
'message': 'Host is connected',
'status': 'success'},
{'id': 'machine-cidr-defined',
'message': 'Machine Network CIDR is defined',
'status': 'success'},
{'id': 'belongs-to-machine-cidr',
'message': 'Host belongs to all machine network CIDRs',
'status': 'success'},
{'id': 'belongs-to-majority-group',
'message': 'Host has connectivity to the majority of hosts in '
'the cluster',
'status': 'success'},
{'id': 'ntp-synced',
'message': "Host couldn't synchronize with any NTP server",
'status': 'failure'},
{'id': 'container-images-available',
'message': 'All required container images were either pulled '
'successfully or no attempt was made to pull them',
'status': 'success'},
{'id': 'sufficient-network-latency-requirement-for-role',
'message': 'Network latency requirement has been satisfied.',
'status': 'success'},
{'id': 'sufficient-packet-loss-requirement-for-role',
'message': 'Packet loss requirement has been satisfied.',
'status': 'success'},
{'id': 'has-default-route',
'message': 'Host has been configured with at least one default '
'route.',
'status': 'success'},
{'id': 'api-domain-name-resolved-correctly',
'message': 'Domain name resolution is not required (managed '
'networking)',
'status': 'success'},
{'id': 'api-int-domain-name-resolved-correctly',
'message': 'Domain name resolution is not required (managed '
'networking)',
'status': 'success'},
{'id': 'apps-domain-name-resolved-correctly',
'message': 'Domain name resolution is not required (managed '
'networking)',
'status': 'success'},
{'id': 'dns-wildcard-not-configured',
'message': 'DNS wildcard check was successful',
'status': 'success'}],
'operators': [{'id': 'cnv-requirements-satisfied',
'message': 'cnv is disabled',
'status': 'success'},
{'id': 'lso-requirements-satisfied',
'message': 'lso is disabled',
'status': 'success'},
{'id': 'ocs-requirements-satisfied',
'message': 'ocs is disabled',
'status': 'success'}]}
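The gating rule described earlier in this thread (a validation blocks the wizard step unless its status is `success` or `disabled`, or the UI treats it as a soft validation) can be sketched against the host validation info above. This is only an illustration, not the UI's actual code: the function name `blocking_validations` and the `soft_ids` parameter are hypothetical, and which group a given wizard step inspects is passed in as a parameter rather than assumed.

```python
# Hypothetical sketch of the gating rule; only the dict layout
# ({group: [{'id', 'status', ...}]}) is taken from the dumps above.
PASSING = {"success", "disabled"}


def blocking_validations(validations_info, group, soft_ids=frozenset()):
    """Return the ids in `group` that would block a wizard step."""
    return [
        v["id"]
        for v in validations_info.get(group, [])
        if v["status"] not in PASSING and v["id"] not in soft_ids
    ]


# Trimmed copy of the host validation info posted above.
host_validations = {
    "hardware": [
        {"id": "has-inventory", "status": "success"},
        {"id": "valid-platform", "status": "failure"},
        {"id": "compatible-with-cluster-platform", "status": "success"},
    ],
    "network": [
        {"id": "connected", "status": "success"},
        {"id": "ntp-synced", "status": "failure"},
    ],
}

hardware_blockers = blocking_validations(host_validations, "hardware")
network_blockers = blocking_validations(host_validations, "network")
# Treating valid-platform as soft removes it from the blockers.
soft_hardware = blocking_validations(
    host_validations, "hardware", soft_ids={"valid-platform"}
)
print(hardware_blockers, network_blockers, soft_hardware)
```

With this data, `valid-platform` is the only blocker in the `hardware` group, which matches the symptom reported here: the user cannot leave a step that evaluates that group.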
@jkilzi if we move the platform validation from "hardware" to "network" (technically, we will split the logic and add a new validation), will the user be able to get to the network part in order to set "user managed networking"?
(In reply to Eran Cohen from comment #4)
> @jkilzi if we move (technically we will split the logic and add a
> new validation) the platform validation from "hardware" to "network" will
> the user be able to get to the network part in order to set the "user managed
> networking"?
Yes. As I mentioned before, at the hosts-discovery step we only pay attention to the 'hardware' group. The 'network' group is evaluated at the networking step.
FailedQA
Tried to verify with Assisted-ui-lib version: 1.5.36-2. The issue still persists. Unsure whether assisted-ui-lib has anything to do with it.
It failed QA on staging? What is the assisted-service version you tested?
Re-tested on staging and it works as expected now. Assisted-ui-lib version: 1.5.37
Hi,
In our 3 (control plane) + 2 (worker) AI (assisted-installer) based cluster deployment, the hosts do not reach the Ready state, so we cannot proceed with installation. Host status remains Insufficient even after NTP sync succeeds.
Output from one of the cluster nodes:
==============================================================================
$ chronyc sources
210 Number of sources = 1
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^* 192.168.10.4 3 10 377 947 -292us[ -295us] +/- 92ms
$ timedatectl
Local time: Tue 2022-05-10 15:51:51 UTC
Universal time: Tue 2022-05-10 15:51:51 UTC
RTC time: Tue 2022-05-10 15:51:51
Time zone: UTC (UTC, +0000)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no
Cluster events :
============================================================================================
5/10/2022, 7:21:04 PM Updated status of the cluster to insufficient
5/10/2022, 7:21:04 PM Cluster validation 'api-vip-defined' is now fixed
5/10/2022, 7:07:40 PM Cluster validation 'ntp-server-configured' is now fixed
5/10/2022, 7:07:38 PM Host sl12345.net: validation 'ntp-synced' is now fixed
5/10/2022, 7:07:14 PM Host sl12346.net: validation 'ntp-synced' is now fixed
5/10/2022, 7:06:58 PM Host sl12347.net: validation 'ntp-synced' is now fixed
5/10/2022, 7:06:34 PM Host sl12348.net: validation 'ntp-synced' is now fixed
5/10/2022, 7:05:38 PM Cluster validation 'sufficient-masters-count' is now fixed
5/10/2022, 7:05:38 PM [warning] Host sl12345.net: updated status from discovering to insufficient (Host cannot be installed due to following failing validation(s): Host couldn't synchronize with any NTP server ; No connectivity to the majority of hosts in the cluster)
5/10/2022, 7:05:38 PM Host sl12349.net: validation 'ntp-synced' is now fixed
5/10/2022, 7:05:14 PM [warning] Host sl12346.net: updated status from discovering to insufficient (Host cannot be installed due to following failing validation(s): Host couldn't synchronize with any NTP server ; No connectivity to the majority of hosts in the cluster)
5/10/2022, 7:04:58 PM [warning] Host sl12347.net: updated status from discovering to insufficient (Host cannot be installed due to following failing validation(s): Host couldn't synchronize with any NTP server ; No connectivity to the majority of hosts in the cluster)
5/10/2022, 7:04:54 PM Host 67bb1e80-ccb6-2902-bebd-c722391c6b27: Successfully registered
5/10/2022, 7:04:39 PM [warning] Cluster validation 'ntp-server-configured' that used to succeed is now failing
5/10/2022, 7:04:34 PM [warning] Host sl12348.net: updated status from discovering to insufficient (Host cannot be installed due to following failing validation(s): Host couldn't synchronize with any NTP server)
5/10/2022, 7:04:30 PM Host 90570f72-619d-2327-e948-7a1ac68387b6: Successfully registered
5/10/2022, 7:03:38 PM [warning] Cluster validation 'all-hosts-are-ready-to-install' that used to succeed is now failing
5/10/2022, 7:03:38 PM [warning] Host sl12349.net: updated status from discovering to insufficient (Host cannot be installed due to following failing validation(s): Host couldn't synchronize with any NTP server)
Validation info from the cluster :
=================================================================================================================
"configuration": [
{
"id": "pull-secret-set",
"status": "success",
"message": "The pull secret is set."
}
],
"hosts-data": [
{
"id": "all-hosts-are-ready-to-install",
"status": "failure",
"message": "The cluster has hosts that are not ready to install."
},
{
"id": "sufficient-masters-count",
"status": "success",
"message": "The cluster has a sufficient number of master candidates."
}
],
"network": [
{
"id": "api-vip-defined",
"status": "success",
"message": "The API virtual IP is defined."
},
{
"id": "api-vip-valid",
"status": "success",
"message": "api vip 192.168.10.40 belongs to the Machine CIDR and is not in use."
},
{
"id": "cluster-cidr-defined",
"status": "success",
"message": "The Cluster Network CIDR is defined."
},
{
"id": "dns-domain-defined",
"status": "success",
"message": "The base domain is defined."
},
{
"id": "ingress-vip-defined",
"status": "success",
"message": "The Ingress virtual IP is defined."
},
{
"id": "ingress-vip-valid",
"status": "success",
"message": "ingress vip 192.168.10.41 belongs to the Machine CIDR and is not in use."
},
{
"id": "machine-cidr-defined",
"status": "success",
"message": "The Machine Network CIDR is defined."
},
{
"id": "machine-cidr-equals-to-calculated-cidr",
"status": "success",
"message": "The Cluster Machine CIDR is equivalent to the calculated CIDR."
},
{
"id": "network-prefix-valid",
"status": "success",
"message": "The Cluster Network prefix is valid."
},
{
"id": "network-type-valid",
"status": "success",
"message": "The cluster has a valid network type"
},
{
"id": "networks-same-address-families",
"status": "success",
"message": "Same address families for all networks."
},
{
"id": "no-cidrs-overlapping",
"status": "success",
"message": "No CIDRS are overlapping."
},
{
"id": "ntp-server-configured",
"status": "success",
"message": "No ntp problems found"
},
{
"id": "service-cidr-defined",
"status": "success",
"message": "The Service Network CIDR is defined."
}
],
"operators": [
{
"id": "cnv-requirements-satisfied",
"status": "success",
"message": "cnv is disabled"
},
{
"id": "lso-requirements-satisfied",
"status": "success",
"message": "lso is disabled"
},
{
"id": "odf-requirements-satisfied",
"status": "success",
"message": "odf is disabled"
}
]
}
Used following images :
==============================================================================================
quay.io/edge-infrastructure/postgresql-12-centos7:0.3.25
quay.io/edge-infrastructure/assisted-service:v2.3.1
quay.io/edge-infrastructure/assisted-installer-ui:v2.3.9
quay.io/edge-infrastructure/assisted-image-service:v2.3.1
quay.io/edge-infrastructure/assisted-installer-agent:v2.3.1
quay.io/edge-infrastructure/assisted-installer:v2.3.1
quay.io/edge-infrastructure/assisted-installer-controller:v2.3.1
With this, we are unable to proceed with the installation. How should we proceed?
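One way to narrow this down from the cluster events is to pair each "failing validation(s)" event with the later "is now fixed" events for the same host and see what is left over. The sketch below does that on a trimmed copy of the events posted above. It rests on assumptions: the regular expressions only match the event wording shown in this comment, and the `MESSAGE_TO_ID` mapping from failure text to validation id is inferred from the validation ids seen elsewhere in this bug, not taken from the service. A leftover entry only means the posted log shows no "fixed" event for it; the per-host validation info would be needed to confirm the current state.

```python
import re
from collections import defaultdict

# Assumed mapping from failure message text to validation id.
MESSAGE_TO_ID = {
    "Host couldn't synchronize with any NTP server": "ntp-synced",
    "No connectivity to the majority of hosts in the cluster":
        "belongs-to-majority-group",
}

FAILING = re.compile(
    r"Host (\S+): updated status from \w+ to insufficient "
    r"\(.*?validation\(s\): (.*)\)"
)
FIXED = re.compile(r"Host (\S+): validation '([^']+)' is now fixed")


def unresolved(events_oldest_first):
    """Map host -> validation ids reported failing and never reported fixed."""
    open_ids = defaultdict(set)
    for line in events_oldest_first:
        failing = FAILING.search(line)
        if failing:
            host, messages = failing.groups()
            for msg in messages.split(";"):
                msg = msg.strip()
                open_ids[host].add(MESSAGE_TO_ID.get(msg, msg))
            continue
        fixed = FIXED.search(line)
        if fixed:
            host, validation_id = fixed.groups()
            open_ids[host].discard(validation_id)
    return {host: sorted(ids) for host, ids in open_ids.items() if ids}


# Trimmed copy of the events above, newest first as posted.
events_newest_first = [
    "Host sl12345.net: validation 'ntp-synced' is now fixed",
    "Host sl12345.net: updated status from discovering to insufficient "
    "(Host cannot be installed due to following failing validation(s): "
    "Host couldn't synchronize with any NTP server ; "
    "No connectivity to the majority of hosts in the cluster)",
    "Host sl12349.net: validation 'ntp-synced' is now fixed",
    "Host sl12349.net: updated status from discovering to insufficient "
    "(Host cannot be installed due to following failing validation(s): "
    "Host couldn't synchronize with any NTP server)",
]

result = unresolved(reversed(events_newest_first))
print(result)
```

On this subset, the connectivity validation for sl12345.net is the only item without a matching "fixed" event, so the host-level validation info for that host would be the next thing to inspect.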