Bug 1819746 - [OSP-IPI][4.4] Installation fails on "Cluster operator insights Disabled is False with : "
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Pierre Prinetti
QA Contact: David Sanz
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-04-01 12:48 UTC by Lukas Bednar
Modified: 2020-07-13 17:25 UTC (History)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: Because performance varies across the OpenStack clouds where OpenShift can be installed, installation times can be unpredictable.
Consequence: The installer might time out even though the installation would eventually converge to a working state.
Workaround (if any): Wait after the installer reports a failure, then check the cluster. It might be perfectly healthy.
Result: The cluster might reach a healthy state even after the installer has timed out.
Clone Of:
Environment:
Last Closed: 2020-07-13 17:24:45 UTC
Target Upstream Version:


Attachments
flexy-console.log (96.66 KB, text/plain)
2020-04-01 12:48 UTC, Lukas Bednar


Links
System ID Priority Status Summary Last Updated
Github openshift installer pull 3464 None closed Bug 1819746: Add a note on slow installations 2020-09-22 14:59:43 UTC
Red Hat Product Errata RHBA-2020:2409 None None None 2020-07-13 17:25:09 UTC

Description Lukas Bednar 2020-04-01 12:48:37 UTC
Created attachment 1675399
flexy-console.log

Description of problem:

I am using the flexy installer to deploy OCP 4.4 on RHOS.
The cluster seems to be healthy and functional, but the installer fails with a timeout anyway.
Log snippets are included in the description below. The full logs are attached and linked under Additional info.

11:29:15 level=info msg="API v1.17.1 up"
11:29:15 level=info msg="Waiting up to 40m0s for bootstrapping to complete..."
12:09:23 level=info msg="Cluster operator insights Disabled is False with : "
.... TRIMMED ....
12:09:57 level=debug msg="Log bundle written to /var/home/core/log-bundle-20200401100930.tar.gz"
12:09:57 level=info msg="Bootstrap gather logs captured here \"install-dir/log-bundle-20200401100930.tar.gz\""
12:09:57 level=fatal msg="Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition"
12:09:57 tools/launch_instance.rb:623:in `installation_task': shell command failed execution, see logs (RuntimeError)
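
A quick way to confirm that this failure is the bootstrap wait expiring is to compare the log timestamps with the announced 40m0s limit. The following is a minimal sketch in Python; the helper name elapsed is hypothetical, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime, timedelta


def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two HH:MM:SS stamps taken on the same day."""
    fmt = "%H:%M:%S"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


# Timestamps copied from the log excerpt above.
wait_started = "11:29:15"  # "Waiting up to 40m0s for bootstrapping to complete..."
fatal_logged = "12:09:57"  # "Bootstrap failed to complete: ..."

bootstrap_timeout = timedelta(minutes=40)
waited = elapsed(wait_started, fatal_logged)

print(f"waited {waited}, limit {bootstrap_timeout}, exceeded: {waited >= bootstrap_timeout}")
```

The fatal message is logged 40 minutes and 42 seconds after the wait begins, which is consistent with the 40-minute limit being reached before the log bundle was gathered.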


[fedora@flexy-executor-2 private-flexy-example]$ oc get clusterversion 
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-rc.4   True        False         90m     Cluster version is 4.4.0-rc.4

[fedora@flexy-executor-2 private-flexy-example]$ oc get clusteroperators 
NAME                                       VERSION      AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.4.0-rc.4   True        False         False      90m
cloud-credential                           4.4.0-rc.4   True        False         False      108m
cluster-autoscaler                         4.4.0-rc.4   True        False         False      98m
console                                    4.4.0-rc.4   True        False         False      92m
csi-snapshot-controller                    4.4.0-rc.4   True        False         False      96m
dns                                        4.4.0-rc.4   True        False         False      103m
etcd                                       4.4.0-rc.4   True        False         False      103m
image-registry                             4.4.0-rc.4   True        False         False      96m
ingress                                    4.4.0-rc.4   True        False         False      96m
insights                                   4.4.0-rc.4   True        False         False      100m
kube-apiserver                             4.4.0-rc.4   True        False         False      102m
kube-controller-manager                    4.4.0-rc.4   True        False         False      101m
kube-scheduler                             4.4.0-rc.4   True        False         False      102m
kube-storage-version-migrator              4.4.0-rc.4   True        False         False      96m
machine-api                                4.4.0-rc.4   True        False         False      104m
machine-config                             4.4.0-rc.4   True        False         False      103m
marketplace                                4.4.0-rc.4   True        False         False      100m
monitoring                                 4.4.0-rc.4   True        False         False      94m
network                                    4.4.0-rc.4   True        False         False      103m
node-tuning                                4.4.0-rc.4   True        False         False      104m
openshift-apiserver                        4.4.0-rc.4   True        False         False      96m
openshift-controller-manager               4.4.0-rc.4   True        False         False      99m
openshift-samples                          4.4.0-rc.4   True        False         False      97m
operator-lifecycle-manager                 4.4.0-rc.4   True        False         False      104m
operator-lifecycle-manager-catalog         4.4.0-rc.4   True        False         False      104m
operator-lifecycle-manager-packageserver   4.4.0-rc.4   True        False         False      99m
service-ca                                 4.4.0-rc.4   True        False         False      104m
service-catalog-apiserver                  4.4.0-rc.4   True        False         False      104m
service-catalog-controller-manager         4.4.0-rc.4   True        False         False      104m
storage                                    4.4.0-rc.4   True        False         False      100m
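
The table above can also be checked mechanically. The following is a minimal sketch in Python that reads the text output of oc get clusteroperators and lists any operator that is not Available=True, Progressing=False, Degraded=False. The function name unhealthy_operators is hypothetical, and the parsing assumes whitespace-separated columns, so a row with an empty column is simply reported as unhealthy.

```python
from typing import List


def unhealthy_operators(table: str) -> List[str]:
    """Return the names of operators that are not fully healthy."""
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    col = {name: header.index(name)
           for name in ("NAME", "AVAILABLE", "PROGRESSING", "DEGRADED")}
    bad = []
    for ln in lines[1:]:
        fields = ln.split()
        # A short row means some column was empty; treat it as not healthy.
        if len(fields) < len(header):
            bad.append(fields[0])
            continue
        state = (fields[col["AVAILABLE"]],
                 fields[col["PROGRESSING"]],
                 fields[col["DEGRADED"]])
        if state != ("True", "False", "False"):
            bad.append(fields[col["NAME"]])
    return bad


# Two rows copied from the report, with column spacing shortened.
SAMPLE = """\
NAME             VERSION      AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication   4.4.0-rc.4   True        False         False      90m
insights         4.4.0-rc.4   True        False         False      100m
"""

print(unhealthy_operators(SAMPLE))
```

An empty list means every listed operator reports the healthy combination, which is the case for all rows in this report.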


Version-Release number of the following components:
4.4.0-rc.4

How reproducible: 100%


Additional info:
http://file.rdu.redhat.com/lbednar/log-bundle-20200401100930.tar.gz
http://file.rdu.redhat.com/lbednar/must-gather.local.9180392784441685071.tag.gz

Comment 3 Lukas Bednar 2020-04-14 14:29:24 UTC
I was able to reproduce on 4.4.0-rc.8 as well.

Comment 4 Pierre Prinetti 2020-04-15 20:19:53 UTC
The reported debug message "Cluster operator insights Disabled is False with :" does not seem related to the problem.

Depending on the infrastructure performance, the installation can take longer than the global timeout of the installer; the cluster still converges to a healthy state independently of the installer itself.
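
For illustration only, the idea of waiting longer than the installer does can be sketched as a generic polling loop in Python. The name wait_until and all of its parameters are hypothetical and are not part of the installer; the clock and sleep arguments exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool],
               timeout_s: float,
               interval_s: float = 30.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() until it returns True or timeout_s seconds have elapsed."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Trivial usage: a check that already succeeds returns immediately.
print(wait_until(lambda: True, timeout_s=1.0))
```

In practice, check would be a function that returns True once the cluster reports a healthy state, and timeout_s would be chosen to suit the slower infrastructure. The installer also provides the openshift-install wait-for bootstrap-complete and wait-for install-complete subcommands, which can be re-run against the same installation directory to resume waiting.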

Comment 5 Pierre Prinetti 2020-04-16 10:49:39 UTC
The Github PR refers to a new "known issues" section dedicated to this problem.

Comment 6 David Sanz 2020-04-16 11:00:03 UTC
Verified, as this is a documentation addition.

Comment 10 Pierre Prinetti 2020-06-30 10:18:27 UTC
(In reply to szacks from comment #9)
> By adding it as a known problem and marking it as Verified, does that mean
> that you are not planning on fixing it by allowing a timeout parameter (for
> example)?

The timeout logic is defined at the orchestration level: a functional area that goes beyond the scope of OpenShift-on-OpenStack. Making changes at that level requires coordination and perseverance.

However, we do feel the pain of the timeout problem, and we plan to tackle it in the context of our upcoming "baremetal workers" epic (which is planned for 4.6).

Comment 12 errata-xmlrpc 2020-07-13 17:24:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

