Description of problem: In a typical CI install of OCP 4 we can expect a reproducible and consistent number of revisions for the static pod operators. Revisions are caused by changes to key resources that the operator is watching for a change. These resources include secrets, configmaps, and nodes among others.
A static pod revision is essentially an on-disk representation of the operand's resources for its namespace at a given point in time. When all of the nodes are available at the same time, some processes, such as TLS certificate creation for all of the etcd members, can happen together. The net result is fewer revisions.
Each revision requires the operand to restart, which in the case of etcd is costly because a leader change is a required result.
Currently, etcd has been observed with 6 revisions as a result of scaling with assisted-installer. In some extreme cases, this has resulted in etcd with terms as high as 90.
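To illustrate why staggered node availability inflates the revision count, here is a minimal sketch of a toy model. It is not the operator's actual logic: the function name `count_revisions`, the `window` parameter, and the idea that changes landing within a fixed time window share one revision are all assumptions made for illustration only.

```python
def count_revisions(arrival_times, window):
    """Toy model (assumption, not operator code): resource changes that land
    within `window` seconds of the first change in a batch share one revision."""
    revisions = 0
    batch_start = None
    for t in sorted(arrival_times):
        # Start a new revision when the change falls outside the current batch.
        if batch_start is None or t - batch_start > window:
            revisions += 1
            batch_start = t
    return revisions


# Three control-plane nodes that become available together.
together = count_revisions([0, 1, 2], window=5)
# The same three nodes joining one at a time, minutes apart.
staggered = count_revisions([0, 120, 240], window=5)
print(together, staggered)
```

Under this model the simultaneous case produces a single revision while the staggered case produces one per node, and each extra revision would imply another etcd restart and leader change.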
Install should be a graceful and predictable process.
Version-Release number of selected component (if applicable):
How reproducible: fairly
Steps to Reproduce:
1. Create an assisted install and observe the logs.
Actual results: on some installs the control plane is unstable, which can result in install failure.
Expected results: the install workflow should be consistent for the control plane and should minimize disruption.
This bug is awaiting verification
I tried the installation many times and did not hit this issue. I then contacted the install team QE, and they have not hit it either. Changing status to VERIFIED.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.