Bug 1888301 - Openshift installation fails - master node is not ready
Summary: Openshift installation fails - master node is not ready
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.5.z
Assignee: Martin André
QA Contact: weiwei jiang
Duplicates: 1888520
Depends On: 1875005
Blocks: 1847179
Reported: 2020-10-14 14:25 UTC by Itzik Brown
Modified: 2021-02-10 21:26 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2020-11-10 14:53:54 UTC
Target Upstream Version:

Attachments
openshift install log (115.97 KB, text/plain)
2020-10-14 14:25 UTC, Itzik Brown

System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift baremetal-runtimecfg pull 105 | 0 | None | closed | Bug 1888301: Add check for iptables rule to keepalived-monitor | 2021-02-19 12:51:26 UTC
Red Hat Product Errata | RHBA-2020:4425 | 0 | None | None | None | 2020-11-10 14:54:08 UTC

Description Itzik Brown 2020-10-14 14:25:26 UTC
Created attachment 1721500
openshift install log



Please specify:

What happened?
Installation failed to complete.
Only the master nodes were started, and not all of them were ready:
$ oc get nodes
NAME                    STATUS     ROLES    AGE   VERSION
ostest-k25cj-master-0   Ready      master   9h    v1.18.3+2fbd7c7
ostest-k25cj-master-1   NotReady   master   9h    v1.18.3+2fbd7c7
ostest-k25cj-master-2   Ready      master   9h    v1.18.3+2fbd7c7
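
A minimal sketch for picking out the nodes that are not ready from output like the above. The function name not_ready_nodes is hypothetical, and it assumes the default `oc get nodes` column layout (NAME first, STATUS second). Any status other than exactly "Ready" is reported, so a node showing "Ready,SchedulingDisabled" would be listed too.

```shell
#!/bin/sh
# Hypothetical helper, not part of oc: reads `oc get nodes` output
# (including the header line) on stdin and prints the name of every
# node whose STATUS column is not exactly "Ready".
not_ready_nodes() {
    # NR > 1 skips the header; $1 is NAME, $2 is STATUS.
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}
```

It could be used as `oc get nodes | not_ready_nodes`.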

Comment 3 Stephen Cuppett 2020-10-14 15:10:21 UTC
Setting target release to the active development branch (4.7.0). For any fixes, where required and requested, cloned BZs will be created for those release maintenance streams where appropriate once they are identified.

Setting a tentative severity based on description as provided.

Comment 4 Martin André 2020-10-15 10:33:06 UTC
Looking at the cluster, I noticed the keepalived check scripts are not rendered properly. The LBConfig.LbPort and LBConfig.ApiPort variables are unset, causing the scripts to have wrong ports:

[core@ostest-k25cj-master-0 ~]$ cat /etc/keepalived/chk_ocp_script_both.sh 
/usr/bin/curl -o /dev/null -kLfs https://localhost:0/readyz && [ -e /var/run/keepalived/iptables-rule-exists ] || /usr/bin/curl -kLfs https://localhost:0/readyz

This prevents the API VIP from moving to another node when the node currently holding it is unhealthy.
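
For anyone wanting to check whether a node is affected, here is a rough sketch that flags check scripts rendered with port 0. The function name find_unrendered_ports is hypothetical, and the only assumption about the script contents is that the health-check URL has the `https://localhost:<port>/` shape shown above.

```shell
#!/bin/sh
# Hypothetical helper: report keepalived check scripts whose health-check
# URL was rendered with port 0, i.e. with an unset port variable.
# Usage: find_unrendered_ports FILE...
# Returns 1 if at least one file is affected, 0 otherwise.
find_unrendered_ports() {
    status=0
    for script in "$@"; do
        # Require a non-digit (or end of line) after the 0 so that only
        # a port of exactly 0 is flagged.
        if grep -Eq 'https://localhost:0([^0-9]|$)' "$script"; then
            echo "unrendered port in $script"
            status=1
        fi
    done
    return "$status"
}
```

On a master node this could be run as `find_unrendered_ports /etc/keepalived/chk_ocp_script_both.sh`; any other check scripts under /etc/keepalived would need to be passed explicitly.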

It turns out we were missing a 4.5 backport for https://github.com/openshift/baremetal-runtimecfg/pull/70 when we backported https://github.com/openshift/machine-config-operator/pull/2110.

Comment 6 Martin André 2020-10-15 11:33:53 UTC
Trying again after setting what I think should be the dependent bug.

Comment 7 Martin André 2020-10-15 13:28:13 UTC
*** Bug 1888520 has been marked as a duplicate of this bug. ***

Comment 13 David Sanz 2020-11-03 14:32:51 UTC
No more errors found in the 4.5 tests; marking the bug as verified.

Comment 15 errata-xmlrpc 2020-11-10 14:53:54 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.18 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

