Bug 1804482 - Unable to bring up cluster on vsphere env
Summary: Unable to bring up cluster on vsphere env
Keywords:
Status: CLOSED DUPLICATE of bug 1798945
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.4.0
Assignee: Joseph Callen
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On: 1798945
Blocks:
Reported: 2020-02-18 23:33 UTC by Anurag saxena
Modified: 2020-03-16 14:44 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-24 13:49:36 UTC
Target Upstream Version:
anusaxen: needinfo-


Description Anurag saxena 2020-02-18 23:33:26 UTC
Description of problem: Not able to bring up SDN or OVN clusters on vSphere. I suspect an installer issue: the installer now waits 20 minutes for the Kubernetes API to become ready, which might not be enough time compared with the 30 minutes allowed in earlier releases. Is this intentional in 4.4, or an error?

level=info msg="Waiting up to 20m0s for the Kubernetes API at https://api.qe-anusaxen-vs1.qe.devcluster.openshift.com:6443..."
level=error msg="Attempted to gather ClusterOperator status after wait failure: listing ClusterOperator objects: Get https://api.qe-anusaxen-vs1.qe.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusteroperators: dial tcp 139.178.76.10:6443: i/o timeout"
level=info msg="Use the following commands to gather logs from the cluster"
level=info msg="openshift-install gather bootstrap --help"
level=fatal msg="waiting for Kubernetes API: context deadline exceeded"
+ exit 3
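
For context on the failure above: "context deadline exceeded" is the message Go's context package reports when a deadline passes, here after the 20m0s wait for the API. The pattern is a poll-until-deadline loop. Below is a minimal sketch of that pattern in Python, not the installer's actual code; the function name wait_for_api, the probe callable, and the interval value are all hypothetical and exist only for illustration.

```python
import time
from typing import Callable


def wait_for_api(
    probe: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 0.01,
) -> bool:
    """Poll probe() until it returns True or timeout_s elapses.

    Hypothetical helper: probe stands in for "is the Kubernetes API
    answering yet?". Returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        # Give up if there is no time left for another attempt.
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)
```

With a probe that never succeeds, the call returns False once the timeout is spent, which corresponds to the fatal "waiting for Kubernetes API" line in the log. A longer timeout would only help if the API eventually came up; it would not help if the API never becomes reachable at all.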

Version-Release number of the following components: 4.4.0-0.nightly-2020-02-18-200822

How reproducible: Always

Steps to Reproduce:
1. Bring up an OCP cluster on vSphere

Actual results:


Expected results: The cluster should come up fine on the vSphere environment

Additional info:

Comment 1 Abhinav Dahiya 2020-02-18 23:38:55 UTC
> I suspect an installer issue: the installer now waits 20 minutes for the Kubernetes API to become ready, which might not be enough time compared with the 30 minutes allowed in earlier releases. Is this intentional in 4.4, or an error?

This is intentional in 4.4


> level=info msg="Use the following commands to gather logs from the cluster"
> level=info msg="openshift-install gather bootstrap --help"
> level=fatal msg="waiting for Kubernetes API: context deadline exceeded"

Debugging requires that you attach the log bundle as requested by the installer.

Comment 2 Anurag saxena 2020-02-19 14:00:38 UTC
Thanks for confirming. I will try to gather more logs on this.

Comment 3 Anurag saxena 2020-02-19 16:48:49 UTC
@Abhinav, I can share the bootstrap node IP with you to look at. Please ping me when you are in. Thanks

Comment 5 Anurag saxena 2020-02-19 18:47:58 UTC
Thanks @Joseph for referring me to the PR. I will discuss this with the installer QE team to find out more.

Comment 6 liujia 2020-02-20 01:59:34 UTC
We hit this 2 days ago in QE's CI test. The failure is caused by another known issue, https://bugzilla.redhat.com/show_bug.cgi?id=1804032. It is not an installer issue.

Comment 7 Anurag saxena 2020-02-20 13:54:26 UTC
Depends on Bug 1798945 (an etcd operator issue), as discussed with Jainlin/Jia from the installer team.

Comment 8 Anurag saxena 2020-02-20 13:57:57 UTC
Joseph, should the target version be 4.4?

Comment 9 Joseph Callen 2020-02-20 14:49:00 UTC
Was the close a mistake?  What is the current status?

I have installed OCP 4.4 on vSphere with UPI with no problems (after the etcd operator issue was resolved).
CI [0] has flakes (not installer related) and is passing at 50%.

[0] - https://prow.svc.ci.openshift.org/?job=*vsphere*4.4

Comment 10 Anurag saxena 2020-02-20 15:10:55 UTC
(Yep, the close was a mistake.)
Joseph, the root cause seems to be the broken boot RHCOS image (rhcos-44.81.202002071430-0), which is being tracked in Bug 1804032.
If we use an older boot RHCOS image (such as rhcos-44.81.202001241431.0), we hit another known issue: https://bugzilla.redhat.com/show_bug.cgi?id=1798945#c8

So this is apparently an etcd + RHCOS component issue, not an installer issue.

Comment 11 Joseph Callen 2020-02-21 21:20:27 UTC
Why do we have this BZ when the issue is with components other than the installer?
Is there an issue with vSphere UPI that I can help with? If not, this really should be closed.

Comment 12 Anurag saxena 2020-02-21 21:30:48 UTC
Joseph, yes, we can change this to "CLOSED DUPLICATE of 1798945"; that's what I am hitting.

Comment 13 Joseph Callen 2020-02-24 13:49:36 UTC

*** This bug has been marked as a duplicate of bug 1798945 ***

