Bug 1751274 - [Bare Metal] OVN install fails on UPI
Summary: [Bare Metal] OVN install fails on UPI
Keywords:
Status: CLOSED DUPLICATE of bug 1794775
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.4.0
Assignee: Ricardo Carrillo Cruz
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-09-11 15:10 UTC by Anurag saxena
Modified: 2020-01-24 16:11 UTC
CC: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1794775 (view as bug list)
Environment:
Last Closed: 2020-01-24 16:11:41 UTC
Target Upstream Version:
Embargoed:
anusaxen: needinfo-


Attachments (Terms of Use)
install logs (70.98 KB, text/plain)
2019-09-11 15:10 UTC, Anurag saxena
no flags Details
bootkube logs (180.53 KB, text/plain)
2019-09-12 15:11 UTC, Anurag saxena
no flags Details
bootkube logs 11/19 (579.87 KB, text/plain)
2019-11-19 18:06 UTC, Anurag saxena
no flags Details
bootkube_logs Jan 23 (623.14 KB, text/plain)
2020-01-23 20:41 UTC, Anurag saxena
no flags Details

Description Anurag saxena 2019-09-11 15:10:02 UTC
Created attachment 1614128 [details]
install logs

Description of problem: OVN install fails on bare metal UPI. There is not much information beyond the install logs. Bootstrapping does not complete. The API endpoint is up but refuses connections:

$ oc login -u kubeadmin -p password
error: dial tcp x.x.x.x:6443: connect: connection refused - verify you have provided the correct host and port and that the server is currently running.

I couldn't find a way to get inside the cluster; it's failing in the early stages.


Version-Release number of selected component (if applicable): 4.2.0-0.nightly-2019-09-11-074500


How reproducible: Always


Steps to Reproduce:
1. Install a bare metal UPI cluster with the OVNKubernetes network type

Actual results: Installation fails; bootstrapping never completes.


Expected results: Successful installation.


Additional info:

Comment 1 Dan Winship 2019-09-11 16:38:29 UTC
> I could't find a way to go inside the cluster. Its failing in early stages.

It's bare metal... just make sure the machine is set up to let you ssh in, then "journalctl -u bootkube"
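The suggestion above might look like the following in practice (a sketch only: the host name is a placeholder, and the grep filter is just one way to narrow the journal output to likely-relevant lines):

```shell
# Hypothetical sketch: ssh to the bootstrap node (host is a placeholder)
# and pull the bootkube systemd unit logs, keeping only lines that look
# like errors or connection failures.
ssh core@bootstrap.example.com \
    'journalctl -b -u bootkube.service --no-pager' \
  | grep -iE 'error|fail|refused' \
  | tail -n 40
```

The grep stage is optional; dropping it gives the full unit log, which is what was eventually attached to this bug.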

Comment 2 Anurag saxena 2019-09-11 18:17:22 UTC
I will keep looking at it. Currently our automation jobs do not expose the actual bare metal hostnames/IPs, only the api.x.x.x load-balancer hostname. Gathering more info on that part.

Comment 3 Anurag saxena 2019-09-12 15:11:41 UTC
Created attachment 1614565 [details]
bootkube logs

Comment 4 Anurag saxena 2019-09-12 15:12:32 UTC
Some more info to go along with the attachment:

$ oc get pods -n openshift-ovn-kubernetes 
NAME                             READY   STATUS    RESTARTS   AGE
ovnkube-master-76c57ddbd-jnzqx   4/4     Running   0          39m
ovnkube-node-9zfnx               2/3     Running   8          39m
ovnkube-node-bwmgc               3/3     Running   0          39m
ovnkube-node-t66pt               2/3     Running   8          39m

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
cloud-credential                           4.2.0-0.nightly-2019-09-11-074500   True        False         False      39m
dns                                        4.2.0-0.nightly-2019-09-11-074500   True        True          True       38m
insights                                   4.2.0-0.nightly-2019-09-11-074500   True        False         False      39m
kube-apiserver                                                                 False       True          True       39m
kube-controller-manager                                                        False       True          True       39m
kube-scheduler                             4.2.0-0.nightly-2019-09-11-074500   False       True          True       39m
machine-api                                4.2.0-0.nightly-2019-09-11-074500   True        False         False      39m
machine-config                             4.2.0-0.nightly-2019-09-11-074500   False       True          True       39m
network                                                                        False       True          False      40m
openshift-apiserver                        4.2.0-0.nightly-2019-09-11-074500   Unknown     Unknown       True       39m
openshift-controller-manager               4.2.0-0.nightly-2019-09-11-074500   False       False         False      33m
operator-lifecycle-manager                 4.2.0-0.nightly-2019-09-11-074500   True        False         False      38m
operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-09-11-074500   True        False         False      38m
operator-lifecycle-manager-packageserver                                       False       True          False      38m
service-ca                                 4.2.0-0.nightly-2019-09-11-074500   True        False         False      39m

Comment 5 Casey Callendrello 2019-09-17 14:13:35 UTC
Anurag,
were you able to get a must-gather? It will be tough to diagnose without the logs.

Comment 6 Anurag saxena 2019-09-17 14:39:06 UTC
(In reply to Casey Callendrello from comment #5)
> Anurag,
> were you able to get a must-gather? It will be tough to diagnose without the
> logs.

Hi Casey, I attached the "journalctl -u bootkube" output as requested by Dan. Let me see again whether a must-gather is obtainable.

Comment 12 Ben Bennett 2019-11-14 20:04:18 UTC
I believe this is fixed with the 4.3 work.

Comment 13 Anurag saxena 2019-11-19 18:06:05 UTC
Created attachment 1637836 [details]
bootkube logs 11/19

Comment 15 Ricardo Carrillo Cruz 2019-11-22 09:57:30 UTC
Please provide the RHCOS used.
Also, oc get clusterversion and oc -n openshift-ovn-kubernetes get kube-apiserver -oyaml.

I'm working on https://bugzilla.redhat.com/show_bug.cgi?id=1750606 , and I think they may be the same issue.

Comment 19 Anurag saxena 2019-12-10 19:35:19 UTC
This is still failing on a recent 4.3 with the same symptoms mentioned in comment 14.

Comment 20 zenghui.shi 2020-01-18 00:23:29 UTC
Just adding a note about deploying OpenShift with OVN in a bare metal environment.
We have a bare metal CI job that runs a UPI deployment of the latest 4.4 OpenShift; it passes with version 4.4.0-0.ci-2020-01-15-133915:
https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/OCP-UPI-Install-4.3/10/console

A periodic run of the 4.3 deployment also passed, with version 4.3.0-0.nightly-2020-01-07-212456:
https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/OVN-UPI-Install-4.3/19/console

@Anurag, would you please try with the latest 4.3 or 4.4 and see if it gives a different result?

Comment 23 Anurag saxena 2020-01-23 20:40:44 UTC
Syncing with Phil on forum-sdn.
FYI: I manually approved the pending CSRs but didn't see any progress on the cluster after that.
Attaching bootkube logs here as well.
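The manual CSR approval mentioned above can be done with a one-liner along these lines (a sketch, not necessarily the exact commands used; the awk filter keys on the CONDITION column being the last field of `oc get csr` output):

```shell
# Approve every CSR whose last column reads "Pending" (hypothetical
# one-liner; requires access to a live cluster with admin credentials).
oc get csr --no-headers \
  | awk '$NF == "Pending" {print $1}' \
  | xargs -r oc adm certificate approve
```

As the comment notes, approving the CSRs alone was not enough to unstick this particular install.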

Comment 24 Anurag saxena 2020-01-23 20:41:20 UTC
Created attachment 1654914 [details]
bootkube_logs Jan 23

Comment 25 Anurag saxena 2020-01-24 16:11:41 UTC
Continuing to track this in https://bugzilla.redhat.com/show_bug.cgi?id=1794775
Closing this one.

*** This bug has been marked as a duplicate of bug 1794775 ***

