Bug 1751274
Summary: | [Bare Metal] OVN install fails on UPI | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Anurag saxena <anusaxen>
Component: | Networking | Assignee: | Ricardo Carrillo Cruz <ricarril>
Networking sub component: | ovn-kubernetes | QA Contact: | Anurag saxena <anusaxen>
Status: | CLOSED DUPLICATE | Docs Contact: |
Severity: | medium | |
Priority: | medium | CC: | bbennett, cdc, danw, mcambria, mifiedle, pcameron, rbrattai, xtian, zshi, zzhao
Version: | 4.2.0 | Keywords: | TestBlocker
Target Milestone: | --- | Flags: | anusaxen: needinfo-
Target Release: | 4.4.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1794775 | Environment: |
Last Closed: | 2020-01-24 16:11:41 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
> I couldn't find a way to get inside the cluster. It's failing in the early stages.

It's bare metal... just make sure the machine is set up to let you ssh in, then run "journalctl -u bootkube".

I will keep looking at it. Currently, our automation jobs do not disclose the actual bare-metal hostname/IP, only the api.x.x.x load balancer (lb) hostname. Gathering more info on that part.

Created attachment 1614565: bootkube logs

Some more info along with the attachment:

    $ oc get pods -n openshift-ovn-kubernetes
    NAME                             READY   STATUS    RESTARTS   AGE
    ovnkube-master-76c57ddbd-jnzqx   4/4     Running   0          39m
    ovnkube-node-9zfnx               2/3     Running   8          39m
    ovnkube-node-bwmgc               3/3     Running   0          39m
    ovnkube-node-t66pt               2/3     Running   8          39m

    $ oc get co
    NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
    cloud-credential                           4.2.0-0.nightly-2019-09-11-074500   True        False         False      39m
    dns                                        4.2.0-0.nightly-2019-09-11-074500   True        True          True       38m
    insights                                   4.2.0-0.nightly-2019-09-11-074500   True        False         False      39m
    kube-apiserver                                                                 False       True          True       39m
    kube-controller-manager                                                        False       True          True       39m
    kube-scheduler                             4.2.0-0.nightly-2019-09-11-074500   False       True          True       39m
    machine-api                                4.2.0-0.nightly-2019-09-11-074500   True        False         False      39m
    machine-config                             4.2.0-0.nightly-2019-09-11-074500   False       True          True       39m
    network                                                                        False       True          False      40m
    openshift-apiserver                        4.2.0-0.nightly-2019-09-11-074500   Unknown     Unknown       True       39m
    openshift-controller-manager               4.2.0-0.nightly-2019-09-11-074500   False       False         False      33m
    operator-lifecycle-manager                 4.2.0-0.nightly-2019-09-11-074500   True        False         False      38m
    operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-09-11-074500   True        False         False      38m
    operator-lifecycle-manager-packageserver                                       False       True          False      38m
    service-ca                                 4.2.0-0.nightly-2019-09-11-074500   True        False         False      39m

Anurag, were you able to get a must-gather? It will be tough to diagnose without the logs.

(In reply to Casey Callendrello from comment #5)
> Anurag,
> were you able to get a must-gather? It will be tough to diagnose without the
> logs.

Hi Casey, I attached the "journalctl -u bootkube" output as requested by Dan. Let me check again whether a must-gather is obtainable.

I believe this is fixed with the 4.3 work.

Created attachment 1637836: bootkube logs 11/19
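
For readers triaging a similar failure, the `oc get co` output posted in this report can be filtered for operators that are unavailable or degraded. The following is a minimal sketch, not part of the original thread: it parses pasted text rather than calling the cluster, the sample rows are an abridged copy of the output in this report, and the function names `parse_cluster_operators` and `unhealthy` are made up for illustration.

```python
from typing import Dict, List

# Abridged copy of the `oc get co` output pasted in this report.
SAMPLE = """\
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
cloud-credential 4.2.0-0.nightly-2019-09-11-074500 True False False 39m
dns 4.2.0-0.nightly-2019-09-11-074500 True True True 38m
kube-apiserver False True True 39m
network False True False 40m
openshift-apiserver 4.2.0-0.nightly-2019-09-11-074500 Unknown Unknown True 39m
"""


def parse_cluster_operators(text: str) -> List[Dict[str, str]]:
    """Parse whitespace-separated `oc get co` rows into dictionaries."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # not a data row
        # VERSION is empty on some rows in the pasted output, so the four
        # status columns are read from the right-hand side of the line.
        available, progressing, degraded, since = fields[-4:]
        rows.append({
            "name": fields[0],
            "version": fields[1] if len(fields) == 6 else "",
            "available": available,
            "progressing": progressing,
            "degraded": degraded,
            "since": since,
        })
    return rows


def unhealthy(rows: List[Dict[str, str]]) -> List[str]:
    """Names of operators that are not Available or are Degraded."""
    return [r["name"] for r in rows
            if r["available"] != "True" or r["degraded"] == "True"]


if __name__ == "__main__":
    for name in unhealthy(parse_cluster_operators(SAMPLE)):
        print(name)
```

Reading the status columns from the right is a design choice that tolerates the missing VERSION value seen for kube-apiserver and network above.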

Please provide the RHCOS version used. Also, the output of oc get clusterversion and oc -n openshift-ovn-kubernetes get kube-apiserver -oyaml.

I'm working on https://bugzilla.redhat.com/show_bug.cgi?id=1750606, and I think they may be the same issue.

This is still failing on recent 4.3 with the same symptoms mentioned in comment 14.

Just adding a note about deploying OpenShift on an OVN bare-metal environment. We have a bare-metal CI job which runs a UPI deployment of the latest 4.4 OpenShift; it passes with version 4.4.0-0.ci-2020-01-15-133915:
https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/OCP-UPI-Install-4.3/10/console
And a periodic run of the 4.3 deployment shows that it passed with version 4.3.0-0.nightly-2020-01-07-212456:
https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/OVN-UPI-Install-4.3/19/console
@Anurag, would you please try with the latest 4.3 or 4.4 and see if it gives a different result?

Syncing with Phil on forum-sdn. FYI: I manually approved the pending CSRs but didn't see any progress on the cluster after that. Attaching bootkube logs here as well.

Created attachment 1654914: bootkube_logs Jan 23
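
The manual CSR approval mentioned in this thread can be sketched as follows. This is an illustration only, not the reporter's procedure: the sample `oc get csr` output is hypothetical (it is not taken from this bug), the helper names are made up, and the script only prints `oc adm certificate approve` commands instead of running them.

```python
from typing import List

# Hypothetical `oc get csr` output, for illustration only.
SAMPLE_CSRS = """\
NAME        AGE   REQUESTOR                                                                   CONDITION
csr-aaaaa   20m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-bbbbb   5m    system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-ccccc   2m    system:node:worker-0.example.invalid                                        Pending
"""


def pending_csr_names(text: str) -> List[str]:
    """Return the names of CSRs whose last column (CONDITION) is Pending."""
    names = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and fields[-1] == "Pending":
            names.append(fields[0])
    return names


def approve_commands(names: List[str]) -> List[str]:
    """Build one `oc adm certificate approve` command per pending CSR."""
    return [f"oc adm certificate approve {name}" for name in names]


if __name__ == "__main__":
    for command in approve_commands(pending_csr_names(SAMPLE_CSRS)):
        print(command)
```

Further CSRs may appear after the first batch is approved, so the check may need to be repeated. As the comment above notes, approving the CSRs did not unblock this particular install.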

Continue to track this in https://bugzilla.redhat.com/show_bug.cgi?id=1794775. Closing this one.

*** This bug has been marked as a duplicate of bug 1794775 ***

Created attachment 1614128: install logs

Description of problem:
OVN install fails on Bare Metal. There is not much info except the install logs. Bootstrapping is not successful. The API endpoint is up but refuses connections, for example:

    $ oc login -u kubeadmin -p password
    error: dial tcp x.x.x.x:6443: connect: connection refused - verify you have provided the correct host and port and that the server is currently running.

I couldn't find a way to get inside the cluster. It's failing in the early stages.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-09-11-074500

How reproducible:
Always

Steps to Reproduce:
1. Install OVNKubernetes on a Bare Metal cluster

Actual results:
Installation is unsuccessful.

Expected results:
Successful installation.

Additional info:
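
The "connection refused" error in the description can be told apart from a timeout with a plain TCP probe of port 6443. The sketch below is an assumption-laden illustration, not something used in this bug: `probe_tcp` is a made-up helper, and the host name in the usage section is a placeholder.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as open, refused, timeout, or error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing accepted the connection on this port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout (for example, dropped packets).
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return f"error: {exc}"


if __name__ == "__main__":
    # Placeholder host; replace with the API load balancer address.
    print(probe_tcp("api.example.invalid", 6443))
```

A "refused" result suggests the address is reachable but no API server is accepting connections behind it, which matches the symptom reported here. A "timeout" would point more toward a network or firewall problem.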