Bug 2102158
| Summary: | Unable to deploy 4.11 Dual Stack in hybrid cluster with two bare metal workers |
|---|---|
| Product: | OpenShift Container Platform |
| Component: | Bare Metal Hardware Provisioning |
| Sub component: | cluster-api-provider |
| Status: | CLOSED CURRENTRELEASE |
| Severity: | high |
| Priority: | high |
| Version: | 4.11 |
| Keywords: | TestBlocker, Triaged |
| Reporter: | Greg Kopels <gkopels> |
| Assignee: | Derek Higgins <derekh> |
| QA Contact: | Amit Ugol <augol> |
| CC: | bzvonar, derekh, elevin |
| Target Milestone: | --- |
| Target Release: | --- |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Doc Type: | If docs needed, set a value |
| Story Points: | --- |
| : | 2110029 (view as bug list) |
| Last Closed: | 2022-11-21 10:49:18 UTC |
| Type: | Bug |
| Regression: | --- |
| Bug Blocks: | 2100035, 2110029 |
Description
Greg Kopels
2022-06-29 12:12:53 UTC
Deployment logs can be found here: https://auto-jenkins-csb-kniqe.apps.ocp-c1.prod.psi.redhat.com/view/CNF-core/job/CNF/job/test-ocp-general-cnf-core-playground/32/

Kubernetes API at https://api.hlxcl7.lab.eng.tlv2.redhat.com:6443. The bootstrap watch fails with no route to the IPv6 API address:

```
22:33:55 W0628 22:33:54.934837 4150634 reflector.go:324] k8s.io/client-go/tools/watch/informerwatcher.go:146: failed to list *v1.ConfigMap: Get "https://api.hlxcl7.lab.eng.tlv2.redhat.com:6443/api/v1/namespaces/kube-system/configmaps?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dbootstrap&resourceVersion=9541&timeoutSeconds=301&watch=true": dial tcp [2620:52:0:2e38::700]:6443: connect: no route to host
```

*** Bug 2102157 has been marked as a duplicate of this bug. ***

On one of the failing masters the ovnkube-node container is failing:

```
[root@hlxcl7-master-1 core]# crictl logs 3a6f95d2a0507 |& tail
I0629 15:47:48.616774 28984 ovs.go:206] Exec(5): stderr: ""
I0629 15:47:48.616796 28984 ovs.go:202] Exec(6): /usr/bin/ovs-vsctl --timeout=15 set interface ovn-k8s-mp0 mac=c6\:05\:2d\:06\:c8\:28
I0629 15:47:48.621688 28984 ovs.go:205] Exec(6): stdout: ""
I0629 15:47:48.621718 28984 ovs.go:206] Exec(6): stderr: ""
I0629 15:47:48.686769 28984 gateway_init.go:261] Initializing Gateway Functionality
I0629 15:47:48.686923 28984 gateway_localnet.go:163] Node local addresses initialized to: map[10.130.0.2:{10.130.0.0 fffffe00} 10.46.56.76:{10.46.56.0 ffffff00} 127.0.0.1:{127.0.0.0 ff000000} 2620:52:0:2e38::706:{2620:52:0:2e38::706 ffffffffffffffffffffffffffffffff} ::1:{::1 ffffffffffffffffffffffffffffffff} fd01:0:0:3::2:{fd01:0:0:3:: ffffffffffffffff0000000000000000} fe80::5054:ff:fe57:18a8:{fe80:: ffffffffffffffff0000000000000000} fe80::68d6:50ff:fea4:19b3:{fe80:: ffffffffffffffff0000000000000000} fe80::c405:2dff:fe06:c828:{fe80:: ffffffffffffffff0000000000000000}]
I0629 15:47:48.687017 28984 helper_linux.go:71] Provided gateway interface "br-ex", found as index: 5
I0629 15:47:48.687083 28984 helper_linux.go:97] Found default gateway interface br-ex 10.46.56.254
I0629 15:47:48.687120 28984 helper_linux.go:71] Provided gateway interface "br-ex", found as index: 5
F0629 15:47:48.687179 28984 ovnkube.go:133] failed to get default gateway interface
```
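For context on the fatal above: the log suggests ovnkube resolves a default gateway on the provided gateway interface once per enabled address family (note the second `helper_linux.go:71` lookup), and aborts when one family has no default route. Below is a minimal hypothetical sketch of that lookup, not the actual ovn-kubernetes code; the route data mirrors what the logs show, an IPv4 default via br-ex but no IPv6 default:

```python
# Hypothetical sketch (not the real ovn-kubernetes code) of the
# per-family default-gateway lookup that fails above. Routes are
# modeled as (family, out_interface, gateway) tuples.
def default_gateway(routes, iface, family):
    """Return the default gateway on `iface` for `family`, or None."""
    for fam, dev, gw in routes:
        if fam == family and dev == iface:
            return gw
    return None

# Assumed route table on the failing master: an IPv4 default route
# via br-ex exists, but no IPv6 default route was installed.
routes = [("ipv4", "br-ex", "10.46.56.254")]

print(default_gateway(routes, "br-ex", "ipv4"))  # → 10.46.56.254
print(default_gateway(routes, "br-ex", "ipv6"))  # → None, which is fatal in dual stack
```

On a live node the equivalent checks would be `ip -4 route show default` and `ip -6 route show default`.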
We've also noticed that the ip= param in the kernel params is `ip=dhcp`. We'll try a test build with `ip=dhcp,dhcp6` to represent a dual-stack setup and see if this fixes the issue.
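For readers unfamiliar with the kernel argument in question: `ip=dhcp` asks the initramfs to autoconfigure IPv4 only, while `ip=dhcp,dhcp6` requests both families, which is what a dual-stack node needs. A hypothetical helper (not installer code) illustrating that distinction:

```python
# Hypothetical helper (not from the installer) illustrating the
# ip= semantics discussed above: ip=dhcp configures IPv4 only,
# while ip=dhcp,dhcp6 requests both address families.
def families_from_ip_param(cmdline: str) -> set:
    """Return the address families requested by the ip= kernel arg."""
    families = set()
    for arg in cmdline.split():
        if not arg.startswith("ip="):
            continue
        for mode in arg[len("ip="):].split(","):
            if mode == "dhcp":
                families.add("ipv4")
            elif mode == "dhcp6":
                families.add("ipv6")
    return families

# What the failing nodes booted with:
print(families_from_ip_param("ip=dhcp"))        # → {'ipv4'}
# What the proposed test build changes it to (both families):
print(families_from_ip_param("ip=dhcp,dhcp6"))
```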
Note that the 3 dualstack nightlies on 4.11 (e2e-metal-ipi-serial-ovn-dualstack, metal-ipi-ovn-dualstack and e2e-metal-ipi-ovn-dualstack-local-gateway) are all successfully deploying clusters, so this doesn't appear to be a problem in all dualstack environments. I've also asked the reporter to test a PR that forces ip=dhcp,dhcp6; I'll report back once we know if it made a difference.

Finally have a cluster to work with. I tried deploying 4.10.20 dualstack and have the same issue as with 4.11.

```
[gkopels@ ~]$ oc get nodes
NAME                                             STATUS     ROLES    AGE   VERSION
hlxcl7-master-0.hlxcl7.lab.eng.tlv2.redhat.com   NotReady   master   71m   v1.23.5+3afdacb
hlxcl7-master-1.hlxcl7.lab.eng.tlv2.redhat.com   NotReady   master   71m   v1.23.5+3afdacb
hlxcl7-master-2.hlxcl7.lab.eng.tlv2.redhat.com   NotReady   master   71m   v1.23.5+3afdacb
```

I am able to install 4.9 dualstack with no problem. Who can have a look with me?

A draft PR has been created to test the fix: https://github.com/openshift/installer/pull/6063

I am able to create a build with the cluster-bot but am unable to deploy it in our CI. Our CI does not have access to the repo where this build is stored. With some help I am trying to manually run our pipeline to pull this build; however, so far I have been unsuccessful. I have a meeting again today to attempt the deployment.

Hi @pparasur @dhiggins, we are having difficulties trying to deploy the cluster-bot build in our CI (issues with the repo and the infrastructure to reach it). Is there any way you can test PR 6063 and then merge it? At that point I can test it in the nightly image. Thanks.

We were able to run the build from cluster-bot with the fix https://github.com/openshift/installer/pull/6063. The result was the same as before: workers don't come up.
```
[root@helix08 tmp7]# oc get nodes
NAME                                             STATUS     ROLES    AGE     VERSION
hlxcl7-master-0.hlxcl7.lab.eng.tlv2.redhat.com   NotReady   master   3h11m   v1.24.0+9546431
hlxcl7-master-1.hlxcl7.lab.eng.tlv2.redhat.com   NotReady   master   3h11m   v1.24.0+9546431
hlxcl7-master-2.hlxcl7.lab.eng.tlv2.redhat.com   NotReady   master   3h11m   v1.24.0+9546431
```

Hi, together with Derek Higgins we were able to validate a Dual Stack deployment with no issues using 4.11.0-0.nightly-2022-11-10-202051.

Closing based on the above comments.