Description of problem:
-----------------------
Dualstack installation of 4.7 failed:
...
time="2021-01-12T10:38:41Z" level=debug msg="Log bundle written to /var/home/core/log-bundle-20210112103755.tar.gz"
time="2021-01-12T10:38:41Z" level=info msg="Bootstrap gather logs captured here \"/home/kni/clusterconfigs/log-bundle-20210112103755.tar.gz\""
time="2021-01-12T10:38:41Z" level=fatal msg="Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition"
------------------------------------------
oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                                                 False       False         True       106m
baremetal                                  4.7.0-0.nightly-2021-01-10-070949   True        True          False      106m
cloud-credential                           4.7.0-0.nightly-2021-01-10-070949   True        False         False      123m
cluster-autoscaler                         4.7.0-0.nightly-2021-01-10-070949   True        False         False      81m
config-operator                            4.7.0-0.nightly-2021-01-10-070949   True        False         False      106m
console
csi-snapshot-controller                    4.7.0-0.nightly-2021-01-10-070949   True        False         False      105m
dns                                        4.7.0-0.nightly-2021-01-10-070949   True        False         False      80m
etcd                                       4.7.0-0.nightly-2021-01-10-070949   True        True          False      102m
image-registry                             4.7.0-0.nightly-2021-01-10-070949   True        False         False      76m
ingress                                                                        False       True          True       75m
insights                                   4.7.0-0.nightly-2021-01-10-070949   True        False         False      98m
kube-apiserver                             4.7.0-0.nightly-2021-01-10-070949   True        True          False      80m
kube-controller-manager                    4.7.0-0.nightly-2021-01-10-070949   True        True          False      102m
kube-scheduler                             4.7.0-0.nightly-2021-01-10-070949   True        True          False      79m
kube-storage-version-migrator              4.7.0-0.nightly-2021-01-10-070949   False       False         False      106m
machine-api                                4.7.0-0.nightly-2021-01-10-070949   True        False         False      78m
machine-approver                           4.7.0-0.nightly-2021-01-10-070949   True        False         False      105m
machine-config                             4.7.0-0.nightly-2021-01-10-070949   True        False         False      79m
marketplace                                4.7.0-0.nightly-2021-01-10-070949   True        False         False      80m
monitoring                                                                     False       True          True       98m
network                                    4.7.0-0.nightly-2021-01-10-070949   True        True          False      105m
node-tuning                                4.7.0-0.nightly-2021-01-10-070949   True        False         False      81m
openshift-apiserver                        4.7.0-0.nightly-2021-01-10-070949   True        False         False      75m
openshift-controller-manager               4.7.0-0.nightly-2021-01-10-070949   True        False         False      102m
openshift-samples
operator-lifecycle-manager                 4.7.0-0.nightly-2021-01-10-070949   True        False         False      81m
operator-lifecycle-manager-catalog         4.7.0-0.nightly-2021-01-10-070949   True        False         False      81m
operator-lifecycle-manager-packageserver   4.7.0-0.nightly-2021-01-10-070949   True        False         False      75m
service-ca                                 4.7.0-0.nightly-2021-01-10-070949   True        False         False      106m
storage                                    4.7.0-0.nightly-2021-01-10-070949   True        False         False      106m
----------------------------------------------
oc get po -A | grep Crash
openshift-authentication-operator   authentication-operator-f6987fb-6vgml         0/1    CrashLoopBackOff        19   125m
openshift-console-operator          console-operator-6b77dd4477-lht5h             0/1    CrashLoopBackOff        25   75m
openshift-machine-api               cluster-autoscaler-operator-c48656855-zrjjq   1/2    CrashLoopBackOff        26   125m
openshift-machine-api               machine-api-controllers-d46575786-bwfnx       5/7    CrashLoopBackOff        71   82m
openshift-machine-api               metal3-bd4968444-nz5xh                        9/10   CrashLoopBackOff        17   106m
openshift-machine-api               metal3-image-cache-bxq7k                      0/1    Init:CrashLoopBackOff   16   106m
openshift-machine-api               metal3-image-cache-v6frq                      0/1    Init:CrashLoopBackOff   16   106m
-----------------------------------------------------------------------
oc logs console-operator-6b77dd4477-lht5h -n openshift-console-operator
I0112 11:56:55.637630       1 cmd.go:200] Using service-serving-cert provided certificates
I0112 11:56:55.638602       1 observer_polling.go:159] Starting file observer
W0112 11:57:20.640921       1 builder.go:207] unable to get owner reference (falling back to namespace): Get "https://172.30.0.1:443/api/v1/namespaces/openshift-console-operator/pods/console-operator-6b77dd4477-lht5h": dial tcp 172.30.0.1:443: i/o timeout
I0112 11:57:20.641295       1 builder.go:238] console-operator version -
I0112 11:57:20.642440       1 dynamic_serving_content.go:111] Loaded a new cert/key pair for "serving-cert::/var/run/secrets/serving-cert/tls.crt::/var/run/secrets/serving-cert/tls.key"
I0112 11:57:27.198344       1 cmd.go:88] Received SIGTERM or SIGINT signal, shutting down controller.
I0112 11:57:30.639053       1 observer_polling.go:162] Shutting down file observer
F0112 11:57:46.203622       1 cmd.go:129] unable to load configmap based request-header-client-ca-file: Get "https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication": dial tcp 172.30.0.1:443: i/o timeout
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc000132001, 0xc0008741c0, 0xf7, 0x1af)
        /go/src/github.com/openshift/console-operator/vendor/k8s.io/klog/v2/klog.go:1026 +0xb9
k8s.io/klog/v2.(*loggingT).output(0x36224a0, 0xc000000003, 0x0, 0x0, 0xc00067c150, 0x353db58, 0x6, 0x81, 0x414300)
        /go/src/github.com/openshift/console-operator/vendor/k8s.io/klog/v2/klog.go:975 +0x19b
k8s.io/klog/v2.(*loggingT).printDepth(0x36224a0, 0xc000000003, 0x0, 0x0, 0x0, 0x0, 0x1, 0xc000662090, 0x1, 0x1)
        /go/src/github.com/openshift/console-operator/vendor/k8s.io/klog/v2/klog.go:732 +0x16f
k8s.io/klog/v2.(*loggingT).print(...)
        /go/src/github.com/openshift/console-operator/vendor/k8s.io/klog/v2/klog.go:714
k8s.io/klog/v2.Fatal(...)
        /go/src/github.com/openshift/console-operator/vendor/k8s.io/klog/v2/klog.go:1482
github.com/openshift/library-go/pkg/controller/controllercmd.(*ControllerCommandConfig).NewCommandWithContext.func1(0xc000268dc0, 0xc00090cd60, 0x0, 0x2)
        /go/src/github.com/openshift/console-operator/vendor/github.com/openshift/library-go/pkg/controller/controllercmd/cmd.go:129 +0x785
github.com/spf13/cobra.(*Command).execute(0xc000268dc0, 0xc00090cd40, 0x2, 0x2, 0xc000268dc0, 0xc00090cd40)
        /go/src/github.com/openshift/console-operator/vendor/github.com/spf13/cobra/command.go:846 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0xc000268000, 0xc000134120, 0x3621ee0, 0x0)
        /go/src/github.com/openshift/console-operator/vendor/github.com/spf13/cobra/command.go:950 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
        /go/src/github.com/openshift/console-operator/vendor/github.com/spf13/cobra/command.go:887
main.main()
        /go/src/github.com/openshift/console-operator/cmd/console/main.go:38 +0x155
----------------------------------------------------------------------------------
oc logs authentication-operator-f6987fb-6vgml -n openshift-authentication-operator
Copying system trust bundle
I0112 11:58:03.869496       1 observer_polling.go:159] Starting file observer
I0112 11:58:03.869774       1 cmd.go:200] Using service-serving-cert provided certificates
I0112 11:58:03.870600       1 observer_polling.go:159] Starting file observer
W0112 11:58:28.872689       1 builder.go:207] unable to get owner reference (falling back to namespace): Get "https://172.30.0.1:443/api/v1/namespaces/openshift-authentication-operator/pods/authentication-operator-f6987fb-6vgml": dial tcp 172.30.0.1:443: i/o timeout
I0112 11:58:28.872862       1 builder.go:238] cluster-authentication-operator version v4.7.0-202101090212.p0-0-gda548f8-
I0112 11:58:28.874389       1 dynamic_serving_content.go:111] Loaded a new cert/key pair for "serving-cert::/var/run/secrets/serving-cert/tls.crt::/var/run/secrets/serving-cert/tls.key"
F0112 11:59:20.828040       1 cmd.go:129] unable to load configmap based request-header-client-ca-file: Get "https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication": dial tcp 172.30.0.1:443: i/o timeout
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc000012001, 0xc00031cc80, 0xf7, 0x260)
        /go/src/github.com/openshift/cluster-authentication-operator/vendor/k8s.io/klog/v2/klog.go:996 +0xb9
k8s.io/klog/v2.(*loggingT).output(0x365ffc0, 0xc000000003, 0x0, 0x0, 0xc00083c230, 0x349da6c, 0x6, 0x81, 0x0)
        /go/src/github.com/openshift/cluster-authentication-operator/vendor/k8s.io/klog/v2/klog.go:945 +0x191
k8s.io/klog/v2.(*loggingT).printDepth(0x365ffc0, 0x3, 0x0, 0x0, 0x1, 0xc000e69c30, 0x1, 0x1)
        /go/src/github.com/openshift/cluster-authentication-operator/vendor/k8s.io/klog/v2/klog.go:718 +0x165
k8s.io/klog/v2.(*loggingT).print(...)
        /go/src/github.com/openshift/cluster-authentication-operator/vendor/k8s.io/klog/v2/klog.go:703
k8s.io/klog/v2.Fatal(...)
        /go/src/github.com/openshift/cluster-authentication-operator/vendor/k8s.io/klog/v2/klog.go:1443
github.com/openshift/library-go/pkg/controller/controllercmd.(*ControllerCommandConfig).NewCommandWithContext.func1(0xc000622000, 0xc0005c4ae0, 0x0, 0x3)
        /go/src/github.com/openshift/cluster-authentication-operator/vendor/github.com/openshift/library-go/pkg/controller/controllercmd/cmd.go:129 +0x6fe
github.com/spf13/cobra.(*Command).execute(0xc000622000, 0xc0005c4ab0, 0x3, 0x3, 0xc000622000, 0xc0005c4ab0)
        /go/src/github.com/openshift/cluster-authentication-operator/vendor/github.com/spf13/cobra/command.go:846 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0xc00049db80, 0xc0000b8180, 0x365faa0, 0x0)
        /go/src/github.com/openshift/cluster-authentication-operator/vendor/github.com/spf13/cobra/command.go:950 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
        /go/src/github.com/openshift/cluster-authentication-operator/vendor/github.com/spf13/cobra/command.go:887
main.main()
        /go/src/github.com/openshift/cluster-authentication-operator/cmd/authentication-operator/main.go:29 +0x155

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
4.7.0-0.nightly-2021-01-10-070949

Steps to Reproduce:
-------------------
1. Try to deploy a dual-stack 4.7 cluster with baremetal IPI

Actual results:
---------------
Deployment failed

Expected results:
-----------------
Deployment succeeded

Additional info:
----------------
Virtual deployment, 3 masters + 2 workers; running must-gather failed as well:
---------------------------
[must-gather-mwm9m] POD Unable to connect to the server: dial tcp 172.30.0.1:443: i/o timeout
[must-gather-mwm9m] POD Unable to connect to the server: dial tcp 172.30.0.1:443: i/o timeout
[must-gather-mwm9m] POD Unable to connect to the server: dial tcp 172.30.0.1:443: i/o timeout
[must-gather-mwm9m] POD Unable to connect to the server: dial tcp 172.30.0.1:443: i/o timeout
[must-gather-mwm9m] POD Unable to connect to the server: dial tcp 172.30.0.1:443: i/o timeout
[must-gather-mwm9m] POD Unable to connect to the server: dial tcp 172.30.0.1:443: i/o timeout
[must-gather-mwm9m] POD Unable to connect to the server: dial tcp 172.30.0.1:443: i/o timeout
[must-gather-mwm9m] POD Unable to connect to the server: dial tcp 172.30.0.1:443: i/o timeout
[must-gather-mwm9m] POD WARNING: Collecting one or more service logs on ALL linux master nodes in your cluster. This could take a large amount of time.
[must-gather-mwm9m] POD INFO: Collecting host service logs for kubelet
[must-gather-mwm9m] POD INFO: Collecting host service logs for crio
[must-gather-mwm9m] POD INFO: Collecting host service logs for machine-config-daemon-firstboot
[must-gather-mwm9m] POD INFO: Collecting host service logs for machine-config-daemon-host
[must-gather-mwm9m] POD INFO: Waiting for worker host service log collection to complete ...
[must-gather-mwm9m] POD error: Get "https://172.30.0.1:443/api?timeout=32s": dial tcp 172.30.0.1:443: i/o timeout
[must-gather-mwm9m] POD error: Get "https://172.30.0.1:443/api?timeout=32s": dial tcp 172.30.0.1:443: i/o timeout
[must-gather-mwm9m] POD error: Get "https://172.30.0.1:443/api?timeout=32s": dial tcp 172.30.0.1:443: i/o timeout
This is a race condition in the new service controller: it runs a repair function to clean up stale data, but it wrongly deletes services that are still valid. We can see it in the logs:

2021-01-12T10:44:42.718Z|00363|nbctl|INFO|Running command run --if-exists -- remove load_balancer 0d44f11a-6c61-457b-8065-251f39c9ce22 vips "\"172.30.0.10:9154\""
I0112 10:44:42.721772       1 repair.go:106] Deleting non-existing Kubernetes vip 172.30.0.1:443 from OVN load balancer [0d44f11a-6c61-457b-8065-251f39c9ce22 58d9cf35-1960-4220-b224-51a0158cc12d c0d0698d-1e05-43ec-85c2-6458d25d46a9 7186b91e-cfa0-4509-b68a-b4f7a6fdba67]
2021-01-12T10:44:42.731Z|00368|nbctl|INFO|Running command run --if-exists -- remove load_balancer 0d44f11a-6c61-457b-8065-251f39c9ce22 vips "\"172.30.0.1:443\""
I0112 10:44:42.917421       1 repair.go:106] Deleting non-existing Kubernetes vip 172.30.0.10:53 from OVN load balancer [342e0ee1-7d07-4990-bb0a-60bc2cc5a3ad adb10c2a-8bd5-4ba9-9400-f5df73286251 cfde3642-134d-4e82-9e73-2e20471d4fb3 ac843e5a-89ac-475b-ab61-997c13ba8087]

Fix in https://github.com/ovn-org/ovn-kubernetes/pull/1945
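For illustration only, here is a minimal Go sketch of this kind of race (hypothetical names and simplified logic, not the actual ovn-kubernetes code): the repair pass compares the VIPs configured on the OVN load balancers against its own, still-incomplete view of Services, so VIPs that are in fact backed by live Services (such as 172.30.0.1:443 and 172.30.0.10:53) get classified as stale and removed.
----------------------------------------------
package main

import (
	"fmt"
	"sync"
	"time"
)

// serviceTracker mimics the controller's view of Service VIPs; it is filled
// asynchronously as the informer cache syncs. All names here are illustrative.
type serviceTracker struct {
	mu   sync.Mutex
	vips map[string]bool
}

func (t *serviceTracker) add(vip string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.vips[vip] = true
}

func (t *serviceTracker) has(vip string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.vips[vip]
}

// repairStaleVIPs is the problematic pass: it removes any load-balancer VIP it
// does not (yet) know about. If it runs before the tracker is fully populated,
// a perfectly valid VIP such as 172.30.0.1:443 is treated as stale and deleted.
func repairStaleVIPs(t *serviceTracker, lbVIPs []string) {
	for _, vip := range lbVIPs {
		if !t.has(vip) {
			fmt.Printf("Deleting non-existing Kubernetes vip %s from OVN load balancer\n", vip)
			// in the real controller this is where the VIP is removed from OVN
		}
	}
}

func main() {
	tracker := &serviceTracker{vips: map[string]bool{}}
	lbVIPs := []string{"172.30.0.1:443", "172.30.0.10:53"}

	// Informer handlers populate the tracker concurrently...
	go func() {
		time.Sleep(50 * time.Millisecond) // cache sync still in progress
		tracker.add("172.30.0.1:443")
		tracker.add("172.30.0.10:53")
	}()

	// ...while the repair pass runs immediately against an empty view,
	// so it removes VIPs that are actually backed by live Services.
	repairStaleVIPs(tracker, lbVIPs)
}
----------------------------------------------
The actual fix is in the linked PR; the sketch only illustrates why a cleanup pass that races with the controller's cache population can delete healthy VIPs and cut every pod off from the API service at 172.30.0.1:443.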
Adding the testblocker keyword since this is blocking testing of IPI on Bare Metal with dual stack.
This issue is fixed; moving it to Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633