Created attachment 1715095 [details]
journalctl-log.txt

Description of problem:
-------------------------------------------------------------------------------------------------------
After configuring the cluster with an IPv6 baremetal network and an IPv6 provisioning network, the deployment failed with:
"Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition..."

When logging into one of the masters, the following error is seen:

[core@master-0-0 ~]$ systemctl status node-valid-hostname
● node-valid-hostname.service - Ensure the node hostname is valid for the cluster
   Loaded: loaded (/etc/systemd/system/node-valid-hostname.service; enabled; vendor preset: enabled)
   Active: failed (Result: timeout) since Wed 2020-09-16 07:24:54 UTC; 5h 36min ago
  Process: 3382 ExecStart=/bin/bash -c source /usr/local/sbin/set-valid-hostname.sh; wait_localhost; set_valid_hostname `hostname` (code=killed, signal=TERM)
 Main PID: 3382 (code=killed, signal=TERM)
      CPU: 653ms

Sep 16 07:19:54 localhost.localdomain systemd[1]: Starting Ensure the node hostname is valid for the cluster...
Sep 16 07:24:54 localhost.localdomain systemd[1]: node-valid-hostname.service: Start operation timed out. Terminating.
Sep 16 07:24:54 localhost.localdomain systemd[1]: node-valid-hostname.service: Main process exited, code=killed, status=15/TERM
Sep 16 07:24:54 localhost.localdomain systemd[1]: node-valid-hostname.service: Failed with result 'timeout'.
Sep 16 07:24:54 localhost.localdomain systemd[1]: Failed to start Ensure the node hostname is valid for the cluster.
Sep 16 07:24:54 localhost.localdomain systemd[1]: node-valid-hostname.service: Consumed 653ms CPU time

In addition, there are many failures when running journalctl on the bootstrap, for example:

Sep 16 14:15:37 localhost hyperkube[2497]: E0916 14:15:37.438841    2497 pod_workers.go:191] Error syncing pod 82981e99c5f964f70bbe9b0bb1ef5d69 ("bootstrap-kube-controller-manager-localhost_kube-system(82981e99c5f964f70bbe9b0bb1ef5d69)"), skipping: failed to "StartContainer" for "cluster-policy-controller" with CrashLoopBackOff: "back-off 5m0s restarting failed container=cluster-policy-controller pod=bootstrap-kube-controller-manager-localhost_kube-system(82981e99c5f964f70bbe9b0bb1ef5d69)"

(file attached)

How reproducible:
-----------------------------------------------------------------------------
Always.

Steps to Reproduce:
--------------------------------------------
1. Deploy OCP 4.6 with an IPv6 baremetal network and an IPv6 provisioning network (disconnected environment).
   [ The following images were used: quay.io/openshift-release-dev/ocp-release:4.6.0-fc.5-x86_64, registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-09-16-000734 ]

Actual results:
--------------------------------
Deployment of OCP 4.6 failed.

Expected results:
--------------------------------
Deployment of OCP 4.6 finished successfully, with no errors.

Additional info:
--------------------------------
oc adm must-gather - failed to generate logs.
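For context on the failing unit: the ExecStart line above suggests node-valid-hostname.service simply waits for the node to pick up a hostname other than localhost before it considers itself started, and systemd's start timeout (5 minutes, matching the 07:19:54 -> 07:24:54 window) kills it if that never happens. A minimal sketch of that kind of wait loop is below; this is an illustration of the pattern only, not the actual contents of /usr/local/sbin/set-valid-hostname.sh:

#!/bin/bash
# Illustration only: the sort of logic node-valid-hostname.service appears to run.
# The real set-valid-hostname.sh on RHCOS may differ.

wait_localhost() {
    # Block until the kernel hostname is something other than localhost(.localdomain).
    # If no hostname ever arrives (e.g. via DHCPv6 or reverse DNS), the unit's start
    # timeout fires and systemd sends SIGTERM, which matches the log above.
    while [[ "$(< /proc/sys/kernel/hostname)" =~ ^localhost(\.localdomain)?$ ]]; do
        sleep 1
    done
}

set_valid_hostname() {
    # Persist the resolved hostname so kubelet registers the node under it.
    local name="$1"
    hostnamectl set-hostname "${name}"
}

wait_localhost
set_valid_hostname "$(hostname)"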
Could it be that the masters do not have DNS records in the upstream DNS used in this deployment?

NetworkManager has a bug[0] in the current release that prevents it from passing the DHCPv6 hostname option (parsing of that option was not implemented in the internal client).

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1858344
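If someone hits this again, a quick way to tell which of the two causes is in play (missing reverse DNS records vs. the NetworkManager DHCPv6 hostname bug) might look like the following, run on an affected master. This is only a sketch using standard RHCOS/NetworkManager tooling, not an official triage procedure, and the dhclient workaround only applies if dhclient is actually present in the image:

# Is the node still stuck on the default hostname?
hostnamectl status

# Did NetworkManager receive/apply a hostname at all?
journalctl -u NetworkManager | grep -i hostname

# Does the upstream DNS have a PTR record for the node's IPv6 address?
# (replace the address with the master's baremetal-network address)
dig -x fd2e:6f44:5dd8::12b

# Possible workaround while the NM fix lands: switch NetworkManager to the
# dhclient backend (only if dhclient is installed on the host).
cat <<'EOF' | sudo tee /etc/NetworkManager/conf.d/99-dhcp-client.conf
[main]
dhcp=dhclient
EOF
sudo systemctl restart NetworkManager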
After rechecking several times, the deployment passed with no issues, so I'm verifying the bug:

NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-29-162625   True        False         66s     Cluster version is 4.6.0-0.nightly-2020-09-29-162625

[kni@provisionhost-0-0 ~]$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.0-0.nightly-2020-09-29-162625   True        False         False      3m21s
cloud-credential                           4.6.0-0.nightly-2020-09-29-162625   True        False         False      65m
cluster-autoscaler                         4.6.0-0.nightly-2020-09-29-162625   True        False         False      27m
config-operator                            4.6.0-0.nightly-2020-09-29-162625   True        False         False      51m
console                                    4.6.0-0.nightly-2020-09-29-162625   True        False         False      7m29s
csi-snapshot-controller                    4.6.0-0.nightly-2020-09-29-162625   True        False         False      28m
dns                                        4.6.0-0.nightly-2020-09-29-162625   True        False         False      50m
etcd                                       4.6.0-0.nightly-2020-09-29-162625   True        False         False      49m
image-registry                             4.6.0-0.nightly-2020-09-29-162625   True        False         False      28m
ingress                                    4.6.0-0.nightly-2020-09-29-162625   True        False         False      10m
insights                                   4.6.0-0.nightly-2020-09-29-162625   True        False         False      28m
kube-apiserver                             4.6.0-0.nightly-2020-09-29-162625   True        False         False      48m
kube-controller-manager                    4.6.0-0.nightly-2020-09-29-162625   True        False         False      47m
kube-scheduler                             4.6.0-0.nightly-2020-09-29-162625   True        False         False      49m
kube-storage-version-migrator              4.6.0-0.nightly-2020-09-29-162625   True        False         False      10m
machine-api                                4.6.0-0.nightly-2020-09-29-162625   True        False         False      19m
machine-approver                           4.6.0-0.nightly-2020-09-29-162625   True        False         False      50m
machine-config                             4.6.0-0.nightly-2020-09-29-162625   True        False         False      50m
marketplace                                4.6.0-0.nightly-2020-09-29-162625   True        False         False      27m
monitoring                                 4.6.0-0.nightly-2020-09-29-162625   True        False         False      10m
network                                    4.6.0-0.nightly-2020-09-29-162625   True        False         False      51m
node-tuning                                4.6.0-0.nightly-2020-09-29-162625   True        False         False      51m
openshift-apiserver                        4.6.0-0.nightly-2020-09-29-162625   True        False         False      31m
openshift-controller-manager               4.6.0-0.nightly-2020-09-29-162625   True        False         False      28m
openshift-samples                          4.6.0-0.nightly-2020-09-29-162625   True        False         False      28m
operator-lifecycle-manager                 4.6.0-0.nightly-2020-09-29-162625   True        False         False      50m
operator-lifecycle-manager-catalog         4.6.0-0.nightly-2020-09-29-162625   True        False         False      51m
operator-lifecycle-manager-packageserver   4.6.0-0.nightly-2020-09-29-162625   True        False         False      29m
service-ca                                 4.6.0-0.nightly-2020-09-29-162625   True        False         False      51m
storage                                    4.6.0-0.nightly-2020-09-29-162625   True        False         False      51m

[kni@provisionhost-0-0 ~]$ oc get endpoints -A
NAMESPACE   NAME   ENDPOINTS   AGE
default   kubernetes   [fd2e:6f44:5dd8::12b]:6443,[fd2e:6f44:5dd8::13a]:6443,[fd2e:6f44:5dd8::145]:6443   67m
kube-system   kube-scheduler   <none>   67m
kube-system   kubelet   [fd2e:6f44:5dd8::145]:10250,[fd2e:6f44:5dd8::12b]:10250,[fd2e:6f44:5dd8::13a]:10250 + 12 more...   27m
openshift-apiserver-operator   metrics   [fd01:0:0:3::9]:8443   66m
openshift-apiserver   api   [fd01:0:0:1::10]:8443,[fd01:0:0:2::d]:8443,[fd01:0:0:3::1e]:8443   51m
openshift-apiserver   check-endpoints   [fd01:0:0:1::10]:17698,[fd01:0:0:2::d]:17698,[fd01:0:0:3::1e]:17698   49m
openshift-authentication-operator   metrics   [fd01:0:0:3::12]:8443   66m
openshift-authentication   oauth-openshift   [fd01:0:0:1::34]:6443,[fd01:0:0:2::36]:6443   51m
openshift-cloud-credential-operator   cco-metrics   [fd01:0:0:1::23]:8443   66m
openshift-cluster-machine-approver   machine-approver   [fd2e:6f44:5dd8::13a]:9192   66m
openshift-cluster-samples-operator   metrics   [fd01:0:0:1::21]:60000   48m
openshift-cluster-storage-operator   csi-snapshot-controller-operator-metrics   [fd01:0:0:3::5]:8443   66m
openshift-cluster-version   cluster-version-operator   [fd2e:6f44:5dd8::145]:9099   67m
openshift-config-operator   metrics   [fd01:0:0:3::c]:8443   66m
openshift-console-operator   metrics   [fd01:0:0:2::1d]:8443   43m
openshift-console   console   [fd01:0:0:1::2f]:8443,[fd01:0:0:3::39]:8443   30m
openshift-console   downloads   [fd01:0:0:1::22]:8080,[fd01:0:0:2::21]:8080   30m
openshift-controller-manager-operator   metrics   [fd01:0:0:3::7]:8443   66m
openshift-controller-manager   controller-manager   [fd01:0:0:1::20]:8443,[fd01:0:0:2::1e]:8443,[fd01:0:0:3::2e]:8443   51m
openshift-dns-operator   metrics   [fd01:0:0:3::b]:9393   66m
openshift-dns   dns-default   [fd01:0:0:1::8]:5353,[fd01:0:0:2::7]:5353,[fd01:0:0:3::17]:5353 + 12 more...   50m
openshift-etcd-operator   metrics   [fd01:0:0:1::5]:8443   66m
openshift-etcd   etcd   [fd2e:6f44:5dd8::12b]:2379,[fd2e:6f44:5dd8::13a]:2379,[fd2e:6f44:5dd8::145]:2379 + 3 more...   67m
openshift-etcd   host-etcd-2   [fd2e:6f44:5dd8::13a]:2379,[fd2e:6f44:5dd8::12b]:2379,[fd2e:6f44:5dd8::145]:2379   67m
openshift-image-registry   image-registry-operator   [fd01:0:0:2::23]:60000   66m
openshift-ingress-operator   metrics   [fd01:0:0:2::24]:9393   66m
openshift-ingress   router-internal-default   [fd2e:6f44:5dd8::101]:1936,[fd2e:6f44:5dd8::142]:1936,[fd2e:6f44:5dd8::101]:443 + 3 more...   28m
openshift-insights   metrics   [fd01:0:0:1::24]:8443   66m
openshift-kube-apiserver-operator   metrics   [fd01:0:0:3::10]:8443   66m
openshift-kube-apiserver   apiserver   [fd2e:6f44:5dd8::12b]:6443,[fd2e:6f44:5dd8::13a]:6443,[fd2e:6f44:5dd8::145]:6443   51m
openshift-kube-controller-manager-operator   metrics   [fd01:0:0:3::4]:8443   66m
openshift-kube-controller-manager   kube-controller-manager   [fd2e:6f44:5dd8::12b]:10257,[fd2e:6f44:5dd8::13a]:10257,[fd2e:6f44:5dd8::145]:10257   51m
openshift-kube-scheduler-operator   metrics   [fd01:0:0:3::8]:8443   66m
openshift-kube-scheduler   scheduler   [fd2e:6f44:5dd8::12b]:10259,[fd2e:6f44:5dd8::13a]:10259,[fd2e:6f44:5dd8::145]:10259   51m
openshift-kube-storage-version-migrator-operator   metrics   [fd01:0:0:3::a]:8443   66m
openshift-machine-api   cluster-autoscaler-operator   [fd01:0:0:2::22]:9192,[fd01:0:0:2::22]:8443   66m
openshift-machine-api   machine-api-controllers   [fd01:0:0:2::27]:8442,[fd01:0:0:2::27]:8441,[fd01:0:0:2::27]:8444   66m
openshift-machine-api   machine-api-operator   [fd01:0:0:2::26]:8443   66m
openshift-machine-api   machine-api-operator-webhook   [fd01:0:0:2::27]:8443   66m
openshift-machine-config-operator   machine-config-daemon   [fd2e:6f44:5dd8::101]:9001,[fd2e:6f44:5dd8::12b]:9001,[fd2e:6f44:5dd8::13a]:9001 + 2 more...   66m
openshift-marketplace   certified-operators      28m
openshift-marketplace   community-operators      28m
openshift-marketplace   marketplace-operator-metrics   [fd01:0:0:2::20]:8081,[fd01:0:0:2::20]:8383   66m
openshift-marketplace   redhat-marketplace      28m
openshift-marketplace   redhat-operators      28m
openshift-monitoring   alertmanager-main   [fd01:0:0:4::e]:9095,[fd01:0:0:4::f]:9095,[fd01:0:0:5::c]:9095 + 3 more...   10m
openshift-monitoring   alertmanager-operated   [fd01:0:0:4::e]:9095,[fd01:0:0:4::f]:9095,[fd01:0:0:5::c]:9095 + 6 more...   10m
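The endpoints above are all IPv6, which is the main thing being verified here. One extra check that may be worth recording in IPv6-only verification runs is the cluster network configuration itself; the commands below are just one way to pull those fields and were not part of the original verification:

# Show cluster and service CIDRs to confirm the deployment is IPv6-only
# (network.config.openshift.io "cluster" is the standard OpenShift config object)
oc get network.config.openshift.io cluster \
  -o jsonpath='{.status.clusterNetwork}{"\n"}{.status.serviceNetwork}{"\n"}'

# Node addresses should likewise be IPv6 on the baremetal network
oc get nodes -o wide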
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633
I'm not sure why the bot reopened this, but lacking any apparent reason I'm going to re-close it.