Description of problem:
Nodes go NotReady during an upgrade of a bare-metal disconnected UPI cluster.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-09-18-211009 -> 4.2.0-0.nightly-2019-09-20-014748

How reproducible:

Steps to Reproduce:
1. Install a bare-metal disconnected UPI cluster with 4.2.0-0.nightly-2019-09-18-211009, then upgrade to 4.2.0-0.nightly-2019-09-20-014748.

Actual results:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-09-18-211009   True        True          5h57m   Unable to apply 4.2.0-0.nightly-2019-09-20-014748: the cluster operator kube-apiserver is degraded

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
cloud-credential                           4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
cluster-autoscaler                         4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
console                                    4.2.0-0.nightly-2019-09-20-014748   True        False         False      4h4m
dns                                        4.2.0-0.nightly-2019-09-20-014748   True        True          True       29h
image-registry                             4.2.0-0.nightly-2019-09-20-014748   True        False         False      9h
ingress                                    4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
insights                                   4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
kube-apiserver                             4.2.0-0.nightly-2019-09-20-014748   True        False         True       29h
kube-controller-manager                    4.2.0-0.nightly-2019-09-20-014748   True        False         True       29h
kube-scheduler                             4.2.0-0.nightly-2019-09-20-014748   True        False         True       29h
machine-api                                4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
machine-config                             4.2.0-0.nightly-2019-09-18-211009   False       False         True       130m
marketplace                                4.2.0-0.nightly-2019-09-20-014748   True        False         False      4h4m
monitoring                                 4.2.0-0.nightly-2019-09-20-014748   True        False         False      9h
network                                    4.2.0-0.nightly-2019-09-18-211009   True        True          False      29h
node-tuning                                4.2.0-0.nightly-2019-09-20-014748   True        False         False      4h5m
openshift-apiserver                        4.2.0-0.nightly-2019-09-20-014748   True        False         False      9h
openshift-controller-manager               4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
openshift-samples                          4.2.0-0.nightly-2019-09-20-014748   True        False         False      4h5m
operator-lifecycle-manager                 4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2019-09-20-014748   True        False         False      4h4m
service-ca                                 4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
service-catalog-apiserver                  4.2.0-0.nightly-2019-09-20-014748   True        False         False      9h
service-catalog-controller-manager         4.2.0-0.nightly-2019-09-20-014748   True        False         False      25h
storage                                    4.2.0-0.nightly-2019-09-20-014748   True        False         False      4h5m

$ oc get node
NAME                                  STATUS     ROLES    AGE   VERSION
qe-yapei-uos2-6dbch-compute-0         NotReady   worker   29h   v1.14.6+147115512
qe-yapei-uos2-6dbch-compute-1         NotReady   worker   29h   v1.14.6+147115512
qe-yapei-uos2-6dbch-compute-2         NotReady   worker   29h   v1.14.6+147115512
qe-yapei-uos2-6dbch-control-plane-0   NotReady   master   29h   v1.14.6+147115512
qe-yapei-uos2-6dbch-control-plane-1   NotReady   master   29h   v1.14.6+147115512
qe-yapei-uos2-6dbch-control-plane-2   NotReady   master   29h   v1.14.6+147115512

$ oc describe node qe-yapei-uos2-6dbch-compute-0
Name:               qe-yapei-uos2-6dbch-compute-0
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=qe-yapei-uos2-6dbch-compute-0
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.openshift.io/os_id=rhcos
Annotations:        machineconfiguration.openshift.io/currentConfig: rendered-worker-0b2751ccc4b6d419105a4d3118315f7c
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-0b2751ccc4b6d419105a4d3118315f7c
                    machineconfiguration.openshift.io/reason:
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 19 Sep 2019 15:37:22 +0800
Taints:             node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 20 Sep 2019 20:57:32 +0800   Fri, 20 Sep 2019 11:31:18 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 20 Sep 2019 20:57:32 +0800   Fri, 20 Sep 2019 11:31:18 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 20 Sep 2019 20:57:32 +0800   Fri, 20 Sep 2019 11:31:18 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Fri, 20 Sep 2019 20:57:32 +0800   Fri, 20 Sep 2019 18:44:08 +0800   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Addresses:
  InternalIP:  10.0.151.128
  Hostname:    qe-yapei-uos2-6dbch-compute-0
Capacity:
  cpu:            4
  hugepages-1Gi:  0
  hugepages-2Mi:  0
  memory:         8163800Ki
  pods:           48
Allocatable:
  cpu:            3500m
  hugepages-1Gi:  0
  hugepages-2Mi:  0
  memory:         7447000Ki
  pods:           48
System Info:
  Machine ID:                 025fd090d4404c4e843337b7196c95fb
  System UUID:                025fd090-d440-4c4e-8433-37b7196c95fb
  Boot ID:                    b206d692-6251-4ce3-82f8-3f8c8f57087f
  Kernel Version:             4.18.0-80.11.1.el8_0.x86_64
  OS Image:                   Red Hat Enterprise Linux CoreOS 42.80.20190918.1 (Ootpa)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  cri-o://1.14.10-0.12.dev.rhaos4.2.git819260a.el8
  Kubelet Version:            v1.14.6+147115512
  Kube-Proxy Version:         v1.14.6+147115512
Non-terminated Pods:          (22 in total)
  Namespace                          Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                          ----                                        ------------  ----------  ---------------  -------------  ---
  kube-federation-system             kubefed-admission-webhook-7cc9fdbbb5-ghqf6  0 (0%)        0 (0%)      0 (0%)           0 (0%)         7h26m
  kube-federation-system             kubefed-controller-manager-c84b78b46-s4qkn  100m (2%)     100m (2%)   64Mi (0%)        128Mi (1%)     7h27m
  kube-federation-system             kubefed-operator-bh422                      10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         7h40m
  minmli                             hello-daemonset-tt592                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         27h
  openshift-dns                      dns-default-sqjgg                           110m (3%)     0 (0%)      70Mi (0%)        512Mi (7%)     133m
  openshift-image-registry           node-ca-jwjbf                               10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         29h
  openshift-machine-config-operator  machine-config-daemon-h4jmm                 20m (0%)      0 (0%)      50Mi (0%)        0 (0%)         29h
  openshift-marketplace              broker-manifests-hr5l8                      10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         8h
  openshift-monitoring               alertmanager-main-1                         100m (2%)     100m (2%)   225Mi (3%)       25Mi (0%)      3h58m
  openshift-monitoring               alertmanager-main-2                         100m (2%)     100m (2%)   225Mi (3%)       25Mi (0%)      3h59m
  openshift-monitoring               grafana-84dfb8ff94-dvl6j                    100m (2%)     0 (0%)      100Mi (1%)       0 (0%)         3h59m
  openshift-monitoring               node-exporter-mpk4c                         10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         29h
  openshift-monitoring               openshift-state-metrics-bd8877d6-55gz4      120m (3%)     0 (0%)      190Mi (2%)       0 (0%)         3h59m
  openshift-monitoring               prometheus-adapter-9b999c5bf-rs6wl          10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         5h42m
  openshift-monitoring               prometheus-adapter-9b999c5bf-z5knb          10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         5h42m
  openshift-multus                   multus-xlj7z                                10m (0%)      0 (0%)      150Mi (2%)       0 (0%)         29h
  openshift-sdn                      ovs-j9wmz                                   200m (5%)     0 (0%)      400Mi (5%)       0 (0%)         29h
  openshift-sdn                      sdn-5p8jq                                   100m (2%)     0 (0%)      200Mi (2%)       0 (0%)         133m
  test-ns                            test-deployment-95b5bf4cd-c6924             0 (0%)        0 (0%)      0 (0%)           0 (0%)         7h3m
  test-ns                            test-deployment-95b5bf4cd-cmqtn             0 (0%)        0 (0%)      0 (0%)           0 (0%)         7h3m
  test-ns                            test-deployment-95b5bf4cd-nfthx             0 (0%)        0 (0%)      0 (0%)           0 (0%)         7h3m
  wzheng2                            jenkins-1-lg9jl                             0 (0%)        0 (0%)      1Gi (14%)        1Gi (14%)      3h50m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1020m (29%)   300m (8%)
  memory             2868Mi (39%)  1714Mi (23%)
  ephemeral-storage  0 (0%)        0 (0%)
Events:
  Type     Reason                   Age                From                                    Message
  ----     ------                   ---                ----                                    -------
  Normal   NodeNotSchedulable       11h (x2 over 23h)  kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeNotSchedulable
  Normal   Starting                 11h                kubelet, qe-yapei-uos2-6dbch-compute-0  Starting kubelet.
  Normal   NodeHasSufficientMemory  11h (x2 over 11h)  kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    11h (x2 over 11h)  kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     11h (x2 over 11h)  kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeHasSufficientPID
  Warning  Rebooted                 11h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 has been rebooted, boot id: 162388fe-d135-48c9-8c28-e7578d4059f5
  Normal   NodeNotReady             11h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeNotReady
  Normal   NodeAllocatableEnforced  11h                kubelet, qe-yapei-uos2-6dbch-compute-0  Updated Node Allocatable limit across pods
  Normal   NodeReady                11h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeReady
  Normal   NodeSchedulable          11h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeSchedulable
  Normal   NodeNotSchedulable       10h (x2 over 11h)  kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeNotSchedulable
  Normal   Starting                 10h                kubelet, qe-yapei-uos2-6dbch-compute-0  Starting kubelet.
  Normal   NodeHasSufficientMemory  10h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    10h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     10h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  10h                kubelet, qe-yapei-uos2-6dbch-compute-0  Updated Node Allocatable limit across pods
  Normal   NodeNotReady             10h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeNotReady
  Warning  Rebooted                 10h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 has been rebooted, boot id: cd54ebb7-f622-427d-ac45-3c3f5c0315b3
  Normal   NodeReady                10h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeReady
  Normal   NodeSchedulable          10h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeSchedulable
  Normal   NodeNotSchedulable       9h (x2 over 10h)   kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeNotSchedulable
  Normal   NodeAllocatableEnforced  9h                 kubelet, qe-yapei-uos2-6dbch-compute-0  Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory  9h (x2 over 9h)    kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    9h (x2 over 9h)    kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     9h (x2 over 9h)    kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeHasSufficientPID
  Normal   Starting                 9h                 kubelet, qe-yapei-uos2-6dbch-compute-0  Starting kubelet.
  Warning  Rebooted                 9h                 kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 has been rebooted, boot id: b206d692-6251-4ce3-82f8-3f8c8f57087f
  Normal   NodeNotReady             9h                 kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeNotReady
  Normal   NodeNotSchedulable       9h                 kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeNotSchedulable
  Normal   NodeReady                9h                 kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeReady
  Normal   NodeSchedulable          9h                 kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeSchedulable
  Normal   NodeNotReady             133m               kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeNotReady

Expected results:

Additional info:
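For triage, the NotReady nodes in the `oc get node` output above can be picked out with a small filter. This is an illustrative helper, not part of the original report; the sample input is a trimmed copy of the output above, and on a live cluster you would pipe `oc get node` into the function instead:

```shell
# not_ready: print the names of nodes whose STATUS column is
# NotReady, reading `oc get node` output on stdin (skips header).
not_ready() {
  awk 'NR > 1 && $2 == "NotReady" { print $1 }'
}

# Trimmed sample of the report's `oc get node` output.
sample='NAME STATUS ROLES AGE VERSION
qe-yapei-uos2-6dbch-compute-0 NotReady worker 29h v1.14.6+147115512
qe-yapei-uos2-6dbch-control-plane-0 NotReady master 29h v1.14.6+147115512'

printf '%s\n' "$sample" | not_ready
```

On a real cluster: `oc get node | not_ready`.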
I am unable to log in to the cluster or access the VMs. Could I please get a cluster exhibiting the failure so I can debug further?
We restarted CRI-O on one of the master nodes, and that brought the node back into Ready state. There were no further CNI/networking errors in the CRI-O or kubelet logs afterwards. The CoreOS build used here is about two weeks old, and the latest build also has reported CNI issues, described in https://bugzilla.redhat.com/show_bug.cgi?id=1753801. This BZ depends on https://bugzilla.redhat.com/show_bug.cgi?id=1753801 being fixed. QE is welcome to try this with the latest CoreOS, but will most likely hit the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1753801. Reassigning to the openshift-sdn folks.
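The workaround described above (restarting CRI-O on the affected node) can be sketched as a small script. The node name, the `ssh core@...` access pattern, and the dry-run wrapper are assumptions for illustration, not commands taken from the report; with `DRY_RUN=1` (the default here) the script only prints what it would run:

```shell
# Sketch of the CRI-O restart workaround for a NotReady node.
NODE="${NODE:-qe-yapei-uos2-6dbch-compute-0}"   # hypothetical target node
DRY_RUN="${DRY_RUN:-1}"                          # 1 = print commands only

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Restart the container runtime; after the restart the kubelet
# reconnects and the CNI/networking errors were observed to stop.
run ssh "core@$NODE" sudo systemctl restart crio.service
# Confirm the node returns to Ready:
run oc get node "$NODE"
```

Run with `DRY_RUN=0` only on a cluster where this kind of node-level intervention is acceptable.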
*** Bug 1753801 has been marked as a duplicate of this bug. ***
4.2.0-0.nightly-2019-09-24-025718 -> 4.2.0-0.nightly-2019-09-24-194016

[hasha@fedora_pc ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-09-24-194016   True        False         9m7s    Cluster version is 4.2.0-0.nightly-2019-09-24-194016

[hasha@fedora_pc ~]$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.nightly-2019-09-24-194016   True        False         False      19h
cloud-credential                           4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
cluster-autoscaler                         4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
console                                    4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
dns                                        4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
image-registry                             4.2.0-0.nightly-2019-09-24-194016   True        False         False      173m
ingress                                    4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
insights                                   4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
kube-apiserver                             4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
kube-controller-manager                    4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
kube-scheduler                             4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
machine-api                                4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
machine-config                             4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
marketplace                                4.2.0-0.nightly-2019-09-24-194016   True        False         False      15m
monitoring                                 4.2.0-0.nightly-2019-09-24-194016   True        False         False      19m
network                                    4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
node-tuning                                4.2.0-0.nightly-2019-09-24-194016   True        False         False      68m
openshift-apiserver                        4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
openshift-controller-manager               4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
openshift-samples                          4.2.0-0.nightly-2019-09-24-194016   True        False         False      67m
operator-lifecycle-manager                 4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2019-09-24-194016   True        False         False      77m
service-ca                                 4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
service-catalog-apiserver                  4.2.0-0.nightly-2019-09-24-194016   True        False         False      18h
service-catalog-controller-manager         4.2.0-0.nightly-2019-09-24-194016   True        False         False      18h
storage                                    4.2.0-0.nightly-2019-09-24-194016   True        False         False      68m

Today the upgrade succeeded on a disconnected cluster, so this bug is verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922