Bug 1753988 - [Upgrade] Node becomes NotReady because of "CNI network "" not found"
Summary: [Upgrade] Node becomes NotReady because of "CNI network "" not found"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Dan Williams
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-09-20 13:05 UTC by shahan
Modified: 2019-10-16 06:41 UTC
CC: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:41:39 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Product Errata RHBA-2019:2922 - Last Updated 2019-10-16 06:41:54 UTC

Description shahan 2019-09-20 13:05:36 UTC
Description of problem:
Nodes become NotReady during an upgrade of a disconnected bare-metal UPI cluster.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-09-18-211009 -> 4.2.0-0.nightly-2019-09-20-014748

How reproducible:


Steps to Reproduce:
1. Install a disconnected bare-metal UPI cluster with 4.2.0-0.nightly-2019-09-18-211009, then upgrade it to 4.2.0-0.nightly-2019-09-20-014748 (see the example command after these steps).
2.
3.
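For reference, on a disconnected cluster the upgrade in step 1 is normally triggered by pointing the cluster at a release image that has already been mirrored into a local registry. A minimal sketch, assuming the target payload is already mirrored; the registry host and digest are placeholders, not values from this report, and exact flags may differ between oc versions:

$ oc adm upgrade \
    --to-image=registry.example.com:5000/ocp4/release@sha256:<digest> \
    --allow-explicit-upgrade \
    --force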

Actual results:
$ oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-09-18-211009   True        True          5h57m   Unable to apply 4.2.0-0.nightly-2019-09-20-014748: the cluster operator kube-apiserver is degraded

$ oc get co 
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
cloud-credential                           4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
cluster-autoscaler                         4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
console                                    4.2.0-0.nightly-2019-09-20-014748   True        False         False      4h4m
dns                                        4.2.0-0.nightly-2019-09-20-014748   True        True          True       29h
image-registry                             4.2.0-0.nightly-2019-09-20-014748   True        False         False      9h
ingress                                    4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
insights                                   4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
kube-apiserver                             4.2.0-0.nightly-2019-09-20-014748   True        False         True       29h
kube-controller-manager                    4.2.0-0.nightly-2019-09-20-014748   True        False         True       29h
kube-scheduler                             4.2.0-0.nightly-2019-09-20-014748   True        False         True       29h
machine-api                                4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
machine-config                             4.2.0-0.nightly-2019-09-18-211009   False       False         True       130m
marketplace                                4.2.0-0.nightly-2019-09-20-014748   True        False         False      4h4m
monitoring                                 4.2.0-0.nightly-2019-09-20-014748   True        False         False      9h
network                                    4.2.0-0.nightly-2019-09-18-211009   True        True          False      29h
node-tuning                                4.2.0-0.nightly-2019-09-20-014748   True        False         False      4h5m
openshift-apiserver                        4.2.0-0.nightly-2019-09-20-014748   True        False         False      9h
openshift-controller-manager               4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
openshift-samples                          4.2.0-0.nightly-2019-09-20-014748   True        False         False      4h5m
operator-lifecycle-manager                 4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2019-09-20-014748   True        False         False      4h4m
service-ca                                 4.2.0-0.nightly-2019-09-20-014748   True        False         False      29h
service-catalog-apiserver                  4.2.0-0.nightly-2019-09-20-014748   True        False         False      9h
service-catalog-controller-manager         4.2.0-0.nightly-2019-09-20-014748   True        False         False      25h
storage                                    4.2.0-0.nightly-2019-09-20-014748   True        False         False      4h5m

$ oc get node
NAME                                  STATUS     ROLES    AGE   VERSION
qe-yapei-uos2-6dbch-compute-0         NotReady   worker   29h   v1.14.6+147115512
qe-yapei-uos2-6dbch-compute-1         NotReady   worker   29h   v1.14.6+147115512
qe-yapei-uos2-6dbch-compute-2         NotReady   worker   29h   v1.14.6+147115512
qe-yapei-uos2-6dbch-control-plane-0   NotReady   master   29h   v1.14.6+147115512
qe-yapei-uos2-6dbch-control-plane-1   NotReady   master   29h   v1.14.6+147115512
qe-yapei-uos2-6dbch-control-plane-2   NotReady   master   29h   v1.14.6+147115512

$ oc describe node qe-yapei-uos2-6dbch-compute-0
Name:               qe-yapei-uos2-6dbch-compute-0
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=qe-yapei-uos2-6dbch-compute-0
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.openshift.io/os_id=rhcos
Annotations:        machineconfiguration.openshift.io/currentConfig: rendered-worker-0b2751ccc4b6d419105a4d3118315f7c
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-0b2751ccc4b6d419105a4d3118315f7c
                    machineconfiguration.openshift.io/reason: 
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 19 Sep 2019 15:37:22 +0800
Taints:             node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 20 Sep 2019 20:57:32 +0800   Fri, 20 Sep 2019 11:31:18 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 20 Sep 2019 20:57:32 +0800   Fri, 20 Sep 2019 11:31:18 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 20 Sep 2019 20:57:32 +0800   Fri, 20 Sep 2019 11:31:18 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Fri, 20 Sep 2019 20:57:32 +0800   Fri, 20 Sep 2019 18:44:08 +0800   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Addresses:
  InternalIP:  10.0.151.128
  Hostname:    qe-yapei-uos2-6dbch-compute-0
Capacity:
 cpu:            4
 hugepages-1Gi:  0
 hugepages-2Mi:  0
 memory:         8163800Ki
 pods:           48
Allocatable:
 cpu:            3500m
 hugepages-1Gi:  0
 hugepages-2Mi:  0
 memory:         7447000Ki
 pods:           48
System Info:
 Machine ID:                         025fd090d4404c4e843337b7196c95fb
 System UUID:                        025fd090-d440-4c4e-8433-37b7196c95fb
 Boot ID:                            b206d692-6251-4ce3-82f8-3f8c8f57087f
 Kernel Version:                     4.18.0-80.11.1.el8_0.x86_64
 OS Image:                           Red Hat Enterprise Linux CoreOS 42.80.20190918.1 (Ootpa)
 Operating System:                   linux
 Architecture:                       amd64
 Container Runtime Version:          cri-o://1.14.10-0.12.dev.rhaos4.2.git819260a.el8
 Kubelet Version:                    v1.14.6+147115512
 Kube-Proxy Version:                 v1.14.6+147115512
Non-terminated Pods:                 (22 in total)
  Namespace                          Name                                          CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                          ----                                          ------------  ----------  ---------------  -------------  ---
  kube-federation-system             kubefed-admission-webhook-7cc9fdbbb5-ghqf6    0 (0%)        0 (0%)      0 (0%)           0 (0%)         7h26m
  kube-federation-system             kubefed-controller-manager-c84b78b46-s4qkn    100m (2%)     100m (2%)   64Mi (0%)        128Mi (1%)     7h27m
  kube-federation-system             kubefed-operator-bh422                        10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         7h40m
  minmli                             hello-daemonset-tt592                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         27h
  openshift-dns                      dns-default-sqjgg                             110m (3%)     0 (0%)      70Mi (0%)        512Mi (7%)     133m
  openshift-image-registry           node-ca-jwjbf                                 10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         29h
  openshift-machine-config-operator  machine-config-daemon-h4jmm                   20m (0%)      0 (0%)      50Mi (0%)        0 (0%)         29h
  openshift-marketplace              broker-manifests-hr5l8                        10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         8h
  openshift-monitoring               alertmanager-main-1                           100m (2%)     100m (2%)   225Mi (3%)       25Mi (0%)      3h58m
  openshift-monitoring               alertmanager-main-2                           100m (2%)     100m (2%)   225Mi (3%)       25Mi (0%)      3h59m
  openshift-monitoring               grafana-84dfb8ff94-dvl6j                      100m (2%)     0 (0%)      100Mi (1%)       0 (0%)         3h59m
  openshift-monitoring               node-exporter-mpk4c                           10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         29h
  openshift-monitoring               openshift-state-metrics-bd8877d6-55gz4        120m (3%)     0 (0%)      190Mi (2%)       0 (0%)         3h59m
  openshift-monitoring               prometheus-adapter-9b999c5bf-rs6wl            10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         5h42m
  openshift-monitoring               prometheus-adapter-9b999c5bf-z5knb            10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         5h42m
  openshift-multus                   multus-xlj7z                                  10m (0%)      0 (0%)      150Mi (2%)       0 (0%)         29h
  openshift-sdn                      ovs-j9wmz                                     200m (5%)     0 (0%)      400Mi (5%)       0 (0%)         29h
  openshift-sdn                      sdn-5p8jq                                     100m (2%)     0 (0%)      200Mi (2%)       0 (0%)         133m
  test-ns                            test-deployment-95b5bf4cd-c6924               0 (0%)        0 (0%)      0 (0%)           0 (0%)         7h3m
  test-ns                            test-deployment-95b5bf4cd-cmqtn               0 (0%)        0 (0%)      0 (0%)           0 (0%)         7h3m
  test-ns                            test-deployment-95b5bf4cd-nfthx               0 (0%)        0 (0%)      0 (0%)           0 (0%)         7h3m
  wzheng2                            jenkins-1-lg9jl                               0 (0%)        0 (0%)      1Gi (14%)        1Gi (14%)      3h50m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1020m (29%)   300m (8%)
  memory             2868Mi (39%)  1714Mi (23%)
  ephemeral-storage  0 (0%)        0 (0%)
Events:
  Type     Reason                   Age                From                                    Message
  ----     ------                   ----               ----                                    -------
  Normal   NodeNotSchedulable       11h (x2 over 23h)  kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeNotSchedulable
  Normal   Starting                 11h                kubelet, qe-yapei-uos2-6dbch-compute-0  Starting kubelet.
  Normal   NodeHasSufficientMemory  11h (x2 over 11h)  kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    11h (x2 over 11h)  kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     11h (x2 over 11h)  kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeHasSufficientPID
  Warning  Rebooted                 11h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 has been rebooted, boot id: 162388fe-d135-48c9-8c28-e7578d4059f5
  Normal   NodeNotReady             11h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeNotReady
  Normal   NodeAllocatableEnforced  11h                kubelet, qe-yapei-uos2-6dbch-compute-0  Updated Node Allocatable limit across pods
  Normal   NodeReady                11h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeReady
  Normal   NodeSchedulable          11h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeSchedulable
  Normal   NodeNotSchedulable       10h (x2 over 11h)  kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeNotSchedulable
  Normal   Starting                 10h                kubelet, qe-yapei-uos2-6dbch-compute-0  Starting kubelet.
  Normal   NodeHasSufficientMemory  10h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    10h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     10h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  10h                kubelet, qe-yapei-uos2-6dbch-compute-0  Updated Node Allocatable limit across pods
  Normal   NodeNotReady             10h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeNotReady
  Warning  Rebooted                 10h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 has been rebooted, boot id: cd54ebb7-f622-427d-ac45-3c3f5c0315b3
  Normal   NodeReady                10h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeReady
  Normal   NodeSchedulable          10h                kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeSchedulable
  Normal   NodeNotSchedulable       9h (x2 over 10h)   kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeNotSchedulable
  Normal   NodeAllocatableEnforced  9h                 kubelet, qe-yapei-uos2-6dbch-compute-0  Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory  9h (x2 over 9h)    kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    9h (x2 over 9h)    kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     9h (x2 over 9h)    kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeHasSufficientPID
  Normal   Starting                 9h                 kubelet, qe-yapei-uos2-6dbch-compute-0  Starting kubelet.
  Warning  Rebooted                 9h                 kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 has been rebooted, boot id: b206d692-6251-4ce3-82f8-3f8c8f57087f
  Normal   NodeNotReady             9h                 kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeNotReady
  Normal   NodeNotSchedulable       9h                 kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeNotSchedulable
  Normal   NodeReady                9h                 kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeReady
  Normal   NodeSchedulable          9h                 kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeSchedulable
  Normal   NodeNotReady             133m               kubelet, qe-yapei-uos2-6dbch-compute-0  Node qe-yapei-uos2-6dbch-compute-0 status is now: NodeNotReady


Expected results:
All nodes remain Ready and the upgrade to 4.2.0-0.nightly-2019-09-20-014748 completes successfully.

Additional info:
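One way to confirm the symptom reported above (kubelet: "Missing CNI default network") is to check whether the node still has a CNI configuration under /etc/cni/net.d and whether CRI-O is logging CNI errors. A rough sketch, assuming a debug pod can still be scheduled on the NotReady node (otherwise SSH to the node directly); the node name is taken from this report:

$ oc debug node/qe-yapei-uos2-6dbch-compute-0
  # inside the debug shell, switch to the host namespace and inspect:
  chroot /host
  ls /etc/cni/net.d/                 # openshift-sdn normally writes 80-openshift-network.conf here
  journalctl -u crio --no-pager | grep -i cni | tail -n 20
  journalctl -u kubelet --no-pager | grep -i 'network plugin' | tail -n 20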

Comment 3 Urvashi Mohnani 2019-09-20 14:22:59 UTC
I am unable to log in to the cluster or access the VMs. Could I please get a cluster showing the failure so I can debug further?

Comment 6 Urvashi Mohnani 2019-09-20 17:01:04 UTC
We restarted CRI-O on one of the master nodes, and that brought the node back into the Ready state. Afterwards there were no further CNI/networking errors in either the CRI-O or the kubelet logs.
The RHCOS build in use here is about two weeks old, and the latest build also has a reported CNI issue, described in https://bugzilla.redhat.com/show_bug.cgi?id=1753801.
This BZ depends on https://bugzilla.redhat.com/show_bug.cgi?id=1753801 being fixed.
QE is welcome to retry with the latest RHCOS, but will most likely run into the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1753801.
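For reference, the CRI-O restart described above can be done roughly as follows (a sketch, not the exact commands used; the node name is a placeholder, and if a debug pod will not schedule on the NotReady node, SSH to it instead):

$ oc debug node/<master-node-name>
  # inside the debug shell:
  chroot /host
  systemctl restart crio
  systemctl is-active crio
$ oc get node -w    # watch for the node to return to Ready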

Reassigning to the openshift-sdn folks.

Comment 7 Ben Bennett 2019-09-20 17:55:01 UTC
*** Bug 1753801 has been marked as a duplicate of this bug. ***

Comment 12 Casey Callendrello 2019-09-23 13:09:46 UTC
*** Bug 1753801 has been marked as a duplicate of this bug. ***

Comment 14 shahan 2019-09-25 05:46:40 UTC
4.2.0-0.nightly-2019-09-24-025718  ->  4.2.0-0.nightly-2019-09-24-194016
[hasha@fedora_pc ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-09-24-194016   True        False         9m7s    Cluster version is 4.2.0-0.nightly-2019-09-24-194016
[hasha@fedora_pc ~]$ oc get co 
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.nightly-2019-09-24-194016   True        False         False      19h
cloud-credential                           4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
cluster-autoscaler                         4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
console                                    4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
dns                                        4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
image-registry                             4.2.0-0.nightly-2019-09-24-194016   True        False         False      173m
ingress                                    4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
insights                                   4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
kube-apiserver                             4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
kube-controller-manager                    4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
kube-scheduler                             4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
machine-api                                4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
machine-config                             4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
marketplace                                4.2.0-0.nightly-2019-09-24-194016   True        False         False      15m
monitoring                                 4.2.0-0.nightly-2019-09-24-194016   True        False         False      19m
network                                    4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
node-tuning                                4.2.0-0.nightly-2019-09-24-194016   True        False         False      68m
openshift-apiserver                        4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
openshift-controller-manager               4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
openshift-samples                          4.2.0-0.nightly-2019-09-24-194016   True        False         False      67m
operator-lifecycle-manager                 4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2019-09-24-194016   True        False         False      77m
service-ca                                 4.2.0-0.nightly-2019-09-24-194016   True        False         False      20h
service-catalog-apiserver                  4.2.0-0.nightly-2019-09-24-194016   True        False         False      18h
service-catalog-controller-manager         4.2.0-0.nightly-2019-09-24-194016   True        False         False      18h
storage                                    4.2.0-0.nightly-2019-09-24-194016   True        False         False      68m
Today the upgrade succeeded on the disconnected cluster, so I am marking this bug verified.

Comment 15 errata-xmlrpc 2019-10-16 06:41:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

