Bug 1812800 - Failed to initialize machine-config operator during UPI/vSphere installation because machine-api-controllers is not ready
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Alberto
QA Contact: Milind Yadav
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-03-12 08:30 UTC by liujia
Modified: 2020-08-27 22:35 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-27 22:35:01 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-api-operator pull 528 0 None closed BUG 1812800: Fix vSphere image link 2020-11-24 16:14:39 UTC

Description liujia 2020-03-12 08:30:25 UTC
Description of problem:
Failed to install cluster on vsphere during "wait-for install-complete" stage.
level=info msg="Waiting up to 30m0s for the cluster at https://api.jliu-3367.qe.devcluster.openshift.com:6443 to initialize..."
level=info msg="Cluster operator insights Disabled is False with : "
level=info msg="Cluster operator machine-api Progressing is True with SyncingResources: Progressing towards operator: 4.5.0-0.nightly-2020-03-12-013015"
level=fatal msg="failed to initialize the cluster: Cluster operator machine-api is still updating"
===================================================================
# ./oc describe co machine-api
Name:         machine-api
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2020-03-12T06:58:47Z
  Generation:          1
  Resource Version:    30005
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/machine-api
  UID:                 1af3d9b2-bbe3-4da7-9bf6-36088e941d6e
Spec:
Status:
  Conditions:
    Last Transition Time:  2020-03-12T07:33:48Z
    Message:               Progressing towards operator: 4.5.0-0.nightly-2020-03-12-013015
    Reason:                SyncingResources
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2020-03-12T06:58:47Z
    Status:                True
    Type:                  Available
    Last Transition Time:  2020-03-12T07:33:48Z
    Status:                False
    Type:                  Degraded
    Last Transition Time:  2020-03-12T06:58:47Z
    Status:                True
    Type:                  Upgradeable
  Extension:               <nil>
  Related Objects:
    Group:      
    Name:       openshift-machine-api
    Resource:   namespaces
    Group:      machine.openshift.io
    Name:       
    Namespace:  openshift-machine-api
    Resource:   machines
    Group:      machine.openshift.io
    Name:       
    Namespace:  openshift-machine-api
    Resource:   machinesets
    Group:      rbac.authorization.k8s.io
    Name:       
    Namespace:  openshift-machine-api
    Resource:   roles
    Group:      rbac.authorization.k8s.io
    Name:       machine-api-operator
    Resource:   clusterroles
    Group:      rbac.authorization.k8s.io
    Name:       machine-api-controllers
    Resource:   clusterroles
    Group:      rbac.authorization.k8s.io
    Name:       cloud-provider-config-reader
    Namespace:  openshift-config
    Resource:   roles
Events:
  Type     Reason           Age                      From                Message
  ----     ------           ----                     ----                -------
  Normal   Status upgrade   <invalid> (x8 over 34m)  machineapioperator  Progressing towards operator: 4.5.0-0.nightly-2020-03-12-013015
  Warning  Status degraded  <invalid> (x7 over 29m)  machineapioperator  deployment machine-api-controllers is not ready. status: (replicas: 1, updated: 1, ready: 0, unavailable: 1)

Checked that the machine-api-controllers deployment was not ready and the machine-api-controllers pod could not start correctly.
# ./oc describe pod/machine-api-controllers-844c569747-7v9rk -n openshift-machine-api
Name:                 machine-api-controllers-844c569747-7v9rk
Namespace:            openshift-machine-api
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 control-plane-0/139.178.76.26
Start Time:           Thu, 12 Mar 2020 06:58:57 +0000
Labels:               api=clusterapi
                      k8s-app=controller
                      pod-template-hash=844c569747
Annotations:          k8s.v1.cni.cncf.io/networks-status:
                        [{
                            "name": "openshift-sdn",
                            "interface": "eth0",
                            "ips": [
                                "10.128.0.21"
                            ],
                            "dns": {},
                            "default-route": [
                                "10.128.0.1"
                            ]
                        }]
                      openshift.io/scc: restricted
Status:               Pending
IP:                   10.128.0.21
IPs:
  IP:           10.128.0.21
Controlled By:  ReplicaSet/machine-api-controllers-844c569747
Containers:
  controller-manager:
    Container ID:  
    Image:         docker.io/openshift/origin-machine-api-operator:v4.0.0
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      /manager
    Args:
      --logtostderr=true
      --v=3
      --namespace=openshift-machine-api
    State:          Waiting
      Reason:       CreateContainerError
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        10m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from machine-api-controllers-token-kpc6q (ro)
  machine-controller:
    Container ID:  
    Image:         docker.io/openshift/origin-machine-api-operator:v4.0.0
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      /machine-controller-manager
    Args:
      --logtostderr=true
      --v=3
      --namespace=openshift-machine-api
    State:          Waiting
      Reason:       CreateContainerError
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:     10m
      memory:  20Mi
    Environment:
      NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from machine-api-controllers-token-kpc6q (ro)
  nodelink-controller:
    Container ID:  cri-o://aed173b02bf30189a8dd4ac744cfbca2bf3843621fc113eb14c9e7e94522556e
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0fccc93ce6e0d8e219d32cbab2b1a2549cd024951fc19468e490df345873302f
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0fccc93ce6e0d8e219d32cbab2b1a2549cd024951fc19468e490df345873302f
    Port:          <none>
    Host Port:     <none>
    Command:
      /nodelink-controller
    Args:
      --logtostderr=true
      --v=3
      --namespace=openshift-machine-api
    State:          Running
      Started:      Thu, 12 Mar 2020 06:59:24 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        10m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from machine-api-controllers-token-kpc6q (ro)
  machine-healthcheck-controller:
    Container ID:  cri-o://2eac921202893463d51d84526554df1cb95edf8be824e2d3dd4acd802f539203
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0fccc93ce6e0d8e219d32cbab2b1a2549cd024951fc19468e490df345873302f
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0fccc93ce6e0d8e219d32cbab2b1a2549cd024951fc19468e490df345873302f
    Port:          <none>
    Host Port:     <none>
    Command:
      /machine-healthcheck
    Args:
      --logtostderr=true
      --v=3
      --namespace=openshift-machine-api
    State:          Running
      Started:      Thu, 12 Mar 2020 06:59:24 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        10m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from machine-api-controllers-token-kpc6q (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  machine-api-controllers-token-kpc6q:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  machine-api-controllers-token-kpc6q
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 120s
                 node.kubernetes.io/unreachable:NoExecute for 120s
Events:
  Type     Reason     Age   From                      Message
  ----     ------     ----  ----                      -------
  Normal   Scheduled  43m   default-scheduler         Successfully assigned openshift-machine-api/machine-api-controllers-844c569747-7v9rk to control-plane-0
  Normal   Pulling    43m   kubelet, control-plane-0  Pulling image "docker.io/openshift/origin-machine-api-operator:v4.0.0"
  Normal   Pulled     42m   kubelet, control-plane-0  Successfully pulled image "docker.io/openshift/origin-machine-api-operator:v4.0.0"
  Warning  Failed     42m   kubelet, control-plane-0  Error: container create failed: time="2020-03-12T06:59:23Z" level=error msg="container_linux.go:349: starting container process caused \"exec: \\\"/manager\\\": stat /manager: no such file or directory\""
container_linux.go:349: starting container process caused "exec: \"/manager\": stat /manager: no such file or directory"
  Normal   Started  42m  kubelet, control-plane-0  Started container machine-healthcheck-controller
  Warning  Failed   42m  kubelet, control-plane-0  Error: container create failed: time="2020-03-12T06:59:24Z" level=error msg="container_linux.go:349: starting container process caused \"exec: \\\"/machine-controller-manager\\\": stat /machine-controller-manager: no such file or directory\""
container_linux.go:349: starting container process caused "exec: \"/machine-controller-manager\": stat /machine-controller-manager: no such file or directory"
  Normal   Pulled   42m  kubelet, control-plane-0  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0fccc93ce6e0d8e219d32cbab2b1a2549cd024951fc19468e490df345873302f" already present on machine
  Normal   Created  42m  kubelet, control-plane-0  Created container nodelink-controller
  Normal   Started  42m  kubelet, control-plane-0  Started container nodelink-controller
  Normal   Pulled   42m  kubelet, control-plane-0  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0fccc93ce6e0d8e219d32cbab2b1a2549cd024951fc19468e490df345873302f" already present on machine
  Normal   Created  42m  kubelet, control-plane-0  Created container machine-healthcheck-controller
  Warning  Failed   42m  kubelet, control-plane-0  Error: container create failed: time="2020-03-12T06:59:36Z" level=error msg="container_linux.go:349: starting container process caused \"exec: \\\"/manager\\\": stat /manager: no such file or directory\""
container_linux.go:349: starting container process caused "exec: \"/manager\": stat /manager: no such file or directory"
  Warning  Failed  42m  kubelet, control-plane-0  Error: container create failed: time="2020-03-12T06:59:37Z" level=error msg="container_linux.go:349: starting container process caused \"exec: \\\"/machine-controller-manager\\\": stat /machine-controller-manager: no such file or directory\""
container_linux.go:349: starting container process caused "exec: \"/machine-controller-manager\": stat /machine-controller-manager: no such file or directory"
  Warning  Failed  42m  kubelet, control-plane-0  Error: container create failed: time="2020-03-12T06:59:38Z" level=error msg="container_linux.go:349: starting container process caused \"exec: \\\"/manager\\\": stat /manager: no such file or directory\""
container_linux.go:349: starting container process caused "exec: \"/manager\": stat /manager: no such file or directory"
  Warning  Failed  42m  kubelet, control-plane-0  Error: container create failed: time="2020-03-12T06:59:38Z" level=error msg="container_linux.go:349: starting container process caused \"exec: \\\"/machine-controller-manager\\\": stat /machine-controller-manager: no such file or directory\""
container_linux.go:349: starting container process caused "exec: \"/machine-controller-manager\": stat /machine-controller-manager: no such file or directory"
  Normal   Pulled  42m (x4 over 42m)  kubelet, control-plane-0  Container image "docker.io/openshift/origin-machine-api-operator:v4.0.0" already present on machine
  Warning  Failed  42m                kubelet, control-plane-0  Error: container create failed: time="2020-03-12T06:59:52Z" level=error msg="container_linux.go:349: starting container process caused \"exec: \\\"/machine-controller-manager\\\": stat /machine-controller-manager: no such file or directory\""
container_linux.go:349: starting container process caused "exec: \"/machine-controller-manager\": stat /machine-controller-manager: no such file or directory"
  Warning  Failed  42m  kubelet, control-plane-0  Error: container create failed: time="2020-03-12T06:59:52Z" level=error msg="container_linux.go:349: starting container process caused \"exec: \\\"/manager\\\": stat /manager: no such file or directory\""
container_linux.go:349: starting container process caused "exec: \"/manager\": stat /manager: no such file or directory"
  Warning  Failed  42m  kubelet, control-plane-0  Error: container create failed: time="2020-03-12T07:00:07Z" level=error msg="container_linux.go:349: starting container process caused \"exec: \\\"/manager\\\": stat /manager: no such file or directory\""
container_linux.go:349: starting container process caused "exec: \"/manager\": stat /manager: no such file or directory"
  Warning  Failed  33m (x62 over 42m)  kubelet, control-plane-0  (combined from similar events): Error: container create failed: time="2020-03-12T07:09:00Z" level=error msg="container_linux.go:349: starting container process caused \"exec: \\\"/manager\\\": stat /manager: no such file or directory\""
container_linux.go:349: starting container process caused "exec: \"/manager\": stat /manager: no such file or directory"
  Normal  Pulled  <invalid> (x192 over 42m)  kubelet, control-plane-0  Container image "docker.io/openshift/origin-machine-api-operator:v4.0.0" already present on machine
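The CreateContainerError events above point at the root cause: the controller-manager and machine-controller containers try to exec /manager and /machine-controller-manager from the hardcoded docker.io/openshift/origin-machine-api-operator:v4.0.0 image, which does not ship those binaries, while nodelink-controller and machine-healthcheck-controller run the release-payload image. A minimal sketch of spotting that mismatch (the heredoc just reproduces the Image: lines from the describe output above; on a live cluster you would pipe `oc -n openshift-machine-api describe pod ...` instead):

```shell
# Reproduce the Image: lines from the pod describe output above
cat <<'EOF' > /tmp/mapi-pod-images.txt
  controller-manager:
    Image:         docker.io/openshift/origin-machine-api-operator:v4.0.0
  machine-controller:
    Image:         docker.io/openshift/origin-machine-api-operator:v4.0.0
  nodelink-controller:
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0fccc93ce6e0d8e219d32cbab2b1a2549cd024951fc19468e490df345873302f
  machine-healthcheck-controller:
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0fccc93ce6e0d8e219d32cbab2b1a2549cd024951fc19468e490df345873302f
EOF

# A healthy machine-api-controllers pod should reference a single payload image;
# two distinct images here means some containers were pinned to a dev image.
awk '$1 == "Image:" { print $2 }' /tmp/mapi-pod-images.txt | sort -u
```

On a live cluster, comparing this output against `oc adm release info --image-for=machine-api-operator` shows which image the payload actually ships; the linked PR ("Fix vSphere image link") makes the manifests reference that payload image.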


Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-03-12-013015

How reproducible:


Steps to Reproduce:
1. Trigger upi/vsphere installation
2.
3.

Actual results:
Installation fails.

Expected results:
Installation succeeds.

Additional info:

Comment 3 Milind Yadav 2020-03-23 08:40:43 UTC
version   4.5.0-0.nightly-2020-03-22-211241

How reproducible:


Steps to Reproduce:
1. Trigger upi/vsphere installation
2. Build parameters used:
private-openshift-misc/v3-launch-templates/functionality-testing/aos-4_5/upi-on-vsphere/versioned-installer-vsphere_slave
installer_payload_image: registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-03-22-211241
3. Set label: vsphere-installer-44

Actual results:
Installation succeeded.

Expected results:
Installation succeeds.

Additional Info :
logs :
.
.
.

+ oc delete project openshift-operators-redhat
project.project.openshift.io "openshift-operators-redhat" deleted
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-03-22-211241   True        False         3m35s   Cluster version is 4.5.0-0.nightly-2020-03-22-211241
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.5.0-0.nightly-2020-03-22-211241   True        False         False      7m3s
cloud-credential                           4.5.0-0.nightly-2020-03-22-211241   True        False         False      44m
cluster-autoscaler                         4.5.0-0.nightly-2020-03-22-211241   True        False         False      15m
console                                    4.5.0-0.nightly-2020-03-22-211241   True        False         False      9m5s
csi-snapshot-controller                    4.5.0-0.nightly-2020-03-22-211241   True        False         False      13m
dns                                        4.5.0-0.nightly-2020-03-22-211241   True        False         False      18m
etcd                                       4.5.0-0.nightly-2020-03-22-211241   True        False         False      16m
image-registry                             4.5.0-0.nightly-2020-03-22-211241   True        False         False      10m
ingress                                    4.5.0-0.nightly-2020-03-22-211241   True        False         False      11m
insights                                   4.5.0-0.nightly-2020-03-22-211241   True        False         False      16m
kube-apiserver                             4.5.0-0.nightly-2020-03-22-211241   True        False         False      17m
kube-controller-manager                    4.5.0-0.nightly-2020-03-22-211241   True        False         False      17m
kube-scheduler                             4.5.0-0.nightly-2020-03-22-211241   True        False         False      17m
kube-storage-version-migrator              4.5.0-0.nightly-2020-03-22-211241   True        False         False      11m
machine-api                                4.5.0-0.nightly-2020-03-22-211241   True        False         False      16m
machine-config                             4.5.0-0.nightly-2020-03-22-211241   True        False         False      16m
marketplace                                4.5.0-0.nightly-2020-03-22-211241   True        False         False      15m
monitoring                                 4.5.0-0.nightly-2020-03-22-211241   True        False         False      5m1s
network                                    4.5.0-0.nightly-2020-03-22-211241   True        False         False      23m
node-tuning                                4.5.0-0.nightly-2020-03-22-211241   True        False         False      23m
openshift-apiserver                        4.5.0-0.nightly-2020-03-22-211241   True        False         False      16m
openshift-controller-manager               4.5.0-0.nightly-2020-03-22-211241   True        False         False      16m
openshift-samples                          4.5.0-0.nightly-2020-03-22-211241   True        False         False      14m
operator-lifecycle-manager                 4.5.0-0.nightly-2020-03-22-211241   True        False         False      23m
operator-lifecycle-manager-catalog         4.5.0-0.nightly-2020-03-22-211241   True        False         False      23m
operator-lifecycle-manager-packageserver   4.5.0-0.nightly-2020-03-22-211241   True        False         False      16m
service-ca                                 4.5.0-0.nightly-2020-03-22-211241   True        False         False      23m
service-catalog-apiserver                  4.5.0-0.nightly-2020-03-22-211241   True        False         False      23m
service-catalog-controller-manager         4.5.0-0.nightly-2020-03-22-211241   True        False         False      23m
storage                                    4.5.0-0.nightly-2020-03-22-211241   True        False         False      16m
waiting for operation up to 36000 seconds..

[08:33:07] INFO> Exit Status: 0
Flag --config has been deprecated, use --kubeconfig instead
[08:33:08] INFO> HOSTS SPECIFICATION: api.miyadav2303.qe.devcluster.openshift.com:lb
deleting /home/installer4/workspace/Launch Environment Flexy/workdir/awscreds20200323-22936-1fdm771
+ ret=0
+ '[' X0 == X0 ']'
+ result=PASS
+ '[' -n '' ']'
+ exit 0
Archiving artifacts
Recording fingerprints
Started calculate disk usage of build
Finished Calculation of disk usage of build in 0 seconds
Started calculate disk usage of workspace
Finished Calculation of disk usage of workspace in 0 seconds
Finished: SUCCESS

Comment 4 Luke Meyer 2020-08-27 22:35:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

