Bug 1794366 - Failed to install. kube-controller-manager-master pod on CreateContainerError state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.3.z
Assignee: Maciej Szulik
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-01-23 12:23 UTC by David Sanz
Modified: 2020-02-12 09:42 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-12 09:42:20 UTC
Target Upstream Version:
Embargoed:
maszulik: needinfo-




Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2020:0391  0        None      None    None     2020-02-12 09:42:53 UTC

Description David Sanz 2020-01-23 12:23:51 UTC
Description of problem:

Installation does not complete on bare metal (Packet) for 4.3.0-0.nightly-2020-01-23-105702:

$ oc describe co kube-controller-manager
Name:         kube-controller-manager
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2020-01-23T12:10:55Z
  Generation:          1
  Resource Version:    10190
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/kube-controller-manager
  UID:                 0f567700-458c-4698-93a8-de6a1f6ca39c
Spec:
Status:
  Conditions:
    Last Transition Time:  2020-01-23T12:13:11Z
    Message:               StaticPodsDegraded: nodes/master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com pods/kube-controller-manager-master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com container="cluster-policy-controller-5" is not ready
StaticPodsDegraded: nodes/master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com pods/kube-controller-manager-master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com container="cluster-policy-controller-5" is waiting: "CreateContainerError" - "container create failed: time=\"2020-01-23T12:17:20Z\" level=error msg=\"container_linux.go:346: starting container process caused \\\"exec: \\\\\\\"cluster-policy-controller\\\\\\\": executable file not found in $PATH\\\"\"\ncontainer_linux.go:346: starting container process caused \"exec: \\\"cluster-policy-controller\\\": executable file not found in $PATH\"\n"
StaticPodsDegraded: pods "kube-controller-manager-master-01.mrnd-43-no-delete-e174.qe.devcluster.openshift.com" not found
StaticPodsDegraded: pods "kube-controller-manager-master-00.mrnd-43-no-delete-e174.qe.devcluster.openshift.com" not found
    Reason:                StaticPodsDegradedError
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2020-01-23T12:11:06Z
    Message:               Progressing: 3 nodes are at revision 0; 0 nodes have achieved new revision 5
    Reason:                Progressing
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2020-01-23T12:10:56Z
    Message:               Available: 0 nodes are active; 3 nodes are at revision 0; 0 nodes have achieved new revision 5
    Reason:                AvailableZeroNodesActive
    Status:                False
    Type:                  Available
    Last Transition Time:  2020-01-23T12:10:55Z
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:               <nil>
  Related Objects:
    Group:     operator.openshift.io
    Name:      cluster
    Resource:  kubecontrollermanagers
    Group:     
    Name:      openshift-config
    Resource:  namespaces
    Group:     
    Name:      openshift-config-managed
    Resource:  namespaces
    Group:     
    Name:      openshift-kube-controller-manager
    Resource:  namespaces
    Group:     
    Name:      openshift-kube-controller-manager-operator
    Resource:  namespaces
  Versions:
    Name:     raw-internal
    Version:  4.3.0-0.nightly-2020-01-23-105702
    Name:     kube-controller-manager
    Version:  1.16.2
    Name:     operator
    Version:  4.3.0-0.nightly-2020-01-23-105702
Events:       <none>



$ oc describe pod kube-controller-manager-master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com -n openshift-kube-controller-manager
Name:                 kube-controller-manager-master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com
Namespace:            openshift-kube-controller-manager
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com/147.75.100.27
Start Time:           Thu, 23 Jan 2020 13:12:10 +0100
Labels:               app=kube-controller-manager
                      kube-controller-manager=true
                      revision=5
Annotations:          kubernetes.io/config.hash: 8109d0dfd71bc70c2f478dc03e54a1bc
                      kubernetes.io/config.mirror: 8109d0dfd71bc70c2f478dc03e54a1bc
                      kubernetes.io/config.seen: 2020-01-23T12:12:10.117745797Z
                      kubernetes.io/config.source: file
Status:               Pending
IP:                   147.75.100.27
IPs:
  IP:  147.75.100.27
Init Containers:
  wait-for-host-port:
    Container ID:  cri-o://fa8fc90cdef30ec586fbd4171ba3092c53b5b7601c27e4e320895fdc0c76a5b6
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ce10fd84985a2522699e9d1152485e45c4eddb019f6a9b45707116757e115976
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ce10fd84985a2522699e9d1152485e45c4eddb019f6a9b45707116757e115976
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/bin/timeout
      30
      /bin/bash
      -c
    Args:
      echo -n "Waiting for port :10257 to be released."
      while [ -n "$(lsof -ni :10257)" ]; do
        echo -n "."
        sleep 1
      done
      
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 23 Jan 2020 13:12:10 +0100
      Finished:     Thu, 23 Jan 2020 13:12:11 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:         <none>
  wait-for-cpc-host-port:
    Container ID:  cri-o://d4747e66934c5e47340d8dd2165a9910578234ccc547b807c4b6c1de5ad094c3
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ce10fd84985a2522699e9d1152485e45c4eddb019f6a9b45707116757e115976
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ce10fd84985a2522699e9d1152485e45c4eddb019f6a9b45707116757e115976
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/bin/timeout
      30
      /bin/bash
      -c
    Args:
      echo -n "Waiting for port :10357 to be released."
      while [ -n "$(lsof -ni :10357)" ]; do
        echo -n "."
        sleep 1
      done
      
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 23 Jan 2020 13:12:11 +0100
      Finished:     Thu, 23 Jan 2020 13:12:12 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:         <none>
Containers:
  kube-controller-manager-5:
    Container ID:  cri-o://779aa6240b286dad0dc4c71ba5d54c1a4b959dfe1f08747db57cd11227cfc8c8
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ce10fd84985a2522699e9d1152485e45c4eddb019f6a9b45707116757e115976
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ce10fd84985a2522699e9d1152485e45c4eddb019f6a9b45707116757e115976
    Port:          10257/TCP
    Host Port:     10257/TCP
    Command:
      /bin/bash
      -ec
    Args:
      if [ -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt ]; then
        echo "Copying system trust bundle"
        cp -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
      fi
      exec hyperkube kube-controller-manager --openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml \
        --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/controller-manager-kubeconfig/kubeconfig \
        --authentication-kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/controller-manager-kubeconfig/kubeconfig \
        --authorization-kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/controller-manager-kubeconfig/kubeconfig \
        --client-ca-file=/etc/kubernetes/static-pod-certs/configmaps/client-ca/ca-bundle.crt \
        --requestheader-client-ca-file=/etc/kubernetes/static-pod-certs/configmaps/aggregator-client-ca/ca-bundle.crt -v=2 --tls-cert-file=/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.crt --tls-private-key-file=/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.key
    State:          Running
      Started:      Thu, 23 Jan 2020 13:12:13 +0100
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        100m
      memory:     200Mi
    Liveness:     http-get https://:10257/healthz delay=45s timeout=10s period=10s #success=1 #failure=3
    Readiness:    http-get https://:10257/healthz delay=10s timeout=10s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/kubernetes/static-pod-certs from cert-dir (rw)
      /etc/kubernetes/static-pod-resources from resource-dir (rw)
  cluster-policy-controller-5:
    Container ID:  
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bf0eb67346e3c291acb83b1cc25e330f4ccc4561a89e3f936609992692848222
    Image ID:      
    Port:          10357/TCP
    Host Port:     10357/TCP
    Command:
      cluster-policy-controller
      start
    Args:
      --config=/etc/kubernetes/static-pod-resources/configmaps/cluster-policy-controller-config/config.yaml
    State:          Waiting
      Reason:       CreateContainerError
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        100m
      memory:     200Mi
    Liveness:     http-get https://:10357/healthz delay=45s timeout=10s period=10s #success=1 #failure=3
    Readiness:    http-get https://:10357/healthz delay=10s timeout=10s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/kubernetes/static-pod-certs from cert-dir (rw)
      /etc/kubernetes/static-pod-resources from resource-dir (rw)
  kube-controller-manager-cert-syncer-5:
    Container ID:  cri-o://a0013b7a22f10364dbf85110bc85237c7906066448d5d92038c893b588d0dbd1
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b76f8f4582d98133993414af5208d0a271f635542f7c553f1f3b49404d3efbb5
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b76f8f4582d98133993414af5208d0a271f635542f7c553f1f3b49404d3efbb5
    Port:          <none>
    Host Port:     <none>
    Command:
      cluster-kube-controller-manager-operator
      cert-syncer
    Args:
      --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-controller-cert-syncer-kubeconfig/kubeconfig
      --namespace=$(POD_NAMESPACE)
      --destination-dir=/etc/kubernetes/static-pod-certs
    State:          Running
      Started:      Thu, 23 Jan 2020 13:12:18 +0100
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     10m
      memory:  50Mi
    Environment:
      POD_NAME:       kube-controller-manager-master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com (v1:metadata.name)
      POD_NAMESPACE:  openshift-kube-controller-manager (v1:metadata.namespace)
    Mounts:
      /etc/kubernetes/static-pod-certs from cert-dir (rw)
      /etc/kubernetes/static-pod-resources from resource-dir (rw)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  resource-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-5
    HostPathType:  
  cert-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/static-pod-resources/kube-controller-manager-certs
    HostPathType:  
QoS Class:         Burstable
Node-Selectors:    <none>
Tolerations:       
Events:
  Type     Reason   Age    From                                                                   Message
  ----     ------   ----   ----                                                                   -------
  Normal   Pulled   6m17s  kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ce10fd84985a2522699e9d1152485e45c4eddb019f6a9b45707116757e115976" already present on machine
  Normal   Created  6m17s  kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Created container wait-for-host-port
  Normal   Started  6m17s  kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Started container wait-for-host-port
  Normal   Pulled   6m16s  kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ce10fd84985a2522699e9d1152485e45c4eddb019f6a9b45707116757e115976" already present on machine
  Normal   Created  6m16s  kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Created container wait-for-cpc-host-port
  Normal   Started  6m16s  kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Started container wait-for-cpc-host-port
  Normal   Pulled   6m15s  kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ce10fd84985a2522699e9d1152485e45c4eddb019f6a9b45707116757e115976" already present on machine
  Normal   Created  6m14s  kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Created container kube-controller-manager-5
  Normal   Started  6m14s  kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Started container kube-controller-manager-5
  Normal   Pulling  6m14s  kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bf0eb67346e3c291acb83b1cc25e330f4ccc4561a89e3f936609992692848222"
  Normal   Created  6m10s  kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Created container kube-controller-manager-cert-syncer-5
  Warning  Failed   6m10s  kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Error: container create failed: time="2020-01-23T12:12:17Z" level=error msg="container_linux.go:346: starting container process caused \"exec: \\\"cluster-policy-controller\\\": executable file not found in $PATH\""
container_linux.go:346: starting container process caused "exec: \"cluster-policy-controller\": executable file not found in $PATH"
  Normal   Pulled   6m10s  kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b76f8f4582d98133993414af5208d0a271f635542f7c553f1f3b49404d3efbb5" already present on machine
  Normal   Pulled   6m10s  kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bf0eb67346e3c291acb83b1cc25e330f4ccc4561a89e3f936609992692848222"
  Normal   Started  6m9s   kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Started container kube-controller-manager-cert-syncer-5
  Warning  Failed   6m8s   kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Error: container create failed: time="2020-01-23T12:12:19Z" level=error msg="container_linux.go:346: starting container process caused \"exec: \\\"cluster-policy-controller\\\": executable file not found in $PATH\""
container_linux.go:346: starting container process caused "exec: \"cluster-policy-controller\": executable file not found in $PATH"
  Warning  Failed  6m7s  kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Error: container create failed: time="2020-01-23T12:12:20Z" level=error msg="container_linux.go:346: starting container process caused \"exec: \\\"cluster-policy-controller\\\": executable file not found in $PATH\""
container_linux.go:346: starting container process caused "exec: \"cluster-policy-controller\": executable file not found in $PATH"
  Warning  Failed  5m55s  kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Error: container create failed: time="2020-01-23T12:12:32Z" level=error msg="container_linux.go:346: starting container process caused \"exec: \\\"cluster-policy-controller\\\": executable file not found in $PATH\""
container_linux.go:346: starting container process caused "exec: \"cluster-policy-controller\": executable file not found in $PATH"
  Warning  Failed  5m40s  kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Error: container create failed: time="2020-01-23T12:12:47Z" level=error msg="container_linux.go:346: starting container process caused \"exec: \\\"cluster-policy-controller\\\": executable file not found in $PATH\""
container_linux.go:346: starting container process caused "exec: \"cluster-policy-controller\": executable file not found in $PATH"
  Warning  Failed  5m28s  kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Error: container create failed: time="2020-01-23T12:12:59Z" level=error msg="container_linux.go:346: starting container process caused \"exec: \\\"cluster-policy-controller\\\": executable file not found in $PATH\""
container_linux.go:346: starting container process caused "exec: \"cluster-policy-controller\": executable file not found in $PATH"
  Normal  Pulled  67s (x25 over 6m9s)  kubelet, master-02.mrnd-43-no-delete-e174.qe.devcluster.openshift.com  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bf0eb67346e3c291acb83b1cc25e330f4ccc4561a89e3f936609992692848222" already present on machine
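The `cluster-policy-controller-5` container's `Command` is the bare name `cluster-policy-controller`, so the runtime has to locate it by searching the `PATH` inside the container image. The repeated `executable file not found in $PATH` events suggest that the image referenced by that container did not ship the binary in any `PATH` directory. The following is a minimal POSIX shell sketch that imitates that lookup with `command -v` against throwaway directories; the function name `resolve_entrypoint` and the temporary directories are hypothetical stand-ins for image contents, and this only approximates the runtime's behavior rather than reproducing runc's code.

```shell
#!/bin/sh
# Hypothetical helper: resolve a bare command name using ONLY the directories
# in the supplied PATH value, the way a container entrypoint is looked up.
# Prints the resolved path on success; on failure prints an error modeled on
# the runtime message to stderr and returns 1.
resolve_entrypoint() {
  search_path=$1
  cmd_name=$2
  # PATH is changed only inside the command substitution's subshell,
  # so the caller's PATH is left untouched.
  if resolved=$(PATH=$search_path; command -v "$cmd_name"); then
    echo "$resolved"
  else
    echo "exec: \"$cmd_name\": executable file not found in \$PATH" >&2
    return 1
  fi
}

# Demo: one directory that ships the binary and one that does not.
with_bin=$(mktemp -d)
without_bin=$(mktemp -d)
printf '#!/bin/sh\necho started\n' > "$with_bin/cluster-policy-controller"
chmod +x "$with_bin/cluster-policy-controller"

resolve_entrypoint "$with_bin" cluster-policy-controller
resolve_entrypoint "$without_bin" cluster-policy-controller || echo "lookup failed (exit status $?)"

rm -rf "$with_bin" "$without_bin"
```

The first call prints the path of the stub binary; the second fails the same way the kubelet events do, because nothing named `cluster-policy-controller` exists in the searched directory.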






Version-Release number of the following components:
4.3.0-0.nightly-2020-01-23-105702

How reproducible:

Steps to Reproduce:
1. Install 4.3.0-0.nightly-2020-01-23-105702.
2. Wait while the installer reports "Waiting up to 30m0s for bootstrapping to complete...".
3. Run: oc get pods -n openshift-kube-controller-manager
4. The kube-controller-manager-master-xxxxx.qe.devcluster.openshift.com pod is in CreateContainerError state.
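Step 3 can be narrowed down to just the affected pods. A minimal sketch, assuming the default tabular output of `oc get pods` (columns NAME, READY, STATUS, RESTARTS, AGE) and a hypothetical helper named `list_create_container_errors`, that prints the name of every pod whose STATUS column reads `CreateContainerError`:

```shell
#!/bin/sh
# Hypothetical helper: read `oc get pods` table output on stdin and print the
# NAME of every pod whose STATUS (third column) is CreateContainerError.
# Assumes the default column layout: NAME READY STATUS RESTARTS AGE.
list_create_container_errors() {
  # NR > 1 skips the header row.
  awk 'NR > 1 && $3 == "CreateContainerError" { print $1 }'
}

# Usage against a live cluster (requires oc and a valid kubeconfig):
#   oc get pods -n openshift-kube-controller-manager | list_create_container_errors
```

Parsing the human-readable table keeps the sketch dependency-free; for anything beyond a quick check, querying structured output (for example `-o json`) is less brittle than relying on column positions.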

Actual results:
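The cluster-policy-controller-5 container of the kube-controller-manager static pod stays in CreateContainerError (exec: "cluster-policy-controller": executable file not found in $PATH), and the kube-controller-manager ClusterOperator reports Degraded=True, Progressing=True and Available=False, so the installation does not complete.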

Expected results:

Additional info:

Comment 1 David Sanz 2020-01-23 16:17:46 UTC
Reassigned to kube-controller-manager, as it also fails on OSP IPI with the same version.

Comment 7 zhou ying 2020-02-03 00:56:24 UTC
Cannot reproduce the issue with the latest payload, 4.3.0-0.nightly-2020-02-02-175954:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upi-4.3/772

Comment 9 errata-xmlrpc 2020-02-12 09:42:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0391

