Description of problem:
After the pod is scheduled to a node, a PVC using the default storage class is still not bound.

Version-Release number of selected component (if applicable):
[wduan@MINT config]$ oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-rc.4   True        False         7h48m   Cluster version is 4.5.0-rc.4

How reproducible:
100%

Steps to Reproduce:
1. Install a GCP cluster in a disconnected environment with FIPS enabled.
2. Create a PVC with the default storage class.
3. Create a pod that consumes the PVC (see the sketch of the manifests after the dumps below).
4. $ oc get pod,pvc

Actual results:
[wduan@MINT 01_general]$ oc get pod,pvc
NAME          READY   STATUS              RESTARTS   AGE
pod/mypod02   0/1     ContainerCreating   0          10m

NAME                            STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/mypvc02   Pending                                      standard       10m

Expected results:
The pod should be "Running" and the PVC should be "Bound".

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:
[wduan@MINT 01_general]$ oc describe persistentvolumeclaim/mypvc02
Name:          mypvc02
Namespace:     wduan
StorageClass:  standard
Status:        Pending
Volume:
Labels:        <none>
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Mounted By:    mypod02
Events:
  Type    Reason                Age                 From                         Message
  ----    ------                ----                ----                         -------
  Normal  WaitForFirstConsumer  81s (x43 over 11m)  persistentvolume-controller  waiting for first consumer to be created before binding

StorageClass Dump (if StorageClass used by PV/PVC):
[wduan@MINT 01_general]$ oc get sc standard -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2020-06-29T01:32:01Z"
  name: standard
  ownerReferences:
  - apiVersion: v1
    kind: clusteroperator
    name: storage
    uid: c6ec2c1f-bfbe-470b-8a56-4c24e51df792
  resourceVersion: "10256"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/standard
  uid: 2ef24263-9ebe-451e-b0eb-075f3d73a2d9
parameters:
  replication-type: none
  type: pd-standard
provisioner: kubernetes.io/gce-pd
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

Additional info:
[wduan@MINT 01_general]$ oc describe pod/mypod02
Name:         mypod02
Namespace:    wduan
Priority:     0
Node:         yinzhougcp-k2fqv-worker-c-krtm6.c.openshift-qe.internal/10.0.32.4
Start Time:   Mon, 29 Jun 2020 17:44:19 +0800
Labels:       name=frontendhttp
Annotations:  openshift.io/scc: anyuid
Status:       Pending
IP:
IPs:          <none>
Containers:
  myfrontend:
    Container ID:
    Image:          quay.io/openshifttest/storage@sha256:a05b96d373be86f46e76817487027a7f5b8b5f87c0ac18a246b018df11529b40
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt/local from local (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-pwztl (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  local:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  mypvc02
    ReadOnly:   false
  default-token-pwztl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-pwztl
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  <unknown>            default-scheduler  persistentvolumeclaim "mypvc02" not found
  Warning  FailedScheduling  <unknown>            default-scheduler  persistentvolumeclaim "mypvc02" not found
  Normal   Scheduled         <unknown>            default-scheduler  Successfully assigned wduan/mypod02 to yinzhougcp-k2fqv-worker-c-krtm6.c.openshift-qe.internal
  Warning  FailedMount       6m25s (x7 over 11m)  kubelet, yinzhougcp-k2fqv-worker-c-krtm6.c.openshift-qe.internal  Unable to attach or mount volumes: unmounted volumes=[local], unattached volumes=[default-token-pwztl local]: error processing PVC wduan/mypvc02: PVC is not bound
  Warning  FailedMount       114s (x35 over 11m)  kubelet, yinzhougcp-k2fqv-worker-c-krtm6.c.openshift-qe.internal  Unable to attach or mount volumes: unmounted volumes=[local], unattached volumes=[local default-token-pwztl]: error processing PVC wduan/mypvc02: PVC is not bound
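For reference, a minimal sketch of the PVC and pod used in the reproduction steps. Names, labels, image, claim name, and mount path are taken from the oc describe output above; the apiVersion boilerplate, access mode, and requested size are assumptions, since the PVC never bound and those fields are not visible in the dumps. No storageClassName is set, so the default class "standard" applies.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc02
spec:
  # Access mode and size are assumptions; the unbound PVC shows neither.
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: mypod02
  labels:
    name: frontendhttp
spec:
  containers:
  - name: myfrontend
    image: quay.io/openshifttest/storage@sha256:a05b96d373be86f46e76817487027a7f5b8b5f87c0ac18a246b018df11529b40
    ports:
    - containerPort: 80
    volumeMounts:
    - mountPath: /mnt/local
      name: local
  volumes:
  - name: local
    persistentVolumeClaim:
      claimName: mypvc02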
The reason is that kube-controller-manager is degraded:

$ oc get clusteroperator kube-controller-manager -o yaml
  - lastTransitionTime: "2020-06-29T08:15:44Z"
    message: "ConfigObservationDegraded: .spec.featureSet %!q(*v1.FeatureGateEnabledDisabled=<nil>) not found\nStaticPodsDegraded: pod/kube-controller-manager-yinzhougcp-k2fqv-master-1.c.openshift-qe.internal container \"cluster-policy-controller\" is not ready: unknown reason\nStaticPodsDegraded: pod/kube-controller-manager-yinzhougcp-k2fqv-master-1.c.openshift-qe.internal container \"cluster-policy-controller\" is terminated: Error: I0629 11:22:13.896668       1 policy_controller.go:41] Starting controllers on 0.0.0.0:10357 (d88621be)\nStaticPodsDegraded: I0629 11:22:13.898539       1 standalone_apiserver.go:103] Started health checks at 0.0.0.0:10357\nStaticPodsDegraded: I0629 11:22:13.899177       1 leaderelection.go:242] attempting to acquire leader lease  openshift-kube-controller-manager/cluster-policy-controller...\nStaticPodsDegraded: F0629 11:22:13.899898       1 standalone_apiserver.go:119] listen tcp 0.0.0.0:10357: bind: address already in use\nStaticPodsDegraded: \nStaticPodsDegraded: pod/kube-controller-manager-yinzhougcp-k2fqv-master-1.c.openshift-qe.internal container \"kube-controller-manager\" is not ready: unknown reason"
    reason: ConfigObservation_Error::StaticPods_Error
    status: "True"
    type: Degraded

After deleting the kube-controller-manager pods (not the operator), I got this from kube-controller-manager:

  - lastTransitionTime: "2020-06-29T08:15:44Z"
    message: 'ConfigObservationDegraded: .spec.featureSet %!q(*v1.FeatureGateEnabledDisabled=<nil>) not found'
    reason: ConfigObservation_Error
    status: "True"
    type: Degraded
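This is consistent with the symptom: the storage class uses volumeBindingMode WaitForFirstConsumer, so provisioning only starts after the pod is scheduled, and the provisioning itself is done by controllers running in kube-controller-manager, which was not healthy here. For anyone hitting the same symptom, a sketch of the checks and the pod restart described above; the namespace is standard for OCP 4.x, but the label selector is an assumption, so verify it on your cluster first:

# Check whether the operator reports Degraded and why.
oc get clusteroperator kube-controller-manager -o yaml

# List the static kube-controller-manager pods on the masters.
oc get pods -n openshift-kube-controller-manager

# Delete the pods (not the operator); the kubelet recreates static pods
# from the manifests on the master nodes. The label selector is an
# assumption; confirm it with the pod listing above.
oc delete pod -n openshift-kube-controller-manager -l app=kube-controller-manager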
Can you either provide us with a cluster where this is happening or a must-gather dump from that cluster? I'm especially interested in the following resources:

oc get featuregates/cluster -o yaml
oc get kubecontrollermanager/cluster -o yaml
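If the cluster can't be kept around, capturing a dump before teardown works too; the --dest-dir flag is optional and the directory name here is just an example:

oc adm must-gather --dest-dir=./must-gather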
Sorry, I missed the "needinfo" notification and the cluster was already removed. Last Friday I set up a new cluster with the same flexy template on 4.5.0-0.nightly-2020-07-02-190154, and I did not hit this issue again.
I'm lowering the priority based on the previous comment. If you hit the issue again, please let us know.
I removed the TestBlocker tag.
It looks like this might have been fixed by https://github.com/openshift/cluster-kube-controller-manager-operator/pull/415. Moving to QA for verification.
Confirmed with payload 4.6.0-0.nightly-2020-07-07-141639:

[root@dhcp-140-138 ~]# oc get po
NAME    READY   STATUS         RESTARTS   AGE
mypod   0/1     ErrImagePull   0          4m24s

[zhouying@dhcp-140-138 ~]$ oc get pvc
NAME   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ebs    Bound    pvc-f439d62c-3611-49cb-8cc8-ca4931998394   1Gi        RWO            standard       49s
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196