Bug 1851874 - In-tree provisioner doesn't work on GCP
Summary: In-tree provisioner doesn't work on GCP
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Tomáš Nožička
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-06-29 09:59 UTC by Wei Duan
Modified: 2020-10-27 16:10 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:09:46 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:10:06 UTC

Description Wei Duan 2020-06-29 09:59:25 UTC
Description of problem:
After the pod is scheduled to a node, the PVC that uses the default StorageClass is still not bound.

Version-Release number of selected component (if applicable):
[wduan@MINT config]$ oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-rc.4   True        False         7h48m   Cluster version is 4.5.0-rc.4


How reproducible:
100%

Steps to Reproduce:
1. Install a GCP cluster in a disconnected environment with FIPS=on.
2. Create a PVC that uses the default StorageClass.
3. Create a pod that consumes the PVC.
4. $ oc get pod,pvc
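
Steps 2 to 4 can be sketched as a small shell helper. This is a minimal sketch, not the reporter's actual manifests: the 1Gi request and the ReadWriteOnce access mode are assumptions, and the function names render_pvc and render_pod are hypothetical. The object names, image, label, container port and mount path are taken from the command output quoted in this report.

```shell
#!/bin/sh
# Print a PVC manifest that omits storageClassName, so the cluster's
# default StorageClass is used. Size and access mode are assumptions.
render_pvc() {
cat <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc02
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
}

# Print a pod manifest that mounts the claim at /mnt/local.
render_pod() {
cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: mypod02
  labels:
    name: frontendhttp
spec:
  containers:
    - name: myfrontend
      image: quay.io/openshifttest/storage@sha256:a05b96d373be86f46e76817487027a7f5b8b5f87c0ac18a246b018df11529b40
      ports:
        - containerPort: 80
      volumeMounts:
        - name: local
          mountPath: /mnt/local
  volumes:
    - name: local
      persistentVolumeClaim:
        claimName: mypvc02
EOF
}

# Usage against a live cluster (not run here):
#   render_pvc | oc create -f -
#   render_pod | oc create -f -
#   oc get pod,pvc
```

Because the StorageClass uses volumeBindingMode: WaitForFirstConsumer, the claim is expected to stay Pending until the pod is created and scheduled, and to become Bound shortly afterwards.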

Actual results:
[wduan@MINT 01_general]$ oc get pod,pvc
NAME          READY   STATUS              RESTARTS   AGE
pod/mypod02   0/1     ContainerCreating   0          10m

NAME                            STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/mypvc02   Pending                                      standard       10m


Expected results:
The pod should be "Running" and the PVC should be "Bound".
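
The expected state can be checked with a short filter over the `oc get pvc` output. This is a sketch only: the function name unbound_pvcs is hypothetical, and it assumes the default column layout in which STATUS is the second column.

```shell
#!/bin/sh
# Read `oc get pvc --no-headers` output on stdin and print the name of
# every claim whose STATUS column (second field) is not "Bound".
unbound_pvcs() {
  awk 'NF > 0 && $2 != "Bound" { print $1 }'
}

# Usage against a live cluster (not run here):
#   oc get pvc --no-headers | unbound_pvcs
```

An empty result means every claim in the current namespace is bound.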

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:
[wduan@MINT 01_general]$ oc describe persistentvolumeclaim/mypvc02
Name:          mypvc02
Namespace:     wduan
StorageClass:  standard
Status:        Pending
Volume:        
Labels:        <none>
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      
Access Modes:  
VolumeMode:    Filesystem
Mounted By:    mypod02
Events:
  Type    Reason                Age                 From                         Message
  ----    ------                ----                ----                         -------
  Normal  WaitForFirstConsumer  81s (x43 over 11m)  persistentvolume-controller  waiting for first consumer to be created before binding

StorageClass Dump (if StorageClass used by PV/PVC):
[wduan@MINT 01_general]$ oc get sc standard -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2020-06-29T01:32:01Z"
  name: standard
  ownerReferences:
  - apiVersion: v1
    kind: clusteroperator
    name: storage
    uid: c6ec2c1f-bfbe-470b-8a56-4c24e51df792
  resourceVersion: "10256"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/standard
  uid: 2ef24263-9ebe-451e-b0eb-075f3d73a2d9
parameters:
  replication-type: none
  type: pd-standard
provisioner: kubernetes.io/gce-pd
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer


Additional info:
[wduan@MINT 01_general]$ oc describe pod/mypod02
Name:         mypod02
Namespace:    wduan
Priority:     0
Node:         yinzhougcp-k2fqv-worker-c-krtm6.c.openshift-qe.internal/10.0.32.4
Start Time:   Mon, 29 Jun 2020 17:44:19 +0800
Labels:       name=frontendhttp
Annotations:  openshift.io/scc: anyuid
Status:       Pending
IP:           
IPs:          <none>
Containers:
  myfrontend:
    Container ID:   
    Image:          quay.io/openshifttest/storage@sha256:a05b96d373be86f46e76817487027a7f5b8b5f87c0ac18a246b018df11529b40
    Image ID:       
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt/local from local (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-pwztl (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  local:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  mypvc02
    ReadOnly:   false
  default-token-pwztl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-pwztl
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                  From                                                              Message
  ----     ------            ----                 ----                                                              -------
  Warning  FailedScheduling  <unknown>            default-scheduler                                                 persistentvolumeclaim "mypvc02" not found
  Warning  FailedScheduling  <unknown>            default-scheduler                                                 persistentvolumeclaim "mypvc02" not found
  Normal   Scheduled         <unknown>            default-scheduler                                                 Successfully assigned wduan/mypod02 to yinzhougcp-k2fqv-worker-c-krtm6.c.openshift-qe.internal
  Warning  FailedMount       6m25s (x7 over 11m)  kubelet, yinzhougcp-k2fqv-worker-c-krtm6.c.openshift-qe.internal  Unable to attach or mount volumes: unmounted volumes=[local], unattached volumes=[default-token-pwztl local]: error processing PVC wduan/mypvc02: PVC is not bound
  Warning  FailedMount       114s (x35 over 11m)  kubelet, yinzhougcp-k2fqv-worker-c-krtm6.c.openshift-qe.internal  Unable to attach or mount volumes: unmounted volumes=[local], unattached volumes=[local default-token-pwztl]: error processing PVC wduan/mypvc02: PVC is not bound

Comment 2 Jan Safranek 2020-06-29 11:30:08 UTC
The reason is that kube-controller-manager is degraded:


$ oc get clusteroperator kube-controller-manager -o yaml
  - lastTransitionTime: "2020-06-29T08:15:44Z"
    message: "ConfigObservationDegraded: .spec.featureSet %!q(*v1.FeatureGateEnabledDisabled=<nil>)
      not found\nStaticPodsDegraded: pod/kube-controller-manager-yinzhougcp-k2fqv-master-1.c.openshift-qe.internal
      container \"cluster-policy-controller\" is not ready: unknown reason\nStaticPodsDegraded:
      pod/kube-controller-manager-yinzhougcp-k2fqv-master-1.c.openshift-qe.internal
      container \"cluster-policy-controller\" is terminated: Error: I0629 11:22:13.896668
      \      1 policy_controller.go:41] Starting controllers on 0.0.0.0:10357 (d88621be)\nStaticPodsDegraded:
      I0629 11:22:13.898539       1 standalone_apiserver.go:103] Started health checks
      at 0.0.0.0:10357\nStaticPodsDegraded: I0629 11:22:13.899177       1 leaderelection.go:242]
      attempting to acquire leader lease  openshift-kube-controller-manager/cluster-policy-controller...\nStaticPodsDegraded:
      F0629 11:22:13.899898       1 standalone_apiserver.go:119] listen tcp 0.0.0.0:10357:
      bind: address already in use\nStaticPodsDegraded: \nStaticPodsDegraded: pod/kube-controller-manager-yinzhougcp-k2fqv-master-1.c.openshift-qe.internal
      container \"kube-controller-manager\" is not ready: unknown reason"
    reason: ConfigObservation_Error::StaticPods_Error
    status: "True"
    type: Degraded


After deleting the kube-controller-manager pods (not the operator), I got this from kube-controller-manager:
  - lastTransitionTime: "2020-06-29T08:15:44Z"
    message: 'ConfigObservationDegraded: .spec.featureSet %!q(*v1.FeatureGateEnabledDisabled=<nil>)
      not found'
    reason: ConfigObservation_Error
    status: "True"
    type: Degraded
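
The long Degraded message above is easier to read once it is reduced to its fatal log entries. The following is a minimal sketch under stated assumptions: the function name fatal_entries is hypothetical, and it assumes the message is supplied with real newlines (as the jsonpath query in the usage comment should print it) and that fatal entries use the klog prefix seen above, such as "F0629 11:22:13.899898".

```shell
#!/bin/sh
# Read a Degraded condition message on stdin (one log entry per line)
# and print only klog fatal entries, i.e. entries containing a token
# such as "F0629 11:22:13.899898". Prints nothing if there are none.
fatal_entries() {
  grep -E '(^|[[:space:]])F[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+' || true
}

# Usage against a live cluster (not run here):
#   oc get clusteroperator kube-controller-manager \
#     -o jsonpath='{.status.conditions[?(@.type=="Degraded")].message}' \
#     | fatal_entries
```

Applied to the first message in this comment, the filter would leave the single entry reporting "listen tcp 0.0.0.0:10357: bind: address already in use".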

Comment 3 Maciej Szulik 2020-07-01 09:31:10 UTC
Can you either provide us with a cluster where this is happening or a must-gather dump from that cluster?
I'm especially interested in the following resources:

oc get featuregates/cluster -oyaml
oc get kubecontrollermanager/cluster -oyaml
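
The two commands above can be wrapped so their output is saved for attaching to the bug. This is a sketch only: the function name collect_kcm_state, the default directory kcm-debug and the output file names are all hypothetical.

```shell
#!/bin/sh
# Save the two requested resources into a directory (default:
# ./kcm-debug, a hypothetical name) so they can be attached to the bug.
collect_kcm_state() {
  kcm_outdir="${1:-kcm-debug}"
  mkdir -p "$kcm_outdir" || return 1
  oc get featuregates/cluster -o yaml > "$kcm_outdir/featuregate.yaml"
  oc get kubecontrollermanager/cluster -o yaml > "$kcm_outdir/kubecontrollermanager.yaml"
}

# Usage against a live cluster (not run here):
#   collect_kcm_state ./kcm-debug
# A full dump can be gathered with `oc adm must-gather` instead.
```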

Comment 4 Wei Duan 2020-07-06 03:14:46 UTC
Sorry, I missed the "needinfo" notification and the cluster had already been removed.
Last Friday I set up a new cluster with the same flexy template and 4.5.0-0.nightly-2020-07-02-190154, and I did not hit this issue again.

Comment 5 Maciej Szulik 2020-07-06 11:11:01 UTC
I'm lowering the priority based on the previous comment. If you hit the issue again, please let us know.

Comment 6 Wei Duan 2020-07-07 01:16:04 UTC
I removed the TestBlocker tag.

Comment 7 Maciej Szulik 2020-07-07 08:30:28 UTC
It looks like this might have been fixed by https://github.com/openshift/cluster-kube-controller-manager-operator/pull/415
Moving to QA for verification.

Comment 11 zhou ying 2020-07-08 15:02:08 UTC
Confirmed with payload: 4.6.0-0.nightly-2020-07-07-141639

[root@dhcp-140-138 ~]# oc get po 
NAME    READY   STATUS         RESTARTS   AGE
mypod   0/1     ErrImagePull   0          4m24s

[zhouying@dhcp-140-138 ~]$ oc get pvc
NAME   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ebs    Bound    pvc-f439d62c-3611-49cb-8cc8-ca4931998394   1Gi        RWO            standard       49s

Comment 14 errata-xmlrpc 2020-10-27 16:09:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

