We'll evaluate this next sprint.
There is a pending upstream PR that depends on the VMware folks: https://github.com/kubernetes/kubernetes/pull/90836. It is approved, but it may take some time for the bot to merge it.
I'm lowering severity and priority here since there are workarounds for when the creds get stale [1], [2]. This has been fixed upstream [3], and the backport [4] should merge any time now.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1821280#c7
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1823782#c7
[3] https://github.com/kubernetes/kubernetes/pull/90836
[4] https://github.com/openshift/origin/pull/25166
*** Bug 1824481 has been marked as a duplicate of this bug. ***
It seems the issue was not fixed by the addition of the secret watcher; the refactoring in https://github.com/kubernetes/kubernetes/pull/90836 only broadened its scope. hekumar pointed out a possible remaining case in https://bugzilla.redhat.com/show_bug.cgi?id=1863009#c10
The PR https://github.com/kubernetes/kubernetes/pull/93971 might fix the issue.
PR which fixes deadlock in volume provisioning: https://github.com/openshift/origin/pull/25427
All PRs are merged; moving this to MODIFIED.
VERIFIED ON -
[miyadav@miyadav vsphere]$ oc get clusterversion --config vsp
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-08-123737   True        False         78m     Cluster version is 4.6.0-0.nightly-2020-09-08-123737

Steps:
1. Scaled the cvo deployment to 0:
   oc scale deployment cluster-version-operator --replicas 0 -n openshift-cluster-version --config vsp
2. Changed the password string randomly in the cloud-credential secret in the openshift-machine-api namespace.
3. Created a PVC using the below yaml:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

Expected - PVC creation fails due to invalid credentials
Actual - PVC created successfully

Additional Info:
Steps are based on https://bugzilla.redhat.com/show_bug.cgi?id=1824481, which was a duplicate of this bug ...
@Milind Yadav The steps are incorrect. The secret located in the "openshift-machine-api" namespace has nothing to do with the vSphere volume provisioning plugin. If you change that secret, you'll only see machine-api become unable to create and provision new machines. You should change the secret pointed to in the cloud provider config. You can view it with:

oc get cm cloud-provider-config -n openshift-config -o yaml

Its content should point to the secret:

secret-name = "vsphere-creds"
secret-namespace = "kube-system"

You can therefore test it by editing that secret:

oc edit secret -n kube-system vsphere-creds
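For reference, the relevant part of the cloud provider config looks roughly like the INI fragment below. This is a sketch of the in-tree vSphere cloud provider format; the exact keys and surrounding sections in a given cluster's cloud-provider-config ConfigMap may differ.

```ini
; Excerpt of the cloud-provider-config data (in-tree vSphere cloud
; provider, INI format). The [Global] section tells the provider which
; secret holds the vCenter credentials; other keys are omitted here.
[Global]
secret-name      = "vsphere-creds"
secret-namespace = "kube-system"
```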
Hey Danil, sorry about using the incorrect namespace to modify the creds for testing, but even after making changes to the secret in the kube-system namespace I am still able to create a PVC. I guess the issue might be with the build and the fix may not be present in it. Can you confirm if that's the case? I can confirm tomorrow after taking the latest build. This nightly build was created 23 hours ago, so I thought it should already contain the fix.
Thanks Danil for helping out with the steps.

Validated on -
[miyadav@miyadav vsphere]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-09-184544   True        False         44m     Cluster version is 4.6.0-0.nightly-2020-09-09-184544

Steps:
1. Update vsphere-creds in the kube-system namespace with invalid credentials (keep the original creds handy).
2. To make sure the cache is not used and the machine-controller pods are restarted, add a blank line to the cloud provider config using:
   oc edit cm -n openshift-config cloud-provider-config
   (wait a few minutes for new machine-controller pods to come up in openshift-machine-api)
3. Create a PVC using the below:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

4. Use this PVC to create a pod in openshift-machine-api as below:

apiVersion: v1
kind: Pod
metadata:
  name: mypod
  labels:
    name: frontendhttp
spec:
  containers:
    - name: myfrontend
      image: nginx
      ports:
        - containerPort: 80
          name: http-server
      volumeMounts:
        - mountPath: /var/www/html
          name: pvol
  volumes:
    - name: pvol
      persistentVolumeClaim:
        claimName: pvc

5. The pod hangs in ContainerCreating state; check the events:
.
.
21s  Warning  FailedAttachVolume  pod/mypod  AttachVolume.Attach failed for volume "pvc-a79b7e0d-31e2-4ac6-a7cf-f1647158af8d" : ServerFaultCode: Cannot complete login due to an incorrect user name or password.
38s  Warning  FailedMount  pod/mypod  Unable to attach or mount volumes: unmounted volumes=[pvol], unattached volumes=[pvol default-token-kzk46]: timed out waiting for the condition
6. Update the creds back to the correct values and remove the blank line from the cloud provider config (refer to step 2).
7. Wait a few minutes; the pod comes up successfully, confirming the PVC is able to provision the volume.

Result as expected:
oc get events:
.
.
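The verification flow above can be condensed into the shell sketch below. This assumes a logged-in oc session against a vSphere cluster, and that pvc.yaml and pod.yaml contain the manifests from steps 3 and 4 (those filenames are placeholders, not from the original comment).

```
# Sketch of the verification flow; requires a logged-in oc session on a
# vSphere cluster. pvc.yaml / pod.yaml hold the manifests from steps 3-4.

# Steps 1-2: break the credentials, then touch the cloud provider config
# so the machine-controller pods restart instead of using a cached login.
oc edit secret -n kube-system vsphere-creds
oc edit cm -n openshift-config cloud-provider-config

# Steps 3-4: create the PVC and a pod that mounts it.
oc apply -f pvc.yaml -n openshift-machine-api
oc apply -f pod.yaml -n openshift-machine-api

# Step 5: expect FailedAttachVolume events mentioning the bad login.
oc get events -n openshift-machine-api | grep -i FailedAttachVolume

# Steps 6-7: restore the credentials, touch the config again, and watch
# the pod reach Running once the volume attaches.
oc edit secret -n kube-system vsphere-creds
oc edit cm -n openshift-config cloud-provider-config
oc get pod mypod -n openshift-machine-api -w
```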
10m  Normal  ProvisioningSucceeded  persistentvolumeclaim/pvc  Successfully provisioned volume pvc-a79b7e0d-31e2-4ac6-a7cf-f1647158af8d using kubernetes.io/vsphere-volume
.
.
Additional Info: Moved to VERIFIED
*** Bug 1884674 has been marked as a duplicate of this bug. ***
Hi Danil, any plans to port these changes to 4.6? That would help get https://github.com/openshift/kubernetes/pull/397 closer to merging.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196