Bug 1821280 - Unable to provision vSphere volume [NEEDINFO]
Summary: Unable to provision vSphere volume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Danil Grigorev
QA Contact: Milind Yadav
URL:
Whiteboard:
Duplicates: 1824481 1884674
Depends On:
Blocks:
 
Reported: 2020-04-06 13:31 UTC by Mohit
Modified: 2023-12-15 17:38 UTC (History)
CC: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 15:57:43 UTC
Target Upstream Version:
Embargoed:
vareti: needinfo? (dgrigore)


Attachments:


Links
System ID Private Priority Status Summary Last Updated
Github kubernetes kubernetes pull 90836 0 None closed Added ability for vSphere to reconnect on secret update 2021-06-11 13:46:52 UTC
Github kubernetes kubernetes pull 93971 0 None closed Refactor locks logic on registeredNodesLock to be non-blocking 2021-06-11 13:46:54 UTC
Github openshift origin pull 25166 0 None closed Bug 1821280: Unable to provision vSphere volume 2021-06-11 13:46:54 UTC
Red Hat Bugzilla 1863009 0 high CLOSED vSphere provision failure on ocp46 2021-02-22 00:41:40 UTC
Red Hat Knowledge Base (Solution) 5669581 0 None None None 2021-06-11 13:46:12 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 15:58:12 UTC

Comment 16 Alberto 2020-06-18 14:48:52 UTC
We'll evaluate this next sprint.

Comment 17 Danil Grigorev 2020-06-18 14:55:04 UTC
There is a pending upstream PR that depends on the VMware folks: https://github.com/kubernetes/kubernetes/pull/90836. It is approved, but it might take some time for the bot to merge it.

Comment 18 Alberto 2020-06-19 08:41:26 UTC
I'm lowering severity and priority here since there are workarounds for when the creds get stale [1], [2].
This has been fixed upstream [3], and the backport [4] should be merged any time now.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1821280#c7
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1823782#c7
[3] https://github.com/kubernetes/kubernetes/pull/90836
[4] https://github.com/openshift/origin/pull/25166

Comment 19 Danil Grigorev 2020-06-19 08:41:52 UTC
*** Bug 1824481 has been marked as a duplicate of this bug. ***

Comment 23 Danil Grigorev 2020-08-13 19:48:02 UTC
It seems the issue was not fixed by the addition of the secret watcher, and the refactoring in https://github.com/kubernetes/kubernetes/pull/90836 only grew in scope.

hekumar pointed out a possible case in https://bugzilla.redhat.com/show_bug.cgi?id=1863009#c10

Comment 24 Danil Grigorev 2020-08-13 19:48:49 UTC
The PR https://github.com/kubernetes/kubernetes/pull/93971 might fix the issue.

Comment 25 Danil Grigorev 2020-08-20 13:13:36 UTC
PR which fixes deadlock in volume provisioning: https://github.com/openshift/origin/pull/25427

Comment 26 Danil Grigorev 2020-09-07 09:26:40 UTC
All PRs are merged; moving this to MODIFIED.

Comment 28 Milind Yadav 2020-09-09 10:29:30 UTC
VERIFIED ON - 
[miyadav@miyadav vsphere]$ oc get clusterversion --config vsp
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-08-123737   True        False         78m     Cluster version is 4.6.0-0.nightly-2020-09-08-123737


Steps:
1. Scaled the CVO deployment to 0 replicas:
oc scale deployment cluster-version-operator --replicas 0 -n openshift-cluster-version --config vsp

2. Changed the password string to a random value in the cloud-credential secret in the openshift-machine-api namespace.

3. Created a PVC using the YAML below (a quick sketch of applying it follows the YAML):
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
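
(For reference, a rough sketch of applying the claim, assuming the YAML above is saved as pvc.yaml and the cluster's default vSphere StorageClass handles provisioning:)

# save the YAML above as pvc.yaml, then create the claim
oc create -f pvc.yaml
# check whether the claim binds and inspect provisioning events
oc get pvc pvc
oc describe pvc pvc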

Expected: PVC creation fails due to invalid credentials
Actual: PVC was created successfully

Additional Info:
Steps are based on https://bugzilla.redhat.com/show_bug.cgi?id=1824481, which was a duplicate of this bug.

Comment 29 Danil Grigorev 2020-09-09 10:40:21 UTC
@Milind Yadav The steps are incorrect. The secret located in the "openshift-machine-api" namespace has nothing to do with the vSphere volume provisioning plugin. If you change this secret, you'll only see that machine-api is unable to create and provision new machines.

You should change the secret referenced in the cloud provider config. You can view it with:

"oc get cm cloud-provider-config -n openshift-config -o yaml"

The content should point to the secret:

    secret-name = "vsphere-creds"
    secret-namespace = "kube-system"
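
(For reference, a quick way to confirm which secret the config points to — a rough sketch, assuming the INI content sits under the "config" key of that ConfigMap:)

oc get cm cloud-provider-config -n openshift-config -o jsonpath='{.data.config}' | grep secret-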

You can therefore test it by editing the secret as follows:

oc edit secret -n kube-system  vsphere-creds
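
(If a non-interactive edit is preferred, a rough sketch along these lines should also work. The data key names inside vsphere-creds are per-vCenter, so "vcenter.example.com.password" below is only a placeholder, not the real key name:)

# back up the secret so the original credentials can be restored later
oc get secret vsphere-creds -n kube-system -o yaml > vsphere-creds.bak.yaml

# overwrite the password key (placeholder name) with a bogus, base64-encoded value
BOGUS=$(echo -n 'not-a-real-password' | base64)
oc patch secret vsphere-creds -n kube-system --type merge \
  -p "{\"data\":{\"vcenter.example.com.password\":\"${BOGUS}\"}}"

# restore afterwards with: oc apply -f vsphere-creds.bak.yaml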

Comment 31 Milind Yadav 2020-09-09 12:57:59 UTC
Hey Danil, sorry about using the incorrect namespace to modify the creds for testing,
but even after making changes to the secret in the kube-system namespace, I am still able to create a PVC.
I guess the issue might be with the build, and the fix may not be present in it.

Can you confirm if that's the case?
I can confirm it tomorrow after taking the latest build. This nightly build was created 23 hours ago, so I thought it should already have the fix.

Comment 32 Milind Yadav 2020-09-10 03:03:58 UTC
Thanks, Danil, for helping out with the steps.

Validated on:
[miyadav@miyadav vsphere]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-09-184544   True        False         44m     Cluster version is 4.6.0-0.nightly-2020-09-09-184544


Steps:
1. Update the vsphere-creds secret in the kube-system namespace with invalid credentials (keep the original credentials handy).

2. To make sure the cache is not used and the machine-controller pods are restarted, add a blank line to the cloud provider config using: oc edit cm -n openshift-config cloud-provider-config (wait a few minutes for the new machine-controller pods to come up in openshift-machine-api).
3. Create a PVC using the YAML below:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
4. Use this PVC to create a pod in openshift-machine-api as below:

apiVersion: "v1"
kind: "Pod"
metadata:
  name: "mypod"
  labels:
    name: "frontendhttp"
spec:
  containers:
    -
      name: "myfrontend"
      image: "nginx"
      ports:
        -
          containerPort: 80
          name: "http-server"
      volumeMounts:
        -
          mountPath: "/var/www/html"
          name: "pvol"
  volumes:
    -
      name: "pvol"
      persistentVolumeClaim:
        claimName: "pvc"


5. The pod will hang in the ContainerCreating state; check the events (a short command sketch for these checks follows at the end of this comment):
.
.

21s         Warning   FailedAttachVolume       pod/mypod                                           AttachVolume.Attach failed for volume "pvc-a79b7e0d-31e2-4ac6-a7cf-f1647158af8d" : ServerFaultCode: Cannot complete login due to an incorrect user name or password.
38s         Warning   FailedMount              pod/mypod                                           Unable to attach or mount volumes: unmounted volumes=[pvol], unattached volumes=[pvol default-token-kzk46]: timed out waiting for the condition

6. Update the credentials back to the correct values and edit the blank line in the cloud-provider config again (refer to step 2).

7. Wait a few minutes; the pod will come up successfully, confirming that the PVC was able to provision the volume.
                   
Result as expected; oc get events:

.
.
10m         Normal    ProvisioningSucceeded    persistentvolumeclaim/pvc                           Successfully provisioned volume pvc-a79b7e0d-31e2-4ac6-a7cf-f1647158af8d using kubernetes.io/vsphere-volume
.
.
Additional Info:
Moved to VERIFIED
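
(A rough sketch of the checks from steps 5-7, assuming the PVC and pod were created in openshift-machine-api as in step 4:)

# while the credentials are invalid, the pod should stay stuck in ContainerCreating
oc get pod mypod -n openshift-machine-api

# look for the FailedAttachVolume / ProvisioningSucceeded events
oc get events -n openshift-machine-api --sort-by=.lastTimestamp | grep -Ei 'FailedAttachVolume|ProvisioningSucceeded'

# after restoring the credentials (step 6), the claim should bind and the pod should go Running
oc get pvc pvc -n openshift-machine-api
oc get pod mypod -n openshift-machine-api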

Comment 33 Hemant Kumar 2020-10-05 14:39:15 UTC
*** Bug 1884674 has been marked as a duplicate of this bug. ***

Comment 34 Venkata Siva Teja Areti 2020-10-13 17:09:25 UTC
Hi Danil,

Any plans to port these changes to 4.6? That would help get https://github.com/openshift/kubernetes/pull/397 closer to merging.

Comment 36 errata-xmlrpc 2020-10-27 15:57:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

