Bug 1771358

Summary: Unable to provision the cinder volume in OCP 4.2 on OSP 13
Product: OpenShift Container Platform
Reporter: Sudarshan Chaudhari <suchaudh>
Component: Installer
Assignee: Mike Fedosin <mfedosin>
Installer sub component: OpenShift on OpenStack
QA Contact: David Sanz <dsanzmor>
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
CC: andcosta, aos-bugs, chuffman, jialiu, lxia, mfedosin, mfojtik, openshift-bugs-escalate, peli, pprinett, rhowe, rsunog
Version: 4.2.z
Target Milestone: ---
Target Release: 4.2.z
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2020-01-07 17:55:10 UTC
Bug Depends On: 1772002

Description Sudarshan Chaudhari 2019-11-12 09:07:13 UTC
Description of problem:

Unable to provision a dynamic volume from Cinder.
No errors appear in the PVC events, but the kube-controller-manager logs show the errors listed under [1] below.

- This is an IPI installation.
- Verified the clouds.yaml file and everything looks correct.
- No events are captured on the OpenStack side, so we cannot verify whether the request reaches the OpenStack server.
- Attaching the clouds.yaml and must-gather information.

Version-Release number of selected component (if applicable):

Actual results:
The Cinder volume provisioning fails with error [1].

Expected results:
The Cinder volume should be provisioned successfully.

Master Log:

2019-11-10T13:19:13.642273749Z E1110 13:19:13.642213       1 openstack.go:295] Cannot initialize cloud provider using data from the secret.
2019-11-10T13:19:13.642273749Z E1110 13:19:13.642250       1 openstack_instances.go:69] cannot initialize cloud provider, only limited functionality is available : cloud provider is not initialized
2019-11-10T13:19:13.990799329Z E1110 13:19:13.990710       1 openstack.go:295] Cannot initialize cloud provider using data from the secret.
2019-11-10T13:19:13.990799329Z E1110 13:19:13.990755       1 core.go:77] Failed to start service controller: the cloud provider does not support external load balancers
2019-11-10T13:19:14.333330569Z E1110 13:19:14.333298       1 goroutinemap.go:150] Operation for "provision-openshift-logging/test[f746c14b-03b9-11ea-9868-fa163e7e4b6e]" failed. No retries permitted until 2019-11-10 13:19:14.833242519 +0000 UTC m=+54.363015165 (durationBeforeRetry 500ms). Error: "recovered from panic \"runtime error: invalid memory address or nil pointer dereference\". (err=<nil>) Call stack:\n/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76\n/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:158\n/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:522\n/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:82\n/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/signal_unix.go:390\n/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/gophercloud/gophercloud/openstack/client.go:300\n/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/gophercloud/gophercloud/openstack/client.go:344\n/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/cloudprovider/providers/openstack/openstack_client.go:72\n/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/cloudprovider/providers/openstack/openstack.go:919\n/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/cloudprovider/providers/openstack/openstack_volumes.go:459\n/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/volume/cinder/cinder_util.go:218\n/go/src/github.com/openshift/origin/_output/local/go/src/github.com/ope
nshift/origin/vendor/k8s.io/kubernetes/pkg/volume/cinder/cinder.go:561\n/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/volume/persistentvolume/pv_controller.go:1483\n/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/volume/persistentvolume/pv_controller.go:1346\n/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/goroutinemap/goroutinemap.go:115\n/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/asm_amd64.s:1337\n"

StorageClass Dump (if StorageClass used by PV/PVC):
cat ./cluster-scoped-resources/storage.k8s.io/storageclasses/gp2.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: 2019-11-10T12:41:34Z
  name: gp2
  resourceVersion: "468334"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/gp2
  uid: 6db49b6b-03b7-11ea-9868-fa163e7e4b6e
parameters:
  availability: nova
  type: standarad
provisioner: kubernetes.io/cinder
reclaimPolicy: Delete
volumeBindingMode: Immediate


Additional info:

Comment 3 Ryan Howe 2019-11-12 17:19:18 UTC
Taking a guess, it seems as though the cloud credentials are not set correctly:

   openstack.go:295] Cannot initialize cloud provider using data from the secret.

Cloud config is seen here: 
# oc get cm -n openshift-config cloud-provider-config -o yaml 
Copied here for the API to use: 
# oc get cm -n openshift-kube-apiserver cloud-config -o yaml

By default the openstack credentials are set via a secret: 

    secret-name = openstack-credentials
    secret-namespace = kube-system

# oc get secret -n  kube-system  openstack-credentials --template='{{index .data "clouds.yaml"}}' | base64 -d
# oc get secret -n  kube-system  openstack-credentials --template='{{index .data "clouds.conf"}}' | base64 -d

If those are not correct, you will likely need to update them with credentials that can authenticate and make API calls to OpenStack.

Comment 4 Mike Fedosin 2019-11-12 18:01:08 UTC
In the StorageClass definition I see a typo in the "type" parameter:
  availability: nova
  type: standarad

I think it should be "standarad" -> "standard".
That may not be the cause of this bug, but you probably want to fix it.
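For reference, a corrected StorageClass would look like the sketch below, based on the dump above with only the typo fixed (metadata fields that the API server sets automatically are omitted):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2
parameters:
  availability: nova
  type: standard        # was "standarad"
provisioner: kubernetes.io/cinder
reclaimPolicy: Delete
volumeBindingMode: Immediate
```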

As Ryan Howe said, this error happens when kube-controller-manager tries to authenticate to OpenStack using the auth data from the secret. It looks like it happens here: https://github.com/openshift/origin/blob/release-4.2/vendor/k8s.io/kubernetes/pkg/cloudprovider/providers/openstack/openstack.go#L420
Please compare the data in clouds.conf with clouds.yaml; there are probably some differences:
# oc get secret -n  kube-system  openstack-credentials --template='{{index .data "clouds.yaml"}}' | base64 -d
# oc get secret -n  kube-system  openstack-credentials --template='{{index .data "clouds.conf"}}' | base64 -d

One more thought: right now it's not clear what exactly causes the failure, because the original error is hidden behind the generic message at https://github.com/openshift/origin/blob/release-4.2/vendor/k8s.io/kubernetes/pkg/cloudprovider/providers/openstack/openstack.go#L296 
I'm going to create a bz for this and propose a PR to wrap the original error.

Comment 5 Mike Fedosin 2019-11-12 19:01:19 UTC
As promised, I've proposed a patch that gives a clearer view of the initial error: https://github.com/openshift/origin/pull/24136 and its 4.2 backport https://github.com/openshift/origin/pull/24137

Comment 11 David Sanz 2019-12-02 15:15:30 UTC
$ oc get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
myclaim   Bound    pvc-3a4d4c56-1516-11ea-996d-fa163eb648be   8Gi        RWO            standard       47s

$ oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM             STORAGECLASS   REASON   AGE
pvc-3a4d4c56-1516-11ea-996d-fa163eb648be   8Gi        RWO            Delete           Bound    default/myclaim   standard                46s
$ openstack volume list | grep -i morenod-ocp
| 29f2ad31-4e11-4c53-b251-c71f02c55031 | morenod-ocp-79sdx-dynamic-pvc-3a4d4c56-1516-11ea-996d-fa163eb648be     | in-use    |    8 | Attached to morenod-ocp-79sdx-worker-fx4cb on /dev/vdb        |

Comment 14 David Sanz 2019-12-19 16:12:34 UTC
Verified on 4.2.0-0.nightly-2019-12-15-230238

(overcloud) [stack@undercloud-0 4.2]$ oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                   STORAGECLASS   REASON   AGE
pvc-b3556846-2279-11ea-9e5c-fa163e5e17e3   3Gi        RWO            Delete           Bound    default/task-pv-claim   standard                55s
(overcloud) [stack@undercloud-0 4.2]$ oc get pvc
NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
task-pv-claim   Bound    pvc-b3556846-2279-11ea-9e5c-fa163e5e17e3   3Gi        RWO            standard       4m13s

Comment 16 errata-xmlrpc 2020-01-07 17:55:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.