Bug 1985847

Summary: Storage CO is not available after enabling the vSphere CSI Driver featuregate due to "Failed to create \"cnsvspherevolumemigrations.cns.vmware.com\" CRD"
Product: OpenShift Container Platform Reporter: Wei Duan <wduan>
Component: Storage Assignee: Fabio Bertinatto <fbertina>
Storage sub component: Operators QA Contact: Wei Duan <wduan>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: unspecified CC: aos-bugs, hekumar, jsafrane
Version: 4.9   
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-11-22 21:47:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Wei Duan 2021-07-26 04:01:10 UTC
Description of problem:
After enabling the vsphere-csi-driver featuregate, the Storage CO becomes unavailable and the vmware-vsphere-csi-driver-controller pod goes into "CrashLoopBackOff". The log shows that the csi-driver container is not ready and reports errors such as:
"Failed to create \"cnsvspherevolumemigrations.cns.vmware.com\" CRD"
"failed to get migration service. Err: the server could not find the requested resource"

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-07-25-125326

How reproducible:
1/1

Steps to Reproduce:
1. Install OCP cluster on vSphere
2. Enable the featuregate to install the vmware-vsphere-csi-driver
$ oc patch featuregate cluster -p '{"spec": {"featureSet": "TechPreviewNoUpgrade"}}' --type merge
3. The Storage CO becomes unavailable and the vmware-vsphere-csi-driver-controller pod is in "CrashLoopBackOff"
$ oc -n openshift-cluster-csi-drivers get pod vmware-vsphere-csi-driver-controller-b89488c7c-5h5hl 
NAME                                                   READY   STATUS             RESTARTS   AGE
vmware-vsphere-csi-driver-controller-b89488c7c-5h5hl   8/9     CrashLoopBackOff   31         91m


4. Check the log of the csi-driver container:
$ oc -n openshift-cluster-csi-drivers logs vmware-vsphere-csi-driver-controller-b89488c7c-5h5hl -c csi-driver
...
{"level":"error","time":"2021-07-26T03:34:29.340800533Z","caller":"kubernetes/kubernetes.go:368","msg":"Failed to create \"cnsvspherevolumemigrations.cns.vmware.com\" CRD with err: the server could not find the requested resource","TraceId":"c3fb7043-1a56-4385-8262-fa158ceb3efb","stacktrace":"sigs.k8s.io/vsphere-csi-driver/pkg/kubernetes.createCustomResourceDefinition\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/kubernetes/kubernetes.go:368\nsigs.k8s.io/vsphere-csi-driver/pkg/kubernetes.CreateCustomResourceDefinitionFromSpec\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/kubernetes/kubernetes.go:332\nsigs.k8s.io/vsphere-csi-driver/pkg/apis/migration.GetVolumeMigrationService\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/apis/migration/migration.go:117\nsigs.k8s.io/vsphere-csi-driver/pkg/csi/service/vanilla.(*controller).Init\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/vanilla/controller.go:205\nsigs.k8s.io/vsphere-csi-driver/pkg/csi/service.(*service).BeforeServe\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/service.go:130\ngithub.com/rexray/gocsi.(*StoragePlugin).Serve.func1\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/gocsi.go:246\nsync.(*Once).doSlow\n\t/usr/lib/golang/src/sync/once.go:68\nsync.(*Once).Do\n\t/usr/lib/golang/src/sync/once.go:59\ngithub.com/rexray/gocsi.(*StoragePlugin).Serve\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/gocsi.go:211\ngithub.com/rexray/gocsi.Run\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/gocsi.go:130\nmain.main\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/cmd/vsphere-csi/main.go:64\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:225"}
{"level":"error","time":"2021-07-26T03:34:29.340943411Z","caller":"migration/migration.go:120","msg":"failed to create volume migration CRD. Error: the server could not find the requested resource","TraceId":"c3fb7043-1a56-4385-8262-fa158ceb3efb","stacktrace":"sigs.k8s.io/vsphere-csi-driver/pkg/apis/migration.GetVolumeMigrationService\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/apis/migration/migration.go:120\nsigs.k8s.io/vsphere-csi-driver/pkg/csi/service/vanilla.(*controller).Init\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/vanilla/controller.go:205\nsigs.k8s.io/vsphere-csi-driver/pkg/csi/service.(*service).BeforeServe\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/service.go:130\ngithub.com/rexray/gocsi.(*StoragePlugin).Serve.func1\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/gocsi.go:246\nsync.(*Once).doSlow\n\t/usr/lib/golang/src/sync/once.go:68\nsync.(*Once).Do\n\t/usr/lib/golang/src/sync/once.go:59\ngithub.com/rexray/gocsi.(*StoragePlugin).Serve\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/gocsi.go:211\ngithub.com/rexray/gocsi.Run\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/gocsi.go:130\nmain.main\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/cmd/vsphere-csi/main.go:64\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:225"}
{"level":"error","time":"2021-07-26T03:34:29.340971587Z","caller":"vanilla/controller.go:207","msg":"failed to get migration service. Err: the server could not find the requested resource","TraceId":"c3fb7043-1a56-4385-8262-fa158ceb3efb","stacktrace":"sigs.k8s.io/vsphere-csi-driver/pkg/csi/service/vanilla.(*controller).Init\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/vanilla/controller.go:207\nsigs.k8s.io/vsphere-csi-driver/pkg/csi/service.(*service).BeforeServe\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/service.go:130\ngithub.com/rexray/gocsi.(*StoragePlugin).Serve.func1\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/gocsi.go:246\nsync.(*Once).doSlow\n\t/usr/lib/golang/src/sync/once.go:68\nsync.(*Once).Do\n\t/usr/lib/golang/src/sync/once.go:59\ngithub.com/rexray/gocsi.(*StoragePlugin).Serve\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/gocsi.go:211\ngithub.com/rexray/gocsi.Run\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/gocsi.go:130\nmain.main\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/cmd/vsphere-csi/main.go:64\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:225"}
{"level":"error","time":"2021-07-26T03:34:29.340996463Z","caller":"service/service.go:131","msg":"failed to init controller. Error: the server could not find the requested resource","TraceId":"14cde6db-8996-47b1-a72f-2ae1d85a3a05","stacktrace":"sigs.k8s.io/vsphere-csi-driver/pkg/csi/service.(*service).BeforeServe\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/service.go:131\ngithub.com/rexray/gocsi.(*StoragePlugin).Serve.func1\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/gocsi.go:246\nsync.(*Once).doSlow\n\t/usr/lib/golang/src/sync/once.go:68\nsync.(*Once).Do\n\t/usr/lib/golang/src/sync/once.go:59\ngithub.com/rexray/gocsi.(*StoragePlugin).Serve\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/gocsi.go:211\ngithub.com/rexray/gocsi.Run\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/gocsi.go:130\nmain.main\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/cmd/vsphere-csi/main.go:64\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:225"}
{"level":"info","time":"2021-07-26T03:34:29.34103361Z","caller":"service/service.go:106","msg":"configured: \"csi.vsphere.vmware.com\" with clusterFlavor: \"VANILLA\" and mode: \"controller\"","TraceId":"14cde6db-8996-47b1-a72f-2ae1d85a3a05"}
time="2021-07-26T03:34:29Z" level=info msg="removed sock file" path=/var/lib/csi/sockets/pluginproxy/csi.sock
time="2021-07-26T03:34:29Z" level=fatal msg="grpc failed" error="the server could not find the requested resource"
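
The JSON log lines above carry long stack traces that hide the actual error chain. As a rough aid for reading them, the following is a minimal sketch that keeps only the "level", "caller" and "msg" fields of each JSON line and skips the non-JSON lines (such as the trailing time=... level=... lines). The function names summarize_log_lines and errors_only are hypothetical helpers written for this illustration; they are not part of the driver or of oc.

```python
import json


def summarize_log_lines(lines):
    """Return (level, caller, msg) tuples for JSON-formatted log lines.

    Hypothetical helper for illustration only. Lines that are not JSON
    objects (for example the 'time=... level=...' lines at the end of
    the log) are skipped rather than treated as errors.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # not a JSON log record
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or otherwise malformed record
        entries.append(
            (record.get("level"), record.get("caller"), record.get("msg"))
        )
    return entries


def errors_only(entries):
    """Keep only the entries whose level is 'error'."""
    return [entry for entry in entries if entry[0] == "error"]
```

One possible use is to save the output of the "oc logs" command from step 4 to a file, pass the open file object to summarize_log_lines, and print the result of errors_only. Assuming the field names match those shown in the log above, this should reduce the excerpt to the four error records, from kubernetes/kubernetes.go:368 through service/service.go:131.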

Actual results:
The vmware-vsphere-csi-driver-controller pod is in "CrashLoopBackOff"

Expected results:
The vmware-vsphere-csi-driver-controller pod should be "Ready"

Comment 2 Fabio Bertinatto 2021-11-09 17:52:37 UTC
This has been fixed already.

Comment 4 Wei Duan 2021-11-10 03:33:08 UTC
The bug was already fixed when 4.9 was released.
I also double-checked with the latest 4.9 nightly (4.9.0-0.nightly-2021-11-09-154007), and it passed.
Marked it as "Verified".

Comment 7 errata-xmlrpc 2021-11-22 21:47:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.8 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4712