Bug 2101139 - [Tracker for OCP BZ #2102782] topolvm-controller gets into CrashLoopBackOff a few minutes after install
Summary: [Tracker for OCP BZ #2102782] topolvm-controller get into CrashLoopBackOff fe...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: lvm-operator
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.11.0
Assignee: N Balachandran
QA Contact: Shay Rozen
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-06-25 23:21 UTC by Shay Rozen
Modified: 2023-08-09 16:46 UTC (History)
8 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-24 13:55:09 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2022:6156 0 None None None 2022-08-24 13:55:50 UTC

Description Shay Rozen 2022-06-25 23:21:59 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
When installing an LVMCluster, the following occurs:
oc get pods
NAME                                               READY   STATUS             RESTARTS      AGE
lvm-operator-controller-manager-666d94dfd6-l2w26   3/3     Running            0             12m
topolvm-controller-7f54b85768-4czjk                4/5     CrashLoopBackOff   2 (19s ago)   4m36s
topolvm-node-bjq9n                                 0/4     Init:0/1           0             4m36s
vg-manager-l4q2v                                   1/1     Running            0             4m36s




Version of all relevant components (if applicable):
odf-operator quay.io/rhceph-dev/ocs-registry:4.11.0-103

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Can't install an LVMCluster.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes, 2/2

Can this issue be reproduced from the UI?
NR

If this is a regression, please provide more details to justify this:
The cluster could previously be installed; now it cannot.

Steps to Reproduce:
1. install SNO
2. Create the openshift-storage namespace. Create an OperatorGroup and a Subscription for LVM.
3. Install LVM cluster


Actual results:
(381venv) [srozen@srozen-lap 4.11]$ oc get pods
NAME                                               READY   STATUS             RESTARTS      AGE
lvm-operator-controller-manager-666d94dfd6-l2w26   3/3     Running            0             12m
topolvm-controller-7f54b85768-4czjk                4/5     CrashLoopBackOff   2 (19s ago)   4m36s
topolvm-node-bjq9n                                 0/4     Init:0/1           0             4m36s
vg-manager-l4q2v                                   1/1     Running            0             4m36s


Expected results:
All pods should be in the Running state.

Additional info:
(381venv) [srozen@srozen-lap 4.11]$ oc logs topolvm-node-bjq9n --all-containers
waiting for lvmd config file
waiting for lvmd config file
waiting for lvmd config file
[... message repeated ~120 times ...]
Error from server (BadRequest): container "lvmd" in pod "topolvm-node-bjq9n" is waiting to start: PodInitializing

Comment 3 Shay Rozen 2022-06-25 23:39:31 UTC
(381venv) [srozen@srozen-lap 4.11]$ oc logs topolvm-controller-5b6d5b9d65-kbzpj --all-containers
I0625 23:32:02.480078       1 main.go:93] Version : v4.11.0-202206171549.p0.g2cea576.assembly.stream-0-gcc1e0d5-dirty
I0625 23:32:02.481980       1 common.go:111] Probing CSI driver for readiness
I0625 23:32:02.580180       1 controller.go:120] Register Pod informer for resizer topolvm.cybozu.com
I0625 23:32:02.580242       1 controller.go:255] Starting external resizer topolvm.cybozu.com
I0625 23:32:12.214180       1 main.go:149] calling CSI driver to discover driver name
I0625 23:32:12.215269       1 main.go:155] CSI driver name: "topolvm.cybozu.com"
I0625 23:32:12.215286       1 main.go:183] ServeMux listening at "0.0.0.0:9808"
I0625 23:32:21.979691       1 main.go:104] Version: v4.11.0-202206152147.p0.g0afdf73.assembly.stream-0-g739e722-dirty
I0625 23:32:21.982936       1 common.go:111] Probing CSI driver for readiness
I0625 23:32:21.984088       1 snapshot_controller_base.go:133] Starting CSI snapshotter
Generating a RSA private key
..++++
...........................................................................................................................++++
writing new private key to '/certs/tls.key'
-----
I0625 23:31:42.891642       1 request.go:665] Waited for 1.003123771s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/samples.operator.openshift.io/v1?timeout=32s
{"level":"info","ts":1656199904.3999352,"logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1656199904.400483,"logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/pod/mutate"}
{"level":"info","ts":1656199904.4006224,"logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/pvc/mutate"}
{"level":"info","ts":1656199904.401577,"logger":"setup","msg":"starting manager"}
{"level":"info","ts":1656199904.401758,"logger":"controller-runtime.webhook.webhooks","msg":"Starting webhook server"}
{"level":"info","ts":1656199904.4019148,"msg":"Starting server","path":"/metrics","kind":"metrics","addr":"[::]:8080"}
{"level":"info","ts":1656199904.40196,"msg":"Starting server","kind":"health probe","addr":"[::]:8081"}
{"level":"info","ts":1656199904.402467,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":1656199904.4027317,"logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"}
{"level":"info","ts":1656199904.4027257,"logger":"controller-runtime.webhook","msg":"Serving webhook server","host":"","port":9443}
I0625 23:31:44.503133       1 leaderelection.go:248] attempting to acquire leader lease openshift-storage/topolvm...
I0625 23:31:44.523636       1 leaderelection.go:258] successfully acquired lease openshift-storage/topolvm
{"level":"info","ts":1656199904.524109,"logger":"controller.persistentvolumeclaim","msg":"Starting EventSource","reconciler group":"","reconciler kind":"PersistentVolumeClaim","source":"kind source: *v1.PersistentVolumeClaim"}
{"level":"info","ts":1656199904.5241795,"logger":"controller.persistentvolumeclaim","msg":"Starting Controller","reconciler group":"","reconciler kind":"PersistentVolumeClaim"}
{"level":"info","ts":1656199904.5241122,"logger":"controller.node","msg":"Starting EventSource","reconciler group":"","reconciler kind":"Node","source":"kind source: *v1.Node"}
{"level":"info","ts":1656199904.524206,"logger":"controller.node","msg":"Starting Controller","reconciler group":"","reconciler kind":"Node"}
{"level":"info","ts":1656199904.6255922,"logger":"controller.node","msg":"Starting workers","reconciler group":"","reconciler kind":"Node","worker count":1}
{"level":"info","ts":1656199904.6256433,"logger":"controller.persistentvolumeclaim","msg":"Starting workers","reconciler group":"","reconciler kind":"PersistentVolumeClaim","worker count":1}
{"level":"info","ts":1656199911.182282,"logger":"driver.identity","msg":"Probe","req":""}
{"level":"info","ts":1656199911.1829681,"logger":"driver.identity","msg":"GetPluginInfo","req":""}
{"level":"info","ts":1656199911.1833794,"logger":"driver.identity","msg":"GetPluginCapabilities","req":""}
{"level":"info","ts":1656199922.4825711,"logger":"driver.identity","msg":"Probe","req":""}
{"level":"info","ts":1656199922.4831116,"logger":"driver.identity","msg":"GetPluginInfo","req":""}
{"level":"info","ts":1656199922.4834502,"logger":"driver.identity","msg":"GetPluginCapabilities","req":""}
{"level":"info","ts":1656199932.2147274,"logger":"driver.identity","msg":"GetPluginInfo","req":""}
{"level":"info","ts":1656199941.9824266,"logger":"driver.identity","msg":"GetPluginInfo","req":""}
{"level":"info","ts":1656199941.9832141,"logger":"driver.identity","msg":"Probe","req":""}
{"level":"info","ts":1656199957.6922042,"logger":"driver.identity","msg":"Probe","req":""}
{"level":"info","ts":1656199989.77995,"logger":"driver.identity","msg":"Probe","req":""}
{"level":"info","ts":1656199989.7804866,"logger":"driver.identity","msg":"GetPluginInfo","req":""}
{"level":"info","ts":1656199989.7809036,"logger":"driver.identity","msg":"GetPluginCapabilities","req":""}
{"level":"info","ts":1656200017.691909,"logger":"driver.identity","msg":"Probe","req":""}
{"level":"info","ts":1656200077.6913662,"logger":"driver.identity","msg":"Probe","req":""}
{"level":"info","ts":1656200080.9826586,"logger":"driver.identity","msg":"Probe","req":""}
{"level":"info","ts":1656200080.983292,"logger":"driver.identity","msg":"GetPluginInfo","req":""}
{"level":"info","ts":1656200080.9836836,"logger":"driver.identity","msg":"GetPluginCapabilities","req":""}
{"level":"info","ts":1656200137.6923773,"logger":"driver.identity","msg":"Probe","req":""}
{"level":"info","ts":1656200184.881307,"logger":"driver.identity","msg":"Probe","req":""}
{"level":"info","ts":1656200184.8818867,"logger":"driver.identity","msg":"GetPluginInfo","req":""}
{"level":"info","ts":1656200184.8822117,"logger":"driver.identity","msg":"GetPluginCapabilities","req":""}
{"level":"info","ts":1656200197.6981711,"logger":"driver.identity","msg":"Probe","req":""}
{"level":"info","ts":1656200257.6925313,"logger":"driver.identity","msg":"Probe","req":""}
W0625 23:36:24.787233       1 feature_gate.go:237] Setting GA feature gate Topology=true. It will be removed in a future release.
I0625 23:36:24.787366       1 csi-provisioner.go:150] Version: v4.11.0-202206171549.p0.g0c7c708.assembly.stream-0-gafdafed-dirty
I0625 23:36:24.787372       1 csi-provisioner.go:173] Building kube configs for running in cluster...
I0625 23:36:24.880623       1 common.go:111] Probing CSI driver for readiness
I0625 23:36:24.882889       1 csi-provisioner.go:289] CSI driver does not support PUBLISH_UNPUBLISH_VOLUME, not watching VolumeAttachments
I0625 23:36:26.884097       1 request.go:601] Waited for 1.902234476s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/packages.operators.coreos.com/v1?timeout=32s
I0625 23:36:36.884209       1 request.go:601] Waited for 11.902273878s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/performance.openshift.io/v2?timeout=32s
I0625 23:36:47.883969       1 request.go:601] Waited for 22.901659377s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/apiextensions.k8s.io/v1?timeout=32s
I0625 23:36:57.884419       1 request.go:601] Waited for 32.902035222s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/topolvm.cybozu.com/v1?timeout=32s
I0625 23:37:08.883819       1 request.go:601] Waited for 43.901384999s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/ingress.operator.openshift.io/v1?timeout=32s
I0625 23:37:19.884173       1 request.go:601] Waited for 54.901708959s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/batch/v1?timeout=32s
I0625 23:37:30.884112       1 request.go:601] Waited for 1m5.901616892s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/console.openshift.io/v1alpha1?timeout=32s
I0625 23:37:41.884074       1 request.go:601] Waited for 1m16.897008129s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/monitoring.coreos.com/v1alpha1?timeout=32s
I0625 23:37:41.897447       1 csi-provisioner.go:442] using apps/v1/Deployment topolvm-controller as owner of CSIStorageCapacity objects
I0625 23:37:41.979537       1 nodes.go:205] Started node topology worker
F0625 23:37:41.981066       1 csi-provisioner.go:509] unexpected error when checking for the V1 CSIStorageCapacity API: csistoragecapacities.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-storage:topolvm-controller" cannot create resource "csistoragecapacities" in API group "storage.k8s.io" in the namespace "default"
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0x1)
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/klog/v2/klog.go:860 +0x8a
k8s.io/klog/v2.(*loggingT).output(0x29b3580, 0x3, 0x0, 0xc0003fcbd0, 0x1, {0x2120a0a?, 0x1?}, 0xc00058e400?, 0x0)
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/klog/v2/klog.go:825 +0x686
k8s.io/klog/v2.(*loggingT).printfDepth(0x29b3580, 0x0?, 0x0, {0x0, 0x0}, 0x0?, {0x1ac4d5d, 0x44}, {0xc0005bd680, 0x1, ...})
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/klog/v2/klog.go:630 +0x1f2
k8s.io/klog/v2.(*loggingT).printf(...)
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/klog/v2/klog.go:612
k8s.io/klog/v2.Fatalf(...)
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/klog/v2/klog.go:1516
main.main()
	/go/src/github.com/kubernetes-csi/external-provisioner/cmd/csi-provisioner/csi-provisioner.go:509 +0x38c5

goroutine 43 [chan receive, 1 minutes]:
k8s.io/client-go/util/workqueue.(*Type).updateUnfinishedWorkLoop(0xc000317080)
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/client-go/util/workqueue/queue.go:271 +0xa7
created by k8s.io/client-go/util/workqueue.newQueue
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/client-go/util/workqueue/queue.go:63 +0x1aa

goroutine 13 [select, 1 minutes]:
google.golang.org/grpc.(*ccBalancerWrapper).watcher(0xc0000d6fa0)
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/google.golang.org/grpc/balancer_conn_wrappers.go:77 +0xa5
created by google.golang.org/grpc.newCCBalancerWrapper
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/google.golang.org/grpc/balancer_conn_wrappers.go:67 +0x245

goroutine 44 [select, 1 minutes]:
k8s.io/client-go/util/workqueue.(*delayingType).waitingLoop(0xc000317200)
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/client-go/util/workqueue/delaying_queue.go:233 +0x305
created by k8s.io/client-go/util/workqueue.newDelayingQueue
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/client-go/util/workqueue/delaying_queue.go:70 +0x24f

goroutine 15 [IO wait, 1 minutes]:
internal/poll.runtime_pollWait(0x7ff6b3840918, 0x72)
	/usr/lib/golang/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc00005c480?, 0xc00047c000?, 0x0)
	/usr/lib/golang/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/usr/lib/golang/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).Read(0xc00005c480, {0xc00047c000, 0x8000, 0x8000})
	/usr/lib/golang/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc00005c480, {0xc00047c000?, 0x197a320?, 0x1?})
	/usr/lib/golang/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc0004e6140, {0xc00047c000?, 0x44f300?, 0x800010601?})
	/usr/lib/golang/src/net/net.go:183 +0x45
bufio.(*Reader).Read(0xc00011b5c0, {0xc0000f4820, 0x9, 0x18?})
	/usr/lib/golang/src/bufio/bufio.go:236 +0x1b4
io.ReadAtLeast({0x1cc97c0, 0xc00011b5c0}, {0xc0000f4820, 0x9, 0x9}, 0x9)
	/usr/lib/golang/src/io/io.go:331 +0x9a
io.ReadFull(...)
	/usr/lib/golang/src/io/io.go:350
golang.org/x/net/http2.readFrameHeader({0xc0000f4820?, 0x9?, 0x11eca345?}, {0x1cc97c0?, 0xc00011b5c0?})
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/golang.org/x/net/http2/frame.go:237 +0x6e
golang.org/x/net/http2.(*Framer).ReadFrame(0xc0000f47e0)
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/golang.org/x/net/http2/frame.go:498 +0x95
google.golang.org/grpc/internal/transport.(*http2Client).reader(0xc0000021e0)
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/google.golang.org/grpc/internal/transport/http2_client.go:1499 +0x414
created by google.golang.org/grpc/internal/transport.newHTTP2Client
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/google.golang.org/grpc/internal/transport/http2_client.go:364 +0x191f

goroutine 16 [select, 1 minutes]:
google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc0000d72c0, 0x1)
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/google.golang.org/grpc/internal/transport/controlbuf.go:407 +0x115
google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc00011b800)
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/google.golang.org/grpc/internal/transport/controlbuf.go:534 +0x85
google.golang.org/grpc/internal/transport.newHTTP2Client.func3()
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/google.golang.org/grpc/internal/transport/http2_client.go:414 +0x65
created by google.golang.org/grpc/internal/transport.newHTTP2Client
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/google.golang.org/grpc/internal/transport/http2_client.go:412 +0x1f71

goroutine 45 [chan receive, 1 minutes]:
k8s.io/apimachinery/pkg/watch.(*Broadcaster).loop(0xc000513a80)
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/apimachinery/pkg/watch/mux.go:247 +0x57
created by k8s.io/apimachinery/pkg/watch.NewLongQueueBroadcaster
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/apimachinery/pkg/watch/mux.go:89 +0x116

goroutine 46 [chan receive, 1 minutes]:
k8s.io/client-go/tools/record.(*eventBroadcasterImpl).StartEventWatcher.func1()
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/client-go/tools/record/event.go:304 +0x73
created by k8s.io/client-go/tools/record.(*eventBroadcasterImpl).StartEventWatcher
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/client-go/tools/record/event.go:302 +0x8c

goroutine 47 [chan receive, 1 minutes]:
k8s.io/client-go/tools/record.(*eventBroadcasterImpl).StartEventWatcher.func1()
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/client-go/tools/record/event.go:304 +0x73
created by k8s.io/client-go/tools/record.(*eventBroadcasterImpl).StartEventWatcher
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/client-go/tools/record/event.go:302 +0x8c

goroutine 84 [IO wait]:
internal/poll.runtime_pollWait(0x7ff6b3840828, 0x72)
	/usr/lib/golang/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc000190700?, 0xc000ac0000?, 0x0)
	/usr/lib/golang/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/usr/lib/golang/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).Read(0xc000190700, {0xc000ac0000, 0x7f78, 0x7f78})
	/usr/lib/golang/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc000190700, {0xc000ac0000?, 0xc0003c3f40?, 0xc000ac0005?})
	/usr/lib/golang/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc00031e0a8, {0xc000ac0000?, 0x7ff6b3d2ffff?, 0x7ff6b3b88500?})
	/usr/lib/golang/src/net/net.go:183 +0x45
crypto/tls.(*atLeastReader).Read(0xc0005c4660, {0xc000ac0000?, 0x0?, 0x0?})
	/usr/lib/golang/src/crypto/tls/conn.go:784 +0x3d
bytes.(*Buffer).ReadFrom(0xc000051af8, {0x1cc9960, 0xc0005c4660})
	/usr/lib/golang/src/bytes/buffer.go:204 +0x98
crypto/tls.(*Conn).readFromUntil(0xc000051880, {0x1ccd540?, 0xc00031e0a8}, 0x7f78?)
	/usr/lib/golang/src/crypto/tls/conn.go:806 +0xe5
crypto/tls.(*Conn).readRecordOrCCS(0xc000051880, 0x0)
	/usr/lib/golang/src/crypto/tls/conn.go:613 +0x116
crypto/tls.(*Conn).readRecord(...)
	/usr/lib/golang/src/crypto/tls/conn.go:581
crypto/tls.(*Conn).Read(0xc000051880, {0xc000566000, 0x1000, 0x4b3000801?})
	/usr/lib/golang/src/crypto/tls/conn.go:1284 +0x16f
bufio.(*Reader).Read(0xc000374900, {0xc0006aaac0, 0x9, 0x843302?})
	/usr/lib/golang/src/bufio/bufio.go:236 +0x1b4
io.ReadAtLeast({0x1cc97c0, 0xc000374900}, {0xc0006aaac0, 0x9, 0x9}, 0x9)
	/usr/lib/golang/src/io/io.go:331 +0x9a
io.ReadFull(...)
	/usr/lib/golang/src/io/io.go:350
golang.org/x/net/http2.readFrameHeader({0xc0006aaac0?, 0x9?, 0x0?}, {0x1cc97c0?, 0xc000374900?})
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/golang.org/x/net/http2/frame.go:237 +0x6e
golang.org/x/net/http2.(*Framer).ReadFrame(0xc0006aaa80)
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/golang.org/x/net/http2/frame.go:498 +0x95
golang.org/x/net/http2.(*clientConnReadLoop).run(0xc00047af98)
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/golang.org/x/net/http2/transport.go:2124 +0x130
golang.org/x/net/http2.(*ClientConn).readLoop(0xc000531080)
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/golang.org/x/net/http2/transport.go:2020 +0x6f
created by golang.org/x/net/http2.(*Transport).newClientConn
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/golang.org/x/net/http2/transport.go:727 +0xa65

goroutine 316 [sync.Cond.Wait]:
sync.runtime_notifyListWait(0xc000664ad0, 0x0)
	/usr/lib/golang/src/runtime/sema.go:513 +0x13d
sync.(*Cond).Wait(0xc0001c0500?)
	/usr/lib/golang/src/sync/cond.go:56 +0x8c
k8s.io/client-go/util/workqueue.(*Type).Get(0xc00082cf60)
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/client-go/util/workqueue/queue.go:157 +0x9e
github.com/kubernetes-csi/external-provisioner/pkg/capacity/topology.(*nodeTopology).processNextWorkItem(0xc00067fa70, {0x1ce4dd0, 0xc00012e000})
	/go/src/github.com/kubernetes-csi/external-provisioner/pkg/capacity/topology/nodes.go:224 +0x4e
github.com/kubernetes-csi/external-provisioner/pkg/capacity/topology.(*nodeTopology).RunWorker(0x0?, {0x1ce4dd0, 0xc00012e000})
	/go/src/github.com/kubernetes-csi/external-provisioner/pkg/capacity/topology/nodes.go:208 +0xf0
created by main.main
	/go/src/github.com/kubernetes-csi/external-provisioner/cmd/csi-provisioner/csi-provisioner.go:464 +0x3405

goroutine 314 [chan receive]:
k8s.io/client-go/util/workqueue.(*Type).updateUnfinishedWorkLoop(0xc00082cf60)
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/client-go/util/workqueue/queue.go:271 +0xa7
created by k8s.io/client-go/util/workqueue.newQueue
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/client-go/util/workqueue/queue.go:63 +0x1aa

goroutine 315 [select]:
k8s.io/client-go/util/workqueue.(*delayingType).waitingLoop(0xc00082d0e0)
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/client-go/util/workqueue/delaying_queue.go:233 +0x305
created by k8s.io/client-go/util/workqueue.newDelayingQueue
	/go/src/github.com/kubernetes-csi/external-provisioner/vendor/k8s.io/client-go/util/workqueue/delaying_queue.go:70 +0x24f

Comment 4 Shay Rozen 2022-06-25 23:40:12 UTC
quay.io/rhceph-dev/ocs-registry:4.11.0-101 doesn't install either. The logs above are from build 101.

Comment 6 Shay Rozen 2022-06-26 08:39:37 UTC
I think this is the main reason:
F0625 23:37:41.981066       1 csi-provisioner.go:509] unexpected error when checking for the V1 CSIStorageCapacity API: csistoragecapacities.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-storage:topolvm-controller" cannot create resource "csistoragecapacities" in API group "storage.k8s.io" in the namespace "default"

Comment 7 Shay Rozen 2022-06-26 10:21:45 UTC
Last working version quay.io/rhceph-dev/ocs-registry:4.11.0-98

Comment 8 Humble Chirammal 2022-06-27 08:08:33 UTC
Considering this is an issue (details: https://github.com/kubernetes-csi/external-provisioner/pull/710#issuecomment-1166905128 ) with the capacity controller shipped in the OCP external-provisioner sidecar,
we need to clone this bug to OCP Storage to track it there.

Comment 9 N Balachandran 2022-06-27 08:44:14 UTC
Root Cause Analysis:
--------------------

F0627 05:15:02.104473       1 csi-provisioner.go:509] unexpected error when checking for the V1 CSIStorageCapacity API: csistoragecapacities.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-storage:topolvm-controller" cannot create resource "csistoragecapacities" in API group "storage.k8s.io" in the namespace "default"


The CSI external-provisioner is trying to create a CSIStorageCapacity object in the "default" namespace. This operation fails because the topolvm-controller pod does not have permissions in that namespace, causing the container to exit with a fatal error.

The topolvm-controller has permissions only in the install namespace, openshift-storage.


[nibalach@localhost lvm-operator]$ kubectl auth can-i create csistoragecapacities --namespace openshift-storage --as system:serviceaccount:openshift-storage:topolvm-controller
yes
[nibalach@localhost lvm-operator]$ kubectl auth can-i create csistoragecapacities --namespace default --as system:serviceaccount:openshift-storage:topolvm-controller
no


This is a bug introduced in the recently released external-provisioner v3.2.0. The OCP 4.11 CSI sidecar images were updated last week, after which the LVMO builds started to fail.


Upstream Issue: https://github.com/kubernetes-csi/external-provisioner/issues/752

Upstream Fix: https://github.com/kubernetes-csi/external-provisioner/pull/753


This will need to be backported downstream once merged.

Comment 10 N Balachandran 2022-06-27 09:53:07 UTC
Shay,

QE can use a workaround to continue testing the LVMO builds while we wait for the downstream fix.

What I propose is that we temporarily grant the topolvm-controller permissions in the default namespace to prevent the crash. I have attached a yaml which needs to be applied to the cluster after installing the LVMO and before creating the LVMCluster.
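The attached yaml is not reproduced in this dump. Based on the root cause analysis above, a minimal sketch of such a temporary grant might look like the following; the object names here are illustrative and the exact rules may differ from the actual attachment:

```yaml
# Hypothetical workaround sketch: grant the topolvm-controller service
# account permission to manage CSIStorageCapacity objects in the
# "default" namespace, where the external-provisioner tries to create them.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: topolvm-csistoragecapacity-workaround   # illustrative name
  namespace: default
rules:
- apiGroups: ["storage.k8s.io"]
  resources: ["csistoragecapacities"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: topolvm-csistoragecapacity-workaround   # illustrative name
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: topolvm-csistoragecapacity-workaround
subjects:
- kind: ServiceAccount
  name: topolvm-controller
  namespace: openshift-storage
```

After applying a manifest like this, `kubectl auth can-i create csistoragecapacities --namespace default --as system:serviceaccount:openshift-storage:topolvm-controller` should report "yes". This grant should be removed once the fixed sidecar lands downstream.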

Comment 14 N Balachandran 2022-06-27 11:12:34 UTC
I have cloned this BZ to OCP Storage.
https://bugzilla.redhat.com/show_bug.cgi?id=2101343

Both the upstream master PR and its backport have been merged.

https://github.com/kubernetes-csi/external-provisioner/pull/753
https://github.com/kubernetes-csi/external-provisioner/pull/754

Comment 15 Mudit Agarwal 2022-07-05 06:44:53 UTC
The OCP fix is already merged and verified; moving this to ON_QA.

Comment 16 N Balachandran 2022-07-05 06:58:12 UTC
The latest stable build, 4.11.0-109, does not contain this fix. Please wait for a newer stable build to test this.

Comment 17 Shay Rozen 2022-07-05 07:02:43 UTC
Thanks for the update, Nithya. Changing the status back to POST until it can be verified.

Comment 21 errata-xmlrpc 2022-08-24 13:55:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6156

