Description of problem:
The cluster-storage-operator pod panics when upgrading OCP from 4.1.0-0.nightly-2020-07-29-210856 to 4.6.0-0.nightly-2020-08-04-193041 through each intermediate minor release.

Version-Release number of selected component (if applicable):
Upgrade from original build: 4.1.0-0.nightly-2020-07-29-210856
to target builds: 4.2.0-0.nightly-2020-08-04-161322, 4.3.0-0.nightly-2020-08-04-163159, 4.4.0-0.nightly-2020-08-03-123644, 4.5.0-0.nightly-2020-08-03-123303, 4.6.0-0.nightly-2020-08-04-193041

How reproducible:
Hit once in QE upgrade CI

Steps to Reproduce:
1. Profile: 05_UPI on Baremetal with RHCOS (FIPS off)

Actual results:
2020-08-05T02:49:32.316531208Z E0805 02:49:32.316485 1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
2020-08-05T02:49:32.316531208Z goroutine 608 [running]:
2020-08-05T02:49:32.316531208Z k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1e69120, 0x36e6df0)
2020-08-05T02:49:32.316531208Z k8s.io/apimachinery.0-rc.2/pkg/util/runtime/runtime.go:74 +0xa3
2020-08-05T02:49:32.316531208Z k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
2020-08-05T02:49:32.316531208Z k8s.io/apimachinery.0-rc.2/pkg/util/runtime/runtime.go:48 +0x82
2020-08-05T02:49:32.316531208Z panic(0x1e69120, 0x36e6df0)
2020-08-05T02:49:32.316531208Z runtime/panic.go:969 +0x166
2020-08-05T02:49:32.316531208Z github.com/openshift/cluster-storage-operator/pkg/operator/defaultstorageclass.newStorageClassForCluster(0xc0012bc000, 0x215e6a3, 0x7, 0xc0012bc000)
2020-08-05T02:49:32.316531208Z github.com/openshift/cluster-storage-operator/pkg/operator/defaultstorageclass/controller.go:163 +0x2d
2020-08-05T02:49:32.316531208Z github.com/openshift/cluster-storage-operator/pkg/operator/defaultstorageclass.(*Controller).syncStorageClass(0xc000619310, 0x2183a12, 0x1d)
2020-08-05T02:49:32.316531208Z github.com/openshift/cluster-storage-operator/pkg/operator/defaultstorageclass/controller.go:138 +0x7f
2020-08-05T02:49:32.316531208Z github.com/openshift/cluster-storage-operator/pkg/operator/defaultstorageclass.(*Controller).sync(0xc000619310, 0x2581320, 0xc000972b80, 0x257b2a0, 0xc00050aab0, 0x0, 0x0)
2020-08-05T02:49:32.316531208Z github.com/openshift/cluster-storage-operator/pkg/operator/defaultstorageclass/controller.go:87 +0x33d
2020-08-05T02:49:32.316531208Z github.com/openshift/library-go/pkg/controller/factory.(*baseController).reconcile(0xc0001747e0, 0x2581320, 0xc000972b80, 0x257b2a0, 0xc00050aab0, 0x427745, 0xc001c52660)
2020-08-05T02:49:32.316531208Z github.com/openshift/library-go.0-20200724235449-b4f9ae5f0c51/pkg/controller/factory/base_controller.go:175 +0x76
2020-08-05T02:49:32.316531208Z github.com/openshift/library-go/pkg/controller/factory.(*baseController).processNextWorkItem(0xc0001747e0, 0x2581320, 0xc000972b80)
2020-08-05T02:49:32.316531208Z github.com/openshift/library-go.0-20200724235449-b4f9ae5f0c51/pkg/controller/factory/base_controller.go:215 +0x230
2020-08-05T02:49:32.316531208Z github.com/openshift/library-go/pkg/controller/factory.(*baseController).runWorker.func1(0xc001c76710, 0x2581320, 0xc000972b80, 0xc0001747e0)
2020-08-05T02:49:32.316531208Z github.com/openshift/library-go.0-20200724235449-b4f9ae5f0c51/pkg/controller/factory/base_controller.go:166 +0x99
2020-08-05T02:49:32.316531208Z created by github.com/openshift/library-go/pkg/controller/factory.(*baseController).runWorker
2020-08-05T02:49:32.316531208Z github.com/openshift/library-go.0-20200724235449-b4f9ae5f0c51/pkg/controller/factory/base_controller.go:158 +0x8d

Expected results:

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
The panic appears to be caused by Infrastructure.Status.PlatformStatus being nil. In the must-gather, the Infrastructure resource of this cluster (originally installed at 4.1) sets only the `platform` field and has no `platformStatus`:

    apiVersion: config.openshift.io/v1
    items:
    - apiVersion: config.openshift.io/v1
      kind: Infrastructure
      metadata:
        creationTimestamp: "2020-08-04T22:09:18Z"
        generation: 1
        name: cluster
        resourceVersion: "402"
        selfLink: /apis/config.openshift.io/v1/infrastructures/cluster
        uid: 24430978-d69f-11ea-b1fc-fa163e8ef816
      spec:
        cloudConfig:
          name: ""
      status:
        apiServerInternalURI: https://api-int.ugdci05054712.qe.devcluster.openshift.com:6443
        apiServerURL: https://api.ugdci05054712.qe.devcluster.openshift.com:6443
        etcdDiscoveryDomain: ugdci05054712.qe.devcluster.openshift.com
        infrastructureName: ugdci05054712-zxmkz
        platform: None

We return an unsupportedPlatformError in the default case, but we don't check for a `nil` PlatformStatus before dereferencing it, which is the nil pointer dereference reported in newStorageClassForCluster (controller.go:163). I've submitted [1], which should address this issue.

[1] https://github.com/openshift/cluster-storage-operator/pull/77
Verified with: 4.1.41->4.2.36->4.3.33->4.4.18->4.5.7->4.6.0-0.nightly-2020-08-26-032807
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196