Description of problem:

After installing RHACM 2.2.2 with Observability enabled, there is always a failed pod in the namespace "open-cluster-management-addon-observability":

```sh
$ kgp -n open-cluster-management-addon-observability
NAME                                                READY   STATUS             RESTARTS   AGE
endpoint-observability-operator-548bbb7dc4-wv9gb    1/2     CrashLoopBackOff   25         111m
```

The logs look like this:

```
...
2021-05-03T06:30:07.918Z ERROR lease-controller failed to get lease observability-lease/local-cluster {"error": "Get \"https://c100-e.au-syd.containers.cloud.ibm.com:30700/apis/coordination.k8s.io/v1/namespaces/local-cluster/leases/observability-lease\": x509: certificate signed by unknown authority"}
github.com/go-logr/zapr.(*zapLogger).Error
    /remote-source/deps/gomod/pkg/mod/github.com/go-logr/zapr.0/zapr.go:132
github.com/open-cluster-management/klusterlet-addon-lease-controller/controllers.CheckLeaseUpdaterClient
    /remote-source/app/controllers/lease_controller.go:340
github.com/open-cluster-management/klusterlet-addon-lease-controller/controllers.(*LeaseReconciler).Reconcile
    /remote-source/app/controllers/lease_controller.go:116
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:235
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:209
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.2/pkg/internal/controller/controller.go:188
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.0/pkg/util/wait/wait.go:90
2021-05-03T06:30:07.918Z INFO lease-controller Failed to use the current client for lease update. Requeue after 10 seconds.
2021-05-03T06:30:17.919Z INFO lease-controller processing hub-kube-config
2021-05-03T06:30:17.919Z INFO lease-controller Wait until pod endpoint-observability-operator-548bbb7dc4-wv9gb/open-cluster-management-addon-observability is ready
2021-05-03T06:30:27.920Z INFO lease-controller processing hub-kube-config
2021-05-03T06:30:27.920Z INFO lease-controller Wait until pod endpoint-observability-operator-548bbb7dc4-wv9gb/open-cluster-management-addon-observability is ready
...
```

Version-Release number of selected component (if applicable):

v2.2.2

How reproducible:

Steps to Reproduce:

1. Install RHACM core components
2. Install RHACM Observability:

```
oc -n $RHACM_OBSERVABILITY_NAMESPACE apply -f - <<EOF
apiVersion: observability.open-cluster-management.io/v1beta1
kind: MultiClusterObservability
metadata:
  name: observability
spec:
  availabilityConfig: Basic # Available values are High or Basic
  enableDownSampling: false # The default value is false. This is not recommended as querying long-time ranges without non-downsampled data is not efficient and useful.
  imagePullPolicy: Always
  imagePullSecret: rhacm-pull-secret # The pull secret generated above
  observabilityAddonSpec: # The ObservabilityAddonSpec defines the global settings for all managed clusters which have observability add-on enabled
    enableMetrics: true # EnableMetrics indicates the observability addon push metrics to hub server
    interval: 60 # Interval for the observability addon push metrics to hub server
  retentionResolution1h: 5d # How long to retain samples of 1 hour in bucket
  retentionResolution5m: 2d
  retentionResolutionRaw: 2d
  storageConfigObject: # Specifies the storage to be used by Observability
    metricObjectStorage:
      name: thanos-object-storage
      key: thanos.yaml
    statefulSetSize: 10Gi # The amount of storage applied to the Observability StatefulSets, i.e. Amazon S3 store, Rule, compact and receiver.
    statefulSetStorageClass: $CP4MCM_BLOCK_STORAGECLASS
EOF
```

3. Check pods:

```sh
$ kgp -n open-cluster-management-addon-observability
NAME                                                READY   STATUS             RESTARTS   AGE
endpoint-observability-operator-548bbb7dc4-wv9gb    1/2     CrashLoopBackOff   25         111m
```

Please note that the "local-cluster/leases/observability-lease" object the code is looking for does not exist, so that error could be a false alarm too.

```sh
$ kg leases -n local-cluster
NAME                          HOLDER                        AGE
application-manager                                         138m
cert-policy-controller                                      137m
cluster-lease-local-cluster   cluster-lease-local-cluster   141m
iam-policy-controller                                       137m
policy-controller                                           138m
work-manager                                                138m
```

Actual results:

The pod stays in `CrashLoopBackOff` indefinitely.

Expected results:

All pods, including the one above, are in the Running state.

Additional info:

While accessing RHACM (on ROKS), I have not found any obvious errors yet, so I am not sure how much impact this issue causes.
If RHACM is installed on ROKS, this is a known issue (https://github.com/open-cluster-management/backlog/issues/11125), and a fix will be available in 2.2.3. By the way, the other container running in pod endpoint-observability-operator-548bbb7dc4-wv9gb should show a similar error message in its logs.
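One way to check both containers at once is sketched below. This is only an illustration: `count_x509_errors` is a hypothetical helper name, not part of RHACM, and the commented `oc logs` invocation assumes the pod name and namespace from this report.

```sh
# Hypothetical helper: count log lines that report the x509 trust failure.
# Reads log text on stdin and prints the number of matching lines.
count_x509_errors() {
  # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
  # keeps the exit status clean so the helper is safe under "set -e".
  grep -c 'x509: certificate signed by unknown authority' || true
}

# Assumed usage against a live cluster (names taken from this report):
#   oc logs endpoint-observability-operator-548bbb7dc4-wv9gb \
#     -n open-cluster-management-addon-observability \
#     --all-containers=true | count_x509_errors
```

A non-zero count would suggest the containers are hitting the same certificate problem; replacing `--all-containers=true` with `-c <container>` narrows the check to a single container.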
I don't have visibility into that issue (#11125). In any case, what impact does this issue have?
Impact of this issue: Observability does not work on ROKS.
2.2.3 is delivered. You can upgrade to 2.2.3 to use observability in ROKS. Thanks.
Cool, I've verified that this patch was applied automatically and that it works. Good job, guys!