Bug 1956169 - x509: certificate signed by unknown authority while trying to retrieve local-cluster/leases/observability-lease
Summary: x509: certificate signed by unknown authority while trying to retrieve local-...
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Core Services / Observability
Version: rhacm-2.2.z
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
: ---
Assignee: Chunlin Yang
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-05-03 06:58 UTC by Bright Zheng
Modified: 2021-05-08 10:09 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
ming: rhacm-2.2.z+




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | open-cluster-management backlog issues 12149 | 0 | None | None | None | 2021-05-03 13:21:10 UTC

Description Bright Zheng 2021-05-03 06:58:03 UTC
Description of problem:

Installing RHACM 2.2.2 with Observability enabled.

There is always a failed pod in namespace "open-cluster-management-addon-observability":

```sh
$ kgp -n open-cluster-management-addon-observability
NAME                                               READY   STATUS             RESTARTS   AGE
endpoint-observability-operator-548bbb7dc4-wv9gb   1/2     CrashLoopBackOff   25         111m
```

The logs look like this:

```
...
2021-05-03T06:30:07.918Z	ERROR	lease-controller	failed to get lease observability-lease/local-cluster	{"error": "Get \"https://c100-e.au-syd.containers.cloud.ibm.com:30700/apis/coordination.k8s.io/v1/namespaces/local-cluster/leases/observability-lease\": x509: certificate signed by unknown authority"}
github.com/go-logr/zapr.(*zapLogger).Error
	/remote-source/deps/gomod/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132
github.com/open-cluster-management/klusterlet-addon-lease-controller/controllers.CheckLeaseUpdaterClient
	/remote-source/app/controllers/lease_controller.go:340
github.com/open-cluster-management/klusterlet-addon-lease-controller/controllers.(*LeaseReconciler).Reconcile
	/remote-source/app/controllers/lease_controller.go:116
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:235
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:209
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:188
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.0/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.0/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.0/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
	/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.0/pkg/util/wait/wait.go:90
2021-05-03T06:30:07.918Z	INFO	lease-controller	Failed to use the current client for lease update. Requeue after 10 seconds.
2021-05-03T06:30:17.919Z	INFO	lease-controller	processing hub-kube-config
2021-05-03T06:30:17.919Z	INFO	lease-controller	Wait until pod endpoint-observability-operator-548bbb7dc4-wv9gb/open-cluster-management-addon-observability is ready
2021-05-03T06:30:27.920Z	INFO	lease-controller	processing hub-kube-config
2021-05-03T06:30:27.920Z	INFO	lease-controller	Wait until pod endpoint-observability-operator-548bbb7dc4-wv9gb/open-cluster-management-addon-observability is ready
...
```
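
For anyone triaging a similar failure, the endpoint that fails TLS verification can be pulled out of the log line and its certificate chain inspected by hand. This is only a minimal sketch that assumes the log format shown above; `hub_endpoint` is a hypothetical helper name, not part of RHACM:

```sh
# Hypothetical helper: print host[:port] of the first https URL found on stdin.
hub_endpoint() {
  sed -n 's|.*https://\([^/"\\]*\)/.*|\1|p' | head -n 1
}

# The "error" field from the log line above.
line='{"error": "Get \"https://c100-e.au-syd.containers.cloud.ibm.com:30700/apis/coordination.k8s.io/v1/namespaces/local-cluster/leases/observability-lease\": x509: certificate signed by unknown authority"}'
endpoint=$(printf '%s\n' "$line" | hub_endpoint)
echo "$endpoint"

# To view the chain the server presents (needs network access and openssl;
# the URL must carry an explicit port, as it does here):
# openssl s_client -connect "$endpoint" -showcerts </dev/null
```

Comparing the issuer of the presented certificate with the CA bundle in the kubeconfig the addon uses should show whether the signing authority is simply missing from that bundle.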

Version-Release number of selected component (if applicable):

v2.2.2


How reproducible:


Steps to Reproduce:

1. Install RHACM core components

2. Install RHACM Observability:

```
oc -n $RHACM_OBSERVABILITY_NAMESPACE apply -f - <<EOF
apiVersion: observability.open-cluster-management.io/v1beta1
kind: MultiClusterObservability
metadata:
  name: observability
spec:
  availabilityConfig: Basic             # Available values are High or Basic
  enableDownSampling: false             # The default value is false. This is not recommended as querying long-time ranges without non-downsampled data is not efficient and useful.
  imagePullPolicy: Always
  imagePullSecret: rhacm-pull-secret    # The pull secret generated above
  observabilityAddonSpec:               # The ObservabilityAddonSpec defines the global settings for all managed clusters which have observability add-on enabled
    enableMetrics: true                 # EnableMetrics indicates the observability addon push metrics to hub server
    interval: 60                        # Interval for the observability addon push metrics to hub server
  retentionResolution1h: 5d             # How long to retain samples of 1 hour in bucket
  retentionResolution5m: 2d
  retentionResolutionRaw: 2d
  storageConfigObject:                  # Specifies the storage to be used by Observability
    metricObjectStorage:
      name: thanos-object-storage
      key: thanos.yaml
    statefulSetSize: 10Gi               # The amount of storage applied to the Observability StatefulSets, i.e. Amazon S3 store, Rule, compact and receiver.
    statefulSetStorageClass: $CP4MCM_BLOCK_STORAGECLASS
EOF
```

3. Check pods:

```sh
$ kgp -n open-cluster-management-addon-observability
NAME                                               READY   STATUS             RESTARTS   AGE
endpoint-observability-operator-548bbb7dc4-wv9gb   1/2     CrashLoopBackOff   25         111m
```

Please note that the "local-cluster/leases/observability-lease" object the code is looking for does not exist.
So the error could be a false alarm as well.

```sh
$ kg leases -n local-cluster
NAME                          HOLDER                        AGE
application-manager                                         138m
cert-policy-controller                                      137m
cluster-lease-local-cluster   cluster-lease-local-cluster   141m
iam-policy-controller                                       137m
policy-controller                                           138m
work-manager                                                138m
```
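
The listing above can be narrowed to the single object the controller asks for. A sketch, assuming the table format shown above; `has_lease` is a hypothetical helper that reads such a listing on stdin:

```sh
# Hypothetical helper: exit 0 if a lease with the given name appears in the
# NAME column of the table on stdin (the header row is skipped), else exit 1.
has_lease() {
  awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Against a live cluster (assumes `oc` is logged in to the hub):
# oc get leases -n local-cluster | has_lease observability-lease \
#   && echo present || echo missing
```

With the listing above as input, this would report the lease as missing.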

Actual results:

The pod stays in `CrashLoopBackOff` indefinitely.

Expected results:

All pods, including the one above, are in the Running state.

Additional info:

While using RHACM (on ROKS), I have not noticed any obvious errors so far, so I am not sure how much impact this has.

Comment 1 llan 2021-05-03 14:46:33 UTC
If RHACM is installed on ROKS, this is a known issue (https://github.com/open-cluster-management/backlog/issues/11125), and a fix will be available in 2.2.3.
By the way, the other container running in pod endpoint-observability-operator-548bbb7dc4-wv9gb should show a similar error message in its logs.

Comment 2 Bright Zheng 2021-05-04 08:59:21 UTC
I don't have visibility on that issue #11125.

Anyway, what impact will this issue cause?

Comment 3 llan 2021-05-04 17:30:06 UTC
Impact of this issue: Observability does not work on ROKS.

Comment 4 Chunlin Yang 2021-05-08 03:26:56 UTC
2.2.3 is delivered. You can upgrade to 2.2.3 to use observability in ROKS. Thanks.

Comment 5 Bright Zheng 2021-05-08 10:09:48 UTC
Cool, and I've verified that this patch has been applied automatically and it works.
Good job guys!

