Bug 1853600 - RHACM install failed if cert manager already installed in OCP
Summary: RHACM install failed if cert manager already installed in OCP
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Installer
Version: rhacm-2.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Nathan Weatherly
QA Contact: Thuy Nguyen
Docs Contact: Christopher Dawson
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-03 09:55 UTC by Guang Ya Liu
Modified: 2023-09-14 06:03 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-30 13:44:27 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github open-cluster-management backlog issues 3280 0 None None None 2020-09-30 13:44:10 UTC
Github open-cluster-management backlog issues 4107 0 None None None 2020-09-30 13:44:04 UTC

Description Guang Ya Liu 2020-07-03 09:55:19 UTC
Description of problem:

When RHACM is installed after IBM Cloud Pak for Multicloud Manager (CP4MCM), many RHACM pods fail to start because many secrets are not found.

Pods that hit the secret-not-found issue:
```
console-chart-6dea2-consoleui
console-header
grc-843a6-grcuiapi
kui-web-terminal
management-ingress
search-prod-0aa9d-redisgraph
search-prod-0aa9d-search-aggregator
search-prod-0aa9d-search-api
search-prod-0aa9d-search-collector
topology-0f17d-topologyapi
```

The missing secrets reported by the different pods:
```
console-chart-6dea2-uiapi-secrets
grc-843a6-grc-secrets
kui-proxy-secret
management-ingress-47353-tls-secret
multicloud-ca-cert
search-aggregator-secrets
search-prod-0aa9d-redisgraph-secrets
search-prod-0aa9d-search-api-secrets
topology-0f17d-topology-secrets
```

After some investigation, we found that the problem is caused by a cert-manager conflict.

RHACM has different definitions for these CRDs and treats the CP4MCM cert-manager CRDs as invalid, which causes the issue.

```
oc get certificates
NAME                               READY   SECRET                                 AGE   EXPIRATION
console-chart-6dea2-ca-cert        False   console-chart-6dea2-uiapi-secrets      16h
grc-843a6-ca-cert                  False   grc-843a6-grc-secrets                  16h
kui-proxy                          False   kui-proxy-secret                       16h
management-ingress-47353-cert      False   management-ingress-47353-tls-secret    16h
search-aggregator-ca-cert          False   search-aggregator-secrets              16h
search-prod-0aa9d-redis-ca-cert    False   search-prod-0aa9d-redisgraph-secrets   16h
search-prod-0aa9d-search-ca-cert   False   search-prod-0aa9d-search-api-secrets   16h
topology-0f17d-ca-cert             False   topology-0f17d-topology-secrets        16h
```
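
To correlate the not-ready certificates with the missing secrets listed earlier, the table can be parsed mechanically. The following is a minimal sketch, not part of the original report: the function name `not_ready_certificates` is hypothetical, and it assumes the `NAME`, `READY` and `SECRET` columns are always populated, as they are in the output above.

```python
def not_ready_certificates(table_text):
    """Return (certificate, secret) pairs whose READY column is not "True".

    Assumes whitespace-separated columns with NAME, READY and SECRET
    present on every row (true for the output in this report).
    """
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    header = lines[0].split()
    name_i = header.index("NAME")
    ready_i = header.index("READY")
    secret_i = header.index("SECRET")
    pending = []
    for line in lines[1:]:
        cols = line.split()
        if cols[ready_i] != "True":
            pending.append((cols[name_i], cols[secret_i]))
    return pending


# Two rows copied from the `oc get certificates` output in this report.
SAMPLE = """\
NAME                               READY   SECRET                                 AGE   EXPIRATION
console-chart-6dea2-ca-cert        False   console-chart-6dea2-uiapi-secrets      16h
kui-proxy                          False   kui-proxy-secret                       16h
"""

for cert, secret in not_ready_certificates(SAMPLE):
    print(f"certificate {cert} has not issued secret {secret}")
```

Each secret reported here is one that the corresponding certificate has not yet produced, which matches the secrets the pods report as missing.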

RHACM requires some specific labeling. For example, the `app.kubernetes.io/managed-by` label must equal `Helm`.

During the RHACM install, the CRD validation failed. The log from `multicluster-operators-standalone-subscription`:

```
E0702 15:25:27.073228       1 helmrelease_controller.go:262] failed to install release: rendered manifests contain a resource that already exists. 
Unable to continue with install: CustomResourceDefinition "certificates.certmanager.k8s.io" in namespace "" exists and cannot be imported into the current release: 
invalid ownership metadata; label validation error: key "app.kubernetes.io/managed-by" must equal "Helm": current value is "ibm-cert-manager-operator"; 
annotation validation error: missing key "meta.helm.sh/release-name": must be set to "cert-manager-7bdaf"; 
annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "open-cluster-management"- Failed during HelmOperator Reconcile.

E0702 15:25:32.086950       1 helmrelease_controller.go:262] failed to install release: rendered manifests contain a resource that already exists. 
Unable to continue with install: APIService "v1beta1.webhook.certmanager.k8s.io" in namespace "" exists and cannot be imported into the current release: 
invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; 
annotation validation error: missing key "meta.helm.sh/release-name": must be set to "cert-manager-webhook-42b84"; 
annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "open-cluster-management"- Failed during HelmOperator Reconcile.
```
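
The ownership check behind these errors can be illustrated with a short sketch. This is not Helm's actual code; it only imitates the three conditions named in the log messages above (one label and two annotations), and the function name `ownership_errors` is hypothetical.

```python
MANAGED_BY_LABEL = "app.kubernetes.io/managed-by"
RELEASE_NAME_ANNOTATION = "meta.helm.sh/release-name"
RELEASE_NAMESPACE_ANNOTATION = "meta.helm.sh/release-namespace"


def ownership_errors(metadata, release_name, release_namespace):
    """List the reasons an existing resource cannot be imported into a release.

    Imitates the conditions visible in the log: the managed-by label must be
    "Helm" and both release annotations must match the installing release.
    """
    labels = metadata.get("labels") or {}
    annotations = metadata.get("annotations") or {}
    expected = [
        ("label", labels, MANAGED_BY_LABEL, "Helm"),
        ("annotation", annotations, RELEASE_NAME_ANNOTATION, release_name),
        ("annotation", annotations, RELEASE_NAMESPACE_ANNOTATION, release_namespace),
    ]
    errors = []
    for kind, actual, key, want in expected:
        if key not in actual:
            errors.append(
                f'{kind} validation error: missing key "{key}": must be set to "{want}"'
            )
        elif actual[key] != want:
            errors.append(
                f'{kind} validation error: key "{key}" must equal "{want}": '
                f'current value is "{actual[key]}"'
            )
    return errors


# Metadata as implied by the first log entry: the CRD is labeled by the IBM operator
# and carries no Helm release annotations.
crd_metadata = {
    "name": "certificates.certmanager.k8s.io",
    "labels": {MANAGED_BY_LABEL: "ibm-cert-manager-operator"},
}

for err in ownership_errors(crd_metadata, "cert-manager-7bdaf", "open-cluster-management"):
    print(err)
```

With the metadata implied by the first log entry, the sketch reports one label error and two missing-annotation errors, mirroring the three validation errors in that entry.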

RHACM should be able to tolerate pre-existing cert-manager CRD settings in an OCP cluster.

Version-Release number of selected component (if applicable):

RHACM version: 2.0.0-SNAPSHOT-2020-07-02-00-34-59
OCP version: 4.4.5

How reproducible:

If you do not have CP4MCM, you can install an open source cert-manager and then install RHACM; the RHACM install will fail.

Actual results:


Expected results:

RHACM should be able to work with an existing cert-manager in the OCP cluster.


Additional info:

Comment 2 Gus Parvin 2020-07-08 14:29:02 UTC
I have moved this to the install team.  Note that the RHACM product uses an old version of cert-manager which does not conflict with any of the recent cert-manager releases.  You should only have this issue with additional old versions of cert-manager.

Comment 4 Guang Ya Liu 2020-07-27 13:50:29 UTC
Gus Parvin, can you give me the exact version of the newer cert-manager that I should use for testing? Thanks.

Comment 5 Gus Parvin 2020-07-27 18:24:41 UTC
Yes, Here's the latest cert-manager release: https://github.com/jetstack/cert-manager/releases/tag/v0.16.0
If that version is installed on the cluster, it will not conflict with the RHACM installed cert-manager.
Thanks!

Comment 6 Guang Ya Liu 2020-07-29 13:11:35 UTC
Good, thanks Gus. Let me run some tests to see if this works.

Comment 7 Guang Ya Liu 2020-07-29 13:23:24 UTC
Gus, we are currently using cert-manager 0.10.5, based on the PR here: https://github.com/IBM/ibm-cert-manager-operator/pull/76/files . Can you share some detail on the minimum version we should upgrade to? Does this work with 0.15.1 or other versions?

Comment 8 morningspace 2020-08-04 03:04:53 UTC
I would like to raise the conflict between the two cert-managers here; it looks worse than we originally imagined. It is not just a matter of install order; I am afraid it will block the RHACM integration scenario completely.

Recently, on some environments with CP4MCM and RHACM integrated, we have seen that after the system runs for a while, some RHACM pods fail because secrets are not found, even though they worked before, and the root cause is the same. This happens even if nobody touched the environment. It looks like Common Services takes over control when RHACM modules are "refreshed" for some reason.

kubectl describe helmrelease cert-manager-b39a6 -n open-cluster-management
...
Message:        failed to install release: rendered manifests contain a resource that already exists. Unable to continue with install: CustomResourceDefinition “certificates.certmanager.k8s.io” in namespace “” exists and cannot be imported into the current release: invalid ownership metadata; label validation error: key “app.kubernetes.io/managed-by” must equal “Helm”: current value is “operator”; annotation validation error: missing key “meta.helm.sh/release-name”: must be set to “cert-manager-b39a6”; annotation validation error: missing key “meta.helm.sh/release-namespace”: must be set to “open-cluster-management”

Also, this breaks the RHACM build upgrade: to upgrade RHACM from an old build to a new build, people have to uninstall CP4MCM first and then upgrade the RHACM build in order to avoid the cert-manager conflicts. This is unacceptable from a customer's perspective.

Comment 9 Guang Ya Liu 2020-08-11 06:34:43 UTC
(In reply to morningspace from comment #8)
> I would like raise it here for the two cert-managers conflict issue, it
> looks worse than what we originally imagine. It's not just a matter of
> install order, I am afraid that will block the RHACM integration scenario
> completely.
> 
> Recently, we are seeing on some envs w/ CP4MCM and RHACM integrated, after
> the system runs for a while, some pods under RHACM will fail due to secrets
> not found, even they work before, and the root cause is the same. This will
> happen even if people did not touch the env. It looks Common Services will
> take over the control, when RHACM modules "refreshed" for some reason.
> 
> kubectl describe helmrelease cert-manager-b39a6 -n open-cluster-management
> ...
> Message:        failed to install release: rendered manifests contain a
> resource that already exists. Unable to continue with install:
> CustomResourceDefinition “certificates.certmanager.k8s.io” in namespace “”
> exists and cannot be imported into the current release: invalid ownership
> metadata; label validation error: key “app.kubernetes.io/managed-by” must
> equal “Helm”: current value is “operator”; annotation validation error:
> missing key “meta.helm.sh/release-name”: must be set to
> “cert-manager-b39a6”; annotation validation error: missing key
> “meta.helm.sh/release-namespace”: must be set to “open-cluster-management”
> 
> Also, this will break the RHACM build upgrade, where people want to upgrade
> RHACM from an old build to a new build, they have to uninstall CP4MCM first,
> then upgrade RHACM build, in order to avoid the cert-manager conflicts. This
> is unacceptable from customer's perspective.


This has been fixed in CP4MCM.

Comment 10 Ginny Ghezzo 2020-09-22 02:18:21 UTC
@Guang Ya Liu, 
Based on your previous comment, can we close this issue?
Thank you
Ginny Ghezzo

Comment 11 Nathan Weatherly 2020-09-23 14:17:28 UTC
@Guang Ya Liu @morningspace please see above

Comment 12 Nathan Weatherly 2020-09-30 13:44:27 UTC
Closing

Comment 13 morningspace 2020-10-22 15:42:00 UTC
Yes, this can be closed. Thanks.

Comment 14 Red Hat Bugzilla 2023-09-14 06:03:28 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

