Description of problem:

When RHACM is installed after IBM Cloud Pak for Multicloud Manager (CP4MCM), many RHACM pods fail to start because a number of secrets cannot be found.

Pods hitting the secret-not-found issue:

```
console-chart-6dea2-consoleui
console-header
grc-843a6-grcuiapi
kui-web-terminal
management-ingress
search-prod-0aa9d-redisgraph
search-prod-0aa9d-search-aggregator
search-prod-0aa9d-search-api
search-prod-0aa9d-search-collector
topology-0f17d-topologyapi
```

The missing secrets reported by the various pods:

```
console-chart-6dea2-uiapi-secrets
grc-843a6-grc-secrets
kui-proxy-secret
management-ingress-47353-tls-secret
multicloud-ca-cert
search-aggregator-secrets
search-prod-0aa9d-redisgraph-secrets
search-prod-0aa9d-search-api-secrets
topology-0f17d-topology-secrets
```

After some investigation, we found that the problem is caused by a cert-manager conflict. RHACM ships its own definitions for these CRDs and treats the CP4MCM cert-manager CRDs as invalid, so none of the Certificate resources ever become ready and their secrets are never created:

```
oc get certificates
NAME                               READY   SECRET                                 AGE   EXPIRATION
console-chart-6dea2-ca-cert        False   console-chart-6dea2-uiapi-secrets      16h
grc-843a6-ca-cert                  False   grc-843a6-grc-secrets                  16h
kui-proxy                          False   kui-proxy-secret                       16h
management-ingress-47353-cert      False   management-ingress-47353-tls-secret    16h
search-aggregator-ca-cert          False   search-aggregator-secrets              16h
search-prod-0aa9d-redis-ca-cert    False   search-prod-0aa9d-redisgraph-secrets   16h
search-prod-0aa9d-search-ca-cert   False   search-prod-0aa9d-search-api-secrets   16h
topology-0f17d-ca-cert             False   topology-0f17d-topology-secrets        16h
```

RHACM's Helm-based install requires specific ownership metadata; for example, the `app.kubernetes.io/managed-by` label must equal `Helm`. During the RHACM install, this CRD validation fails. Log from `multicluster-operators-standalone-subscription`:

```
E0702 15:25:27.073228       1 helmrelease_controller.go:262] failed to install release: rendered manifests contain a resource that already exists. Unable to continue with install: CustomResourceDefinition "certificates.certmanager.k8s.io" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: key "app.kubernetes.io/managed-by" must equal "Helm": current value is "ibm-cert-manager-operator"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "cert-manager-7bdaf"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "open-cluster-management"- Failed during HelmOperator Reconcile.
E0702 15:25:32.086950       1 helmrelease_controller.go:262] failed to install release: rendered manifests contain a resource that already exists. Unable to continue with install: APIService "v1beta1.webhook.certmanager.k8s.io" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "cert-manager-webhook-42b84"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "open-cluster-management"- Failed during HelmOperator Reconcile.
```

RHACM should be able to tolerate pre-existing cert-manager CRDs in an OCP cluster.
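For anyone hitting the same symptom, a minimal sketch of how to inspect the ownership metadata that Helm checks (the CRD name is taken from the error above; Helm 3 only adopts a pre-existing resource when this label and these annotations match the release it is installing):

```
# Show who currently claims ownership of the conflicting CRD.
oc get crd certificates.certmanager.k8s.io \
  -o jsonpath='{.metadata.labels.app\.kubernetes\.io/managed-by}{"\n"}'

# Dump all labels and annotations to compare against the values
# demanded by the "invalid ownership metadata" error.
oc get crd certificates.certmanager.k8s.io \
  -o jsonpath='{.metadata.labels}{"\n"}{.metadata.annotations}{"\n"}'
```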
Version-Release number of selected component (if applicable):

RHACM version: 2.0.0-SNAPSHOT-2020-07-02-00-34-59
OCP version: 4.4.5

How reproducible:

If you do not have CP4MCM, you can install an open-source cert-manager and then install RHACM; the install will fail.

Actual results:

Expected results:

RHACM should be able to work with an existing cert-manager in the OCP cluster.

Additional info:
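A rough reproduction sketch for the no-CP4MCM case (the upstream manifest URL and the v0.10.x version below are assumptions; the point is to install a cert-manager that still owns the certmanager.k8s.io CRDs):

```
# Assumption: an old upstream cert-manager (v0.10.x) that registers its
# CRDs under the certmanager.k8s.io API group, matching the conflict above.
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v0.10.1/cert-manager.yaml

# Verify the conflicting CRDs exist before installing RHACM.
kubectl get crd | grep certmanager.k8s.io

# Now install RHACM; its cert-manager HelmRelease should fail with the
# "invalid ownership metadata" error shown in the description.
```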
I have moved this to the install team. Note that the RHACM product uses an old version of cert-manager that does not conflict with any of the recent cert-manager releases. You should only hit this issue when another old version of cert-manager is also installed.
Gus Parvin, can you share some detail on the exact version of the newer cert-manager I should use for testing? Thanks.
Yes, here's the latest cert-manager release: https://github.com/jetstack/cert-manager/releases/tag/v0.16.0 If that version is installed on the cluster, it will not conflict with the RHACM-installed cert-manager. Thanks!
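For context (my understanding, not stated in the thread): upstream cert-manager renamed its API group from certmanager.k8s.io to cert-manager.io in v0.11, so v0.16.0's CRDs live in a different group from RHACM's bundled cert-manager and the two can coexist. A quick way to confirm on a cluster:

```
# Install upstream cert-manager v0.16.0 via its release manifest.
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v0.16.0/cert-manager.yaml

# The two CRD sets live in different API groups and do not collide:
kubectl get crd | grep 'cert-manager.io'     # upstream v0.16.0 CRDs
kubectl get crd | grep 'certmanager.k8s.io'  # RHACM's bundled cert-manager CRDs
```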
Good, thanks Gus, let me run some tests to see if this works.
Gus, we are currently using cert-manager 0.10.5, based on this PR: https://github.com/IBM/ibm-cert-manager-operator/pull/76/files. Can you share some detail on the minimum version we should upgrade to? Would 0.15.1 or other versions work?
I would like to raise the two-cert-managers conflict issue here; it looks worse than we originally imagined. It is not just a matter of install order, and I am afraid it will block the RHACM integration scenario completely.

Recently, on some environments with CP4MCM and RHACM integrated, we are seeing some pods under RHACM fail due to secrets not found after the system has run for a while, even though they worked before, and the root cause is the same. This happens even if nobody touches the environment. It looks like Common Services takes over control when RHACM modules are "refreshed" for some reason.

```
kubectl describe helmrelease cert-manager-b39a6 -n open-cluster-management
...
Message: failed to install release: rendered manifests contain a resource that already exists. Unable to continue with install: CustomResourceDefinition "certificates.certmanager.k8s.io" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: key "app.kubernetes.io/managed-by" must equal "Helm": current value is "operator"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "cert-manager-b39a6"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "open-cluster-management"
```

This will also break RHACM build upgrades: to upgrade RHACM from an old build to a new one, people have to uninstall CP4MCM first in order to avoid the cert-manager conflicts. This is unacceptable from the customer's perspective.
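As a hypothetical stopgap only (not validated in this thread): Helm 3.2+ will adopt an existing resource if its ownership metadata is rewritten to point at the release named in the error, e.g.:

```
# WARNING: hypothetical workaround sketch. This hands ownership of the CRD
# to the RHACM Helm release and may break the CP4MCM operator's own
# reconciliation; the values are copied from the error message above.
kubectl label crd certificates.certmanager.k8s.io \
  app.kubernetes.io/managed-by=Helm --overwrite
kubectl annotate crd certificates.certmanager.k8s.io \
  meta.helm.sh/release-name=cert-manager-b39a6 \
  meta.helm.sh/release-namespace=open-cluster-management --overwrite
```

The real fix still has to be one side tolerating the other's ownership, which is what this bug asks for.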
(In reply to morningspace from comment #8)

This has been fixed in CP4MCM.
@Guang Ya Liu, based on your previous comment, can we close this issue? Thank you, Ginny Ghezzo
@Guang Ya Liu @morningspace please see above
Closing
Yes, this can be closed. Thanks.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days