Bug 1907290 - Certificate error on operator upgrade
Summary: Certificate error on operator upgrade
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.7
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Kevin Rizza
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-14 06:29 UTC by Nahshon Unna-Tsameret
Modified: 2023-09-15 01:31 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-26 16:01:45 UTC
Target Upstream Version:
Embargoed:


Attachments
OLM operator log (807.88 KB, application/gzip)
2020-12-14 06:29 UTC, Nahshon Unna-Tsameret

Description Nahshon Unna-Tsameret 2020-12-14 06:29:59 UTC
Created attachment 1738865
OLM operator log

Description of problem:
The HCO openshift-ci job fails on AWS after CNV is upgraded. A request sent to hco-webhook fails with the following error:

> Error from server (InternalError): Internal error occurred: failed calling webhook "mutate-ns-hco.kubevirt.io": Post "https://hco-webhook-service.kubevirt-hyperconverged.svc:4343/mutate-ns-hco-kubevirt-io?timeout=30s": x509: certificate signed by unknown authority (possibly because of "x509: ECDSA verification failure" while trying to verify candidate authority certificate "Red Hat, Inc.")

For example: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/kubevirt_hyperconverged-cluster-operator/991/pull-ci-kubevirt-hyperconverged-cluster-operator-master-hco-e2e-upgrade-prev-azure/1336649438625533952
 

Here is where the test installs the "old" version:
https://github.com/kubevirt/hyperconverged-cluster-operator/blob/d1efad33eb3e6624e4ee337f593218239e86ea48/hack/upgrade-test.sh#L165-L177

And here is where the test updates the CSV version for upgrade:
https://github.com/kubevirt/hyperconverged-cluster-operator/blob/d1efad33eb3e6624e4ee337f593218239e86ea48/hack/upgrade-test.sh#L215


It was also reproduced in a different scenario, using this index image:

> quay.io/nunnatsa/hyperconverged-cluster-index:1.3.0 

1. Installed the 1.3.0 channel. Its CSV had to be fixed manually (a separate issue) by removing an annotations.description field from one of the templates; after that, the installation completed.
2. Uninstalled 1.3.0 (the original intent was to start the upgrade scenario from 1.2.0).
3. Installed 1.2.0.
4. Tried to deploy the HyperConverged CR:
> cat <<EOF | oc create -n kubevirt-hyperconverged -f -
> apiVersion: hco.kubevirt.io/v1beta1
> kind: HyperConverged
> metadata:
>   name: kubevirt-hyperconverged
> spec:
>   infra: {}
>   workloads: {}
> EOF

At this point, we got this error:
> Error from server (InternalError): error when creating "deploy/hco.cr.yaml": Internal error occurred: failed calling webhook "validate-hco.kubevirt.io": Post "https://hco-webhook-service.kubevirt-hyperconverged.svc:4343/validate-hco-kubevirt-io-v1beta1-hyperconverged?timeout=30s": x509: certificate signed by unknown authority (possibly because of "x509: ECDSA verification failure" while trying to verify candidate authority certificate "Red Hat, Inc.")


OLM operator log is attached.

Comment 9 Kevin Rizza 2021-02-08 19:43:41 UTC
Moving this BZ to medium because it is not reproducible and does not block any active workflows.

Comment 10 Kevin Rizza 2021-02-22 18:35:54 UTC
Closing this due to the lack of a reproduction method. If the HCO operator CI starts seeing this problem again feel free to reopen and we can try to investigate again. Given the transient nature of this error, it's very possible this is already resolved due to an unrelated change.

Comment 15 Debarati Basu-Nag 2021-09-17 13:26:54 UTC
This was hit during a CNV 4.8.1 -> 4.8.2 upgrade.

OCP version:
===========
[cnv-qe-jenkins@infra-debug3-twzz6-executor ~]$ oc version
Client Version: 4.8.0-202109080022.p0.git.a0c12be.assembly.stream-a0c12be
Server Version: 4.8.12
Kubernetes Version: v1.21.1+d8043e1
[cnv-qe-jenkins@infra-debug3-twzz6-executor ~]$ 
============

Comment 19 Kevin Rizza 2022-01-05 19:06:47 UTC
We will prioritize this now that there is a must-gather, to determine whether there is anything obvious in it.

For now, moving the status back to NEW

Comment 25 Krzysztof Majcher 2022-05-26 16:01:45 UTC
It seems we have not hit the issue in our recent upgrade tests, so I'm closing it for now.
If we hit it again, we'll reopen the bug and secure the must-gather so it won't disappear.

Comment 26 Ruth Netser 2022-06-08 16:43:32 UTC
@dbasunag If you encounter this issue again, please reopen.

Comment 27 Red Hat Bugzilla 2023-09-15 01:31:35 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days

