Version: 4.6.1

Platform: RHV 4.4.1.10-0.1.el8ev

Install type: IPI

What happened?

While the cluster was on version 4.5.16, it was stable and all cluster operators were available. During the upgrade to 4.6.1, the cluster operator "storage" gets stuck on "Updating", with "Available=false" and "Progressing=true", and reports the following message:

OVirtCSIDriverOperatorCRProgressing: Waiting for OVirt operator to report status

In the "openshift-cluster-csi-drivers" namespace, the pod "ovirt-csi-driver-operator-*" is in a crash loop.

pod logs: http://pastebin.test.redhat.com/917008
deployment manifest: http://pastebin.test.redhat.com/917010

Note: the outcome is the same when I turn off the operator pod "cluster-storage-operator" in the "openshift-cluster-storage-operator" namespace and switch the image of ovirt-csi-driver-operator to quay.io/openshift/origin-ovirt-csi-driver-operator:latest for both of its containers.

There are no connection issues between the pod and the oVirt engine (tested with curl while the pod was in debug mode).

What did you expect to happen?

The upgrade should complete successfully, with the cluster operator "storage" in "Available" status on version 4.6.1.

How to reproduce it (as minimally and precisely as possible)?

Start with an OCP-over-RHV (IPI) cluster on version 4.5.16 and upgrade it to 4.6.1.

Anything else we need to know?

This is preventing the cluster from completing the upgrade to 4.6.1.
The logs are extremely confusing, but the relevant error is:

E1110 09:21:15.641660 1 starter.go:36] yaml: line 3: found character that cannot start any token

Ultimately, the issue is caused by the ovirt-api password starting with a reserved character. It can be resolved by editing the ovirt-credentials object and wrapping the password in quotes.

I created a PR to fail earlier so the logs aren't as difficult to read.
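For anyone applying the workaround by hand, the following is a minimal Python sketch of the check and the quoting step. It is not the operator's code: the function names and the sample passwords are hypothetical, and the indicator set is a conservative reading of the YAML spec that only targets the failure mode in this bug (it does not catch values such as "true" or "123" that parse as non-strings).

```python
# Characters YAML treats as indicators. A plain (unquoted) scalar that starts
# with one of them may be misparsed or rejected; '@' and '`' are reserved by
# the YAML spec and cannot start a plain scalar at all.
YAML_INDICATORS = set("-?:,[]{}#&*!|>'\"%@`")


def needs_quoting(password: str) -> bool:
    """Conservative check: True if the value is unsafe as a plain YAML scalar."""
    if password == "" or password != password.strip():
        return True
    if password[0] in YAML_INDICATORS:
        return True
    # ': ' and ' #' end a plain scalar early (mapping value / comment).
    return ": " in password or " #" in password


def single_quote(password: str) -> str:
    """Wrap a single-line value as a YAML single-quoted scalar.

    Inside single quotes the only escape is '' for a literal quote.
    """
    return "'" + password.replace("'", "''") + "'"


if __name__ == "__main__":
    # Hypothetical passwords, for illustration only.
    for pw in ["@dmin123", "plainPassword"]:
        rendered = single_quote(pw) if needs_quoting(pw) else pw
        print(f"ovirt_password: {rendered}")
```

Single-quoted style is used here because it needs only one escape rule, which keeps a manual edit of the credentials easy to verify.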
It turned out that the ovirt-csi-driver-node DaemonSet's pods collide with the nmstate-handler DaemonSet's pods (part of CNV). Both listen on port 8080 at the host level. This means the issue reproduces only on OCP-over-RHV clusters on version 4.6 with OpenShift Virtualization installed, version 2.4 or later. From what I gathered from the CNV network team, nmstate uses this port for metrics, and it can be disabled.
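The collision itself is ordinary socket behavior: two processes that share the host's network namespace cannot both listen on the same address and port. A minimal Python sketch of that behavior follows; it assumes a loopback interface is available and uses an OS-assigned port instead of 8080 so it does not interfere with anything running on the machine. The function name is hypothetical.

```python
import errno
import socket


def second_bind_fails(host: str = "127.0.0.1") -> bool:
    """Bind two listeners to the same host port; True if the second is refused."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0: let the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same address and port as `first`
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print("second listener refused:", second_bind_fails())
```

Whichever DaemonSet pod starts second on a node is in the position of the second socket here, which is consistent with only one of the two components failing on a given node.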
Hi, we are trying to release CNAO (https://github.com/kubevirt/cluster-network-addons-operator/pull/667), which includes the kubernetes-nmstate fix that closes port 8080, but it looks like we have some issues in the CI.
(In reply to Benny Zlotnik from comment #1)
> So the logs are extremely confusing, but the relevant error is:
> E1110 09:21:15.641660 1 starter.go:36] yaml: line 3: found character
> that cannot start any token
>
> Ultimately the issue is caused by the ovirt-api password starting with a
> reserved character, this can be resolved by editing the ovirt-credentials
> object and wrap the password with quotes.
>
> I created a PR to fail earlier so the logs aren't as difficult to read

Thanks for making the logs more readable. However, the important thing is that the ovirt_password secret must accept any printable character. It is base64-encoded precisely to allow this. After ovirt-csi-driver reads it, it should quote it according to the destination. From what you say, it seems that I cannot have a password that starts with quotes, either.
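As an illustration of "quote it according to the destination", here is a minimal Python sketch: decode the base64 value, then emit it as a YAML double-quoted scalar. It relies on the fact that a JSON string literal is also a valid YAML double-quoted scalar for printable ASCII input. This is an assumption-laden sketch, not what ovirt-csi-driver does; the function names and the sample password are hypothetical.

```python
import base64
import json


def decode_secret_value(encoded: str) -> str:
    """Decode a base64-encoded secret value into text (assumes UTF-8)."""
    return base64.b64decode(encoded).decode("utf-8")


def render_yaml_line(key: str, value: str) -> str:
    """Render `key: value` with the value as a double-quoted scalar.

    json.dumps escapes embedded quotes and backslashes, so a password that
    starts with '"', "'", '@' or any other indicator stays one string.
    """
    return f"{key}: {json.dumps(value)}"


if __name__ == "__main__":
    # Hypothetical password that starts with a quote, for illustration only.
    password = "\"@weird'pass"
    encoded = base64.b64encode(password.encode("utf-8")).decode("ascii")
    print(render_yaml_line("ovirt_password", decode_secret_value(encoded)))
```

Escaping at the point where the value is written means the stored secret never has to be edited to suit the parser that reads it later.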
Due to capacity constraints, we will revisit this bug in the upcoming sprint.
Moving this Bug to https://bugzilla.redhat.com/show_bug.cgi?id=1933028