Description of problem:

This failure showed up while monitoring CI signal for the 4.7 release and appears to be a new, consistent failure since ~1/5/2021. There has been one passing job since then, but 9 of 10 jobs failed the same way.

Job: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.7-informing#canary-release-openshift-origin-installer-e2e-aws-4.7-cnv

There is some discussion in this Slack thread: https://coreos.slack.com/archives/C01CQA76KMX/p1611618946032100

To also capture that info here, the build log shows this error 35 times:

{"level":"error","ts":1611609679.651495,"logger":"controller_hyperconverged","msg":"Failed to update HCO Status","Request.Namespace":"kubevirt-hyperconverged","Request.Name":"kubevirt-hyperconverged","error":"Operation cannot be fulfilled on hyperconvergeds.hco.kubevirt.io \"kubevirt-hyperconverged\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/github.com/go-logr/zapr/zapr.go:132\ngithub.com/kubevirt/hyperconverged-cluster-operator/pkg/controller/hyperconverged.(*ReconcileHyperConverged).Reconcile\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/pkg/controller/hyperconverged/hyperconverged_controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:198\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99"}

Other observations from the logs:

- failure to set NetworkAddonsConfig cluster status [0], but I think it eventually succeeds and is probably working as expected (as described in the doc linked in that thread)
- lots of controller-runtime.healthz failures, but those may be expected [1]
- the nmstatectl.py script is failing [2] with a JSON schema validation error:

ValidationError: {'name': 'vxlan0', 'type': 'ovs-port', 'state': 'down', 'ipv4': {'enabled': False}, 'ipv6': {'enabled': False}, 'lldp': {'enabled': False}} is not valid under any of the given schemas

Failed validating 'oneOf' in schema['properties']['interfaces']['items']['allOf'][5]:

[0] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/canary-release-openshi[…]95f4c-4z6tj_cluster-network-addons-operator.log
[1] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/canary-release-openshi[…]fbc6b-b2slk_hyperconverged-cluster-operator.log
[2] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/canary-release-openshi[…]erged_nmstate-handler-nfwbh_nmstate-handler.log

Version-Release number of selected component (if applicable): 4.7

I do not see this problem in 4.6: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.6-informing#canary-release-openshift-origin-installer-e2e-aws-4.6-cnv

How reproducible: 10 of 11 runs have failed, but one failure was infra-related, so 9 of 10 runs hit this problem.
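For context on the "Failed to update HCO Status ... the object has been modified" errors above: that message is the API server rejecting a write made with a stale resourceVersion (optimistic concurrency), which is usually harmless as long as the controller requeues and retries. The snippet below is only a generic sketch of the common retry-on-conflict pattern using client-go's retry.RetryOnConflict with a controller-runtime client; it uses a ConfigMap rather than the HyperConverged CR to stay self-contained and is not the HCO operator's actual code.

package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// updateWithConflictRetry re-reads the object on every attempt so the update
// always carries the latest resourceVersion, and retries whenever the API
// server rejects a stale write with a Conflict ("object has been modified") error.
func updateWithConflictRetry(ctx context.Context, c client.Client, key types.NamespacedName, mutate func(*corev1.ConfigMap)) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		obj := &corev1.ConfigMap{}
		if err := c.Get(ctx, key, obj); err != nil {
			return err
		}
		mutate(obj) // apply the desired change to the freshly fetched copy
		return c.Update(ctx, obj)
	})
}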
I don't have a good handle on which component to use for this bug. I assigned it to Networking/openshift-sdn because I saw some networking-related logs in the errors above. This may need to be moved to another component if someone knows better.
The current theory from @stirabos is that a regression was introduced by https://github.com/kubevirt/hyperconverged-cluster-operator/pull/1047, and they will try to fix it ASAP.
Moving to installation based on https://bugzilla.redhat.com/show_bug.cgi?id=1920610#c4
@stirabos Does this require a release note for 2.6? It came up in my search because of the requires_release_note? flag but it looks like it's fixed. Thanks!
(In reply to Pan Ousley from comment #6)
> @stirabos Does this require a release note for 2.6? It came up in
> my search because of the requires_release_note? flag but it looks like it's
> fixed. Thanks!

Hi Pan, this was just a small regression hotfix. We don't think this needs a release note for 2.6. But thanks for checking back! :)
Thanks, Nico!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0799