Bug 1920610 - e2e-aws-4.7-cnv consistently failing on Hyperconverged Cluster Operator
Summary: e2e-aws-4.7-cnv consistently failing on Hyperconverged Cluster Operator
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Installation
Version: 4.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 2.6.0
Assignee: Nico Schieder
QA Contact: Inbar Rose
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-01-26 17:54 UTC by jamo luhrsen
Modified: 2021-03-10 11:24 UTC
4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-10 11:23:40 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github openshift release pull 15315 0 None closed 4.7 hotfix for failing HCO tests 2021-02-18 19:07:14 UTC
Red Hat Product Errata RHSA-2021:0799 0 None None None 2021-03-10 11:24:48 UTC

Description jamo luhrsen 2021-01-26 17:54:40 UTC
Description of problem:

This failure showed up while monitoring CI signal for the 4.7 release and appears to be a new,
consistent failure since roughly 2021-01-05. There has been one passing job since then, but 9 of 10
jobs failed the same way.

job:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.7-informing#canary-release-openshift-origin-installer-e2e-aws-4.7-cnv

There is some discussion in this Slack thread:
https://coreos.slack.com/archives/C01CQA76KMX/p1611618946032100

but to also capture that info here:


The build log shows this error 35 times:

{"level":"error","ts":1611609679.651495,"logger":"controller_hyperconverged","msg":"Failed to update HCO Status","Request.Namespace":"kubevirt-hyperconverged","Request.Name":"kubevirt-hyperconverged","error":"Operation cannot be fulfilled on hyperconvergeds.hco.kubevirt.io \"kubevirt-hyperconverged\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/github.com/go-logr/zapr/zapr.go:132\ngithub.com/kubevirt/hyperconverged-cluster-operator/pkg/controller/hyperconverged.(*ReconcileHyperConverged).Reconcile\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/pkg/controller/hyperconverged/hyperconverged_controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:198\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99"}
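For context, the "the object has been modified; please apply your changes to the latest version and try again" message is the Kubernetes API server's optimistic-concurrency conflict, not a data error: the operator read the HyperConverged object, another writer updated it in the meantime, and the status update then carried a stale resourceVersion. A minimal sketch of that mechanism and the usual re-read-and-retry remedy (in Python for brevity; FakeAPIServer and update_with_retry are invented names for illustration, loosely modeled on client-go's retry.RetryOnConflict, and are not the actual HCO code):

```python
class Conflict(Exception):
    """Stands in for a 409 Conflict from the Kubernetes API server."""

class FakeAPIServer:
    """Toy stand-in for the API server's optimistic concurrency control."""
    def __init__(self):
        self.version = 1          # plays the role of resourceVersion
        self.status = {}

    def get(self):
        return self.version, dict(self.status)

    def update_status(self, version, status):
        if version != self.version:   # stale resourceVersion -> 409
            raise Conflict("the object has been modified; please apply "
                           "your changes to the latest version and try again")
        self.status = status
        self.version += 1

def update_with_retry(server, mutate, attempts=5):
    """Re-read and retry on conflict, in the spirit of retry.RetryOnConflict."""
    for _ in range(attempts):
        version, status = server.get()
        mutate(status)
        try:
            server.update_status(version, status)
            return True
        except Conflict:
            continue              # another writer won the race; re-read
    return False

server = FakeAPIServer()
v, _ = server.get()
server.update_status(v, {"phase": "Deploying"})     # another controller writes first

try:
    server.update_status(v, {"phase": "Deployed"})  # our version is now stale
except Conflict as e:
    print("conflict:", e)

ok = update_with_retry(server, lambda s: s.update(phase="Deployed"))
print(ok, server.status)   # -> True {'phase': 'Deployed'}
```

An occasional conflict that a retry absorbs is normal controller behavior; the concern in this bug is the volume (35 occurrences) coinciding with the test failures.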


-  failure to set NetworkAddonsConfig cluster status [0], but it appears to eventually succeed and is probably working as expected (as described in the doc linked in that thread)

-  lots of controller-runtime.healthz failures, though those may be expected [1]

-  the nmstatectl.py script is failing [2] with a JSON schema validation error: ValidationError: {'name': 'vxlan0', 'type': 'ovs-port', 'state': 'down', 'ipv4': {'enabled': False}, 'ipv6': {'enabled': False}, 'lldp': {'enabled': False}} is not valid under any of the given schemas\n\nFailed validating 'oneOf' in schema['properties']['interfaces']['items']['allOf'][5]:

[0] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/canary-release-openshi[…]95f4c-4z6tj_cluster-network-addons-operator.log
[1] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/canary-release-openshi[…]fbc6b-b2slk_hyperconverged-cluster-operator.log
[2] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/canary-release-openshi[…]erged_nmstate-handler-nfwbh_nmstate-handler.log
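On the nmstate ValidationError in [2]: JSON Schema's oneOf keyword requires an instance to match exactly one subschema, so an interface dict whose fields match none of the per-type subschemas fails with "is not valid under any of the given schemas". A rough, hypothetical miniature of that check (the subschemas below are invented for illustration and are not nmstate's real ones):

```python
def matches(instance, schema):
    # A "schema" here is just required key -> expected value pairs.
    return all(instance.get(k) == v for k, v in schema.items())

def one_of(instance, subschemas):
    # Mimics JSON Schema oneOf: exactly one subschema must match.
    hits = sum(matches(instance, s) for s in subschemas)
    if hits == 0:
        raise ValueError(f"{instance!r} is not valid under any of the given schemas")
    if hits > 1:
        raise ValueError(f"{instance!r} is valid under more than one schema")
    return True

# The interface dict from the nmstate log, abbreviated:
iface = {"name": "vxlan0", "type": "ovs-port", "state": "down"}

# Invented per-type subschemas, standing in for nmstate's real ones:
subschemas = [{"type": "vxlan"}, {"type": "ethernet"}]

try:
    one_of(iface, subschemas)
except ValueError as e:
    print("ValidationError:", e)   # type "ovs-port" matches neither subschema
```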


Version-Release number of selected component (if applicable):

4.7

I do not see this trouble in 4.6:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.6-informing#canary-release-openshift-origin-installer-e2e-aws-4.6-cnv


How reproducible:

10 of 11 runs have failed, but one failure was infra-related, so 9 of 10 runs hit this problem.

Comment 1 jamo luhrsen 2021-01-26 17:55:55 UTC
I don't really have a good handle on which component to use for this bug. I assigned it to Networking/openshift-sdn because I saw some networking-related logs
in the errors I found. It may need to be moved to another component if someone knows better.

Comment 2 jamo luhrsen 2021-01-26 23:40:40 UTC
@

Comment 3 jamo luhrsen 2021-01-26 23:43:39 UTC
Current theory from @

Comment 4 jamo luhrsen 2021-01-26 23:44:45 UTC
Current theory from @stirabos is that a regression was introduced with https://github.com/kubevirt/hyperconverged-cluster-operator/pull/1047; they will try to fix it ASAP.

Comment 5 Petr Horáček 2021-02-01 16:13:56 UTC
Moving to installation based on https://bugzilla.redhat.com/show_bug.cgi?id=1920610#c4

Comment 6 Pan Ousley 2021-02-12 16:32:50 UTC
@stirabos Does this require a release note for 2.6? It came up in my search because of the requires_release_note? flag but it looks like it's fixed. Thanks!

Comment 7 Nico Schieder 2021-02-15 09:56:06 UTC
(In reply to Pan Ousley from comment #6)
> @stirabos Does this require a release note for 2.6? It came up in
> my search because of the requires_release_note? flag but it looks like it's
> fixed. Thanks!

Hi Pan,
this was just a small regression hotfix.
We don't think this needs a release note for 2.6.

But thanks for checking back! :)

Comment 8 Pan Ousley 2021-02-18 19:07:48 UTC
Thanks, Nico!

Comment 11 errata-xmlrpc 2021-03-10 11:23:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0799

