Bug 1997050
Summary: CNO panic: runtime error: invalid memory address or nil pointer dereference
Product: OpenShift Container Platform
Component: Networking
Networking sub component: multus
Reporter: errordeveloper <errordeveloper>
Assignee: Douglas Smith <dosmith>
QA Contact: Weibin Liang <weliang>
CC: vrutkovs
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 4.9
Target Release: 4.9.0
Hardware: x86_64
OS: Linux
Type: Bug
Last Closed: 2021-10-18 17:48:10 UTC
Description
errordeveloper@gmail.com
2021-08-24 10:20:22 UTC
By the way, I set the subcomponent to 'multus' because there was no CNO category and the stack trace points at multus-related code. I think adding a CNO subcomponent would be quite handy.

Reproducible in CI as well - https://github.com/openshift/release/pull/21293

After a quick look at recent changes, I found this:

    diff --git a/pkg/network/render.go b/pkg/network/render.go
    index 6295d816..ef1b4508 100644
    --- a/pkg/network/render.go
    +++ b/pkg/network/render.go
    @@ -31,7 +31,7 @@ func Render(conf *operv1.NetworkSpec, bootstrapResult *bootstrap.BootstrapResult
     	objs = append(objs, o...)

     	// render MultusAdmissionController
    -	o, err = renderMultusAdmissionController(conf, manifestDir)
    +	o, err = renderMultusAdmissionController(conf, manifestDir, bootstrapResult.ExternalControlPlane)
     	if err != nil {
     		return nil, err
     	}
    @@ -559,7 +559,7 @@ func renderAdditionalNetworks(conf *operv1.NetworkSpec, manifestDir string) ([]*
     }

(see https://github.com/openshift/cluster-network-operator/commit/403c4ff06400be0879c81d310da4600b497159fb#diff-e68f94c90e1c01972187483984b3da57967f404e4e2a85a48433af6ca9d34c8cR34)

The issue is that `bootstrapResult` is used unchecked: `bootstrapResult.ExternalControlPlane` is read without a nil check. It's possible there are other changes that relate to this; this is just the first one I've noticed that can be connected to the stack trace.

Looks like an easy fix: https://github.com/openshift/cluster-network-operator/pull/1188

errordeveloper

I tried to use "networkType: Cilium" in cluster-network-02-config.yml when using openshift-install to install an AWS IPI cluster, and the installation failed; the failure log is attached. So far this is the only way QE can try to verify this bug, but I do not think it is the correct way. Could you help to verify this bug? Thanks!

(In reply to Weibin Liang from comment #8)
> errordeveloper
>
> I tried to use "networkType: Cilium" in cluster-network-02-config.yml when I
> use openshift-install to install a AWS IPI cluster, and installation failed,
> the installation failed log is attached.
You'd also have to add your own CNI provider. See https://github.com/openshift/release/blob/master/ci-operator/step-registry/cilium/conf/cilium-conf-commands.sh

(In reply to Vadim Rutkovsky from comment #10)
> (In reply to Weibin Liang from comment #8)
> > errordeveloper
> >
> > I tried to use "networkType: Cilium" in cluster-network-02-config.yml when I
> > use openshift-install to install a AWS IPI cluster, and installation failed,
> > the installation failed log is attached.
>
> You'd also have to add your own CNI provider. See
> https://github.com/openshift/release/blob/master/ci-operator/step-registry/cilium/conf/cilium-conf-commands.sh

errordeveloper

Thanks for your suggestion. I followed the steps above to add the Cilium CNI plugin, but the installation still failed. The failure log (Cilium-log.txt) is attached; please check whether I am still missing anything for the installation. Thanks!

@Weibin Liang, I think you can reproduce this without installing Cilium at all. Please take a look at the changes in https://github.com/openshift/cluster-network-operator/commit/0d9f70fa1e1af6b18385065f645a778679d7edcc. Prior to 0d9f70fa1, `TestRenderUnknownNetwork` called `Render(prev, &bootstrap.BootstrapResult{}, manifestDir)`, and it passed. However, in reality `Bootstrap` returned `nil, nil` for an unknown network type, which meant the real call was `Render(prev, nil, manifestDir)`, and it panicked. So the fix was to test what `Bootstrap` actually returns. If you change the code to make `Bootstrap` return `nil, nil` again and run the unit tests, the unit test will panic.

> I tried to use "networkType: Cilium" in cluster-network-02-config.yml when I use openshift-install to install a AWS IPI cluster, and installation failed, the installation failed log is attached.
> So far this is the way QE can try to verify this bug, but I do not think it is a correct way.

Cilium requires additional ports to be opened, so an AWS IPI install would need LB changes. We use Azure in CI to work around this.
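The failure mode discussed in the comments above can be sketched in a few lines of Go. This is a minimal, self-contained illustration and not the code from the linked pull request: `bootstrapResult`, `bootstrapUnknown`, `renderUnchecked`, `renderChecked`, and `panics` are hypothetical stand-ins that only mirror the names used in the thread. It shows why a test that builds its own empty result passes while a nil result from the bootstrap step panics, and one possible nil guard.

```go
package main

import (
	"errors"
	"fmt"
)

// bootstrapResult is a hypothetical stand-in for bootstrap.BootstrapResult.
type bootstrapResult struct {
	ExternalControlPlane bool
}

// bootstrapUnknown mimics the behaviour described in the thread: for an
// unknown network type it returns a nil result together with a nil error.
func bootstrapUnknown() (*bootstrapResult, error) {
	return nil, nil
}

// renderUnchecked reads a field through the pointer without a nil check, so a
// nil result causes a nil pointer dereference panic.
func renderUnchecked(res *bootstrapResult) (string, error) {
	return fmt.Sprintf("externalControlPlane=%t", res.ExternalControlPlane), nil
}

// renderChecked is one possible guard (an assumption, not the actual fix):
// reject a nil result before touching its fields.
func renderChecked(res *bootstrapResult) (string, error) {
	if res == nil {
		return "", errors.New("bootstrap result is nil")
	}
	return renderUnchecked(res)
}

// panics reports whether calling f panics.
func panics(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	// Feeding the real bootstrap output into the unchecked renderer panics.
	res, _ := bootstrapUnknown()
	fmt.Println("unchecked panics:", panics(func() { _, _ = renderUnchecked(res) }))

	// The guarded renderer turns the same input into an ordinary error.
	_, err := renderChecked(res)
	fmt.Println("checked error:", err)

	// A hand-built empty result never exercises the nil path, which is why
	// a test written this way can pass while the real code path panics.
	out, _ := renderUnchecked(&bootstrapResult{})
	fmt.Println("handcrafted result:", out)
}
```

The point of the sketch is the last case: a test only catches this class of bug if it passes the renderer whatever the bootstrap step really returns, rather than a value constructed inside the test.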
Let's move this to VERIFIED, as https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-azure-cilium/1435320456222609408 is now reaching the test phase again.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759