Bug 1887488
Summary: | OCP 4.6: Topology Manager OpenShift E2E test fails: gu workload attached to SRIOV networks should let resource-aligned PODs have working SRIOV network interface | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Walid A. <wabouham> |
Component: | Node | Assignee: | Francesco Romani <fromani> |
Node sub component: | Topology manager | QA Contact: | Walid A. <wabouham> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | aos-bugs, ddharwar, fromani, jokerman, mifiedle, rphillips, tsweeney |
Version: | 4.6 | ||
Target Milestone: | --- | ||
Target Release: | 4.7.0 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | No Doc Update |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2021-02-24 15:25:25 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1903467 |
Description
Walid A.
2020-10-12 15:35:59 UTC
I looked at Walid's setup and the most likely culprits are:
1. an OCP change went unnoticed because the tests don't run often enough (that's a sore point and a long-overdue fix);
2. the tests definitely fail to pin the images they need, so an image may have been updated and an unwanted change sneaked in.

Item 2 needs to be fixed anyway, so I will fix it. I'm confident the issue is in the tests, which, because of an oversight, do not pin the image they need to a specific tag. A simple change in the test code should fix it.

This is actually an issue in the tests, but "topology-manager" seems the closest match in the current sub-component list. Reassigned.

Walid A. (comment #9):

Tested on the same OCP 4.6 baremetal cluster, using the tests from the latest master branch of origin.

The tests are being skipped because only one of the two worker nodes has topology manager enabled, which, AFAIK, was not a requirement before:

    started: (0/5/11) "[Serial][sig-node][Feature:TopologyManager] Configured cluster with gu workload attached to SRIOV networks should let resource-aligned PODs have working SRIOV network interface [Suite:openshift/conformance/serial]"
    skip [github.com/openshift/origin/test/extended/topology_manager/utils.go:69]: topology manager not configured on all nodes
    skipped: (16.2s) 2020-12-04T14:17:19 "[Serial][sig-node][Feature:TopologyManager] Configured cluster with gu workload attached to SRIOV networks should let resource-aligned PODs have working SRIOV network interface [Suite:openshift/conformance/serial]"

-----

The test is also skipped because the NETWORK_CHECK_IMAGE env var is not defined. According to the origin docs:

The testsuite runs a basic connectivity test to ensure the NUMA-aligned devices are functional. Use this variable to set the image URL to use to check the network is working between pods which requested, and got, aligned resources. If this value is not set (default), the connectivity test will skip.

What is a good example to use?
    started: (0/10/11) "[Serial][sig-node][Feature:TopologyManager] Configured cluster with gu workload attached to SRIOV networks should let resource-aligned PODs have working SRIOV network interface [Suite:openshift/conformance/serial]"
    skip [github.com/openshift/origin/test/extended/topology_manager/resourcealign.go:140]: no network check image provided (use NETWORK_CHECK_IMAGE)
    skipped: (16.4s) 2020-12-04T16:42:30 "[Serial][sig-node][Feature:TopologyManager] Configured cluster with gu workload attached to SRIOV networks should let resource-aligned PODs have working SRIOV network interface [Suite:openshift/conformance/serial]"

(In reply to Walid A. from comment #9)
> [...]
> What is a good example to use ?
For the time being, use quay.io/openshift-kni/cnf-tests:$OCP_VERSION (e.g. "4.7" to test OCP 4.7.z, and so forth). This probably needs better documentation somewhere. In the coming weeks we are looking to make a suitable image available in OCP and to provide a better default for this option. The image could be as simple as:

    $ cat Dockerfile
    FROM registry.access.redhat.com/ubi8/ubi-minimal:latest
    RUN microdnf install -y iputils && microdnf clean all
    ENTRYPOINT [ "/bin/ping" ]

but we need to have it built (and maintained) automatically, and that will take a little while.

Verified on the same OCP 4.6 baremetal cluster, using the tests from the latest master branch of origin, after setting the env var:

    export NETWORK_CHECK_IMAGE=quay.io/openshift-kni/cnf-tests:4.6
    . . .
    started: (0/10/11) "[Serial][sig-node][Feature:TopologyManager] Configured cluster with gu workload attached to SRIOV networks should let resource-aligned PODs have working SRIOV network interface [Suite:openshift/conformance/serial]"
    passed: (1m51s) 2020-12-04T18:39:53 "[Serial][sig-node][Feature:TopologyManager] Configured cluster with gu workload attached to SRIOV networks should let resource-aligned PODs have working SRIOV network interface [Suite:openshift/conformance/serial]"

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
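The workaround in this thread points NETWORK_CHECK_IMAGE at quay.io/openshift-kni/cnf-tests tagged with the cluster's MAJOR.MINOR version (4.6 for the cluster verified here). Below is a minimal shell sketch of deriving that tag from a full version string. The `network_check_image` function is a hypothetical helper, not something shipped in origin, and it assumes the version string starts with MAJOR.MINOR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of origin): map a full OCP version string
# such as "4.6.8" to the cnf-tests image suggested in this bug, whose tag
# uses only MAJOR.MINOR (e.g. "4.6").
network_check_image() {
  local full_version="$1"
  # Reject input without at least one dot, e.g. "4".
  case "$full_version" in
    *.*) ;;
    *) echo "expected a MAJOR.MINOR[.PATCH...] version, got: $full_version" >&2
       return 1 ;;
  esac
  local major="${full_version%%.*}"   # text before the first dot: "4"
  local rest="${full_version#*.}"     # text after the first dot: "6.8"
  local minor="${rest%%.*}"           # text before the next dot: "6"
  printf 'quay.io/openshift-kni/cnf-tests:%s.%s\n' "$major" "$minor"
}

# Example: export the variable that the topology manager tests read.
NETWORK_CHECK_IMAGE="$(network_check_image "4.6.8")"
export NETWORK_CHECK_IMAGE
echo "NETWORK_CHECK_IMAGE=${NETWORK_CHECK_IMAGE}"
```

On a live cluster, the full version string could be read with `oc get clusterversion version -o jsonpath='{.status.desired.version}'` and passed to the helper. This assumes the cluster-scoped ClusterVersion object is named `version`, as it is on OpenShift 4.x clusters.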