Bug 1970409

Summary: CNF test container: [SNO] [dpdk] All dpdk tests failed: Couldn't create NetworkAttachmentDefinition CR
Product: OpenShift Container Platform
Reporter: elevin
Component: CNF Platform Validation
Assignee: Sebastian Scheinkman <sscheink>
Status: CLOSED CURRENTRELEASE
QA Contact: Nikita <nkononov>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.8
CC: aos-bugs, sscheink, yjoseph
Target Milestone: ---
Target Release: 4.9.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
Release Note text: When CNF tests run in regular mode on a single-node cluster, the logic that determines whether the cluster is ready is incomplete. In particular, after an SR-IOV network is created, the corresponding network attachment definition is not created until at least one minute has elapsed, so all the DPDK tests fail in cascade. When running CNF tests in regular mode against a single-node installation, skip the DPDK feature with the `-ginkgo.skip` parameter. Alternatively, run the CNF tests in Discovery mode to execute tests against a single-node installation.
-------
Cause: When cnf-tests run against SNO in regular mode, the logic that determines whether the cluster is ready is missing some details. In particular, creating an SR-IOV network does not create the network attachment definition until a minute or more has passed.
Consequence: All the DPDK tests fail in cascade.
Workaround (if any): When running against SNO in regular mode, skip the DPDK feature with the -ginkgo.skip parameter. Running in discovery mode is the recommended way to execute tests against SNO.
Result:
Last Closed: 2022-08-24 12:52:49 UTC
Type: Bug

Description elevin 2021-06-10 12:40:21 UTC
Description of problem:
All DPDK tests failed in regular mode on an SNO cluster. The SR-IOV network operator logged:

2021-06-10T12:25:17.724Z	ERROR	controllers.SriovNetwork	Couldn't create NetworkAttachmentDefinition CR	{"sriovnetwork": "openshift-sriov-network-operator/test-dpdk-network", "Namespace": "dpdk-testing", "Name": "test-dpdk-network", "error": "Internal error occurred: failed calling webhook \"multus-validating-config.k8s.io\": Post \"https://multus-admission-controller.openshift-multus.svc:443/validate?timeout=30s\": x509: certificate signed by unknown authority"}


Version-Release number of selected component (if applicable):
Client Version: 4.8.0-0.nightly-2021-06-08-034312
Server Version: 4.8.0-0.nightly-2021-06-08-034312
Kubernetes Version: v1.21.0-rc.0+fec6fbc
DPDK_TESTS_IMAGE=dpdk-base:v4.8.0-8
CNF_TESTS_IMAGE=openshift4-cnf-tests:v4.8.0-48


How reproducible:


Steps to Reproduce:
1. Use an SNO cluster.
2. Run the DPDK tests:
   podman run --net=host -v /root/ocp/auth:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig -e IMAGE_REGISTRY=cnfdc8-installer:5000/rh-osbs/ -e CNF_TESTS_IMAGE=openshift4-cnf-tests:v4.8.0-48 -e DPDK_TESTS_IMAGE=dpdk-base:v4.8.0-8 -e ROLE_WORKER_CNF=master -e SCTPTEST_HAS_NON_CNF_WORKERS=false registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.8.0-48 /usr/bin/test-run.sh -ginkgo.focus="dpdk"


Actual results:

• Failure in Spec Setup (BeforeEach) [1988.664 seconds]
dpdk
/remote-source/app/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:92
  VFS allocated for dpdk
  /remote-source/app/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:151
    Validate the build [BeforeEach]
    /remote-source/app/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:173
      Should forward and receive packets from a pod running dpdk base on a image created by building config
      /remote-source/app/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:174
      Timed out after 10.001s.
      Unexpected error:
          <*errors.StatusError | 0xc0004d8960>: {
              ErrStatus: {
                  TypeMeta: {Kind: "", APIVersion: ""},
                  ListMeta: {
                      SelfLink: "",
                      ResourceVersion: "",
                      Continue: "",
                      RemainingItemCount: nil,
                  },
                  Status: "Failure",
                  Message: "network-attachment-definitions.k8s.cni.cncf.io \"test-dpdk-network\" not found",
                  Reason: "NotFound",
                  Details: {
                      Name: "test-dpdk-network",
                      Group: "k8s.cni.cncf.io",
                      Kind: "network-attachment-definitions",
                      UID: "",
                      Causes: nil,
                      RetryAfterSeconds: 0,
                  },
                  Code: 404,
              },
          }
          network-attachment-definitions.k8s.cni.cncf.io "test-dpdk-network" not found
      occurred
      /remote-source/app/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:892
Expected results:


Additional info:

Comment 1 Federico Paolinelli 2021-06-10 12:41:49 UTC
This is a bug in the tests themselves, not in the feature.
The network attachment definition is eventually created; the failure is caused by timing and by the reboots of the single-node OpenShift cluster.

Comment 2 Sebastian Scheinkman 2021-06-14 15:36:56 UTC
*** Bug 1970410 has been marked as a duplicate of this bug. ***

Comment 3 elevin 2021-07-08 14:36:01 UTC
Client Version: 4.8.0-0.nightly-2021-06-19-005119
Server Version: 4.8.0-0.nightly-2021-06-19-005119
Kubernetes Version: v1.21.0-rc.0+120883f
==========================================================

podman run  --net=host -v /root/ocp/auth:/kubeconfig:Z  -v ~/reports/:/report:Z -e KUBECONFIG=/kubeconfig/kubeconfig -e IMAGE_REGISTRY=cnfdd2-installer:5000/openshift-kni -e CNF_TESTS_IMAGE=cnf-tests:4.9 -e DPDK_TESTS_IMAGE=dpdk:4.9 -e ROLE_WORKER_CNF=master -e SCTPTEST_HAS_NON_CNF_WORKERS=false quay.io/openshift-kni/cnf-tests:4.9 /usr/bin/test-run.sh -ginkgo.focus="dpdk" --report=/report --junit=/report

=========================================================


SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
------------------------------
S [SKIPPING] [1973.991 seconds]
dpdk
/go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:92
  VFS allocated for dpdk
  /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:151
    Validate the build
    /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:173
      Should forward and receive packets from a pod running dpdk base on a image created by building config [It]
      /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:174

      skip test as we can't find a dpdk workload running with a s2i build

      /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:180
------------------------------
• [SLOW TEST:56.221 seconds]
dpdk
/go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:92
  VFS allocated for dpdk
  /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:151
    Validate a DPDK workload running inside a pod
    /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:194
      Should forward and receive packets
      /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:195
------------------------------
••
------------------------------
• [SLOW TEST:5.527 seconds]
dpdk
/go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:92
  VFS allocated for dpdk
  /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:151
    Validate HugePages
    /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:249
      should allocate the amount of hugepages requested
      /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:269
------------------------------
•
------------------------------
• [SLOW TEST:135.668 seconds]
dpdk
/go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:92
  VFS split for dpdk and netdevice
  /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:322
    should forward and receive packets from a pod running dpdk base
    /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:347
------------------------------
•
------------------------------
• [SLOW TEST:69.216 seconds]
dpdk
/go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:92
  dpdk application on different vendors
  /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:371
    Test connectivity using the requested nic
    /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:399
      Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28
      /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:430
------------------------------
S [SKIPPING] [19.061 seconds]
dpdk
/go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:92
  dpdk application on different vendors
  /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:371
    Test connectivity using the requested nic
    /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:399
      Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking [It]
      /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:431

      skip nic validate as wasn't able to find a nic with vendorID 8086 and deviceID 0d58

      /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:403
------------------------------
• [SLOW TEST:47.182 seconds]
dpdk
/go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:92
  dpdk application on different vendors
  /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:371
    Test connectivity using the requested nic
    /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:399
      Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
      /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:432
------------------------------
S [SKIPPING] [13.048 seconds]
dpdk
/go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:92
  dpdk application on different vendors
  /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:371
    Test connectivity using the requested nic
    /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:399
      Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5] [It]
      /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:433

      skip nic validate as wasn't able to find a nic with vendorID 15b3 and deviceID 1017

      /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:403
------------------------------

• [SLOW TEST:69.986 seconds]
dpdk
/go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:92
  Downward API
  /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:436
    Volume is readable in container
    /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:460
------------------------------
• [SLOW TEST:14.061 seconds]
dpdk
/go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:92
  restoring configuration
  /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:489
    should restore the cluster to the original status
    /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/dpdk/dpdk.go:490
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
JUnit report was created: /report/cnftests-junit.xml

Ran 11 of 134 Specs in 3099.620 seconds
SUCCESS! -- 11 Passed | 0 Failed | 0 Pending | 123 Skipped

Comment 5 Carlos Goncalves 2022-08-24 12:52:49 UTC
Bulk closing of all "CNF Platform Validation" component BZs assigned to CNF Network team members and in VERIFIED status for longer than 1 month.