Bug 2130326
| Summary: | unable to run subctl benchmark latency, pods fail with ImagePullBackOff | ||
|---|---|---|---|
| Product: | Red Hat Advanced Cluster Management for Kubernetes | Reporter: | Jason Kincl <jkincl> |
| Component: | Submariner | Assignee: | Mike Kolesnik <mkolesni> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Noam Manos <nmanos> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rhacm-2.6 | CC: | dfarrell, ecai, maafried, mbabushk, mkolesni, nmanos, nyechiel, skitt, tpanteli |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | rhacm-2.7 | Flags: | bot-tracker-sync:
rhacm-2.7+
nyechiel: rhacm-2.7.z+ |
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-01-31 21:49:34 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Jason Kincl
2022-09-27 19:45:33 UTC
@tpanteli since you recently looked at the image overrides in the operator, could you take care of this? The problem is that subctl wants to deploy registry.redhat.io/rhacm2/nettest:v0.13.0, but the image is really registry.redhat.io/rhacm2/nettest-rhel8:v0.13.0. With ACM, the operator gets its image overrides through SubmarinerConfig, using that to populate the Submariner CR (see https://github.com/stolostron/submariner-addon/blob/main/pkg/hub/submarineragent/manifests/operator/submariner.io-submariners-cr.yaml for the template). I suspect we need to add a nettest entry in the template.

We need to merge https://github.com/submariner-io/subctl/pull/316 and then we can compile `subctl` downstream with an override directive to suffix the nettest image with `-rhel8`. This should also help with other usages of `nettest` by subctl, such as diagnose. With this fix, it would also be possible to specify the image in the overrides on the `Subctl` CR, although that would not be mandatory.

QE is waiting for 0.14.0 to be available downstream.

Was it backported to 0.13.1 for ACM 2.6.2?
On a test run I got: https://qe-jenkins-csb-skynet.apps.ocp-c1.prod.psi.redhat.com/job/ACM-2.6.2-Submariner-0.13.1-AWS-OSP-OVN/Test-Report/

$ oc get all -n submariner-operator
NAME READY STATUS RESTARTS AGE
pod/130259ce215f8646cf7a92686a732803718658cb32c71c36fc23cff5aa5htgt 0/1 Completed 0 118m
pod/3995ff715a639884baf12b984ddd3e2d0b65894d48654f55438cab15a5kr6hj 0/1 Completed 0 117m
pod/query-iface-listlxnln 0/1 ErrImagePull 0 88m
pod/submariner-addon-675984b497-vkv4b 1/1 Running 0 118m
pod/submariner-gateway-4zprj 1/1 Running 0 114m
pod/submariner-lighthouse-agent-7ccffc979d-64vhs 1/1 Running 0 116m
pod/submariner-lighthouse-coredns-65d9bb8488-ptpmm 1/1 Running 0 116m
pod/submariner-lighthouse-coredns-65d9bb8488-xhbn2 1/1 Running 0 116m
pod/submariner-networkplugin-syncer-7d49598784-cssjl 1/1 Running 0 116m
pod/submariner-operator-7b597fd5df-mw4cr 1/1 Running 0 117m
pod/submariner-routeagent-44kj9 1/1 Running 0 116m
pod/submariner-routeagent-5nz8l 1/1 Running 0 114m
pod/submariner-routeagent-84lbv 1/1 Running 0 116m
pod/submariner-routeagent-9j7gn 1/1 Running 0 116m
pod/submariner-routeagent-hrp27 1/1 Running 0 116m
pod/submariner-routeagent-jq8mk 1/1 Running 0 116m
pod/submariner-routeagent-jwc7p 1/1 Running 0 116m
pod/submariner-stable-0-13-catalog-fs8w8 1/1 Running 0 124m
pod/validate-sniffer79jsh 0/1 ImagePullBackOff 0 85m

The query-iface-list and validate-sniffer pods failed on:
Failed to pull image "registry.redhat.io/rhacm2/nettest:v0.13.1": rpc error: code = Unknown desc = (Mirrors also failed: [brew.registry.redhat.io/rh-osbs/rhacm2/nettest:v0.13.1

This was fixed for 0.14 (2.7) and was backported to 0.13, awaiting the 0.13.2 release. Once 0.13.2 is available, ACM 2.6 should consume it, and then you can expect it to be fixed for 2.6.

Mike, I'm also getting a nettest ImagePullBackOff with 0.14.0:
$ subctl benchmark latency "/mnt/skynet-data/skynet-env-1/aws-nmanos-a1/auth/kubeconfig" "/mnt/skynet-data/skynet-env-1/gcp-nmanos-c1/auth/kubeconfig" --verbose
Performing latency tests
Creating kubernetes clients
Setting new cluster ID "acm-aws-nmanos-a1", previous cluster ID was "api-aws-nmanos-a1-devcluster-openshift-com:6443"
Setting new cluster ID "acm-gcp-nmanos-c1", previous cluster ID was "api-gcp-nmanos-c1-gcp-subm-red-chesterfield-com:6443"
Creating lighthouse clients
Creating submariner clients
Creating namespace objects with basename "latency"
Generated namespace "e2e-tests-latency-vgdtx" in cluster "acm-aws-nmanos-a1" to execute the tests in
Creating namespace "e2e-tests-latency-vgdtx" in cluster "acm-gcp-nmanos-c1"
Latency test is not supported with Globalnet enabled, skipping the test...
Deleting namespace "e2e-tests-latency-vgdtx" on cluster "acm-aws-nmanos-a1"
Deleting namespace "e2e-tests-latency-vgdtx" on cluster "acm-gcp-nmanos-c1"
$ subctl benchmark throughput "/mnt/skynet-data/skynet-env-1/aws-nmanos-a1/auth/kubeconfig" "/mnt/skynet-data/skynet-env-1/gcp-nmanos-c1/auth/kubeconfig" --verbose
Performing throughput tests
Creating kubernetes clients
Setting new cluster ID "acm-aws-nmanos-a1", previous cluster ID was "api-aws-nmanos-a1-devcluster-openshift-com:6443"
Setting new cluster ID "acm-gcp-nmanos-c1", previous cluster ID was "api-gcp-nmanos-c1-gcp-subm-red-chesterfield-com:6443"
Creating lighthouse clients
Creating submariner clients
Creating namespace objects with basename "throughput"
Generated namespace "e2e-tests-throughput-vztjs" in cluster "acm-aws-nmanos-a1" to execute the tests in
Creating namespace "e2e-tests-throughput-vztjs" in cluster "acm-gcp-nmanos-c1"
Performing throughput tests from Gateway pod on cluster "acm-aws-nmanos-a1" to Gateway pod on cluster "acm-gcp-nmanos-c1"
Creating a Nettest Server Pod on "acm-gcp-nmanos-c1"
Deleting namespace "e2e-tests-throughput-vztjs" on cluster "acm-aws-nmanos-a1"
Deleting namespace "e2e-tests-throughput-vztjs" on cluster "acm-gcp-nmanos-c1"
panic: Failed to await pod ready. Pod "nettest-server-podw7nk4" is still pending: status:
{
"phase": "Pending",
"conditions": [
{
"type": "Initialized",
"status": "True",
"lastProbeTime": null,
"lastTransitionTime": "2022-11-16T19:49:48Z"
},
{
"type": "Ready",
"status": "False",
"lastProbeTime": null,
"lastTransitionTime": "2022-11-16T19:49:48Z",
"reason": "ContainersNotReady",
"message": "containers with unready status: [nettest-server-pod]"
},
{
"type": "ContainersReady",
"status": "False",
"lastProbeTime": null,
"lastTransitionTime": "2022-11-16T19:49:48Z",
"reason": "ContainersNotReady",
"message": "containers with unready status: [nettest-server-pod]"
},
{
"type": "PodScheduled",
"status": "True",
"lastProbeTime": null,
"lastTransitionTime": "2022-11-16T19:49:48Z"
}
],
"hostIP": "10.16.128.5",
"podIP": "10.218.2.5",
"podIPs": [
{
"ip": "10.218.2.5"
}
],
"startTime": "2022-11-16T19:49:48Z",
"containerStatuses": [
{
"name": "nettest-server-pod",
"state": {
"waiting": {
"reason": "ImagePullBackOff",
"message": "Back-off pulling image \"registry.redhat.io/rhacm2/nettest-rhel8:v0.14.0\""
}
},
"lastState": {},
"ready": false,
"restartCount": 0,
"image": "registry.redhat.io/rhacm2/nettest-rhel8:v0.14.0",
"imageID": "",
"started": false
}
],
"qosClass": "BestEffort"
}
Unexpected error:
<*errors.errorString | 0xc000348ba0>: {
s: "timed out waiting for the condition",
}
timed out waiting for the condition
occurred
goroutine 1 [running]:
github.com/submariner-io/subctl/internal/benchmark.StartThroughputTests.func1({0xc0004bc300, 0x6c3}, {0xc000348ba0?, 0xc00006d810?, 0xc00052f150?})
/remote-source/app/internal/benchmark/throughput.go:46 +0x54
github.com/onsi/gomega/internal.(*Assertion).match(0xc00153ab00, {0x3345c80, 0x46c9800}, 0x0, {0xc000bb0780, 0x1, 0x1})
/remote-source/app/vendor/github.com/onsi/gomega/internal/assertion.go:105 +0x1f0
github.com/onsi/gomega/internal.(*Assertion).NotTo(0xc00153ab00, {0x3345c80, 0x46c9800}, {0xc000bb0780, 0x1, 0x1})
/remote-source/app/vendor/github.com/onsi/gomega/internal/assertion.go:73 +0xb2
github.com/submariner-io/shipyard/test/e2e/framework.AwaitUntil({0x2eb1c64?, 0xc00060adc0?}, 0x1a?, 0x0?)
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/framework.go:562 +0xd4
github.com/submariner-io/shipyard/test/e2e/framework.(*NetworkPod).AwaitReady(0xc0005269b0)
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:137 +0xd9
github.com/submariner-io/shipyard/test/e2e/framework.(*NetworkPod).buildThroughputServerPod(0xc0005269b0)
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:425 +0x4d3
github.com/submariner-io/shipyard/test/e2e/framework.(*Framework).NewNetworkPod(0xc0008dd980, 0xc000993a70)
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:120 +0x205
github.com/submariner-io/subctl/internal/benchmark.runThroughputTest(0xc0008dd980, {0xc000010018?, 0x2f44bb2?, 0x58?, 0xc000cb7990?}, 0x1)
/remote-source/app/internal/benchmark/throughput.go:116 +0x159
github.com/submariner-io/subctl/internal/benchmark.StartThroughputTests(0x0, 0x1)
/remote-source/app/internal/benchmark/throughput.go:65 +0x23d
github.com/submariner-io/subctl/cmd/subctl.runBenchmark(0x2fef830, 0xc000c384c0, 0xc00123e5c0, 0xd0?)
/remote-source/app/cmd/subctl/benchmark.go:157 +0x32d
github.com/submariner-io/subctl/cmd/subctl.buildBenchmarkRunner.func1.1.1(0xc0012b05f0?, {0x34?, 0xc000f91440?}, {0x0?, 0x0?})
/remote-source/app/cmd/subctl/benchmark.go:92 +0x2e
github.com/submariner-io/subctl/internal/restconfig.(*Producer).RunOnSelectedContext(0xc000ef9b38, 0xc000ef9b20, {0x3354710, 0xc00029a9d0})
/remote-source/app/internal/restconfig/restconfig.go:283 +0x1ca
github.com/submariner-io/subctl/cmd/subctl.buildBenchmarkRunner.func1.1(0xc000c384c0, {0x2f?, 0xc000c6a900?}, {0x3354710, 0xc00029a9d0})
/remote-source/app/cmd/subctl/benchmark.go:90 +0xf6
github.com/submariner-io/subctl/internal/restconfig.(*Producer).RunOnSelectedContext(0xc00139fcb0, 0xc00139fc88, {0x3354710, 0xc00029a9d0})
/remote-source/app/internal/restconfig/restconfig.go:283 +0x1ca
github.com/submariner-io/subctl/cmd/subctl.buildBenchmarkRunner.func1(0x4667b60?, {0xc000725350?, 0x2, 0x3})
/remote-source/app/cmd/subctl/benchmark.go:88 +0x1af
github.com/spf13/cobra.(*Command).execute(0x4667b60, {0xc0007252c0, 0x3, 0x3})
/remote-source/app/vendor/github.com/spf13/cobra/command.go:920 +0x847
github.com/spf13/cobra.(*Command).ExecuteC(0x4665600)
/remote-source/app/vendor/github.com/spf13/cobra/command.go:1044 +0x3bc
github.com/spf13/cobra.(*Command).Execute(...)
/remote-source/app/vendor/github.com/spf13/cobra/command.go:968
github.com/submariner-io/subctl/cmd/subctl.Execute()
/remote-source/app/cmd/subctl/root.go:49 +0x25
main.main()
/remote-source/app/cmd/main.go:20 +0x17
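The pending-pod status printed in the panic above can also be checked mechanically. The following is only a minimal sketch, assuming the status object has been saved as JSON (for example the `status` field of `oc get pod <name> -o json`); the helper name `image_pull_failures` is hypothetical and is not part of subctl or shipyard.

```python
import json

# Waiting reasons that indicate the kubelet could not pull the image.
PULL_FAILURE_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def image_pull_failures(pod_status):
    """Return (container name, image) pairs stuck on an image pull failure.

    Hypothetical helper for illustration; pod_status is the pod's
    `status` object as a plain dict.
    """
    failures = []
    for cs in pod_status.get("containerStatuses", []):
        # `state` holds exactly one of waiting/running/terminated.
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") in PULL_FAILURE_REASONS:
            failures.append((cs["name"], cs["image"]))
    return failures


# Trimmed copy of the status from the panic output in this report.
status = json.loads("""
{
  "phase": "Pending",
  "containerStatuses": [
    {
      "name": "nettest-server-pod",
      "state": {"waiting": {"reason": "ImagePullBackOff"}},
      "image": "registry.redhat.io/rhacm2/nettest-rhel8:v0.14.0"
    }
  ]
}
""")

for name, image in image_pull_failures(status):
    print(f"{name}: cannot pull {image}")
```

A check like this separates a wrong image name from an unpublished image only once the reported image reference is compared with what the registry actually serves, which is the question raised in the following comments.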
@Mike, can you please take a look at the last comment?

The image name is correct now (registry.redhat.io/rhacm2/nettest-rhel8:v0.14.0). Is it already published in the registry? I would think it's not published yet.

On https://qe-jenkins-csb-skynet.apps.ocp-c1.prod.psi.redhat.com/view/ACM%202.7/job/ACM-2.7.0-Submariner-0.14.0-AWS-GCP-Globalnet/36/Test-Report/ I got "Back-off pulling image registry.redhat.io/rhacm2/nettest-rhel8:v0.14.0" when running the subctl benchmark command. However, pulling the Nettest image directly from "registry-proxy.engineering.redhat.com/rh-osbs/rhacm2-nettest-rhel8:v0.14.0" works fine:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12s default-scheduler Successfully assigned test-submariner/netshoot-cl-a to ip-10-16-214-254.us-west-1.compute.internal by ip-10-16-170-71
Normal AddedInterface 10s multus Add eth0 [10.216.2.7/23] from openshift-sdn
Normal Pulling 10s kubelet Pulling image "registry-proxy.engineering.redhat.com/rh-osbs/rhacm2-nettest-rhel8:v0.14.0"
Normal Pulled 2s kubelet Successfully pulled image "registry-proxy.engineering.redhat.com/rh-osbs/rhacm2-nettest-rhel8:v0.14.0" in 8.464611409s
Normal Created 1s kubelet Created container netshoot
Normal Started 1s kubelet Started container netshoot

Note that the subctl binary was pulled from:
oc image extract "registry-proxy.engineering.redhat.com/rh-osbs/rhacm2-subctl-rhel8:v0.14.0"

But (as reported on a Jira issue recently), the subctl there still seems unbaked, at least for the version and filename:
1 -rw-r----- 12599528 /mnt/skynet-data/skynet-env-1/subctl-vsubctl-darwin-amd64.tar.xz
1 -rw-r----- 12000016 /mnt/skynet-data/skynet-env-1/subctl-vsubctl-linux-amd64.tar.xz
1 -rw-r----- 12002148 /mnt/skynet-data/skynet-env-1/subctl-vsubctl-windows-amd64.exe.tar.xz

It seems that we need to wait for the official image then, unless you want to test this with "registry-proxy.engineering.redhat.com/rh-osbs/rhacm2-nettest-rhel8:v0.14.0", but this won't validate the default `subctl benchmark` behavior.

I'm switching back to MODIFIED as we're waiting for the image to be available.

@nmanos you won’t get registry.redhat.io/rhacm2/nettest-rhel8:v0.14.0 until 2.7 goes GA with 0.14.0. I thought you mirrored those images into a QE-specific repository; isn’t that the case?

@mbabushk can you reproduce this too?

@skitt yes, you're right. We are mirroring the images into the cluster internal registry. @nmanos please make sure to set the name of the nettest image to "nettest-rhel8" when you import the image into the cluster internal registry. I believe that's the issue.

I verified the use of the nettest-rhel8 image for the 0.14.0 release and it works fine. We (QE) just need to make sure we are importing the image with the right name.

Thanks, I imported it to the local registry, and it now works:
oc import-image -n submariner-operator nettest-rhel8:v0.14.0 --from=brew.registry.redhat.io/rh-osbs/rhacm2-nettest-rhel8:v0.14.0 --confirm
Name: nettest-rhel8
Namespace: submariner-operator
Created: Less than a second ago
Labels: <none>
Annotations: openshift.io/image.dockerRepositoryCheck=2022-12-01T14:51:07Z
Image Repository: image-registry.openshift-image-registry.svc:5000/submariner-operator/nettest-rhel8
Image Lookup: local=false
Unique Images: 1
Tags: 1
v0.14.0
tagged from brew.registry.redhat.io/rh-osbs/rhacm2-nettest-rhel8:v0.14.0
* brew.registry.redhat.io/rh-osbs/rhacm2-nettest-rhel8@sha256:efed4fca8735e8ad1cfc02091969876bb961c46ad6fcff1b02d4a68a4c464834
Less than a second ago
Image Name: nettest-rhel8:v0.14.0
Docker Image: brew.registry.redhat.io/rh-osbs/rhacm2-nettest-rhel8@sha256:efed4fca8735e8ad1cfc02091969876bb961c46ad6fcff1b02d4a68a4c464834
Name: sha256:efed4fca8735e8ad1cfc02091969876bb961c46ad6fcff1b02d4a68a4c464834
Created: Less than a second ago
Annotations: image.openshift.io/dockerLayersOrder=ascending
Image Size: 110.6MB in 2 layers
Layers: 39.38MB sha256:725a55c4212630f1b818ee1e82c5f7a9e4ed42456f19ea13b052da427cfdff82
71.25MB sha256:f3db4525bee5d3f5de0e069d826414f256f0a3f6b2fb4c2df79c52c854bd08b5
Image Created: 22 hours ago
Author: <none>
Arch: amd64
Command: /bin/bash -l
Working Dir: /app
User: <none>
Exposes Ports: <none>
Docker Labels: architecture=x86_64
build-date=2022-11-30T17:05:39
com.github.commit=f6ef77489d735185ad7d200926bc151c11e03200
com.github.url=https://github.com/submariner-io/shipyard.git
com.redhat.component=nettest-container
com.redhat.license_terms=https://www.redhat.com/agreements
description=nettest
distribution-scope=public
io.buildah.version=1.27.1
io.k8s.description=nettest
io.k8s.display-name=nettest
io.openshift.expose-services=
io.openshift.non-scalable=true
io.openshift.tags=submariner,nettest,rhel8
io.openshift.wants=
maintainer=['multi-cluster-networking']
name=rhacm2/nettest-rhel8
release=13
summary=nettest
url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhacm2/nettest-rhel8/images/v0.14.0-13
vcs-ref=66421474326c7aa6138ee3087329b87e83462408
vcs-type=git
vendor=Red Hat, Inc.
version=v0.14.0
Environment: PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
container=oci
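The fix in this thread came down to importing the image under the name that subctl expects (`nettest-rhel8`). As a rough sketch of that naming step, the helper below builds the `oc import-image` command used above from the image reference a pod is trying to pull. The function name `import_image_command` is hypothetical, and the rule that the brew build flattens `rhacm2/nettest-rhel8` into `rhacm2-nettest-rhel8` is an assumption inferred only from the single command in this report.

```python
def import_image_command(expected_image,
                         namespace="submariner-operator",
                         brew_registry="brew.registry.redhat.io/rh-osbs"):
    """Build an `oc import-image` command for the image a pod expects.

    Hypothetical helper for illustration. expected_image looks like
    "registry.redhat.io/rhacm2/nettest-rhel8:v0.14.0".
    """
    # Split off the tag, then the registry / organization / image name.
    path, tag = expected_image.rsplit(":", 1)
    _registry, org, name = path.split("/")
    # Assumption: the brew build joins "org/name" as "org-name".
    source = f"{brew_registry}/{org}-{name}:{tag}"
    return [
        "oc", "import-image",
        "-n", namespace,
        f"{name}:{tag}",          # image stream name must match what subctl pulls
        f"--from={source}",
        "--confirm",
    ]


cmd = import_image_command("registry.redhat.io/rhacm2/nettest-rhel8:v0.14.0")
print(" ".join(cmd))
```

Deriving the image stream name from the expected reference, rather than typing it by hand, avoids the mismatch that caused the ImagePullBackOff in the QE environment.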
|