Bug 1882845
| Summary: | proxy CI: pod "connectivity-test" failed with status... containers with unready status... connectivity-test-container | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | W. Trevor King <wking> |
| Component: | Networking | Assignee: | Victor Pickard <vpickard> |
| Networking sub component: | openshift-sdn | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED NOTABUG | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | aconstan |
| Version: | 4.6 | Keywords: | Reopened |
| Target Milestone: | --- | | |
| Target Release: | 4.7.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-11-02 15:56:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Description
W. Trevor King
2020-09-25 22:14:05 UTC

I am not able to reproduce this failure. I spun up 2 different AWS clusters on cluster-bot with the proxy option as follows:

launch 4.6.0-0.nightly aws,proxy

Then, I ran this specific test (4 times), and it passed each time:

[vpickard@rippleRider$][~/go/src/github.com/openshift/origin] (master +$%>)$ KUBE_SSH_USER=core KUBE_SSH_KEY_PATH=~/openshift/.ssh/id_rsa.pub ./openshift-tests run-test "[sig-network] Networking should provide Internet connection for containers [Feature:Networking-IPv4] [Skipped:azure] [Suite:openshift/conformance/parallel] [Suite:k8s]"
I1021 14:54:34.476129 242803 test_context.go:429] Tolerating taints "node-role.kubernetes.io/master" when considering if nodes are ready
Oct 21 14:54:34.514: INFO: Waiting up to 30m0s for all (but 100) nodes to be schedulable
Oct 21 14:54:34.560: INFO: Waiting up to 10m0s for all pods (need at least 0) in namespace 'kube-system' to be running and ready
Oct 21 14:54:34.709: INFO: 0 / 0 pods in namespace 'kube-system' are running and ready (0 seconds elapsed)
Oct 21 14:54:34.709: INFO: expected 0 pod replicas in namespace 'kube-system', 0 are Running and Ready.
Oct 21 14:54:34.709: INFO: Waiting up to 5m0s for all daemonsets in namespace 'kube-system' to start
Oct 21 14:54:34.759: INFO: e2e test version: v0.0.0-master+$Format:%h$
Oct 21 14:54:34.792: INFO: kube-apiserver version: v1.19.0+d59ce34
Oct 21 14:54:34.836: INFO: Cluster IP family: ipv4
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1429
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1429
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/test.go:59
[BeforeEach] [sig-network] Networking
  k8s.io/kubernetes.0/test/e2e/framework/framework.go:174
STEP: Creating a kubernetes client
STEP: Building a namespace api object, basename nettest
Oct 21 14:54:34.990: INFO: About to run a Kube e2e test, ensuring namespace is privileged
Oct 21 14:54:35.418: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [sig-network] Networking
  k8s.io/kubernetes.0/test/e2e/network/networking.go:94
STEP: Executing a successful http request from the external internet
[It] should provide Internet connection for containers [Feature:Networking-IPv4] [Skipped:azure] [Suite:openshift/conformance/parallel] [Suite:k8s]
  k8s.io/kubernetes.0/test/e2e/network/networking.go:108
STEP: Running container which tries to connect to 8.8.8.8
Oct 21 14:54:35.794: INFO: Waiting up to 5m0s for pod "connectivity-test" in namespace "e2e-nettest-4122" to be "Succeeded or Failed"
Oct 21 14:54:35.832: INFO: Pod "connectivity-test": Phase="Pending", Reason="", readiness=false. Elapsed: 38.812047ms
Oct 21 14:54:37.879: INFO: Pod "connectivity-test": Phase="Pending", Reason="", readiness=false. Elapsed: 2.085627985s
Oct 21 14:54:39.924: INFO: Pod "connectivity-test": Phase="Succeeded", Reason="", readiness=false. Elapsed: 4.130012739s
STEP: Saw pod success
Oct 21 14:54:39.924: INFO: Pod "connectivity-test" satisfied condition "Succeeded or Failed"
[AfterEach] [sig-network] Networking
  k8s.io/kubernetes.0/test/e2e/framework/framework.go:175
Oct 21 14:54:39.924: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-nettest-4122" for this suite.
Oct 21 14:54:40.099: INFO: Running AfterSuite actions on all nodes
Oct 21 14:54:40.099: INFO: Running AfterSuite actions on node 1

Finally, I also ran the entire conformance suite, and verified that this test passed in the suite also:

started: (5/962/2431) "[sig-network] Networking should provide Internet connection for containers [Feature:Networking-IPv4] [Skipped:azure] [Suite:openshift/conformance/parallel] [Suite:k8s]"

passed: (16.6s) 2020-10-21T14:48:21 "[sig-network] Networking should provide Internet connection for containers [Feature:Networking-IPv4] [Skipped:azure] [Suite:openshift/conformance/parallel] [Suite:k8s]"

I'm going to close this as works for me. Please reopen if you see this issue again. Thanks.

Re-opening this bz to re-enable the skipped test.

I see the job failed again on the PR:

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/12956/rehearse-12956-pull-ci-openshift-installer-master-e2e-aws-proxy/1321119347678121984

[It] should provide Internet connection for containers [Feature:Networking-IPv4] [Skipped:azure] [Suite:openshift/conformance/parallel] [Suite:k8s]
  k8s.io/kubernetes.0/test/e2e/network/networking.go:108
STEP: Running container which tries to connect to 8.8.8.8
Oct 27 17:31:50.431: INFO: Waiting up to 5m0s for pod "connectivity-test" in namespace "e2e-nettest-352" to be "Succeeded or Failed"
Oct 27 17:31:50.516: INFO: Pod "connectivity-test": Phase="Pending", Reason="", readiness=false. Elapsed: 84.491459ms
Oct 27 17:31:52.602: INFO: Pod "connectivity-test": Phase="Pending", Reason="", readiness=false. Elapsed: 2.170169559s
Oct 27 17:31:54.687: INFO: Pod "connectivity-test": Phase="Running", Reason="", readiness=true. Elapsed: 4.255197172s
Oct 27 17:31:56.785: INFO: Pod "connectivity-test": Phase="Running", Reason="", readiness=true. Elapsed: 6.353546045s
Oct 27 17:31:58.870: INFO: Pod "connectivity-test": Phase="Running", Reason="", readiness=true. Elapsed: 8.438446377s
Oct 27 17:32:00.956: INFO: Pod "connectivity-test": Phase="Running", Reason="", readiness=true. Elapsed: 10.524378568s
Oct 27 17:32:03.043: INFO: Pod "connectivity-test": Phase="Running", Reason="", readiness=true. Elapsed: 12.61148537s
Oct 27 17:32:05.128: INFO: Pod "connectivity-test": Phase="Running", Reason="", readiness=true. Elapsed: 14.696920292s
Oct 27 17:32:07.214: INFO: Pod "connectivity-test": Phase="Running", Reason="", readiness=true. Elapsed: 16.782603616s
Oct 27 17:32:09.299: INFO: Pod "connectivity-test": Phase="Running", Reason="", readiness=true. Elapsed: 18.867396968s
Oct 27 17:32:11.384: INFO: Pod "connectivity-test": Phase="Running", Reason="", readiness=true. Elapsed: 20.952635126s
Oct 27 17:32:13.469: INFO: Pod "connectivity-test": Phase="Running", Reason="", readiness=true. Elapsed: 23.037589928s
Oct 27 17:32:15.554: INFO: Pod "connectivity-test": Phase="Running", Reason="", readiness=true. Elapsed: 25.122667677s
Oct 27 17:32:17.639: INFO: Pod "connectivity-test": Phase="Running", Reason="", readiness=true. Elapsed: 27.207716518s
Oct 27 17:32:19.724: INFO: Pod "connectivity-test": Phase="Running", Reason="", readiness=true. Elapsed: 29.292483653s
Oct 27 17:32:21.810: INFO: Pod "connectivity-test": Phase="Running", Reason="", readiness=true. Elapsed: 31.378323027s
Oct 27 17:32:23.919: INFO: Pod "connectivity-test": Phase="Running", Reason="", readiness=true. Elapsed: 33.487076444s
Oct 27 17:32:26.005: INFO: Pod "connectivity-test": Phase="Failed", Reason="", readiness=false. Elapsed: 35.573639804s
Oct 27 17:32:26.103: INFO: pod e2e-nettest-352/connectivity-test logs: nc: connect to 8.8.8.8 port 53 (tcp) timed out: Operation in progress

I spun up another AWS cluster, with the proxy option, and ran the test by itself, as well as the entire suite of tests. Both passed.

So, what is different in the cluster config in the CI job where the PR is failing vs. the AWS cluster with proxy option, like below?

Trevor, any ideas on what the difference in the cluster config may be between the two clusters?

Launch cluster
===============
launch 4.7.0-0.ci aws,proxy

Test passing when running entire suite
=======================================
KUBE_SSH_USER=core KUBE_SSH_KEY_PATH=~/openshift/.ssh/id_rsa.pub ./openshift-tests run --junit-dir $HOME/openshift-test-runs -o test.out --include-success openshift/network/stress

[It] should provide Internet connection for containers [Feature:Networking-IPv4] [Skipped:azure] [Suite:openshift/conformance/parallel] [Suite:k8s]
  k8s.io/kubernetes.0/test/e2e/network/networking.go:108
STEP: Running container which tries to connect to 8.8.8.8
Oct 28 10:47:42.602: INFO: Waiting up to 5m0s for pod "connectivity-test" in namespace "e2e-nettest-7906" to be "Succeeded or Failed"
Oct 28 10:47:42.651: INFO: Pod "connectivity-test": Phase="Pending", Reason="", readiness=false. Elapsed: 49.053086ms
Oct 28 10:47:44.856: INFO: Pod "connectivity-test": Phase="Pending", Reason="", readiness=false. Elapsed: 2.254526735s
Oct 28 10:47:46.907: INFO: Pod "connectivity-test": Phase="Pending", Reason="", readiness=false. Elapsed: 4.305084648s
Oct 28 10:47:48.976: INFO: Pod "connectivity-test": Phase="Pending", Reason="", readiness=false. Elapsed: 6.374075053s
Oct 28 10:47:51.038: INFO: Pod "connectivity-test": Phase="Succeeded", Reason="", readiness=false. Elapsed: 8.435686744s
STEP: Saw pod success
Oct 28 10:47:51.038: INFO: Pod "connectivity-test" satisfied condition "Succeeded or Failed"
[AfterEach] [sig-network] Networking
  k8s.io/kubernetes.0/test/e2e/framework/framework.go:175
Oct 28 10:47:51.038: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-nettest-7906" for this suite.
Oct 28 10:47:51.311: INFO: Running AfterSuite actions on all nodes
Oct 28 10:47:51.311: INFO: Running AfterSuite actions on node 1

passed: (12.9s) 2020-10-28T14:47:51 "[sig-network] Networking should provide Internet connection for containers [Feature:Networking-IPv4] [Skipped:azure] [Suite:openshift/conformance/parallel] [Suite:k8s]"

Run just the one test
=====================
KUBE_SSH_USER=core KUBE_SSH_KEY_PATH=~/openshift/.ssh/id_rsa.pub ./openshift-tests run-test "[sig-network] Networking should provide Internet connection for containers [Feature:Networking-IPv4] [Skipped:azure] [Suite:openshift/conformance/parallel] [Suite:k8s]"

KUBE_SSH_USER=core KUBE_SSH_KEY_PATH=~/openshift/.ssh/id_rsa.pub ./openshift-tests run-test "[sig-network] Networking should provide Internet connection for containers [Feature:Networking-IPv4] [Skipped:azure] [Suite:openshift/conformance/parallel] [Suite:k8s]"
I1028 11:14:32.989870 1596097 test_context.go:429] Tolerating taints "node-role.kubernetes.io/master" when considering if nodes are ready
Oct 28 11:14:33.039: INFO: Waiting up to 30m0s for all (but 100) nodes to be schedulable
Oct 28 11:14:33.139: INFO: Waiting up to 10m0s for all pods (need at least 0) in namespace 'kube-system' to be running and ready
Oct 28 11:14:33.299: INFO: 0 / 0 pods in namespace 'kube-system' are running and ready (0 seconds elapsed)
Oct 28 11:14:33.299: INFO: expected 0 pod replicas in namespace 'kube-system', 0 are Running and Ready.
Oct 28 11:14:33.299: INFO: Waiting up to 5m0s for all daemonsets in namespace 'kube-system' to start
Oct 28 11:14:33.356: INFO: e2e test version: v0.0.0-master+$Format:%h$
Oct 28 11:14:33.408: INFO: kube-apiserver version: v1.19.0-rc.2.1099+e67f5dcb92ff67-dirty
Oct 28 11:14:33.456: INFO: Cluster IP family: ipv4
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1429
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1429
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/test.go:59
[BeforeEach] [sig-network] Networking
  k8s.io/kubernetes.0/test/e2e/framework/framework.go:174
STEP: Creating a kubernetes client
STEP: Building a namespace api object, basename nettest
Oct 28 11:14:33.625: INFO: About to run a Kube e2e test, ensuring namespace is privileged
Oct 28 11:14:34.590: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [sig-network] Networking
  k8s.io/kubernetes.0/test/e2e/network/networking.go:94
STEP: Executing a successful http request from the external internet
[It] should provide Internet connection for containers [Feature:Networking-IPv4] [Skipped:azure] [Suite:openshift/conformance/parallel] [Suite:k8s]
  k8s.io/kubernetes.0/test/e2e/network/networking.go:108
STEP: Running container which tries to connect to 8.8.8.8
Oct 28 11:14:35.085: INFO: Waiting up to 5m0s for pod "connectivity-test" in namespace "e2e-nettest-4395" to be "Succeeded or Failed"
Oct 28 11:14:35.132: INFO: Pod "connectivity-test": Phase="Pending", Reason="", readiness=false. Elapsed: 46.998249ms
Oct 28 11:14:37.189: INFO: Pod "connectivity-test": Phase="Pending", Reason="", readiness=false. Elapsed: 2.104139622s
Oct 28 11:14:39.248: INFO: Pod "connectivity-test": Phase="Succeeded", Reason="", readiness=false. Elapsed: 4.163717026s
STEP: Saw pod success
Oct 28 11:14:39.249: INFO: Pod "connectivity-test" satisfied condition "Succeeded or Failed"
[AfterEach] [sig-network] Networking
  k8s.io/kubernetes.0/test/e2e/framework/framework.go:175
Oct 28 11:14:39.249: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-nettest-4395" for this suite.
Oct 28 11:14:39.467: INFO: Running AfterSuite actions on all nodes
Oct 28 11:14:39.469: INFO: Running AfterSuite actions on node 1

> launch 4.7.0-0.ci aws,proxy

cluster-bot does not create clusters with the blackholed private subnets we get in the CI proxy job [1]. That means that proxy requests in cluster-bot clusters are currently "we would like you to go through this proxy, but if you go directly that will work too". The CI jobs, on the other hand, are "only way off the cluster is through the proxy. You will fail if you try to go directly".

[1]: https://github.com/openshift/release/tree/92ab42c725b305355dd0c250cf59608914f95712/ci-operator/step-registry/ipi/conf/aws/blackholenetwork

The test that is failing is part of the K8s E2E network test suite. This test suite is not expected to consume the proxy settings.
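
To illustrate why the same test passes on cluster-bot and fails in the CI proxy job, here is a minimal Python sketch of a direct TCP probe, similar in spirit to the nc check seen in the failing pod's log. It is not how the e2e test is implemented; the function name can_connect_directly and the 5-second default timeout are hypothetical, chosen only for illustration. The point is that a raw socket connection does not look at HTTP_PROXY or HTTPS_PROXY, so on a cluster whose private subnets are blackholed it has no route out and can be expected to time out.

```python
import socket


def can_connect_directly(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a direct TCP connection to host:port succeeds.

    Hypothetical helper for illustration. It opens a plain TCP socket and
    never consults proxy environment variables, so it only succeeds when
    the caller has a direct route to the target.
    """
    try:
        # create_connection resolves the address and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (e.g. a blackholed route) as well as refused or
        # unreachable destinations.
        return False
```

Under these assumptions, calling can_connect_directly("8.8.8.8", 53) from a pod would likely return True on a cluster-bot proxy cluster, where direct egress still works, and False in the CI proxy job, where the only way off the cluster is through the proxy.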