Bug 1882850

Summary: Proxy CI: event for webserver-...: Readiness probe failed.. dial tcp 10.0.73.42:9000: connect: connection refused
Product: OpenShift Container Platform
Reporter: W. Trevor King <wking>
Component: Networking
Assignee: Victor Pickard <vpickard>
Networking sub component: openshift-sdn
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED WORKSFORME
Severity: medium
Priority: medium
CC: aconstan
Version: 4.6
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-10-21 18:57:43 UTC
Type: Bug

Description W. Trevor King 2020-09-25 22:24:12 UTC
Test:
[sig-network] Internal connectivity for TCP and UDP on ports 9000-9999 is allowed [Suite:openshift/conformance/parallel]

Is failing frequently in proxy CI, and occasionally in some other flavors:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=Internal+connectivity+for+TCP+and+UDP+on+ports+9000-9999+is+allowed&maxAge=24h' | grep 'failures match' | sort
endurance-e2e-aws-4.5 - 1 runs, 100% failed, 100% of failures match
periodic-ci-openshift-release-master-ocp-4.5-e2e-aws-proxy - 1 runs, 100% failed, 100% of failures match
periodic-ci-openshift-release-master-ocp-4.6-e2e-aws-proxy - 11 runs, 100% failed, 82% of failures match
pull-ci-openshift-cluster-api-provider-gcp-master-e2e-gcp - 4 runs, 100% failed, 25% of failures match
...
pull-ci-openshift-sdn-release-4.5-e2e-gcp - 4 runs, 100% failed, 25% of failures match
release-openshift-ocp-installer-e2e-gcp-ovn-4.6 - 11 runs, 27% failed, 33% of failures match
release-openshift-ocp-installer-e2e-gcp-rt-4.6 - 10 runs, 60% failed, 33% of failures match
release-openshift-ocp-installer-e2e-openstack-4.4 - 5 runs, 80% failed, 25% of failures match
release-openshift-ocp-installer-e2e-openstack-4.6 - 12 runs, 42% failed, 20% of failures match
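
To compare jobs, it can help to turn each summary line into approximate run counts. The following is only a sketch: the function name parse_summary and the returned field names are made up for illustration, and because the search page reports rounded percentages, the counts are estimates rather than exact figures.

```python
import re

# One summary line looks like:
#   "<job> - <N> runs, <F>% failed, <M>% of failures match"
LINE_RE = re.compile(
    r"^(?P<job>\S+) - (?P<runs>\d+) runs, "
    r"(?P<failed>\d+)% failed, (?P<match>\d+)% of failures match$"
)


def parse_summary(line):
    """Return estimated counts for one summary line, or None if it does not parse.

    parse_summary is a hypothetical helper, not part of any CI tooling.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    runs = int(m.group("runs"))
    # Percentages on the page are rounded, so these are estimates.
    failed = runs * int(m.group("failed")) / 100
    matching = failed * int(m.group("match")) / 100
    return {
        "job": m.group("job"),
        "runs": runs,
        "failed_runs": round(failed),
        "matching_runs": round(matching),
    }


if __name__ == "__main__":
    sample = (
        "periodic-ci-openshift-release-master-ocp-4.6-e2e-aws-proxy"
        " - 11 runs, 100% failed, 82% of failures match"
    )
    print(parse_summary(sample))
```

Lines that do not follow the summary shape, such as the "..." placeholder, are returned as None so they can be skipped.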

A recent proxy failure was [1], which had the not-very-useful failure message:

fail [github.com/openshift/origin/test/extended/networking/internal_ports.go:113]: Unexpected error:
    <*errors.errorString | 0xc0003528a0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

stdout for the test-case was more useful:

Sep 25 18:23:12.952: INFO: At 2020-09-25 18:18:46 +0000 UTC - event for webserver-sxfzx: {kubelet ip-10-0-73-42.us-west-1.compute.internal} Unhealthy: Readiness probe failed: Get "http://10.0.73.42:9000/": dial tcp 10.0.73.42:9000: connect: connection refused
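
In the event above, the probe address 10.0.73.42 matches the kubelet's node name (ip-10-0-73-42). "connection refused" usually means the host was reachable but nothing accepted the connection on that port, which is different from a timeout, where the connection attempt goes unanswered. A minimal sketch for telling those cases apart from a debug shell, assuming Python is available there; probe_tcp is a hypothetical helper and not part of the test suite:

```python
import socket


def probe_tcp(host, port, timeout=1.0):
    """Classify one TCP connect attempt as 'open', 'refused', or 'timeout'.

    Hypothetical helper for manual debugging; other socket errors propagate.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered, but no listener accepted the connection.
        return "refused"
    except socket.timeout:
        # No answer within the timeout (for example, dropped packets).
        return "timeout"


if __name__ == "__main__":
    # Address and port taken from the readiness-probe event in this bug.
    print(probe_tcp("10.0.73.42", 9000))
```

A "refused" result would point at the webserver not listening (yet) on the port, while "timeout" would point more toward filtering between the prober and the target. Neither outcome was confirmed in this bug.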

I don't know why the connection to port 9000 is being refused.  Perhaps the pod should have the HTTP(S)_PROXY environment variables set and trusted CAs mounted, like bug 1882486?

Once we get this bug sorted, we should drop this skip [2].

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ocp-4.6-e2e-aws-proxy/1309544921245421568
[2]: https://github.com/openshift/release/blob/4c9dd40104656afb73e609e3c3d39c0c86bc57b4/ci-operator/step-registry/openshift/e2e/aws/proxy/openshift-e2e-aws-proxy-workflow.yaml#L14

Comment 1 Victor Pickard 2020-10-21 18:57:43 UTC
I am not able to reproduce this failure. I spun up 2 different AWS clusters on cluster-bot with the proxy option as follows:

launch 4.6.0-0.nightly aws,proxy

Then, I ran this specific test (4 times), and it passed each time:

[vpickard@rippleRider$][~/go/src/github.com/openshift/origin] (master +$%>)$ KUBE_SSH_USER=core KUBE_SSH_KEY_PATH=~/openshift/.ssh/id_rsa.pub ./openshift-tests run-test "[sig-network] Networking should provide Internet connection for containers [Feature:Networking-IPv4] [Skipped:azure] [Suite:openshift/conformance/parallel] [Suite:k8s]"
I1021 14:54:34.476129  242803 test_context.go:429] Tolerating taints "node-role.kubernetes.io/master" when considering if nodes are ready
Oct 21 14:54:34.514: INFO: Waiting up to 30m0s for all (but 100) nodes to be schedulable
Oct 21 14:54:34.560: INFO: Waiting up to 10m0s for all pods (need at least 0) in namespace 'kube-system' to be running and ready
Oct 21 14:54:34.709: INFO: 0 / 0 pods in namespace 'kube-system' are running and ready (0 seconds elapsed)
Oct 21 14:54:34.709: INFO: expected 0 pod replicas in namespace 'kube-system', 0 are Running and Ready.
Oct 21 14:54:34.709: INFO: Waiting up to 5m0s for all daemonsets in namespace 'kube-system' to start
Oct 21 14:54:34.759: INFO: e2e test version: v0.0.0-master+$Format:%h$
Oct 21 14:54:34.792: INFO: kube-apiserver version: v1.19.0+d59ce34
Oct 21 14:54:34.836: INFO: Cluster IP family: ipv4
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1429
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1429
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/test.go:59
[BeforeEach] [sig-network] Networking
  k8s.io/kubernetes.0/test/e2e/framework/framework.go:174
STEP: Creating a kubernetes client
STEP: Building a namespace api object, basename nettest
Oct 21 14:54:34.990: INFO: About to run a Kube e2e test, ensuring namespace is privileged
Oct 21 14:54:35.418: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [sig-network] Networking
  k8s.io/kubernetes.0/test/e2e/network/networking.go:94
STEP: Executing a successful http request from the external internet
[It] should provide Internet connection for containers [Feature:Networking-IPv4] [Skipped:azure] [Suite:openshift/conformance/parallel] [Suite:k8s]
  k8s.io/kubernetes.0/test/e2e/network/networking.go:108
STEP: Running container which tries to connect to 8.8.8.8
Oct 21 14:54:35.794: INFO: Waiting up to 5m0s for pod "connectivity-test" in namespace "e2e-nettest-4122" to be "Succeeded or Failed"
Oct 21 14:54:35.832: INFO: Pod "connectivity-test": Phase="Pending", Reason="", readiness=false. Elapsed: 38.812047ms
Oct 21 14:54:37.879: INFO: Pod "connectivity-test": Phase="Pending", Reason="", readiness=false. Elapsed: 2.085627985s
Oct 21 14:54:39.924: INFO: Pod "connectivity-test": Phase="Succeeded", Reason="", readiness=false. Elapsed: 4.130012739s
STEP: Saw pod success
Oct 21 14:54:39.924: INFO: Pod "connectivity-test" satisfied condition "Succeeded or Failed"
[AfterEach] [sig-network] Networking
  k8s.io/kubernetes.0/test/e2e/framework/framework.go:175
Oct 21 14:54:39.924: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-nettest-4122" for this suite.
Oct 21 14:54:40.099: INFO: Running AfterSuite actions on all nodes
Oct 21 14:54:40.099: INFO: Running AfterSuite actions on node 1


Finally, I also ran the entire conformance suite, and verified that this test passed in the suite also:

started: (5/962/2431) "[sig-network] Networking should provide Internet connection for containers [Feature:Networking-IPv4] [Skipped:azure] [Suite:openshift/conformance/parallel] [Suite:k8s]"
passed: (16.6s) 2020-10-21T14:48:21 "[sig-network] Networking should provide Internet connection for containers [Feature:Networking-IPv4] [Skipped:azure] [Suite:openshift/conformance/parallel] [Suite:k8s]"


I'm going to close this as works for me. Please reopen if you see this issue again. Thanks.

Comment 2 Victor Pickard 2020-10-21 19:01:48 UTC
(In reply to Victor Pickard from comment #1)


Sorry, wrong test output for this BZ. BZ 1882845 is similar; let me update this with the correct test.

Comment 3 Victor Pickard 2020-10-21 19:05:10 UTC
I am not able to reproduce this failure. I spun up 2 different AWS clusters on cluster-bot with the proxy option as follows:

launch 4.6.0-0.nightly aws,proxy

Then, I ran this specific test (4 times), and it passed each time:

[vpickard@rippleRider$][~/go/src/github.com/openshift/origin] (master +$%>)$ KUBE_SSH_USER=core KUBE_SSH_KEY_PATH=~/openshift/.ssh/id_rsa.pub ./openshift-tests run-test "[sig-network] Internal connectivity for TCP and UDP on ports 9000-9999 is allowed [Suite:openshift/conformance/parallel]"
I1021 15:02:07.454222  243380 test_context.go:429] Tolerating taints "node-role.kubernetes.io/master" when considering if nodes are ready
Oct 21 15:02:07.508: INFO: Waiting up to 30m0s for all (but 100) nodes to be schedulable
Oct 21 15:02:07.561: INFO: Waiting up to 10m0s for all pods (need at least 0) in namespace 'kube-system' to be running and ready
Oct 21 15:02:07.719: INFO: 0 / 0 pods in namespace 'kube-system' are running and ready (0 seconds elapsed)
Oct 21 15:02:07.719: INFO: expected 0 pod replicas in namespace 'kube-system', 0 are Running and Ready.
Oct 21 15:02:07.719: INFO: Waiting up to 5m0s for all daemonsets in namespace 'kube-system' to start
Oct 21 15:02:07.771: INFO: e2e test version: v0.0.0-master+$Format:%h$
Oct 21 15:02:07.803: INFO: kube-apiserver version: v1.19.0+d59ce34
Oct 21 15:02:07.846: INFO: Cluster IP family: ipv4
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1429
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1429
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/test.go:59
[BeforeEach] [sig-network] Internal connectivity
  k8s.io/kubernetes.0/test/e2e/framework/framework.go:174
STEP: Creating a kubernetes client
STEP: Building a namespace api object, basename k8s-nettest
Oct 21 15:02:07.957: INFO: About to run a Kube e2e test, ensuring namespace is privileged
Oct 21 15:02:08.365: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[It] for TCP and UDP on ports 9000-9999 is allowed [Suite:openshift/conformance/parallel]
  github.com/openshift/origin/test/extended/networking/internal_ports.go:38
Oct 21 15:02:08.518: INFO: waiting for daemonset: v1.DaemonSetStatus{CurrentNumberScheduled:0, NumberMisscheduled:0, DesiredNumberScheduled:0, NumberReady:0, ObservedGeneration:0, UpdatedNumberScheduled:0, NumberAvailable:0, NumberUnavailable:0, CollisionCount:(*int32)(nil), Conditions:[]v1.DaemonSetCondition(nil)}
Oct 21 15:02:13.564: INFO: waiting for daemonset: v1.DaemonSetStatus{CurrentNumberScheduled:6, NumberMisscheduled:0, DesiredNumberScheduled:6, NumberReady:0, ObservedGeneration:1, UpdatedNumberScheduled:6, NumberAvailable:0, NumberUnavailable:6, CollisionCount:(*int32)(nil), Conditions:[]v1.DaemonSetCondition(nil)}
Oct 21 15:02:18.638: INFO: waiting for daemonset: v1.DaemonSetStatus{CurrentNumberScheduled:6, NumberMisscheduled:0, DesiredNumberScheduled:6, NumberReady:0, ObservedGeneration:1, UpdatedNumberScheduled:6, NumberAvailable:0, NumberUnavailable:6, CollisionCount:(*int32)(nil), Conditions:[]v1.DaemonSetCondition(nil)}
Oct 21 15:02:23.565: INFO: waiting for daemonset: v1.DaemonSetStatus{CurrentNumberScheduled:6, NumberMisscheduled:0, DesiredNumberScheduled:6, NumberReady:2, ObservedGeneration:1, UpdatedNumberScheduled:6, NumberAvailable:2, NumberUnavailable:4, CollisionCount:(*int32)(nil), Conditions:[]v1.DaemonSetCondition(nil)}
Oct 21 15:02:28.571: INFO: daemonset ready: v1.DaemonSetStatus{CurrentNumberScheduled:6, NumberMisscheduled:0, DesiredNumberScheduled:6, NumberReady:6, ObservedGeneration:1, UpdatedNumberScheduled:6, NumberAvailable:6, NumberUnavailable:0, CollisionCount:(*int32)(nil), Conditions:[]v1.DaemonSetCondition(nil)}
[AfterEach] [sig-network] Internal connectivity
  k8s.io/kubernetes.0/test/e2e/framework/framework.go:175
Oct 21 15:02:33.051: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-k8s-nettest-5638" for this suite.
Oct 21 15:02:33.230: INFO: Running AfterSuite actions on all nodes
Oct 21 15:02:33.230: INFO: Running AfterSuite actions on node 1
[vpickard@rippleRider$][~/go/src/github.com/openshift/origin] (master +$%>)$ 
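
The log above shows the test polling the DaemonSet status roughly every 5 seconds until NumberReady reaches DesiredNumberScheduled; when such a poll never succeeds, the result is the "timed out waiting for the condition" error quoted in the description. A rough sketch of that pattern follows. The names wait_for and daemonset_ready, the interval, and the exact readiness condition are assumptions inferred from the log lines, not copied from internal_ports.go.

```python
import time


def wait_for(condition, interval=5.0, timeout=60.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call condition() every `interval` seconds until it returns True.

    Raises TimeoutError once `timeout` seconds have elapsed without success.
    Hypothetical helper; sleep and clock are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return
        if clock() >= deadline:
            raise TimeoutError("timed out waiting for the condition")
        sleep(interval)


def daemonset_ready(status):
    """Assumed readiness rule: at least one pod desired, and all of them ready."""
    desired = status["DesiredNumberScheduled"]
    return desired > 0 and status["NumberReady"] == desired


if __name__ == "__main__":
    # (DesiredNumberScheduled, NumberReady) pairs as seen in the log above.
    observed = iter([(0, 0), (6, 0), (6, 0), (6, 2), (6, 6)])

    def check():
        desired, ready = next(observed)
        return daemonset_ready(
            {"DesiredNumberScheduled": desired, "NumberReady": ready}
        )

    wait_for(check, interval=0.0, timeout=10.0)
    print("daemonset ready")
```

In a failing run, a pod whose readiness probe keeps getting "connection refused" would keep NumberReady below the desired count, so a loop like this would end in the timeout branch.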




This test also passes in the conformance/parallel test suite:

started: (6/1184/2431) "[sig-network] Internal connectivity for TCP and UDP on ports 9000-9999 is allowed [Suite:openshift/conformance/parallel]"
passed: (55.2s) 2020-10-21T14:50:43 "[sig-network] Internal connectivity for TCP and UDP on ports 9000-9999 is allowed [Suite:openshift/conformance/parallel]"

Please reopen if you see this issue again. Thanks.