The test "[sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should pass the gRPC interoperability tests [Suite:openshift/conformance/parallel/minimal]" is failing, especially on the Azure UPI jobs: https://search.ci.openshift.org/?search=The+HAProxy+router+should+pass+the+gRPC+interoperability+tests+&maxAge=96h

For example:
https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/5729/pull-ci-openshift-installer-master-e2e-azure-upi/1506106488593059840/build-log.txt
https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/5615/pull-ci-openshift-installer-master-e2e-azure-upi/1506396755300716544/build-log.txt

It looks like new router shards are not rolling out:
```
DeploymentAvailable:True DeploymentReplicasAllAvailable:True DeploymentReplicasMinAvailable:True LoadBalancerManaged:True LoadBalancerReady:True PodsScheduled:True Progressing:False Upgradeable:True], retrying...
Mar 23 01:11:14.197: INFO: ingresscontroller openshift-ingress-operator/e2e-test-router-http2-7mbr6 conditions not met; wanted map[Admitted:True Available:True DNSManaged:True DNSReady:True LoadBalancerManaged:True LoadBalancerReady:True], got map[Admitted:True Available:True DNSManaged:False Degraded:False DeploymentAvailable:True DeploymentReplicasAllAvailable:True DeploymentReplicasMinAvailable:True LoadBalancerManaged:True LoadBalancerReady:True PodsScheduled:True Progressing:False Upgradeable:True], retrying...
Mar 23 01:11:17.197: INFO: ingresscontroller openshift-ingress-operator/e2e-test-router-http2-7mbr6 conditions not met; wanted map[Admitted:True Available:True DNSManaged:True DNSReady:True LoadBalancerManaged:True LoadBalancerReady:True], got map[Admitted:True Available:True DNSManaged:False Degraded:False DeploymentAvailable:True DeploymentReplicasAllAvailable:True DeploymentReplicasMinAvailable:True LoadBalancerManaged:True LoadBalancerReady:True PodsScheduled:True Progressing:False Upgradeable:True], retrying...
Mar 23 01:11:20.197: INFO: ingresscontroller openshift-ingress-operator/e2e-test-router-http2-7mbr6 conditions not met; wanted map[Admitted:True Available:True DNSManaged:True DNSReady:True LoadBalancerManaged:True LoadBalancerReady:True], got map[Admitted:True Available:True DNSManaged:False Degraded:False DeploymentAvailable:True DeploymentReplicasAllAvailable:True DeploymentReplicasMinAvailable:True LoadBalancerManaged:True LoadBalancerReady:True PodsScheduled:True Progressing:False Upgradeable:True], retrying...
Mar 23 01:11:20.244: INFO: ingresscontroller openshift-ingress-operator/e2e-test-router-http2-7mbr6 conditions not met; wanted map[Admitted:True Available:True DNSManaged:True DNSReady:True LoadBalancerManaged:True LoadBalancerReady:True], got map[Admitted:True Available:True DNSManaged:False Degraded:False DeploymentAvailable:True DeploymentReplicasAllAvailable:True DeploymentReplicasMinAvailable:True LoadBalancerManaged:True LoadBalancerReady:True PodsScheduled:True Progressing:False Upgradeable:True], retrying...
[AfterEach] [sig-network-edge][Conformance][Area:Networking][Feature:Router]
  github.com/openshift/origin/test/extended/util/client.go:138
STEP: Collecting events from namespace "e2e-test-router-http2-7mbr6".
STEP: Found 5 events.
Mar 23 01:11:20.290: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for http2: { } Scheduled: Successfully assigned e2e-test-router-http2-7mbr6/http2 to ci-op-xq8gq76b-ba9fa-sb26r-worker-centralus-2
Mar 23 01:11:20.290: INFO: At 2022-03-23 01:01:16 +0000 UTC - event for http2: {multus } AddedInterface: Add eth0 [10.128.2.96/23] from openshift-sdn
Mar 23 01:11:20.290: INFO: At 2022-03-23 01:01:16 +0000 UTC - event for http2: {kubelet ci-op-xq8gq76b-ba9fa-sb26r-worker-centralus-2} Pulled: Container image "registry.build04.ci.openshift.org/ci-op-xq8gq76b/stable@sha256:067d833aa907b6682b037534bef37d344d64072e26fc9abd3a63502a278bea12" already present on machine
Mar 23 01:11:20.290: INFO: At 2022-03-23 01:01:16 +0000 UTC - event for http2: {kubelet ci-op-xq8gq76b-ba9fa-sb26r-worker-centralus-2} Created: Created container server
Mar 23 01:11:20.290: INFO: At 2022-03-23 01:01:16 +0000 UTC - event for http2: {kubelet ci-op-xq8gq76b-ba9fa-sb26r-worker-centralus-2} Started: Started container server
Mar 23 01:11:20.335: INFO: POD NODE PHASE GRACE CONDITIONS
Mar 23 01:11:20.335: INFO: http2 ci-op-xq8gq76b-ba9fa-sb26r-worker-centralus-2 Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2022-03-23 01:01:14 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2022-03-23 01:01:44 +0000 UTC } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2022-03-23 01:01:44 +0000 UTC } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2022-03-23 01:01:14 +0000 UTC }]
Mar 23 01:11:20.335: INFO:
Mar 23 01:11:20.512: INFO: skipping dumping cluster info - cluster too large
Mar 23 01:11:20.566: INFO: Deleted {user.openshift.io/v1, Resource=users e2e-test-router-http2-7mbr6-user}, err: <nil>
Mar 23 01:11:20.619: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients e2e-client-e2e-test-router-http2-7mbr6}, err: <nil>
Mar 23 01:11:20.669: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens sha256~4Cc6mrEaInLesaj5AuAO8hXDt-2cpvWPp5MwF04L8Co}, err: <nil>
[AfterEach] [sig-network-edge][Conformance][Area:Networking][Feature:Router]
  github.com/openshift/origin/test/extended/util/client.go:139
STEP: Destroying namespace "e2e-test-router-http2-7mbr6" for this suite.
[AfterEach] [sig-network-edge][Conformance][Area:Networking][Feature:Router]
  github.com/openshift/origin/test/extended/router/http2.go:73
fail [github.com/openshift/origin/test/extended/router/http2.go:157]: new router shard did not rollout
Unexpected error:
    <*errors.errorString | 0xc000346c70>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

failed: (10m10s) 2022-03-23T01:11:20 "[sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should pass the http2 tests [Suite:openshift/conformance/parallel/minimal]"
```
The same root cause might also affect these other HAProxy tests:
[sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should pass the h2spec conformance tests [Suite:openshift/conformance/parallel/minimal]
[sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should pass the http2 tests [Suite:openshift/conformance/parallel/minimal]
The failing test uses the "DeployNewRouterShard" function defined at <https://github.com/openshift/origin/blob/c088b40487a67f572ab12d140593e08f9de6be2a/test/extended/router/shard/shard.go#L37-L51> to verify that the ingresscontroller is ready. DeployNewRouterShard succeeds only if the ingresscontroller reports the expected status conditions, which include DNSManaged=True. In the failing CI runs, the ingresscontroller reports DNSManaged=False because the cluster DNS config does not specify public and private zones. This raises two questions:

1. Did the Azure UPI CI jobs recently change so that they no longer configure cluster DNS?
2. Should DeployNewRouterShard check for the DNSManaged=True status condition, or would checking for Available=True suffice?

Net Edge will follow up on these questions. I'm setting blocker- for now, as this appears to be a problem with the test and not with the functionality under test.
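To illustrate why the rollout check times out, here is a minimal Go sketch of the comparison implied by the "wanted map[...] got map[...]" log message. The helper conditionsMet and the two expectation sets are hypothetical and are not the actual DeployNewRouterShard code; the "got" conditions are copied from the failing run's log. Under the stricter expectations the check fails on DNSManaged=False (and on the DNSReady condition, which is absent from the reported status), whereas an Available=True-only expectation is satisfied.

```go
package main

import "fmt"

// conditionsMet reports whether every expected condition type is present in
// got with the same status. A condition missing from got never matches.
// Hypothetical helper for illustration; not origin's implementation.
func conditionsMet(expected, got map[string]string) bool {
	for condType, wantStatus := range expected {
		if got[condType] != wantStatus {
			return false
		}
	}
	return true
}

func main() {
	// Status conditions reported by the ingresscontroller in the failing run.
	got := map[string]string{
		"Admitted": "True", "Available": "True", "DNSManaged": "False",
		"Degraded": "False", "DeploymentAvailable": "True",
		"DeploymentReplicasAllAvailable": "True", "DeploymentReplicasMinAvailable": "True",
		"LoadBalancerManaged": "True", "LoadBalancerReady": "True",
		"PodsScheduled": "True", "Progressing": "False", "Upgradeable": "True",
	}

	// Expectations as printed in the "wanted map[...]" part of the log.
	strict := map[string]string{
		"Admitted": "True", "Available": "True", "DNSManaged": "True",
		"DNSReady": "True", "LoadBalancerManaged": "True", "LoadBalancerReady": "True",
	}

	// A relaxed alternative along the lines of question 2 (assumption).
	relaxed := map[string]string{"Admitted": "True", "Available": "True"}

	fmt.Println("strict:", conditionsMet(strict, got))
	fmt.Println("relaxed:", conditionsMet(relaxed, got))
}
```

Because a polling loop keeps calling such a check until it returns true, a condition that can never become True on a given platform (here DNSManaged on a cluster without DNS zones) turns into a "timed out waiting for the condition" error rather than a descriptive failure.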
Searching https://search.ci.openshift.org/?search=The+HAProxy+router+should+pass+the+gRPC+interoperability+tests+&maxAge=168h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job, I still see a high failure ratio for some jobs. One example:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-cloud-controller-manager-operator/181/pull-ci-openshift-cluster-cloud-controller-manager-operator-master-e2e-gcp-ccm/1516409860592242688
[sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should pass the gRPC interoperability tests [Suite:openshift/conformance/parallel/minimal] 10m16s
{ fail [github.com/openshift/origin/test/extended/router/grpc-interop.go:107]: new router shard did not rollout Unexpected error: <*errors.errorString | 0xc00037ec60>: { s: "timed out waiting for the condition", } timed out waiting for the condition occurred}

Another: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-okd-4.10-e2e-vsphere/1516319301173252096 3m2s
{ fail [github.com/openshift/origin/test/extended/util/client.go:302]: Unexpected error: <*errors.errorString | 0xc000346c70>: { s: "timed out waiting for the condition", } timed out waiting for the condition occurred}
And the error for Azure UPI: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/27087/rehearse-27087-pull-ci-openshift-installer-master-e2e-azure-upi/1516106152117538816
STEP: Waiting for route hostname to register in DNS
Apr 18 20:04:38.338: INFO: host "grpc-interop-h2c.e2e-test-grpc-interop-67cls.apps.ci-op-00l27v7z-c3806.ci.azure.devcluster.openshift.com" resolves as [52.234.42.30], expecting 40.78.5.60, retrying in 1m0s...
Still seeing failures in Azure UPI jobs, e.g. https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/5951/pull-ci-openshift-installer-release-4.11-e2e-azure-upi/1544336707699085312:
[sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should pass the gRPC interoperability tests [Suite:openshift/conformance/parallel/minimal]
[sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should pass the h2spec conformance tests [Suite:openshift/conformance/parallel/minimal]
*** Bug 2103700 has been marked as a duplicate of this bug. ***
We now have periodic 4.11 azure-upi jobs where you can see the tests permafailing: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-nightly-4.11-e2e-azure-upi
https://github.com/openshift/origin/pull/27274 is in release-4.12, so we need to verify this BZ against 4.12 and then handle the 4.11.z and 4.10.z backports as separate BZs.
Checked the latest 4.12 CI runs and no longer see the "timed out waiting for the condition" error; moving to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399