Bug 2067323 - [sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should pass the gRPC interoperability tests [Suite:openshift/conformance/parallel/minimal]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.12.0
Assignee: Andrew McDermott
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks: 2103700 2103701
Reported: 2022-03-23 19:04 UTC by Micah Abbott
Modified: 2023-01-17 19:55 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
[sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should pass the gRPC interoperability tests [Suite:openshift/conformance/parallel/minimal]
Last Closed: 2023-01-17 19:47:48 UTC
Target Upstream Version:
Embargoed:




Links
System ID | Private | Priority | Status | Summary | Last Updated
Github openshift origin pull 26949 | 0 | None | Merged | Bug 2067323: extended/test/router: omit DNS managed conditions for shards | 2022-06-23 16:15:45 UTC
Github openshift origin pull 27274 | 0 | None | Merged | BUG 2067323: test/extended/router: Drop host lookup for gRPC HTTP/2 h2spec tests | 2022-07-13 10:58:01 UTC
Red Hat Product Errata RHSA-2022:7399 | 0 | None | None | None | 2023-01-17 19:55:25 UTC

Description Micah Abbott 2022-03-23 19:04:28 UTC
test: 
[sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should pass the gRPC interoperability tests [Suite:openshift/conformance/parallel/minimal]

is failing, especially on the Azure UPI jobs:

https://search.ci.openshift.org/?search=The+HAProxy+router+should+pass+the+gRPC+interoperability+tests+&maxAge=96h

For example:

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/5729/pull-ci-openshift-installer-master-e2e-azure-upi/1506106488593059840/build-log.txt

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/5615/pull-ci-openshift-installer-master-e2e-azure-upi/1506396755300716544/build-log.txt


It looks like new router shards are not rolling out:


```
DeploymentAvailable:True DeploymentReplicasAllAvailable:True DeploymentReplicasMinAvailable:True LoadBalancerManaged:True LoadBalancerReady:True PodsScheduled:True Progressing:False Upgradeable:True], retrying...
Mar 23 01:11:14.197: INFO: ingresscontroller openshift-ingress-operator/e2e-test-router-http2-7mbr6 conditions not met; wanted map[Admitted:True Available:True DNSManaged:True DNSReady:True LoadBalancerManaged:True LoadBalancerReady:True], got map[Admitted:True Available:True DNSManaged:False Degraded:False DeploymentAvailable:True DeploymentReplicasAllAvailable:True DeploymentReplicasMinAvailable:True LoadBalancerManaged:True LoadBalancerReady:True PodsScheduled:True Progressing:False Upgradeable:True], retrying...
Mar 23 01:11:17.197: INFO: ingresscontroller openshift-ingress-operator/e2e-test-router-http2-7mbr6 conditions not met; wanted map[Admitted:True Available:True DNSManaged:True DNSReady:True LoadBalancerManaged:True LoadBalancerReady:True], got map[Admitted:True Available:True DNSManaged:False Degraded:False DeploymentAvailable:True DeploymentReplicasAllAvailable:True DeploymentReplicasMinAvailable:True LoadBalancerManaged:True LoadBalancerReady:True PodsScheduled:True Progressing:False Upgradeable:True], retrying...
Mar 23 01:11:20.197: INFO: ingresscontroller openshift-ingress-operator/e2e-test-router-http2-7mbr6 conditions not met; wanted map[Admitted:True Available:True DNSManaged:True DNSReady:True LoadBalancerManaged:True LoadBalancerReady:True], got map[Admitted:True Available:True DNSManaged:False Degraded:False DeploymentAvailable:True DeploymentReplicasAllAvailable:True DeploymentReplicasMinAvailable:True LoadBalancerManaged:True LoadBalancerReady:True PodsScheduled:True Progressing:False Upgradeable:True], retrying...
Mar 23 01:11:20.244: INFO: ingresscontroller openshift-ingress-operator/e2e-test-router-http2-7mbr6 conditions not met; wanted map[Admitted:True Available:True DNSManaged:True DNSReady:True LoadBalancerManaged:True LoadBalancerReady:True], got map[Admitted:True Available:True DNSManaged:False Degraded:False DeploymentAvailable:True DeploymentReplicasAllAvailable:True DeploymentReplicasMinAvailable:True LoadBalancerManaged:True LoadBalancerReady:True PodsScheduled:True Progressing:False Upgradeable:True], retrying...
[AfterEach] [sig-network-edge][Conformance][Area:Networking][Feature:Router]
  github.com/openshift/origin/test/extended/util/client.go:138
STEP: Collecting events from namespace "e2e-test-router-http2-7mbr6".
STEP: Found 5 events.
Mar 23 01:11:20.290: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for http2: { } Scheduled: Successfully assigned e2e-test-router-http2-7mbr6/http2 to ci-op-xq8gq76b-ba9fa-sb26r-worker-centralus-2
Mar 23 01:11:20.290: INFO: At 2022-03-23 01:01:16 +0000 UTC - event for http2: {multus } AddedInterface: Add eth0 [10.128.2.96/23] from openshift-sdn
Mar 23 01:11:20.290: INFO: At 2022-03-23 01:01:16 +0000 UTC - event for http2: {kubelet ci-op-xq8gq76b-ba9fa-sb26r-worker-centralus-2} Pulled: Container image "registry.build04.ci.openshift.org/ci-op-xq8gq76b/stable@sha256:067d833aa907b6682b037534bef37d344d64072e26fc9abd3a63502a278bea12" already present on machine
Mar 23 01:11:20.290: INFO: At 2022-03-23 01:01:16 +0000 UTC - event for http2: {kubelet ci-op-xq8gq76b-ba9fa-sb26r-worker-centralus-2} Created: Created container server
Mar 23 01:11:20.290: INFO: At 2022-03-23 01:01:16 +0000 UTC - event for http2: {kubelet ci-op-xq8gq76b-ba9fa-sb26r-worker-centralus-2} Started: Started container server
Mar 23 01:11:20.335: INFO: POD    NODE                                           PHASE    GRACE  CONDITIONS
Mar 23 01:11:20.335: INFO: http2  ci-op-xq8gq76b-ba9fa-sb26r-worker-centralus-2  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2022-03-23 01:01:14 +0000 UTC  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2022-03-23 01:01:44 +0000 UTC  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2022-03-23 01:01:44 +0000 UTC  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2022-03-23 01:01:14 +0000 UTC  }]
Mar 23 01:11:20.335: INFO: 
Mar 23 01:11:20.512: INFO: skipping dumping cluster info - cluster too large
Mar 23 01:11:20.566: INFO: Deleted {user.openshift.io/v1, Resource=users  e2e-test-router-http2-7mbr6-user}, err: <nil>
Mar 23 01:11:20.619: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients  e2e-client-e2e-test-router-http2-7mbr6}, err: <nil>
Mar 23 01:11:20.669: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens  sha256~4Cc6mrEaInLesaj5AuAO8hXDt-2cpvWPp5MwF04L8Co}, err: <nil>
[AfterEach] [sig-network-edge][Conformance][Area:Networking][Feature:Router]
  github.com/openshift/origin/test/extended/util/client.go:139
STEP: Destroying namespace "e2e-test-router-http2-7mbr6" for this suite.
[AfterEach] [sig-network-edge][Conformance][Area:Networking][Feature:Router]
  github.com/openshift/origin/test/extended/router/http2.go:73
fail [github.com/openshift/origin/test/extended/router/http2.go:157]: new router shard did not rollout
Unexpected error:
    <*errors.errorString | 0xc000346c70>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

failed: (10m10s) 2022-03-23T01:11:20 "[sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should pass the http2 tests [Suite:openshift/conformance/parallel/minimal]"
```

Comment 1 Micah Abbott 2022-03-23 19:05:34 UTC
The same root cause might explain the failures in these other HAProxy tests:

[sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should pass the h2spec conformance tests [Suite:openshift/conformance/parallel/minimal]
[sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should pass the http2 tests [Suite:openshift/conformance/parallel/minimal]

Comment 2 Miciah Dashiel Butler Masters 2022-03-24 03:39:03 UTC
The failing test uses the "DeployNewRouterShard" function defined at <https://github.com/openshift/origin/blob/c088b40487a67f572ab12d140593e08f9de6be2a/test/extended/router/shard/shard.go#L37-L51> to verify that the ingresscontroller is ready.  The DeployNewRouterShard function succeeds only if the ingresscontroller is reporting the expected status conditions.  The expected status conditions include DNSManaged=True.  In the failing CI runs, the ingresscontroller is reporting DNSManaged=False because the cluster DNS config does not specify public and private zones.  This raises two questions:

1. Did the Azure UPI CI jobs change recently so that they no longer configure cluster DNS?

2. Should DeployNewRouterShard check for the DNSManaged=True status condition, or would checking for Available=True suffice?

Net Edge will follow up on these questions.  

I'm setting blocker- for now as this appears to be a problem with the test and not with the functionality under test.
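
The condition check described above can be illustrated with a minimal Go sketch. This is not the code from shard.go; the helper name `conditionsMet` and the map-based representation are assumptions made for illustration, and the condition values are copied from the log in the description. It shows why a strict wanted set that includes DNSManaged=True never matches on a cluster reporting DNSManaged=False, while a wanted set without the DNS conditions does.

```go
package main

import "fmt"

// conditionsMet is a hypothetical helper (not the origin implementation).
// It reports whether every wanted condition appears in got with the same
// status. Extra conditions in got are ignored.
func conditionsMet(wanted, got map[string]string) bool {
	for name, status := range wanted {
		if got[name] != status {
			return false
		}
	}
	return true
}

func main() {
	// Subset of the conditions reported in the failing run.
	got := map[string]string{
		"Admitted":            "True",
		"Available":           "True",
		"DNSManaged":          "False",
		"LoadBalancerManaged": "True",
		"LoadBalancerReady":   "True",
	}

	// Wanted set as printed in the log, including the DNS conditions.
	strict := map[string]string{
		"Admitted":            "True",
		"Available":           "True",
		"DNSManaged":          "True",
		"DNSReady":            "True",
		"LoadBalancerManaged": "True",
		"LoadBalancerReady":   "True",
	}

	// Assumed relaxed set with the DNS conditions omitted.
	relaxed := map[string]string{
		"Admitted":            "True",
		"Available":           "True",
		"LoadBalancerManaged": "True",
		"LoadBalancerReady":   "True",
	}

	fmt.Println(conditionsMet(strict, got))  // false
	fmt.Println(conditionsMet(relaxed, got)) // true
}
```

With the strict set, a poll loop around this check would keep retrying until it times out, which matches the "timed out waiting for the condition" failure.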

Comment 7 Hongan Li 2022-04-20 02:44:56 UTC
Searched https://search.ci.openshift.org/?search=The+HAProxy+router+should+pass+the+gRPC+interoperability+tests+&maxAge=168h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

and am still seeing a high failure ratio for some jobs.

One example:

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-cloud-controller-manager-operator/181/pull-ci-openshift-cluster-cloud-controller-manager-operator-master-e2e-gcp-ccm/1516409860592242688

: [sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should pass the gRPC interoperability tests [Suite:openshift/conformance/parallel/minimal] 10m16s
{  fail [github.com/openshift/origin/test/extended/router/grpc-interop.go:107]: new router shard did not rollout
Unexpected error:
    <*errors.errorString | 0xc00037ec60>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred}

and https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-okd-4.10-e2e-vsphere/1516319301173252096
 	3m2s
{  fail [github.com/openshift/origin/test/extended/util/client.go:302]: Unexpected error:
    <*errors.errorString | 0xc000346c70>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred}

Comment 8 Hongan Li 2022-04-20 09:14:30 UTC
And the error for Azure UPI: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/27087/rehearse-27087-pull-ci-openshift-installer-master-e2e-azure-upi/1516106152117538816


STEP: Waiting for route hostname to register in DNS
Apr 18 20:04:38.338: INFO: host "grpc-interop-h2c.e2e-test-grpc-interop-67cls.apps.ci-op-00l27v7z-c3806.ci.azure.devcluster.openshift.com" resolves as [52.234.42.30], expecting 40.78.5.60, retrying in 1m0s...
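
The log line above has the shape of a "does this hostname resolve to the expected address" check. A minimal Go sketch of such a check follows, assuming a hypothetical helper `resolvesTo` that takes the resolver as a parameter; it is not the test's actual code. In real use the resolver could be `net.LookupHost`; a stub is used here so the example runs without network access, and the addresses are the ones from the log.

```go
package main

import "fmt"

// resolvesTo is a hypothetical helper (not the origin implementation).
// It reports whether any address returned by lookup for host equals want.
func resolvesTo(lookup func(string) ([]string, error), host, want string) (bool, error) {
	addrs, err := lookup(host)
	if err != nil {
		return false, err
	}
	for _, addr := range addrs {
		if addr == want {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Stub resolver that always returns the address seen in the log.
	stub := func(string) ([]string, error) {
		return []string{"52.234.42.30"}, nil
	}

	// The test expected 40.78.5.60, so the check fails and would be retried.
	ok, err := resolvesTo(stub, "grpc-interop-h2c.example.test", "40.78.5.60")
	fmt.Println(ok, err) // false <nil>
}
```

If DNS for the route hostname is not managed by the ingresscontroller under test, the expected address may never appear, so a retry loop around a check like this can only time out.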

Comment 11 Hongan Li 2022-07-06 07:53:37 UTC
Still seeing failures in Azure UPI jobs, e.g.
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/5951/pull-ci-openshift-installer-release-4.11-e2e-azure-upi/1544336707699085312

: [sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should pass the gRPC interoperability tests [Suite:openshift/conformance/parallel/minimal]
: [sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should pass the h2spec conformance tests [Suite:openshift/conformance/parallel/minimal]

Comment 12 Andrew McDermott 2022-07-13 11:17:22 UTC
*** Bug 2103700 has been marked as a duplicate of this bug. ***

Comment 14 Rafael Fonseca 2022-08-25 17:42:26 UTC
We now have periodic 4.11 azure-upi jobs where you can see the tests permafailing: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-nightly-4.11-e2e-azure-upi

Comment 15 Miciah Dashiel Butler Masters 2022-08-30 16:42:13 UTC
https://github.com/openshift/origin/pull/27274 is in release-4.12, so we need to verify this BZ against 4.12 and then handle the 4.11.z and 4.10.z backports as separate BZs.

Comment 16 Hongan Li 2022-09-06 02:40:29 UTC
Checked the latest 4.12 CI and no longer see the "timed out waiting for the condition" error; moving to verified.

Comment 19 errata-xmlrpc 2023-01-17 19:47:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399


