Bug 1949978

Summary: [sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should pass the h2spec conformance tests [Suite:openshift/conformance/parallel/minimal]
Product: OpenShift Container Platform Reporter: Oleg Bulatov <obulatov>
Component: Networking Assignee: Andrew McDermott <amcdermo>
Networking sub component: router QA Contact: jechen <jechen>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: amcdermo, aos-bugs, bperkins, jechen
Version: 4.8   
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
[sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should pass the h2spec conformance tests [Suite:openshift/conformance/parallel/minimal]
Last Closed: 2021-07-27 23:01:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Oleg Bulatov 2021-04-15 14:09:27 UTC
Test:
[sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should pass the h2spec conformance tests [Suite:openshift/conformance/parallel/minimal]

is failing frequently in CI; see the search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-network-edge%5C%5D%5C%5BConformance%5C%5D%5C%5BArea%3ANetworking%5C%5D%5C%5BFeature%3ARouter%5C%5D+The+HAProxy+router+should+pass+the+h2spec+conformance+tests+%5C%5BSuite%3Aopenshift%2Fconformance%2Fparallel%2Fminimal%5C%5D

An example of a failed job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws/1382601757305081856

Output:

fail [github.com/openshift/origin/test/extended/router/h2spec.go:155]: Unexpected error:
    <exec.CodeExitError>: {
        Err: {
            s: "error running /usr/bin/kubectl --server=https://api.ci-op-gt9b13hp-abfa2.origin-ci-int-aws.dev.rhcloud.com:6443 --kubeconfig=/tmp/kubeconfig-030368613 --namespace=e2e-test-router-h2spec-h7w64 exec h2spec -- /bin/sh -x -c cat \"/tmp/h2spec-results\":\nCommand stdout:\n\nstderr:\n+ cat /tmp/h2spec-results\ncat: /tmp/h2spec-results: No such file or directory\ncommand terminated with exit code 1\n\nerror:\nexit status 1",
        },
        Code: 1,
    }
    error running /usr/bin/kubectl --server=https://api.ci-op-gt9b13hp-abfa2.origin-ci-int-aws.dev.rhcloud.com:6443 --kubeconfig=/tmp/kubeconfig-030368613 --namespace=e2e-test-router-h2spec-h7w64 exec h2spec -- /bin/sh -x -c cat "/tmp/h2spec-results":
    Command stdout:
    
    stderr:
    + cat /tmp/h2spec-results
    cat: /tmp/h2spec-results: No such file or directory
    command terminated with exit code 1
    
    error:
    exit status 1
occurred

Comment 1 Oleg Bulatov 2021-04-15 14:11:38 UTC
Marking it as a high severity bug, as it has high impact on CI.

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=%5C%5Bsig-network-edge%5C%5D%5C%5BConformance%5C%5D%5C%5BArea%3ANetworking%5C%5D%5C%5BFeature%3ARouter%5C%5D+The+HAProxy+router+should+pass+the+h2spec+conformance+tests+%5C%5BSuite%3Aopenshift%2Fconformance%2Fparallel%2Fminimal%5C%5D&maxAge=168h&context=1&type=bug%2Bjunit&name=4.8&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job' | grep 'of failures match'
periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-proxy (all) - 31 runs, 94% failed, 34% of failures match = 32% impact
release-openshift-ocp-installer-e2e-aws-upi-4.8 (all) - 27 runs, 81% failed, 32% of failures match = 26% impact
periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-fips (all) - 27 runs, 37% failed, 40% of failures match = 15% impact
periodic-ci-openshift-release-master-nightly-4.8-e2e-aws (all) - 39 runs, 31% failed, 17% of failures match = 5% impact
periodic-ci-openshift-release-master-nightly-4.8-e2e-azure (all) - 10 runs, 90% failed, 33% of failures match = 30% impact
periodic-ci-openshift-release-master-nightly-4.8-e2e-gcp-rt (all) - 26 runs, 100% failed, 12% of failures match = 12% impact
promote-release-openshift-okd-machine-os-content-e2e-aws-4.8 (all) - 84 runs, 13% failed, 64% of failures match = 8% impact
release-openshift-ocp-installer-e2e-gcp-ovn-4.8 (all) - 28 runs, 86% failed, 17% of failures match = 14% impact
release-openshift-ocp-installer-e2e-aws-ovn-4.8 (all) - 27 runs, 44% failed, 42% of failures match = 19% impact
release-openshift-ocp-installer-e2e-openstack-4.8 (all) - 21 runs, 100% failed, 24% of failures match = 24% impact
promote-release-openshift-machine-os-content-e2e-aws-4.8 (all) - 77 runs, 12% failed, 44% of failures match = 5% impact
periodic-ci-openshift-release-master-okd-4.8-e2e-aws (all) - 42 runs, 62% failed, 15% of failures match = 10% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade-rollback (all) - 7 runs, 29% failed, 50% of failures match = 14% impact
release-openshift-origin-installer-e2e-aws-disruptive-4.8 (all) - 4 runs, 75% failed, 33% of failures match = 25% impact
release-openshift-origin-installer-e2e-aws-shared-vpc-4.8 (all) - 4 runs, 75% failed, 33% of failures match = 25% impact
rehearse-17585-pull-ci-openshift-cluster-nfd-operator-release-4.8-e2e-aws (all) - 17 runs, 65% failed, 18% of failures match = 12% impact
release-openshift-ocp-installer-e2e-azure-ovn-4.8 (all) - 27 runs, 81% failed, 5% of failures match = 4% impact
periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-upgrade (all) - 30 runs, 70% failed, 14% of failures match = 10% impact
rehearse-17717-pull-ci-openshift-openshift-apiserver-release-4.8-e2e-aws (all) - 5 runs, 60% failed, 33% of failures match = 20% impact
periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-workers-rhel7 (all) - 28 runs, 64% failed, 6% of failures match = 4% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-aws-compact (all) - 3 runs, 100% failed, 33% of failures match = 33% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-aws-compact-serial (all) - 3 runs, 100% failed, 33% of failures match = 33% impact
release-openshift-ocp-installer-e2e-remote-libvirt-ppc64le-4.8 (all) - 14 runs, 93% failed, 8% of failures match = 7% impact
release-openshift-ocp-installer-e2e-remote-libvirt-s390x-4.8 (all) - 14 runs, 71% failed, 10% of failures match = 7% impact
release-openshift-ocp-installer-e2e-remote-libvirt-compact-s390x-4.8 (all) - 13 runs, 92% failed, 8% of failures match = 8% impact
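The summary lines above can be reproduced from raw run counts. The following is a minimal Go sketch. It assumes the search page computes each percentage from whole-run counts and rounds only at the end, so that impact = matching failures / total runs. The counts used here are inferred from the printed percentages and are not reported by the search tool. The `jobStats` type and `summary` function are hypothetical.

For release-openshift-ocp-installer-e2e-gcp-ovn-4.8, 24 failed runs out of 28, with 4 matching, reproduce the printed line exactly. Multiplying the rounded percentages instead (86% × 17%) would give 15%, not 14%.

```go
package main

import (
	"fmt"
	"math"
)

// jobStats holds hypothetical raw counts for one CI job; the search page
// only shows the rounded percentages derived from them.
type jobStats struct {
	Runs     int // total runs in the search window
	Failed   int // runs that failed for any reason
	Matching int // failed runs whose output matched the search query
}

// pct returns 100*num/den rounded to the nearest whole percent.
func pct(num, den int) int {
	if den == 0 {
		return 0
	}
	return int(math.Round(100 * float64(num) / float64(den)))
}

// summary rebuilds the "N runs, X% failed, Y% of failures match = Z% impact"
// line, assuming impact is matching failures divided by total runs.
func summary(s jobStats) string {
	return fmt.Sprintf("%d runs, %d%% failed, %d%% of failures match = %d%% impact",
		s.Runs, pct(s.Failed, s.Runs), pct(s.Matching, s.Failed), pct(s.Matching, s.Runs))
}

func main() {
	// Counts inferred for release-openshift-ocp-installer-e2e-gcp-ovn-4.8.
	fmt.Println(summary(jobStats{Runs: 28, Failed: 24, Matching: 4}))
	// prints "28 runs, 86% failed, 17% of failures match = 14% impact"
}
```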

Comment 3 Andrew McDermott 2021-04-16 13:02:10 UTC
I will follow up with another PR that disables the test if running in a proxied environment.
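Comment 3 proposes skipping the test when it runs in a proxied environment. The following is a minimal Go sketch of such a guard. It assumes the proxy is visible through the conventional HTTPS_PROXY/HTTP_PROXY environment variables. The helper name `proxyConfigured` is hypothetical, and the actual fix in the origin repository may detect the proxy differently (for example, from the cluster-wide proxy configuration). The environment lookup is passed in as a function so the check can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// proxyEnvVars lists the conventional proxy variables; this sketch assumes a
// proxied CI cluster exposes its proxy through them.
var proxyEnvVars = []string{"HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy"}

// proxyConfigured reports whether any proxy variable is set to a non-empty
// value and returns the name of the first one found.
func proxyConfigured(getenv func(string) string) (string, bool) {
	for _, name := range proxyEnvVars {
		if v := getenv(name); v != "" {
			return name, true
		}
	}
	return "", false
}

func main() {
	if name, ok := proxyConfigured(os.Getenv); ok {
		// In a Ginkgo test this is where the test would be skipped.
		fmt.Printf("skipping h2spec test: %s is set\n", name)
		return
	}
	fmt.Println("no proxy detected; running h2spec test")
}
```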

Comment 4 Andrew McDermott 2021-04-16 13:55:47 UTC
Added additional fix https://github.com/openshift/origin/pull/26086, moving back to POST.

Comment 6 Hongan Li 2021-04-19 03:52:33 UTC
Checked the search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-network-edge%5C%5D%5C%5BConformance%5C%5D%5C%5BArea%3ANetworking%5C%5D%5C%5BFeature%3ARouter%5C%5D+The+HAProxy+router+should+pass+the+h2spec+conformance+tests+%5C%5BSuite%3Aopenshift%2Fconformance%2Fparallel%2Fminimal%5C%5D

and am still seeing failures in:
periodic-ci-openshift-release-master-nightly-4.8-e2e-gcp-rt (all) 
release-openshift-ocp-installer-e2e-aws-mirrors-4.8 (all)
release-openshift-ocp-installer-e2e-aws-upi-4.8 (all)

Comment 7 Andrew McDermott 2021-04-19 08:05:11 UTC
Looking at one result in:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upi-4.8/1384016184512352256/artifacts/e2e-aws-upi/e2e.log

  5. Streams and Multiplexing
    5.1. Stream States
        1: idle: Sends a DATA frame
      ✔ 1: idle: Sends a DATA frame
        2: idle: Sends a RST_STREAM frame
      ✔ 2: idle: Sends a RST_STREAM frame
        3: idle: Sends a WINDOW_UPDATE frame
      ✔ 3: idle: Sends a WINDOW_UPDATE frame
        4: idle: Sends a CONTINUATION frame
      ✔ 4: idle: Sends a CONTINUATION frame
        5: half closed (remote): Sends a DATA frame
      ✔ 5: half closed (remote): Sends a DATA frame
        6: half closed (remote): Sends a HEADERS frame
      ✔ 6: half closed (remote): Sends a HEADERS frame
        7: half closed (remote): Sends a CONTINUATION frame
      × 7: half closed (remote): Sends a CONTINUATION frame

Error: dial tcp 54.147.190.229:443: i/o timeout

I see that the tests run but hit an i/o timeout. Investigating.

Comment 8 Andrew McDermott 2021-04-19 08:44:41 UTC
It also looks like many of the tests run to (almost) completion.
The tests run, but according to the logs the last action is deleting
the ingresscontroller that was created for the test, and that deletion
may be taking too long.

https://bugzilla.redhat.com/show_bug.cgi?id=1912413

Comment 10 jechen 2021-04-21 13:57:21 UTC
Checked the most recent CI reports (over 17 runs) and did not see the h2spec conformance test fail for the HAProxy router any more; changing the bug status to VERIFIED.

https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws

Comment 13 errata-xmlrpc 2021-07-27 23:01:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438