Bug 1929396

Summary: [sig-network-edge] Application behind service load balancer with PDB is not disrupted
Product: OpenShift Container Platform Reporter: Surya Seetharaman <surya>
Component: Networking Assignee: Candace Holman <cholman>
Networking sub component: router QA Contact: Hongan Li <hongli>
Status: CLOSED INSUFFICIENT_DATA Docs Contact:
Severity: high    
Priority: medium CC: amcdermo, aos-bugs, bbennett, cholman, dgoodwin, jechen, jerzhang, jforce, mfisher, mharri, mmasters, wking
Version: 4.8   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Release Note
Doc Text:
Story Points: ---
Clone Of:
: 1932491 1932493 1932629 Environment:
[sig-network-edge] Application behind service load balancer with PDB is not disrupted
Last Closed: 2022-11-04 15:09:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1943804, 1970985, 1987046, 2015512    
Bug Blocks: 1932491, 1932493, 1932629, 1943566    

Description Surya Seetharaman 2021-02-16 19:19:38 UTC
test:
[sig-network-edge] Application behind service load balancer with PDB is not disrupted 

is flaking frequently in CI, see search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-network-edge%5C%5D+Application+behind+service+load+balancer+with+PDB+is+not+disrupted
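A minimal sketch of how a search link like the one above can be rebuilt for a different test name or time window. The query parameters are copied from the URL itself; the helper name ci_search_url is hypothetical, and the backslash-escaping of the square brackets mirrors what the URL already does, which suggests (an assumption) that the search field is treated as a regular expression.

```python
from urllib.parse import urlencode

# Test name exactly as it appears in the bug summary.
TEST_NAME = ("[sig-network-edge] Application behind service load balancer "
             "with PDB is not disrupted")


def ci_search_url(test_name: str, max_age: str = "168h") -> str:
    """Build a search.ci.openshift.org query URL (hypothetical helper).

    Parameter names and values are taken from the link in this bug,
    not from any documented API.
    """
    # Escape the square brackets the same way the original link does.
    pattern = test_name.replace("[", r"\[").replace("]", r"\]")
    params = {
        "maxAge": max_age,
        "context": "1",
        "type": "bug+junit",
        "name": "",
        "maxMatches": "5",
        "maxBytes": "20971520",
        "groupBy": "job",
        "search": pattern,
    }
    # urlencode uses quote_plus, so spaces become "+" and "+" becomes "%2B".
    return "https://search.ci.openshift.org/?" + urlencode(params)
```

Calling ci_search_url(TEST_NAME) reproduces the link above; passing a different max_age (for example "48h") narrows the window.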

The pass rate on GCP is 50% according to Sippy.

Example:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-upgrade-4.5-stable-to-4.6-ci/1361717216143216640

disruption_tests: [sig-network-edge] Application behind service load balancer with PDB is not disrupted
Run #0: Failed 1h11m46s
Service was unreachable during disruption for at least 9s of 1h8m31s (0%), this is currently sufficient to pass the test/job but not considered completely correct:

Feb 16 17:59:45.229 E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service stopped responding to GET requests over new connections
Feb 16 17:59:46.229 - 5s    E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service is not responding to GET requests over new connections
Feb 16 17:59:52.690 I ns/e2e-k8s-service-lb-available-1335 svc/service-test Service started responding to GET requests over new connections
Feb 16 18:01:44.229 E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service stopped responding to GET requests over new connections
Feb 16 18:01:44.414 I ns/e2e-k8s-service-lb-available-1335 svc/service-test Service started responding to GET requests over new connections
Feb 16 18:18:00.411 E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service stopped responding to GET requests over new connections
Feb 16 18:18:01.229 E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service is not responding to GET requests over new connections
Feb 16 18:18:01.401 I ns/e2e-k8s-service-lb-available-1335 svc/service-test Service started responding to GET requests over new connections
Run #1: Passed
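
For reference, the "at least 9s" figure in Run #0 can be roughly cross-checked from the event lines themselves. The sketch below pairs each "stopped responding" event with the next "started responding" event and sums the gaps. This is only an illustration: the function name outage_seconds and the pairing logic are assumptions, not the monitor's actual implementation, and the year is assumed to be 2021 because the log lines omit it.

```python
from datetime import datetime

# Event lines copied verbatim from Run #0 above.
LOG = """\
Feb 16 17:59:45.229 E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service stopped responding to GET requests over new connections
Feb 16 17:59:46.229 - 5s    E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service is not responding to GET requests over new connections
Feb 16 17:59:52.690 I ns/e2e-k8s-service-lb-available-1335 svc/service-test Service started responding to GET requests over new connections
Feb 16 18:01:44.229 E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service stopped responding to GET requests over new connections
Feb 16 18:01:44.414 I ns/e2e-k8s-service-lb-available-1335 svc/service-test Service started responding to GET requests over new connections
Feb 16 18:18:00.411 E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service stopped responding to GET requests over new connections
Feb 16 18:18:01.229 E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service is not responding to GET requests over new connections
Feb 16 18:18:01.401 I ns/e2e-k8s-service-lb-available-1335 svc/service-test Service started responding to GET requests over new connections
"""

STAMP_FORMAT = "%Y %b %d %H:%M:%S.%f"


def outage_seconds(log: str, year: int = 2021) -> float:
    """Sum the time between each 'stopped' event and the next 'started' event."""
    total = 0.0
    down_since = None
    for line in log.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        # The first three fields are month, day, and time with milliseconds.
        stamp = datetime.strptime(f"{year} " + " ".join(parts[:3]), STAMP_FORMAT)
        if "stopped responding" in line and down_since is None:
            down_since = stamp
        elif "started responding" in line and down_since is not None:
            total += (stamp - down_since).total_seconds()
            down_since = None
        # "is not responding" lines describe an ongoing outage and are skipped.
    return total
```

On the lines above this gives three outage windows of 7.461s, 0.185s, and 0.990s, about 8.6s in total, which is consistent with the reported "at least 9s" if the monitor rounds up.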

Comment 1 Miciah Dashiel Butler Masters 2021-02-18 17:22:33 UTC
From the Sippy results, I don't see a 50% failure rate.  Note that the Sippy link provided includes test runs where the test did not exceed the threshold to cause a failure.  Ryan will investigate to determine the actual failure rate and severity of the issue.

Comment 2 Jon Jackson 2021-02-24 18:01:28 UTC
Updated this original bug to target 4.8 and cloned to 4.7 and 4.6 for backporting.

Comment 3 Yu Qi Zhang 2021-03-31 18:19:40 UTC
As of Mar. 28 this test is failing 100% on https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade&sort-by-flakiness= although there are also many other issues

This job is monitored as part of our release reporting https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/dashboards/overview#4.8.0-0.ci. Moving to high severity

Comment 16 Hongan Li 2021-06-15 02:16:55 UTC
Per Comment 15, assigning back for investigation.

Comment 22 Devan Goodwin 2021-12-13 14:42:43 UTC
The test pass rate is up to 90% now, which is good, but there does appear to be a problem with GCP OVN upgrades from 4.9 to 4.10: that line in the Sippy graph is pretty consistently below 50%.

https://sippy.ci.openshift.org/sippy-ng/tests/4.10/analysis?test=[sig-network-edge]%20Application%20behind%20service%20load%20balancer%20with%20PDB%20is%20not%20disrupted

Comment 34 mfisher 2022-11-04 15:09:52 UTC
This issue is stale and closed because it has no activity for a significant amount of time.  If this issue should not be closed please verify the condition still exists on a supported release and submit an updated bug.