Bug 1932491

Summary: [sig-network-edge] Application behind service load balancer with PDB is not disrupted
Product: OpenShift Container Platform
Reporter: Jon Jackson <jonjacks>
Component: Networking
Assignee: aos-network-edge-staff <aos-network-edge-staff>
Networking sub component: router
QA Contact: Hongan Li <hongli>
Status: CLOSED CANTFIX
Docs Contact:
Severity: low
Priority: low
CC: amcdermo, aos-bugs, cholman, hongli, mmasters, rfredette, surya, wking
Version: 4.7
Target Milestone: ---
Target Release: 4.7.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1929396
Environment: [sig-network-edge] Application behind service load balancer with PDB is not disrupted
Last Closed: 2021-07-01 18:19:06 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1929396, 1932493, 1932629
Bug Blocks:

Description Jon Jackson 2021-02-24 17:56:01 UTC
+++ This bug was initially created as a clone of Bug #1929396 +++

test:
[sig-network-edge] Application behind service load balancer with PDB is not disrupted 

is flaking frequently in CI, see search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-network-edge%5C%5D+Application+behind+service+load+balancer+with+PDB+is+not+disrupted

Pass rate is 50% on GCP according to Sippy.

Example:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-upgrade-4.5-stable-to-4.6-ci/1361717216143216640

disruption_tests: [sig-network-edge] Application behind service load balancer with PDB is not disrupted
Run #0: Failed 1h11m46s
Service was unreachable during disruption for at least 9s of 1h8m31s (0%), this is currently sufficient to pass the test/job but not considered completely correct:

Feb 16 17:59:45.229 E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service stopped responding to GET requests over new connections
Feb 16 17:59:46.229 - 5s    E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service is not responding to GET requests over new connections
Feb 16 17:59:52.690 I ns/e2e-k8s-service-lb-available-1335 svc/service-test Service started responding to GET requests over new connections
Feb 16 18:01:44.229 E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service stopped responding to GET requests over new connections
Feb 16 18:01:44.414 I ns/e2e-k8s-service-lb-available-1335 svc/service-test Service started responding to GET requests over new connections
Feb 16 18:18:00.411 E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service stopped responding to GET requests over new connections
Feb 16 18:18:01.229 E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service is not responding to GET requests over new connections
Feb 16 18:18:01.401 I ns/e2e-k8s-service-lb-available-1335 svc/service-test Service started responding to GET requests over new connections
Run #1: Passed
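
As a rough cross-check of the "at least 9s" figure, the outage intervals in the log above can be summed by pairing each "stopped responding" event with the next "started responding" event. The following is a minimal Python sketch, not the code the test suite itself uses. The function name total_disruption_seconds is hypothetical, and the sketch assumes all events fall on the same day, so only the time-of-day field is parsed.

```python
from datetime import datetime

# Log lines copied from the failed run above.
LOG = """\
Feb 16 17:59:45.229 E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service stopped responding to GET requests over new connections
Feb 16 17:59:46.229 - 5s    E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service is not responding to GET requests over new connections
Feb 16 17:59:52.690 I ns/e2e-k8s-service-lb-available-1335 svc/service-test Service started responding to GET requests over new connections
Feb 16 18:01:44.229 E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service stopped responding to GET requests over new connections
Feb 16 18:01:44.414 I ns/e2e-k8s-service-lb-available-1335 svc/service-test Service started responding to GET requests over new connections
Feb 16 18:18:00.411 E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service stopped responding to GET requests over new connections
Feb 16 18:18:01.229 E ns/e2e-k8s-service-lb-available-1335 svc/service-test Service is not responding to GET requests over new connections
Feb 16 18:18:01.401 I ns/e2e-k8s-service-lb-available-1335 svc/service-test Service started responding to GET requests over new connections
"""


def total_disruption_seconds(log: str) -> float:
    """Sum the time between each 'stopped' event and the next 'started' event."""
    total = 0.0
    outage_start = None
    for line in log.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # Assumption: every event is on the same day, so the third field
        # (HH:MM:SS.mmm) is enough to order and subtract timestamps.
        timestamp = datetime.strptime(fields[2], "%H:%M:%S.%f")
        if "stopped responding" in line and outage_start is None:
            outage_start = timestamp
        elif "started responding" in line and outage_start is not None:
            total += (timestamp - outage_start).total_seconds()
            outage_start = None
    return total


if __name__ == "__main__":
    run_seconds = 1 * 3600 + 8 * 60 + 31  # 1h8m31s, from the test summary
    outage = total_disruption_seconds(LOG)
    print(f"unreachable for {outage:.3f}s of {run_seconds}s "
          f"({100 * outage / run_seconds:.2f}%)")
```

For this run the three intervals (7.461s, 0.185s and 0.990s) add up to about 8.6s, consistent with the reported 9s. Nine seconds out of 1h8m31s (4111s) is roughly 0.2%, which the test summary shows as 0%.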

--- Additional comment from Miciah Dashiel Butler Masters on 2021-02-18 12:22:33 EST ---

From the Sippy results, I don't see a 50% failure rate.  Note that the Sippy link provided includes test runs where the test did not exceed the threshold to cause a failure.  Ryan will investigate to determine the actual failure rate and severity of the issue.