Bug 1749448

Summary: [upgrade] During 4.1 to 4.2 upgrade the load balancer availability test reported a failure
Product: OpenShift Container Platform
Component: Networking
Sub component: router
Version: 4.2.0
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Reporter: Clayton Coleman <ccoleman>
Assignee: Miciah Dashiel Butler Masters <mmasters>
QA Contact: Hongan Li <hongli>
CC: aos-bugs, bbennett
Status: CLOSED DUPLICATE
Severity: high
Priority: medium
Type: Bug
Last Closed: 2019-09-05 17:19:38 UTC

Description Clayton Coleman 2019-09-05 15:58:54 UTC
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.1-to-4.2/336

Sep  5 08:47:09.319: INFO: Poke("http://ab0696926cfb611e9b2000a76ca579a6-1017163478.us-east-1.elb.amazonaws.com:80/echo?msg=hello"): Get http://ab0696926cfb611e9b2000a76ca579a6-1017163478.us-east-1.elb.amazonaws.com:80/echo?msg=hello: EOF
Sep  5 08:47:09.319: INFO: Could not reach HTTP service through ab0696926cfb611e9b2000a76ca579a6-1017163478.us-east-1.elb.amazonaws.com:80 after 2m0s

This test verifies that pods behind a service load balancer remain reachable during an upgrade. There are two pods, and they have a PodDisruptionBudget (PDB) that should ensure at least one pod is available at all times. The test does not flake in 4.1 z-stream upgrades (which also reboot nodes).
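
For context, a PDB that provides this guarantee looks roughly like the following. This is a minimal sketch, not the actual test manifest: the name and label selector are hypothetical, and policy/v1beta1 is assumed as the PDB API version current in the 4.1/4.2 timeframe.

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: service-test-pdb        # hypothetical name
spec:
  minAvailable: 1               # voluntary disruptions must leave >= 1 pod available
  selector:
    matchLabels:
      app: service-test         # hypothetical label matching the two test pods

With minAvailable: 1, the eviction API (which node drains use during the upgrade) refuses to evict a pod if doing so would leave zero available, so at least one backend should remain reachable behind the load balancer the whole time.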

This is a release blocker: something serious is broken in the load balancer, the nodes, machine-config, or PDB handling, such that the pods behind the service aren't reachable.

Comment 1 Ben Bennett 2019-09-05 17:19:38 UTC

*** This bug has been marked as a duplicate of bug 1749446 ***