Bug 1703878 - [upgrade] Pod behind service load balancer becomes unavailable during cluster upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: Jan Chaloupka
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-29 01:29 UTC by Clayton Coleman
Modified: 2019-06-04 10:48 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:48:10 UTC
Target Upstream Version:
Embargoed:


Links
Red Hat Product Errata RHBA-2019:0758 (Last Updated: 2019-06-04 10:48:17 UTC)

Description Clayton Coleman 2019-04-29 01:29:09 UTC
In roughly 1 of every 5-6 e2e runs, the upgrade service load balancer test fails (it verifies that a pod behind the LB stays reachable continuously during the upgrade). This could be multiple things:

1. The PDB part of the test isn't running except on GCE - I will patch that out shortly so the PDB tests run in our environment
2. Some variation of https://bugzilla.redhat.com/show_bug.cgi?id=1702414 could be impacting us
3. The MCD (machine-config daemon) is not draining nodes properly
4. The test may have an assumption that doesn't hold on OpenShift (if so, we need to discuss how to fix it)

Needs investigation to determine whether we are disrupting workloads incorrectly during upgrade.

https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/883

Apr 28 00:25:37.121: Could not reach HTTP service through ad9d4be89694811e985461212303fb68-66553889.us-east-1.elb.amazonaws.com:80 after 2m0s

github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework.(*ServiceTestJig).TestReachableHTTPWithRetriableErrorCodes(0xc0023f0880, 0xc0029d15e0, 0x45, 0x50, 0x909c4e0, 0x0, 0x0, 0x1bf08eb000)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/service_util.go:855 +0x33c
github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework.(*ServiceTestJig).TestReachableHTTP(0xc0023f0880, 0xc0029d15e0, 0x45, 0x50, 0x1bf08eb000)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/service_util.go:847 +0x75
github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/upgrades.(*ServiceUpgradeTest).test.func1()
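
For context, the check that fails here is essentially a retry loop against the ELB hostname. A rough Go sketch of the idea (not the framework's actual code; the path, retry interval, and status handling are simplified assumptions):

package main

import (
	"fmt"
	"net/http"
	"time"
)

// reachableHTTP polls an HTTP GET against host:port until it gets a 2xx
// response, returning an error if that never happens within timeout.
func reachableHTTP(host string, port int, timeout time.Duration) error {
	url := fmt.Sprintf("http://%s:%d/", host, port)
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		resp, err := http.Get(url)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode >= 200 && resp.StatusCode < 300 {
				return nil
			}
		}
		time.Sleep(2 * time.Second) // retry interval is an assumption
	}
	return fmt.Errorf("could not reach HTTP service through %s:%d after %v", host, port, timeout)
}

func main() {
	// Hostname is the ELB from the failed run above; 2m0s matches the failure message.
	host := "ad9d4be89694811e985461212303fb68-66553889.us-east-1.elb.amazonaws.com"
	if err := reachableHTTP(host, 80, 2*time.Minute); err != nil {
		fmt.Println(err)
	}
}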

Comment 1 Clayton Coleman 2019-04-29 18:20:39 UTC
https://github.com/openshift/origin/pull/22711 will enable PDBs
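
For reference, the PDB the test adds is just a PodDisruptionBudget that keeps a minimum number of the service's backend pods available while nodes are drained during the upgrade. A rough client-go sketch (the names, label selector, and minAvailable value are illustrative assumptions, not the test's actual manifest):

package main

import (
	"fmt"

	policyv1beta1 "k8s.io/api/policy/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// serviceTestPDB builds a PodDisruptionBudget that keeps at least one backend
// pod of the LB service available while nodes are drained.
func serviceTestPDB(namespace string) *policyv1beta1.PodDisruptionBudget {
	minAvailable := intstr.FromInt(1) // illustrative; the real test may use a different value
	return &policyv1beta1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: "service-test-pdb", Namespace: namespace}, // hypothetical name
		Spec: policyv1beta1.PodDisruptionBudgetSpec{
			MinAvailable: &minAvailable,
			// Selector labels are assumed; they must match the pods behind the LB service.
			Selector: &metav1.LabelSelector{MatchLabels: map[string]string{"app": "service-test"}},
		},
	}
}

func main() {
	pdb := serviceTestPDB("e2e-service-upgrade")
	fmt.Printf("would create PDB %s/%s with minAvailable=%s\n", pdb.Namespace, pdb.Name, pdb.Spec.MinAvailable.String())
}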

Comment 2 Jan Chaloupka 2019-05-02 08:09:04 UTC
PR https://github.com/openshift/origin/pull/22711 was merged on May 1, 2019, 5:42 AM GMT+2. I went through PRs that were already merged: every `ci/prow/e2e-aws-upgrade` run after that timestamp went green, while the red `ci/prow/e2e-aws-upgrade` runs from before the merge all failed for the reason described in this report. I have also checked new PRs and did not see any `ci/prow/e2e-aws-upgrade` test failing for the reasons mentioned in this report.

Comment 4 Clayton Coleman 2019-05-03 16:40:18 UTC
Seeing https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1146 in recent runs; it looks like a different failure:

May  3 15:34:41.108: Timed out waiting for service "service-test" to have a load balancer

github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework.(*ServiceTestJig).waitForConditionOrFail(0xc002ba92c0, 0xc0029edfc0, 0x1f, 0xc0023f01a0, 0xc, 0x1176592e000, 0x4e4686e, 0x14, 0x50911d0, 0x0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/service_util.go:589 +0x1e9
github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework.(*ServiceTestJig).WaitForLoadBalancerOrFail(0xc002ba92c0, 0xc0029edfc0, 0x1f, 0xc0023f01a0, 0xc, 0x1176592e000, 0x25)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/service_util.go:548 +0x15d
github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/upgrades.(*ServiceUpgradeTest).Setup(0xc001e585d0, 0xc0023a42c0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/upgrades/services.go:52 +0x195
github.com/openshift/origin/test/e2e/upgrade.(*chaosMonkeyAdapter).Test(0xc002475840, 0xc002142960)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/e2e/upgrade/upgrade.go:165 +0x180
github.com/openshift/origin/test/e2e/upgrade.(*chaosMonkeyAdapter).Test-fm(0xc002142960)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/e2e/upgrade/upgrade.go:245 +0x34
github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/chaosmonkey.(*chaosmonkey).Do.func1(0xc002142960, 0xc0027b5d50)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/chaosmonkey/chaosmonkey.go:89 +0x76
created by github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/chaosmonkey.(*chaosmonkey).Do
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/chaosmonkey/chaosmonkey.go:86 +0xa7

Will spawn a separate bug for that.
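
For context, the setup step that times out there is essentially a poll on the Service object until the cloud provider populates status.loadBalancer.ingress. A rough sketch of that idea against the 4.1-era (pre-context) client-go API, not the framework's actual implementation; the namespace, service name, and timeout below are illustrative:

package main

import (
	"fmt"
	"os"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// waitForLoadBalancer polls the Service until status.loadBalancer.ingress is
// populated by the cloud provider, or gives up after timeout.
func waitForLoadBalancer(c kubernetes.Interface, namespace, name string, timeout time.Duration) (*corev1.Service, error) {
	var svc *corev1.Service
	err := wait.PollImmediate(5*time.Second, timeout, func() (bool, error) {
		s, err := c.CoreV1().Services(namespace).Get(name, metav1.GetOptions{})
		if err != nil {
			return false, nil // retry on transient API errors
		}
		if len(s.Status.LoadBalancer.Ingress) == 0 {
			return false, nil // LB not provisioned yet
		}
		svc = s
		return true, nil
	})
	if err != nil {
		return nil, fmt.Errorf("timed out waiting for service %q to have a load balancer", name)
	}
	return svc, nil
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		fmt.Println(err)
		return
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	// "service-test" matches the service name in the failure message; namespace is assumed.
	svc, err := waitForLoadBalancer(client, "default", "service-test", 20*time.Minute)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("load balancer ingress:", svc.Status.LoadBalancer.Ingress[0])
}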

Comment 5 Clayton Coleman 2019-05-03 16:48:52 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1706155

Comment 7 errata-xmlrpc 2019-06-04 10:48:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

