Bug 1694172

Summary: Client received empty watch event type during e2e test
Product: OpenShift Container Platform
Reporter: Clayton Coleman <ccoleman>
Component: Master
Assignee: Tomáš Nožička <tnozicka>
Status: CLOSED CURRENTRELEASE
QA Contact: Xingxing Xia <xxia>
Severity: medium
Priority: unspecified
Version: 4.1.0
CC: aos-bugs, jokerman, mmccomas
Target Milestone: ---
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-04-03 16:13:00 UTC
Type: Bug

Description Clayton Coleman 2019-03-29 17:09:18 UTC
A watch event from the server had an empty event type, which should never happen. It might just be part of the test scenario, but we need to prove it isn't some subtle bug lurking in kube (for instance, maybe on restart we send back invalid watch events).

fail [github.com/openshift/origin/test/extended/deployments/util.go:749]: Expected error:
    <*errors.errorString | 0xc420325410>: {
        s: "unexpected event: watch.Event{Type:\"\", Object:runtime.Object(nil)}",
    }
    unexpected event: watch.Event{Type:"", Object:runtime.Object(nil)}
not to have occurred

https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/6281

Happening in roughly 1 of 15 runs.
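For context on how an empty event type can appear at all: receiving from a closed Go channel yields the zero value of the element type, so if the server side closes the watch stream and the client keeps reading without the comma-ok form, it sees exactly watch.Event{Type:"", Object:nil}. A minimal self-contained sketch of that mechanism (the Event struct here is a stand-in for the real k8s.io/apimachinery type, not the actual API):

```go
package main

import "fmt"

// Event is an illustrative stand-in for k8s.io/apimachinery's watch.Event.
type Event struct {
	Type   string
	Object interface{}
}

func main() {
	results := make(chan Event)
	close(results) // simulate the apiserver dropping the watch connection

	// A plain receive from a closed channel returns the zero value,
	// i.e. Event{Type:"", Object:nil} -- the "unexpected event" above.
	ev := <-results
	fmt.Printf("Type=%q Object=%v\n", ev.Type, ev.Object)

	// The comma-ok form distinguishes a closed stream from a real event.
	if _, ok := <-results; !ok {
		fmt.Println("watch closed; caller should re-establish the watch")
	}
}
```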

Comment 2 Tomáš Nožička 2019-04-01 13:27:57 UTC
It was actually a watch closed prematurely, so the real cause was a kube-apiserver restart during the test:

https://github.com/openshift/origin/pull/22440


I do wonder why the overall control-plane stability invariant test didn't go red...

The kube-apiserver was rolling out during e2e:

https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/6281/artifacts/e2e-aws/pods/openshift-kube-apiserver-operator_kube-apiserver-operator-7d7794457d-4588c_operator.log.gz

I0329 00:22:56.699922       1 installer_controller.go:314] "ip-10-0-151-80.ec2.internal" is in transition to 5, but has not made progress
I0329 00:23:00.510078       1 status_controller.go:156] clusteroperator/kube-apiserver diff {"status":{"conditions":[{"lastTransitionTime":"2019-03-29T00:20:20Z","reason":"AsExpected","status":"False","type":"Failing"},{"lastTransitionTime":"2019-03-29T00:13:55Z","message":"Progressing: 2 nodes are at revision 5; 1 nodes are at revision 2","reason":"Progressing","status":"True","type":"Progressing"},{"lastTransitionTime":"2019-03-29T00:14:35Z","message":"Available: 3 nodes are active; 2 nodes are at revision 5; 1 nodes are at revision 2","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2019-03-29T00:12:51Z","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
I0329 00:23:00.523831       1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"c99e40a6-51b6-11e9-b977-126e0dd8934a", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for operator kube-apiserver changed: Failing message changed from "StaticPodsFailing: nodes/ip-10-0-151-80.ec2.internal pods/kube-apiserver-ip-10-0-151-80.ec2.internal container=\"kube-apiserver-5\" is not ready" to ""
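Since the apiserver legitimately rolls out during e2e, a test (or client) should treat a closed watch stream as a signal to re-establish the watch rather than as a protocol error. A hedged, self-contained sketch of that pattern, with a fake watcher standing in for the client-go interfaces (names here are illustrative, not the real API; client-go's RetryWatcher implements the production version of this idea):

```go
package main

import "fmt"

// Event and watcher are illustrative stand-ins for the client-go watch types.
type Event struct {
	Type   string
	Object interface{}
}

type watcher struct {
	ch chan Event
}

func (w *watcher) ResultChan() <-chan Event { return w.ch }

// startWatch simulates opening a watch that delivers n events and is then
// closed, as happens when the kube-apiserver restarts mid-test.
func startWatch(n int) *watcher {
	w := &watcher{ch: make(chan Event)}
	go func() {
		for i := 0; i < n; i++ {
			w.ch <- Event{Type: "MODIFIED", Object: i}
		}
		close(w.ch) // server went away
	}()
	return w
}

func main() {
	total := 0
	sessions := 0
	// Tolerate premature closures by re-establishing the watch a bounded
	// number of times, instead of failing on the zero-value Event.
	for attempt := 0; attempt < 3; attempt++ {
		w := startWatch(2)
		for range w.ResultChan() { // range exits cleanly when the channel closes
			total++
		}
		sessions++
	}
	fmt.Printf("observed %d events across %d watch sessions\n", total, sessions)
}
```

In the real client this loop would also resume from the last seen resourceVersion so no events are dropped across reconnects.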

Comment 4 Xingxing Xia 2019-04-03 08:45:20 UTC
The above https://github.com/openshift/origin/pull/22440 is still open.

Comment 5 Tomáš Nožička 2019-04-03 16:13:00 UTC
merged