Bug 1754537 - [build-cop] Error level events detected during test run for release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.1-to-4.2
Summary: [build-cop] Error level events detected during test run for release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.1-to-4.2
Keywords:
Status: CLOSED DUPLICATE of bug 1754523
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Casey Callendrello
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks: 1754523 1755066
 
Reported: 2019-09-23 13:51 UTC by Lokesh Mandvekar
Modified: 2019-09-24 16:41 UTC (History)
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-23 17:23:38 UTC
Target Upstream Version:
Embargoed:



Description Lokesh Mandvekar 2019-09-23 13:51:00 UTC
Description of problem:

Failing test: Monitor cluster while tests execute

184 error level events were detected during this test run.

See: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.1-to-4.2/176 (full output exceeded bz char limit)

Comment 1 Kirsten Garrison 2019-09-23 17:08:28 UTC
Looking at the logs, I see these repeated errors:
E0923 07:39:24.908443     180 reflector.go:126] github.com/openshift/origin/pkg/monitor/operator.go:126: Failed to list *v1.ClusterOperator: the server could not find the requested resource (get clusteroperators.config.openshift.io)
E0923 07:39:25.834871     180 reflector.go:126] github.com/openshift/origin/pkg/monitor/operator.go:279: Failed to list *v1.ClusterVersion: the server could not find the requested resource (get clusterversions.config.openshift.io) 
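
The failing calls above are the test monitor's reflectors trying to list ClusterOperator and ClusterVersion objects through the config.openshift.io/v1 API (pkg/monitor/operator.go). As a minimal illustrative sketch, not the monitor's actual code, this is roughly the kind of List request involved, written against a recent client-go dynamic client and assuming a kubeconfig at the default home location; when the apiserver is not serving that group/version (for example while the control plane is being rolled back), the call fails with "the server could not find the requested resource" and the reflector logs it at error level on every retry.

package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/client-go/dynamic"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Assumes a kubeconfig at the default ~/.kube/config location.
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    client, err := dynamic.NewForConfig(cfg)
    if err != nil {
        panic(err)
    }
    gvr := schema.GroupVersionResource{
        Group:    "config.openshift.io",
        Version:  "v1",
        Resource: "clusteroperators",
    }
    // This is the request that comes back with "the server could not find the
    // requested resource" when config.openshift.io/v1 is not being served.
    list, err := client.Resource(gvr).List(context.TODO(), metav1.ListOptions{})
    if err != nil {
        fmt.Println("list failed:", err)
        return
    }
    fmt.Println("found", len(list.Items), "clusteroperators")
}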


Then:
Sep 23 07:31:52.384 E ns/openshift-sdn pod/sdn-mtjfw node/ip-10-0-147-211.ec2.internal container=sdn container exited with code 255 (Error):
  /sdn:metrics to [10.0.140.1:9101 10.0.141.43:9101 10.0.142.211:9101 10.0.143.11:9101 10.0.147.211:9101]
  I0923 07:30:52.078852   41011 roundrobin.go:240] Delete endpoint 10.0.159.41:9101 for service "openshift-sdn/sdn:metrics"
  I0923 07:30:52.078922   41011 proxy.go:331] hybrid proxy: syncProxyRules start
  I0923 07:30:52.239794   41011 proxy.go:334] hybrid proxy: mainProxy.syncProxyRules complete
  I0923 07:30:52.301991   41011 proxier.go:367] userspace proxy: processing 0 service events
  I0923 07:30:52.302016   41011 proxier.go:346] userspace syncProxyRules took 62.200113ms
  I0923 07:30:52.302029   41011 proxy.go:337] hybrid proxy: unidlingProxy.syncProxyRules complete
  I0923 07:31:01.205592   41011 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-sdn/sdn:metrics to [10.0.140.1:9101 10.0.141.43:9101 10.0.142.211:9101 10.0.143.11:9101 10.0.147.211:9101 10.0.159.41:9101]
  I0923 07:31:01.205629   41011 roundrobin.go:240] Delete endpoint 10.0.159.41:9101 for service "openshift-sdn/sdn:metrics"
  I0923 07:31:01.205672   41011 proxy.go:331] hybrid proxy: syncProxyRules start
  I0923 07:31:01.364854   41011 proxy.go:334] hybrid proxy: mainProxy.syncProxyRules complete
  I0923 07:31:01.432127   41011 proxier.go:367] userspace proxy: processing 0 service events
  I0923 07:31:01.432157   41011 proxier.go:346] userspace syncProxyRules took 67.277954ms
  I0923 07:31:01.432168   41011 proxy.go:337] hybrid proxy: unidlingProxy.syncProxyRules complete
  I0923 07:31:31.432418   41011 proxy.go:331] hybrid proxy: syncProxyRules start
  I0923 07:31:31.581740   41011 proxy.go:334] hybrid proxy: mainProxy.syncProxyRules complete
  I0923 07:31:31.644981   41011 proxier.go:367] userspace proxy: processing 0 service events
  I0923 07:31:31.645002   41011 proxier.go:346] userspace syncProxyRules took 63.240463ms
  I0923 07:31:31.645011   41011 proxy.go:337] hybrid proxy: unidlingProxy.syncProxyRules complete
  F0923 07:31:51.341946   41011 healthcheck.go:82] SDN healthcheck detected OVS server change, restarting: timed out waiting for the condition
Sep 23 07:31:52.530 I ns/openshift-sdn pod/sdn-mtjfw Created container sdn (2 times)
Sep 23 07:31:52.558 I ns/openshift-sdn pod/sdn-mtjfw Started container sdn (2 times) 
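
For context on the fatal line at the end of that excerpt (healthcheck.go:82): the SDN process keeps checking that the OVS server it originally connected to is still the same one, and if that condition cannot be re-established within a timeout it logs fatally, so the container exits non-zero and the kubelet restarts it (hence the "Created/Started container sdn (2 times)" events right after). A rough, self-contained sketch of that poll-then-fatal pattern follows; it is not the actual openshift-sdn code, checkOVSServer is a hypothetical stand-in for the real OVS probe, and the real daemon logs through klog, whose Fatal exits with status 255, matching "container exited with code 255" above.

package main

import (
    "errors"
    "log"
    "time"
)

// checkOVSServer stands in for the real OVS liveness/identity check
// (e.g. querying ovsdb and comparing against the server seen at startup).
func checkOVSServer() (bool, error) {
    return false, nil // pretend the condition is never satisfied
}

// runHealthcheck polls until the check passes or the timeout expires.
func runHealthcheck(interval, timeout time.Duration) error {
    deadline := time.Now().Add(timeout)
    for time.Now().Before(deadline) {
        ok, err := checkOVSServer()
        if err != nil {
            return err
        }
        if ok {
            return nil
        }
        time.Sleep(interval)
    }
    return errors.New("timed out waiting for the condition")
}

func main() {
    if err := runHealthcheck(time.Second, 30*time.Second); err != nil {
        // A fatal log exits non-zero; the kubelet then restarts the sdn container.
        log.Fatalf("SDN healthcheck detected OVS server change, restarting: %v", err)
    }
}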

Then:

Sep 23 07:44:34.048 W ns/openshift-machine-config-operator pod/etcd-quorum-guard-884c9bc99-gnbt4 Readiness probe failed:  (4 times)
Sep 23 07:44:34.048 W ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-141-43.ec2.internal Readiness probe failed: HTTP probe failed with statuscode: 500 (4 times)
Sep 23 07:44:34.048 W ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-159-41.ec2.internal Liveness probe failed: HTTP probe failed with statuscode: 500 (2 times)
Sep 23 07:44:34.049 W ns/openshift-machine-config-operator pod/etcd-quorum-guard-884c9bc99-gnbt4 MountVolume.SetUp failed for volume "default-token-4qqnz" : Get https://api-int.ci-op-kqtv091h-45560.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/openshift-machine-config-operator/secrets/default-token-4qqnz: read tcp 10.0.141.43:35080->10.0.140.36:6443: use of closed network connection
Sep 23 07:44:34.049 W ns/openshift-machine-config-operator pod/etcd-quorum-guard-884c9bc99-wtshd Readiness probe failed:  (3 times)
Sep 23 07:44:34.049 I node/ip-10-0-141-43.ec2.internal Rolling back pending config rendered-master-b2352fb734390462ba844dce2604c4b6

This seems like some kind of networking issue...

Comment 2 Mrunal Patel 2019-09-23 17:23:38 UTC

*** This bug has been marked as a duplicate of bug 1754523 ***

