Bug 1822298

Summary:	[sig-builds][Feature:Builds] oc new-app should fail with a --name longer than 58 characters [Suite:openshift/conformance/parallel]
Product:	OpenShift Container Platform	Reporter:	Dan Williams <dcbw>
Component:	kube-controller-manager	Assignee:	Maciej Szulik <maszulik>
Status:	CLOSED DUPLICATE	QA Contact:	zhou ying <yinzhou>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	4.5	CC:	aos-bugs, gmontero, mfojtik, wzheng
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-04-09 08:51:05 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Dan Williams 2020-04-08 16:59:09 UTC

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/24833/pull-ci-openshift-origin-master-e2e-gcp/7111/

Comment 1 Gabe Montero 2020-04-08 19:05:14 UTC

*** Bug 1822303 has been marked as a duplicate of this bug. ***

Comment 2 Gabe Montero 2020-04-08 19:05:42 UTC

*** Bug 1822302 has been marked as a duplicate of this bug. ***

Comment 3 Gabe Montero 2020-04-08 19:06:04 UTC

*** Bug 1822301 has been marked as a duplicate of this bug. ***

Comment 4 Gabe Montero 2020-04-08 19:06:23 UTC

*** Bug 1822300 has been marked as a duplicate of this bug. ***

Comment 5 Gabe Montero 2020-04-08 19:06:56 UTC

*** Bug 1822299 has been marked as a duplicate of this bug. ***

Comment 6 Gabe Montero 2020-04-08 19:25:46 UTC

So the kube-apiserver went degraded:

Apr 08 11:35:32.675 E clusteroperator/kube-apiserver changed Degraded to True: NodeInstaller_InstallerPodFailed: NodeInstallerDegraded: 1 nodes are failing on revision 7:\nNodeInstallerDegraded: 

and OCM initially had trouble with leader election at:

fail [github.com/openshift/origin/test/extended/operators/cluster.go:114]: Expected
    <[]string | len:1, cap:1>: [
        "Pod openshift-controller-manager/controller-manager-mwgnw is not healthy: I0408 11:29:14.758226       1 controller_manager.go:39] Starting controllers on 0.0.0.0:8443 (unknown)\nI0408 11:29:14.761078       1 controller_manager.go:50] DeploymentConfig controller using images from \"registry.svc.ci.openshift.org/ci-op-0vjskh7j/stable@sha256:baf34611b723ba5e9b3ead8872fed2c8af700156096054d720d42a057f5f24be\"\nI0408 11:29:14.761264       1 controller_manager.go:56] Build controller using images from \"registry.svc.ci.openshift.org/ci-op-0vjskh7j/stable@sha256:19880395f98981bdfd98ffbfc9e4e878aa085ecf1e91f2073c24679545e41478\"\nI0408 11:29:14.761230       1 standalone_apiserver.go:98] Started health checks at 0.0.0.0:8443\nI0408 11:29:14.766485       1 leaderelection.go:242] attempting to acquire leader lease  openshift-controller-manager/openshift-master-controllers...\n",
    ]
to be empty

though perhaps it recovered, as 10 to 15 minutes later I see activity in the OCM controller-manager log, but with repeated "...forbidden: unable to create new content in namespace..." errors across all the test namespaces, like

E0408 11:49:39.086650       1 create_dockercfg_secrets.go:285] error syncing service, it will be tried again on a resync e2e-test-image-layers-x6c66/default: secrets "default-token-w5prw" is forbidden: unable to create new content in namespace e2e-test-image-layers-x6c66 because it is being terminated

where the timestamps correlate with what the sig-builds tests are seeing: the OCM is still in the progressing==true state for the 2 to 3 minutes during which they are trying to start.

Apr  8 11:48:29.859: INFO: OCM rollout still progressing or in error: True
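For anyone repeating this correlation on another run, here is a minimal sketch of pulling the "being terminated" errors out of a controller-manager log. It assumes the usual klog line layout seen in the excerpts above; the function name `terminating_namespace_errors` and the two regexes are hypothetical helpers for illustration, not part of any OpenShift tooling.

```python
import re

# Assumed klog header layout: <level><MMDD> <HH:MM:SS.micros> <pid> <file>:<line>] <message>
KLOG_RE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+\d+ "
    r"(?P<source>[^\]]+)\] (?P<msg>.*)$"
)

# Message fragment quoted in this comment; captures the namespace name.
TERMINATING_RE = re.compile(
    r"unable to create new content in namespace (?P<ns>\S+) "
    r"because it is being terminated"
)


def terminating_namespace_errors(lines):
    """Return (timestamp, namespace) pairs for 'being terminated' errors."""
    hits = []
    for line in lines:
        header = KLOG_RE.match(line.strip())
        if header is None:
            continue  # not a klog-formatted line
        detail = TERMINATING_RE.search(header.group("msg"))
        if detail:
            hits.append((header.group("time"), detail.group("ns")))
    return hits


# Two lines copied from this bug: one matching error, one unrelated info line.
sample = [
    'E0408 11:49:39.086650       1 create_dockercfg_secrets.go:285] error syncing service, '
    'it will be tried again on a resync e2e-test-image-layers-x6c66/default: secrets '
    '"default-token-w5prw" is forbidden: unable to create new content in namespace '
    'e2e-test-image-layers-x6c66 because it is being terminated',
    'I0408 11:29:14.766485       1 leaderelection.go:242] attempting to acquire leader lease  '
    'openshift-controller-manager/openshift-master-controllers...',
]

hits = terminating_namespace_errors(sample)
for timestamp, namespace in hits:
    print(timestamp, namespace)
```

The extracted timestamps can then be compared by eye, or programmatically, against the "OCM rollout still progressing" lines from the test output.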

There are also no artifacts available for the test run noted in the description (I get a 404 on the artifacts link).

Sending to kube-controller-manager to see if they can correlate the "forbidden: unable to create new content" errors I mentioned with cluster-policy-controller.

Feels like a GCP env issue but if folks who own the stack beneath builds can clarify, that would be good.

Comment 7 Maciej Szulik 2020-04-09 08:51:05 UTC

It is quite likely related; I see a ton of

namespace_scc_allocation_controller.go:336] error syncing namespace, it will be retried: Operation cannot be fulfilled on namespaces "e2e-test-operators-qzmwm": the object has been modified; please apply your changes to the latest version and try again

from 11:36:56.425828 until 11:57:38.401832. 
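As a quick sanity check on the overlap, a small sketch that tests which of the timestamps quoted earlier in this bug fall inside that window. The labels in `observed` are informal descriptions chosen for illustration, and all times are assumed to be on the same day and in the same time zone.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"


def parse(ts):
    """Parse a time-of-day string such as '11:36:56.425828'."""
    return datetime.strptime(ts, FMT)


# Window of namespace_scc_allocation_controller errors quoted above.
window_start = parse("11:36:56.425828")
window_end = parse("11:57:38.401832")
duration = window_end - window_start

# Timestamps quoted elsewhere in this bug (labels are informal).
observed = {
    "kube-apiserver Degraded": "11:35:32.675",
    "OCM rollout still progressing": "11:48:29.859",
    "dockercfg secret forbidden": "11:49:39.086650",
}

inside = {
    label: window_start <= parse(ts) <= window_end
    for label, ts in observed.items()
}

print("window length:", duration)
for label, is_inside in inside.items():
    print(label, "inside window" if is_inside else "outside window")
```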

I'm going to close this as a duplicate.

*** This bug has been marked as a duplicate of bug 1820687 ***