Bug 1691269 - kube-controller-manager operator fail to upgrade
Summary: kube-controller-manager operator fail to upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 4.1.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: Michal Fojtik
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-03-21 09:54 UTC by MinLi
Modified: 2019-06-04 10:46 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:46:16 UTC
Target Upstream Version:


Attachments


Links
Red Hat Product Errata RHBA-2019:0758 (last updated 2019-06-04 10:46:23 UTC)

Description MinLi 2019-03-21 09:54:05 UTC
Description of problem:
When upgrading from 4.0.0-0.nightly-2019-03-19-004004 to 4.0.0-0.nightly-2019-03-20-153904, Cluster operator kube-controller-manager is reporting a failure: NodeInstallerFailing: 0 nodes are failing on revision 8

Version-Release number of selected component (if applicable):
4.0.0-0.nightly-2019-03-20-153904

How reproducible:
always 

Steps to Reproduce:
1. Install a 4.0 cluster with 4.0.0-0.nightly-2019-03-19-004004

2. Upgrade the cluster to 4.0.0-0.nightly-2019-03-20-153904
#oc adm upgrade --to 4.0.0-0.nightly-2019-03-20-153904

3. Check the clusterversion and clusteroperator resources (a sketch of additional status checks follows after these steps)
#oc get clusterversion
#oc get clusteroperator
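
For reference, the overall upgrade progress can also be watched from the ClusterVersion conditions. A minimal sketch, assuming the standard 4.x ClusterVersion object named "version" and its usual status fields:
#oc get clusterversion version -o jsonpath='{range .status.conditions[*]}{.type}={.status}: {.message}{"\n"}{end}'
#oc get clusteroperator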

Actual results:
The upgrade fails; the cluster-version-operator log shows:

I0321 09:20:59.722478       1 cvo.go:320] Desired version from spec is v1.Update{Version:"4.0.0-0.nightly-2019-03-20-153904", Image:"registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-20-153904"}
I0321 09:20:59.722644       1 cvo.go:297] Finished syncing cluster version "openshift-cluster-version/version" (338.432µs)
E0321 09:21:12.990874       1 task.go:58] error running apply for clusteroperator "kube-controller-manager" (76 of 308): Cluster operator kube-controller-manager is reporting a failure: NodeInstallerFailing: 0 nodes are failing on revision 8:
NodeInstallerFailing: installer: manager
NodeInstallerFailing: I0321 08:34:53.549721       1 cmd.go:308] Writing pod manifest "/etc/kubernetes/static-pod-resources/kube-controller-manager-pod-8/kube-controller-manager-pod.yaml" ...
NodeInstallerFailing: I0321 08:34:53.550244       1 cmd.go:314] Creating directory for static pod manifest "/etc/kubernetes/manifests" ...
NodeInstallerFailing: I0321 08:34:53.550335       1 cmd.go:328] Writing static pod manifest "/etc/kubernetes/manifests/kube-controller-manager-pod.yaml" ...
NodeInstallerFailing: {"kind":"Pod","apiVersion":"v1","metadata":{"name":"kube-controller-manager","namespace":"openshift-kube-controller-manager","creationTimestamp":null,"labels":{"app":"kube-controller-manager","kube-controller-manager":"true","revision":"8"}},"spec":{"volumes":[{"name":"resource-dir","hostPath":{"path":"/etc/kubernetes/static-pod-resources/kube-controller-manager-pod-8"}}],"containers":[{"name":"kube-controller-manager-8","image":"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2a4f752e5f0f174d0581f330a4a3211a7f68c34a1d593176db051fc90a5f6a3d","command":["hyperkube","kube-controller-manager"],"args":["--openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml","--kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/controller-manager-kubeconfig/kubeconfig","-v=2"],"ports":[{"containerPort":10257}],"resources":{"requests":{"cpu":"100m","memory":"200Mi"}},"volumeMounts":[{"name":"resource-dir","mountPath":"/etc/kubernetes/static-pod-resources"}],"livenessProbe":{"httpGet":{"path":"healthz","port":10257,"scheme":"HTTPS"},"initialDelaySeconds":45,"timeoutSeconds":10},"readinessProbe":{"httpGet":{"path":"healthz","port":10257,"scheme":"HTTPS"},"initialDelaySeconds":10,"timeoutSeconds":10},"terminationMessagePolicy":"FallbackToLogsOnError","imagePullPolicy":"IfNotPresent"}],"hostNetwork":true,"tolerations":[{"operator":"Exists"}],"priorityClassName":"system-node-critical"},"status":{}}
NodeInstallerFailing: I0321 08:34:53.617509       1 request.go:530] Throttling request took 66.691145ms, request: POST:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/events
NodeInstallerFailing: 
I0321 09:21:17.522465       1 leaderelection.go:227] successfully renewed lease openshift-cluster-version/version
I0321 09:21:47.530861       1 leaderelection.go:227] successfully renewed lease openshift-cluster-version/version

[root@localhost lyman]# oc get clusteroperator 
NAME                                  VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
...
kube-apiserver                        4.0.0-0.nightly-2019-03-20-153904   True        False         False     42m
kube-controller-manager               4.0.0-0.nightly-2019-03-20-153904   True        True          True      46m
kube-scheduler                        4.0.0-0.nightly-2019-03-20-153904   True        False         False     44m
...
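
To see where the FAILING=True condition comes from, the operator conditions and the installer pods can be inspected directly. A hedged sketch, assuming the standard openshift-kube-controller-manager namespace; <node-name> is a placeholder for an actual master node, and the installer pod name may differ:
#oc get clusteroperator kube-controller-manager -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'
#oc get pods -n openshift-kube-controller-manager | grep installer
#oc logs -n openshift-kube-controller-manager installer-8-<node-name>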


Expected results:
All cluster operators upgrade successfully.

Additional info:

Comment 1 Xingxing Xia 2019-03-22 06:56:38 UTC
(In reply to MinLi from comment #0)
> How reproducible:
> always
Thanks for reporting it. I did not hit it in my upgrade testing. FYI, per the comments below, this bug and the ones linked below are intermittent (probabilistic) issues:
https://bugzilla.redhat.com/show_bug.cgi?id=1690088#c5 (same symptom as this bug)
https://bugzilla.redhat.com/show_bug.cgi?id=1690153#c3 (same symptom as this bug)

Comment 2 Michal Fojtik 2019-03-25 10:17:14 UTC
The E0321 09:21:12.990874       1 task.go:58] error running apply for clusteroperator "kube-controller-manager" (76 of 308): Cluster operator kube-controller-manager is reporting a failure: NodeInstallerFailing: 0 nodes are failing on revision 8:

should be fixed by https://github.com/openshift/cluster-kube-controller-manager-operator/pull/198
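
For anyone triaging a similar report, the per-node revision bookkeeping behind NodeInstallerFailing can be dumped from the operator resource. A sketch, assuming the operator.openshift.io/v1 KubeControllerManager object named "cluster" and its nodeStatuses fields; output layout may vary by build:
#oc get kubecontrollermanager cluster -o jsonpath='{range .status.nodeStatuses[*]}{.nodeName}: current={.currentRevision} target={.targetRevision} lastFailed={.lastFailedRevision}{"\n"}{end}'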

Comment 3 Wei Sun 2019-03-26 08:35:51 UTC
The fix has been merged; moving to the QA status to check whether it has been fixed in the latest build.

Comment 4 MinLi 2019-03-26 08:47:26 UTC
Verified! The upgrade succeeded.

Version:
Upgraded from 4.0.0-0.nightly-2019-03-23-222829 to 4.0.0-0.nightly-2019-03-25-141538
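
A quick way to double-check the result, as a sketch; the awk column numbers assume the clusteroperator table layout shown in comment 0 (PROGRESSING is column 4, FAILING is column 5):
#oc get clusterversion
#oc get clusteroperator | awk 'NR>1 && ($4=="True" || $5=="True")'
An empty result from the second command means no operator is still Progressing or Failing.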

Comment 6 errata-xmlrpc 2019-06-04 10:46:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

