Bug 1691119

Summary: machine-api-operator is not reporting failure using clusteroperator
Product: OpenShift Container Platform
Reporter: Abhinav Dahiya <adahiya>
Component: Cloud Compute
Assignee: Jan Chaloupka <jchaloup>
Status: CLOSED ERRATA
QA Contact: Jianwei Hou <jhou>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.1.0
CC: agarcial, aos-cloud, jhou, wsun, zhsun
Target Milestone: ---
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-04 10:46:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Abhinav Dahiya 2019-03-20 22:39:41 UTC
Description of problem:

I hit this while working on bare-metal UPI (https://github.com/openshift/installer/pull/1416), where the cluster is installed using the None platform.

The Machine API Operator does not recognize None as a platform, so it was failing silently, without reporting the error to its ClusterOperator.

$ oc --config dev-metal/auth/kubeconfig logs machine-api-operator-5f8d8dc78c-5k7dh -n openshift-machine-api
I0320 16:19:06.600839       1 start.go:39] Version: 0.1.0-256-g92bef467-dirty
I0320 16:19:06.602673       1 leaderelection.go:205] attempting to acquire leader lease  openshift-machine-api/machine-api-operator...
I0320 16:19:06.629429       1 leaderelection.go:214] successfully acquired lease openshift-machine-api/machine-api-operator
I0320 16:19:06.631028       1 operator.go:106] Starting Machine API Operator
I0320 16:19:06.731370       1 operator.go:114] Synced up caches
E0320 16:19:06.734565       1 operator.go:176] Failed getting operator config: no platform provider found on install config
E0320 16:19:06.742681       1 operator.go:176] Failed getting operator config: no platform provider found on install config
E0320 16:19:06.755565       1 operator.go:176] Failed getting operator config: no platform provider found on install config
E0320 16:19:06.778517       1 operator.go:176] Failed getting operator config: no platform provider found on install config
E0320 16:19:06.822930       1 operator.go:176] Failed getting operator config: no platform provider found on install config
E0320 16:19:06.906692       1 operator.go:176] Failed getting operator config: no platform provider found on install config
E0320 16:19:07.069804       1 operator.go:176] Failed getting operator config: no platform provider found on install config
E0320 16:19:07.392849       1 operator.go:176] Failed getting operator config: no platform provider found on install config
E0320 16:19:07.608933       1 operator.go:176] Failed getting operator config: no platform provider found on install config
E0320 16:19:08.036477       1 operator.go:176] Failed getting operator config: no platform provider found on install config
E0320 16:19:10.599568       1 operator.go:176] Failed getting operator config: no platform provider found on install config
E0320 16:19:15.725471       1 operator.go:176] Failed getting operator config: no platform provider found on install config
E0320 16:19:25.975529       1 operator.go:176] Failed getting operator config: no platform provider found on install config
E0320 16:19:46.459374       1 operator.go:176] Failed getting operator config: no platform provider found on install config
E0320 16:20:27.428836       1 operator.go:176] Failed getting operator config: no platform provider found on install config
E0320 16:21:49.359209       1 operator.go:176] Failed getting operator config: no platform provider found on install config
E0320 16:21:49.359491       1 operator.go:162] no platform provider found on install config

$ oc --config dev-metal/auth/kubeconfig get co
NAME                                  VERSION                           AVAILABLE   PROGRESSING   FAILING   SINCE
authentication                        4.0.0-0.alpha-2019-03-20-094557   True        False         False     50m
cluster-autoscaler                                                      True        False         True      2s
console                               4.0.0-0.alpha-2019-03-20-094557   True        False         False     50m
dns                                   4.0.0-0.alpha-2019-03-20-094557   True        False         False     6h19m
image-registry                        4.0.0-0.alpha-2019-03-20-094557   True        False         False     51m
ingress                               4.0.0-0.alpha-2019-03-20-094557   True        False         False     3h14m
kube-apiserver                        4.0.0-0.alpha-2019-03-20-094557   True        False         False     6h15m
kube-controller-manager               4.0.0-0.alpha-2019-03-20-094557   True        False         False     6h13m
kube-scheduler                        4.0.0-0.alpha-2019-03-20-094557   True        False         False     6h12m
machine-config                        4.0.0-0.alpha-2019-03-20-094557   False       True          True      6h19m
marketplace-operator                  4.0.0-0.alpha-2019-03-20-094557   True        False         False     6h15m
monitoring                            4.0.0-0.alpha-2019-03-20-094557   True        False         False     16m
network                               4.0.0-0.alpha-2019-03-20-094557   True        False         False     6h19m
node-tuning                           4.0.0-0.alpha-2019-03-20-094557   True        False         False     6h15m
openshift-apiserver                   4.0.0-0.alpha-2019-03-20-094557   True        False         False     17m
openshift-cloud-credential-operator   4.0.0-0.alpha-2019-03-20-094557   True        False         False     6h19m
openshift-controller-manager          4.0.0-0.alpha-2019-03-20-094557   True        False         False     50m
openshift-samples                     4.0.0-0.alpha-2019-03-20-094557   True        False         False     6h14m
operator-lifecycle-manager            4.0.0-0.alpha-2019-03-20-094557   True        False         False     6h19m
service-ca                            4.0.0-0.alpha-2019-03-20-094557   True        False         False     137m
service-catalog-apiserver             4.0.0-0.alpha-2019-03-20-094557   True        False         False     50m
service-catalog-controller-manager    4.0.0-0.alpha-2019-03-20-094557   True        False         False     137m
storage                                                                 True        False         False     6h15m


$ oc --config dev-metal/auth/kubeconfig get co | grep machine-api
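
The grep above prints nothing, meaning no machine-api ClusterOperator exists at all. As a rough sketch of how the same check could be automated, the helper below parses the text output of `oc get co` and lists which expected operators are absent. The function name `missing_operators` and the abbreviated `sample` string are illustrative assumptions, not part of any OpenShift tooling; the rows are copied from the listing above.

```python
from typing import List


def missing_operators(listing: str, expected: List[str]) -> List[str]:
    """Return the expected operator names that do not appear in `oc get co` text output."""
    lines = [ln for ln in listing.splitlines() if ln.strip()]
    # Skip the header row; the first whitespace-separated field of each row is NAME.
    present = {ln.split()[0] for ln in lines[1:]}
    return [name for name in expected if name not in present]


# Abbreviated rows taken from the listing above (hypothetical sample input).
sample = """\
NAME                                  VERSION                           AVAILABLE   PROGRESSING   FAILING   SINCE
cluster-autoscaler                                                      True        False         True      2s
machine-config                        4.0.0-0.alpha-2019-03-20-094557   False       True          True      6h19m
storage                                                                 True        False         False     6h15m
"""

print(missing_operators(sample, ["machine-api", "machine-config"]))
```

Because only the first field of each row is read, rows with an empty VERSION column (cluster-autoscaler and storage here) are still handled.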


The Machine API Operator should always report its status using ClusterOperator when the API is available.
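
To illustrate the expected behavior, here is a minimal, language-neutral sketch in Python. It is not the operator's actual code (the operator is written in Go), and `ClusterOperatorStatus` and `sync_and_report` are hypothetical names. The idea is only that every sync pass, successful or not, ends by recording its outcome in the operator's status conditions, so a failure like the one in the log above becomes visible in `oc get co`. Setting Available to False on failure is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ClusterOperatorStatus:
    """Hypothetical stand-in for a ClusterOperator's status conditions."""
    name: str
    conditions: Dict[str, str] = field(default_factory=dict)
    message: str = ""


def sync_and_report(status: ClusterOperatorStatus,
                    sync: Callable[[], None]) -> Optional[Exception]:
    """Run one sync pass and always record the outcome in `status`."""
    try:
        sync()
    except Exception as err:
        # Surface the failure instead of only logging it.
        status.conditions = {"Available": "False", "Progressing": "False", "Failing": "True"}
        status.message = str(err)
        return err
    status.conditions = {"Available": "True", "Progressing": "False", "Failing": "False"}
    status.message = ""
    return None


def failing_sync() -> None:
    # Error text copied from the operator log above.
    raise RuntimeError("no platform provider found on install config")


status = ClusterOperatorStatus(name="machine-api")
sync_and_report(status, failing_sync)
print(status.conditions["Failing"], status.message)
```

The key design point is that the status update sits on both the success and the error path, so the operator cannot fail without its ClusterOperator reflecting it.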

Comment 2 Jan Chaloupka 2019-03-26 10:56:18 UTC
PR merged

Comment 4 Wei Sun 2019-04-10 02:58:17 UTC
Please help check if it could be verified against the latest build.

Comment 5 sunzhaohua 2019-04-23 09:26:49 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-04-22-005054   True        False         76m     Cluster version is 4.1.0-0.nightly-2019-04-22-005054



$ oc logs -f machine-api-operator-7d58d4ddbd-f7d9b
I0423 07:39:20.284433       1 start.go:39] Version: 4.1.0-201904211700-dirty
I0423 07:39:20.287034       1 leaderelection.go:205] attempting to acquire leader lease  openshift-machine-api/machine-api-operator...
I0423 07:39:20.299364       1 leaderelection.go:214] successfully acquired lease openshift-machine-api/machine-api-operator
I0423 07:39:20.302118       1 operator.go:121] Starting Machine API Operator
I0423 07:39:20.402328       1 operator.go:129] Synced up caches
I0423 07:39:20.408679       1 status.go:172] machine-api clusterOperator status does not exist, creating &{{ } {machine-api      0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] nil [] } {} {[] [{operator 4.1.0-0.nightly-2019-04-22-005054}] [{ namespaces  openshift-machine-api}] {[] <nil>}}}
I0423 07:39:20.415343       1 event.go:221] Event(v1.ObjectReference{Kind:"ClusterOperator", Namespace:"", Name:"machine-api", UID:"e80dc31f-659a-11e9-a21d-801844eef6b8", APIVersion:"config.openshift.io/v1", ResourceVersion:"2528", FieldPath:""}): type: 'Normal' reason: 'Status upgrade' Progressing towards operator: 4.1.0-0.nightly-2019-04-22-005054
E0423 07:42:37.267388       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=63, ErrCode=NO_ERROR, debug=""
W0423 07:42:37.313638       1 reflector.go:270] k8s.io/client-go/informers/factory.go:132: watch of *v1.Deployment ended with: too old resource version: 2562 (4450)
E0423 07:46:48.442485       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=85, ErrCode=NO_ERROR, debug=""
E0423 07:58:33.095111       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=161, ErrCode=NO_ERROR, debug=""
W0423 07:58:33.123442       1 reflector.go:270] k8s.io/client-go/informers/factory.go:132: watch of *v1.Deployment ended with: too old resource version: 15665 (17029)

$ oc get clusteroperator
NAME                                 VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
authentication                       4.1.0-0.nightly-2019-04-22-005054   True        False         False     76m
cloud-credential                     4.1.0-0.nightly-2019-04-22-005054   True        False         False     94m
cluster-autoscaler                   4.1.0-0.nightly-2019-04-22-005054   True        False         False     94m
console                              4.1.0-0.nightly-2019-04-22-005054   True        False         False     78m
dns                                  4.1.0-0.nightly-2019-04-22-005054   True        False         False     94m
image-registry                       4.1.0-0.nightly-2019-04-22-005054   True        False         False     79m
ingress                              4.1.0-0.nightly-2019-04-22-005054   True        False         False     80m
kube-apiserver                       4.1.0-0.nightly-2019-04-22-005054   True        False                   89m
kube-controller-manager              4.1.0-0.nightly-2019-04-22-005054   True        False                   90m
kube-scheduler                       4.1.0-0.nightly-2019-04-22-005054   True        False                   90m
machine-api                          4.1.0-0.nightly-2019-04-22-005054   True        False         False     94m

Comment 7 errata-xmlrpc 2019-06-04 10:46:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758