Bug 1722894

Summary:	MCO is reporting a full text value for Reason - Reason must be a short constant
Product:	OpenShift Container Platform	Reporter:	Antonio Murdaca <amurdaca>
Component:	Machine Config Operator	Assignee:	Christian Glombek <cglombek>
Status:	CLOSED ERRATA	QA Contact:	Micah Abbott <miabbott>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	4.2.0	CC:	ccoleman, cglombek, kgarriso, miabbott, mnguyen
Target Milestone:	---
Target Release:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1722887	Environment:
Last Closed:	2019-10-16 06:32:20 UTC	Type:	---
Bug Depends On:	1722887
Bug Blocks:

Description Antonio Murdaca 2019-06-21 15:51:36 UTC

+++ This bug was initially created as a clone of Bug #1722887 +++

The machine-config cluster operator is reporting full text string values for reason, which is not what Reason is for.

For instance, a 4.1.0 cluster is reporting:

reason = "timed out waiting for the condition during waitForDeploymentRollout: Deployment machine-config-controller is not ready. status: (replicas: 1, updated: 1, ready: 0, unavailable: 1)"

That value belongs in "message". Reason must be a CamelCase constant with low cardinality, like "WaitForRollout" or "Timeout".
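A minimal sketch of the reason/message split, in Python for illustration only. The helper `make_condition`, the regular expression, the 64-character bound, and the "Degraded" condition type are all assumptions made for the example; none of them are taken from the operator's source.

```python
import re

# Assumed definition of a "short constant": a single CamelCase token.
REASON_PATTERN = re.compile(r"^[A-Z][A-Za-z0-9]*$")
MAX_REASON_LEN = 64  # assumed bound, chosen for illustration


def make_condition(cond_type, status, reason, message):
    """Build a condition dict, rejecting free-text reasons."""
    if len(reason) > MAX_REASON_LEN or not REASON_PATTERN.match(reason):
        raise ValueError(f"reason must be a short CamelCase constant, got {reason!r}")
    return {"type": cond_type, "status": status, "reason": reason, "message": message}


# The full text reported by the 4.1.0 cluster goes into message.
detail = (
    "timed out waiting for the condition during waitForDeploymentRollout: "
    "Deployment machine-config-controller is not ready. "
    "status: (replicas: 1, updated: 1, ready: 0, unavailable: 1)"
)

# The condition type is hypothetical; the reason is one of the constants suggested above.
condition = make_condition("Degraded", "True", "WaitForRollout", detail)
```

Passing `detail` as the reason raises `ValueError`, which is the kind of guard that would have caught the value reported here.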

Using messages in this field can cause Prometheus to report too many series, and the length of the value is also unbounded, which could result in a failure to report metrics.
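To illustrate the cardinality concern: in Prometheus, every distinct combination of label values is a separate time series. The sketch below counts distinct label sets for a free-text reason versus a constant one. The label names `name` and `reason` and the sample values are assumptions for the example, not the actual metric exported by the cluster.

```python
def count_series(samples):
    """Count distinct label sets; each distinct set is a separate time series."""
    return len({tuple(sorted(labels.items())) for labels in samples})


# Free-text reasons: the replica counts embedded in the text vary over time,
# so every variation produces a new label value.
free_text = [
    {
        "name": "machine-config",
        "reason": f"not ready. status: (replicas: 3, updated: {u}, ready: {r}, unavailable: {3 - r})",
    }
    for u in range(4)
    for r in range(4)
]

# Constant reason: the same observations collapse into one label value.
constant = [{"name": "machine-config", "reason": "WaitForRollout"} for _ in range(16)]

free_text_series = count_series(free_text)  # 16 distinct series
constant_series = count_series(constant)    # 1 series
```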

This is high severity because it could potentially bring down Prometheus due to size limits, and it is the wrong value for the field. Needs to be fixed in 4.1.3 or 4.1.4.

Comment 1 Antonio Murdaca 2019-06-24 10:27:28 UTC

https://github.com/openshift/machine-config-operator/pull/876

Comment 3 Michael Nguyen 2019-06-27 19:47:28 UTC

Verified on:
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-06-27-041730   True        False         41m     Cluster version is 4.2.0-0.nightly-2019-06-27-041730


$ oc -n openshift-machine-config-operator get deployments/machine-config-operator -oyaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2019-06-27T18:46:58Z"
  generation: 1
  labels:
    k8s-app: machine-config-operator
  name: machine-config-operator
  namespace: openshift-machine-config-operator
  resourceVersion: "2259"
  selfLink: /apis/extensions/v1beta1/namespaces/openshift-machine-config-operator/deployments/machine-config-operator
  uid: f1636c85-990b-11e9-90af-025f5011fcca
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: machine-config-operator
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: machine-config-operator
    spec:
      containers:
      - args:
        - start
        - --images-json=/etc/mco/images/images.json
        env:
        - name: RELEASE_VERSION
          value: 4.2.0-0.nightly-2019-06-27-041730
        image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e8cc93fc366cec2dc915b1578a6cc49a210a920a3c2d9ccce23669f1e3db6a4b
        imagePullPolicy: IfNotPresent
        name: machine-config-operator
        resources:
          requests:
            cpu: 20m
            memory: 50Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/ssl/kubernetes/ca.crt
          name: root-ca
        - mountPath: /etc/ssl/etcd/ca.crt
          name: etcd-ca
        - mountPath: /etc/mco/images
          name: images
      dnsPolicy: ClusterFirst
      nodeSelector:
        node-role.kubernetes.io/master: ""
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 120
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 120
      volumes:
      - configMap:
          defaultMode: 420
          name: machine-config-operator-images
        name: images
      - hostPath:
          path: /etc/ssl/etcd/ca.crt
          type: ""
        name: etcd-ca
      - hostPath:
          path: /etc/kubernetes/ca.crt
          type: ""
        name: root-ca
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2019-06-27T18:48:33Z"
    lastUpdateTime: "2019-06-27T18:48:33Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2019-06-27T18:46:58Z"
    lastUpdateTime: "2019-06-27T18:48:33Z"
    message: ReplicaSet "machine-config-operator-66cf7c67d" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
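
A check like the one above can also be scripted against JSON output. The following is a rough sketch, assuming the resource has been fetched as JSON and that "constant" means a single CamelCase token; the function name `non_constant_reasons` is hypothetical.

```python
import json
import re

CAMEL_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")  # assumed definition of a constant reason


def non_constant_reasons(resource_json):
    """Return (type, reason) pairs whose reason is missing or not a CamelCase token."""
    resource = json.loads(resource_json)
    conditions = resource.get("status", {}).get("conditions", [])
    return [
        (c.get("type"), c.get("reason"))
        for c in conditions
        if not CAMEL_CASE.match(c.get("reason") or "")
    ]


# Conditions copied from the output above, reduced to the relevant fields.
sample = json.dumps({
    "status": {
        "conditions": [
            {"type": "Available", "reason": "MinimumReplicasAvailable"},
            {"type": "Progressing", "reason": "NewReplicaSetAvailable"},
        ]
    }
})

offenders = non_constant_reasons(sample)  # empty list: both reasons are constants
```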

Comment 4 errata-xmlrpc 2019-10-16 06:32:20 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922