Bug 1722894 - MCO is reporting a full text value for Reason - Reason must be a short constant
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Christian Glombek
QA Contact: Micah Abbott
URL:
Whiteboard:
Depends On: 1722887
Blocks:
Reported: 2019-06-21 15:51 UTC by Antonio Murdaca
Modified: 2019-10-16 06:32 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1722887
Environment:
Last Closed: 2019-10-16 06:32:20 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2019:2922 | 0 | None | None | None | 2019-10-16 06:32:32 UTC

Description Antonio Murdaca 2019-06-21 15:51:36 UTC
+++ This bug was initially created as a clone of Bug #1722887 +++

The machine-config cluster operator is reporting full text string values for reason, which is not what Reason is for.

For instance, a 4.1.0 cluster is reporting:

reason = "timed out waiting for the condition during waitForDeploymentRollout: Deployment machine-config-controller is not ready. status: (replicas: 1, updated: 1, ready: 0, unavailable: 1)"

That value should be in "message" - reason must be a camel-case constant with low cardinality like "WaitForRollout" or "Timeout".

Using messages in this field can cause Prometheus to report too many series, and the length of the value is unbounded, which could result in a failure to report metrics.

This is high severity because it could bring down Prometheus due to size limits, and because the value itself is wrong. It needs to be fixed in 4.1.3 or 4.1.4.

Comment 3 Michael Nguyen 2019-06-27 19:47:28 UTC
Verified on 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-06-27-041730   True        False         41m     Cluster version is 4.2.0-0.nightly-2019-06-27-041730


$ oc -n openshift-machine-config-operator get deployments/machine-config-operator -oyaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2019-06-27T18:46:58Z"
  generation: 1
  labels:
    k8s-app: machine-config-operator
  name: machine-config-operator
  namespace: openshift-machine-config-operator
  resourceVersion: "2259"
  selfLink: /apis/extensions/v1beta1/namespaces/openshift-machine-config-operator/deployments/machine-config-operator
  uid: f1636c85-990b-11e9-90af-025f5011fcca
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: machine-config-operator
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: machine-config-operator
    spec:
      containers:
      - args:
        - start
        - --images-json=/etc/mco/images/images.json
        env:
        - name: RELEASE_VERSION
          value: 4.2.0-0.nightly-2019-06-27-041730
        image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e8cc93fc366cec2dc915b1578a6cc49a210a920a3c2d9ccce23669f1e3db6a4b
        imagePullPolicy: IfNotPresent
        name: machine-config-operator
        resources:
          requests:
            cpu: 20m
            memory: 50Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/ssl/kubernetes/ca.crt
          name: root-ca
        - mountPath: /etc/ssl/etcd/ca.crt
          name: etcd-ca
        - mountPath: /etc/mco/images
          name: images
      dnsPolicy: ClusterFirst
      nodeSelector:
        node-role.kubernetes.io/master: ""
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 120
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 120
      volumes:
      - configMap:
          defaultMode: 420
          name: machine-config-operator-images
        name: images
      - hostPath:
          path: /etc/ssl/etcd/ca.crt
          type: ""
        name: etcd-ca
      - hostPath:
          path: /etc/kubernetes/ca.crt
          type: ""
        name: root-ca
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2019-06-27T18:48:33Z"
    lastUpdateTime: "2019-06-27T18:48:33Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2019-06-27T18:46:58Z"
    lastUpdateTime: "2019-06-27T18:48:33Z"
    message: ReplicaSet "machine-config-operator-66cf7c67d" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

Comment 4 errata-xmlrpc 2019-10-16 06:32:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

