Bug 1809693 - MachineConfigDaemonReasonAnnotationKey gives node description beyond 262144 characters
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.2.z
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.3.z
Assignee: Erica von Buelow
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 1792914
Blocks: 1797790
 
Reported: 2020-03-03 17:08 UTC by Antonio Murdaca
Modified: 2023-09-07 22:10 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1792914
Environment:
Last Closed: 2020-03-24 14:34:26 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-config-operator pull 1433 | 0 | None | closed | Bug 1809693: [release-4.3] prevent hitting annotation max size limit on nodes | 2020-12-20 12:46:59 UTC
Red Hat Product Errata | RHBA-2020:0858 | 0 | None | None | None | 2020-03-24 14:34:52 UTC

Comment 5 Michael Nguyen 2020-03-13 18:52:10 UTC
Verified on 4.3.0-0.nightly-2020-03-13-103840. The machineconfiguration.openshift.io/reason annotation on the node reads "machineconfiguration.openshift.io/desiredConfig annotation not found on node 'ip-10-0-132-161.ec2.internal'".

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2020-03-13-103840   True        False         42m     Cluster version is 4.3.0-0.nightly-2020-03-13-103840

$ cat file.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-file
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf;base64,c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK
        filesystem: root
        mode: 0644
        path: /etc/test
$ oc apply -f file.yaml 
machineconfig.machineconfiguration.openshift.io/test-file created
$ oc get mc
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   CREATED
00-master                                                   ab4d62a3bf3774b77b6f9b04a2028faec1568aca   2.2.0             24m
00-worker                                                   ab4d62a3bf3774b77b6f9b04a2028faec1568aca   2.2.0             24m
01-master-container-runtime                                 ab4d62a3bf3774b77b6f9b04a2028faec1568aca   2.2.0             24m
01-master-kubelet                                           ab4d62a3bf3774b77b6f9b04a2028faec1568aca   2.2.0             24m
01-worker-container-runtime                                 ab4d62a3bf3774b77b6f9b04a2028faec1568aca   2.2.0             24m
01-worker-kubelet                                           ab4d62a3bf3774b77b6f9b04a2028faec1568aca   2.2.0             24m
99-master-c3d15c11-1301-42bd-ba19-17231a61fcd0-registries   ab4d62a3bf3774b77b6f9b04a2028faec1568aca   2.2.0             24m
99-master-ssh                                                                                          2.2.0             24m
99-worker-0191fb05-5b8b-4f57-8f2b-5738248a28c4-registries   ab4d62a3bf3774b77b6f9b04a2028faec1568aca   2.2.0             24m
99-worker-ssh                                                                                          2.2.0             24m
rendered-master-6167ac3c9e85125683832b5bbfc06fa5            ab4d62a3bf3774b77b6f9b04a2028faec1568aca   2.2.0             24m
rendered-worker-e59577db473f1c0c325a1f955d5e9af1            ab4d62a3bf3774b77b6f9b04a2028faec1568aca   2.2.0             24m
test-file                                                                                              2.2.0             4s
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT
worker   rendered-worker-e59577db473f1c0c325a1f955d5e9af1   False     True       False      3              0                   0                     0
$ watch oc get node
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT
worker   rendered-worker-acb3c8bd78bd162d9095e112d7cbd35b   True      False      False      3              3                   3                     0
$ oc get node
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-132-161.ec2.internal   Ready    worker   32m   v1.16.2
ip-10-0-132-86.ec2.internal    Ready    worker   33m   v1.16.2
ip-10-0-135-21.ec2.internal    Ready    master   41m   v1.16.2
ip-10-0-135-88.ec2.internal    Ready    master   41m   v1.16.2
ip-10-0-144-81.ec2.internal    Ready    master   41m   v1.16.2
ip-10-0-152-86.ec2.internal    Ready    worker   32m   v1.16.2
$ oc debug node/ip-10-0-132-161.ec2.internal
Starting pod/ip-10-0-132-161ec2internal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# echo degrade >> /etc/test
sh-4.4# cat /etc/test
server foo.example.net maxdelay 0.4 offline
server bar.example.net maxdelay 0.4 offline
server baz.example.net maxdelay 0.4 offline
degrade
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...
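
Side note for anyone reproducing this step: the base64 payload in file.yaml decodes to the three server lines shown by cat /etc/test, so the appended "degrade" line is the only difference the daemon should find. A minimal sketch of that comparison, run locally without a cluster (the variable names are illustrative, and the comparison is a plain string check, not the daemon's own validation code):

```shell
# Base64 payload copied from the data URL in file.yaml.
payload='c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK'

# Content the MachineConfig expects at /etc/test.
# Command substitution drops the trailing newline, which is fine for this check.
expected=$(printf '%s' "$payload" | base64 --decode)

# Simulate `echo degrade >> /etc/test` from the debug shell.
on_disk=$(printf '%s\ndegrade' "$expected")

if [ "$on_disk" = "$expected" ]; then
  echo "match"
else
  echo "content mismatch"
fi
```

Printing "$expected" is a quick way to confirm what a data URL in a MachineConfig will write before applying it.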

$ oc -n openshift-machine-config-operator --field-selector spec.nodeName=ip-10-0-132-161.ec2.internal get pods
NAME                          READY   STATUS    RESTARTS   AGE
machine-config-daemon-rhszd   2/2     Running   3          33m



$ oc -n openshift-machine-config-operator  delete pods/machine-config-daemon-rhszd 
pod "machine-config-daemon-rhszd" deleted
$ oc -n openshift-machine-config-operator --field-selector spec.nodeName=ip-10-0-132-161.ec2.internal get pods
NAME                          READY   STATUS    RESTARTS   AGE
machine-config-daemon-lvczh   2/2     Running   0          14s
$ oc -n openshift-machine-config-operator  logs -f machine-config-daemon-lvczh -c machine-config-daemon
I0313 18:37:19.590145   19314 start.go:74] Version: v4.3.7-202003130552-dirty (ab4d62a3bf3774b77b6f9b04a2028faec1568aca)
I0313 18:37:19.595899   19314 start.go:84] Calling chroot("/rootfs")
I0313 18:37:19.596084   19314 rpm-ostree.go:366] Running captured: rpm-ostree status --json
I0313 18:37:19.698701   19314 daemon.go:209] Booted osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:de35f4db968ba15b9868e2c3851b445c202dd79fa141db92e225ed2bc53b599a (43.81.202003111633.0)
I0313 18:37:19.700473   19314 metrics.go:106] Registering Prometheus metrics
I0313 18:37:19.700618   19314 metrics.go:111] Starting metrics listener on 127.0.0.1:8797
I0313 18:37:19.701823   19314 update.go:1051] Starting to manage node: ip-10-0-132-161.ec2.internal
I0313 18:37:19.708107   19314 rpm-ostree.go:366] Running captured: rpm-ostree status
I0313 18:37:19.761681   19314 daemon.go:778] State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:de35f4db968ba15b9868e2c3851b445c202dd79fa141db92e225ed2bc53b599a
              CustomOrigin: Managed by machine-config-operator
                   Version: 43.81.202003111633.0 (2020-03-11T16:38:41Z)

  ostree://23527ffc123c6e2bedf3479ff7e96f38d92cec88d5a7951fa56e9d0ec75ddd77
                   Version: 43.81.202001142154.0 (2020-01-14T21:59:51Z)
I0313 18:37:19.761710   19314 rpm-ostree.go:366] Running captured: journalctl --list-boots
I0313 18:37:19.768594   19314 daemon.go:785] journalctl --list-boots:
-2 0f7acaec8aa348b2b8370f47f8d141e9 Fri 2020-03-13 17:59:06 UTC—Fri 2020-03-13 18:00:38 UTC
-1 d685c4ad6a4d4a0ab744514c46ef3d5a Fri 2020-03-13 18:01:05 UTC—Fri 2020-03-13 18:24:53 UTC
 0 f00f81ed174c4592b3b2fbd454c149fe Fri 2020-03-13 18:25:23 UTC—Fri 2020-03-13 18:37:19 UTC
I0313 18:37:19.768614   19314 daemon.go:528] Starting MachineConfigDaemon
I0313 18:37:19.768713   19314 daemon.go:535] Enabling Kubelet Healthz Monitor
I0313 18:37:50.396965   19314 daemon.go:724] Current+desired config: rendered-worker-acb3c8bd78bd162d9095e112d7cbd35b
I0313 18:37:50.404043   19314 daemon.go:958] Validating against current config rendered-worker-acb3c8bd78bd162d9095e112d7cbd35b
E0313 18:37:50.404167   19314 daemon.go:1350] content mismatch for file /etc/test: server foo.example.net maxdelay 0.4 offline
server bar.example.net maxdelay 0.4 offline
server baz.example.net maxdelay 0.4 offline


A: degrade


B: 

E0313 18:37:50.404221   19314 writer.go:135] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-acb3c8bd78bd162d9095e112d7cbd35b
I0313 18:37:52.420293   19314 daemon.go:724] Current+desired config: rendered-worker-acb3c8bd78bd162d9095e112d7cbd35b
I0313 18:37:52.425952   19314 daemon.go:958] Validating against current config rendered-worker-acb3c8bd78bd162d9095e112d7cbd35b
E0313 18:37:52.426039   19314 daemon.go:1350] content mismatch for file /etc/test: server foo.example.net maxdelay 0.4 offline
server bar.example.net maxdelay 0.4 offline
server baz.example.net maxdelay 0.4 offline


$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT
worker   rendered-worker-acb3c8bd78bd162d9095e112d7cbd35b   False     True       True       3              2                   2                     1
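
The DEGRADEDMACHINECOUNT column is what shows the node went degraded here. A small helper for picking degraded pools out of this table could look like the sketch below. The function name degraded_pools is made up for illustration, and it assumes every column in each row is populated (an empty CONFIG column would shift the fields):

```shell
# Print the name of every pool whose DEGRADEDMACHINECOUNT column is non-zero.
# Reads `oc get mcp` table output on stdin; the column index is taken from
# the header row rather than hard-coded.
degraded_pools() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "DEGRADEDMACHINECOUNT") col = i
      next
    }
    col && $col > 0 { print $1 }
  '
}
```

Intended use would be `oc get mcp | degraded_pools`, which for the output above should list only the worker pool.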


$ oc -n openshift-machine-config-operator scale deployment --replicas=0 machine-config-controller
deployment.extensions/machine-config-controller scaled
$ oc -n openshift-machine-config-operator scale deployment --replicas=0 machine-config-operator
deployment.extensions/machine-config-operator scaled
$ oc get node/ip-10-0-132-161.ec2.internal -o yaml | head -15
apiVersion: v1
kind: Node
metadata:
  annotations:
    machine.openshift.io/machine: openshift-machine-api/ci-ln-6l4sstb-d5d6b-56n5z-worker-us-east-1b-grt4w
    machineconfiguration.openshift.io/currentConfig: rendered-worker-acb3c8bd78bd162d9095e112d7cbd35b
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-acb3c8bd78bd162d9095e112d7cbd35b
    machineconfiguration.openshift.io/reason: unexpected on-disk state validating
      against rendered-worker-acb3c8bd78bd162d9095e112d7cbd35b
    machineconfiguration.openshift.io/state: Degraded
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2020-03-13T18:02:15Z"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: m4.xlarge
$ oc annotate node ip-10-0-132-161.ec2.internal machineconfiguration.openshift.io/desiredConfig-
node/ip-10-0-132-161.ec2.internal annotated
$ oc get node/ip-10-0-132-161.ec2.internal -o yaml | head -15
apiVersion: v1
kind: Node
metadata:
  annotations:
    machine.openshift.io/machine: openshift-machine-api/ci-ln-6l4sstb-d5d6b-56n5z-worker-us-east-1b-grt4w
    machineconfiguration.openshift.io/currentConfig: rendered-worker-acb3c8bd78bd162d9095e112d7cbd35b
    machineconfiguration.openshift.io/reason: machineconfiguration.openshift.io/desiredConfig
      annotation not found on node 'ip-10-0-132-161.ec2.internal'
    machineconfiguration.openshift.io/state: Degraded
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2020-03-13T18:02:15Z"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: m4.xlarge
    beta.kubernetes.io/os: linux
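
For reference, the 262144 in the bug title is 256 KiB, which is the Kubernetes limit on the combined size of all annotation keys and values on a single object. A rough way to check how large the reason annotation is: the helpers below are hypothetical names, and they measure one value only, so a passing result is necessary but not sufficient for the whole annotation set to fit.

```shell
# Kubernetes limit on the combined size of all annotations on one object,
# in bytes (256 KiB).
ANNOTATION_LIMIT=262144

# Read an annotation value on stdin and print its size in bytes.
annotation_bytes() {
  wc -c | tr -d ' '
}

# Succeed if the value on stdin, taken alone, fits under the limit.
fits_limit() {
  [ "$(annotation_bytes)" -le "$ANNOTATION_LIMIT" ]
}

# Intended use against a live node (not run as part of this sketch):
#   oc get node ip-10-0-132-161.ec2.internal \
#     -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/reason}' \
#     | annotation_bytes
```

With the short reason strings shown above, the value is nowhere near the limit.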

Comment 7 errata-xmlrpc 2020-03-24 14:34:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0858

