Bug 1814397 - Node goes to degraded status when machine-config-daemon moves a file across filesystems
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.5.0
Assignee: Kirsten Garrison
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1817455
 
Reported: 2020-03-17 18:57 UTC by Denys Shchedrivyi
Modified: 2021-04-05 17:36 UTC (History)
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1817455 (view as bug list)
Environment:
Last Closed: 2020-08-04 18:05:56 UTC
Target Upstream Version:
Embargoed:


Attachments
rendered mc (99.18 KB, text/plain), uploaded 2020-03-19 21:47 UTC by Denys Shchedrivyi


Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 1593 0 None closed Bug 1814397: fix wrongful backup of files not originally on the system 2021-02-05 17:41:13 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-08-04 18:06:01 UTC

Description Denys Shchedrivyi 2020-03-17 18:57:27 UTC
Description of problem:
 After deleting the role label, the node goes to Degraded state with this reason:

"rename /etc/machine-config-daemon/orig/usr/local/bin/pre-boot-tuning.sh.mcdorig /usr/local/bin/pre-boot-tuning.sh: invalid cross-device link"



Steps to Reproduce:
 

1. Initially I had two MCPs and one worker node with two labels, worker and worker-rt. The MCP successfully applied the necessary configuration to the node:

# oc get mcp
NAME        CONFIG                                                UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker      rendered-worker-b9370ea93e553d0131f4a19efefd257c      True      False      False      2              2                   2                     0                      3d18h
worker-rt   rendered-worker-rt-82d9e746b12ca24147ef94f06f562f10   True      False      False      1              1                   1                     0                      3d17h

# oc describe node worker-1
Name:               worker-1
Roles:              worker,worker-rt
Annotations:        machine.openshift.io/machine: openshift-machine-api/ostest-worker-0-h9bj8
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-rt-82d9e746b12ca24147ef94f06f562f10
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-rt-82d9e746b12ca24147ef94f06f562f10
                    machineconfiguration.openshift.io/reason: 
                    machineconfiguration.openshift.io/state: Done



2. (important) Make some changes in the MC and wait until the MCP has updated the configuration:

# oc get mcp
NAME        CONFIG                                                UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker      rendered-worker-b9370ea93e553d0131f4a19efefd257c      True      False      False      2              2                   2                     0                      3d18h
worker-rt   rendered-worker-rt-fc83ce023c6f9cee1180535425bde439   True      False      False      1              1                   1                     0                      3d17h


# oc describe node worker-1
Name:               worker-1
Roles:              worker,worker-rt
Annotations:        machine.openshift.io/machine: openshift-machine-api/ostest-worker-0-h9bj8
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-rt-fc83ce023c6f9cee1180535425bde439
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-rt-fc83ce023c6f9cee1180535425bde439
                    machineconfiguration.openshift.io/reason: 
                    machineconfiguration.openshift.io/state: Done



3. Delete the "worker-rt" role: the MCP "worker" started updating, but the node went to Degraded state with this message:

# oc describe node worker-1
Name:               worker-1
Roles:              worker
Annotations:        machine.openshift.io/machine: openshift-machine-api/ostest-worker-0-h9bj8
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-rt-fc83ce023c6f9cee1180535425bde439
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-b9370ea93e553d0131f4a19efefd257c
                    machineconfiguration.openshift.io/reason:
                      rename /etc/machine-config-daemon/orig/usr/local/bin/pre-boot-tuning.sh.mcdorig /usr/local/bin/pre-boot-tuning.sh: invalid cross-device link
                    machineconfiguration.openshift.io/state: Degraded

# oc get mcp
NAME        CONFIG                                                UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   
worker      rendered-worker-b9370ea93e553d0131f4a19efefd257c      False     True       True       3              2                   2                     1                      
worker-rt   rendered-worker-rt-fc83ce023c6f9cee1180535425bde439   True      False      False      0              0                   0                     0                      


Actual results:
 The node goes to Degraded state.

Expected results:
 After deleting the role, the configuration should be successfully applied to the node.

Comment 1 Artyom 2020-03-18 09:10:45 UTC
I believe it is related to the fact that we use `os.Rename`; the documentation (https://golang.org/pkg/os/#Rename) says:
"OS-specific restrictions may apply when oldpath and newpath are in different directories."

IMHO we should just copy the file instead of renaming it.
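
For illustration, here is a minimal Go sketch of that suggestion (this is not the actual MCO code, and the package/function names are made up): try os.Rename first and, if it fails with EXDEV (the "invalid cross-device link" error seen in this bug), copy the contents and remove the source instead.

package fileutil // hypothetical package name, for illustration only

import (
	"errors"
	"io"
	"os"
	"syscall"
)

// moveFile renames oldpath to newpath, falling back to copy+remove when the
// two paths live on different filesystems and os.Rename fails with EXDEV.
func moveFile(oldpath, newpath string) error {
	err := os.Rename(oldpath, newpath)
	if err == nil {
		return nil
	}
	// os.Rename wraps the underlying errno in *os.LinkError; EXDEV is the
	// "invalid cross-device link" error reported above.
	var linkErr *os.LinkError
	if !errors.As(err, &linkErr) || !errors.Is(linkErr.Err, syscall.EXDEV) {
		return err
	}
	src, err := os.Open(oldpath)
	if err != nil {
		return err
	}
	defer src.Close()
	info, err := src.Stat()
	if err != nil {
		return err
	}
	// Create the destination with the same mode as the source file.
	dst, err := os.OpenFile(newpath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, info.Mode())
	if err != nil {
		return err
	}
	if _, err := io.Copy(dst, src); err != nil {
		dst.Close()
		return err
	}
	if err := dst.Close(); err != nil {
		return err
	}
	// Only remove the source once the copy has been written and closed.
	return os.Remove(oldpath)
}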

Comment 2 Gowrishankar Rajaiyan 2020-03-18 09:35:52 UTC
# oc get mcp
NAME         CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master       rendered-master-043819e79688d1a1e2cfbc7e6a69434b       True      False      False      3              3                   3                     0                      18h
worker       rendered-worker-4079fe9641ee0f48781eb95aea1a465b       False     True       True       3              2                   2                     1                      18h
worker-cnf   rendered-worker-cnf-0c77ed0969f3bf1bcf0bfbdbc2238208   True      False      False      2              2                   2                     0                      83m


The workaround to recover the node from the `degraded` state is to access the node and `mv` those files back into place:

- mv /etc/machine-config-daemon/orig/usr/local/bin/pre-boot-tuning.sh.mcdorig /usr/local/bin/pre-boot-tuning.sh
- mv /etc/machine-config-daemon/orig/usr/local/bin/hugepages-allocation.sh.mcdorig /usr/local/bin/hugepages-allocation.sh
- mv /etc/machine-config-daemon/orig/usr/local/bin/reboot.sh.mcdorig /usr/local/bin/reboot.sh


# oc get mcp
NAME         CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master       rendered-master-043819e79688d1a1e2cfbc7e6a69434b       True      False      False      3              3                   3                     0                      18h
worker       rendered-worker-4079fe9641ee0f48781eb95aea1a465b       True      False      False      3              3                   3                     0                      18h
worker-cnf   rendered-worker-cnf-0c77ed0969f3bf1bcf0bfbdbc2238208   True      False      False      2              2                   2                     0                      103m

Comment 3 Artyom 2020-03-18 10:33:55 UTC
The problem happens when the machine-config-daemon deletes stale data: https://github.com/openshift/machine-config-operator/blob/d5d9a488c1e0e19e1d3044bd0fac90096b0224d6/pkg/daemon/update.go#L796

Comment 4 Kirsten Garrison 2020-03-18 18:38:16 UTC
Can you please provide a must-gather for the cluster?

Comment 6 Kirsten Garrison 2020-03-19 21:33:41 UTC
Thanks for the must-gather, @Denys.

Do you happen to still have a copy of rendered-worker-rt-fc83ce023c6f9cee1180535425bde439 that you can also share?

Comment 7 Denys Shchedrivyi 2020-03-19 21:47:51 UTC
Created attachment 1671615 [details]
rendered mc

Comment 8 Denys Shchedrivyi 2020-03-19 21:49:31 UTC
Unfortunately I don't have that cluster anymore, so I provided you a must-gather from another cluster (I've reproduced this issue again).

I think I know which MC you are looking for:

# oc describe node worker-1
Name:               worker-1
Annotations:        machine.openshift.io/machine: openshift-machine-api/ostest-worker-0-4bnrr
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-rt-86682d50b0c8c08d5e062f208911b4e5
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-18e975ccb4af87964fc7011163d607a1
                    machineconfiguration.openshift.io/state: Degraded


I've attached rendered-worker-rt-86682d50b0c8c08d5e062f208911b4e5 to this bz

Comment 9 Kirsten Garrison 2020-03-19 21:54:38 UTC
Gotcha, thanks! And to clarify: that rendered worker-rt config just has some additional kubelet changes via the 98-worker-rt/99-worker-rt... machine configs?

Comment 10 Kirsten Garrison 2020-03-19 22:08:53 UTC
Ah, I just realized the worker-rt pool also picks up the performance-ci MC, I think.

Comment 11 Denys Shchedrivyi 2020-03-19 22:17:19 UTC
Yes, it should be taken from the MC performance-ci and also from the KubeletConfig performance-ci. Here is the KubeletConfig, just in case:


# oc describe kubeletconfig performance-ci
Name:         performance-ci
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         KubeletConfig
Metadata:
  Creation Timestamp:  2020-03-19T20:19:27Z
  Finalizers:
    0ab8528e-f280-4015-b04e-e70cd8f1f8e0
    b6d6bd30-5004-4de6-b7d9-f2183cf360bb
  Generation:  2
  Owner References:
    API Version:           performance.openshift.io/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  PerformanceProfile
    Name:                  ci
    UID:                   62ef7006-684f-4b53-bef0-8adf4c4ed9ca
  Resource Version:        44761
  Self Link:               /apis/machineconfiguration.openshift.io/v1/kubeletconfigs/performance-ci
  UID:                     c3ed9451-4015-4436-9125-36ddb1e3ab12
Spec:
  Kubelet Config:
    API Version:  kubelet.config.k8s.io/v1beta1
    Authentication:
      Anonymous:
      Webhook:
        Cache TTL:  0s
      x509:
    Authorization:
      Webhook:
        Cache Authorized TTL:             0s
        Cache Unauthorized TTL:           0s
    Cpu Manager Policy:                   static
    Cpu Manager Reconcile Period:         5s
    Eviction Pressure Transition Period:  0s
    File Check Frequency:                 0s
    Http Check Frequency:                 0s
    Image Minimum GC Age:                 0s
    Kind:                                 KubeletConfiguration
    Kube Reserved:
      Cpu:                              1000m
      Memory:                           500Mi
    Node Status Report Frequency:       0s
    Node Status Update Frequency:       0s
    Reserved System CP Us:              0
    Runtime Request Timeout:            0s
    Streaming Connection Idle Timeout:  0s
    Sync Frequency:                     0s
    System Reserved:
      Cpu:                    1000m
      Memory:                 500Mi
    Topology Manager Policy:  best-effort
    Volume Stats Agg Period:  0s
  Machine Config Pool Selector:
    Match Labels:
      machineconfiguration.openshift.io/role:  worker-rt
Status:
  Conditions:
    Last Transition Time:  2020-03-19T20:19:28Z
    Message:               Success
    Status:                True
    Type:                  Success
    Last Transition Time:  2020-03-19T20:25:47Z
    Message:               Success
    Status:                True
    Type:                  Success
    Last Transition Time:  2020-03-19T20:30:15Z
    Message:               Success
    Status:                True
    Type:                  Success
    Last Transition Time:  2020-03-19T20:34:01Z
    Message:               Success
    Status:                True
    Type:                  Success
Events:                    <none>

Comment 14 Scott Dodson 2020-03-27 02:04:25 UTC
We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context and the UpgradeBlocker flag has been added to this bug. It will be removed if the assessment indicates that this should not block upgrade edges.

Who is impacted?
  Customers upgrading from 4.2.99 to 4.3.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
  All customers upgrading from 4.2.z to 4.3.z fail approximately 10% of the time
What is the impact?
  Up to 2 minute disruption in edge routing
  Up to 90seconds of API downtime
  etcd loses quorum and you have to restore from backup
How involved is remediation?
  Issue resolves itself after five minutes
  Admin uses oc to fix things
  Admin must SSH to hosts, restore from backups, or other non standard admin activities
Is this a regression?
  No, it’s always been like this we just never noticed
  Yes, from 4.2.z and 4.3.1

Comment 15 Jack Ottofaro 2020-03-30 21:29:56 UTC
Please provide an update. Having been labeled an UpgradeBlocker means this bug is blocking at least one upgrade path.

Comment 16 Antonio Murdaca 2020-03-31 06:54:04 UTC
(In reply to Scott Dodson from comment #14)
> We're asking the following questions to evaluate whether or not this bug
> warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The
> ultimate goal is to avoid delivering an update which introduces new risk or
> reduces cluster functionality in any way. Sample answers are provided to
> give more context and the UpgradeBlocker flag has been added to this bug. It
> will be removed if the assessment indicates that this should not block
> upgrade edges.
> 
> Who is impacted?
>   Customers upgrading from 4.2.99 to 4.3.z running on GCP with thousands of
> namespaces, approximately 5% of the subscribed fleet
>   All customers upgrading from 4.2.z to 4.3.z fail approximately 10% of the
> time

Customers upgrading from 4.2.x up to 4.4, customers upgrading from 4.3 are **NOT** impacted given nobody changed the MCO deployed etcd scripts (which shouldn't be the case ever)

> What is the impact?
>   Up to 2 minute disruption in edge routing
>   Up to 90seconds of API downtime
>   etcd loses quorum and you have to restore from backup

The upgrade blocks and requires manual intervention to fix it and proceed (not trivial).

> How involved is remediation?

Requires manual intervention on nodes and tweaks to MCO-deployed MCs (which isn't trivial _at all_).

> Is this a regression?

It's been like this since 4.2, so it's not a regression for 4.4 (in 4.1 we didn't have the functionality that is now breaking things here).



Last note: the patch for 4.4 and 4.5 that we have pending in the MCO repo will fix the issue directly in 4.4 and doesn't require pulling any upgrade edge. We will, however, go ahead and backport the fix to at least 4.3, but again, per our assessment (as of today) it doesn't require pulling any edge.

Comment 17 W. Trevor King 2020-04-01 02:45:40 UTC
> Customers upgrading from 4.2.x up to 4.4, customers upgrading from 4.3 are **NOT** impacted given nobody changed the MCO deployed etcd scripts (which shouldn't be the case ever)

It can't be all of those clusters, or we'd have turned this up earlier in CI, etc., right?  But am I understanding right that born-in-4.4 clusters are assumed to be fine for all updates, born-in-4.3 clusters are fine for all updates, and born-in-4.2 clusters are fine updating to 4.3 but then hit this bug some percentage of the time (or 100% of the time for some subset of clusters?) if they subsequently update to an unpatched 4.4?

Comment 18 Antonio Murdaca 2020-04-01 08:19:13 UTC
(In reply to W. Trevor King from comment #17)
> > Customers upgrading from 4.2.x up to 4.4, customers upgrading from 4.3 are **NOT** impacted given nobody changed the MCO deployed etcd scripts (which shouldn't be the case ever)
> 
> It can't be all of those clusters, or we'd have turned this up earlier in
> CI, etc., right?  But am I understanding right that born-in-4.4 clusters are
> assumed to be fine for all updates, born-in-4.3 clusters are fine for all
> updates, and born-in-4.2 clusters are fine updating to 4.3 but then hit this
> bug some percentage of the time (or 100% of the time for some subset of
> clusters?) if they subsequently update to an unpatched 4.4?

- Born-in-4.4 clusters: yes. 4.4 clusters upgrade fine, since 4.4.x and 4.5 clusters don't delete files between upgrades (this bug is about pruning files within MCs).
- Born-in-4.3 clusters: yes. 4.3 to 4.4 (and later) deletes the etcd files from the MCO, but the bug isn't triggered here, as it needs an additional MC (or MCs) that modifies those files before triggering. It means it's 99% safe to upgrade 4.3 to 4.4, unless someone goes 4.3 to 4.3-modified-etcd-files to 4.4-drop-etcd-files. In that case the bug will kick in.
- Born-in-4.2 clusters upgrading to 4.3: yes. This doesn't trigger the bug either, as we just modify the etcd files, never drop them.

Now, going from 4.2 to 4.4 breaks because:

- 4.2 ships a set of etcd files
- 4.3 modifies some of those files
- 4.4 deletes those files

The bug can be reproduced without an upgrade, since it's a bug in how the MCD handles deletion and backups of files that are already on disk or shipped by RPMs.

The upgrade scenario triggers the bug because:

a) 4.3 modifies the etcd files, causing the MCD to create a wrongful backup of those files (like rpm.save) - the MCD shouldn't have done that, and that's the bug.
b) When 4.4 kicks in and wants to delete those files, the MCD instead tries to restore from the backup made above (which exposes the bug as a cross-device link error, but the specific error doesn't matter; it could be any).
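
To make (a) and (b) concrete, here is a simplified Go sketch (this is not the machine-config-operator code; pruneStaleFile and the paths are made up for illustration) of the prune/restore step described above. The cross-filesystem rename only happens because a backup under /etc/machine-config-daemon/orig exists at all, which is why the linked PR's approach of not creating backups for files that were not originally on the system avoids the failure.

package fileutil // hypothetical, for illustration only

import (
	"errors"
	"os"
)

// pruneStaleFile removes a file that is no longer part of the rendered config.
// If an "orig" backup exists (i.e. the MCD previously backed the file up), it
// tries to restore it with os.Rename; moving from
// /etc/machine-config-daemon/orig/... back to e.g. /usr/local/bin can fail
// with EXDEV when the two paths are on different filesystems, which is the
// error seen in this bug. If no backup exists, the file is simply deleted.
func pruneStaleFile(path, origBackup string) error {
	if _, err := os.Stat(origBackup); errors.Is(err, os.ErrNotExist) {
		// Nothing legitimate to restore: the file only ever came from a
		// MachineConfig, so just remove it.
		return os.Remove(path)
	}
	return os.Rename(origBackup, path)
}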

Comment 21 Michael Nguyen 2020-04-02 20:54:23 UTC
VERIFIED on 4.5.0-0.nightly-2020-04-02-131318

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-04-02-131318   True        False         126m    Cluster version is 4.5.0-0.nightly-2020-04-02-131318

$ oc get mc | grep rendered-worker
rendered-worker-0cb4c0284b0a29d6982c0560ed8676af            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             69m

==== Note: Original rendered MC: rendered-worker-0cb4c0284b0a29d6982c0560ed8676af ====

$ cat << EOF > file.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-1
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:;base64,dGVzdCBzdHJpbmcK
        filesystem: root
        mode: 0644
        path: /etc/test
EOF
        
$ oc apply -f file.yaml 
machineconfig.machineconfiguration.openshift.io/test-1 created
$ oc get mc
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
00-worker                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
01-master-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
01-master-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
01-worker-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
01-worker-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
99-master-593c34ec-b6bf-4bb0-8e44-5e445b96e0b9-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
99-master-ssh                                                                                          2.2.0             71m
99-worker-d8ab4209-1e9c-476d-a906-4553719fb210-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
99-worker-ssh                                                                                          2.2.0             71m
rendered-master-e75dffc9ae55631a1b34e428fd8f121b            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
rendered-worker-0cb4c0284b0a29d6982c0560ed8676af            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
rendered-worker-30362e12c6b76d9355d974200915c0dc            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             1s
test-1           

==== Note first rendered MC: rendered-worker-30362e12c6b76d9355d974200915c0dc ====
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-0cb4c0284b0a29d6982c0560ed8676af   False     True       False      3              0                   0                     0                      71m
$ watch oc get nodes
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-30362e12c6b76d9355d974200915c0dc   True      False      False      3              3                   3                     0                      84m

$ for i in $(oc get node -l node-role.kubernetes.io/worker -o name); do oc debug $i -- cat /host/etc/test; done
Starting pod/ip-10-0-134-197us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
test string

Removing debug pod ...
Starting pod/ip-10-0-145-206us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
test string

Removing debug pod ...
Starting pod/ip-10-0-167-255us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
test string

Removing debug pod ...

$ cat << EOF > file2.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-2
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:;base64,dGVzdCBzdHJpbmcgMgo=
        filesystem: root
        mode: 0644
        path: /etc/test
EOF

$ oc apply -f file2.yaml 
machineconfig.machineconfiguration.openshift.io/test-2 created

$ oc get mc
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
00-worker                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
01-master-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
01-master-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
01-worker-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
01-worker-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
99-master-593c34ec-b6bf-4bb0-8e44-5e445b96e0b9-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
99-master-ssh                                                                                          2.2.0             94m
99-worker-d8ab4209-1e9c-476d-a906-4553719fb210-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
99-worker-ssh                                                                                          2.2.0             94m
rendered-master-e75dffc9ae55631a1b34e428fd8f121b            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
rendered-worker-0cb4c0284b0a29d6982c0560ed8676af            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
rendered-worker-30362e12c6b76d9355d974200915c0dc            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             22m
rendered-worker-941801c22bc9dcaf3f3f206ad4e6e40e            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             7s
test-1                                                                                                 2.2.0             22m
test-2                                                                                                 2.2.0             12s

==== Note: Second rendered MC rendered-worker-941801c22bc9dcaf3f3f206ad4e6e40e ===
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-30362e12c6b76d9355d974200915c0dc   False     True       False      3              1                   1                     0                      96m
$ watch oc get nodes
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-941801c22bc9dcaf3f3f206ad4e6e40e   True      False      False      3              3                   3                     0                      106m
$ for i in $(oc get node -l node-role.kubernetes.io/worker -o name); do oc debug $i -- cat /host/etc/test; done
Starting pod/ip-10-0-134-197us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
test string 2

Removing debug pod ...
Starting pod/ip-10-0-145-206us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
test string 2

Removing debug pod ...
Starting pod/ip-10-0-167-255us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
test string 2

Removing debug pod ...

$ oc delete mc/test-2
machineconfig.machineconfiguration.openshift.io "test-2" deleted
$ oc get mc
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
00-worker                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
01-master-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
01-master-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
01-worker-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
01-worker-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
99-master-593c34ec-b6bf-4bb0-8e44-5e445b96e0b9-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
99-master-ssh                                                                                          2.2.0             112m
99-worker-d8ab4209-1e9c-476d-a906-4553719fb210-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
99-worker-ssh                                                                                          2.2.0             112m
rendered-master-e75dffc9ae55631a1b34e428fd8f121b            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
rendered-worker-0cb4c0284b0a29d6982c0560ed8676af            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
rendered-worker-30362e12c6b76d9355d974200915c0dc            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             40m
rendered-worker-941801c22bc9dcaf3f3f206ad4e6e40e            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             18m
test-1                                                                                                 2.2.0             40m

$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-941801c22bc9dcaf3f3f206ad4e6e40e   False     True       False      3              0                   0                     0                      112m
$ watch oc get node

==== Verify mcp/worker rendered config to First rendered MC ====
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-30362e12c6b76d9355d974200915c0dc   True      False      False      3              3                   3                     0                      124m


$ for i in $(oc get node -l node-role.kubernetes.io/worker -o name); do oc debug $i -- cat /host/etc/test; done
Starting pod/ip-10-0-134-197us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
test string

Removing debug pod ...
Starting pod/ip-10-0-145-206us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
test string

Removing debug pod ...
Starting pod/ip-10-0-167-255us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
test string

Removing debug pod ...

$ oc get mc
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
00-worker                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
01-master-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
01-master-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
01-worker-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
01-worker-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
99-master-593c34ec-b6bf-4bb0-8e44-5e445b96e0b9-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
99-master-ssh                                                                                          2.2.0             126m
99-worker-d8ab4209-1e9c-476d-a906-4553719fb210-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
99-worker-ssh                                                                                          2.2.0             126m
rendered-master-e75dffc9ae55631a1b34e428fd8f121b            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
rendered-worker-0cb4c0284b0a29d6982c0560ed8676af            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
rendered-worker-30362e12c6b76d9355d974200915c0dc            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             55m
rendered-worker-941801c22bc9dcaf3f3f206ad4e6e40e            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             32m
test-1                                                                                                 2.2.0             55m
$ oc delete mc/test-1
machineconfig.machineconfiguration.openshift.io "test-1" deleted
$ oc get mc
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
00-worker                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
01-master-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
01-master-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
01-worker-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
01-worker-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
99-master-593c34ec-b6bf-4bb0-8e44-5e445b96e0b9-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
99-master-ssh                                                                                          2.2.0             126m
99-worker-d8ab4209-1e9c-476d-a906-4553719fb210-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
99-worker-ssh                                                                                          2.2.0             126m
rendered-master-e75dffc9ae55631a1b34e428fd8f121b            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
rendered-worker-0cb4c0284b0a29d6982c0560ed8676af            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
rendered-worker-30362e12c6b76d9355d974200915c0dc            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             55m
rendered-worker-941801c22bc9dcaf3f3f206ad4e6e40e            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             32m


$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-30362e12c6b76d9355d974200915c0dc   False     True       False      3              0                   0                     0                      126m
$ watch oc get nodes

==== Verify mcp/worker rendered config to Original rendered MC ====
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-0cb4c0284b0a29d6982c0560ed8676af   True      False      False      3              3                   3                     0                      136m

$ for i in $(oc get node -l node-role.kubernetes.io/worker -o name); do oc debug $i -- cat /host/etc/test; done
Starting pod/ip-10-0-134-197us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
cat: /host/etc/test: No such file or directory

Removing debug pod ...
Starting pod/ip-10-0-145-206us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
cat: /host/etc/test: No such file or directory

Removing debug pod ...
Starting pod/ip-10-0-167-255us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
cat: /host/etc/test: No such file or directory

Removing debug pod ...

Comment 23 errata-xmlrpc 2020-08-04 18:05:56 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

Comment 24 W. Trevor King 2021-04-05 17:36:39 UTC
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475
