Bug 1814397
Summary: Node goes to degraded status when machine-config-daemon moves a file across filesystems

Product: OpenShift Container Platform
Component: Machine Config Operator
Version: 4.4
Target Release: 4.5.0
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
Reporter: Denys Shchedrivyi <dshchedr>
Assignee: Kirsten Garrison <kgarriso>
QA Contact: Michael Nguyen <mnguyen>
CC: alukiano, amurdaca, dshchedr, grajaiya, jack.ottofaro, kgarriso, wking
Keywords: Upgrades
Hardware: Unspecified
OS: Unspecified
Type: Bug
Clones: 1817455 (view as bug list)
Bug Blocks: 1817455
Last Closed: 2020-08-04 18:05:56 UTC
Description (Denys Shchedrivyi, 2020-03-17 18:57:27 UTC)
I believe it can be related to the fact that we use `os.Rename`; per the documentation (https://golang.org/pkg/os/#Rename): "OS-specific restrictions may apply when oldpath and newpath are in different directories." IMHO we should just copy the file instead of renaming it.

# oc get mcp
NAME         CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master       rendered-master-043819e79688d1a1e2cfbc7e6a69434b       True      False      False      3              3                   3                     0                      18h
worker       rendered-worker-4079fe9641ee0f48781eb95aea1a465b       False     True       True       3              2                   2                     1                      18h
worker-cnf   rendered-worker-cnf-0c77ed0969f3bf1bcf0bfbdbc2238208   True      False      False      2              2                   2                     0                      83m

The workaround to recover a node from the degraded state is to access the node and `mv` those files back:

- mv /etc/machine-config-daemon/orig/usr/local/bin/pre-boot-tuning.sh.mcdorig /usr/local/bin/pre-boot-tuning.sh
- mv /etc/machine-config-daemon/orig/usr/local/bin/hugepages-allocation.sh.mcdorig /usr/local/bin/hugepages-allocation.sh
- mv /etc/machine-config-daemon/orig/usr/local/bin/reboot.sh.mcdorig /usr/local/bin/reboot.sh

# oc get mcp
NAME         CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master       rendered-master-043819e79688d1a1e2cfbc7e6a69434b       True      False      False      3              3                   3                     0                      18h
worker       rendered-worker-4079fe9641ee0f48781eb95aea1a465b       True      False      False      3              3                   3                     0                      18h
worker-cnf   rendered-worker-cnf-0c77ed0969f3bf1bcf0bfbdbc2238208   True      False      False      2              2                   2                     0                      103m

The problem happens when the machine-config-daemon deletes stale data: https://github.com/openshift/machine-config-operator/blob/d5d9a488c1e0e19e1d3044bd0fac90096b0224d6/pkg/daemon/update.go#L796

Can you please provide a must-gather for the cluster?

Thanks for the must-gather, @Denys. Do you happen to still have a copy of rendered-worker-rt-fc83ce023c6f9cee1180535425bde439 that you can also share?

Created attachment 1671615 [details]
rendered mc
Unfortunately I don't have that cluster anymore, so I provided you a must-gather from another cluster (I've reproduced this issue again). I think I know which MC you are looking for:

# oc describe node worker-1
Name:        worker-1
Annotations: machine.openshift.io/machine: openshift-machine-api/ostest-worker-0-4bnrr
             machineconfiguration.openshift.io/currentConfig: rendered-worker-rt-86682d50b0c8c08d5e062f208911b4e5
             machineconfiguration.openshift.io/desiredConfig: rendered-worker-18e975ccb4af87964fc7011163d607a1
             machineconfiguration.openshift.io/state: Degraded

I've attached rendered-worker-rt-86682d50b0c8c08d5e062f208911b4e5 to this bz.

Gotcha, thanks! And to clarify: that rendered worker-rt just has some additional kubelet changes via the 98-worker-rt/99-worker-rt machine configs? Ahh, just realized the worker-rt pool also picks up the performance-ci MC, I think.

Yes, it should be taken from the MC performance-ci and also from the KubeletConfig performance-ci. Here is the KubeletConfig, just in case:

# oc describe kubeletconfig performance-ci
Name:         performance-ci
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         KubeletConfig
Metadata:
  Creation Timestamp: 2020-03-19T20:19:27Z
  Finalizers:
    0ab8528e-f280-4015-b04e-e70cd8f1f8e0
    b6d6bd30-5004-4de6-b7d9-f2183cf360bb
  Generation: 2
  Owner References:
    API Version:           performance.openshift.io/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  PerformanceProfile
    Name:                  ci
    UID:                   62ef7006-684f-4b53-bef0-8adf4c4ed9ca
  Resource Version: 44761
  Self Link: /apis/machineconfiguration.openshift.io/v1/kubeletconfigs/performance-ci
  UID: c3ed9451-4015-4436-9125-36ddb1e3ab12
Spec:
  Kubelet Config:
    API Version: kubelet.config.k8s.io/v1beta1
    Authentication:
      Anonymous:
      Webhook:
        Cache TTL: 0s
      x509:
    Authorization:
      Webhook:
        Cache Authorized TTL:   0s
        Cache Unauthorized TTL: 0s
    Cpu Manager Policy:                  static
    Cpu Manager Reconcile Period:        5s
    Eviction Pressure Transition Period: 0s
    File Check Frequency:                0s
    Http Check Frequency:                0s
    Image Minimum GC Age:                0s
    Kind: KubeletConfiguration
    Kube Reserved:
      Cpu:    1000m
      Memory: 500Mi
    Node Status Report Frequency: 0s
    Node Status Update Frequency: 0s
    Reserved System CP Us:        0
    Runtime Request Timeout:      0s
    Streaming Connection Idle Timeout: 0s
    Sync Frequency: 0s
    System Reserved:
      Cpu:    1000m
      Memory: 500Mi
    Topology Manager Policy: best-effort
    Volume Stats Agg Period: 0s
  Machine Config Pool Selector:
    Match Labels:
      machineconfiguration.openshift.io/role: worker-rt
Status:
  Conditions:
    Last Transition Time: 2020-03-19T20:19:28Z
    Message:              Success
    Status:               True
    Type:                 Success
    Last Transition Time: 2020-03-19T20:25:47Z
    Message:              Success
    Status:               True
    Type:                 Success
    Last Transition Time: 2020-03-19T20:30:15Z
    Message:              Success
    Status:               True
    Type:                 Success
    Last Transition Time: 2020-03-19T20:34:01Z
    Message:              Success
    Status:               True
    Type:                 Success
Events: <none>

We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context, and the UpgradeBlocker flag has been added to this bug. It will be removed if the assessment indicates that this should not block upgrade edges.

Who is impacted?
- Customers upgrading from 4.2.99 to 4.3.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
- All customers upgrading from 4.2.z to 4.3.z fail approximately 10% of the time

What is the impact?
- Up to 2 minute disruption in edge routing
- Up to 90 seconds of API downtime
- etcd loses quorum and you have to restore from backup

How involved is remediation?
- Issue resolves itself after five minutes
- Admin uses oc to fix things
- Admin must SSH to hosts, restore from backups, or other non-standard admin activities

Is this a regression?
- No, it's always been like this, we just never noticed
- Yes, from 4.2.z and 4.3.1

Please provide an update.
Having been labeled an UpgradeBlocker means this bug is blocking at least one upgrade path.

(In reply to Scott Dodson from comment #14)

> Who is impacted?

Customers upgrading from 4.2.x up to 4.4. Customers upgrading from 4.3 are **NOT** impacted, given nobody changed the MCO-deployed etcd scripts (which shouldn't ever be the case).

> What is the impact?

The upgrade blocks and requires manual intervention to fix it and proceed (not trivial).

> How involved is remediation?

Requires manual intervention on nodes and tweaks to MCO-deployed MCs (which isn't trivial _at all_).

> Is this a regression?

It's been like this since 4.2, so it's not a regression for 4.4 (in 4.1 we didn't have the functionality that is now breaking things here).

Last note: the patch for 4.4 and 4.5 that we have pending in the MCO repo will fix the issue directly in 4.4 and doesn't require pulling any upgrade edge. We will, however, go ahead and backport the fix up to 4.3 at least, but again, it doesn't require pulling any edge from our assessment (as of today).

> Customers upgrading from 4.2.x up to 4.4, customers upgrading from 4.3 are **NOT** impacted given nobody changed the MCO deployed etcd scripts (which shouldn't be the case ever)
It can't be all of those clusters, or we'd have turned this up earlier in CI, etc., right? But am I understanding right that born-in-4.4 clusters are assumed to be fine for all updates, born-in-4.3 clusters are fine for all updates, and born-in-4.2 clusters are fine updating to 4.3 but then hit this bug some percentage of the time (or 100% of the time for some subset of clusters?) if they subsequently update to an unpatched 4.4?
(In reply to W. Trevor King from comment #17)

> It can't be all of those clusters, or we'd have turned this up earlier in CI, etc., right? But am I understanding right that born-in-4.4 clusters are assumed to be fine for all updates, born-in-4.3 clusters are fine for all updates, and born-in-4.2 clusters are fine updating to 4.3 but then hit this bug some percentage of the time (or 100% of the time for some subset of clusters?) if they subsequently update to an unpatched 4.4?

- Born-in-4.4 clusters: yes, fine. 4.4.x and 4.5 clusters upgrade fine, since they don't delete files between upgrades (this bug is about pruning files within MCs).
- Born-in-4.3 clusters: yes, fine. 4.3 to 4.4 (and later) deletes the etcd files from the MCO, but the bug isn't triggered here, as it needs an additional MC (or MCs) that modifies those files before triggering. That means it's 99% safe to upgrade from 4.3 to 4.4 unless someone goes 4.3, to 4.3 with modified etcd files, to 4.4 which drops the etcd files. In that case the bug will kick in.
- Born-in-4.2 clusters upgrading to 4.3: yes, fine. This doesn't trigger the bug either, as we just modify etcd files, never drop them.

Now, going from 4.2 to 4.4 breaks because:
- 4.2 ships a set of etcd files
- 4.3 modifies some of those files
- 4.4 deletes those files

The bug can be reproduced without an upgrade, since it's a bug in how the MCD handles deletion and backups of files that are already on disk or shipped by RPMs. The upgrade scenario triggers the bug because:

a) 4.3 modifies the etcd files, causing the MCD to create a wrong backup of those files (like rpm.save). The MCD shouldn't have done that, and that's the bug.
b) When 4.4 kicks in and wants to delete those files, the MCD will instead try to restore from the backup made above (which exposes the bug with a cross-device link error, though it could be any error).

VERIFIED on 4.5.0-0.nightly-2020-04-02-131318

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-04-02-131318   True        False         126m    Cluster version is 4.5.0-0.nightly-2020-04-02-131318

$ oc get mc | grep rendered-worker
rendered-worker-0cb4c0284b0a29d6982c0560ed8676af   e49397e1285814307cee815b7d7a044814a5602d   2.2.0   69m

==== Note: Original rendered MC: rendered-worker-0cb4c0284b0a29d6982c0560ed8676af ====

$ cat << EOF > file.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-1
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:;base64,dGVzdCBzdHJpbmcK
        filesystem: root
        mode: 0644
        path: /etc/test
EOF

$ oc apply -f file.yaml
machineconfig.machineconfiguration.openshift.io/test-1 created

$ oc get mc
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
00-worker                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
01-master-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
01-master-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
01-worker-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
01-worker-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
99-master-593c34ec-b6bf-4bb0-8e44-5e445b96e0b9-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
99-master-ssh                                                                                          2.2.0             71m
99-worker-d8ab4209-1e9c-476d-a906-4553719fb210-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
99-worker-ssh                                                                                          2.2.0             71m
rendered-master-e75dffc9ae55631a1b34e428fd8f121b            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
rendered-worker-0cb4c0284b0a29d6982c0560ed8676af            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             70m
rendered-worker-30362e12c6b76d9355d974200915c0dc            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             1s
test-1

==== Note first rendered MC: rendered-worker-30362e12c6b76d9355d974200915c0dc ====

$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-0cb4c0284b0a29d6982c0560ed8676af   False     True       False      3              0                   0                     0                      71m

$ watch oc get nodes

$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-30362e12c6b76d9355d974200915c0dc   True      False      False      3              3                   3                     0                      84m

$ for i in $(oc get node -l node-role.kubernetes.io/worker -o name); do oc debug $i -- cat /host/etc/test; done
Starting pod/ip-10-0-134-197us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
test string
Removing debug pod ...
Starting pod/ip-10-0-145-206us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
test string
Removing debug pod ...
Starting pod/ip-10-0-167-255us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
test string
Removing debug pod ...
$ cat << EOF > file2.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-2
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:;base64,dGVzdCBzdHJpbmcgMgo=
        filesystem: root
        mode: 0644
        path: /etc/test
EOF

$ oc apply -f file2.yaml
machineconfig.machineconfiguration.openshift.io/test-2 created

$ oc get mc
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
00-worker                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
01-master-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
01-master-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
01-worker-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
01-worker-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
99-master-593c34ec-b6bf-4bb0-8e44-5e445b96e0b9-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
99-master-ssh                                                                                          2.2.0             94m
99-worker-d8ab4209-1e9c-476d-a906-4553719fb210-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
99-worker-ssh                                                                                          2.2.0             94m
rendered-master-e75dffc9ae55631a1b34e428fd8f121b            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
rendered-worker-0cb4c0284b0a29d6982c0560ed8676af            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             92m
rendered-worker-30362e12c6b76d9355d974200915c0dc            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             22m
rendered-worker-941801c22bc9dcaf3f3f206ad4e6e40e            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             7s
test-1                                                                                                 2.2.0             22m
test-2                                                                                                 2.2.0             12s

==== Note: Second rendered MC: rendered-worker-941801c22bc9dcaf3f3f206ad4e6e40e ====

$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-30362e12c6b76d9355d974200915c0dc   False     True       False      3              1                   1                     0                      96m

$ watch oc get nodes

$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-941801c22bc9dcaf3f3f206ad4e6e40e   True      False      False      3              3                   3                     0                      106m

$ for i in $(oc get node -l node-role.kubernetes.io/worker -o name); do oc debug $i -- cat /host/etc/test; done
Starting pod/ip-10-0-134-197us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
test string 2
Removing debug pod ...
Starting pod/ip-10-0-145-206us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
test string 2
Removing debug pod ...
Starting pod/ip-10-0-167-255us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
test string 2
Removing debug pod ...

$ oc delete mc/test-2
machineconfig.machineconfiguration.openshift.io "test-2" deleted

$ oc get mc
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
00-worker                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
01-master-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
01-master-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
01-worker-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
01-worker-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
99-master-593c34ec-b6bf-4bb0-8e44-5e445b96e0b9-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
99-master-ssh                                                                                          2.2.0             112m
99-worker-d8ab4209-1e9c-476d-a906-4553719fb210-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
99-worker-ssh                                                                                          2.2.0             112m
rendered-master-e75dffc9ae55631a1b34e428fd8f121b            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
rendered-worker-0cb4c0284b0a29d6982c0560ed8676af            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             111m
rendered-worker-30362e12c6b76d9355d974200915c0dc            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             40m
rendered-worker-941801c22bc9dcaf3f3f206ad4e6e40e            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             18m
test-1                                                                                                 2.2.0             40m

$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-941801c22bc9dcaf3f3f206ad4e6e40e   False     True       False      3              0                   0                     0                      112m

$ watch oc get node

==== Verify mcp/worker rendered config returns to first rendered MC ====

$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-30362e12c6b76d9355d974200915c0dc   True      False      False      3              3                   3                     0                      124m

$ for i in $(oc get node -l node-role.kubernetes.io/worker -o name); do oc debug $i -- cat /host/etc/test; done
Starting pod/ip-10-0-134-197us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
test string
Removing debug pod ...
Starting pod/ip-10-0-145-206us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
test string
Removing debug pod ...
Starting pod/ip-10-0-167-255us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
test string
Removing debug pod ...
$ oc get mc
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
00-worker                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
01-master-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
01-master-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
01-worker-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
01-worker-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
99-master-593c34ec-b6bf-4bb0-8e44-5e445b96e0b9-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
99-master-ssh                                                                                          2.2.0             126m
99-worker-d8ab4209-1e9c-476d-a906-4553719fb210-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
99-worker-ssh                                                                                          2.2.0             126m
rendered-master-e75dffc9ae55631a1b34e428fd8f121b            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
rendered-worker-0cb4c0284b0a29d6982c0560ed8676af            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
rendered-worker-30362e12c6b76d9355d974200915c0dc            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             55m
rendered-worker-941801c22bc9dcaf3f3f206ad4e6e40e            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             32m
test-1                                                                                                 2.2.0             55m

$ oc delete mc/test-1
machineconfig.machineconfiguration.openshift.io "test-1" deleted

$ oc get mc
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
00-worker                                                   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
01-master-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
01-master-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
01-worker-container-runtime                                 e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
01-worker-kubelet                                           e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
99-master-593c34ec-b6bf-4bb0-8e44-5e445b96e0b9-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
99-master-ssh                                                                                          2.2.0             126m
99-worker-d8ab4209-1e9c-476d-a906-4553719fb210-registries   e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
99-worker-ssh                                                                                          2.2.0             126m
rendered-master-e75dffc9ae55631a1b34e428fd8f121b            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
rendered-worker-0cb4c0284b0a29d6982c0560ed8676af            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             125m
rendered-worker-30362e12c6b76d9355d974200915c0dc            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             55m
rendered-worker-941801c22bc9dcaf3f3f206ad4e6e40e            e49397e1285814307cee815b7d7a044814a5602d   2.2.0             32m

$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-30362e12c6b76d9355d974200915c0dc   False     True       False      3              0                   0                     0                      126m

$ watch oc get nodes

==== Verify mcp/worker rendered config returns to original rendered MC ====

$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-0cb4c0284b0a29d6982c0560ed8676af   True      False      False      3              3                   3                     0                      136m

$ for i in $(oc get node -l node-role.kubernetes.io/worker -o name); do oc debug $i -- cat /host/etc/test; done
Starting pod/ip-10-0-134-197us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
cat: /host/etc/test: No such file or directory
Removing debug pod ...
Starting pod/ip-10-0-145-206us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
cat: /host/etc/test: No such file or directory
Removing debug pod ...
Starting pod/ip-10-0-167-255us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
cat: /host/etc/test: No such file or directory
Removing debug pod ...

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475