Created attachment 1741709 [details]
installation log

Version:

$ openshift-install version
openshift-install 4.7.0-0.nightly-2020-12-21-131655
built from commit 6abb3b5a8b687ee38b6c96368c77a305f6f0b563
release image quay.io/openshift-release-dev/ocp-release-nightly@sha256:e5373e096ae81a2372bad8309a28fcc2a9f04b36295ff5e82329e4b5fc6afa7b

Platform: vSphere

Please specify:
* UPI (semi-manual installation on customized infrastructure)

What happened?

During installation all nodes reached the Ready state, but at the end I saw:

# oc get machineconfigpool,nodes -o wide
NAME                                                         CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
machineconfigpool.machineconfiguration.openshift.io/master   rendered-master-afecc5afb2668b5cc4f60f4b3fe96214   True      False      False      3              3                   3                     0                      3h48m
machineconfigpool.machineconfiguration.openshift.io/worker   rendered-worker-c255d81fe61657c3062da9e2cfbaee99   False     True       True       3              0                   0                     1                      3h48m

NAME                   STATUS                     ROLES    AGE     VERSION           INTERNAL-IP    EXTERNAL-IP    OS-IMAGE                                                        KERNEL-VERSION                CONTAINER-RUNTIME
node/compute-0         Ready,SchedulingDisabled   worker   3h43m   v1.20.0+87544c5   10.1.160.171   10.1.160.171   Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa)   4.18.0-240.8.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
node/compute-1         Ready                      worker   3h43m   v1.20.0+87544c5   10.1.160.173   10.1.160.173   Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa)   4.18.0-240.8.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
node/compute-2         Ready                      worker   3h42m   v1.20.0+87544c5   10.1.160.148   10.1.160.148   Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa)   4.18.0-240.8.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
node/control-plane-0   Ready                      master   3h50m   v1.20.0+87544c5   10.1.160.182   10.1.160.182   Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa)   4.18.0-240.8.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
node/control-plane-1   Ready                      master   3h50m   v1.20.0+87544c5   10.1.160.176   10.1.160.176   Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa)   4.18.0-240.8.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
node/control-plane-2   Ready                      master   3h50m   v1.20.0+87544c5   10.1.160.190   10.1.160.190   Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa)   4.18.0-240.8.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39

.openshift_install.log is attached; the must-gather data will also be uploaded.

What did you expect to happen?

The installation to finish successfully.
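For anyone hitting the same symptom, the worker pool's NodeDegraded condition and the machine-config-daemon log on the affected node usually point at the failing step. A minimal sketch of the relevant queries (generic, not taken from this report; the daemon pod name is a placeholder):

# Show the worker pool's conditions; the NodeDegraded message names the failing node and unit
$ oc get machineconfigpool worker -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'

# Inspect the machine-config-daemon on the degraded node for the underlying error
$ oc -n openshift-machine-config-operator get pods -o wide | grep compute-2
$ oc -n openshift-machine-config-operator logs <machine-config-daemon-pod-on-compute-2> -c machine-config-daemon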
Created attachment 1741712 [details] must-gather data
This bug lacks any info as to what's been done to debug the situation, so I'm resetting the severity until the engineering teams have been able to triage it.
After applying a MachineConfig (worker-chrony-configuration), compute-2 has the DEGRADED flag set to True:

$ oc get MachineConfigPool worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-25d1d136587a3c56ee93aa85b1d49eb8   False     True       True       3              0                   0                     1                      7d

compute-2 is reporting "Unit file nodeip-configuration.service does not exist":

$ oc get machineconfigpool worker -o yaml
  - lastTransitionTime: "2020-12-28T12:19:44Z"
    message: 'Node compute-2 is reporting: "error enabling unit: Failed to enable unit:
      Unit file nodeip-configuration.service does not exist.\n"'
    reason: 1 nodes are reporting degraded status on sync
    status: "True"
    type: NodeDegraded
  - lastTransitionTime: "2020-12-28T12:19:44Z"
    message: ""
    reason: ""
    status: "True"
    type: Degraded
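The MachineConfig that triggered the rollout is not attached to this report; a worker chrony MachineConfig of this kind typically looks like the sketch below (the name matches the comment above, but the file contents and encoding are illustrative placeholders):

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-chrony-configuration
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
      - path: /etc/chrony.conf
        mode: 420
        overwrite: true
        contents:
          # Placeholder: base64 of the desired chrony.conf goes here
          source: data:text/plain;charset=utf-8;base64,<base64-encoded chrony.conf>

The chrony content itself is likely incidental: applying any worker MachineConfig triggers a re-render and node sync, and the sync appears to be where the missing nodeip-configuration.service unit surfaces.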
The nodeip-configuration.service deployment was modified here: https://github.com/openshift/machine-config-operator/commit/5c2d529bf1abc9c7cbc01dcfc7814c3a59092676

Moving to MCO.
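For background on why the error reads the way it does: a rendered MachineConfig enables a systemd unit through its systemd stanza, and if a unit is marked enabled but no unit file is delivered (either inline in the config or shipped in the OS image), the machine-config-daemon's enable step fails with exactly the message seen on compute-2. A minimal illustrative stanza, not the actual rendered config from this cluster:

spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - name: nodeip-configuration.service
        enabled: true
        # If no "contents" are provided here and the unit file is not already
        # present on the host, enabling the unit fails with
        # "Unit file nodeip-configuration.service does not exist."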
*** Bug 1909642 has been marked as a duplicate of this bug. ***
*** Bug 1909570 has been marked as a duplicate of this bug. ***
Verified on 4.7.0-0.nightly-2021-01-13-124141: the UPI-on-vSphere installation completes and the MachineConfig update is successful.

$ ./oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-01-13-124141   True        False         68m     Cluster version is 4.7.0-0.nightly-2021-01-13-124141

$ ./oc get machineconfig
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          69ac8b941b0f29d3cfdfced35aded406d75bc84a   3.2.0             91m
00-worker                                          69ac8b941b0f29d3cfdfced35aded406d75bc84a   3.2.0             91m
01-master-container-runtime                        69ac8b941b0f29d3cfdfced35aded406d75bc84a   3.2.0             91m
01-master-kubelet                                  69ac8b941b0f29d3cfdfced35aded406d75bc84a   3.2.0             91m
01-worker-container-runtime                        69ac8b941b0f29d3cfdfced35aded406d75bc84a   3.2.0             91m
01-worker-kubelet                                  69ac8b941b0f29d3cfdfced35aded406d75bc84a   3.2.0             91m
99-master-generated-registries                     69ac8b941b0f29d3cfdfced35aded406d75bc84a   3.2.0             91m
99-master-ssh                                                                                 3.1.0             95m
99-worker-generated-registries                     69ac8b941b0f29d3cfdfced35aded406d75bc84a   3.2.0             91m
99-worker-ssh                                                                                 3.1.0             95m
rendered-master-cbdfc843feae448daa9c23e9abfb02bb   69ac8b941b0f29d3cfdfced35aded406d75bc84a   3.2.0             91m
rendered-master-eed4433615759c0700eb64332f014044   69ac8b941b0f29d3cfdfced35aded406d75bc84a   3.2.0             66m
rendered-worker-9c06b1f59d48afdad6cfdd5e0d466eeb   69ac8b941b0f29d3cfdfced35aded406d75bc84a   3.2.0             91m
rendered-worker-bf723a438aae52dd35bff4f2720019eb   69ac8b941b0f29d3cfdfced35aded406d75bc84a   3.2.0             66m

$ ./oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-eed4433615759c0700eb64332f014044   True      False      False      3              3                   3                     0                      92m
worker   rendered-worker-bf723a438aae52dd35bff4f2720019eb   True      False      False      2              2                   2                     0                      92m
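To repeat the verification on another cluster, the rollout and per-node state can be watched with standard commands; a short sketch (the node name is illustrative):

# Watch the worker pool roll out after applying a test MachineConfig
$ oc get mcp worker -w

# Check the per-node machine-config-daemon state once the pool reports Updated
$ oc describe node compute-2 | grep machineconfiguration.openshift.io/state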
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633
Based on the blocker+ status for the child bug 1940585, we're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way.

Sample answers are provided to give more context, and the ImpactStatementRequested label has been added to this bug. When responding, please remove ImpactStatementRequested and set the ImpactStatementProposed label. The expectation is that the assignee answers these questions.

Who is impacted? If we have to block upgrade edges based on this issue, which edges would need blocking?
* example: Customers upgrading from 4.y.z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
* example: All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time

What is the impact? Is it serious enough to warrant blocking edges?
* example: Up to 2 minute disruption in edge routing
* example: Up to 90 seconds of API downtime
* example: etcd loses quorum and you have to restore from backup

How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
* example: Issue resolves itself after five minutes
* example: Admin uses oc to fix things
* example: Admin must SSH to hosts, restore from backups, or other non-standard admin activities

Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
* example: No, it has always been like this, we just never noticed
* example: Yes, from 4.y.z to 4.y+1.z, or from 4.y.z to 4.y.z+1
Never got the impact statement requested in comment 19, but I don't think we blocked anything on this, and it's been a while, so that's unlikely to change going forward. If folks are still bumping into the issue and think edges need blocking, please restore the UpgradeBlocker keyword.