Bug 1861404
| Summary: | MCO panic when upgrading AWS cluster to 4.6 from 4.1 | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Yang Yang <yanyang> |
| Component: | Machine Config Operator | Assignee: | Kirsten Garrison <kgarriso> |
| Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.6 | CC: | amurdaca, juzhao, miabbott, nstielau, wking, wzheng |
| Target Milestone: | --- | Keywords: | Upgrades |
| Target Release: | 4.6.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-10-27 16:17:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Yang Yang
2020-07-28 14:22:06 UTC
Must gather tarball is online. https://drive.google.com/file/d/1NhdY8_tz8aNgZCJUYxc5Kz_TJFrhprkB/view?usp=sharing

4.6.0-0.nightly-2020-07-25-091217 is old, and since then things like [1] have landed. Although from the attached must-gather, this isn't a vSphere cluster:

    $ tar -xOz must-gather.local.3184524102915563670/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-611f823bcbd1b627ba7c3b4f558094dd08b4625567a49d56e8687295ae454d26/cluster-scoped-resources/config.openshift.io/infrastructures/cluster.yaml <must-gather.tar.gz | grep -A5 platform
      platform: AWS
      platformStatus:
        aws:
          region: us-east-2
        type: AWS

So I'm not sure what's going on there.

[1]: https://github.com/openshift/machine-config-operator/pull/1951

This particular panic was already fixed (line 79 in machineconfig.go): https://github.com/openshift/machine-config-operator/pull/1935/files

@yangyang since that was an old nightly (that doesn't exist anymore), can you verify that you still see this in your 4.1->...-> 4.6 upgrades using a recent nightly? If not, I will dupe this to the old closed BZ.

I can still see the bug with 4.6.0-0.nightly-2020-08-18-055142, which has some must-gather info here: https://bugzilla.redhat.com/show_bug.cgi?id=1866554#c12

@Wenjing that must-gather doesn't have any MCO logs.

I still found this issue recently when upgrading from 4.1.0-0.nightly-2020-07-29-210856 -> 4.2.0-0.nightly-2020-08-06-223716 -> 4.3.0-0.nightly-2020-08-10-122110 -> 4.4.0-0.nightly-2020-08-10-180247 -> 4.5.0-0.nightly-2020-08-10-150345 -> 4.6.0-0.nightly-2020-08-10-150008. It's different from bz1858026. In this issue, the MCO upgrades to 4.5 successfully but the upgrade to 4.6 fails.

    NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.5.0-0.nightly-2020-08-10-150345   True        True          119m    Unable to apply 4.6.0-0.nightly-2020-08-10-150008: the cluster operator machine-config has not yet successfully rolled out

https://mastern-jenkins-csb-openshift-qe.cloud.paas.psi.redhat.com/job/upgrade_CI/4281/console

How often are you seeing this? Have you reproduced this in any of the recent nightlies available from https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/#4.6.0-0.nightly

I'm seeing:

    2020-08-11T03:08:25.484223874Z I0811 03:08:25.484186 1 kubelet_config_controller.go:318] Error syncing kubeletconfig cluster: GenerateMachineConfigsforRole failed with error cannot generate MachineConfigs when no platformStatus.type is set
    2020-08-11T03:08:25.773143916Z I0811 03:08:25.773102 1 template_controller.go:366] Error syncing controllerconfig machine-config-controller: failed to create MachineConfig for role master: cannot generate MachineConfigs when no platformStatus.type is set
    2020-08-11T03:08:25.806605111Z I0811 03:08:25.806563 1 container_runtime_config_controller.go:369] Error syncing image config openshift-config: could not Create/Update MachineConfig: could not generate origin ContainerRuntime Configs: generateMachineConfigsforRole failed with error cannot generate MachineConfigs when no platformStatus.type is set
    2020-08-11T03:08:26.099569478Z I0811 03:08:26.099504 1 kubelet_config_controller.go:318] Error syncing kubeletconfig cluster: GenerateMachineConfigsforRole failed with error cannot generate MachineConfigs when no platformStatus.type is set

But I'm also seeing:

    platformStatus:
      aws:
        region: us-east-2
      type: AWS

It's reproducible when upgrading from 4.1.38-x86_64 -> 4.2.36-x86_64 -> 4.3.33-x86_64 -> 4.4.17-x86_64 -> 4.5.6-x86_64 -> 4.6.0-0.nightly-2020-08-18-165040. https://mastern-jenkins-csb-openshift-qe.cloud.paas.psi.redhat.com/job/upgrade_CI/4403/console

*** Bug 1870548 has been marked as a duplicate of this bug.
***

This is caused by our ControllerConfig not syncing the infra object coming from 4.1 unfortunately (

Antonio, I'm trying to decode your last comment. Do we understand the problem? Do we know what a fix might look like?

@Nick Yes to both: we understand the problem and understand the path a fix might take.

Verified with 4.6.0-0.nightly-2020-09-19-004228 (6+ hrs to go from 4.1 -> 4.6!)

```
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.41    True        False         41m     Cluster version is 4.1.41

$ oc adm upgrade --allow-explicit-upgrade=true --allow-upgrade-with-warnings=true --force=true --to-image=quay.io/openshift-release-dev/ocp-release:4.2.36-x86_64
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade to the update to preceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image quay.io/openshift-release-dev/ocp-release:4.2.36-x86_64

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.36    True        False         6s      Cluster version is 4.2.36

$ oc adm upgrade --allow-explicit-upgrade=true --allow-upgrade-with-warnings=true --force=true --to-image=quay.io/openshift-release-dev/ocp-release:4.3.35-x86_64
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade to the update to preceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image quay.io/openshift-release-dev/ocp-release:4.3.35-x86_64

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.35    True        False         116s    Cluster version is 4.3.35

$ oc adm upgrade --allow-explicit-upgrade=true --allow-upgrade-with-warnings=true --force=true --to-image=quay.io/openshift-release-dev/ocp-release:4.4.23-x86_64
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade to the update to preceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image quay.io/openshift-release-dev/ocp-release:4.4.23-x86_64

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.23    True        False         25m     Cluster version is 4.4.23

$ oc adm upgrade --allow-explicit-upgrade=true --allow-upgrade-with-warnings=true --force=true --to-image=quay.io/openshift-release-dev/ocp-release:4.5.10-x86_64
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade to the update to preceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image quay.io/openshift-release-dev/ocp-release:4.5.10-x86_64

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.10    True        False         17m     Cluster version is 4.5.10

$ oc adm upgrade --allow-explicit-upgrade=true --allow-upgrade-with-warnings=true --force=true --to-image=registry.svc.ci.openshift.org/ocp/release@sha256:2f6222aecdfe27eae59131d1e698e06e48bb8bcefec336a39c4d2cd761a621ac
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade to the update to preceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image registry.svc.ci.openshift.org/ocp/release@sha256:2f6222aecdfe27eae59131d1e698e06e48bb8bcefec336a39c4d2cd761a621ac

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-19-004228   True        False         34s     Cluster version is 4.6.0-0.nightly-2020-09-19-004228

$ oc describe co/machine-config
Name: machine-config
Namespace:
Labels: <none>
Annotations: <none>
API Version: config.openshift.io/v1
Kind: ClusterOperator
Metadata:
  Creation Timestamp: 2020-09-19T15:50:31Z
  Generation: 1
  Resource Version: 230034
  Self Link: /apis/config.openshift.io/v1/clusteroperators/machine-config
  UID: d938ac7b-fa8f-11ea-97bf-022c5ecfda63
Spec:
Status:
  Conditions:
    Last Transition Time: 2020-09-19T21:55:43Z
    Message: Cluster has deployed 4.6.0-0.nightly-2020-09-19-004228
    Status: True
    Type: Available
    Last Transition Time: 2020-09-19T21:55:43Z
    Message: Cluster version is 4.6.0-0.nightly-2020-09-19-004228
    Status: False
    Type: Progressing
    Last Transition Time: 2020-09-19T21:31:53Z
    Status: False
    Type: Degraded
    Last Transition Time: 2020-09-19T17:27:43Z
    Reason: AsExpected
    Status: True
    Type: Upgradeable
  Extension:
    Master: all 3 nodes are at latest configuration rendered-master-ea3d50e6005eb46bd31d38c7af0e1890
    Worker: all 3 nodes are at latest configuration rendered-worker-89b4196319e54095cad4b003daa21e96
  Related Objects:
    Group:
    Name: openshift-machine-config-operator
    Resource: namespaces
    Group: machineconfiguration.openshift.io
    Name:
    Resource: machineconfigpools
    Group: machineconfiguration.openshift.io
    Name:
    Resource: controllerconfigs
    Group: machineconfiguration.openshift.io
    Name:
    Resource: kubeletconfigs
    Group: machineconfiguration.openshift.io
    Name:
    Resource: containerruntimeconfigs
    Group: machineconfiguration.openshift.io
    Name:
    Resource: machineconfigs
    Group:
    Name:
    Resource: nodes
  Versions:
    Name: operator
    Version: 4.6.0-0.nightly-2020-09-19-004228
Events: <none>

$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-129-165.us-west-2.compute.internal   Ready    master   6h9m    v1.19.0+7f9e863
ip-10-0-135-149.us-west-2.compute.internal   Ready    worker   6h4m    v1.19.0+7f9e863
ip-10-0-147-43.us-west-2.compute.internal    Ready    worker   6h3m    v1.19.0+7f9e863
ip-10-0-154-82.us-west-2.compute.internal    Ready    master   6h10m   v1.19.0+7f9e863
ip-10-0-171-208.us-west-2.compute.internal   Ready    master   6h10m   v1.19.0+7f9e863
ip-10-0-173-169.us-west-2.compute.internal   Ready    worker   6h3m    v1.19.0+7f9e863
```

```
$ oc -n openshift-machine-config-operator logs machine-config-operator-7b58dc6cff-fkvbd
I0919 21:46:37.540011 1 start.go:43] Version: 4.6.0-0.nightly-2020-09-19-004228 (Raw: v4.6.0-202009181332.p0-dirty, Hash: c08c048584ef0bf18ab2dd88fdddd93279e1c6a1)
I0919 21:46:37.543321 1 leaderelection.go:243] attempting to acquire leader lease openshift-machine-config-operator/machine-config...
I0919 21:48:33.252997 1 leaderelection.go:253] successfully acquired lease openshift-machine-config-operator/machine-config
W0919 21:48:33.366368 1 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0919 21:48:33.376676 1 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0919 21:48:33.406841 1 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
I0919 21:48:33.780030 1 operator.go:253] Starting MachineConfigOperator
W0919 21:48:33.803280 1 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0919 21:54:40.888815 1 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
I0919 21:55:43.170205 1 event.go:282] Event(v1.ObjectReference{Kind:"", Namespace:"", Name:"machine-config", UID:"d938ac7b-fa8f-11ea-97bf-022c5ecfda63", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorVersionChanged' clusteroperator/machine-config-operator version changed from [{operator 4.5.10}] to [{operator 4.6.0-0.nightly-2020-09-19-004228}]
W0919 21:55:43.972427 1 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0919 21:56:05.332981 1 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0919 21:57:25.219700 1 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0919 21:57:43.308575 1 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0919 21:58:13.600420 1 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0919 21:58:20.218655 1 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0919 22:00:19.114058 1 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0919 22:00:20.570039 1 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0919 22:00:20.814491 1 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0919 22:00:25.672157 1 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0919 22:00:32.262745 1 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196

Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel like this bug still needs to be a suspect, please add keyword again.

[1]: https://github.com/openshift/enhancements/pull/475
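
For readers tracing the failure discussed in this thread: the controllers logged "cannot generate MachineConfigs when no platformStatus.type is set" even though the cluster's Infrastructure object carried `platformStatus.type: AWS`, which fits the explanation that the copy held in ControllerConfig was not synced on a cluster born at 4.1. The Go sketch below only illustrates that failure mode and a nil-safe lookup. The types `InfraStatus` and `PlatformStatus` and the function `resolvePlatform` are hypothetical stand-ins, not the MCO's code, and the fallback to the older `platform` field is an assumption made for illustration, not a description of the fix that shipped.

```go
package main

import (
	"errors"
	"fmt"
)

// PlatformStatus and InfraStatus are simplified, hypothetical stand-ins for
// the status of the cluster Infrastructure object. They are not the real
// OpenShift API types.
type PlatformStatus struct {
	Type string
}

type InfraStatus struct {
	Platform       string          // older field, shown as "platform: AWS" in the must-gather
	PlatformStatus *PlatformStatus // newer field; nil in a stale, unsynced copy
}

// errNoPlatform reuses the wording of the error seen in the controller logs.
var errNoPlatform = errors.New("cannot generate MachineConfigs when no platformStatus.type is set")

// resolvePlatform returns the platform type without dereferencing a nil
// pointer. It prefers platformStatus.type and, as an illustrative assumption,
// falls back to the older platform field when the newer one is missing.
func resolvePlatform(s *InfraStatus) (string, error) {
	if s == nil {
		return "", errNoPlatform
	}
	if s.PlatformStatus != nil && s.PlatformStatus.Type != "" {
		return s.PlatformStatus.Type, nil
	}
	if s.Platform != "" {
		return s.Platform, nil
	}
	return "", errNoPlatform
}

func main() {
	// A stale copy as it might look on a cluster installed at 4.1:
	// the older field is set, platformStatus was never synced.
	stale := &InfraStatus{Platform: "AWS"}
	// A fully synced copy, matching the YAML quoted in the thread.
	synced := &InfraStatus{Platform: "AWS", PlatformStatus: &PlatformStatus{Type: "AWS"}}
	// A copy with neither field set reproduces the logged error.
	empty := &InfraStatus{}

	for name, s := range map[string]*InfraStatus{"stale": stale, "synced": synced, "empty": empty} {
		platform, err := resolvePlatform(s)
		fmt.Printf("%s: platform=%q err=%v\n", name, platform, err)
	}
}
```

The point of the sketch is the order of checks: every pointer is tested before it is dereferenced, so a missing `platformStatus` yields an error (or a fallback value) rather than a panic.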