Bug 1897048 - [IPI Baremetal] Upgrade from 4.6.1 to 4.7 nightly is stuck on "the cluster operator machine-config has not yet successfully rolled out"
Status: CLOSED DUPLICATE of bug 1901376
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Ben Nemec
QA Contact: Johnny Liu
Depends On:
Reported: 2020-11-12 07:50 UTC by Ori Michaeli
Modified: 2020-12-16 22:41 UTC

Fixed In Version:
Doc Text:
Clone Of:
Last Closed: 2020-12-16 22:41:29 UTC
Target Upstream Version:


Description Ori Michaeli 2020-11-12 07:50:23 UTC
Upgrade from 4.6.3 to 4.7.0-0.nightly-2020-11-11-080140

$ ./openshift-baremetal-install version
./openshift-baremetal-install 4.6.3
built from commit a4f0869e0d2a5b2d645f0f28ef9e4b100fa8f779
release image registry.svc.ci.openshift.org/ocp/release@sha256:14986d2b9c112ca955aaa03f7157beadda0bd3c089e5e1d56f28020d2dd55c52


IPI Baremetal

What happened?

The upgrade procedure is stuck on "the cluster operator machine-config has not yet successfully rolled out".

What did you expect to happen?

The upgrade procedure completes successfully.

How to reproduce it (as minimally and precisely as possible)?

1. Mirror release image to the disconnected registry.
2. Create ImageContentSourcePolicy.
3. Create ConfigMap for image signature.
4. Create custom upgrade graph.
5. Point CVO to custom upgrade graph.
6. Upgrade to 4.7 nightly.

Anything else we need to know?

I will attach the must-gather.

Comment 3 W. Trevor King 2020-11-17 19:27:18 UTC
From comment 1's must-gather:

$ yaml2json <cluster-scoped-resources/config.openshift.io/clusteroperators/machine-config.yaml | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message' | sort
2020-11-11T14:36:06Z Upgradeable=True AsExpected: 
2020-11-11T19:55:08Z Available=False : Cluster not available for 4.7.0-0.nightly-2020-11-11-080140
2020-11-11T19:57:12Z Progressing=True : Working towards 4.7.0-0.nightly-2020-11-11-080140
2020-11-11T20:03:31Z Degraded=True MachineConfigControllerFailed: Unable to apply 4.7.0-0.nightly-2020-11-11-080140: timed out waiting for the condition during waitForControllerConfigToBeCompleted: controllerconfig is not completed: ControllerConfig has not completed: completed(false) running(false) failing(true)

Re-assigning to the machine-config folks.
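
For anyone without yaml2json and jq to hand, the condition summary above can be approximated with a small Go program. This is only a sketch: it assumes the ClusterOperator has already been converted to JSON, the type and function names (condition, clusterOperator, summarize) are made up for the example, and the sample input in main is abbreviated rather than taken from the must-gather.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// condition holds the fields of a status condition that the jq expression reads.
// The struct itself is a hypothetical stand-in, not the OpenShift API type.
type condition struct {
	LastTransitionTime string `json:"lastTransitionTime"`
	Type               string `json:"type"`
	Status             string `json:"status"`
	Reason             string `json:"reason"`
	Message            string `json:"message"`
}

// clusterOperator models only the part of the document needed here.
type clusterOperator struct {
	Status struct {
		Conditions []condition `json:"conditions"`
	} `json:"status"`
}

// summarize returns one line per condition, sorted as strings. Because the
// timestamps are RFC 3339 in UTC, string order is also chronological order.
func summarize(raw []byte) ([]string, error) {
	var co clusterOperator
	if err := json.Unmarshal(raw, &co); err != nil {
		return nil, err
	}
	lines := make([]string, 0, len(co.Status.Conditions))
	for _, c := range co.Status.Conditions {
		lines = append(lines, fmt.Sprintf("%s %s=%s %s: %s",
			c.LastTransitionTime, c.Type, c.Status, c.Reason, c.Message))
	}
	sort.Strings(lines)
	return lines, nil
}

func main() {
	// Abbreviated sample input; the messages are placeholders.
	raw := []byte(`{"status":{"conditions":[
		{"lastTransitionTime":"2020-11-11T20:03:31Z","type":"Degraded","status":"True","reason":"MachineConfigControllerFailed","message":"sample message"},
		{"lastTransitionTime":"2020-11-11T14:36:06Z","type":"Upgradeable","status":"True","reason":"AsExpected","message":""}
	]}}`)
	lines, err := summarize(raw)
	if err != nil {
		panic(err)
	}
	for _, line := range lines {
		fmt.Println(line)
	}
}
```

A condition with an empty reason prints as "Type=Status : message", which is why the Available and Progressing lines above have a bare colon.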

Comment 4 Alex Crawford 2020-12-03 23:32:05 UTC
From pods/machine-config-controller-744646d477-9r8l6/machine-config-controller/machine-config-controller/logs/current.log:

  2020-11-12T07:42:04.871659204Z I1112 07:42:04.871509       1 template_controller.go:366] Error syncing controllerconfig machine-config-controller: failed to create MachineConfig for role master: failed to execute template: template: /etc/mcc/templates/common/on-prem/files/NetworkManager-resolv-prepender.yaml:52:22: executing "/etc/mcc/templates/common/on-prem/files/NetworkManager-resolv-prepender.yaml" at <.DNS.Spec.BaseDomain>: nil pointer evaluating *v1.DNS.Spec
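
The "nil pointer evaluating" wording is what Go's text/template package reports when a field chain passes through a nil pointer. The sketch below reproduces that class of failure in isolation. The types (DNS, DNSSpec, renderConfig) and the one-line template are hypothetical stand-ins, not the MCO's actual render config or the real NetworkManager-resolv-prepender.yaml template.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Hypothetical stand-ins for the real types; only the pointer shape matters.
type DNSSpec struct{ BaseDomain string }
type DNS struct{ Spec DNSSpec }
type renderConfig struct{ DNS *DNS }

// render executes a one-line template that dereferences .DNS.Spec.BaseDomain.
func render(cfg renderConfig) (string, error) {
	tmpl, err := template.New("example").Parse("search {{.DNS.Spec.BaseDomain}}")
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, cfg); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// DNS is left nil, so execution fails while evaluating .Spec.
	_, err := render(renderConfig{})
	fmt.Println("nil DNS:", err)

	// With DNS populated, the same template renders normally.
	s, err := render(renderConfig{DNS: &DNS{Spec: DNSSpec{BaseDomain: "example.com"}}})
	fmt.Println("populated DNS:", s, err)
}
```

If the same mechanism is at work here, the controller config handed to the template had no DNS object set at render time, which would point at how that field gets populated rather than at the template syntax itself.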

Comment 5 Kirsten Garrison 2020-12-03 23:53:16 UTC
There's been some churn in this template; see: https://github.com/openshift/machine-config-operator/commits/f41b1d2ae7feea9aedfbd62143baefdf950c8569/templates/common/on-prem/files/NetworkManager-resolv-prepender.yaml

And another bug that I think this could be duped into: https://bugzilla.redhat.com/show_bug.cgi?id=1901376

I'll assign this one to the same author, Ben, and let him dupe as he sees fit.

Comment 6 Ben Nemec 2020-12-16 22:41:29 UTC
Even though this bug was opened first, there's a bit more discussion over on the other one, so let's track this there.

*** This bug has been marked as a duplicate of bug 1901376 ***
