Bug 1709365

Summary: Installs failing on latest 4.1 nightly (OCP) builds: controller version mismatch for rendered-master
Product: OpenShift Container Platform Reporter: Mike Fiedler <mifiedle>
Component: Release Assignee: Tim Bielawa <tbielawa>
Status: CLOSED CURRENTRELEASE QA Contact: Mike Fiedler <mifiedle>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 4.1.0 CC: aos-bugs, jforrest, jokerman, lmeyer, miabbott, mmccomas, sjenning, smunilla, sponnaga, vlaad
Target Milestone: --- Keywords: TestBlocker
Target Release: 4.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-05-13 16:39:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Mike Fiedler 2019-05-13 13:07:23 UTC
Description of problem:

Bug 1707928 is back. Recent builds cannot be installed:

Installing from release registry.svc.ci.openshift.org/ocp/release:4.1.0-0.nightly-2019-05-12-052536
level=warning msg="Found override for ReleaseImage. Please be warned, this is not advised"
level=info msg="Consuming \"Install Config\" from target directory"
level=info msg="Creating infrastructure resources..."
level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.ci-op-6mgfbn99-0e31a.origin-ci-int-aws.dev.rhcloud.com:6443..."
level=info msg="API v1.13.4+4550bfb up"
level=info msg="Waiting up to 30m0s for bootstrapping to complete..."
level=info msg="Destroying the bootstrap resources..."
level=info msg="Waiting up to 30m0s for the cluster at https://api.ci-op-6mgfbn99-0e31a.origin-ci-int-aws.dev.rhcloud.com:6443 to initialize..."
level=fatal msg="failed to initialize the cluster: Cluster operator machine-config is reporting a failure: Failed to resync 4.1.0-0.nightly-2019-05-12-052536 because: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: controller version mismatch for rendered-master-a4054f8d7960c6d88ddefe0f9076b34f expected 9fe5968317159f7f7b74fc4fcfdd76225fafee84 has 3f44c0bb795c6005deeab3fb681f488e6bfbbf10, retrying: timed out waiting for the condition"
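
For triage, the two commit hashes in the fatal message above can be pulled out mechanically. This is only a minimal sketch: the regular expression is inferred from the single log line shown here, and the function name `parse_version_mismatch` is hypothetical, not part of the installer or the machine-config-operator.

```python
import re

# Pattern inferred from the one fatal message in this report (an assumption,
# not a documented log format): "... controller version mismatch for <config>
# expected <hash> has <hash>, ..."
MISMATCH_RE = re.compile(
    r"controller version mismatch for (?P<config>\S+) "
    r"expected (?P<expected>[0-9a-f]+) has (?P<actual>[0-9a-f]+)"
)


def parse_version_mismatch(message):
    """Return the rendered config name and both hashes, or None if absent."""
    match = MISMATCH_RE.search(message)
    return match.groupdict() if match else None
```

The `expected` value is the version the operator wants the rendered config to carry, and `actual` is the version the config was generated with.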


Version-Release number of selected component (if applicable): 

4.1.0-0.nightly-2019-05-10-223550 and newer


How reproducible: Always

https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.1/198

Comment 2 Luke Meyer 2019-05-14 21:10:29 UTC
https://github.com/openshift/machine-config-operator/pull/743 was meant to address this issue. This was not really a duplicate. Related things were built from the same commits, but those sources were then copied into different dist-gits, and the build code was reading the commit from the local git repo rather than from the source repo. It was solved by having the build code look at env vars that indicate the real source commit.
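
The approach described in this comment can be sketched roughly as follows. This is an illustration only, not the code from the pull request: the environment variable name `SOURCE_GIT_COMMIT` and the function `resolve_source_commit` are assumptions made for the example.

```python
import os
import subprocess


def resolve_source_commit(env=None, repo_dir="."):
    """Pick the commit hash to stamp into a build.

    Prefers an explicitly supplied source commit (hypothetical env var
    SOURCE_GIT_COMMIT) over whatever the local checkout reports.
    """
    env = os.environ if env is None else env
    commit = env.get("SOURCE_GIT_COMMIT", "").strip()
    if commit:
        return commit
    # Fallback: ask the local repository. In a dist-git copy this returns
    # the dist-git commit, which differs from the upstream source commit.
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

With a variable like this set by the build system, components built from different dist-git copies of the same source would all report the same commit, which is what the version check compares.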