Bug 1734343 - [UPI] [Baremetal] Install fails on machine-config operator
Summary: [UPI] [Baremetal] Install fails on machine-config operator
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.2.0
Assignee: Steve Milner
QA Contact: David Sanz
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-07-30 09:56 UTC by David Sanz
Modified: 2019-10-16 06:34 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:34:08 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2019:2922  0        None      None    None     2019-10-16 06:34:20 UTC

Description David Sanz 2019-07-30 09:56:27 UTC
Description of problem:

The machine-config operator does not finish its initialization, which makes the install fail:

[root@aux-server ~]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          23m     Unable to apply 4.2.0-0.nightly-2019-07-30-073644: the cluster operator machine-config is degraded
[root@aux-server ~]# oc describe co machine-config
Name:         machine-config
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2019-07-30T09:29:50Z
  Generation:          1
  Resource Version:    16727
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/machine-config
  UID:                 944946e2-b2ac-11e9-a0a6-0cc47ab5867c
Spec:
Status:
  Conditions:
    Last Transition Time:  2019-07-30T09:29:50Z
    Message:               Cluster not available for 4.2.0-0.nightly-2019-07-30-073644
    Status:                False
    Type:                  Available
    Last Transition Time:  2019-07-30T09:29:50Z
    Message:               Cluster is bootstrapping 4.2.0-0.nightly-2019-07-30-073644
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2019-07-30T09:40:42Z
    Message:               Failed to resync 4.2.0-0.nightly-2019-07-30-073644 because: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: configuration status for pool master is empty: pool is degraded because nodes fail with "3 nodes are reporting degraded status on sync": "Node master-01.morenod-0730.qe.devcluster.openshift.com is reporting: \"machineconfig.machineconfiguration.openshift.io \\\"rendered-master-182b0a5759e21f6df78e626bffaf8d9a\\\" not found\", Node master-00.morenod-0730.qe.devcluster.openshift.com is reporting: \"parsing booted osImageURL: parsing reference: \\\"<not pivoted>\\\": invalid reference format\", Node master-02.morenod-0730.qe.devcluster.openshift.com is reporting: \"machineconfig.machineconfiguration.openshift.io \\\"rendered-master-182b0a5759e21f6df78e626bffaf8d9a\\\" not found\"", retrying
    Reason:                FailedToSync
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2019-07-30T09:40:42Z
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:
    Last Sync Error:  pool master has not progressed to latest configuration: configuration status for pool master is empty: pool is degraded because nodes fail with "3 nodes are reporting degraded status on sync": "Node master-01.morenod-0730.qe.devcluster.openshift.com is reporting: \"machineconfig.machineconfiguration.openshift.io \\\"rendered-master-182b0a5759e21f6df78e626bffaf8d9a\\\" not found\", Node master-00.morenod-0730.qe.devcluster.openshift.com is reporting: \"parsing booted osImageURL: parsing reference: \\\"<not pivoted>\\\": invalid reference format\", Node master-02.morenod-0730.qe.devcluster.openshift.com is reporting: \"machineconfig.machineconfiguration.openshift.io \\\"rendered-master-182b0a5759e21f6df78e626bffaf8d9a\\\" not found\"", retrying
  Related Objects:
    Group:     
    Name:      openshift-machine-config-operator
    Resource:  namespaces
    Group:     machineconfiguration.openshift.io
    Name:      master
    Resource:  machineconfigpools
    Group:     machineconfiguration.openshift.io
    Name:      worker
    Resource:  machineconfigpools
    Group:     machineconfiguration.openshift.io
    Name:      cluster
    Resource:  controllerconfigs
  Versions:
    Name:     operator
    Version:  4.2.0-0.nightly-2019-07-30-073644
Events:       <none>



Version-Release number of the following components:
Installer: 4.2.0-0.nightly-2019-07-30-073644

How reproducible:

Steps to Reproduce:
1. Trigger an installation on a bare-metal environment
2. Wait until the cluster installation completes
3. Check the clusteroperator status
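Step 3 can also be checked programmatically. The following is a hedged sketch that is not part of the original report. It assumes the input JSON was captured with `oc get clusteroperator machine-config -o json` and decodes only the `status.conditions` fields visible in the `oc describe` output below (`type`, `status`, `reason`, `message`). The type and function names (`clusterOperator`, `degraded`) are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clusterOperator is a hypothetical minimal view of a config.openshift.io/v1
// ClusterOperator, keeping only the fields this check reads.
type clusterOperator struct {
	Metadata struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Status struct {
		Conditions []struct {
			Type    string `json:"type"`
			Status  string `json:"status"`
			Reason  string `json:"reason"`
			Message string `json:"message"`
		} `json:"conditions"`
	} `json:"status"`
}

// degraded reports whether the operator has a Degraded condition whose
// status is "True", and returns that condition's reason if so.
func degraded(raw []byte) (bool, string, error) {
	var co clusterOperator
	if err := json.Unmarshal(raw, &co); err != nil {
		return false, "", err
	}
	for _, c := range co.Status.Conditions {
		if c.Type == "Degraded" && c.Status == "True" {
			return true, c.Reason, nil
		}
	}
	return false, "", nil
}

func main() {
	// Abbreviated sample shaped like the machine-config status in this report.
	raw := []byte(`{
	  "metadata": {"name": "machine-config"},
	  "status": {"conditions": [
	    {"type": "Available", "status": "False"},
	    {"type": "Progressing", "status": "True"},
	    {"type": "Degraded", "status": "True", "reason": "FailedToSync"}
	  ]}
	}`)
	isDegraded, reason, err := degraded(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("degraded=%v reason=%s\n", isDegraded, reason)
}
```

With the sample above, this prints `degraded=true reason=FailedToSync`. Keying on the `Degraded` condition, rather than on `Available` alone, separates a failed sync from a cluster that is merely still bootstrapping.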

Actual results:

Expected results:

Additional info:

Comment 1 Colin Walters 2019-07-30 16:37:59 UTC
How did you do this bare-metal install?  PXE?  Where did you download the bootimage from?

Comment 2 Colin Walters 2019-07-30 18:16:33 UTC
To put it more directly: you should be using the bootimage pinned by the installer. See also https://github.com/openshift/installer/issues/1399

And https://releases-rhcos-art.cloud.privileged.psi.redhat.com/?stream=releases/rhcos-4.2

Comment 5 Antonio Murdaca 2019-08-05 09:02:49 UTC
Looks like CustomOrigin in rpm-ostree output is empty :/ 

	if len(bootedDeployment.CustomOrigin) > 0 {

When CustomOrigin is empty, the check above fails and the booted image URL defaults to <not pivoted>.
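The failure path described above can be illustrated with a simplified, hypothetical sketch. This is not the actual machine-config-daemon source. It assumes that `CustomOrigin` is a string slice decoded from the `custom-origin` field of `rpm-ostree status --json`, and that a pivoted node records its image as `pivot://<image>`; both are assumptions. The `imageRef` pattern is a deliberately loose stand-in for a real container image reference parser. It is included only to show why the `<not pivoted>` placeholder then fails with "invalid reference format", as master-00 reports.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// deployment is a hypothetical, trimmed-down view of one entry in the
// "deployments" array of `rpm-ostree status --json`; only the field used
// by the check is kept.
type deployment struct {
	CustomOrigin []string `json:"custom-origin"`
}

const notPivoted = "<not pivoted>"

// imageRef is a loose stand-in for a container image reference grammar:
// a lowercase path, an optional tag and an optional sha256 digest.
// Registry ports and many other valid forms are NOT handled.
var imageRef = regexp.MustCompile(`^[a-z0-9][a-z0-9._/-]*(:[A-Za-z0-9_.-]+)?(@sha256:[a-f0-9]{64})?$`)

// bootedOSImageURL keeps the placeholder unless CustomOrigin is non-empty
// and its first entry carries the assumed "pivot://" prefix.
func bootedOSImageURL(d deployment) string {
	osImageURL := notPivoted
	if len(d.CustomOrigin) > 0 && strings.HasPrefix(d.CustomOrigin[0], "pivot://") {
		osImageURL = strings.TrimPrefix(d.CustomOrigin[0], "pivot://")
	}
	return osImageURL
}

// parseRef rejects anything the loose grammar does not accept, reusing the
// error wording that master-00 reported.
func parseRef(s string) error {
	if !imageRef.MatchString(s) {
		return fmt.Errorf("parsing reference: %q: invalid reference format", s)
	}
	return nil
}

func main() {
	// Case seen on master-00: rpm-ostree reports no custom origin.
	url := bootedOSImageURL(deployment{})
	if err := parseRef(url); err != nil {
		fmt.Println(fmt.Errorf("parsing booted osImageURL: %w", err))
	}

	// A pivoted node (hypothetical image name) yields a parseable reference.
	url = bootedOSImageURL(deployment{CustomOrigin: []string{"pivot://quay.io/example/machine-os-content:latest"}})
	fmt.Println(url, parseRef(url) == nil)
}
```

With an empty CustomOrigin, the first call prints `parsing booted osImageURL: parsing reference: "<not pivoted>": invalid reference format`, the same chain reported for master-00. The placeholder is a human-readable marker, not an image reference, so any later step that parses it as one will fail.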

Comment 7 Antonio Murdaca 2019-08-06 10:18:54 UTC
Can you please provide the MCD logs or a must-gather?

Comment 16 David Sanz 2019-08-12 09:23:05 UTC
Verified on 4.2.0-0.nightly-2019-08-10-002649 + 42.80.20190809.0

Comment 18 errata-xmlrpc 2019-10-16 06:34:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

