Bug 1864677 - AWS install with custom service endpoints fails with machine-config not completing.
Summary: AWS install with custom service endpoints fails with machine-config not compl...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: aos-install
QA Contact: Yunfei Jiang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-08-03 19:43 UTC by Abhinav Dahiya
Modified: 2020-10-27 16:23 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:23:11 UTC
Target Upstream Version:



Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| Github | openshift installer pull 3831 | 0 | None | closed | Bug 1864677: bootkube.sh: update cluster-config-operator to generate bootstrap configmap | 2020-11-17 07:08:20 UTC |
| Github | openshift machine-config-operator pull 1973 | 0 | None | closed | Bug 1864677: operator/bootstrap: use correct key for generated cloud conf | 2020-11-17 07:08:20 UTC |
| Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:23:27 UTC |

Description Abhinav Dahiya 2020-08-03 19:43:05 UTC
Description of problem:
When installing on AWS with custom service endpoints, the machine-config operator fails to complete because of a mismatch in the configuration of the control-plane nodes.

```
Unable to apply 4.6.0-0.nightly-2020-08-03-054919: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: configuration status for pool master is empty: pool is degraded because nodes fail with "3 nodes are reporting degraded status on sync": "Node ip-10-0-222-164.ca-central-1.compute.internal is reporting: \"machineconfig.machineconfiguration.openshift.io \\\"rendered-master-ddd5e0c6170a64a9525d862a998ad685\\\" not found\", Node ip-10-0-144-144.ca-central-1.compute.internal is reporting: \"machineconfig.machineconfiguration.openshift.io \\\"rendered-master-ddd5e0c6170a64a9525d862a998ad685\\\" not found\", Node ip-10-0-189-142.ca-central-1.compute.internal is reporting: \"machineconfig.machineconfiguration.openshift.io \\\"rendered-master-ddd5e0c6170a64a9525d862a998ad685\\\" not found\"
```

The cause appears to be that the MCO on the bootstrap host serves the cloud provider config based on the infrastructure spec, while the in-cluster MCO uses the cloud provider config based on the generated ConfigMap in openshift-config-managed/kube-cloud-config.
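
The suspected mismatch can be illustrated with a small Python sketch. It is not the MCO's implementation: `rendered_name` is a hypothetical stand-in that names a rendered config after a digest of its inputs, and the contents of both cloud configs (an empty one on the bootstrap side, a `ServiceOverride` stanza in-cluster) are assumptions made for illustration. The point is only that two different inputs produce two different `rendered-master-...` names, so nodes that booted with one name cannot find it among the configs rendered from the other input.

```python
import hashlib


def rendered_name(pool: str, cloud_conf: str) -> str:
    """Hypothetical stand-in: name a rendered config after a digest of its input.

    The real MCO hashing scheme is not reproduced here; any content-derived
    name shows the same effect.
    """
    digest = hashlib.sha256(cloud_conf.encode("utf-8")).hexdigest()[:32]
    return f"rendered-{pool}-{digest}"


# Assumption: the bootstrap rendering sees no cloud config, because the
# infrastructure spec does not reference a user-provided one.
bootstrap_conf = ""

# Assumption: the in-cluster rendering sees a generated config that carries
# the custom endpoint from the install config.
generated_conf = "\n".join([
    "[Global]",
    '[ServiceOverride "0"]',
    "Service = sns",
    "Region = ca-central-1",
    "URL = https://localhost:4567",
    "SigningRegion = ca-central-1",
    "",
])

bootstrap_name = rendered_name("master", bootstrap_conf)
in_cluster_name = rendered_name("master", generated_conf)

# Nodes provisioned from the bootstrap rendering ask for bootstrap_name,
# but the cluster only knows the name rendered from the generated config.
known_in_cluster = {in_cluster_name}
print(bootstrap_name in known_in_cluster)  # prints False
```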

Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

How reproducible:

Steps to Reproduce:
```
$ yq m -CP -x aws-install-config.yaml elide-install-config.yaml
apiVersion: v1
baseDomain: devcluster.openshift.com
controlPlane:
  name: master
  replicas: 3
compute:
- name: worker
  replicas: 3
metadata:
  name: adahiya-2
platform:
  aws:
    region: ca-central-1
    serviceEndpoints:
    - name: sns
      url: https://localhost:4567
pullSecret: ""
sshKey: ""
```
```
➜  $ ./bin/openshift-install --dir dev create cluster
INFO Consuming Install Config from target directory
INFO Credentials loaded from the "default" profile in file "/home/adahiya/.aws/credentials"
WARNING Found override for release image. Please be warned, this is not advised
INFO Creating infrastructure resources...
INFO Waiting up to 20m0s for the Kubernetes API at https://api.adahiya-2.devcluster.openshift.com:6443...
INFO API v4.6.0-202008011154.p0-dirty up
INFO Waiting up to 30m0s for bootstrapping to complete...
INFO Destroying the bootstrap resources...
INFO Waiting up to 30m0s for the cluster at https://api.adahiya-2.devcluster.openshift.com:6443 to initialize...
INFO Cluster operator insights Disabled is False with AsExpected:
INFO Cluster operator machine-config Progressing is True with : Working towards 4.6.0-0.nightly-2020-08-03-054919
ERROR Cluster operator machine-config Degraded is True with RequiredPoolsFailed: Unable to apply 4.6.0-0.nightly-2020-08-03-054919: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: configuration status for pool master is empty: pool is degraded because nodes fail with "3 nodes are reporting degraded status on sync": "Node ip-10-0-222-164.ca-central-1.compute.internal is reporting: \"machineconfig.machineconfiguration.openshift.io \\\"rendered-master-ddd5e0c6170a64a9525d862a998ad685\\\" not found\", Node ip-10-0-144-144.ca-central-1.compute.internal is reporting: \"machineconfig.machineconfiguration.openshift.io \\\"rendered-master-ddd5e0c6170a64a9525d862a998ad685\\\" not found\", Node ip-10-0-189-142.ca-central-1.compute.internal is reporting: \"machineconfig.machineconfiguration.openshift.io \\\"rendered-master-ddd5e0c6170a64a9525d862a998ad685\\\" not found\"", retrying
INFO Cluster operator machine-config Available is False with : Cluster not available for 4.6.0-0.nightly-2020-08-03-054919
FATAL failed to initialize the cluster: Cluster operator machine-config is still updating
```
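
When triaging a message like the one above, it can help to group the degraded nodes by the rendered config they report as missing. The following is a minimal Python sketch that does this with a regular expression; the pattern and the `group_missing_configs` helper are illustrative assumptions based only on the message shape shown in this report, not on any documented format guarantee.

```python
import re

# Excerpt of the operator message from this report, kept verbatim
# including its escaped quotes.
MESSAGE = (
    r'''"Node ip-10-0-222-164.ca-central-1.compute.internal is reporting: '''
    r'''\"machineconfig.machineconfiguration.openshift.io '''
    r'''\\\"rendered-master-ddd5e0c6170a64a9525d862a998ad685\\\" not found\", '''
    r'''Node ip-10-0-144-144.ca-central-1.compute.internal is reporting: '''
    r'''\"machineconfig.machineconfiguration.openshift.io '''
    r'''\\\"rendered-master-ddd5e0c6170a64a9525d862a998ad685\\\" not found\", '''
    r'''Node ip-10-0-189-142.ca-central-1.compute.internal is reporting: '''
    r'''\"machineconfig.machineconfiguration.openshift.io '''
    r'''\\\"rendered-master-ddd5e0c6170a64a9525d862a998ad685\\\" not found\"'''
)

# Node name, then the nearest rendered-<pool>-<32 hex chars> identifier.
PATTERN = re.compile(
    r"Node (\S+) is reporting: .*?(rendered-[\w-]+?-[0-9a-f]{32})"
)


def group_missing_configs(message: str) -> dict:
    """Map each missing rendered config name to the nodes that report it."""
    grouped = {}
    for match in PATTERN.finditer(message):
        node, config = match.group(1), match.group(2)
        grouped.setdefault(config, []).append(node)
    return grouped


result = group_missing_configs(MESSAGE)
for config, nodes in result.items():
    print(config, len(nodes))
```

In this report all three control-plane nodes point at the same missing name, which is consistent with a single rendering mismatch rather than a per-node fault.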

Expected results:

The MCO on the bootstrap host should use the generated ConfigMap, so that the configuration it renders matches the one rendered in-cluster.
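
A rough Python sketch of that expectation follows. It is not the operator's code: `pick_cloud_conf` is a hypothetical helper, the ConfigMaps are modelled as plain dictionaries, and the key name `cloud.conf` for the generated ConfigMap is an assumption. The idea is that the generated ConfigMap takes precedence when it exists and is read with its own key, rather than with the key the user named for their own ConfigMap.

```python
from typing import Mapping, Optional

# Assumption: the key under which the generated ConfigMap stores its content.
GENERATED_KEY = "cloud.conf"


def pick_cloud_conf(
    generated: Optional[Mapping[str, str]],
    user_provided: Optional[Mapping[str, str]],
    user_key: str,
) -> str:
    """Return the cloud config to render with (hypothetical helper).

    Prefers the generated ConfigMap, read with GENERATED_KEY, and only
    falls back to the user-provided ConfigMap and its user-chosen key.
    """
    if generated is not None and GENERATED_KEY in generated:
        return generated[GENERATED_KEY]
    if user_provided is not None and user_key:
        return user_provided.get(user_key, "")
    return ""


# The generated ConfigMap wins even when a user-provided one exists.
print(pick_cloud_conf({"cloud.conf": "generated"}, {"config": "user"}, "config"))
```

If the bootstrap host and the cluster both go through the same selection, they render from the same input and should agree on the resulting config.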

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 3 Yunfei Jiang 2020-08-17 10:38:45 UTC
verified. PASS.
version: 4.6.0-0.nightly-2020-08-16-072105

NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-config                             4.6.0-0.nightly-2020-08-16-072105   True        False         False      60m

Comment 5 errata-xmlrpc 2020-10-27 16:23:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

