Bug 2064693 - [IPI][OSP] Openshift-install fails to find the shiftstack cloud defined in clouds.yaml in the current directory
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.11
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Stephen Finucane
QA Contact: Itay Matza
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-03-16 11:05 UTC by Itay Matza
Modified: 2022-11-22 08:05 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 10:54:28 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift installer pull 5947 0 None open Bug 2064693: Restore ability to use local clouds.yaml 2022-05-30 13:16:55 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 10:54:54 UTC

Description Itay Matza 2022-03-16 11:05:31 UTC
Version:
OSP: RHOS-16.1-RHEL-8-20211126.n.1

OCP-installer: 
$ openshift-install version
openshift-install 4.11.0-0.nightly-2022-03-15-060211
built from commit d5016a08312a971130ab8ea9377da5d5bcc7ec88
release image registry.ci.openshift.org/ocp/release@sha256:6ad8c47e3aead32b8438aa7b32849ca72f38506031d4b1a1d6a828b38d706885
release architecture amd64


Platform: Openstack


Please specify: IPI


What happened?
Defined shiftstack cloud in clouds.yaml in the current directory:
```
$ grep -A 6 "shiftstack\"" install-config.yaml 
    cloud:            "shiftstack"
    externalNetwork:  "nova"
    region:           "regionOne"
    computeFlavor:    "m4.xlarge"
    lbFloatingIP:     "10.46.44.161"
    ingressFloatingIP:     "10.46.44.170"
    externalDNS:      ["10.46.0.31"]

$ grep -A 5 shiftstack clouds.yaml
#BEGIN shiftstack PARAMETERS
 shiftstack:
    auth:
        auth_url: https://10.46.44.140:13000
        password: not_the_real_pass
        project_domain_name: Default
        project_name: shiftstack
        project_id: d3d1c9d6f81a4cd9a51d8fba57a26206
        user_domain_name: Default
        username: shiftstack_user
    cacert: /etc/pki/ca-trust/source/anchors/undercloud-cacert.pem
    identity_api_version: '3'
    region_name: regionOne
#END shiftstack PARAMETERS
```

But OCP-installer fails to find the shiftstack cloud:
```
2022-03-16 09:13:32.547 | level=error msg=Error: cloud shiftstack does not exist in clouds.yaml
2022-03-16 09:13:32.550 | level=error
2022-03-16 09:13:32.552 | level=error msg=  with provider["openshift/local/openstack"],
2022-03-16 09:13:32.555 | level=error msg=  on main.tf line 5, in provider "openstack":
2022-03-16 09:13:32.557 | level=error msg=   5: provider "openstack" {
2022-03-16 09:13:32.560 | level=error
2022-03-16 09:13:32.563 | level=fatal msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: exit status 1
2022-03-16 09:13:32.565 | level=fatal
2022-03-16 09:13:32.568 | level=fatal msg=Error: cloud shiftstack does not exist in clouds.yaml
2022-03-16 09:13:32.571 | level=fatal
2022-03-16 09:13:32.574 | level=fatal msg=  with provider["openshift/local/openstack"],
2022-03-16 09:13:32.577 | level=fatal msg=  on main.tf line 5, in provider "openstack":
2022-03-16 09:13:32.580 | level=fatal msg=   5: provider "openstack" {
```

What did you expect to happen?
The installer should read the clouds.yaml in the current directory and find the correct cloud name.
With openshift_puddle 4.10.0-0.nightly-2022-03-15-095541 in the same environment, the installation passed.


How to reproduce it (as minimally and precisely as possible)?
1. Create the clouds.yaml and install-config.yaml files
2. Execute openshift-install to create the cluster -
> $ openshift-install create cluster --log-level debug --dir ostest/

Additional info:
Tried to change the cloud name to openstack (in both install-config.yaml and clouds.yaml), but it still failed:
> ERROR Error: cloud openstack does not exist in clouds.yaml
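
To rule out a plain naming mismatch before looking elsewhere, a quick text check of the local file can help. This is a minimal sketch, not something the installer does: `cloud_is_defined` is a hypothetical helper, it matches text rather than parsing YAML, and it assumes a simple cloud name with no regex metacharacters.

```shell
#!/bin/sh
# cloud_is_defined NAME FILE
# Succeeds if FILE contains an indented "NAME:" key on a line of its own.
# This is a text match, not a YAML parse, so treat a hit as a hint only.
cloud_is_defined() {
    grep -Eq "^[[:space:]]+$1:[[:space:]]*\$" "$2"
}

# Hypothetical usage against the clouds.yaml in the current directory.
if [ -f clouds.yaml ]; then
    if cloud_is_defined shiftstack clouds.yaml; then
        echo "shiftstack is defined in ./clouds.yaml"
    else
        echo "shiftstack is NOT defined in ./clouds.yaml"
    fi
fi
```

If the name is present and the installer still reports that the cloud does not exist, the file being read is probably not the one in the current directory.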


Workaround:
Set the OS_CLIENT_CONFIG_FILE environment variable before running `openshift-install create cluster`:
```
$ pwd
/home/stack
$ export OS_CLIENT_CONFIG_FILE=/home/stack/clouds.yaml
```
And the installer is progressing:

```
DEBUG OpenShift Installer 4.11.0-0.nightly-2022-03-15-060211
DEBUG Built from commit d5016a08312a971130ab8ea9377da5d5bcc7ec88
INFO Waiting up to 20m0s (until 10:58AM) for the Kubernetes API at https://api.ostest.shiftstack.com:6443...                                                                                                      
DEBUG Still waiting for the Kubernetes API: Get "https://api.ostest.shiftstack.com:6443/version": dial tcp 10.46.44.161:6443: i/o timeout                                                                         
INFO API v1.23.3+f017760 up                                                             
INFO Waiting up to 30m0s (until 11:12AM) for bootstrapping to complete...
```

Comment 5 Itay Matza 2022-03-24 11:58:16 UTC
A comment regarding the workaround - 
The workaround only works when using the full path:
`$ export OS_CLIENT_CONFIG_FILE=/home/stack/clouds.yaml`

The workaround fails when a relative path is used.
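
One way to avoid the relative-path pitfall is to build the absolute path at export time. A minimal sketch, assuming `clouds.yaml` sits in the directory the command is run from:

```shell
# Resolve clouds.yaml against the current directory so that the value stays
# valid even if a child process runs from a different working directory.
export OS_CLIENT_CONFIG_FILE="$(pwd)/clouds.yaml"
echo "OS_CLIENT_CONFIG_FILE=$OS_CLIENT_CONFIG_FILE"
```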

Comment 6 Stephen Finucane 2022-04-20 12:03:35 UTC
Finally got around to testing this. I've included test notes below but tl;dr: things appear to be working as expected with recent builds. This was either a temporary failure that has since been addressed or, if it's still ongoing, an issue with your environment (perhaps you have set OS_CLIENT_CONFIG_FILE in your environment or something?). I'm going to close this as NOTABUG, but please reopen if this is still an issue and you think I've missed something obvious.

---

Created a sample `install-config.yaml` and a local `clouds.yaml` file:

    $ cat install-config.yaml
    ---
    apiVersion: v1
    baseDomain: foo.example
    metadata:
      name: bug-2064693
    controlPlane:
      name: master
      platform:
        openstack:
          type: m1.xlarge
      replicas: 2
    compute:
      - name: worker
        platform:
          openstack:
            type: m1.large
        replicas: 1
    platform:
      openstack:
        cloud: wow
        clusterOSImage: rhcos-4.11
        externalNetwork: external
        apiFloatingIP: 10.0.101.101
        ingressFloatingIP: 10.0.101.102
    networking:
      clusterNetwork:
        - cidr: 10.128.0.0/14
          hostSubnetLength: 9
      serviceNetwork:
        - 172.30.0.0/16
      machineNetwork:
        - cidr: 10.196.0.0/16
    pullSecret: '{"auths": {"cloud.openshift.com": {"auth": "dXNlcm5hbWU6cGFzc3dvcmQ=", "email": "username"}, "quay.io": {"auth": "dXNlcm5hbWU6cGFzc3dvcmQ=", "email": "username"}, "registry.connect.redhat.com": {"auth": "dXNlcm5hbWU6cGFzc3dvcmQ=", "email": "username"}, "registry.redhat.io": {"auth": "dXNlcm5hbWU6cGFzc3dvcmQ=", "email": "username"}}}'
    sshKey: ''

    $ cat clouds.yaml
    clouds:
      foobar:
        auth:
          auth_url: http://10.0.108.84/identity
          password: password
          project_domain_id: default
          project_name: demo
          user_domain_id: default
          username: demo
        identity_api_version: '3'
        region_name: RegionOne
        volume_api_version: '3'

Then ran with latest nightly build:

    $ ./openshift-install version
    ./openshift-install 4.11.0-0.ci-2022-04-20-054911
    built from commit b4bdb62f0ebf1825676e525d94edabcb1c0e9b3e
    release image registry.ci.openshift.org/ocp/release@sha256:e73124845d56f2fcfe367426f60fc5acc7fed174f3e9ee034de1c9178c70c2d4
    release architecture amd64

    $ ./openshift-install --log-level debug create cluster
    DEBUG OpenShift Installer 4.11.0-0.ci-2022-04-20-054911
    DEBUG Built from commit b4bdb62f0ebf1825676e525d94edabcb1c0e9b3e
    DEBUG Fetching Metadata...
    DEBUG Loading Metadata...
    DEBUG   Loading Cluster ID...
    DEBUG     Loading Install Config...
    DEBUG       Loading SSH Key...
    DEBUG       Loading Base Domain...
    DEBUG         Loading Platform...
    DEBUG       Loading Cluster Name...
    DEBUG         Loading Base Domain...
    DEBUG         Loading Platform...
    DEBUG       Loading Networking...
    DEBUG         Loading Platform...
    DEBUG       Loading Pull Secret...
    DEBUG       Loading Platform...
    ^C

So it hangs, but that makes sense since the cloud doesn't exist. I then updated
`install-config.yaml` to reference a non-existent cloud:

    $ sed -i 's/foobar/wow/' install-config.yaml

    $ ./openshift-install --log-level debug create cluster
    DEBUG OpenShift Installer 4.11.0-0.ci-2022-04-20-054911
    DEBUG Built from commit b4bdb62f0ebf1825676e525d94edabcb1c0e9b3e
    DEBUG Fetching Metadata...
    DEBUG Loading Metadata...
    DEBUG   Loading Cluster ID...
    DEBUG     Loading Install Config...
    DEBUG       Loading SSH Key...
    DEBUG       Loading Base Domain...
    DEBUG         Loading Platform...
    DEBUG       Loading Cluster Name...
    DEBUG         Loading Base Domain...
    DEBUG         Loading Platform...
    DEBUG       Loading Networking...
    DEBUG         Loading Platform...
    DEBUG       Loading Pull Secret...
    DEBUG       Loading Platform...
    ERROR failed to fetch Metadata: failed to load asset "Install Config": failed to create install config: failed to create a network client: cloud wow does not exist in clouds.yaml

So it fails, but that's correct behavior. For what it's worth, this is also the behavior I see on the oldest CI build I can find:

    $ ./openshift-install version
    ./openshift-install 4.11.0-0.ci-2022-04-17-145742
    built from commit d907e1459681cc4a2f5ce141318d40260bd14500
    release image registry.ci.openshift.org/ocp/release@sha256:b4d85306605ba7a2ff82a8d2ee799cd4c932b79a41b001f2ee9b64da976c2584
    release architecture amd64

    $ ./openshift-install --log-level debug create cluster
    DEBUG OpenShift Installer 4.11.0-0.ci-2022-04-17-145742
    DEBUG Built from commit d907e1459681cc4a2f5ce141318d40260bd14500
    DEBUG Fetching Metadata...
    DEBUG Loading Metadata...
    DEBUG   Loading Cluster ID...
    DEBUG     Loading Install Config...
    DEBUG       Loading SSH Key...
    DEBUG       Loading Base Domain...
    DEBUG         Loading Platform...
    DEBUG       Loading Cluster Name...
    DEBUG         Loading Base Domain...
    DEBUG         Loading Platform...
    DEBUG       Loading Networking...
    DEBUG         Loading Platform...
    DEBUG       Loading Pull Secret...
    DEBUG       Loading Platform...
    ^C

Comment 7 Itay Matza 2022-04-25 08:41:46 UTC
Hey Stephen,

I tried again with OCP 4.11.0-0.nightly-2022-04-24-135651, and this bug is still valid - 
```
2022-04-25 08:24:55.133 | level=error msg=Error: cloud shiftstack does not exist in clouds.yaml
2022-04-25 08:24:55.135 | level=error
2022-04-25 08:24:55.138 | level=error msg=  with provider["openshift/local/openstack"],
2022-04-25 08:24:55.140 | level=error msg=  on main.tf line 5, in provider "openstack":
2022-04-25 08:24:55.143 | level=error msg=   5: provider "openstack" {
2022-04-25 08:24:55.145 | level=error
2022-04-25 08:24:55.148 | level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failure applying terraform for "masters" stage: failed to create cluster: failed to apply Terraform: exit status 1
2022-04-25 08:24:55.150 | level=error
2022-04-25 08:24:55.153 | level=error msg=Error: cloud shiftstack does not exist in clouds.yaml
2022-04-25 08:24:55.155 | level=error
2022-04-25 08:24:55.157 | level=error msg=  with provider["openshift/local/openstack"],
2022-04-25 08:24:55.159 | level=error msg=  on main.tf line 5, in provider "openstack":
2022-04-25 08:24:55.162 | level=error msg=   5: provider "openstack" {
```

IMHO, it is not an environmental issue, as this failure reproduced with the same CI job and environment we are using for other OCP versions.


Did you manage to use the clouds.yaml without using an environment variable?
According to the documentation, clouds.yaml in the current directory is supported when not using the OS_CLIENT_CONFIG_FILE environment variable - https://docs.openshift.com/container-platform/4.10/installing/installing_openstack/installing-openstack-installer-custom.html#installation-osp-describing-cloud-parameters_installing-openstack-installer-custom

Reopening the BZ.

Comment 9 Stephen Finucane 2022-04-26 11:32:09 UTC
(In reply to Itay Matza from comment #7)
> IMHO, it is not an environmental issue, as this failure reproduced with the
> same CI job and environment we are using for other OCP versions.

I attempted to reproduce this again using 4.11.0-0.nightly-2022-04-24-135651 and the same 'clouds.yaml' and 'install-config.yaml' from comment 6 (with cloudname "foobar"). I still cannot do so:

  $ ./openshift-install version
  ./openshift-install 4.11.0-0.nightly-2022-04-24-135651
  built from commit 9cf0c5a963bf983ccf997fed46e7bcde81a02569
  release image registry.ci.openshift.org/ocp/release@sha256:3cfd57e4c7cff0807b7811a3a885b336955e1f7b4c646b17975307c350830879
  release architecture amd64

  $ ls clouds.yaml 
  clouds.yaml

  $ ls /etc/openstack
  ls: cannot access '/etc/openstack': No such file or directory

  $ ls ~/.config/openstack
  ls: cannot access '/home/stephenfin/.config/openstack': No such file or directory

  $ env | grep OS_

  $ ./openshift-install --log-level debug create cluster
  DEBUG OpenShift Installer 4.11.0-0.nightly-2022-04-24-135651 
  DEBUG Built from commit 9cf0c5a963bf983ccf997fed46e7bcde81a02569 
  DEBUG Fetching Metadata...                         
  DEBUG Loading Metadata...                          
  DEBUG   Loading Cluster ID...                      
  DEBUG     Loading Install Config...                
  DEBUG       Loading SSH Key...                     
  DEBUG       Loading Base Domain...                 
  DEBUG         Loading Platform...                  
  DEBUG       Loading Cluster Name...                
  DEBUG         Loading Base Domain...               
  DEBUG         Loading Platform...                  
  DEBUG       Loading Networking...                  
  DEBUG         Loading Platform...                  
  DEBUG       Loading Pull Secret...                 
  DEBUG       Loading Platform...
  ... # hangs here

I then tried to create 'clouds.yaml' files in both '/etc/openstack' and '~/.config/openstack/clouds.yaml' *without* the cloud that 'install-config.yaml' referenced, on the theory that the presence of these files was causing terraform to ignore '$PWD/clouds.yaml'. No luck:

  $ mkdir ~/.config/openstack -p
  $ cp clouds.yaml ~/.config/openstack/
  $ sed -i 's/foobar/wow/' ~/.config/openstack/clouds.yaml

  $ sudo mkdir -p /etc/openstack
  $ sudo cp ./clouds.yaml /etc/openstack/clouds.yaml
  $ sudo sed -i 's/foobar/wow/' /etc/openstack/clouds.yaml

  $ cat install-config.yaml | grep -B 2 cloud:
  platform:
    openstack:
      cloud: foobar
  $ cat clouds.yaml | grep -B 2 foobar
  clouds:                                                                                                                                                                                                            
      foobar:
  $ cat ~/.config/openstack/clouds.yaml | grep -B 2 foobar
  $ cat /etc/openstack/clouds.yaml | grep -B 2 foobar

  $ ./openshift-install --log-level debug create cluster
  DEBUG OpenShift Installer 4.11.0-0.nightly-2022-04-24-135651 
  DEBUG Built from commit 9cf0c5a963bf983ccf997fed46e7bcde81a02569 
  DEBUG Fetching Metadata...                         
  DEBUG Loading Metadata...                          
  DEBUG   Loading Cluster ID...                      
  DEBUG     Loading Install Config...                
  DEBUG       Loading SSH Key...                     
  DEBUG       Loading Base Domain...                 
  DEBUG         Loading Platform...                  
  DEBUG       Loading Cluster Name...                
  DEBUG         Loading Base Domain...               
  DEBUG         Loading Platform...                  
  DEBUG       Loading Networking...                  
  DEBUG         Loading Platform...                  
  DEBUG       Loading Pull Secret...                 
  DEBUG       Loading Platform...
  ... # hangs here

Am I missing something obvious?

> Did you manage to use the clouds.yaml without using an environment variable?

See above. There are no environment variables set. openshift-installer is able to find the local clouds.yaml without issue.

> According to the documentation, clouds.yaml in the current directory is
> supported when not using the OS_CLIENT_CONFIG_FILE environment variable -
> https://docs.openshift.com/container-platform/4.10/installing/
> installing_openstack/installing-openstack-installer-custom.html#installation-
> osp-describing-cloud-parameters_installing-openstack-installer-custom

And that's exactly the behavior I'm seeing locally. This has to be related to the environment in the CI system. My thinking is that the CI job is configuring 'OS_CLIENT_CONFIG_FILE' or another variable as part of its execution. Can you please inspect the configuration for this CI job to see if there are any references to 'OS_CLIENT_CONFIG_FILE'? You might also wish to insert a call to e.g. 'env | grep OS_' before the call to 'openshift-installer'. I can't think of anything else that would cause this behavior unless I am misunderstanding your issue.
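
A CI step along these lines could record the relevant state just before the installer runs. It is only a sketch: `list_clouds_yaml_candidates` is a hypothetical helper, and it merely lists the locations discussed in this thread (the environment variable, the current directory, `~/.config/openstack` and `/etc/openstack`) without asking the installer or Terraform which file they actually load.

```shell
#!/bin/sh
# Print which clouds.yaml candidates are visible from the current directory.
# The list mirrors the locations discussed in this bug; it does not query
# the installer or Terraform themselves.
list_clouds_yaml_candidates() {
    if [ -n "${OS_CLIENT_CONFIG_FILE:-}" ]; then
        echo "env: OS_CLIENT_CONFIG_FILE=${OS_CLIENT_CONFIG_FILE}"
    else
        echo "env: OS_CLIENT_CONFIG_FILE is unset"
    fi
    for candidate in "$PWD/clouds.yaml" \
                     "$HOME/.config/openstack/clouds.yaml" \
                     "/etc/openstack/clouds.yaml"; do
        if [ -f "$candidate" ]; then
            echo "found:   $candidate"
        else
            echo "missing: $candidate"
        fi
    done
}

list_clouds_yaml_candidates
```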

Comment 10 Martin Simka 2022-04-26 11:48:47 UTC
Hi, I cannot verify now with more recent versions, because we applied a workaround to our CI (set OS_CLIENT_CONFIG_FILE), but we (EAP QE) saw this bug with 4.11.0-0.nightly-2022-04-12-072444 [1] exactly as reported here. `clouds.yaml` file in current directory seems to be ignored, the file doesn't exist in `~/.config/openstack/` or `/etc/openstack`. The only `OS_` variable we set is `OS_CLOUD` because we execute some OpenStack commands before and after openshift-installer. 

[1] https://openshift-release-artifacts.apps.ci.l2s4.p1.openshiftapps.com/4.11.0-0.nightly-2022-04-12-072444/openshift-install-linux-4.11.0-0.nightly-2022-04-12-072444.tar.gz

Comment 11 Stephen Finucane 2022-04-26 13:55:47 UTC
Itay and I hopped on a tmux session to discuss this. This is valid: you simply need to wait until later in the installation process for the error to appear (initial validation passes, as noted above). I'll try to investigate what has changed here. I'm guessing terraform itself.

Comment 12 Stephen Finucane 2022-04-27 16:53:11 UTC
Looks like a regression due to how we manage our providers. I bisected this (with 4fc9fa88c, the branch point for 4.11, as a known good commit and master as a known bad commit) and ended up with the following first bad commit:

  09cd3f503baf9a8ce5bbe7843f0fee9976e74ced is the first bad commit
  commit 09cd3f503baf9a8ce5bbe7843f0fee9976e74ced
  Author: staebler <staebler>
  Date:   Wed Dec 22 10:25:54 2021 -0500

      terraform: unpack providers from binary data
    
      Unpack the providers needed for completing a stage from the embedded
      data in the installer's binary. This replaces the previous method of
      creating symlinks to the installer binary, where the installer binary
      masqueraded as each of the terraform providers.

   data/unpack.go                        |   8 ++-
   hack/build.sh                         |  34 ++++++++++
   pkg/terraform/init.go                 |  89 ++++++++++++++++++++++++++
   pkg/terraform/providers/.gitignore    |   2 +
   pkg/terraform/providers/mirror/README |   4 ++
   pkg/terraform/providers/providers.go  |  89 +++++++++++++++++++++++---
   pkg/terraform/terraform.go            | 114 ++--------------------------------
   7 files changed, 220 insertions(+), 120 deletions(-)
   create mode 100644 pkg/terraform/init.go
   create mode 100644 pkg/terraform/providers/.gitignore
   create mode 100644 pkg/terraform/providers/mirror/README

I will work with the installer team to figure out what the implications of this are and why it's affecting us like this.

Comment 13 Stephen Finucane 2022-05-24 11:04:03 UTC
Discussed this with @Patrick Dillon a few weeks back. Quoting:

> Prior to 4.11 (and those commits you identified) the installer used Terraform as a library,
> so the `$PWD` for Terraform was the same as the `$PWD` for the Installer. In 4.11 (in order to
> upgrade Terraform) we are now embedding the Terraform binary in the installer and extracting it.
> So Terraform (and the providers) (can) have a different `$PWD` than `openshift-install`.
> I say can because the behavior has shifted. Originally we were extracting the terraform binary
> to `/tmp` now we're extracting it to the cluster install dir. This is still a bit of a WIP as
> we work through bugs.
> So ATM I would expect you would be able to reproduce the BZ you mentioned when you do
> `openshift-install create cluster --dir <install_dir>` but that it would work when you do
> `openshift-install create cluster` because in the latter case the terraform binary would be
> in the `$PWD` of the installer.
> If you want to preserve the behavior that the installer loads clouds.yaml from the $PWD of the
> installer, one fix could potentially be to pass the path in to your terraform configs.

I'm looking into passing this configuration to the installer.
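
The working-directory effect described in the quote can be illustrated without the installer at all. The sketch below is an illustration only: it does not invoke `openshift-install` or Terraform, `lookup_from` is a hypothetical helper, and the temporary directory and its `ostest` subdirectory merely stand in for the installer's `$PWD` and the `--dir` install directory.

```shell
#!/bin/sh
# Illustration only: a relative lookup of clouds.yaml succeeds or fails
# depending on the working directory of the process doing the lookup.
workdir=$(mktemp -d)          # stands in for the directory the installer runs from
mkdir "$workdir/ostest"       # stands in for the --dir install directory
touch "$workdir/clouds.yaml"

lookup_from() {
    # Run the lookup in a subshell so the caller's working directory is untouched.
    if (cd "$1" && [ -f clouds.yaml ]); then
        echo "found"
    else
        echo "not found"
    fi
}

from_installer_dir=$(lookup_from "$workdir")
from_install_dir=$(lookup_from "$workdir/ostest")
echo "lookup from installer \$PWD: $from_installer_dir"
echo "lookup from install dir:    $from_install_dir"
rm -rf "$workdir"
```

A process started from the install directory does not see a `clouds.yaml` that lives one level up, which matches the failure reported with `--dir ostest/` and explains why an absolute path in `OS_CLIENT_CONFIG_FILE` works around it.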

Comment 17 Itay Matza 2022-06-27 07:18:06 UTC
Verified with OCP 4.11.0-0.nightly-2022-06-25-081133 on top of RHOS-16.1-RHEL-8-20220329.n.1:

- Verified with IPI and IPI-Proxy installation types.
- Verified with and without the "OS_CLIENT_CONFIG_FILE" environment variable.


Verification steps:
A. The OCP cluster installed successfully when using IPI/IPI-Proxy and the clouds.yaml in the current directory -

1. Create the clouds.yaml and install-config.yaml files in the current directory.
2. Execute openshift-install and create the cluster:
> $ openshift-install create cluster --log-level debug --dir ostest/
3. The installer reads the clouds.yaml, finds the correct cloud name, and the OCP cluster is installed successfully.


B. The OCP cluster installed successfully when using the "OS_CLIENT_CONFIG_FILE" environment variable -

1. Create the clouds.yaml and install-config.yaml files in the current directory.
2. Set the OS_CLIENT_CONFIG_FILE environment variable using the full path:
> $ export OS_CLIENT_CONFIG_FILE=/home/stack/clouds.yaml
3. Execute openshift-install and create the cluster:
> $ openshift-install create cluster --log-level debug --dir ostest/
4. The installer reads the clouds.yaml, finds the correct cloud name, and the OCP cluster is installed successfully.

Comment 20 errata-xmlrpc 2022-08-10 10:54:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069


