Bug 2064693
Summary: | [IPI][OSP] Openshift-install fails to find the shiftstack cloud defined in clouds.yaml in the current directory | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Itay Matza <imatza> |
Component: | Installer | Assignee: | Stephen Finucane <stephenfin> |
Installer sub component: | OpenShift on OpenStack | QA Contact: | Itay Matza <imatza> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | medium | CC: | aos-bugs, jstourac, msimka, pprinett, prasedenica89, stephenfin |
Version: | 4.11 | Keywords: | Reopened, Triaged |
Target Milestone: | --- | ||
Target Release: | 4.11.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | No Doc Update |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2022-08-10 10:54:28 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Itay Matza
2022-03-16 11:05:31 UTC
A comment regarding the workaround: the workaround only works when using the full path:

```
$ export OS_CLIENT_CONFIG_FILE=/home/stack/clouds.yaml
```

Note that the workaround fails when a relative path is used.

Finally got around to testing this. I've included test notes below, but tl;dr: things appear to be working as expected with recent builds. This was either a temporary failure that has since been addressed or, if it's still ongoing, an issue with your environment (perhaps you have set OS_CLIENT_CONFIG_FILE in your environment or something?). I'm going to close this as NOTABUG, but please reopen if this is still an issue and you think I've missed something obvious.

---

Created a sample `install-config.yaml` and a local `clouds.yaml` file:

```
$ cat install-config.yaml
---
apiVersion: v1
baseDomain: foo.example
metadata:
  name: bug-2064693
controlPlane:
  name: master
  platform:
    openstack:
      type: m1.xlarge
  replicas: 2
compute:
- name: worker
  platform:
    openstack:
      type: m1.large
  replicas: 1
platform:
  openstack:
    cloud: foobar
    clusterOSImage: rhcos-4.11
    externalNetwork: external
    apiFloatingIP: 10.0.101.101
    ingressFloatingIP: 10.0.101.102
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostSubnetLength: 9
  serviceNetwork:
  - 172.30.0.0/16
  machineNetwork:
  - cidr: 10.196.0.0/16
pullSecret: '{"auths": {"cloud.openshift.com": {"auth": "dXNlcm5hbWU6cGFzc3dvcmQ=", "email": "username"}, "quay.io": {"auth": "dXNlcm5hbWU6cGFzc3dvcmQ=", "email": "username"}, "registry.connect.redhat.com": {"auth": "dXNlcm5hbWU6cGFzc3dvcmQ=", "email": "username"}, "registry.redhat.io": {"auth": "dXNlcm5hbWU6cGFzc3dvcmQ=", "email": "username"}}}'
sshKey: ''
$ cat clouds.yaml
clouds:
  foobar:
    auth:
      auth_url: http://10.0.108.84/identity
      password: password
      project_domain_id: default
      project_name: demo
      user_domain_id: default
      username: demo
    identity_api_version: '3'
    region_name: RegionOne
    volume_api_version: '3'
```

Then ran with the latest nightly build:

```
$ ./openshift-install 4.11.0-0.ci-2022-04-20-054911
built from commit b4bdb62f0ebf1825676e525d94edabcb1c0e9b3e
release image registry.ci.openshift.org/ocp/release@sha256:e73124845d56f2fcfe367426f60fc5acc7fed174f3e9ee034de1c9178c70c2d4
release architecture amd64
$ ./openshift-install --log-level debug create cluster
DEBUG OpenShift Installer 4.11.0-0.ci-2022-04-20-054911
DEBUG Built from commit b4bdb62f0ebf1825676e525d94edabcb1c0e9b3e
DEBUG Fetching Metadata...
DEBUG Loading Metadata...
DEBUG Loading Cluster ID...
DEBUG Loading Install Config...
DEBUG Loading SSH Key...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Cluster Name...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Networking...
DEBUG Loading Platform...
DEBUG Loading Pull Secret...
DEBUG Loading Platform...
^C
```

So it hangs, but that makes sense, since there is no real OpenStack deployment behind that entry. I then updated `install-config.yaml` to reference a non-existent cloud:

```
$ sed -i 's/foobar/wow/' install-config.yaml
$ ./openshift-install --log-level debug create cluster
DEBUG OpenShift Installer 4.11.0-0.ci-2022-04-20-054911
DEBUG Built from commit b4bdb62f0ebf1825676e525d94edabcb1c0e9b3e
DEBUG Fetching Metadata...
DEBUG Loading Metadata...
DEBUG Loading Cluster ID...
DEBUG Loading Install Config...
DEBUG Loading SSH Key...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Cluster Name...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Networking...
DEBUG Loading Platform...
DEBUG Loading Pull Secret...
DEBUG Loading Platform...
ERROR failed to fetch Metadata: failed to load asset "Install Config": failed to create install config: failed to create a network client: cloud wow does not exist in clouds.yaml
```

So it fails, but that's correct behavior.
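The two runs above exercise a name check: does the cloud named in `install-config.yaml` exist in the `clouds.yaml` that was found? The Go sketch below is a hypothetical illustration of that check only, not installer code; it reproduces the same error text as the second run. It assumes the YAML has already been decoded, because the Go standard library has no YAML parser, and the `Cloud` type is an illustrative subset of the real format.

```go
package main

import "fmt"

// Cloud holds a small, hypothetical subset of one clouds.yaml entry.
// Decoding the YAML itself needs a third-party library, so this sketch
// starts from already-decoded data.
type Cloud struct {
	AuthURL    string
	RegionName string
}

// lookupCloud mimics the name check behind the error seen above:
// "cloud wow does not exist in clouds.yaml".
func lookupCloud(clouds map[string]Cloud, name string) (Cloud, error) {
	c, ok := clouds[name]
	if !ok {
		return Cloud{}, fmt.Errorf("cloud %s does not exist in clouds.yaml", name)
	}
	return c, nil
}

func main() {
	clouds := map[string]Cloud{
		"foobar": {AuthURL: "http://10.0.108.84/identity", RegionName: "RegionOne"},
	}
	// "foobar" passes the check; "wow" reproduces the error text.
	for _, name := range []string{"foobar", "wow"} {
		if c, err := lookupCloud(clouds, name); err != nil {
			fmt.Println("error:", err)
		} else {
			fmt.Println(name, "->", c.AuthURL)
		}
	}
}
```

Passing this check only shows that the installer process found a `clouds.yaml` containing the named cloud; it says nothing about whether a separate process doing its own lookup will find the same file.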
For what it's worth, this is also the behavior I see on the oldest CI build I can find:

```
$ ./openshift-install version
./openshift-install 4.11.0-0.ci-2022-04-17-145742
built from commit d907e1459681cc4a2f5ce141318d40260bd14500
release image registry.ci.openshift.org/ocp/release@sha256:b4d85306605ba7a2ff82a8d2ee799cd4c932b79a41b001f2ee9b64da976c2584
release architecture amd64
$ ./openshift-install --log-level debug create cluster
DEBUG OpenShift Installer 4.11.0-0.ci-2022-04-17-145742
DEBUG Built from commit d907e1459681cc4a2f5ce141318d40260bd14500
DEBUG Fetching Metadata...
DEBUG Loading Metadata...
DEBUG Loading Cluster ID...
DEBUG Loading Install Config...
DEBUG Loading SSH Key...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Cluster Name...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Networking...
DEBUG Loading Platform...
DEBUG Loading Pull Secret...
DEBUG Loading Platform...
^C
```

Hey Stephen, I tried again with OCP 4.11.0-0.nightly-2022-04-24-135651, and this bug is still valid:

```
2022-04-25 08:24:55.133 | level=error msg=Error: cloud shiftstack does not exist in clouds.yaml
2022-04-25 08:24:55.135 | level=error
2022-04-25 08:24:55.138 | level=error msg= with provider["openshift/local/openstack"],
2022-04-25 08:24:55.140 | level=error msg= on main.tf line 5, in provider "openstack":
2022-04-25 08:24:55.143 | level=error msg= 5: provider "openstack" {
2022-04-25 08:24:55.145 | level=error
2022-04-25 08:24:55.148 | level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failure applying terraform for "masters" stage: failed to create cluster: failed to apply Terraform: exit status 1
2022-04-25 08:24:55.150 | level=error
2022-04-25 08:24:55.153 | level=error msg=Error: cloud shiftstack does not exist in clouds.yaml
2022-04-25 08:24:55.155 | level=error
2022-04-25 08:24:55.157 | level=error msg= with provider["openshift/local/openstack"],
2022-04-25 08:24:55.159 | level=error msg= on main.tf line 5, in provider "openstack":
2022-04-25 08:24:55.162 | level=error msg= 5: provider "openstack" {
```

IMHO, it is not an environmental issue, as this failure reproduced with the same CI job and environment we are using for other OCP versions. Did you manage to use the clouds.yaml without using an environment variable? According to the documentation, clouds.yaml in the current directory is supported when not using the OS_CLIENT_CONFIG_FILE environment variable: https://docs.openshift.com/container-platform/4.10/installing/installing_openstack/installing-openstack-installer-custom.html#installation-osp-describing-cloud-parameters_installing-openstack-installer-custom

Reopening the BZ.

(In reply to Itay Matza from comment #7)
> IMHO, it is not an environmental issue, as this failure reproduced with the
> same CI job and environment we are using for other OCP versions.

I attempted to reproduce this again using 4.11.0-0.nightly-2022-04-24-135651 and the same 'clouds.yaml' and 'install-config.yaml' from comment 6 (with cloud name "foobar"). I still cannot do so:

```
$ ./openshift-install version
./openshift-install 4.11.0-0.nightly-2022-04-24-135651
built from commit 9cf0c5a963bf983ccf997fed46e7bcde81a02569
release image registry.ci.openshift.org/ocp/release@sha256:3cfd57e4c7cff0807b7811a3a885b336955e1f7b4c646b17975307c350830879
release architecture amd64
$ ls clouds.yaml
clouds.yaml
$ ls /etc/openstack
ls: cannot access '/etc/openstack': No such file or directory
$ ls ~/.config/openstack
ls: cannot access '/home/stephenfin/.config/openstack': No such file or directory
$ env | grep OS_
$ ./openshift-install --log-level debug create cluster
DEBUG OpenShift Installer 4.11.0-0.nightly-2022-04-24-135651
DEBUG Built from commit 9cf0c5a963bf983ccf997fed46e7bcde81a02569
DEBUG Fetching Metadata...
DEBUG Loading Metadata...
DEBUG Loading Cluster ID...
DEBUG Loading Install Config...
DEBUG Loading SSH Key...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Cluster Name...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Networking...
DEBUG Loading Platform...
DEBUG Loading Pull Secret...
DEBUG Loading Platform...
...  # hangs here
```

I then tried creating '/etc/openstack/clouds.yaml' and '~/.config/openstack/clouds.yaml' *without* the cloud that 'install-config.yaml' referenced, thinking that perhaps the presence of these files was causing Terraform to ignore '$PWD/clouds.yaml'. No luck:

```
$ mkdir ~/.config/openstack -p
$ cp clouds.yaml ~/.config/openstack/
$ sed -i 's/foobar/wow/' ~/.config/openstack/clouds.yaml
$ sudo mkdir -p /etc/openstack
$ sudo cp ./clouds.yaml /etc/openstack/clouds.yaml
$ sudo sed -i 's/foobar/wow/' /etc/openstack/clouds.yaml
$ cat install-config.yaml | grep -B 2 cloud:
platform:
  openstack:
    cloud: foobar
$ cat clouds.yaml | grep -B 2 foobar
clouds:
  foobar:
$ cat ~/.config/openstack/clouds.yaml | grep -B 2 foobar
$ cat /etc/openstack/clouds.yaml | grep -B 2 foobar
$ ./openshift-install --log-level debug create cluster
DEBUG OpenShift Installer 4.11.0-0.nightly-2022-04-24-135651
DEBUG Built from commit 9cf0c5a963bf983ccf997fed46e7bcde81a02569
DEBUG Fetching Metadata...
DEBUG Loading Metadata...
DEBUG Loading Cluster ID...
DEBUG Loading Install Config...
DEBUG Loading SSH Key...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Cluster Name...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Networking...
DEBUG Loading Platform...
DEBUG Loading Pull Secret...
DEBUG Loading Platform...
...  # hangs here
```

Am I missing something obvious?

> Did you manage to use the clouds.yaml without using an environment variable?

See above. There are no environment variables set. openshift-install is able to find the local clouds.yaml without issue.
> According to the documentation, clouds.yaml in the current directory is
> supported when not using the OS_CLIENT_CONFIG_FILE environment variable -
> https://docs.openshift.com/container-platform/4.10/installing/installing_openstack/installing-openstack-installer-custom.html#installation-osp-describing-cloud-parameters_installing-openstack-installer-custom

And that's exactly the behavior I'm seeing locally. This has to be related to the environment in the CI system. My thinking is that the CI job is setting 'OS_CLIENT_CONFIG_FILE' or another variable as part of its execution. Can you please inspect the configuration for this CI job to see if there are any references to 'OS_CLIENT_CONFIG_FILE'? You might also wish to insert a call to e.g. 'env | grep OS_' before the call to 'openshift-install'. I can't think of anything else that would cause this behavior unless I am misunderstanding your issue.

Hi, I cannot verify now with more recent versions, because we applied a workaround to our CI (set OS_CLIENT_CONFIG_FILE), but we (EAP QE) saw this bug with 4.11.0-0.nightly-2022-04-12-072444 [1] exactly as reported here. The `clouds.yaml` file in the current directory seems to be ignored, and the file doesn't exist in `~/.config/openstack/` or `/etc/openstack`. The only `OS_` variable we set is `OS_CLOUD`, because we execute some OpenStack commands before and after openshift-install.

[1] https://openshift-release-artifacts.apps.ci.l2s4.p1.openshiftapps.com/4.11.0-0.nightly-2022-04-12-072444/openshift-install-linux-4.11.0-0.nightly-2022-04-12-072444.tar.gz

Itay and I hopped on a tmux session to discuss this. This is valid: you simply need to wait until later in the installation process for it to appear (initial validation passes, as noted above). I'll try to investigate what has changed here. I'm guessing it's Terraform itself.

Looks like a regression due to how we manage our providers.
I bisected this (with 4fc9fa88c, the branch point for 4.11, as a known good commit and master as a known bad commit) and ended up with the following first bad commit:

```
09cd3f503baf9a8ce5bbe7843f0fee9976e74ced is the first bad commit
commit 09cd3f503baf9a8ce5bbe7843f0fee9976e74ced
Author: staebler <staebler>
Date:   Wed Dec 22 10:25:54 2021 -0500

    terraform: unpack providers from binary data

    Unpack the providers needed for completing a stage from the embedded data in the installer's binary. This replaces the previous method of creating symlinks to the installer binary, where the installer binary masqueraded as each of the terraform providers.

 data/unpack.go | 8 ++-
 hack/build.sh | 34 ++++++++++
 pkg/terraform/init.go | 89 ++++++++++++++++++++++++++
 pkg/terraform/providers/.gitignore | 2 +
 pkg/terraform/providers/mirror/README | 4 ++
 pkg/terraform/providers/providers.go | 89 +++++++++++++++++++++++---
 pkg/terraform/terraform.go | 114 ++--------------------------------
 7 files changed, 220 insertions(+), 120 deletions(-)
 create mode 100644 pkg/terraform/init.go
 create mode 100644 pkg/terraform/providers/.gitignore
 create mode 100644 pkg/terraform/providers/mirror/README
```

I will work with the installer team to figure out what the implications of this are and why it's affecting us like this.

Discussed this with @Patrick Dillon a few weeks back. Quoting:
> Prior to 4.11 (and those commits you identified) the installer used Terraform as a library,
> so the `$PWD` for Terraform was the same as the `$PWD` for the Installer. In 4.11 (in order to
> upgrade Terraform) we are now embedding the Terraform binary in the installer and extracting it.
> So Terraform (and the providers) (can) have a different `$PWD` than `openshift-install`.
> I say can because the behavior has shifted. Originally we were extracting the terraform binary
> to `/tmp` now we're extracting it to the cluster install dir. This is still a bit of a WIP as
> we work through bugs.
> So ATM I would expect you would be able to reproduce the BZ you mentioned when you do
> `openshift-install create cluster --dir <install_dir>` but that it would work when you do
> `openshift-install create cluster` because in the latter case the terraform binary would be
> in the `$PWD` of the installer.
> If you want to preserve the behavior that the installer loads clouds.yaml from the $PWD of the
> installer, one fix could potentially be to pass the path in to your terraform configs.
I'm looking into passing this configuration to the installer.
Verified with OCP 4.11.0-0.nightly-2022-06-25-081133 on top of RHOS-16.1-RHEL-8-20220329.n.1:
- Verified with IPI and IPI-Proxy installation types.
- Verified with and without the "OS_CLIENT_CONFIG_FILE" environment variable.

Verification steps:

A. The OCP cluster installed successfully when using IPI/IPI-Proxy and the clouds.yaml in the current directory:
1. Create the clouds.yaml and install-config.yaml files in the current directory.
2. Execute openshift-install and create the cluster:
> $ openshift-install create cluster --log-level debug --dir ostest/
3. The installer reads clouds.yaml, finds the correct cloud name, and the OCP cluster is installed successfully.

B. The OCP cluster installed successfully when using the "OS_CLIENT_CONFIG_FILE" environment variable:
1. Create the clouds.yaml and install-config.yaml files in the current directory.
2. Set the OS_CLIENT_CONFIG_FILE environment variable using the full path:
> $ export OS_CLIENT_CONFIG_FILE=/home/stack/clouds.yaml
3. Execute openshift-install and create the cluster:
> $ openshift-install create cluster --log-level debug --dir ostest/
4. The installer reads clouds.yaml, finds the correct cloud name, and the OCP cluster is installed successfully.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069